**Removing non-stationarity in time series data** is crucial for accurate forecasting because many time series forecasting models assume stationarity, where the statistical properties of the time series do not change over time. Non-stationarity can manifest as trends, seasonality, or other forms of irregular patterns in the data.

This article covers techniques for removing non-stationarity in time series data, including detrending, seasonal adjustment, logarithmic transformation, and differencing, along with the ADF and KPSS tests for validating that a transformed series is stationary.

## What is non-stationarity?

Non-stationarity refers to a property of a time series where the statistical properties of the data change over time. In other words, the mean, variance, or other statistical characteristics of the data series are not constant across different time periods. Non-stationarity can manifest in various ways, including trends, seasonality, and other irregular patterns.

Here are the key components of non-stationarity:

- **Trend:** A trend exists when there is a long-term increase or decrease in the data over time. This could be linear, exponential, or some other form. Trends indicate systematic changes in the series over time.
- **Seasonality:** Seasonality refers to periodic fluctuations or patterns that occur at regular intervals within the data. For example, retail sales might exhibit higher values during the holiday season each year.
- **Variance:** Variance measures the dispersion or spread of the data points around the mean. Non-constant variance, also known as heteroscedasticity, can indicate non-stationarity.
- **Autocorrelation:** Autocorrelation occurs when the correlation between observations at different time points is not constant. This can also indicate non-stationarity, particularly if the autocorrelation structure changes over time.
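The components above can be illustrated with a small synthetic sketch (our own constructed example, not from any dataset): a daily series built from an explicit trend, a yearly seasonal pattern, and noise whose spread grows over time.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(730)  # two years of daily observations

trend = 0.05 * t                            # long-term increase
seasonal = 5 * np.sin(2 * np.pi * t / 365)  # yearly periodic pattern
noise = rng.normal(0, 1 + t / 365)          # spread grows over time (heteroscedasticity)

ts = pd.Series(trend + seasonal + noise,
               index=pd.date_range('2022-01-01', periods=t.size, freq='D'))

# The two halves have clearly different means, so the series is non-stationary
print(ts.iloc[:365].mean(), ts.iloc[365:].mean())
```

Because the mean and variance both change across the sample, any model that assumes constant statistical properties would be misled by this series as-is.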

## How to remove non-stationarity?

#### Trend:

- Detrending: Remove the trend component from the data. This can be achieved by fitting a regression line or using techniques like moving averages.
- Differencing: Take the difference between consecutive observations to remove the trend. This can be done once or multiple times until the data becomes stationary.
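Both approaches can be sketched on a synthetic trended series (the variable names here are illustrative, not from any particular library):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
t = np.arange(100)
ts = pd.Series(2.0 * t + rng.normal(0, 1, t.size))  # linear trend plus noise

# Detrending: fit a regression line and subtract it
slope, intercept = np.polyfit(t, ts, deg=1)
ts_detrended = ts - (slope * t + intercept)

# Differencing: subtract each observation from its predecessor
ts_diff = ts.diff().dropna()

# Both results now fluctuate around a constant level instead of rising
print(round(ts_detrended.mean(), 6), round(ts_diff.mean(), 2))
```

Note the two results differ in interpretation: detrending keeps the original scale with the trend removed, while differencing produces period-to-period changes (here centred near the slope, 2).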

#### Seasonality:

- Seasonal Adjustment: Use techniques such as seasonal decomposition of time series (e.g., STL decomposition) to separate the seasonal component from the data.
- Seasonal Differencing: Take differences between observations at the same season of different years to remove seasonality.
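Seasonal differencing can be sketched with a short synthetic example: here a weekly pattern is removed by differencing at lag 7 (the lag must match the seasonal period; for yearly seasonality in daily data it would be 365).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pattern = np.tile([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0], 20)  # repeating weekly shape
ts = pd.Series(pattern + rng.normal(0, 0.1, pattern.size))

ts_seasonal_diff = ts.diff(7).dropna()  # compare each day with the same day a week earlier

# The repeating pattern cancels out, leaving only the small noise differences
print(ts_seasonal_diff.abs().mean())
```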

#### Variance:

- Transformation: Apply transformations such as logarithmic, square root, or Box-Cox transformation to stabilize the variance and make it more constant over time.
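As a sketch of variance stabilization, a Box-Cox transform can be applied with SciPy to a series whose spread grows over time; the data must be strictly positive, and SciPy estimates the lambda parameter by maximum likelihood.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.arange(200)
ts = np.exp(0.02 * t + rng.normal(0, 0.2, t.size))  # multiplicative noise, growing spread

ts_boxcox, lam = stats.boxcox(ts)

# Before: the second half is far more spread out than the first;
# after: the two halves have comparable spread
print(ts[:100].std(), ts[100:].std())
print(ts_boxcox[:100].std(), ts_boxcox[100:].std())
```

For exponential-style growth like this, the estimated lambda lands near zero, so the Box-Cox transform behaves much like a plain logarithm.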

#### Autocorrelation:

- Differencing: Besides removing trends, differencing can also help reduce autocorrelation by eliminating dependence between consecutive observations.
- Autoregressive Integrated Moving Average (ARIMA): Utilize ARIMA models, which incorporate differencing to handle autocorrelation.

## Tests to Determine Stationarity

**Augmented Dickey-Fuller (ADF) Test:**

- **Null Hypothesis (H0):** The time series has a unit root, indicating it is non-stationary.
- **Alternate Hypothesis (H1):** The time series does not have a unit root, indicating it is stationary.
- **Test Statistic:** The ADF test statistic is compared to critical values from the ADF distribution to determine whether the null hypothesis can be rejected.
- **Decision Rule:** If the test statistic is less than the critical value, the null hypothesis is rejected and the series is considered stationary. Otherwise, the null hypothesis is not rejected and the series is considered non-stationary.

**Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test:**

- **Null Hypothesis (H0):** The series is stationary around a deterministic trend.
- **Alternate Hypothesis (H1):** The series has a unit root, indicating it is non-stationary.
- **Test Statistic:** The KPSS test statistic is compared to critical values to determine whether the null hypothesis can be rejected.
- **Decision Rule:** If the test statistic is greater than the critical value, the null hypothesis is rejected and the series is considered non-stationary. If the test statistic is less than the critical value, the null hypothesis is not rejected and the series is considered stationary.

## Implementation of Removing Non-Stationarity

This section presents essential data preprocessing techniques for achieving stationarity in time series analysis. Techniques include detrending, seasonal adjustment, logarithmic transformation, and differencing, followed by stationarity tests to validate the transformations, ensuring robust and accurate analysis of the data.

#### Importing Necessary Libraries and Creating Sample Data

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Sample data: one year of daily observations drawn from a standard normal
date_rng = pd.date_range(start='2022-01-01', end='2022-12-31', freq='D')
ts = pd.Series(np.random.randn(len(date_rng)), index=date_rng)
```

#### Detrending using a rolling window

- `ts_detrended = ts - ts.rolling(window=30).mean()` calculates the detrended series by subtracting the rolling mean from the original time series `ts`; `rolling(window=30).mean()` computes the rolling mean over a window of size 30.
- Plotting: the code plots both the original and detrended series using `matplotlib`.

```python
# Detrending using a rolling window
ts_detrended = ts - ts.rolling(window=30).mean()

# Plot original and detrended series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_detrended, label='Detrended', linestyle='--')
plt.legend()
plt.show()
```

**Output:** (plot of the original and detrended series)

#### Test to determine stationarity

```python
from statsmodels.tsa.stattools import adfuller

# Test for stationarity after detrending
result_detrended = adfuller(ts_detrended.dropna())
print(f'ADF Statistic (Detrended): {result_detrended[0]}')
print(f'p-value (Detrended): {result_detrended[1]}')
print(f'Critical Values (Detrended): {result_detrended[4]}')
```

**Output:**

```
ADF Statistic (Detrended): -18.559254822829608
p-value (Detrended): 2.0882820619850462e-30
Critical Values (Detrended): {'1%': -3.4500219858626227, '5%': -2.870206553997666, '10%': -2.571387268879483}
```

The p-value is very small, giving strong evidence against the null hypothesis that the series has a unit root (i.e., that it is non-stationary). We can therefore reject the null hypothesis and conclude that the detrended series is stationary. The ADF statistic itself is strongly negative, well below the critical values, which further supports stationarity.

### Seasonal Adjustment

```python
from statsmodels.tsa.seasonal import STL

# Seasonal adjustment; `seasonal=13` sets the length of the seasonal smoother,
# while the seasonal period itself is inferred from the daily index frequency
stl = STL(ts, seasonal=13)
res = stl.fit()
ts_seasonal_adj = ts - res.seasonal

# Plot original and seasonally adjusted series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_seasonal_adj, label='Seasonally Adjusted', linestyle='--')
plt.legend()
plt.show()
```

**Output:** (plot of the original and seasonally adjusted series)

**Test for stationarity:**

```python
# Test for stationarity after seasonal adjustment
result_seasonal_adj = adfuller(ts_seasonal_adj.dropna())
print(f'ADF Statistic (Seasonally Adjusted): {result_seasonal_adj[0]}')
print(f'p-value (Seasonally Adjusted): {result_seasonal_adj[1]}')
print(f'Critical Values (Seasonally Adjusted): {result_seasonal_adj[4]}')
```

**Output:**

```
ADF Statistic (Seasonally Adjusted): -4.651034555303582
p-value (Seasonally Adjusted): 0.00010390367939221074
Critical Values (Seasonally Adjusted): {'1%': -3.4491725955218655, '5%': -2.8698334971428574, '10%': -2.5711883591836733}
```

The p-value is small, giving strong evidence against the null hypothesis that the series has a unit root (i.e., that it is non-stationary). We can therefore reject the null hypothesis and conclude that the seasonally adjusted series is stationary. The negative ADF statistic, below the critical values, points the same way.

### Logarithmic Transformation

```python
# Transformation (e.g., logarithmic). The sample data contains negative values,
# so shift the series to be strictly positive before taking the log
ts_log = np.log(ts - ts.min() + 1)

# Plot original and transformed series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_log, label='Log Transformed', linestyle='--')
plt.legend()
plt.show()
```

**Output:** (plot of the original and log-transformed series)

#### Test for Stationarity

```python
# Test for stationarity after variance stabilization (log transformation)
result_log = adfuller(ts_log.dropna())
print(f'ADF Statistic (Log Transformed): {result_log[0]}')
print(f'p-value (Log Transformed): {result_log[1]}')
print(f'Critical Values (Log Transformed): {result_log[4]}')
```

**Output:**

```
ADF Statistic (Log Transformed): -14.60629558553864
p-value (Log Transformed): 4.08969119294649e-27
Critical Values (Log Transformed): {'1%': -3.467004502498507, '5%': -2.8776444997243558, '10%': -2.575355189707274}
```

The p-value is very small, giving strong evidence against the null hypothesis that the series has a unit root (i.e., that it is non-stationary). We can therefore reject the null hypothesis and conclude that the log-transformed series is stationary, and the strongly negative ADF statistic supports the same conclusion.

### Differencing to Remove Autocorrelation

```python
# Differencing to reduce autocorrelation
ts_diff = ts.diff().dropna()

# Plot original and differenced series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_diff, label='Differenced', linestyle='--')
plt.legend()
plt.show()
```

**Output:** (plot of the original and differenced series)

#### Test for Stationarity

```python
# Test for stationarity after differencing
result_diff = adfuller(ts_diff.dropna())
print(f'ADF Statistic (Differenced): {result_diff[0]}')
print(f'p-value (Differenced): {result_diff[1]}')
print(f'Critical Values (Differenced): {result_diff[4]}')
```

**Output:**

```
ADF Statistic (Differenced): -8.439660110734907
p-value (Differenced): 1.7773358987173984e-13
Critical Values (Differenced): {'1%': -3.4492815848836296, '5%': -2.8698813715275406, '10%': -2.5712138845950587}
```

The p-value is very small, giving strong evidence against the null hypothesis that the differenced series has a unit root (i.e., that it is non-stationary). We can therefore reject the null hypothesis and conclude that the differenced series is stationary; the strongly negative ADF statistic, well below the critical values, confirms this.