How to Remove Non-Stationarity in Time Series Forecasting

Last Updated : 05 Apr, 2024

Removing non-stationarity in time series data is crucial for accurate forecasting because many time series forecasting models assume stationarity, where the statistical properties of the time series do not change over time. Non-stationarity can manifest as trends, seasonality, or other forms of irregular patterns in the data.

This article covers techniques for removing non-stationarity from time series data (detrending, seasonal adjustment, logarithmic transformation, and differencing) and the ADF and KPSS tests used to check whether the result is stationary.

What is non-stationarity?

Non-stationarity refers to a property of a time series where the statistical properties of the data change over time. In other words, the mean, variance, or other statistical characteristics of the data series are not constant across different time periods. Non-stationarity can manifest in various ways, including trends, seasonality, and other irregular patterns.

Here are the key components of non-stationarity:

Trend: A trend exists when there is a long-term increase or decrease in the data over time. This could be linear, exponential, or some other form. Trends indicate systematic changes in the series over time.
Seasonality: Seasonality refers to periodic fluctuations that recur at regular intervals. For example, retail sales might be higher during the holiday season each year.
Variance: Variance measures the spread of the data points around the mean. Variance that changes over time (heteroscedasticity) is a form of non-stationarity.
Autocorrelation: Autocorrelation is the correlation between a series and lagged copies of itself. Autocorrelation on its own does not imply non-stationarity, because a stationary series can be autocorrelated. The series is non-stationary when the autocorrelation structure depends on the point in time rather than only on the lag.
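
A quick, informal way to spot a changing mean is to compare summary statistics across different parts of the series. The sketch below is a rough diagnostic, not a formal test. It uses synthetic data, and the names `noise`, `trended`, and `half_means` are illustrative rather than part of any library.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n = 200

# Stationary reference: white noise with constant mean and variance
noise = pd.Series(rng.normal(size=n))

# Non-stationary example: the same noise plus a linear trend
trended = noise + 0.1 * np.arange(n)


def half_means(series):
    """Return the mean of the first half and of the second half of a series."""
    mid = len(series) // 2
    return series.iloc[:mid].mean(), series.iloc[mid:].mean()


noise_first, noise_second = half_means(noise)
trend_first, trend_second = half_means(trended)

print(f"White noise halves: {noise_first:.2f}, {noise_second:.2f}")
print(f"Trended halves:     {trend_first:.2f}, {trend_second:.2f}")
```

For the white noise the two halves have similar means, while for the trended series the second half sits clearly higher. The same comparison can be applied to the variance.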

How to remove non-stationarity?

Trend:

Detrending: Remove the trend component from the data. This can be achieved by fitting a regression line or using techniques like moving averages.
Differencing: Take the difference between consecutive observations to remove the trend. This can be done once or multiple times until the data becomes stationary.
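
The two trend-removal approaches above can be sketched as follows. This is a minimal example on a synthetic series with a known linear trend (slope 0.5, chosen for illustration); a real series may need a different trend model or more than one round of differencing.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
t = np.arange(300)

# Synthetic series: linear trend (slope 0.5) plus noise
y = pd.Series(2.0 + 0.5 * t + rng.normal(scale=1.0, size=t.size))

# Detrending: fit a straight line by least squares and subtract it
slope, intercept = np.polyfit(t, y.to_numpy(), deg=1)
y_detrended = y - (intercept + slope * t)

# Differencing: y[t] - y[t-1]; the first value is NaN and is dropped
y_diff = y.diff().dropna()

print(f"Fitted slope: {slope:.3f}")
print(f"Mean of detrended series: {y_detrended.mean():.3f}")
print(f"Mean of differenced series: {y_diff.mean():.3f}")
```

Detrending leaves residuals centred on zero, while differencing leaves a series that fluctuates around the slope of the original trend.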

Seasonality:

Seasonal Adjustment: Use techniques such as seasonal decomposition of time series (e.g., STL decomposition) to separate the seasonal component from the data.
Seasonal Differencing: Take differences between observations at the same season of different years to remove seasonality.
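
Seasonal differencing can be sketched with pandas `diff` and a lag equal to the season length. The example below assumes a synthetic daily series with a weekly pattern, so the lag is 7; for monthly data with a yearly pattern the lag would be 12. The pattern values are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
idx = pd.date_range("2022-01-01", periods=364, freq="D")

# Synthetic daily series: a repeating weekly pattern plus small noise
weekly_pattern = np.array([5.0, 3.0, 0.0, -2.0, -4.0, 1.0, 6.0])
y = pd.Series(
    np.tile(weekly_pattern, 52) + rng.normal(scale=0.5, size=364),
    index=idx,
)

# Seasonal differencing: subtract the value from one season (7 days) earlier
y_sdiff = y.diff(7).dropna()

print(f"Std before: {y.std():.2f}")
print(f"Std after seasonal differencing: {y_sdiff.std():.2f}")
```

Because each value is compared with the same weekday one week earlier, the repeating pattern cancels and only the noise remains.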

Variance:

Transformation: Apply transformations such as logarithmic, square root, or Box-Cox transformation to stabilize the variance and make it more constant over time.
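
To illustrate variance stabilization, the sketch below builds a strictly positive synthetic series whose fluctuations grow with its level, then compares the spread of its period-to-period changes before and after a log transform. The helper `spread_ratio` is a hypothetical name introduced for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
t = np.arange(400)

# Strictly positive series whose fluctuations grow with its level
y = pd.Series(np.exp(0.02 * t + 0.1 * rng.normal(size=t.size)))


def spread_ratio(series):
    """Std of period-to-period changes in the second half divided by the first half."""
    changes = series.diff().dropna()
    mid = len(changes) // 2
    return changes.iloc[mid:].std() / changes.iloc[:mid].std()


ratio_raw = spread_ratio(y)
ratio_log = spread_ratio(np.log(y))  # the log requires all values to be > 0

print(f"Spread ratio, raw series: {ratio_raw:.2f}")
print(f"Spread ratio, log series: {ratio_log:.2f}")
```

A ratio near 1 means the spread is similar in both halves. The raw series has a ratio far above 1, while the log-transformed series stays close to 1.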

Autocorrelation:

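Differencing: Besides removing trends, differencing removes the very strong, slowly decaying autocorrelation that a unit root or trend produces. Differencing a series that is already stationary (over-differencing) can introduce negative autocorrelation, so apply it only as often as needed.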
Autoregressive Integrated Moving Average (ARIMA): Utilize ARIMA models, which incorporate differencing to handle autocorrelation.

Tests to Determine Stationarity

Augmented Dickey-Fuller (ADF) Test:

Null Hypothesis ([Tex]H_0[/Tex]): The time series has a unit root, indicating it is non-stationary.
Alternate Hypothesis ([Tex]H_1[/Tex]): The time series does not have a unit root, indicating it is stationary.
Test Statistic: The ADF test statistic is compared to critical values from the ADF distribution to determine whether the null hypothesis can be rejected.
Decision Rule: If the test statistic is less than the critical value, the null hypothesis is rejected, and the series is considered stationary. Otherwise, if the test statistic is greater than the critical value, the null hypothesis is not rejected, and the series is considered non-stationary.

Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test:

Null Hypothesis ([Tex]H_0[/Tex]): The series is stationary around a constant level or around a deterministic trend, depending on how the test is configured.
Alternate Hypothesis ([Tex]H_1 [/Tex]): The series has a unit root, indicating it is non-stationary.
Test Statistic: The KPSS test statistic is compared to critical values to determine whether the null hypothesis can be rejected.
Decision Rule: If the test statistic is greater than the critical value, the null hypothesis is rejected, and the series is considered non-stationary. If the test statistic is less than the critical value, the null hypothesis is not rejected, and the series is considered stationary.

Implementation of Removing Non-Stationarity

This section applies these preprocessing techniques in Python: detrending, seasonal adjustment, logarithmic transformation, and differencing, each followed by an ADF test on the transformed series. The sample data is random noise, which is already stationary, so the examples show the mechanics of each technique rather than a before-and-after improvement.

Importing Necessary Libraries and Creating Sample Data

Python3

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

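# Sample data: one year of daily white noise (already stationary).
# No random seed is set, so the values and test results vary between runs.
date_rng = pd.date_range(start='2022-01-01', end='2022-12-31', freq='D')
ts = pd.Series(np.random.randn(len(date_rng)), index=date_rng)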

Detrending using a rolling window

ts_detrended = ts - ts.rolling(window=30).mean(): This subtracts a 30-day rolling mean from the original series ts. The first 29 values of the rolling mean are NaN because the window is not yet full, so the detrended series starts with 29 missing values.
Plotting: The code then plots the original and detrended series with matplotlib.

Python3

# Detrending using a rolling window
ts_detrended = ts - ts.rolling(window=30).mean()

# Plot original and detrended series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_detrended, label='Detrended', linestyle='--')
plt.legend()
plt.show()

Output:

[Plot: original series and detrended series]

Test to determine stationarity

Python3

from statsmodels.tsa.stattools import adfuller

# Test for stationarity after detrending
result_detrended = adfuller(ts_detrended.dropna())
print(f'ADF Statistic (Detrended): {result_detrended[0]}')
print(f'p-value (Detrended): {result_detrended[1]}')
print(f'Critical Values (Detrended): {result_detrended[4]}')

Output:

ADF Statistic (Detrended): -18.559254822829608
p-value (Detrended): 2.0882820619850462e-30
Critical Values (Detrended): {'1%': -3.4500219858626227, '5%': -2.870206553997666, '10%': -2.571387268879483}

The ADF statistic (about -18.56) is far below the 1% critical value (about -3.45), and the p-value is effectively zero. We therefore reject the null hypothesis that the series has a unit root and conclude that the detrended series is stationary. The more negative the statistic, the stronger the evidence against the null hypothesis.

Seasonal Adjustment

Python3

from statsmodels.tsa.seasonal import STL

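# Seasonal adjustment with STL
# period=7: weekly cycle; a yearly cycle cannot be estimated from one year of data
# seasonal=13: length of the seasonal smoother, not the seasonal period
stl = STL(ts, period=7, seasonal=13)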
res = stl.fit()
ts_seasonal_adj = ts - res.seasonal

# Plot original and seasonally adjusted series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_seasonal_adj, label='Seasonally Adjusted', linestyle='--')
plt.legend()
plt.show()

Output:

[Plot: original series and seasonally adjusted series]

Test for stationarity:

Python3

# Test for stationarity after seasonal adjustment
result_seasonal_adj = adfuller(ts_seasonal_adj.dropna())
print(f'ADF Statistic (Seasonally Adjusted): {result_seasonal_adj[0]}')
print(f'p-value (Seasonally Adjusted): {result_seasonal_adj[1]}')
print(f'Critical Values (Seasonally Adjusted): {result_seasonal_adj[4]}')

Output:

ADF Statistic (Seasonally Adjusted): -4.651034555303582
p-value (Seasonally Adjusted): 0.00010390367939221074
Critical Values (Seasonally Adjusted): {'1%': -3.4491725955218655, '5%': -2.8698334971428574, '10%': -2.5711883591836733}

The ADF statistic (about -4.65) is below the 1% critical value (about -3.45), and the p-value is about 0.0001. We therefore reject the null hypothesis that the series has a unit root and conclude that the seasonally adjusted series is stationary.

Logarithmic Transformation

Python3

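# Logarithmic transformation
# np.log is only defined for positive values. The sample series contains
# negative values, which become NaN (NumPy emits a RuntimeWarning).
ts_log = np.log(ts)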

# Plot original and transformed series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_log, label='Log Transformed', linestyle='--')
plt.legend()
plt.show()

Output:

[Plot: original series and log-transformed series]

Test for Stationarity

Python3

# Test for stationarity after variance stabilization (log transformation)
result_log = adfuller(ts_log.dropna())
print(f'ADF Statistic (Log Transformed): {result_log[0]}')
print(f'p-value (Log Transformed): {result_log[1]}')
print(f'Critical Values (Log Transformed): {result_log[4]}')

Output:

ADF Statistic (Log Transformed): -14.60629558553864
p-value (Log Transformed): 4.08969119294649e-27
Critical Values (Log Transformed): {'1%': -3.467004502498507, '5%': -2.8776444997243558, '10%': -2.575355189707274}

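The ADF statistic (about -14.61) is far below the 1% critical value (about -3.47), so the null hypothesis of a unit root is rejected. Interpret this result with care: the logarithm is undefined for the negative values in the sample, so about half of the observations became NaN and were dropped before the test. In practice, apply a log transform only to strictly positive series, or shift the series to be positive first.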

Differencing to Remove Autocorrelation

Python3

# Differencing to reduce autocorrelation
ts_diff = ts.diff().dropna()

# Plot original and differenced series
plt.figure(figsize=(14, 7))
plt.plot(ts, label='Original')
plt.plot(ts_diff, label='Differenced', linestyle='--')
plt.legend()
plt.show()

Output:

[Plot: original series and differenced series]

Test for Stationarity

Python3

# Test for stationarity after differencing
result_diff = adfuller(ts_diff.dropna())
print(f'ADF Statistic (Differenced): {result_diff[0]}')
print(f'p-value (Differenced): {result_diff[1]}')
print(f'Critical Values (Differenced): {result_diff[4]}')

Output:

ADF Statistic (Differenced): -8.439660110734907
p-value (Differenced): 1.7773358987173984e-13
Critical Values (Differenced): {'1%': -3.4492815848836296, '5%': -2.8698813715275406, '10%': -2.5712138845950587}

The ADF statistic (about -8.44) is far below the 1% critical value (about -3.45), and the p-value is effectively zero. We therefore reject the null hypothesis that the differenced series has a unit root and conclude that it is stationary.
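
To see what differencing does to autocorrelation, the sketch below compares the lag-1 autocorrelation before and after differencing for two synthetic series: a random walk, where differencing helps, and white noise, where it is unnecessary. It uses the pandas method `Series.autocorr`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Random walk: strongly autocorrelated because of its unit root
random_walk = pd.Series(np.cumsum(rng.normal(size=500)))
acf_before = random_walk.autocorr(lag=1)
acf_after = random_walk.diff().dropna().autocorr(lag=1)

# White noise: already stationary, so differencing is not needed
white_noise = pd.Series(rng.normal(size=500))
acf_overdiff = white_noise.diff().dropna().autocorr(lag=1)

print(f"Random walk, lag-1 autocorrelation: {acf_before:.2f}")
print(f"After differencing: {acf_after:.2f}")
print(f"White noise after differencing: {acf_overdiff:.2f}")
```

Differencing the random walk brings its lag-1 autocorrelation close to zero. Differencing the white noise instead introduces a clearly negative lag-1 autocorrelation, which is why differencing should not be applied to a series that is already stationary.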
