
AutoCorrelation

Autocorrelation is a fundamental concept in time series analysis. It assesses the degree of correlation between the values of a variable at different time points. This article discusses the fundamentals and working of autocorrelation.

What is Autocorrelation?

Autocorrelation measures the degree of similarity between a given time series and a lagged version of that time series over successive time periods. It is similar to calculating the correlation between two different variables, except that in autocorrelation we calculate the correlation between two versions, X_t and X_{t-k}, of the same time series.

Calculation of Autocorrelation

Mathematically, the autocorrelation coefficient is denoted by the symbol ρ (rho) and is expressed as ρ(k), where 'k' represents the time lag, i.e. the number of intervals between the observations. It is computed as a Pearson correlation: the covariance of the series with its lagged copy, divided by the product of their standard deviations.

For a time series dataset, the autocorrelation at lag 'k' (ρ(k)) is determined by comparing the values of the variable at time 't' with the values at time 't-k'.

[Tex]\rho(k) = \frac{Cov(X_t, X_{t-k})}{\sigma(X_t) \cdot \sigma(X_{t-k})} [/Tex]

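Here,

  • Cov(X_t, X_{t-k}) is the covariance between the series and its copy lagged by k periods.
  • σ(X_t) and σ(X_{t-k}) are the standard deviations of the series and of its lagged copy.
  • k is the lag.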
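As a minimal sketch of the formula above, the lag-k coefficient can be computed as the Pearson correlation between the series and a copy of itself shifted by k steps. The function name autocorr and the sample series are illustrative assumptions, not part of the original article. Library routines such as the ACF in statsmodels use a slightly different estimator (the full-sample mean and variance), so their values may differ a little on short series.

```python
import numpy as np

def autocorr(x, k):
    """Pearson correlation between x[t] and x[t-k] (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    if k == 0:
        return 1.0  # a series is perfectly correlated with itself
    # x[k:] holds the values at time t, x[:-k] the values at time t-k
    return float(np.corrcoef(x[k:], x[:-k])[0, 1])

# Hypothetical data: a steadily rising series and a strictly alternating one
rising = [1, 2, 3, 4, 5, 6]
alternating = [1, -1, 1, -1, 1, -1]

print(autocorr(rising, 1))       # close to +1: each value follows the previous one
print(autocorr(alternating, 1))  # close to -1: each value reverses the previous one
```

A steadily rising series gives a coefficient near +1 and a strictly alternating one gives a coefficient near -1, which matches the usual reading of positive and negative autocorrelation.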

Interpretation of Autocorrelation

Use of Autocorrelation

What is Partial Autocorrelation?

In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. It differs from the autocorrelation function, which does not control for other lags.

Partial autocorrelation quantifies the relationship between a specific observation and one of its lagged values. It lets us examine the direct influence of a past time point on the current time point, excluding the indirect influence that passes through the other lagged values. In other words, it seeks the unique correlation between two time points after accounting for the influence of the time points in between.

[Tex]PACF(T_i, k) = \frac{Cov([T_i|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}], [T_{i-k}|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}])}{\sigma_{[T_i|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}]} \cdot \sigma_{[T_{i-k}|T_{i-1}, T_{i-2}, \ldots, T_{i-k+1}]}} [/Tex]

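Here,

  • T_i|T_{i-1}, T_{i-2}, ..., T_{i-k+1} is the residual of T_i after regressing it on the intermediate lags T_{i-1} through T_{i-k+1}.
  • T_{i-k}|T_{i-1}, T_{i-2}, ..., T_{i-k+1} is the residual of T_{i-k} after regressing it on the same intermediate lags.
  • σ denotes the standard deviation of the corresponding residual series.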
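The definition above can be sketched directly: regress both T_i and T_{i-k} on the intermediate lags, then correlate the two residual series. This is only one way to estimate the PACF, and statistical libraries typically offer several other estimators. The function name pacf_at_lag and the simulated AR(1) series with coefficient 0.7 are assumptions made for illustration.

```python
import numpy as np

def pacf_at_lag(x, k):
    """Partial autocorrelation at lag k via regression residuals (illustrative)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if k == 1:
        # No intermediate lags to control for: plain lag-1 correlation
        return float(np.corrcoef(x[1:], x[:-1])[0, 1])
    current = x[k:]        # T_i for i = k .. n-1
    lagged = x[:n - k]     # T_{i-k}
    # Design matrix: intercept plus the intermediate lags 1 .. k-1
    columns = [np.ones(n - k)] + [x[k - j:n - j] for j in range(1, k)]
    Z = np.column_stack(columns)
    # Residuals of each series after removing the effect of the intermediate lags
    res_current = current - Z @ np.linalg.lstsq(Z, current, rcond=None)[0]
    res_lagged = lagged - Z @ np.linalg.lstsq(Z, lagged, rcond=None)[0]
    return float(np.corrcoef(res_current, res_lagged)[0, 1])

# Hypothetical AR(1) series: x[t] = 0.7 * x[t-1] + noise
rng = np.random.default_rng(0)
noise = rng.normal(size=5000)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.7 * x[t - 1] + noise[t]

print(pacf_at_lag(x, 1))  # expected to be near 0.7
print(pacf_at_lag(x, 2))  # expected to be near 0
```

For an AR(1) process the only direct dependence is on the previous value, so the partial autocorrelation should be close to the AR coefficient at lag 1 and close to zero at lag 2, even though the ordinary autocorrelation at lag 2 is clearly non-zero.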

Testing For Autocorrelation - Durbin-Watson Test

The Durbin-Watson (DW) test is a statistical test used to detect the presence of autocorrelation in the residuals of a regression analysis. The value of the DW statistic always lies between 0 and 4, and a value near 2 indicates no autocorrelation.

In the stock market, positive autocorrelation in stock prices (DW < 2) suggests that price movements follow a persistent trend: if the variable increased or decreased on the previous day, it tends to move in the same direction on the current day. For example, if the stock fell yesterday, there is a higher likelihood it will fall today. Negative autocorrelation (DW > 2) indicates that if a variable increased or decreased on the previous day, it tends to move in the opposite direction on the current day. For example, if the stock fell yesterday, there is a greater likelihood it will rise today.

Assumptions for the Durbin-Watson Test:

Calculation of DW Statistics

The DW statistic is calculated from e_t, the residual (error) at time t from an Ordinary Least Squares (OLS) regression.

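The null hypothesis and alternate hypothesis for the Durbin-Watson Test are:

  • Null hypothesis (H0): there is no first-order autocorrelation in the residuals.
  • Alternate hypothesis (H1): the residuals exhibit first-order autocorrelation.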

Formula of DW Statistics

[Tex]d = \frac{\sum_{t=2}^{T}(e_t - e_{t-1})^2}{\sum_{t=1}^{T}e_{t}^{2}} [/Tex]

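Here,

  • e_t is the residual at time t.
  • e_{t-1} is the residual at time t-1.
  • T is the number of observations.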
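A minimal sketch of the formula above, written with NumPy so each term is visible. The function names and the two simulated series (used here as stand-ins for regression residuals) are assumptions for illustration. The sketch also checks the well-known approximation d ≈ 2(1 - r), where r is the lag-1 autocorrelation of the residuals, which explains why values below 2 point to positive autocorrelation and values above 2 to negative autocorrelation.

```python
import numpy as np

def durbin_watson_stat(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def lag1_autocorr(residuals):
    """Lag-1 autocorrelation of residuals, assuming they have mean zero."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

# Hypothetical stand-ins for OLS residuals
rng = np.random.default_rng(1)
shocks = rng.normal(size=2000)          # independent residuals
persistent = np.zeros(2000)             # positively autocorrelated residuals
for t in range(1, 2000):
    persistent[t] = 0.8 * persistent[t - 1] + shocks[t]

d_noise = durbin_watson_stat(shocks)
d_persistent = durbin_watson_stat(persistent)
print(d_noise)       # expected to be near 2
print(d_persistent)  # expected to be well below 2
print(2 * (1 - lag1_autocorr(persistent)))  # close to d_persistent
```

In practice the residuals would come from a fitted regression model rather than from a simulation, but the arithmetic applied to them is the same.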

Interpretation of DW Statistics

Decision Rule

Need For Autocorrelation in Time Series

Autocorrelation is important in time series as:

  1. Autocorrelation helps reveal repeating patterns or trends within a time series. By analyzing how a variable correlates with its past values at different lags, analysts can identify the presence of cyclic or seasonal patterns in the data. For example, in economic data, autocorrelation may reveal whether certain economic indicators exhibit regular patterns over specific time intervals, such as monthly or quarterly cycles.
  2. Financial analysts and traders often use autocorrelation to analyze historical price movements in financial markets. By identifying autocorrelation patterns in past price changes, they may attempt to predict future price movements. For instance, if there is a positive autocorrelation at a specific lag, indicating a trend in price movements, traders might use this information to inform their predictions and trading strategies.
  3. The Autocorrelation Function (ACF) is a crucial tool for modeling time series data. ACF helps identify which lags have significant correlations with the current observation. In time series modeling, understanding the autocorrelation structure is essential for selecting appropriate models. For instance, if there is a significant autocorrelation at a particular lag, it may suggest the presence of an autoregressive (AR) component in the model, influencing the current value based on past values. The ACF plot allows analysts to observe the decay of autocorrelation over lags, guiding the choice of lag values to include in autoregressive models.
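The first point in the list above can be illustrated with a short sketch: for a series with a 12-step seasonal cycle, the lag correlation should peak at lag 12. The synthetic monthly series and the helper lag_correlation are assumptions for illustration and do not come from the article's dataset.

```python
import numpy as np

def lag_correlation(x, k):
    """Pearson correlation between the series and its copy lagged by k steps."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[k:], x[:-k])[0, 1])

# Hypothetical monthly series: ten years of a pure 12-month cycle
months = np.arange(120)
seasonal = 10 + 3 * np.sin(2 * np.pi * months / 12)

# Correlation at lags 1..18; the strongest one marks the cycle length
correlations = {k: lag_correlation(seasonal, k) for k in range(1, 19)}
best_lag = max(correlations, key=correlations.get)

print(best_lag)
print(correlations[6], correlations[12])
```

The correlation is strongly negative at lag 6 (half a cycle) and strongly positive at lag 12 (a full cycle). Real data will be noisier, so the peaks are lower, but the same pattern is what analysts look for in an ACF plot.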

Autocorrelation Vs Correlation

  1. Autocorrelation refers to the correlation between a variable and its past values at different lags in a time series. It focuses on understanding the temporal patterns within a single variable. Correlation represents the statistical association between two distinct variables. It focuses on assessing the strength and direction of the relationship between separate variables.
  2. Autocorrelation is measured with metrics such as the ACF and PACF, which quantify the correlation between a variable and its lagged values. Correlation is measured with coefficients such as the Pearson correlation coefficient for linear relationships or the Spearman rank correlation for monotonic relationships, each providing a single value ranging from -1 to 1.

Difference Between Autocorrelation and Multicollinearity

| Feature | Autocorrelation | Multicollinearity |
| --- | --- | --- |
| Definition | Correlation between a variable and its lagged values | Correlation between independent variables in a model |
| Focus | Relationship within a single variable over time | Relationship among multiple independent variables |
| Purpose | Identifying temporal patterns in time series data | Detecting interdependence among predictor variables |
| Nature of Relationship | Examines correlation between a variable and its past values | Investigates correlation between independent variables |
| Impact on the model | Can lead to inefficient parameter estimates and unreliable standard errors in time series models | Can lead to inflated standard errors and difficulty in isolating individual variable effects |
| Statistical Test | Ljung-Box test, Durbin-Watson statistic | Variance Inflation Factor (VIF), correlation matrix, condition indices |

How to calculate Autocorrelation in Python?

This section demonstrates how to calculate autocorrelation in Python and how to interpret the resulting graphs. We will be using the Google stock price dataset; you can download the dataset from here.

Importing Libraries and Dataset

We have used Pandas, NumPy, Matplotlib and statsmodels (its OLS linear regression model, the Durbin-Watson statistic and tsaplots).

# Importing necessary dependencies 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.regression.linear_model import OLS
from statsmodels.graphics.tsaplots import plot_acf

goog_stock_Data = pd.read_csv('GOOG.csv', header=0, index_col=0)
goog_stock_Data['Adj Close'].plot()
plt.show()

Output:


[Figure: line plot of the adjusted close price of the Google stock]


Here, we have plotted the adjusted close price of the Google stock.

Plotting Autocorrelation Function

# Plot the autocorrelation for stock price data with 0.05 significance level
plot_acf(goog_stock_Data['Adj Close'], alpha=0.05)
plt.show()

Output:

[Figure: autocorrelation function (ACF) plot of the adjusted close price]


The Autocorrelation Function plot above shows the autocorrelation coefficients of the time series at different lag values. The x-axis represents the lag (the time gap between observations), while the y-axis represents the autocorrelation coefficient. Setting alpha to 0.05 draws a 95% confidence band, and here we can see that there is some autocorrelation that is significant at the 0.05 level. Peaks above the horizontal axis indicate positive autocorrelation, suggesting a repeating pattern at the corresponding lag.
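To make the significance band more concrete, the sketch below computes the sample ACF by hand and compares it with the common rule-of-thumb bound of ±1.96/√n, which applies to a series with no autocorrelation at the 0.05 level. The helper sample_acf and both simulated series are assumptions for illustration. The band drawn by plot_acf is computed differently by default (it widens with the lag), so this is an approximation of what the plot shows rather than a reproduction of it.

```python
import numpy as np

def sample_acf(x, nlags):
    """Sample autocorrelation at lags 0..nlags using the full-sample mean and variance."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    denom = np.sum(centered ** 2)
    values = [1.0]
    for k in range(1, nlags + 1):
        values.append(np.sum(centered[k:] * centered[:-k]) / denom)
    return np.array(values)

# Hypothetical data: a random walk (price-like) and plain white noise
rng = np.random.default_rng(42)
n = 500
walk = np.cumsum(rng.normal(size=n))
noise = rng.normal(size=n)

bound = 1.96 / np.sqrt(n)  # approximate 95% bound for an uncorrelated series
acf_walk = sample_acf(walk, 20)
acf_noise = sample_acf(noise, 20)

# Count how many of the 20 lags fall outside the bound
print(int(np.sum(np.abs(acf_walk[1:]) > bound)))
print(int(np.sum(np.abs(acf_noise[1:]) > bound)))
```

A trending, price-like series typically exceeds the bound at many lags, while for white noise only about one lag in twenty is expected to do so by chance.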

Performing Durbin-Watson Test

# Code for Durbin-Watson test
# Regress the adjusted close price on a linear time trend
Y = goog_stock_Data['Adj Close'].to_numpy()
X = sm.add_constant(np.arange(len(Y)))

# Fit the model using ordinary least squares
ols_res = OLS(Y, X).fit()
# Apply the Durbin-Watson statistic to the OLS residuals
print(durbin_watson(ols_res.resid))

Output:

0.13568583561262496

The DW statistic is approximately 0.14, which is close to 0, indicating strong positive autocorrelation.

How to Handle Autocorrelation?

To handle autocorrelation in a model,


Frequently Asked Questions (FAQs)

Q. What is autocorrelation vs. correlation?

Correlation looks at how two things are connected, while autocorrelation checks how a thing is linked to its own earlier versions over time.

Q. Why is autocorrelation a problem?

Autocorrelation poses a challenge for many statistical tests since it indicates a lack of independence among values.

Q. What are the types of autocorrelations?

Types of Autocorrelations:

  • Positive Autocorrelation
  • Negative Autocorrelation
  • Zero Autocorrelation
  • Cross-Lag Autocorrelation

Q. What is the principle of autocorrelation?

The principle of autocorrelation is rooted in the idea that the values of a variable in a time series are correlated with their own past values. Autocorrelation measures the strength and direction of this relationship at different time lags.

Q. What is the difference between cross-correlation and autocorrelation?

Autocorrelation measures the correlation of a variable with its own past values, while cross-correlation measures the correlation between two different variables at various time lags. Autocorrelation focuses on the internal relationship within a single time series, while cross-correlation assesses the association between two distinct time series.
