Autocorrelation measures the degree of similarity between a time series and a lagged version of itself over successive time periods. It is similar to calculating the correlation between two different variables, except that in autocorrelation we calculate the correlation between two versions, X_t and X_{t-k}, of the same time series.
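Given time-series measurements Y_1, Y_2, …, Y_N at times X_1, X_2, …, X_N, the lag-k autocorrelation function is defined as:

$$r_k = \frac{\sum_{i=1}^{N-k} (Y_i - \bar{Y})(Y_{i+k} - \bar{Y})}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}$$

where $\bar{Y}$ is the mean of the N measurements.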
An autocorrelation of +1 represents a perfect positive correlation, and -1 represents a perfect negative correlation.
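To make the definition concrete, here is a minimal sketch of the lag-k autocorrelation in NumPy. It assumes the common sample estimator (deviations from the overall mean, normalized by the total sum of squared deviations). The function name `autocorrelation` and the toy series are illustrative choices, not part of the original article.

```python
import numpy as np

def autocorrelation(y, k):
    """Sample lag-k autocorrelation of a 1-D series (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    dev = y - y.mean()                      # deviations from the overall mean
    num = np.sum(dev[: n - k] * dev[k:])    # pairs (Y_i, Y_{i+k})
    den = np.sum(dev ** 2)                  # total sum of squared deviations
    return num / den

# Toy series with a steady upward trend (assumed data, for illustration only)
series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
r0 = autocorrelation(series, 0)
r1 = autocorrelation(series, 1)
print(r0, r1)  # 1.0 0.5
```

At lag 0 the series is compared with itself, so the result is always 1. The positive lag-1 value reflects that neighbouring points in a trending series lie on the same side of the mean.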
- An autocorrelation test is used to detect non-randomness in a time series. Many statistical procedures assume that the data are random; to check this assumption, we typically examine the lag-1 autocorrelation.
- To determine whether past values of a time series are related to its future values, we compute the autocorrelation at several different lags.
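The lag-1 randomness check can be sketched as follows. The rule used here (treat the series as consistent with randomness when the lag-1 autocorrelation lies within roughly ±1.96/√N) is a standard large-sample approximation for white noise and is an assumption brought in for illustration; the text above does not state a threshold. Both series are synthetic.

```python
import numpy as np

def lag1_autocorrelation(y):
    """Sample lag-1 autocorrelation (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    return np.sum(dev[:-1] * dev[1:]) / np.sum(dev ** 2)

rng = np.random.default_rng(0)
noise = rng.normal(size=500)   # synthetic random data
walk = np.cumsum(noise)        # synthetic random walk: strongly dependent on its past

# Approximate 95% band for the autocorrelation of white noise (assumed rule of thumb)
band = 1.96 / np.sqrt(len(noise))

r_noise = lag1_autocorrelation(noise)
r_walk = lag1_autocorrelation(walk)
print(f"band = ±{band:.3f}, noise r1 = {r_noise:.3f}, walk r1 = {r_walk:.3f}")
```

A lag-1 value far outside the band, as for the random walk, suggests that the data are not random.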
In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. It differs from the autocorrelation function, which does not control for the other lags.
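The formula for calculating PACF at lag k is:

$$\mathrm{PACF}(T_i, k) = \frac{\mathrm{Cov}\left([T_i \mid T_{i-1}, \dots, T_{i-k+1}],\; [T_{i-k} \mid T_{i-1}, \dots, T_{i-k+1}]\right)}{\sigma_{[T_i \mid T_{i-1}, \dots, T_{i-k+1}]} \; \sigma_{[T_{i-k} \mid T_{i-1}, \dots, T_{i-k+1}]}}$$

where T_i | T_{i-1}, T_{i-2}, …, T_{i-k+1} is the residual (error) obtained from fitting a multivariate linear model to T_{i-1}, T_{i-2}, …, T_{i-k+1} for predicting T_i, and T_{i-k} | T_{i-1}, …, T_{i-k+1} is the corresponding residual for predicting T_{i-k}.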
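The residual-based description above can be sketched directly: regress both T_i and T_{i-k} on the intermediate lags, then correlate the two sets of residuals. This is a minimal illustration, not a production estimator. The helper names and the simulated AR(1) series with coefficient 0.7 are assumptions made for the example.

```python
import numpy as np

def _residuals(target, predictors):
    """Residuals of an OLS fit of target on the predictors plus an intercept."""
    X = np.column_stack([np.ones(len(target))] + predictors)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef

def partial_autocorrelation(t, k):
    """Lag-k partial autocorrelation via correlation of regression residuals."""
    t = np.asarray(t, dtype=float)
    n = len(t)
    if not 1 <= k < n:
        raise ValueError("lag k must satisfy 1 <= k < len(t)")
    current = t[k:]                                      # T_i
    lagged = t[: n - k]                                  # T_{i-k}
    between = [t[k - j : n - j] for j in range(1, k)]    # T_{i-1}, ..., T_{i-k+1}
    res_current = _residuals(current, between)
    res_lagged = _residuals(lagged, between)
    return np.corrcoef(res_current, res_lagged)[0, 1]

# Simulated AR(1) series: x_i = 0.7 * x_{i-1} + noise (assumed data)
rng = np.random.default_rng(42)
e = rng.normal(size=2000)
x = np.zeros(2000)
for i in range(1, 2000):
    x[i] = 0.7 * x[i - 1] + e[i]

p1 = partial_autocorrelation(x, 1)
p2 = partial_autocorrelation(x, 2)
print(f"PACF lag 1 = {p1:.3f}, lag 2 = {p2:.3f}")
```

For an AR(1) process the lag-1 value should land near the coefficient, while the lag-2 value should be close to zero once the effect of lag 1 is removed. That is exactly the distinction between PACF and ACF.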
Testing For Autocorrelation
The Durbin-Watson test measures the amount of autocorrelation in the residuals from a regression analysis. It checks specifically for first-order autocorrelation.
Assumptions for the Durbin-Watson Test:
- The errors are normally distributed and the mean is 0.
- The errors are stationary.
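The test statistic is calculated with the following formula:

$$d = \frac{\sum_{t=2}^{N} (e_t - e_{t-1})^2}{\sum_{t=1}^{N} e_t^2}$$

where e_t is the residual at time t from the Ordinary Least Squares (OLS) fit and N is the number of observations.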
The null and alternative hypotheses for the Durbin-Watson test are:
- H0: There is no first-order autocorrelation.
- H1: There is some first-order autocorrelation.
The Durbin-Watson statistic takes values between 0 and 4. The values are interpreted as follows:
- 2: no autocorrelation. As a rule of thumb, values from 1.5 to 2.5 are treated as showing no autocorrelation.
- 0 to <2: positive autocorrelation. The closer the value is to 0, the stronger the evidence of positive autocorrelation.
- >2 to 4: negative autocorrelation. The closer the value is to 4, the stronger the evidence of negative autocorrelation.
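As a small worked illustration, the statistic and the rule-of-thumb reading can be sketched as below. The function names and the two hand-made residual sequences are assumptions for the example, and the 1.5 to 2.5 band is the informal range quoted in the list, not a formal critical-value test.

```python
import numpy as np

def durbin_watson_statistic(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def interpret(d, low=1.5, high=2.5):
    """Rule-of-thumb reading of the statistic (informal, not a formal test)."""
    if d < low:
        return "positive autocorrelation"
    if d > high:
        return "negative autocorrelation"
    return "no autocorrelation"

# Hand-made residuals (assumed data): one drifts slowly, one flips sign every step
drifting = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

d_drift = durbin_watson_statistic(drifting)
d_alt = durbin_watson_statistic(alternating)
print(f"{d_drift:.3f} -> {interpret(d_drift)}")  # 0.333 -> positive autocorrelation
print(f"{d_alt:.3f} -> {interpret(d_alt)}")      # 3.333 -> negative autocorrelation
```

Residuals that change slowly give small successive differences and push the statistic toward 0. Residuals that flip sign give large differences and push it toward 4.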
Python code implementation
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
Stock Price (Adj Close) data
- Here, the Durbin-Watson statistic is close to 0. Hence, the residuals of the linear model show positive autocorrelation.
Autocorrelation plot for different lags
- Above is the autocorrelation plot for different lags. Here, we can see that the autocorrelation is statistically significant at the 0.05 level for some lags.
Partial Autocorrelation graph for different lags
- From the partial autocorrelation plot, we can see that, at the 0.05 significance level, there is some partial autocorrelation at several lags. At lag 0 the partial autocorrelation is trivially 1 (100%), but at lag 1 the partial autocorrelation is also very high.