A lag plot is a special type of scatter plot in which each value of a dataset is plotted against the value that occurs a fixed number of time units earlier (or later). This fixed offset is called the lag and is denoted by k.
The lag plot contains the following axes:
- Vertical axis: Y_i for all i
- Horizontal axis: Y_{i-k} for all i, where k is the lag value
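The pairing described by these two axes can be sketched with NumPy slicing. This is a minimal illustration, not a library API: the helper name `lag_pairs` and the small sample series are hypothetical.

```python
import numpy as np

def lag_pairs(y, k=1):
    """Return (horizontal, vertical) = (Y_{i-k}, Y_i) for a lag of k.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    if not 0 < k < len(y):
        raise ValueError("k must satisfy 0 < k < len(y)")
    # Dropping the last k values gives Y_{i-k}; dropping the first k gives Y_i.
    return y[:-k], y[k:]

series = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
x, v = lag_pairs(series, k=2)
print(x)  # horizontal axis: values k steps earlier
print(v)  # vertical axis: current values
```

Each point of the lag plot is one pair `(x[j], v[j])`, so a series of length n yields n - k points.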
The lag plot is used to answer the following questions:
- Distribution of model: The shape of the lag plot suggests which kind of model fits the data:
  - If the points of the lag plot fall along a straight line, an autoregressive model is a suitable candidate for the underlying structure.
  - If the points form an elliptical shape, the underlying structure is a continuous periodic function such as a sine or cosine.
- Outliers: Outliers are data points that take extreme values relative to the rest of the distribution. In a lag plot they appear as points lying far from the main cluster.
- Randomness in data: The lag plot is also useful for checking whether a dataset is random. If the data are random, the lag plot shows no identifiable pattern.
- Seasonality: If the data are seasonal, the lag plot shows a periodic pattern.
- Autocorrelation: If the points of the lag plot fall along a line, the data are autocorrelated. The slope of that line gives the sign: a positive slope indicates positive autocorrelation and a negative slope indicates negative autocorrelation. The more tightly the points cluster around the diagonal, the stronger the autocorrelation.
In this implementation, we will use the NumPy and SciPy libraries. These are pre-installed in Colab, and can be installed in a local environment using pip (for example, pip install numpy scipy). We will be using GOOGLE stock price data and Flicker data for this implementation.
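Before working with real data, the interpretations listed earlier can be checked on synthetic series. The following is a minimal sketch that assumes only NumPy. The series (white noise, two AR(1) processes with coefficients +0.9 and -0.9, and a sine wave with period 50) and the helpers `lag_corr` and `ar1` are illustrative choices, not the stock price or Flicker data. It prints the correlation between Y_{i-1} and Y_i, which summarises how closely the lag-1 plot follows a line.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 500

def lag_corr(y, k=1):
    """Pearson correlation between Y_{i-k} and Y_i (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[:-k], y[k:])[0, 1]

def ar1(phi, size, rng):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal noise e_t."""
    e = rng.normal(size=size)
    y = np.zeros(size)
    for t in range(1, size):
        y[t] = phi * y[t - 1] + e[t]
    return y

series = {
    "white noise": rng.normal(size=n),
    "AR(1), phi=+0.9": ar1(0.9, n, rng),
    "AR(1), phi=-0.9": ar1(-0.9, n, rng),
    "sine, period 50": np.sin(2 * np.pi * np.arange(n) / 50),
}

for name, y in series.items():
    print(f"{name:>16}: lag-1 correlation = {lag_corr(y):+.3f}")
```

A value near zero corresponds to the structureless cloud of random data, while values near +1 or -1 correspond to points hugging a line with positive or negative slope. The correlation only captures linear structure: for the sine wave at a lag of about a quarter period (k = 12 or 13 here) the points trace a nearly circular ellipse and the correlation is close to zero, so the plot itself should still be inspected.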