
Autoregressive (AR) Model for Time Series Forecasting

Autoregressive models, often abbreviated as AR models, are a fundamental concept in time series analysis and forecasting. They have widespread applications in various fields, including finance, economics, climate science, and more. In this comprehensive guide, we will explore autoregressive models, how they work, their types, and practical examples.

Autoregressive Models

Autoregressive models belong to the family of time series models. These models capture the relationship between an observation and several lagged observations (previous time steps). The core idea is that the current value of a time series can be expressed as a linear combination of its past values, with some random noise.
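To make the "linear combination of past values plus noise" idea concrete, here is a minimal NumPy sketch that simulates such a series. The helper name `simulate_ar` and its parameters are illustrative assumptions, not part of any library.

```python
import numpy as np

def simulate_ar(coeffs, n_steps, const=0.0, noise_std=1.0, seed=0):
    """Simulate an autoregressive series (hypothetical helper for illustration).

    coeffs[0] multiplies the most recent value, coeffs[1] the one before, and so on.
    """
    rng = np.random.default_rng(seed)
    coeffs = np.asarray(coeffs, dtype=float)
    p = len(coeffs)
    x = np.zeros(n_steps + p)  # the first p entries are zero start-up values
    for t in range(p, n_steps + p):
        lagged = x[t - p:t][::-1]  # x[t-1], x[t-2], ..., x[t-p]
        x[t] = const + np.dot(coeffs, lagged) + rng.normal(0.0, noise_std)
    return x[p:]

# Example: 500 steps where each value depends on the two previous values
series = simulate_ar([0.6, 0.2], n_steps=500)
```

The start-up values are set to zero here for simplicity; in practice the first few simulated points are often discarded.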



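Mathematically, an autoregressive model of order p, denoted as AR(p), can be expressed as:

$$X_t = c + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \dots + \phi_p X_{t-p} + \varepsilon_t$$

Where:

$X_t$ is the value of the time series at time $t$.
$c$ is a constant (intercept) term.
$\phi_1, \phi_2, \dots, \phi_p$ are the autoregressive coefficients.
$X_{t-1}, X_{t-2}, \dots, X_{t-p}$ are the lagged (past) values of the series.
$\varepsilon_t$ is the random noise (error) term at time $t$.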

Autocorrelation (ACF) in Autoregressive Models

Autocorrelation, often denoted as “ACF” (Autocorrelation Function), is a fundamental concept in time series analysis and autoregressive models. It refers to the correlation between a time series and a lagged version of itself. In the context of autoregressive models, autocorrelation measures how closely the current value of a time series is related to its past values, specifically those at different time lags.

Here’s a breakdown of the concept of autocorrelation in autoregressive models:
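As one concrete illustration, the sample autocorrelation at a given lag can be computed by hand. The sketch below uses the common estimator that divides the lagged covariance by the overall variance; `sample_autocorr` is a hypothetical helper name.

```python
import numpy as np

def sample_autocorr(values, lag):
    """Sample autocorrelation of a series at a positive lag (illustrative helper)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()          # center the series
    denom = np.dot(x, x)      # proportional to the lag-0 autocovariance
    if lag == 0:
        return 1.0
    # Correlate the series with a copy of itself shifted by `lag` steps
    return float(np.dot(x[lag:], x[:-lag]) / denom)

# A series that alternates in sign is strongly negatively correlated at lag 1
alternating = [1.0, -1.0] * 50
lag1 = sample_autocorr(alternating, 1)
```

Values close to 1 or -1 indicate that the series is strongly related to its own past at that lag; values near 0 indicate little linear relationship.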

Types of Autoregressive Models

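AR(1) Model: The current value depends only on the immediately preceding value: $X_t = c + \phi_1 X_{t-1} + \varepsilon_t$, where $c$ is a constant, $\phi_1$ is the autoregressive coefficient, and $\varepsilon_t$ is random noise.

AR(p) Model: The current value depends on the previous p values: $X_t = c + \sum_{i=1}^{p} \phi_i X_{t-i} + \varepsilon_t$, with one coefficient $\phi_i$ per lag.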

Implementing AR Model for predicting Temperature

Step 1: Importing Data

In the first step, we import the required libraries and the temperature dataset.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
 
 
# Load your temperature dataset with columns "Date" and "Temperature"
# (the code in this article refers to the column as 'Temperature ', with a trailing space)
data = pd.read_excel('Data.xlsx')
 
# Make sure your "Date" column is in datetime format
data['Date'] = pd.to_datetime(data['Date'])
 
# Sorting the data by date (if not sorted)
data = data.sort_values(by='Date')
 
# Use the "Date" column as the index
data.set_index('Date', inplace=True)
 
# Drop rows with missing values
data.dropna(inplace=True)

                    


The data is visualized in this step.

# Visualize the data
plt.figure(figsize=(12, 6))
plt.plot(data['Temperature '], label='Data')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.legend()
plt.title('Temperature Data')
plt.show()

                    

Output:

Step 2: Data Preprocessing

Now that we have loaded our data, we need to preprocess it. We’ll create lag features and split the data chronologically into training and testing sets.

# Adding lag features to the DataFrame
for i in range(1, 6):  # Creating lag features up to 5 days
    data[f'Lag_{i}'] = data['Temperature '].shift(i)
 
# Drop rows with NaN values resulting from creating lag features
data.dropna(inplace=True)
 
# Split the data into training and testing sets
train_size = int(0.8 * len(data))
train_data = data[:train_size]
test_data = data[train_size:]
 
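# Define the target variable for the training and testing sets
# (AutoReg builds its own lagged regressors from the target,
# so the Lag_* columns are not passed to the model)
y_train = train_data['Temperature ']
y_test = test_data['Temperature ']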

                    

ACF Plot

The Autocorrelation Function (ACF) plot is a graphical tool used to visualize and assess the autocorrelation of a time series at different lags. It helps you understand how the current value of a time series is correlated with its past values. You can create an ACF plot in Python using the plot_acf function from the statsmodels library.

from statsmodels.graphics.tsaplots import plot_acf
series = data['Temperature ']
plot_acf(series)
plt.show()

                    

Output:

ACF Plot

The graph shows the autocorrelation values for the first 20 lags, with lags on the x-axis and autocorrelation values on the y-axis. It helps us identify the significant lags, where the autocorrelation values fall outside the confidence interval (indicated by the shaded region).

We can observe a significant correlation from lag=1 to lag=4. We check the correlation of the lagged values using the approach shown below:

data['Temperature '].corr(data['Temperature '].shift(1))

                    

Output:

0.7997281316018658

Lag=1 gives the highest correlation value, 0.799. We checked lags 2, 3, and 4 in the same way; with the shift set to 4, the correlation is 0.31.
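The same check can be repeated for several lags in one pass. The following is a small sketch on a made-up series; `lag_correlations` is a hypothetical helper, and in the tutorial you would pass `data['Temperature ']` instead of the toy series.

```python
import numpy as np
import pandas as pd

def lag_correlations(series, max_lag):
    """Correlation between a series and its shifted copy, for lags 1..max_lag."""
    return {lag: series.corr(series.shift(lag)) for lag in range(1, max_lag + 1)}

# Toy data for illustration only: a steadily increasing series
toy_series = pd.Series(np.arange(10.0))
correlations = lag_correlations(toy_series, max_lag=4)

for lag, value in correlations.items():
    print(f'lag={lag}: correlation={value:.3f}')
```

`Series.corr` ignores the missing values that `shift` introduces at the start of the shifted copy, so no explicit `dropna` is needed here.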

Step 3: Model Building

We’ll build an autoregressive model using the AutoReg class from statsmodels.

from statsmodels.tsa.ar_model import AutoReg
from sklearn.metrics import mean_absolute_error, mean_squared_error
 
# Create and train the autoregressive model
lag_order = 1 # Adjust this based on the ACF plot
ar_model = AutoReg(y_train, lags=lag_order)
ar_results = ar_model.fit()
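According to the statsmodels documentation, AutoReg estimates its parameters by conditional maximum likelihood, which amounts to ordinary least squares on the lagged values. The NumPy sketch below illustrates that idea only; `fit_ar_ols` is a hypothetical helper, not the statsmodels implementation.

```python
import numpy as np

def fit_ar_ols(values, p):
    """Least-squares fit of an AR(p) model with intercept (illustrative helper).

    Returns an array [const, phi_1, ..., phi_p].
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    # Design matrix columns: intercept, x[t-1], ..., x[t-p] for t = p..n-1
    columns = [np.ones(n - p)] + [x[p - k:n - k] for k in range(1, p + 1)]
    design = np.column_stack(columns)
    target = x[p:]
    params, *_ = np.linalg.lstsq(design, target, rcond=None)
    return params

# Noise-free toy series generated by x_t = 1.0 + 0.5 * x_{t-1}
demo = [0.0]
for _ in range(9):
    demo.append(1.0 + 0.5 * demo[-1])

# Should recover approximately const=1.0 and phi_1=0.5
params = fit_ar_ols(demo, p=1)
```

On real data the recovered coefficients will only approximate the underlying relationship, because the noise term is not zero.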

                    

Step 4: Model Evaluation

Evaluate the model’s performance using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE).

# Make predictions on the test set
y_pred = ar_results.predict(start=len(train_data), end=len(train_data) + len(test_data) - 1, dynamic=False)
#print(y_pred)
 
# Calculate MAE and RMSE
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'Mean Absolute Error: {mae:.2f}')
print(f'Root Mean Squared Error: {rmse:.2f}')

                    

Output:

Mean Absolute Error: 1.59
Root Mean Squared Error: 2.30
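For reference, both metrics are simple to compute directly: MAE is the mean of the absolute errors, and RMSE is the square root of the mean squared error. A minimal sketch with made-up numbers (the helper name `mae_rmse` is an assumption):

```python
import numpy as np

def mae_rmse(actual, predicted):
    """Return (MAE, RMSE) for two equally long sequences (illustrative helper)."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    mae = float(np.mean(np.abs(errors)))
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return mae, rmse

# Made-up values for illustration: only the last prediction is off, by 2 degrees
mae_demo, rmse_demo = mae_rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because RMSE squares the errors before averaging, it penalizes large individual errors more heavily than MAE, which is why RMSE is never smaller than MAE.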

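In the evaluation code above, ar_results holds the fitted AutoReg model (not an ARIMA model). To make predictions on the test set, we call its predict method. The start and end arguments give the positions of the first and last observations to predict, here the first and last test observations, and dynamic=False requests one-step-ahead predictions wherever observed values are available. Because the test period lies beyond the data the model was fitted on, these predictions are generated recursively from the model’s own earlier forecasts.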
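When an AR model predicts beyond the data it was fitted on, each forecast is fed back in as the input for the next step. The sketch below shows this recursion for an AR(1) model with made-up parameters; `recursive_ar1_forecast` is a hypothetical helper.

```python
def recursive_ar1_forecast(last_value, const, phi, steps):
    """Multi-step AR(1) forecast that reuses each forecast as the next input."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        previous = const + phi * previous  # noise is replaced by its mean, zero
        forecasts.append(previous)
    return forecasts

# Made-up parameters: the forecasts decay toward const / (1 - phi) = 10
path = recursive_ar1_forecast(last_value=20.0, const=2.0, phi=0.8, steps=3)
```

This is why multi-step AR forecasts of a stationary series tend to flatten out toward the long-run mean as the horizon grows.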

Step 5: Visualization

Finally, we plot the predictions made by the AutoReg model against the actual temperature data using Matplotlib.

Actual vs. Predicted Plot:

# Visualize the results
plt.figure(figsize=(12, 6))
# "Date" is now the index, so use test_data.index for the x-axis
plt.plot(test_data.index, y_test, label='Actual Temperature')
plt.plot(test_data.index, y_pred, label='Predicted Temperature', linestyle='--')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.legend()
plt.title('Temperature Prediction with Autoregressive Model')
plt.show()

                    

Output:

Forecast Plot:

# Define the number of future time steps you want to predict (1 week)
forecast_steps = 7
 
# Predict over the test period and extend the predictions one week into the future
future_predictions = ar_results.predict(start=len(train_data), end=len(train_data) + len(test_data) + forecast_steps - 1, dynamic=False)
 
# Create date indices for the future predictions, starting the day after the last test date
# (assumes daily data; "Date" is the index, so use test_data.index)
future_dates = pd.date_range(start=test_data.index[-1] + pd.Timedelta(days=1), periods=forecast_steps, freq='D')
 
# Plot the actual data, existing predictions, and one week of future predictions
plt.figure(figsize=(12, 6))
plt.plot(test_data.index, y_test, label='Actual Temperature')
plt.plot(test_data.index, y_pred, label='Predicted Temperature', linestyle='--')
plt.plot(future_dates, np.asarray(future_predictions)[-forecast_steps:], label='Future Predictions', linestyle='--', color='red')
plt.xlabel('Date')
plt.ylabel('Temperature')
plt.legend()
plt.title('Temperature Prediction with Autoregressive Model')
plt.show()

                    

Output:

Benefits and Drawbacks of Autoregressive Models

Autoregressive models (AR models) are a class of time series models that have their own set of benefits and drawbacks. Understanding these can help in choosing when to use them and when to consider alternative modeling approaches.

Benefits of Autoregressive Models:

Drawbacks of Autoregressive Models:

Conclusion

Autoregressive (AR) models provide a powerful framework for analyzing and forecasting time series data. We explored the fundamental concepts of AR models, from understanding autocorrelation to fitting models and making future predictions, and applied them to a temperature dataset. AR models are particularly useful when dealing with stationary time series data, where past values influence future observations. The choice of lag order is a crucial step, and it can be guided by examining the Autocorrelation Function (ACF) plot.

As we demonstrated, AR models offer a practical approach to forecasting. However, they have their limitations and are most effective when the underlying data exhibits some degree of autocorrelation. For more complex time series data, for example data with a trend or seasonality, other models like ARIMA or SARIMA may be more appropriate.

The ability to make accurate forecasts is a valuable asset in various domains, from finance to economics and beyond. By mastering Autoregressive models and understanding their applications, analysts and data scientists can make informed decisions based on historical data, helping to anticipate future trends and make better choices.

