
Model Selection for ARIMA

Time series data analysis plays a pivotal role in various fields such as finance, economics, weather forecasting, and more. The Autoregressive Integrated Moving Average (ARIMA) model stands as one of the fundamental tools for forecasting future values based on historical patterns within time series data. However, selecting the appropriate parameters for an ARIMA model is crucial to ensure accurate predictions.

What is ARIMA?

ARIMA, standing for Autoregressive Integrated Moving Average, is a widely used statistical method for time series forecasting. It combines three key components to model data:



  1. Autoregression (AR): This component relates the present value to its past values through a regression equation.
  2. Differencing (I for Integrated): The series is differenced to remove trends and bring it closer to stationarity, where the mean and variance stay constant over time.
  3. Moving Average (MA): This component uses the dependency between an observation and a residual error from a moving average model applied to lagged observations.

Components of ARIMA

1. Autoregression (AR):

The autoregressive part (AR) of an ARIMA model is represented by the parameter p. It signifies the dependence of the current observation on its previous values. Mathematically, an AR(p) model can be represented as:

Y_t = c + ϕ_1 Y_{t-1} + ϕ_2 Y_{t-2} + ... + ϕ_p Y_{t-p} + ϵ_t

Here, Y_t is the current observation, c is a constant, ϕ_1 to ϕ_p are the autoregressive parameters, and ϵ_t represents the error term at time t.
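As a rough illustration of what the autoregressive parameters mean, the sketch below simulates an AR(2) series with base R's arima.sim() and then re-estimates the coefficients with arima(). The coefficient values (0.6 and -0.3), the series length, and the seed are arbitrary choices for this example, not values taken from the article.

```r
# Simulate an AR(2) process: Y_t = 0.6 * Y_{t-1} - 0.3 * Y_{t-2} + e_t
# (coefficients, length, and seed are illustrative assumptions)
set.seed(42)
true_phi <- c(0.6, -0.3)
y <- arima.sim(model = list(ar = true_phi), n = 2000)

# Fit an AR(2) model, i.e. ARIMA(2, 0, 0)
ar_fit <- arima(y, order = c(2, 0, 0))

# Compare the estimated coefficients with the true ones
est_phi <- unname(coef(ar_fit)[c("ar1", "ar2")])
print(round(est_phi, 3))
```

With a long simulated series, the estimates should land close to the values used to generate the data, which is a quick way to check that p and the ϕ coefficients are being read correctly.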

2. Differencing (I):

The differencing part of ARIMA is represented by the parameter d. It involves transforming a non-stationary time series into a stationary one by differencing consecutive observations. The formula for first-order differencing is straightforward:

Y'_t = Y_t - Y_{t-1}

Here, Y'_t is the differenced series at time t, Y_t is the current observation, and Y_{t-1} is the previous observation.

The differencing operation can be applied multiple times until stationarity is achieved. The notation I(d) indicates the order of differencing required for stationarity.
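The differencing formula can be checked directly in base R. The sketch below uses diff() on the built-in AirPassengers series and compares it with the subtraction written out by hand; the variable names are made up for this example.

```r
# AirPassengers ships with base R (datasets package)
x <- AirPassengers
n <- length(x)

# First difference: each observation minus the one before it
d1 <- diff(x)
manual_d1 <- as.numeric(x)[-1] - as.numeric(x)[-n]
print(all.equal(as.numeric(d1), manual_d1))

# Second-order differencing (d = 2) is the difference of the differences
d2 <- diff(x, differences = 2)
print(all.equal(as.numeric(d2), as.numeric(diff(diff(x)))))

# Each round of differencing shortens the series by one observation
print(c(length(x), length(d1), length(d2)))  # 144 143 142
```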

3. Moving Average (MA):

The moving average part (MA) of an ARIMA model is represented by the parameter q. It indicates the dependence of the current observation on the previous forecast errors. Mathematically, an MA(q) model can be represented as:

Y_t = c + ϵ_t + θ_1 ϵ_{t-1} + θ_2 ϵ_{t-2} + ... + θ_q ϵ_{t-q}

Here, Y_t is the current observation, c is a constant, ϵ_t is the error at time t, and θ_1 to θ_q are the moving average parameters.
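To make the MA(q) definition more concrete, the following sketch simulates an MA(1) series in base R, re-estimates the moving average parameter, and looks at the sample autocorrelations. The value 0.7, the series length, and the seed are assumptions chosen only for illustration.

```r
# Simulate an MA(1) process: Y_t = e_t + 0.7 * e_{t-1}
# (parameter value, length, and seed are illustrative assumptions)
set.seed(123)
true_theta <- 0.7
y <- arima.sim(model = list(ma = true_theta), n = 2000)

# Fit an MA(1) model, i.e. ARIMA(0, 0, 1)
ma_fit <- arima(y, order = c(0, 0, 1))
est_theta <- unname(coef(ma_fit)["ma1"])
print(round(est_theta, 3))

# For an MA(q) process the theoretical ACF is zero beyond lag q,
# so the sample ACF at lag 2 should be close to zero here
sample_acf <- acf(y, lag.max = 5, plot = FALSE)$acf[, 1, 1]
print(round(sample_acf[2:3], 3))  # lags 1 and 2 (index 1 is lag 0)
```

The sharp drop in the ACF after lag q is the pattern that ACF plots are used to spot when choosing the MA order.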

Final Formula of ARIMA:

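The general formula for a non-seasonal ARIMA model is represented as ARIMA(p,d,q):

Y'_t = c + ϕ_1 Y'_{t-1} + ... + ϕ_p Y'_{t-p} + θ_1 ϵ_{t-1} + ... + θ_q ϵ_{t-q} + ϵ_t

Here, Y'_t is the series after differencing d times, c is a constant, ϕ_1 to ϕ_p are the autoregressive parameters, θ_1 to θ_q are the moving average parameters, and ϵ_t is the error term at time t.

The terms p,d,q in ARIMA(p,d,q) indicate the order of the autoregressive part (p), the degree of differencing (d), and the order of the moving average part (q).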

The ARIMA model aims to capture the temporal dependencies and patterns in the time series data, making it suitable for forecasting future values.

Working Principles

  1. Identifying Stationarity: ARIMA models require the time series data to be stationary. Stationarity implies that the statistical properties of the time series (like mean and variance) remain constant over time.
  2. Parameter Identification: Choosing p, d, and q involves analyzing the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the time series data. The differencing order d is the number of differences needed to reach stationarity; on the differenced series, the ACF helps determine the MA order (q), while the PACF aids in determining the AR order (p).
  3. Model Fitting: Once the orders are determined, the ARIMA model is fitted to the data. This involves estimating the coefficients of the autoregressive and moving average terms, often by maximum likelihood estimation.
  4. Forecasting: After fitting the model, it can be used to forecast future values by applying the fitted equation recursively, one step ahead at a time.
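The four steps above can be sketched end to end with base R alone. This example uses the built-in WWWusage series and an ARIMA(1, 1, 1) order; both are assumptions picked for illustration, and the Phillips-Perron test (PP.test() from the stats package) stands in for whichever stationarity check is preferred.

```r
# WWWusage: built-in series of internet users per minute (100 observations)
y <- WWWusage

# 1. Stationarity: Phillips-Perron unit-root test before and after differencing
print(PP.test(y)$p.value)
print(PP.test(diff(y))$p.value)

# 2. Identification: sample ACF/PACF values of the differenced series
dy <- diff(y)
print(round(acf(dy, lag.max = 5, plot = FALSE)$acf[-1, 1, 1], 2))
print(round(pacf(dy, lag.max = 5, plot = FALSE)$acf[, 1, 1], 2))

# 3. Fitting: ARIMA(1, 1, 1) is an assumed order chosen for illustration
fit <- arima(y, order = c(1, 1, 1))
print(coef(fit))

# 4. Forecasting: point forecasts and standard errors for the next 5 steps
fc <- predict(fit, n.ahead = 5)
print(fc$pred)
print(fc$se)
```

In practice the order in step 3 would come from the inspection in step 2 or from one of the selection methods discussed later, not from a fixed guess.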

Mathematical Aspects

Model Parameters in ARIMA

The ARIMA model is defined by three main parameters: p, d, and q.

  1. p (AR order): The number of autoregressive terms, that is, the number of past observations that directly influence the current value.
  2. d (Differencing order): The number of times consecutive observations are differenced to make the time series stationary.
  3. q (MA order): The number of lagged forecast errors in the prediction equation.

Selecting the appropriate values for these parameters significantly impacts the model’s forecasting capability. However, determining the right values is often a challenging task.

Model Selection Methods for ARIMA

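1. Visual Inspection: Plot the series together with its ACF and PACF to judge trend, seasonality, and plausible values for p, d, and q.

2. Parameter Grid Search: Fit models over a range of candidate (p, d, q) combinations and compare them with a common criterion.

3. Automated Techniques: Use functions such as auto.arima() from the forecast package, which search for a suitable order automatically.

4. Cross-validation: Evaluate candidate models on held-out data, for example with rolling-origin (time series) cross-validation.

5. Information Criteria: Compare models with criteria such as AIC or BIC, which balance goodness of fit against the number of parameters; lower values are preferred.

6. Model Comparison: Compare the forecast accuracy of competing models on the same data.

7. Stepwise Methods: Search the model space step by step, varying the orders around the current best model instead of fitting every combination.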
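The grid-search and information-criteria ideas listed above can be sketched in base R without any extra packages. The example below is a minimal, hand-written search over p and q with d held fixed; the WWWusage dataset, the search range 0 to 3, and d = 1 are all assumptions made for this illustration.

```r
y <- WWWusage
d <- 1  # differencing order held fixed (an assumption for this sketch)

# Candidate (p, q) pairs
grid <- expand.grid(p = 0:3, q = 0:3)
grid$aic <- NA_real_

for (i in seq_len(nrow(grid))) {
  # Some orders may fail to fit; record NA for those and move on
  fit <- tryCatch(
    arima(y, order = c(grid$p[i], d, grid$q[i]), method = "ML"),
    error = function(e) NULL
  )
  if (!is.null(fit)) grid$aic[i] <- AIC(fit)
}

# Rank candidates by AIC (lower is better); NA rows sort last
ranked <- grid[order(grid$aic), ]
print(head(ranked, 3))
best <- ranked[1, ]

# AIC = -2 * logLik + 2 * k, where k counts the coefficients plus the variance
ref <- arima(y, order = c(1, d, 1), method = "ML")
k <- length(coef(ref)) + 1
print(all.equal(AIC(ref), -2 * ref$loglik + 2 * k))
```

An exhaustive search like this grows quickly with the size of the grid, which is one reason stepwise searches are often used instead.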

ARIMA Model Selection in R

Loading Libraries:

Loading Dataset:

# Load necessary libraries
library(forecast)
library(tseries)
 
# Load the AirPassengers dataset (built-in in R)
data("AirPassengers")

                    

Convert to Time Series:

Visual Inspection:

# Convert the dataset to a time series object
# (AirPassengers is monthly data starting in January 1949)
passengers_ts <- ts(AirPassengers, start = c(1949, 1), frequency = 12)
 
# Visual inspection
# Time series plot
plot(passengers_ts, main = "International Airline Passengers")

                    

Output:

[Figure: time series plot "International Airline Passengers"]


ACF and PACF Plots

# Plot the autocorrelation and partial autocorrelation functions
# to help identify candidate MA (q) and AR (p) orders
acf(passengers_ts, main = "ACF Plot")
pacf(passengers_ts, main = "PACF Plot")

                    

Output:

[Figure: ACF Plot]

[Figure: PACF Plot]


Parameter Grid Search:

Automated Model Selection:

# Parameter grid search
# Grid search using auto.arima with a range of possible values for p, d, and q
auto_model <- auto.arima(passengers_ts, seasonal = FALSE, stepwise = FALSE,
                         approximation = FALSE,
                         ic = "aic")
 
# Automated technique - Auto-ARIMA
# Using the auto.arima function for automated model selection
auto_arima_model <- auto.arima(passengers_ts)

                    

Cross-validation:

Choosing the Best Model:

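# Cross-validation
# Time series cross-validation: tsCV() calls the forecast function with (x, h),
# so the selected model is re-applied to each training window x
cv <- tsCV(passengers_ts,
           function(x, h) forecast(Arima(x, model = auto_arima_model), h = h),
           h = 1)
 
# Choosing the best model based on information criteria
best_model <- auto.arima(passengers_ts, ic = "aic")  # or ic = "bic" for BIC
best_model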

                    

Output:

Series: passengers_ts 
ARIMA(0,1,1)(2,1,0)[12]

Coefficients:
          ma1     sar1    sar2
      -0.3634  -0.1239  0.1911
s.e.   0.0899   0.0934  0.1036

sigma^2 = 133.5: log likelihood = -505.59
AIC=1019.18 AICc=1019.5 BIC=1030.68

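This output describes the seasonal model ARIMA(0,1,1)(2,1,0)[12] selected by AIC for the AirPassengers dataset: one non-seasonal MA term, two seasonal AR terms, and one regular plus one seasonal difference with period 12. It also reports the estimated coefficients with their standard errors, the residual variance, the log likelihood, and the AIC, AICc, and BIC values.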
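The cross-validation step in this example relies on tsCV() from the forecast package. A hand-rolled analogue in base R is sketched below to show what rolling-origin evaluation does: it refits each candidate on a growing training window and scores the one-step-ahead forecast. The helper rolling_errors(), the starting origin of 120 observations, and the second candidate order are assumptions made for this illustration.

```r
y <- AirPassengers
n <- length(y)
origins <- 120:(n - 1)  # forecast origins; the first window has 120 months (assumed)

# One-step-ahead forecast errors for a seasonal ARIMA order (hypothetical helper)
rolling_errors <- function(y, order, seasonal, origins, period = 12) {
  sapply(origins, function(t) {
    # Training window: observations 1..t, keeping the original time index
    train <- ts(y[1:t], start = start(y), frequency = frequency(y))
    fit <- tryCatch(
      arima(train, order = order,
            seasonal = list(order = seasonal, period = period),
            method = "ML"),
      error = function(e) NULL
    )
    if (is.null(fit)) return(NA_real_)
    # Error = actual next value minus its one-step forecast
    as.numeric(y[t + 1] - predict(fit, n.ahead = 1)$pred)
  })
}

# Candidate A uses the order reported above; candidate B is an assumed alternative
err_a <- rolling_errors(y, c(0, 1, 1), c(2, 1, 0), origins)
err_b <- rolling_errors(y, c(0, 1, 1), c(0, 1, 1), origins)

rmse <- function(e) sqrt(mean(e^2, na.rm = TRUE))
print(c(model_a = rmse(err_a), model_b = rmse(err_b)))
```

The candidate with the lower out-of-sample RMSE would be preferred on this criterion, which may or may not agree with the ranking by AIC.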

Loading Libraries and Dataset

# Load necessary libraries
library(forecast)
library(tseries)
 
# Load the Johnson & Johnson quarterly earnings dataset
data("JohnsonJohnson")

                    

Convert Data to Time Series

# Convert the dataset to a time series object
jj_ts <- ts(JohnsonJohnson, start = c(1960, 1), frequency = 4)
 
# Plot the time series data
plot(jj_ts, main = "Johnson & Johnson Quarterly Earnings per Share")

                    

Output:

[Figure: time series plot "Johnson & Johnson Quarterly Earnings per Share"]


Visual Inspection and Parameter Identification:

# ACF and PACF plots for identifying parameters
acf(jj_ts, main = "ACF Plot")
pacf(jj_ts, main = "PACF Plot")

                    

Output:

[Figure: ACF Plot]

[Figure: PACF Plot]


acf(jj_ts, main = "ACF Plot") and pacf(jj_ts, main = "PACF Plot") generate Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots to identify potential values for the ARIMA model's parameters.
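The same information shown in the plots can be read off numerically, which may help when deciding which lags are significant. The sketch below, in base R, compares the sample ACF and PACF of the Johnson & Johnson series against the approximate 95% bound that acf() and pacf() draw as dashed lines. The lag range of 8 and the log-plus-difference preprocessing at the end are assumptions for this example.

```r
jj <- JohnsonJohnson
n <- length(jj)

# Approximate 95% significance bound used on acf()/pacf() plots
bound <- qnorm(0.975) / sqrt(n)

acf_vals  <- acf(jj, lag.max = 8, plot = FALSE)$acf[-1, 1, 1]  # drop lag 0
pacf_vals <- pacf(jj, lag.max = 8, plot = FALSE)$acf[, 1, 1]

# Lags (in observations) whose correlation exceeds the bound
print(which(abs(acf_vals) > bound))
print(which(abs(pacf_vals) > bound))

# Repeat after a log transform and first difference (assumed preprocessing)
d_log_jj <- diff(log(jj))
bound_d <- qnorm(0.975) / sqrt(length(d_log_jj))
acf_d <- acf(d_log_jj, lag.max = 8, plot = FALSE)$acf[-1, 1, 1]
print(which(abs(acf_d) > bound_d))
```

A slowly decaying ACF on the raw series points to a trend that needs differencing, while significant spikes at multiples of 4 after differencing would suggest quarterly seasonality.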

Conclusion

The selection of ARIMA model parameters is a critical aspect of time series forecasting. Employing a combination of visual inspection, systematic search methods, automated techniques, and cross-validation aids in identifying the most appropriate values for p, d, and q. Nonetheless, understanding the data’s characteristics and the trade-offs between model complexity and accuracy remains essential for effective model selection in ARIMA-based forecasting.

