
Python | ARIMA Model for Time Series Forecasting

Last Updated : 19 Feb, 2020

A Time Series is a sequence of data points indexed in time order, usually recorded at a regular interval such as daily, monthly, or yearly. The example used throughout this article is the AirPassengers dataset, which records the number of passengers of an airline per month from 1949 to 1960.

Time Series Forecasting
Time Series forecasting is the process of using a statistical model to predict future values of a time series based on its previously observed values.

Some Use Cases

  • To predict the number of incoming or churning customers.
  • To explain seasonal patterns in sales.
  • To detect unusual events and estimate the magnitude of their effect.
  • To estimate the effect of a newly launched product on the number of units sold.

Components of a Time Series

  • Trend: The trend shows the general direction of the time series over a long period of time. A trend can be increasing (upward), decreasing (downward), or horizontal (stationary).
  • Seasonality: The seasonal component is a pattern that repeats at a fixed, known interval with similar timing, direction, and magnitude. Examples include an increase in water consumption every summer due to hot weather, or an increase in the number of airline passengers during the holidays each year.
  • Cyclical Component: These are rises and falls with no fixed period. A cycle refers to the ups and downs, the booms and slumps, of a time series, most often observed in business cycles. Cycles do not follow a seasonal calendar and generally span 3 to 12 years, depending on the nature of the time series.
  • Irregular Variation: These are the fluctuations in the time series data that remain once the trend, seasonal, and cyclical variations are removed. They are unpredictable and erratic, and may or may not be random.

ETS Decomposition
ETS Decomposition is used to separate the different components of a time series. The term ETS stands for Error, Trend, and Seasonality.

Code: ETS Decomposition of Airline Passengers Dataset:

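      # Importing required libraries
      import numpy as np
      import pandas as pd
      import matplotlib.pyplot as plt
      from statsmodels.tsa.seasonal import seasonal_decompose
      # Read the AirPassengers dataset, parsing the Month column into a DatetimeIndex
      airline = pd.read_csv('AirPassengers.csv',
                             index_col ='Month',
                             parse_dates = True)
      # Print the first five rows of the dataset
      print(airline.head())
      # ETS Decomposition (multiplicative, because the seasonal swings grow with the level of the series)
      result = seasonal_decompose(airline['# Passengers'],
                                  model ='multiplicative')
      # ETS plot: observed, trend, seasonal and residual panels
      result.plot()
      plt.show()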



ARIMA Model for Time Series Forecasting
ARIMA stands for AutoRegressive Integrated Moving Average. An ARIMA model is specified by three order parameters, (p, d, q).
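One common textbook way to write an ARIMA(p, d, q) model uses the lag operator L, where L y_t = y_{t-1}. Here phi_i are the autoregressive coefficients, theta_j the moving-average coefficients, epsilon_t white-noise errors and c an optional constant. Sign conventions for the MA terms differ between textbooks and libraries, so treat this as one standard notation rather than the exact parameterization of any particular package:

```latex
\Big(1 - \sum_{i=1}^{p} \phi_i L^i\Big)(1 - L)^d \, y_t
  = c + \Big(1 + \sum_{j=1}^{q} \theta_j L^j\Big)\varepsilon_t
```

For d = 1 the factor (1 - L) y_t = y_t - y_{t-1} is exactly the first difference described in the I(d) item.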

    • AR(p) Autoregression: a regression model that uses the dependent relationship between the current observation and observations from previous periods. The autoregressive component uses past values of the series in the regression equation; p is the number of lagged observations included.
    • I(d) Integration: differencing of observations (subtracting the observation at the previous time step from the current observation) to make the time series stationary. d is the number of times the series is differenced.
    • MA(q) Moving Average: a model that uses the dependency between the current observation and the residual errors (forecast errors) from previous time steps. The moving average component expresses the error of the model as a combination of previous error terms; q is the number of lagged error terms included.
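The AR and I parts described above can be sketched with plain NumPy. The example simulates a hypothetical ARIMA(1, 1, 0) process: a stationary AR(1) series w is integrated once (cumulative sum) into a non-stationary series y. Differencing y once (d = 1) recovers w, and a least-squares regression of each difference on the previous one (p = 1) estimates the AR coefficient. The coefficient 0.6, the sample size and the seed are arbitrary choices for the demo. The MA(q) part is not shown, because its past errors are unobserved and it needs a likelihood-based fit such as the statsmodels models used below.

```python
import numpy as np

# Hypothetical simulated data, not the AirPassengers series
rng = np.random.default_rng(42)
n = 2000
phi = 0.6  # true AR(1) coefficient, chosen arbitrarily

# Stationary AR(1) process: w_t = phi * w_{t-1} + e_t
e = rng.normal(0.0, 1.0, n)
w = np.zeros(n)
for t in range(1, n):
    w[t] = phi * w[t - 1] + e[t]

# Integrate once: y_t = y_{t-1} + w_t, so y is non-stationary (ARIMA(1, 1, 0))
y = 100.0 + np.cumsum(w)

# I(d) with d = 1: first-order differencing recovers the stationary series
dy = np.diff(y)  # dy[k] = y[k + 1] - y[k] = w[k + 1]

# AR(p) with p = 1: least-squares slope of dy_t on dy_{t-1} (no intercept, mean is ~0)
x_lag = dy[:-1]
x_now = dy[1:]
phi_hat = float(x_lag @ x_now / (x_lag @ x_lag))
print(f"true phi = {phi}, estimated phi = {phi_hat:.3f}")
```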
Types of ARIMA Model

  • ARIMA: Non-seasonal Autoregressive Integrated Moving Averages
  • SARIMA: Seasonal ARIMA
  • SARIMAX: Seasonal ARIMA with exogenous variables

Pyramid Auto-ARIMA

The auto_arima function from the pmdarima library (formerly known as pyramid-arima) searches over candidate model orders, selects the parameters that minimize an information criterion (AIC by default), and returns a fitted ARIMA model.

      Code : Parameter Analysis for the ARIMA model

      # To install the library, run this in a shell (not inside Python):
      #   pip install pmdarima
      # Import the library
      from pmdarima import auto_arima
      # Ignore harmless warnings
      import warnings
      warnings.filterwarnings("ignore")
      # Fit auto_arima function to AirPassengers dataset (m = 12: yearly seasonality in monthly data)
      stepwise_fit = auto_arima(airline['# Passengers'], start_p = 1, start_q = 1,
                                max_p = 3, max_q = 3, m = 12,
                                start_P = 0, seasonal = True,
                                d = None, D = 1, trace = True,
                                error_action ='ignore',   # we don't want to know if an order does not work
                                suppress_warnings = True, # we don't want convergence warnings
                                stepwise = True)          # set to stepwise
      # To print the summary
      print(stepwise_fit.summary())



      Code : Fit ARIMA Model to AirPassengers dataset

      # Split data into train / test sets
      train = airline.iloc[:len(airline)-12]
      test = airline.iloc[len(airline)-12:] # set one year(12 months) for testing
      # Fit a SARIMAX(0, 1, 1)x(2, 1, 1, 12) on the training set
      from statsmodels.tsa.statespace.sarimax import SARIMAX
      model = SARIMAX(train['# Passengers'], 
                      order = (0, 1, 1), 
                      seasonal_order =(2, 1, 1, 12))
      result = model.fit()
      print(result.summary())
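The train/test split above is chronological rather than random: a forecast may only use the past, so every training month must come before every test month. The sketch below repeats the same positional slicing on a hypothetical 144-row monthly DataFrame (standing in for the airline data) and checks that property.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the airline data: 144 monthly rows, 1949-01 to 1960-12
index = pd.date_range("1949-01-01", periods=144, freq="MS")
series = pd.DataFrame({"value": np.arange(144, dtype=float)}, index=index)

horizon = 12  # hold out the final year
train = series.iloc[:len(series) - horizon]
test = series.iloc[len(series) - horizon:]

# Chronological split: the last training month is followed directly by the first test month
print(train.index[-1].date(), test.index[0].date())  # 1959-12-01 1960-01-01
```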



      Code : Predictions of ARIMA Model against the test set

      start = len(train)
      end = len(train) + len(test) - 1
      # Predictions for one-year against the test set
      # SARIMAX handles differencing internally, so predictions are already on the original scale
      predictions = result.predict(start, end).rename("Predictions")
      # plot predictions and actual values
      predictions.plot(legend = True)
      test['# Passengers'].plot(legend = True)



      Code : Evaluate the model using MSE and RMSE

      # Load specific evaluation tools
      from sklearn.metrics import mean_squared_error
      from statsmodels.tools.eval_measures import rmse
      # Calculate root mean squared error
      rmse(test["# Passengers"], predictions)
      # Calculate mean squared error
      mean_squared_error(test["# Passengers"], predictions)
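Both metrics above can be written directly in NumPy, which makes the relationship explicit: MSE is the average squared forecast error, and RMSE is its square root, expressed in the same units as the data (here, passengers). The helper names mse_manual and rmse_manual and the small input arrays are hypothetical, used only for illustration.

```python
import numpy as np

def mse_manual(actual, predicted):
    """Mean squared error: average of the squared forecast errors."""
    errors = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(errors ** 2))

def rmse_manual(actual, predicted):
    """Root mean squared error: square root of the MSE, in the data's own units."""
    return float(np.sqrt(mse_manual(actual, predicted)))

actual = [3.0, -0.5, 2.0, 7.0]
predicted = [2.5, 0.0, 2.0, 8.0]
print(mse_manual(actual, predicted))   # 0.375
print(rmse_manual(actual, predicted))  # about 0.612
```

Because errors are squared, a single large miss weighs more than several small ones, which is worth keeping in mind when comparing models with these metrics.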



      Code : Forecast using ARIMA Model

      # Train the model on the full dataset
      model = SARIMAX(airline['# Passengers'], 
                      order = (0, 1, 1), 
                      seasonal_order =(2, 1, 1, 12))
      result = model.fit()
      # Forecast for the next 3 years (36 monthly steps past the end of the data)
      forecast = result.predict(start = len(airline), 
                                end = (len(airline)-1) + 3 * 12).rename('Forecast')
      # Plot the forecast values
      airline['# Passengers'].plot(figsize = (12, 5), legend = True)
      forecast.plot(legend = True)


