Time Series and Forecasting Using R

Time series forecasting is the process of using historical data to make predictions about future events. It is commonly used in fields such as finance, economics, and weather forecasting. R is a powerful programming language and software environment for statistical computing and graphics that is widely used for time series forecasting.

What is Time Series Forecasting?

Time series forecasting focuses on making predictions about future events or values using past and present data points. Data points are gathered over time, and this technique is used in a variety of fields, such as sales, finance, weather forecasting, and economics. The following are some important ideas and methods to consider when carrying out time series forecasting.

Many classical forecasting models assume that the series is stationary, so the data often has to be made stationary before a model is fitted. A stationary series has a constant mean, a constant variance, and an autocorrelation structure that does not change over time. In practice this means removing trend and seasonality, usually by transforming the data (for example with a log transform or differencing). Seasonality can be additive or multiplicative. Whether a series is stationary can be checked with the augmented Dickey-Fuller test.
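As a rough illustration of the idea, the sketch below applies one common sequence of transformations to the built-in AirPassengers series: a log transform, a seasonal difference, and a first difference. The variable names are illustrative, and the Dickey-Fuller step assumes the optional tseries package is installed; it is skipped otherwise.

```r
# Sketch: one common route to a roughly stationary monthly series.
y <- log(AirPassengers)                 # log transform stabilises the growing variance
y_seasonal_diff <- diff(y, lag = 12)    # lag-12 difference removes yearly seasonality
y_stationary <- diff(y_seasonal_diff)   # lag-1 difference removes the remaining trend

# The transformed series should fluctuate around a mean close to zero
summary(as.numeric(y_stationary))

# Optional: augmented Dickey-Fuller test (assumes the tseries package is available)
if (requireNamespace("tseries", quietly = TRUE)) {
  print(tseries::adf.test(y_stationary))
}
```

A small p-value from the test is evidence against a unit root. The result should be read alongside a plot of the series rather than on its own.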

There are several ways to make a time series stationary. Most real-world time series contain both trend and seasonality, and many forecasting models work better once these components have been removed or modelled explicitly. To detrend a series we can subtract a moving average, which is computed with a window function, or fit a straight line to the data with linear regression and subtract it. Removing seasonality is harder and often needs domain knowledge. For example, drug sales may show yearly seasonality because of winter. In some cases locally weighted scatterplot smoothing (LOWESS) can be used to remove seasonality.
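The three detrending ideas mentioned above can be sketched with base R alone. This is a minimal example on the log of AirPassengers; the variable names and the smoothing span `f = 0.5` are arbitrary choices for illustration, not recommendations.

```r
# Sketch: three ways to estimate a trend, using only base R.
y <- log(AirPassengers)
t_idx <- as.numeric(time(y))   # time index in decimal years
y_num <- as.numeric(y)

# 1. Centred 2x12 moving average (a window function over one year)
w <- c(0.5, rep(1, 11), 0.5) / 12
trend_ma <- stats::filter(y, filter = w, sides = 2)

# 2. Straight line fitted by linear regression
fit_lm <- lm(y_num ~ t_idx)
trend_lm <- ts(fitted(fit_lm), start = start(y), frequency = frequency(y))

# 3. Locally weighted scatterplot smoothing
trend_lo <- ts(lowess(t_idx, y_num, f = 0.5)$y,
               start = start(y), frequency = frequency(y))

# Subtracting an estimated trend leaves the detrended series
detrended_lm <- y - trend_lm
```

The moving average leaves missing values at both ends of the series because the window is incomplete there, which is one reason a regression line or smoother is sometimes preferred.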

Time Series Data: Time series data consists of observations or measurements collected at regular time intervals. These data points are typically plotted over time, and the goal of time series forecasting is to predict future values in this sequence.

Components of Time Series:

  • Trend: The long-term movement or direction in the data. Trends can be upward (increasing), downward (decreasing), or flat (constant).
  • Seasonality: Repeating patterns or fluctuations that occur at fixed intervals. For example, sales of winter clothing may exhibit a yearly seasonality pattern.
  • Cyclic Patterns: Longer-term, non-seasonal patterns that may not have fixed intervals. Cyclic patterns represent oscillations in the data that are not tied to a specific season.
  • Irregularity (Noise): Random, unpredictable fluctuations in the data.

Time Series Forecasting Methods

Time series forecasting methods are techniques used to make predictions about future values in a time series based on historical and current data. There are several well-established methods for time series forecasting, each with its own strengths and weaknesses. Here are some of the most commonly used time series forecasting methods.

  1. Autoregressive Integrated Moving Average (ARIMA)
  2. Seasonal and Trend decomposition using Loess (STL)
  3. Seasonal Autoregressive Integrated Moving-Average (SARIMA)

Many other methods are available, but these are among the most common for time series forecasting.
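To give a feel for these methods, here is a minimal sketch that uses only functions from base R's stats package: `stl()` for the decomposition and `arima()` with a seasonal part for SARIMA. The model orders (0,1,1)(0,1,1)[12] are an assumption chosen for illustration, not the result of a model search.

```r
# Sketch: STL decomposition and a seasonal ARIMA fit with base R.
y <- log(AirPassengers)

# STL splits the series into seasonal, trend and remainder columns
stl_fit <- stl(y, s.window = "periodic")
head(stl_fit$time.series)

# Seasonal ARIMA with illustrative orders (0,1,1)(0,1,1)[12]
sarima_fit <- arima(y, order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months and transform back from the log scale
forecast_1961 <- exp(predict(sarima_fit, n.ahead = 12)$pred)
forecast_1961
```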

Packages for Time Series Forecasting in R

Several R packages are available for time series forecasting, including:

  1. “forecast”: This package provides a wide range of methods for time series forecasting, including exponential smoothing, ARIMA, and neural networks.
  2. “tseries”: This package provides functions for time series analysis and computational finance, including stationarity tests such as the augmented Dickey-Fuller test and functions for fitting ARMA and GARCH models.
  3. “prophet”: This package is developed by Facebook. It provides a simple and fast way to perform time series forecasting using additive models. It is designed for business time series data and is easy to use and interpret.
  4. “rugarch”: This package provides a flexible and powerful framework for fitting and forecasting volatility models, including GARCH and its variants.
  5. “stlplus”: This package provides functions for decomposing time series data using the STL algorithm, which is useful for removing seasonal and trend components from time series data.

To use these packages, first, they need to be installed and loaded into R. Then, the time series data must be prepared and cleaned, and the appropriate forecasting method can be applied. The forecasted values can then be plotted, evaluated and compared to the actual values.
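The evaluation step can be sketched as a simple holdout test: fit on the early part of the series, forecast the held-out part, and compare with the actual values. To stay self-contained, this example swaps in base R's `arima()` in place of the packages listed above; the split point and model orders are assumptions for illustration.

```r
# Sketch: holdout evaluation of a forecast using base R only.
train <- window(AirPassengers, end = c(1958, 12))    # 1949-1958 for fitting
test  <- window(AirPassengers, start = c(1959, 1))   # 1959-1960 held out

# Illustrative seasonal ARIMA on the log scale
fit <- arima(log(train), order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the held-out period and return to the original scale
pred <- exp(predict(fit, n.ahead = length(test))$pred)

# Compare forecasts with the actual values
rmse <- sqrt(mean((test - pred)^2))
mape <- mean(abs((test - pred) / test)) * 100
c(RMSE = rmse, MAPE = mape)
```

Errors measured on held-out data are usually a more honest guide to forecasting performance than errors measured on the data used for fitting.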

Forecasting is prediction: analyzing time series data, identifying patterns in it, and using those patterns to predict future values. Some real-world applications of time series forecasting are as follows:

| Application | Time series data | Forecasting results |
|---|---|---|
| Weather prediction | The temperature of a place collected for one month | Forecast weather for the next few months in that place |
| Stock market price prediction | Stock market price data for one day and patterns | Predict the stock market price for the next day |
| E-commerce and Retail | Sales data of a company for one year | Predict revenue and number of sales for next year |
| Industrial management | The raw material used and available in 3-5 years | Raw materials requirements prediction, profit prediction |

Here we will use the AutoRegressive Integrated Moving Average (ARIMA) method to forecast from time series data. We will use the built-in AirPassengers dataset, which contains monthly totals of international airline passengers (in thousands) from 1949 to 1960, and forecast passenger counts for the next 10 years, 1961 to 1970.

R
# Install the forecast package from CRAN (only needed once)
install.packages('forecast')

# Load the package into the current session
library('forecast')

To check which class the “AirPassengers” dataset belongs to, we can use the class() function.

R
class(AirPassengers)

Output:

[1] "ts"

“ts” means it is time series data.
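A ts object also stores its own time index. As a small sketch, the base R functions below read the start, end and sampling frequency of the series and extract a sub-period.

```r
# Sketch: inspecting the time index of a ts object with base R.
start(AirPassengers)       # year and period of the first observation
end(AirPassengers)         # year and period of the last observation
frequency(AirPassengers)   # observations per year (12 means monthly)
length(AirPassengers)      # total number of observations

# Extract a sub-period, here the 12 months of 1960
window(AirPassengers, start = c(1960, 1), end = c(1960, 12))
```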

We can also see the content of the dataset.

R
AirPassengers

Output:

     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432

Box Plot for AirPassengers monthly count

R
# Create a color palette for the box plot
my_colors <- rainbow(12)

# Box plot by month with customizations
boxplot(split(AirPassengers, cycle(AirPassengers)),
        xlab = "Month", ylab = "Number of Passengers",
        col = my_colors,  # Assign colors to each box
        border = "black",  # Set the border color
        main = "Monthly Air Passenger Counts by Month",
        names = month.abb,  # Use abbreviated month names as labels
        outline = FALSE)  # Do not draw outlier points

Output:


Figure: Monthly Air Passenger Counts by Month

Plot the dataset to observe how the values have been changing from 1949 to 1960.

R
plot(AirPassengers)

Output:

Figure: Time v/s Passengers monthly count

Time series data can be decomposed into three components:

  1. Seasonal – A pattern that repeats over a fixed period of time. Example – A clothing e-commerce website has heavy traffic during festive seasons and less traffic at other times. This is a seasonal pattern because the value rises only during a certain period.
  2. Trend – The long-term direction in which the values are moving. For example, if a website is doing well overall, the trend in its traffic goes up; if not, the trend goes down.
  3. Random – What remains of the time series after the seasonal and trend components are removed. This is also known as noise.
R
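# AirPassengers is already a monthly ts object (frequency = 12),
# so it can be decomposed directly and keeps its 1949-1960 time axis
d <- decompose(AirPassengers, type = "multiplicative")
plot(d)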

Output:

Figure: Patterns in the time series data

The type is set to “multiplicative” because the size of the seasonal swings grows with the level of the trend. If the seasonal swings stayed roughly constant over time, an “additive” decomposition would be appropriate.
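One way to see what “multiplicative” means is to multiply the components back together. The sketch below does this with base R's `decompose()`; the variable names are illustrative.

```r
# Sketch: in a multiplicative decomposition, series = trend * seasonal * random.
d_mult <- decompose(AirPassengers, type = "multiplicative")

# Seasonal indices, one per month; values above 1 mark busier months
round(d_mult$figure, 3)

# Multiply the components back together
rebuilt <- d_mult$trend * d_mult$seasonal * d_mult$random

# The trend is undefined for the first and last six months, so compare the rest
ok <- !is.na(rebuilt)
max(abs(rebuilt[ok] - AirPassengers[ok]))   # differs only by rounding error

# On the log scale the same structure becomes additive
d_log <- decompose(log(AirPassengers), type = "additive")
```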

Now we forecast 10 years of data by fitting a model with the auto.arima() function and passing it to forecast().

R
# Let auto.arima() select the ARIMA orders automatically
model <- auto.arima(AirPassengers)
summary(model)

# h = 10 * 12: forecast 10 years of monthly values
# level = 95: draw a 95% prediction interval
f <- forecast(model, level = c(95), h = 10 * 12)
plot(f)

Output:

Series: AirPassengers
ARIMA(2,1,1)(0,1,0)[12]

Coefficients:
         ar1     ar2      ma1
      0.5960  0.2143  -0.9819
s.e.  0.0888  0.0880   0.0292

sigma^2 = 132.3:  log likelihood = -504.92
AIC=1017.85   AICc=1018.17   BIC=1029.35

Training set error measures:
                 ME     RMSE     MAE      MPE     MAPE     MASE        ACF1
Training set 1.3423 10.84619 7.86754 0.420698 2.800458 0.245628 -0.00124847

Figure: Forecast for the next 10 years

The selected ARIMA(2,1,1)(0,1,0)[12] model is a seasonal ARIMA model with a 12-month seasonal period. It includes a second-order autoregressive (AR) component, first-order differencing (I), a first-order moving average (MA) term, and one seasonal difference at lag 12. The two differencing steps are what make the series stationary.

The model estimates the coefficients for these components and reports error measures. The AIC, AICc, and BIC values help compare candidate models, with lower values indicating a better trade-off between fit and complexity. The error measures, including RMSE and MAPE, evaluate the model’s accuracy on the training data, while the log likelihood measures how well the model fits the data. Further evaluation on new data is needed to confirm its forecasting performance.

The shaded region is the 95% prediction interval: under the model's assumptions, each future value is expected to fall inside it with 95% probability. The blue line is the point forecast for each month. The same workflow can be applied to other time series datasets.
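To show where such an interval comes from, the sketch below refits a model with the same orders using base R's `arima()` and builds a 95% interval from the forecast standard errors. It is an approximation of the approach under a normality assumption, and its numbers need not match the forecast package output exactly.

```r
# Sketch: building a 95% prediction interval by hand with base R.
fit <- arima(AirPassengers, order = c(2, 1, 1),
             seasonal = list(order = c(0, 1, 0), period = 12),
             method = "ML")

# Point forecasts and their standard errors for 10 years of months
pred <- predict(fit, n.ahead = 10 * 12)

# Assuming normal forecast errors: point forecast +/- 1.96 standard errors
z <- qnorm(0.975)
lower <- pred$pred - z * pred$se
upper <- pred$pred + z * pred$se

head(cbind(point = pred$pred, lower, upper), 3)
```

The standard errors grow with the forecast horizon, which is why the shaded band in the plot widens the further ahead the forecast goes.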

Advantages of using R for Time Series Forecasting:

  1. Large community: R has a large and active community of users and developers, which means that there are many resources and packages available for time series forecasting, and it also allows for easy collaboration and sharing of knowledge.
  2. Flexibility: R provides a wide range of tools and packages for time series forecasting, which allows for flexibility in selecting the appropriate method for a given dataset.
  3. Open-source: R is an open-source programming language, which means that it is free to use and can be modified to fit specific needs.
  4. Easy to use: R has a simple and intuitive syntax, which makes it easy to learn and use.
  5. High-quality visualization: R has powerful data visualization capabilities, which allows for easy interpretation and analysis of time series data.

Disadvantages of using R for Time Series Forecasting:

  1. Speed: R is an interpreted language, which can make it slower than compiled languages such as C or C++ for large datasets.
  2. Memory usage: R can be memory-intensive, which can be a problem for large datasets.
  3. Limited scalability: R is not designed for large-scale parallel computing, so it may not be suitable for large-scale time series forecasting tasks.
  4. Steep learning curve: R is a powerful programming language, but it has a steep learning curve, which can make it difficult for beginners.
  5. Lack of standardization: R provides a wide range of tools and packages for time series forecasting, which can lead to a lack of standardization in how forecasting tasks are performed. This can make it difficult to compare results across different studies.


Last Updated : 21 Mar, 2024