# How to Check if Time Series Data is Stationary with Python?

Time series data is characterized by its temporal nature: observations are ordered in time, which typically introduces a trend or seasonality and makes the data suitable for time series analysis and forecasting. Time series data is said to be stationary if its statistical properties (such as mean and variance) do not change with time, i.e., if it has no temporal structure such as trend or seasonality. It is therefore essential to check whether the data is stationary: in time series forecasting, we cannot derive useful trend or seasonality insights from data that is stationary.

Example plot of stationary data:
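Since the figure is not reproduced here, the following minimal sketch (using synthetic data, not any dataset from this article) shows what such a plot looks like: Gaussian white noise is stationary by construction, with a constant mean and variance over time.

```python
import numpy as np
import matplotlib.pyplot as plt

# Gaussian white noise is stationary by construction:
# its mean and variance do not change over time
rng = np.random.default_rng(0)
series = rng.normal(loc=0.0, scale=1.0, size=500)

plt.plot(series)
plt.title("Example of a stationary series (white noise)")
plt.xlabel("Time")
plt.ylabel("Value")
plt.show()
```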

## Types of stationarity:

Checking whether data is stationary really means identifying which fine-grained notion of stationarity it satisfies. The types of stationarity observed in time series data include:

1. Trend Stationary –  A time series that does not show a trend.
2. Seasonal Stationary – A time series that does not show seasonal changes.
3. Strictly Stationary – The joint distribution of observations is invariant to time shift.
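To make these notions concrete, here is a small synthetic sketch (illustrative only; the variable names are made up): a trending series violates trend stationarity, a cyclic series violates seasonal stationarity, and i.i.d. noise is strictly stationary.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(240)

# Not trend stationary: the mean drifts upward with time
trended = 0.5 * t + rng.normal(size=t.size)

# Not seasonal stationary: the mean cycles with a fixed period of 12
seasonal = 10 * np.sin(2 * np.pi * t / 12) + rng.normal(size=t.size)

# Strictly stationary: i.i.d. noise, so every time-shifted window
# has the same joint distribution
noise = rng.normal(size=t.size)

# The drift shows up directly in the first vs. last window means
print(trended[:60].mean(), trended[-60:].mean())
```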

## Stepwise Implementation

The following steps will help the user understand how to check whether given time series data is stationary.

### Step 1: Plotting the time series data

A simple line plot is often enough to reveal an obvious trend or seasonality in the series.

## Python3

```python
# import python pandas library
import pandas as pd

# import python matplotlib library for plotting
import matplotlib.pyplot as plt

# read the dataset using pandas read_csv() function
data = pd.read_csv("daily-total-female-births-IN.csv",
                   header=0, index_col=0)

# use a simple line plot to see the distribution of the data
plt.plot(data)
```

Output:

### Step 2: Evaluating the descriptive statistics

This is usually done by splitting the data into two or more partitions and calculating the mean and variance of each group. If these statistics are consistent across the partitions, then we can assume that the data is stationary. Let's use the airline passengers count dataset covering 1949–1960.

## Python3

```python
# import python pandas library
import pandas as pd

# import python matplotlib library for plotting
import matplotlib.pyplot as plt

# read the dataset using pandas read_csv() function
data = pd.read_csv("AirPassengers.csv",
                   header=0, index_col=0)

# print the first 10 rows of data
print(data.head(10))

# use a simple line plot to understand the data distribution
plt.plot(data)
```

Output:

Now, let's partition this data into groups, calculate the mean and variance of each group, and check for consistency.

## Python3

```python
# import the python pandas library
import pandas as pd

# use pandas read_csv() function to read the dataset
data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)

# extracting only the air passengers count from
# the dataset using the values attribute
values = data.values

# getting the count to split the dataset into 3
parts = int(len(values) / 3)

# splitting the data into three parts
part_1, part_2, part_3 = values[0:parts], values[parts:(
    parts * 2)], values[(parts * 2):(parts * 3)]

# calculating the mean of each of the three parts individually
mean_1, mean_2, mean_3 = part_1.mean(), part_2.mean(), part_3.mean()

# calculating the variance of each of the three parts individually
var_1, var_2, var_3 = part_1.var(), part_2.var(), part_3.var()

# printing the mean of the three groups
print('mean1=%f, mean2=%f, mean3=%f' % (mean_1, mean_2, mean_3))

# printing the variance of the three groups
print('variance1=%f, variance2=%f, variance3=%f' % (var_1, var_2, var_3))
```

Output:

The output clearly shows that the mean and variance of the three groups differ considerably from each other, indicating that the data is non-stationary. If, for example, the means were mean_1 = 150, mean_2 = 160, mean_3 = 155 and the variances were variance_1 = 33, variance_2 = 35, variance_3 = 37, then we could conclude that the data is stationary. However, this method can fail for some distributions, such as log-normal distributions.
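The eyeball check described above can also be wrapped in a small reusable helper (a hypothetical function, not part of the article's dataset code), demonstrated here on a deliberately trending synthetic series:

```python
import numpy as np

def partition_stats(values, n_parts=3):
    """Split a series into n equal parts and return their means and
    variances, for the rough stationarity check described above."""
    values = np.asarray(values, dtype=float).ravel()
    size = len(values) // n_parts
    parts = [values[i * size:(i + 1) * size] for i in range(n_parts)]
    return [p.mean() for p in parts], [p.var() for p in parts]

# An upward-trending series: the partition means drift upward,
# which is the signature of non-stationarity under this check
means, variances = partition_stats(np.arange(90))
print(means)
```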

Let's try the same example as above, but take the log of the passengers' count using NumPy's log() function and check the results.

## Python3

```python
# import python pandas library
import pandas as pd

# import python matplotlib library for plotting
import matplotlib.pyplot as plt

# import python numpy library
import numpy as np

# read the dataset using pandas read_csv() function
data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)

# extracting the air passengers count from the dataset
# and applying a log transform with numpy's log() function
values = np.log(data.values)

# printing the first 15 passenger count values
print(values[0:15])

# using a simple line plot to understand the data distribution
plt.plot(values)
```

Output:

The output shows there is still some trend, though not as steep as in the previous case. Now let's compute the partition means and variances.

## Python3

```python
# getting the count to split the dataset into 3 parts
parts = int(len(values) / 3)

# splitting the data into three parts
part_1, part_2, part_3 = values[0:parts], values[parts:(
    parts * 2)], values[(parts * 2):(parts * 3)]

# calculating the mean of each of the three parts individually
mean_1, mean_2, mean_3 = part_1.mean(), part_2.mean(), part_3.mean()

# calculating the variance of each of the three parts individually
var_1, var_2, var_3 = part_1.var(), part_2.var(), part_3.var()

# printing the mean of the three groups
print('mean1=%f, mean2=%f, mean3=%f' % (mean_1, mean_2, mean_3))

# printing the variance of the three groups
print('variance1=%f, variance2=%f, variance3=%f' % (var_1, var_2, var_3))
```

Output:

Ideally, we would have expected the means and variances to be very different, but they come out nearly the same; in such cases this method can fail badly. To avoid this, we have another statistical test, discussed below.

### Step 3: Augmented Dickey-Fuller test

This is a statistical test built specifically to check whether univariate time series data is stationary. It is a hypothesis test, and its result tells us the degree of confidence with which the hypothesis can be accepted or rejected. It is classified as a unit root test: it determines how strongly a univariate time series follows a trend. Let's define the null and alternate hypotheses:

• H0 (Null Hypothesis): The time series data is non-stationary
• H1 (Alternate Hypothesis): The time series data is stationary

Assume alpha = 0.05 (95% confidence). The test result is interpreted via the p-value: if p > 0.05, we fail to reject the null hypothesis; if p <= 0.05, we reject it. Now, let's use the same air passengers dataset and test it with the adfuller() function provided by the statsmodels package to check whether the data is stationary.

## Python3

```python
# import python pandas package
import pandas as pd

# import the adfuller function from the statsmodels
# package to perform the ADF test
from statsmodels.tsa.stattools import adfuller

# read the dataset using pandas read_csv() function
data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)

# extracting the passengers count as a 1-D array
# (adfuller expects a one-dimensional series)
values = data.values.ravel()

# passing the extracted passengers count to the adfuller
# function; its result is stored in the res variable
res = adfuller(values)

# printing the statistical result of the adfuller test
print('Augmented Dickey-Fuller Statistic: %f' % res[0])
print('p-value: %f' % res[1])

# printing the critical values at different alpha levels
print('critical values at different levels:')
for k, v in res[4].items():
    print('\t%s: %.3f' % (k, v))
```

Output:

The ADF statistic is much greater than the critical values at all levels, and the p-value is also greater than 0.05, which means we fail to reject the null hypothesis at the 90%, 95%, and 99% confidence levels: the time series data is strongly non-stationary.
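The decision rule used above can be captured in a tiny helper (a hypothetical function, shown only to make the rule explicit):

```python
# Hypothetical helper encoding the ADF decision rule described above
def interpret_adf(p_value, alpha=0.05):
    if p_value <= alpha:
        return "reject H0 -> the series looks stationary"
    return "fail to reject H0 -> the series looks non-stationary"

print(interpret_adf(0.99))  # fail to reject H0 -> the series looks non-stationary
print(interpret_adf(0.01))  # reject H0 -> the series looks stationary
```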

Now, let's run the ADF test on the log-transformed values and cross-check our results.

## Python3

```python
# import python pandas package
import pandas as pd

# import the adfuller function from the statsmodels
# package to perform the ADF test
from statsmodels.tsa.stattools import adfuller

# import python numpy package
import numpy as np

# read the dataset using pandas read_csv() function
data = pd.read_csv("AirPassengers.csv", header=0, index_col=0)

# extracting the passengers count as a 1-D array and
# applying a log transform on it
values = np.log(data.values.ravel())

# passing the log-transformed values to the adfuller
# function; its result is stored in the res variable
res = adfuller(values)

# printing the statistical result of the adfuller test
print('Augmented Dickey-Fuller Statistic: %f' % res[0])
print('p-value: %f' % res[1])

# printing the critical values at different alpha levels
print('critical values at different levels:')
for k, v in res[4].items():
    print('\t%s: %.3f' % (k, v))
```

Output:

As you can see, the ADF test once again shows that the ADF statistic is much greater than the critical values at all levels, and the p-value is much greater than 0.05, so we again fail to reject the null hypothesis at the 90%, 95%, and 99% confidence levels: even the log-transformed series is strongly non-stationary.

Hence, the ADF unit root test stands out as a robust way to check whether time series data is stationary.

