Python | Pandas dataframe.rolling()

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas dataframe.rolling() function provides rolling window calculations. Rolling window calculations are used mainly in signal processing and time series analysis. Put simply, we take a window of size k, meaning k consecutive values, perform some desired mathematical operation on it, then slide the window forward by one observation and repeat. In the simplest case all k values are equally weighted.
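
To make the idea concrete, here is a minimal sketch on a small hand-made series (the values are illustrative and not taken from the stock data used later). With k = 3, the first two positions have fewer than three values available, so they come out as NaN.

```python
import pandas as pd

# A tiny illustrative series, not real data
s = pd.Series([1, 2, 3, 4, 5])

# Sum over each window of 3 consecutive values
rolling_sum = s.rolling(window=3).sum()

print(rolling_sum.tolist())
# [nan, nan, 6.0, 9.0, 12.0]
```

Each result is labelled with the position of the last value in its window, so 6.0 (1 + 2 + 3) appears at index 2.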

Syntax : DataFrame.rolling(window, min_periods=None, freq=None, center=False, win_type=None, on=None, axis=0, closed=None)



Parameters :
window : Size of the moving window. If an integer, this is the fixed number of observations used for calculating the statistic. If it is an offset, this is the time period of each window, and each window is variable-sized based on the observations included in that time period. An offset is only valid for datetime-like indexes.
min_periods : Minimum number of observations in window required to have a value (otherwise result is NA). For a window that is specified by an offset, this will default to 1.
freq : Frequency to conform the data to before computing the statistic. Specified as a frequency string or DateOffset object.
center : Set the labels at the center of the window.
win_type : Provide a window type. See the notes below.
on : For a DataFrame, column on which to calculate the rolling window, rather than the index
closed : Make the interval closed on the ‘right’, ‘left’, ‘both’ or ‘neither’ endpoints. For offset-based windows, it defaults to ‘right’. For fixed windows, defaults to ‘both’. Remaining cases not implemented for fixed windows.
axis : int or string, default 0
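
The effect of a few of these parameters can be sketched on small hand-made series (the values and dates below are illustrative assumptions, not the stock data). The sketch shows min_periods, center, and an offset-based window on a date-time index.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# min_periods=1: produce a result as soon as one observation is available
partial = s.rolling(window=3, min_periods=1).sum()
print(partial.tolist())   # [1.0, 3.0, 6.0, 9.0, 12.0]

# center=True: label each result at the middle of its window
centered = s.rolling(window=3, center=True).mean()
print(centered.tolist())  # [nan, 2.0, 3.0, 4.0, nan]

# Offset window: needs a datetime-like index; window length is a time span
dates = pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-05", "2018-01-06"])
ts = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

# "2D" covers the 2 days up to and including each row's timestamp
by_time = ts.rolling("2D").sum()
print(by_time.tolist())   # [1.0, 3.0, 3.0, 7.0]
```

Note how the offset window on 2018-01-05 contains only one observation, because no other row falls within the preceding two days.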

Note : The freq keyword is used to conform time series data to a specified frequency by resampling the data. This is done with the default parameters of resample() (i.e. using the mean). Recent pandas releases no longer accept freq; resample the data with resample() first and then call rolling().
If win_type=None, then all the values in the window are evenly weighted. Various other rolling window types are available. To learn more about them, refer to the SciPy documentation on window functions (scipy.signal).
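
To see what "weighted" means here, the following sketch applies a set of weights by hand instead of passing win_type. The weights [0.5, 1.0, 0.5] are an assumption: they are the values a triangular window of size 3 is expected to have, written out manually so that the example does not need SciPy.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Assumed triangular weights for a window of size 3
weights = np.array([0.5, 1.0, 0.5])

# Weighted sum: multiply each value in the window by its weight, then add
weighted_sum = s.rolling(window=3).apply(lambda w: np.dot(w, weights), raw=True)

# Evenly weighted sum (win_type=None) for comparison
even_sum = s.rolling(window=3).sum()

print(weighted_sum.tolist())  # [nan, nan, 4.0, 6.0, 8.0]
print(even_sum.tolist())      # [nan, nan, 6.0, 9.0, 12.0]
```

The middle value of each window counts fully while its neighbours count half, which is why the weighted sums are smaller than the evenly weighted ones.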

The examples below use a CSV file (apple.csv) containing stock price data of Apple for a duration of 1 year, from 13-11-17 to 13-11-18.

Example #1: Rolling sum with a triangular window of size 3 on the stock closing price column.




# importing pandas as pd
import pandas as pd

# By default the "date" column is read in string format.
# parse_dates=["date"] converts it to date-time format, and
# index_col="date" makes the "date" column the index,
# so each rolling result is labelled by its date.
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")

# Printing the first 10 rows of dataframe
df[:10]




# 3 indicates the window size
# win_type='triang' selects a triangular window,
# so the values in each window are weighted
# (this option requires SciPy to be installed)

# sum() computes the weighted sum over
# each window of the "close" column
df.close.rolling(3, win_type='triang').sum()

Output :

 

Example #2: Rolling mean over a window of size 3. We use the default window type, which is None, so all the values are evenly weighted.




# importing pandas as pd
import pandas as pd
  
df = pd.read_csv("apple.csv", parse_dates=["date"], index_col="date")
  
# close is the column on which
# we are performing the operation
# mean() function finds the mean over each window
df.close.rolling(3).mean()

Output :



