How to plot Timeseries based charts using Pandas?
Time series data is a series of data points collected over a period of time and indexed by time. The observations are typically recorded at successive, equally spaced points in time. For example, ECG signals, EEG signals, stock market prices, and weather data are all time-indexed and recorded over a period of time. Analyzing such data and predicting future observations is a broad area of research.
In this article, we will see how to perform Exploratory Data Analysis (EDA) on time series data using the Pandas library in Python. We will try to infer the nature of the data over a specific period of time by plotting various graphs with matplotlib.pyplot, seaborn, statsmodels, and other packages.
To make the plots and functions easy to follow, we will create a sample dataset with 16 rows, consisting of a Date column and five value columns: A, B, C, D, and E.
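A dataset of this shape could be built as in the sketch below. The values are random placeholders and the weekly frequency is an assumption chosen so that 16 rows span January to April 2020; neither comes from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 16 weekly observations starting in January 2020.
# The numbers are random and only stand in for the article's dataset.
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2020-01-05", periods=16, freq="W")

df = pd.DataFrame(
    rng.integers(low=10, high=100, size=(16, 5)),
    columns=["A", "B", "C", "D", "E"],
)
df.insert(0, "Date", dates)  # put the Date column first

print(df.shape)  # (16, 6): the Date column plus five value columns
print(df.head())
```

Setting the dates as the index with df.set_index("Date") is often convenient, because pandas then uses them as the x-axis automatically.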
Plotting the Time-Series Data
Plotting Timeseries based Line Chart:
A line chart shows the relationship between two variables, X and Y, plotted on different axes. For time series data, the X axis holds the time index and the Y axis holds the observed values.
Example 1: This plot shows the variation of Column A values from Jan 2020 till April 2020. Note that the values have a positive trend overall, but there are ups and downs over the course.
Example 2: Plotting with all variables.
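The two examples can be sketched as follows. The data is randomly generated and hypothetical, so the resulting curves will not match the trend described above; only the plotting calls are the point here.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 16 weekly rows with value columns A to E, indexed by date
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2020-01-05", periods=16, freq="W", name="Date")
df = pd.DataFrame(
    rng.integers(10, 100, size=(16, 5)), columns=list("ABCDE"), index=dates
)

# Example 1: a single column plotted against the date index
fig, ax = plt.subplots(figsize=(10, 4))
df["A"].plot(ax=ax, marker="o")
ax.set_title("Column A over time")
ax.set_ylabel("A")

# Example 2: all columns, one subplot per variable
axes = df.plot(subplots=True, figsize=(10, 8))
```

In a script, call plt.show() afterwards to display the figures.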
Plotting Timeseries based Bar Plot:
A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically. A bar chart describes comparisons between discrete categories. One axis of the plot represents the specific categories being compared, while the other axis represents the measured values corresponding to those categories.
Syntax: plt.bar(x, height, width, bottom, align)
This bar plot represents the variation of the ‘A’ column values. It can be used to compare the future and the past values.
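A minimal sketch of such a bar plot, using the plt.bar-style call on an Axes object. The data is hypothetical, and the bar width of 5 is an assumption: on a date axis, matplotlib interprets a numeric width in days.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 16 weekly observations of column A
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-05", periods=16, freq="W"),
    "A": rng.integers(10, 100, size=16),
})

fig, ax = plt.subplots(figsize=(10, 4))
# One bar per date; width is given in days because the x values are dates
bars = ax.bar(df["Date"], df["A"], width=5, align="center")
ax.set_xlabel("Date")
ax.set_ylabel("A")
ax.set_title("Bar plot of column A")
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```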
Plotting Timeseries based Rolling Mean Plots:
The mean of an n-sized window sliding from the beginning to the end of the data frame is known as Rolling Mean. If the window doesn’t have n observations, then NaN is returned.
Here, we will plot the time series together with its rolling mean:
- The Blue Plot Line represents the original ‘A’ column values while the Red Plot Line represents the Rolling mean of ‘A’ column values of window size = 2
- Through this plot, we infer that the rolling mean of a time-series data returns values with fewer fluctuations. The trend of the plot is retained but unwanted ups and downs which are of less significance are discarded.
- For plotting the decomposition of time-series data, box plot analysis, etc., it is a good practice to use a rolling mean data frame so that the fluctuations don’t affect the analysis, especially in forecasting the trend.
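The rolling mean described above can be computed with pandas' rolling() method. The sketch below uses hypothetical random data and the window size of 2 mentioned in the list; note that the first rolling value is NaN because its window is incomplete.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 16 weekly observations of column A, indexed by date
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2020-01-05", periods=16, freq="W", name="Date")
df = pd.DataFrame({"A": rng.integers(10, 100, size=16)}, index=dates)

# Mean over a sliding window of 2 observations; the first value is NaN
rolling_mean = df["A"].rolling(window=2).mean()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(df.index, df["A"], color="blue", label="A")
ax.plot(df.index, rolling_mean, color="red", label="Rolling mean (window=2)")
ax.set_xlabel("Date")
ax.legend()
```

A larger window smooths the series more strongly, at the cost of more leading NaN values.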
Time Series Decomposition:
A decomposition plot shows the following four elements in the same figure:
- Trend Component: It shows the pattern of the data that spans across the various seasonal periods. It represents the variation of ‘A’ values over the period of 2 years with the short-term fluctuations removed.
- Seasonal Component: This plot shows the ups and downs of the ‘A’ values, i.e. the recurring normal variations.
- Residual Component: This is the leftover component after decomposing the ‘A’ values into the Trend and Seasonal Components.
- Observed Component: This is the original series of ‘A’ values. Together with the trend and seasonal components, it can be used to study the data for various purposes.
Plotting Timeseries based Autocorrelation Plot:
An autocorrelation plot is a commonly used tool for checking randomness in a data set. Randomness is assessed by computing the autocorrelation of the data values at varying time lags. The plot is used specifically with time series data and is available in most general-purpose statistical software. In pandas, it can be drawn using pandas.plotting.autocorrelation_plot().
Syntax: pandas.plotting.autocorrelation_plot(series, ax=None, **kwargs)
- series: This parameter is the Time series to be used to plot.
- ax: This parameter is a matplotlib axes object. Its default value is None.
Returns: This function returns an object of class matplotlib.axes.Axes
Taking the trend, seasonal, cyclic, and residual components together, this plot shows how the current value of the time-series data is related to the previous values. We can see that a significant proportion of the line shows an effective correlation with time, and we can use such correlation plots to study the internal dependence of time-series data.
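A short sketch of the call on hypothetical random data. For random data like this, the curve should mostly stay near zero, which is different from the correlated behaviour described above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import autocorrelation_plot

# Hypothetical data: 16 weekly observations of column A
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2020-01-05", periods=16, freq="W", name="Date")
series = pd.Series(rng.integers(10, 100, size=16), index=dates, name="A")

# Draws the autocorrelation for every lag and returns the matplotlib Axes
ax = autocorrelation_plot(series)
ax.set_title("Autocorrelation of column A")

# The autocorrelation at a single lag can also be computed directly
lag1 = series.autocorr(lag=1)
print(f"Lag-1 autocorrelation: {lag1:.3f}")
```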
Plotting Timeseries based Box Plot:
A box plot is a visual representation of groups of numerical data through their quartiles. Box plots are also used for detecting outliers in a data set. A box plot captures the summary of the data efficiently with a simple box and whiskers and allows us to compare easily across groups. It summarizes sample data using the 25th, 50th, and 75th percentiles.
Syntax: seaborn.boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, ax=None, **kwargs)
- x, y, hue: Inputs for plotting long-form data.
- data: Dataset for plotting. If x and y are absent, this is interpreted as wide-form.
- color: Color for all of the elements.
Returns: It returns the Axes object with the plot drawn onto it.
Here, through these plots, we will be able to obtain an intuition of the ‘A’ value ranges of each year (Year-wise Box Plot) as well as each month (Month-wise Box Plot). Also, through the Month-wise Box Plot, we can observe that the value range is slightly higher in Jan and Feb, compared to other months.
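A month-wise box plot can be sketched as below. The data is hypothetical and random, and the Month column is a helper derived from the dates for this illustration; a year-wise plot works the same way with the .dt.year accessor.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: 16 weekly observations of column A (January to April 2020)
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-05", periods=16, freq="W"),
    "A": rng.integers(10, 100, size=16),
})

# Helper column (an assumption of this sketch): month number taken from the date
df["Month"] = df["Date"].dt.month

fig, ax = plt.subplots(figsize=(8, 4))
# One box per month, summarizing the distribution of A within that month
sns.boxplot(x="Month", y="A", data=df, ax=ax)
ax.set_title("Month-wise box plot of column A")
```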
A shift plot is obtained by dividing the current value of the ‘A’ column by the shifted value of the ‘A’ column. By default, the shift is by one value. This plot is used to analyze the stability of the values on a daily basis.
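The ratio described here can be computed with the shift() method. The sketch below uses hypothetical random data; the first ratio is NaN because the first row has no previous value to divide by.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: 16 weekly observations of column A, indexed by date
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2020-01-05", periods=16, freq="W", name="Date")
df = pd.DataFrame({"A": rng.integers(10, 100, size=16)}, index=dates)

# Current value divided by the previous value; shift() moves by one row by default
ratio = df["A"] / df["A"].shift(1)

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(ratio.index, ratio, marker="o")
ax.axhline(1.0, color="grey", linestyle="--")  # a ratio of 1 means no change
ax.set_xlabel("Date")
ax.set_ylabel("A / previous A")
```

Values close to 1 indicate that consecutive observations are stable, while large deviations point to abrupt changes.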
Plotting Timeseries based Heatmap:
From the heatmap, we can interpret the trend of the ‘A’ column values across the years, sampled over 12 months, as well as the variation of values between different years. We can also infer how the values have changed from the average value. The heatmap shows the variation of the values across years as well as months, differentiated using a colormap.
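A year-by-month heatmap could be sketched with seaborn as follows. This sketch assumes a different hypothetical dataset, 24 monthly observations over two years, so that the grid has a row per year and a column per month; the Year and Month helper columns and the choice of seaborn.heatmap are assumptions too.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: one observation per month for 2020 and 2021
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=24, freq="MS"),
    "A": rng.integers(10, 100, size=24),
})
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# Reshape to a grid: one row per year, one column per month
table = df.pivot(index="Year", columns="Month", values="A")

fig, ax = plt.subplots(figsize=(10, 3))
# Each cell is colored by its value; annot=True prints the value in the cell
sns.heatmap(table, cmap="viridis", annot=True, fmt=".0f", ax=ax)
ax.set_title("Column A by year and month")
```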