Basic of Time Series Manipulation Using Pandas

Although other libraries such as Scikit-learn offer some utilities for time series, data science professionals commonly use the Pandas library because it provides more features for working with date and time data. We can attach a date and time to every record and then fetch records from a DataFrame by timestamp.

We can also select the data within a certain range of dates and times by using the date and time functionality of the Pandas library.

Let’s discuss some major objectives of time series analysis using the Pandas library.

Objectives of Time Series Analysis

  • Create a series of dates
  • Work with timestamp data
  • Convert string data to timestamps
  • Slice data using timestamps
  • Resample a time series to compute aggregates or summary statistics over different time periods
  • Work with missing data

Now, let’s do some practical analysis of some data to demonstrate the use of Pandas’ time series.

Create DateTime Values with Pandas

To create a DateTime series using Pandas, we only need Pandas itself: the date_range function builds a range of timestamps from a start date, an end date, and a frequency.

Example

Python3

import pandas as pd

# Build a DatetimeIndex with one timestamp per minute, from 1 to 8 January 2019
range_date = pd.date_range(start='1/1/2019', end='1/08/2019', freq='min')
print(range_date)

                    

Output
DatetimeIndex(['2019-01-01 00:00:00', '2019-01-01 00:01:00',
               '2019-01-01 00:02:00', '2019-01-01 00:03:00',
               '2019-01-01 00:04:00', '2019-01-01 00:05:00',
               '2019-01-01 00:06:00', '2019-01-01 00:07:00',
               '2019-01-01 00:08:00', '2019-01-01 00:09:00',
               ...
               '2019-01-07 23:51:00', '2019-01-07 23:52:00',
               '2019-01-07 23:53:00', '2019-01-07 23:54:00',
               '2019-01-07 23:55:00', '2019-01-07 23:56:00',
               '2019-01-07 23:57:00', '2019-01-07 23:58:00',
               '2019-01-07 23:59:00', '2019-01-08 00:00:00'],
              dtype='datetime64[ns]', length=10081, freq='T')

Explanation:

This code creates one timestamp per minute for the date range from January 1, 2019 to January 8, 2019.

The frequency can be changed to other units, such as hours or seconds.

This function is useful for tracking data that is recorded every minute. As the output shows, the resulting DatetimeIndex has 10081 elements: 7 days of 1440 minutes each, plus the final timestamp at midnight on January 8.

Note: As shown in the output, Pandas stores these values with the data type datetime64[ns].
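
As a rough sketch of how the frequency can be varied, the snippet below builds ranges at hourly and per-second resolution, and also uses periods in place of end. The variable names are illustrative and not part of the original example. The lowercase aliases h, min, and s are used on the assumption that they are accepted by the installed pandas version.

```python
import pandas as pd

# Hourly timestamps over the same week: 7 days * 24 hours, plus the end point.
hourly = pd.date_range(start="2019-01-01", end="2019-01-08", freq="h")

# Per-second timestamps for the first minute of the range.
per_second = pd.date_range(start="2019-01-01 00:00:00",
                           end="2019-01-01 00:01:00", freq="s")

# `periods` can replace `end`: five one-minute steps from the start.
five_minutes = pd.date_range(start="2019-01-01", periods=5, freq="min")

print(len(hourly))        # 169
print(len(per_second))    # 61
print(five_minutes[-1])   # 2019-01-01 00:04:00
```

The lengths follow the same pattern as the per-minute example: the number of steps between the two endpoints, plus one because both endpoints are included.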

Determine the Data Type of an Element in the DateTime Range

To determine the type of an element in the DateTime range, we use indexing to fetch the element and then use the type function to know its data type.

Python3

import pandas as pd

range_date = pd.date_range(start='1/1/2019', end='1/08/2019', freq='min')
# Index into the range to get a single element, then inspect its type
print(type(range_date[110]))

                    

Output
<class 'pandas._libs.tslibs.timestamps.Timestamp'>

Explanation: 

We index into range_date to fetch a single element and then check its type. Each element of the range is a Pandas Timestamp object.
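
Beyond its type, a Timestamp exposes its components as attributes. The sketch below inspects the same element, range_date[110]; the variable name ts is only illustrative.

```python
import pandas as pd

range_date = pd.date_range(start="2019-01-01", end="2019-01-08", freq="min")

# Element 110 is 110 minutes after midnight, i.e. 01:50 on 1 January.
ts = range_date[110]

print(ts)                          # 2019-01-01 01:50:00
print(ts.year, ts.month, ts.day)   # 2019 1 1
print(ts.hour, ts.minute)          # 1 50
print(ts.day_name())               # Tuesday
```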

Create DataFrame with DateTime Index

To create a DataFrame that holds DateTime values, we first create a DateTime range and then pass it to pandas.DataFrame. In this example the timestamps are stored in a column named date; a later example moves them into the index.

Python3

import pandas as pd
import numpy as np

range_date = pd.date_range(start='1/1/2019', end='1/08/2019', freq='min')
# Store the timestamps in a column named 'date'
df = pd.DataFrame(range_date, columns=['date'])
# Add a column of random integers in [0, 100), one per timestamp
df['data'] = np.random.randint(0, 100, size=len(range_date))

print(df.head(10))

                    

Output
                  date  data
0 2019-01-01 00:00:00    49
1 2019-01-01 00:01:00    58
2 2019-01-01 00:02:00    48
3 2019-01-01 00:03:00    96
4 2019-01-01 00:04:00    42
5 2019-01-01 00:05:00     8
6 2019-01-01 00:06:00    20
7 2019-01-01 00:07:00    96
8 2019-01-01 00:08:00    48
9 2019-01-01 00:09:00    78

Explanation:

We first create a time series and convert it into a DataFrame, then use NumPy's random integer function to generate one random value per timestamp and store the values in a new column named data. Finally, we print the first ten rows to check the result. Because the values are random, your output will differ.

To do time series manipulation, the DataFrame should be indexed on the timestamp, that is, it should have a DateTime index. In this example the timestamps are still an ordinary column, and the DataFrame keeps its default integer index.
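
One way to get a DateTime index directly, sketched here under the assumption that the timestamps are not needed as a separate column, is to pass the range as the index argument when constructing the DataFrame. The index name datetime is chosen for illustration.

```python
import numpy as np
import pandas as pd

range_date = pd.date_range(start="2019-01-01", end="2019-01-08", freq="min")

# Pass the range as the index instead of storing it in a column.
df = pd.DataFrame(
    {"data": np.random.randint(0, 100, size=len(range_date))},
    index=range_date,
)
df.index.name = "datetime"

print(type(df.index).__name__)   # DatetimeIndex
print(df.shape)                  # (10081, 1)
print(df.head())
```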

Convert DateTime elements to String format

The example below demonstrates how to convert the elements of a DateTime range to string format.

Python3

import pandas as pd

range_date = pd.date_range(start='1/1/2019', end='1/08/2019', freq='min')

# Convert every Timestamp in the range to its string representation
string_data = [str(x) for x in range_date]
# Print elements 1 through 10 (the slice skips element 0)
print(string_data[1:11])

                    

Output:

['2019-01-01 00:01:00', '2019-01-01 00:02:00', '2019-01-01 00:03:00', '2019-01-01 00:04:00', '2019-01-01 00:05:00', '2019-01-01 00:06:00', '2019-01-01 00:07:00', '2019-01-01 00:08:00', '2019-01-01 00:09:00', '2019-01-01 00:10:00'] 

Explanation: 

This code converts each element of range_date to a string. Because the list is long, we slice it and print only ten values from string_data: the elements at positions 1 through 10, which skips the first element at position 0.

The list comprehension visits every value in range_date. When using date_range, we must give Pandas enough information to define the range, such as a start date, an end date, and a frequency; a periods count can be supplied in place of one of the endpoints.
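
The reverse direction, converting strings to timestamps, is listed among the objectives. A minimal sketch of both directions follows, assuming the strings use the same layout as the output above; the format strings and variable names are illustrative choices.

```python
import pandas as pd

range_date = pd.date_range(start="2019-01-01", end="2019-01-08", freq="min")

# Vectorised formatting: strftime returns an Index of strings.
formatted = range_date.strftime("%d/%m/%Y %H:%M")
print(formatted[1])                 # 01/01/2019 00:01

# Parse strings back into timestamps with an explicit format.
strings = ["2019-01-01 00:01:00", "2019-01-01 00:02:00"]
parsed = pd.to_datetime(strings, format="%Y-%m-%d %H:%M:%S")
print(parsed[0] == range_date[1])   # True
```

Giving an explicit format avoids ambiguity between day-first and month-first strings such as 1/08/2019.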

Accessing Specific DateTime Element

The example below demonstrates how to access the rows for a specific date once the timestamps have been set as the index of the DataFrame.

Python3

import pandas as pd
import numpy as np

range_data = pd.date_range(start='1/1/2019', end='1/08/2019', freq='min')
df = pd.DataFrame(range_data, columns=['date'])
df['data'] = np.random.randint(0, 100, size=len(range_data))

# Move the timestamps into the index so rows can be selected by date
df['datetime'] = pd.to_datetime(df['date'])
df = df.set_index('datetime')
df.drop(['date'], axis=1, inplace=True)

# Select all rows for 5 January 2019 with .loc, then take rows 1 through 10
print(df.loc['2019-01-05'][1:11])

                    

Output
                     data
datetime                 
2019-01-05 00:01:00    99
2019-01-05 00:02:00    21
2019-01-05 00:03:00    29
2019-01-05 00:04:00    98
2019-01-05 00:05:00     0
2019-01-05 00:06:00    72
2019-01-05 00:07:00    69
2019-01-05 00:08:00    53
2019-01-05 00:09:00     3
2019-01-05 00:10:00    37
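
The objectives listed earlier also mention slicing by timestamp, resampling, and missing data. The sketch below touches each one briefly on a DataFrame with a DateTime index. It uses deterministic values (0, 1, 2, ...) instead of random ones so the results are easy to check, and forward filling is just one assumed strategy for handling gaps.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2019-01-01", end="2019-01-08", freq="min")
# Deterministic values (0, 1, 2, ...) so the results are easy to check.
df = pd.DataFrame({"data": np.arange(len(idx), dtype=float)}, index=idx)

# Slice between two timestamps; with .loc both endpoints are included.
window = df.loc["2019-01-05 00:00":"2019-01-05 00:10"]
print(len(window))             # 11

# Resample the per-minute data to hourly means.
hourly = df.resample("h").mean()
print(hourly["data"].iloc[0])  # 29.5

# Introduce a gap, then fill it by carrying the last valid value forward.
df.loc["2019-01-01 00:02":"2019-01-01 00:03", "data"] = np.nan
filled = df["data"].ffill()
print(filled.iloc[3])          # 1.0
```

The first hourly mean is 29.5 because the first hour holds the values 0 through 59.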

Conclusion

Time series manipulation is an important aspect of data analysis, and the Pandas library in Python provides useful functions and data types for this task.

In this tutorial, we have explained time series manipulation using Pandas: creating a DateTime range, inspecting its elements, building a DataFrame from it, converting timestamps to strings, and selecting rows by date. Practice these operations with the code examples above to reinforce what you have learned.



Last Updated : 02 Feb, 2024