How to add metadata to a DataFrame or Series with Pandas in Python?
  • Last Updated : 11 Dec, 2020

Metadata is data about data. For a pandas object it can include a description of the data, summary statistics, memory usage, and the data type of each column. This article shows how to display the metadata pandas already provides and how to attach metadata of our own.

Scenario:

  • We can view the existing metadata with the info() method
  • We can add metadata to existing data and then view the metadata we created

Steps:

  • Create a data frame
  • View the metadata that already exists
  • Create new metadata and view it

In the examples below, we create a data frame, view its existing metadata, and then attach and view metadata of our own.

Methods for viewing existing metadata:



  • dataframe_name.info() – Prints a summary of the data frame: the column names, non-null counts, data types, and memory usage
  • dataframe_name.columns – An attribute (not a method) that holds an Index with all the column names in the data frame
  • dataframe_name.describe() – Returns descriptive statistics for the numeric columns, such as count, mean, standard deviation, minimum, quartiles (the 50% row is the median), and maximum
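
The same built-in metadata can also be collected programmatically. The following is a minimal sketch, assuming a small hypothetical two-column frame; the `summary` dictionary and its keys are illustrative names, not part of pandas.

```python
import pandas as pd

# Small hypothetical frame, used only for illustration
df = pd.DataFrame({"Name": ["Sravan", "Deepak"], "Age": [22, 32]})

# Collect some of the built-in metadata in a plain dictionary
summary = {
    "shape": df.shape,                       # (rows, columns)
    "columns": list(df.columns),             # column names
    "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
    "memory_bytes": int(df.memory_usage(deep=True).sum()),
}

print(summary)
```

A dictionary like this is convenient when the metadata has to be logged or compared, since info() only prints its report.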

Create Metadata

We can attach metadata to a particular data frame by assigning values to attributes such as scale and offset. These are not built-in pandas methods: they are ordinary Python attributes set on the DataFrame object, and their names are our own choice.

Syntax:

dataframe_name.scale=value

dataframe_name.offset=value
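
As an alternative to plain attribute assignment, pandas also provides a dictionary for user metadata, `DataFrame.attrs`, which the pandas documentation marks as experimental. The sketch below is not part of the original examples; it assumes a small hypothetical frame and reuses the scale and offset values from this article.

```python
import pandas as pd

# Small hypothetical frame, used only for illustration
df = pd.DataFrame({"Age": [22, 32, 45, 37]})

# DataFrame.attrs is a dictionary reserved for user metadata
# (experimental, so its behaviour may change between pandas versions)
df.attrs["scale"] = 0.1
df.attrs["offset"] = 15

print(df.attrs)  # {'scale': 0.1, 'offset': 15}

# attrs is carried over when the frame is copied
copied = df.copy()
print(copied.attrs)
```

Because attrs is a regular dictionary, it can hold any number of named values without adding new attribute names to the DataFrame object.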

Below are some examples which depict how to add metadata to a DataFrame or Series:

Example 1

First, create and display a dataframe.



Python3




# import required modules
import pandas as pd
  
# initialise a dictionary of lists
data = {'Name': ['Sravan', 'Deepak', 'Radha', 'Vani'],
        'College': ['vignan', 'vignan Lara', 'vignan', 'vignan'],
        'Department': ['CSE', 'IT', 'IT', 'CSE'],
        'Profession': ['Student', 'Assistant Professor',
                       'Programmer & ass. Proff',
                       'Programmer & Scholar'],
        'Age': [22, 32, 45, 37]
        }
  
# create dataframe
df = pd.DataFrame(data)
  
# print dataframe
print(df)

Output:

Then check dataframe attributes and description.

Python3




# data information (info() prints its report directly)
df.info()

# column names
print(df.columns)

# descriptive statistics of the numeric columns
print(df.describe())

Output:

Initialize offset and scale of the dataframe.

Python3




# initializing scale and offset
# for creating meta data
df.scale = 0.1
df.offset = 15
  
# display scale and offset
print('Scale:', df.scale)
print('Offset:', df.offset)

Output:
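
Attributes assigned this way belong to one Python object only. The following minimal sketch, which uses a smaller hypothetical frame and a hypothetical `older` variable, illustrates that a DataFrame returned by a filtering operation does not inherit them.

```python
import pandas as pd

# Small hypothetical frame, used only for illustration
df = pd.DataFrame({"Age": [22, 32, 45, 37]})

# Ad hoc attribute stored on this particular object
df.scale = 0.1

# Filtering returns a new DataFrame, which does not carry the attribute
older = df[df["Age"] > 30]

print(hasattr(df, "scale"))     # True
print(hasattr(older, "scale"))  # False
```

This is one reason to keep the metadata in a separate dictionary or to persist it alongside the data.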



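Attributes set on the object are not saved together with the data. To persist the metadata, we store the dataframe in an HDF5 file (this requires the PyTables package), attach the metadata to the stored object, and then read back the dataframe along with its stored metadata.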

Python3




# store in hdf5 file format; the with block closes the file for us
with pd.HDFStore('college_data.hdf5') as storedata:
    # data
    storedata.put('data_01', df)

    # attach the metadata to the stored object
    metadata = {'scale': 0.1, 'offset': 15}
    storedata.get_storer('data_01').attrs.metadata = metadata

# read the data and its metadata back
with pd.HDFStore('college_data.hdf5') as storedata:
    data = storedata['data_01']
    metadata = storedata.get_storer('data_01').attrs.metadata

    # display the store summary while the file is still open
    print('\nStored Data:\n', storedata)

# display data
print('\nDataframe:\n', data)

# display metadata
print('\nMetadata:\n', metadata)

Output:

Example 2

A pandas Series does not offer every method that a DataFrame does (older pandas releases, for example, have no Series.info()). So here we create the metadata directly and display it.

Python3




# import required module
import pandas as pd
  
# initialise a dictionary of lists
data = {'Name': ['Sravan', 'Deepak', 'Radha', 'Vani'],
        'College': ['vignan', 'vignan Lara', 'vignan', 'vignan'],
        'Department': ['CSE', 'IT', 'IT', 'CSE'],
        'Profession': ['Student', 'Assistant Professor',
                       'Programmer & ass. Proff',
                       'Programmer & Scholar'],
        'Age': [22, 32, 45, 37]
        }
  
# Create series
ser = pd.Series(data)
  
# display data
print(ser)

Output:
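
A Series still exposes some metadata of its own through attributes and describe(). The sketch below assumes a small hypothetical numeric Series named `ages`, not the Series built above.

```python
import pandas as pd

# Small hypothetical Series, used only for illustration
ages = pd.Series([22, 32, 45, 37], name="Age")

# Built-in metadata available on a Series
print(ages.name)        # label of the Series
print(ages.dtype)       # data type of the values
print(ages.shape)       # (number of elements,)
print(ages.index)       # index labels

# Descriptive statistics of the values
stats = ages.describe()
print(stats)
```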

Now we will store the metadata and then display it.

Python3




# store in hdf5 file format; the with block closes the file for us
with pd.HDFStore('college_data.hdf5') as storedata:
    # data
    storedata.put('data_01', ser)

    # attach scale and offset as metadata of the stored object
    metadata = {'scale': 0.1, 'offset': 15}
    storedata.get_storer('data_01').attrs.metadata = metadata

# read the data and its metadata back
with pd.HDFStore('college_data.hdf5') as storedata:
    data = storedata['data_01']
    metadata = storedata.get_storer('data_01').attrs.metadata

    # display the store summary while the file is still open
    print('\nStored Data:\n', storedata)

# display data
print('\nData:\n', data)

# display metadata
print('\nMetadata:\n', metadata)

Output:
