Open In App

How to create Pandas DataFrame from nested XML?

In this article, we will learn how to create Pandas DataFrame from nested XML. We will use the xml.etree.ElementTree module, which is a built-in module in Python for parsing or reading information from the XML file. The ElementTree represents the XML document as a tree and the Element represents only a single node of the tree.

Functions Used:

Here, we will use some functions to process out code which is stated below:



Approach

Input Nested XML Data




<?xml version="1.0" encoding="UTF-8"?>
  
       <Food>
  
           <Info>
           <Msg>Food Store items.</Msg>
           </Info>
  
           <store slNo="1">
               <foodItem>meat</foodItem>
               <price>200</price>
               <quantity>1kg</quantity>
               <discount>7%</discount>
           </store>
  
           <store slNo="2">
               <foodItem>fish</foodItem>
               <price>150</price>
               <quantity>1kg</quantity>
               <discount>5%</discount>
           </store>
  
           <store slNo="3">
               <foodItem>egg</foodItem>
               <price>100</price>
               <quantity>50 pieces</quantity>
               <discount>5%</discount>
           </store>
  
           <store slNo="4">
               <foodItem>milk</foodItem>
               <price>50</price>
               <quantity>1 litre</quantity>
               <discount>3%</discount>
           </store>
  
       </Food>

Example 1:

In this code below we have parsed the XML file. Give the complete path where you have saved the XML file within quotes. So here we need to use ElementTree.parse() function to read the data from the XML file and then the getroot() function to get the root. Then follow the steps given.






import xml.etree.ElementTree as ETree
import pandas as pd
  
# give the path where you saved the xml file
# inside the quotes
xmldata = "C: \\ProgramData\\Microsoft\\
    Windows\\Start Menu\\Programs\\
    Anaconda3(64-bit)\\xmltopandas.xml"
prstree = ETree.parse(xmldata)
root = prstree.getroot()
  
# print(root)
store_items = []
all_items = []
  
for storeno in root.iter('store'):
    
    store_Nr = storeno.attrib.get('slNo')
    itemsF = storeno.find('foodItem').text
    price = storeno.find('price').text
    quan = storeno.find('quantity').text
    dis = storeno.find('discount').text
  
    store_items = [store_Nr, itemsF, price, quan, dis]
    all_items.append(store_items)
  
xmlToDf = pd.DataFrame(all_items, columns=[
  'SL No', 'ITEM_NUMBER', 'PRICE', 'QUANTITY', 'DISCOUNT'])
  
print(xmlToDf.to_string(index=False))

Output:

Note: The XML file should be saved in the same directory or folder where your Python code saved.

Example 2:

We can also pass the XML content as a string inside triple quotes. In that case, we need to use the fromstring() function to read the string. Get the root using the ‘tag’ object and follow the same steps to convert it to a DataFrame as mentioned above.




import xml.etree.ElementTree as ETree
import pandas as pd
  
xmldata = '''<?xml version="1.0" encoding="UTF-8"?>
        <Food>
            <Info>
            <Msg>Food Store items.</Msg>
            </Info>
            <store slNo="1">
                <foodItem>meat</foodItem>
                <price>200</price>
                <quantity>1kg</quantity>
                <discount>7%</discount>
            </store>
            <store slNo="2">
                <foodItem>fish</foodItem>
                <price>150</price>
                <quantity>1kg</quantity>
                <discount>5%</discount>
            </store>
            <store slNo="3">
                <foodItem>egg</foodItem>
                <price>100</price>
                <quantity>50 pieces</quantity>
                <discount>5%</discount>
            </store>
            <store slNo="4">
                <foodItem>milk</foodItem>
                <price>50</price>
                <quantity>1 litre</quantity>
                <discount>3%</discount>
            </store>
        </Food>
'''
  
prstree = ETree.fromstring(xmldata)
root = prstree.tag
  
#print(root)
store_items = []
all_items = []
  
for storeno in prstree.findall('store'):  
    store_Nr = storeno.attrib.get('slNo')
    itemsF= storeno.find('foodItem').text
    price= storeno.find('price').text
    quan= storeno.find('quantity').text
    dis= storeno.find('discount').text
          
    store_items = [store_Nr,itemsF,price,quan,dis]
    all_items.append(store_items)
  
xmlToDf = pd.DataFrame(all_items,columns=[
  'SL No','ITEM_NUMBER','PRICE','QUANTITY','DISCOUNT'])        
  
print(xmlToDf.to_string(index=False))

Output:


Article Tags :