Parsing XML with DOM APIs in Python

Last Updated : 10 May, 2020

The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It defines the logical structure of a document and the way that document is accessed and manipulated.
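To make the idea of a "logical structure" concrete, here is a minimal sketch that walks a DOM tree and lists its elements by nesting depth. The inline XML string and the helper name element_outline are made up for this illustration; only minidom.parseString, childNodes, nodeType and tagName come from the standard library.

```python
from xml.dom import minidom

# A tiny inline document, invented for this illustration.
XML_TEXT = '<company><name>Example</name><staff id="1"><name>A</name></staff></company>'


def element_outline(node, depth=0):
    """Return one indented line per element node, in document order.

    Hypothetical helper: not part of xml.dom.minidom.
    """
    lines = []
    for child in node.childNodes:
        # Skip text nodes, comments, etc.; keep only elements.
        if child.nodeType == child.ELEMENT_NODE:
            lines.append("  " * depth + child.tagName)
            lines.extend(element_outline(child, depth + 1))
    return lines


doc = minidom.parseString(XML_TEXT)
outline = element_outline(doc)
print("\n".join(outline))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document tree.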

Parsing XML with the DOM API in Python is straightforward using the xml.dom.minidom module from the standard library. As an example, we will create a sample XML document (sample.xml) as below:

<?xml version="1.0"?> 
<company> 
    <name>GeeksForGeeks Company</name> 
    <staff id="1"> 
        <name>Amar Pandey</name> 
        <salary>8.5 LPA</salary> 
    </staff> 
    <staff id="2"> 
        <name>Akbhar Khan</name> 
        <salary>6.5 LPA</salary> 
    </staff> 
    <staff id="3"> 
        <name>Anthony Walter</name> 
        <salary>3.2 LPA</salary> 
    </staff> 
</company> 

Now, let's parse the above XML using Python. The code below demonstrates the process:

from xml.dom import minidom

# Parse the file into a DOM Document object
doc = minidom.parse("sample.xml")

# getElementsByTagName returns a NodeList of every matching descendant;
# the first <name> in document order is the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Print the id attribute, name and salary of each <staff> element
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, name.firstChild.data, salary.firstChild.data))

Output:

GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
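Note that getElementsByTagName searches all descendants, not just direct children, which is why index [0] on the document picks the company name while the staff names follow it in the same NodeList. The sketch below illustrates this on a shortened inline copy of the sample, parsed with minidom.parseString so that no file is needed. The variable names are chosen for this illustration only.

```python
from xml.dom import minidom

# Shortened inline copy of sample.xml (two staff entries only).
SAMPLE = """<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
</company>"""

doc = minidom.parseString(SAMPLE)

# Every <name> element anywhere in the document, in document order.
all_names = [n.firstChild.data for n in doc.getElementsByTagName("name")]
print(all_names)

# Only the <name> elements that are direct children of the root element.
root = doc.documentElement
direct_names = [
    child.firstChild.data
    for child in root.childNodes
    if child.nodeType == child.ELEMENT_NODE and child.tagName == "name"
]
print(direct_names)
```

Filtering childNodes by hand, as in the second list, is one way to restrict a lookup to a single level when tag names repeat at different depths.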

The same can also be done with a user-defined function that joins the text nodes found directly under an element, as shown in the code below:

from xml.dom import minidom

doc = minidom.parse("sample.xml")

# user-defined function: join the data of all text nodes
# that are direct children of the given node
def getNodeText(node):
    result = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)

name = doc.getElementsByTagName("name")[0]
print("Company Name : %s \n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, getNodeText(name), getNodeText(salary)))

Output:

Company Name : GeeksForGeeks Company 

id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
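One practical reason to prefer such a helper over firstChild.data is that an empty element has no child nodes, so firstChild is None and accessing .data on it would raise an AttributeError. The sketch below shows the difference on a made-up fragment with an empty <salary/> element. get_node_text is a compact variant of the function above, renamed for this illustration.

```python
from xml.dom import minidom


def get_node_text(node):
    """Join the data of all text nodes directly under node (hypothetical helper)."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


# Made-up fragment: <salary/> is empty and therefore has no child nodes.
doc = minidom.parseString("<staff><name>Amar Pandey</name><salary/></staff>")
name = doc.getElementsByTagName("name")[0]
salary = doc.getElementsByTagName("salary")[0]

print(salary.firstChild is None)      # True: salary.firstChild.data would fail
print(repr(get_node_text(salary)))    # '' (empty string instead of an error)
print(get_node_text(name))            # Amar Pandey
```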
