Skip to content
Related Articles

Related Articles

Improve Article
Parsing XML with DOM APIs in Python
  • Last Updated : 10 May, 2020

The Document Object Model (DOM) is a programming interface for HTML and XML(Extensible markup language) documents. It defines the logical structure of documents and the way a document is accessed and manipulated.

Parsing XML with DOM APIs in python is pretty simple. For the purpose of example we will create a sample XML document (sample.xml) as below:




<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>

Now, let’s parse the above XML using python. The below code demonstrates the process,




from xml.dom import minidom
  
doc = minidom.parse("sample.xml")
  
# doc.getElementsByTagName returns the NodeList
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)
  
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
        staff_id = staff.getAttribute("id")
        name = staff.getElementsByTagName("name")[0]
        salary = staff.getElementsByTagName("salary")[0]
        print("id:% s, name:% s, salary:% s" %
              (staff_id, name.firstChild.data, salary.firstChild.data))

Output:

GeeksForGeeks Company
id:1, name: Amar Pandey, salary:8.5 LPA
id:2, name: Akbar Khan, salary:6.5 LPA
id:3, name: Anthony Walter, salary:3.2 LPA

The same can also be done using a user-defined function as shown in the code below:






from xml.dom import minidom
  
doc = minidom.parse("sample.xml")
  
# user-defined function
def getNodeText(node):
  
    nodelist = node.childNodes
    result = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            result.append(node.data)
    return ''.join(result)
  
name = doc.getElementsByTagName("name")[0]
print("Company Name : % s \n" % getNodeText(name))
  
  
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
        staff_id = staff.getAttribute("id")
        name = staff.getElementsByTagName("name")[0]
        salary = staff.getElementsByTagName("salary")[0]
        print("id:% s, name:% s, salary:% s" %
              (staff_id, getNodeText(name), getNodeText(salary)))

Output:

Company Name : GeeksForGeeks Company 

id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA

 Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.  

To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course. And to begin with your Machine Learning Journey, join the Machine Learning – Basic Level Course




My Personal Notes arrow_drop_up
Recommended Articles
Page :