Parsing XML with DOM APIs in Python

The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It defines the logical structure of a document as a tree of nodes, and the way that document is accessed and manipulated.
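
To make the idea of a logical structure concrete, here is a minimal sketch that walks a tiny document node by node. The inline XML string is a made-up illustration, not the sample file used in the rest of this article.

```python
from xml.dom import minidom

# A tiny inline document (hypothetical, for illustration only).
doc = minidom.parseString("<company><name>Acme</name></company>")

root = doc.documentElement   # the <company> element node
child = root.firstChild      # the <name> element node
text = child.firstChild      # the text node inside <name>

print(root.nodeName)                     # company
print(child.nodeName)                    # name
print(text.nodeType == text.TEXT_NODE)   # True
print(text.data)                         # Acme
```

Elements and the text inside them are separate nodes in the tree, which is why the examples later in the article reach the text through an element's children.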

Parsing XML with DOM APIs in Python is straightforward with the standard library's xml.dom.minidom module. As an example, we will create a sample XML document (sample.xml) as below:

<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>


Now, let's parse the above XML using Python. The code below demonstrates the process:

from xml.dom import minidom

# Parse the file into a DOM Document object
doc = minidom.parse("sample.xml")

# getElementsByTagName returns a NodeList of all matching elements
# in document order; the first <name> is the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    # Search only within this <staff> element
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, name.firstChild.data, salary.firstChild.data))


Output:

GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
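
The code above reads firstChild.data directly, which assumes every element has a text node as its first child. The sketch below shows what happens when that assumption does not hold and one cautious way to guard against it. The empty <name> record is hypothetical and does not appear in sample.xml.

```python
from xml.dom import minidom

# Hypothetical record with an empty <name> element (not in sample.xml).
doc = minidom.parseString('<staff id="4"><name></name></staff>')
name = doc.getElementsByTagName("name")[0]

# An empty element has no children, so firstChild is None and
# name.firstChild.data would raise AttributeError.
first = name.firstChild
text = first.data if first is not None else ""
print(repr(text))
```

Whether an empty string is the right fallback depends on the data; raising an error may be preferable when the field is required.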

The same can also be done using a user-defined function that joins the text of all of a node's direct text children, as shown in the code below:



from xml.dom import minidom

doc = minidom.parse("sample.xml")

# User-defined function: return the concatenated text of the
# direct text-node children of the given node
def getNodeText(node):
    result = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)

name = doc.getElementsByTagName("name")[0]
print("Company Name : %s \n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, getNodeText(name), getNodeText(salary)))


Output:

Company Name : GeeksForGeeks Company 

id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
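
As a possible next step, the same helper can be used to collect the staff entries into plain Python dictionaries instead of printing them. This is only a sketch: the function name staff_records and the dictionary keys are choices made for this illustration, and a shortened copy of the sample document is inlined with parseString so the block runs without sample.xml on disk.

```python
from xml.dom import minidom

# Shortened inline copy of the sample document, so no file is needed.
SAMPLE = """<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
</company>"""


def getNodeText(node):
    # Join the data of the node's direct text-node children.
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


def staff_records(doc):
    # Hypothetical helper: build one dict per <staff> element.
    records = []
    for staff in doc.getElementsByTagName("staff"):
        records.append({
            "id": staff.getAttribute("id"),
            "name": getNodeText(staff.getElementsByTagName("name")[0]),
            "salary": getNodeText(staff.getElementsByTagName("salary")[0]),
        })
    return records


doc = minidom.parseString(SAMPLE)
records = staff_records(doc)
for record in records:
    print(record)
```

Note that attribute values and text come back as strings, so the id here is "1" rather than the integer 1.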
