The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It defines the logical structure of a document as a tree of nodes and the way that document is accessed and manipulated.
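To make the idea of a logical structure concrete, here is a minimal sketch that walks a small DOM tree and lists its element and text nodes. The inline XML string and the helper name `describe` are illustrative assumptions, not part of the original example; the sketch relies only on `minidom.parseString` and the standard node attributes `nodeType`, `tagName`, `data`, and `childNodes`.

```python
from xml.dom import minidom

# Hypothetical, trimmed-down document used only for this illustration.
XML_TEXT = '<company><name>Acme</name><staff id="1"><name>Ann</name></staff></company>'


def describe(node, depth=0):
    """Return indented lines describing node and all of its descendants."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + "element: " + node.tagName)
    elif node.nodeType == node.TEXT_NODE:
        lines.append("  " * depth + "text: " + repr(node.data))
    # Attributes such as id="1" are not child nodes, so they do not appear here.
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines


doc = minidom.parseString(XML_TEXT)
tree_lines = describe(doc.documentElement)
print("\n".join(tree_lines))
```

Each element's text lives in a separate text node one level below the element, which is why text is read through a child node rather than from the element itself.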
Parsing XML with the DOM API in Python is straightforward using the xml.dom.minidom module from the standard library. As an example, create a sample XML document (sample.xml) with the following content:
<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>
Now, let's parse the above XML using Python. The code below prints the company name, then the id, name, and salary of each staff member:
from xml.dom import minidom

# Parse the file into a DOM document
doc = minidom.parse("sample.xml")

# The first <name> element in document order is the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Each <staff> element carries an id attribute plus <name> and <salary> children
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, name.firstChild.data, salary.firstChild.data))
Output:
GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
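To try the same lookups without creating sample.xml on disk, the XML can be passed as a string to `minidom.parseString`. The sketch below is one possible variation, not the article's original code: the names `SAMPLE_XML` and `records` are assumptions introduced here, and the staff data is collected into a list of dictionaries instead of being printed directly.

```python
from xml.dom import minidom

# Inline copy of the sample document, so no file is needed.
SAMPLE_XML = """<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>"""

doc = minidom.parseString(SAMPLE_XML)

# Build one dictionary per <staff> element.
records = []
for staff in doc.getElementsByTagName("staff"):
    records.append({
        "id": staff.getAttribute("id"),
        "name": staff.getElementsByTagName("name")[0].firstChild.data,
        "salary": staff.getElementsByTagName("salary")[0].firstChild.data,
    })

for record in records:
    print(record)
```

Note that `getAttribute` returns the id as a string, so `"1"` would need an explicit `int()` conversion if numeric ids are required.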
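The text of an element can also be extracted with a user-defined function that joins the data of all its child text nodes, instead of reading firstChild.data directly, as shown in the code below: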
from xml.dom import minidom

doc = minidom.parse("sample.xml")

def getNodeText(node):
    # Join the data of every text node that is a direct child of node
    result = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return ''.join(result)

name = doc.getElementsByTagName("name")[0]
print("Company Name : %s \n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s" %
          (staff_id, getNodeText(name), getNodeText(salary)))
Output:
Company Name : GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
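For the sample document both approaches print the same values, but a helper that joins all child text nodes behaves differently on less tidy input. The sketch below uses a hypothetical staff record (not part of sample.xml) in which a comment splits the name into two text nodes and the salary element is empty; the variable names are illustrative.

```python
from xml.dom import minidom


def getNodeText(node):
    """Join the data of every text node that is a direct child of node."""
    return "".join(
        child.data for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


# Hypothetical record: a comment inside <name>, and an empty <salary/>.
doc = minidom.parseString(
    "<staff><name>Amar <!-- middle name omitted -->Pandey</name><salary/></staff>"
)
name = doc.getElementsByTagName("name")[0]
salary = doc.getElementsByTagName("salary")[0]

first_only = name.firstChild.data   # only the text before the comment
full_name = getNodeText(name)       # both text nodes, comment skipped
salary_text = getNodeText(salary)   # empty string, no exception

print(repr(first_only))
print(repr(full_name))
print(repr(salary_text))
print(salary.firstChild)            # None: .data on it would raise AttributeError
```

In this case `firstChild.data` returns only `'Amar '`, and calling it on the empty salary element would fail because `firstChild` is `None`, while the helper returns the full name and an empty string respectively.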
Last Updated: 10 May, 2020