The Document Object Model (DOM) is a programming interface for HTML and XML (Extensible Markup Language) documents. It represents a document as a tree of nodes and defines how that tree can be accessed and manipulated.
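The tree structure is easiest to see on a small example. The sketch below parses a hypothetical XML string with `xml.dom.minidom.parseString()` and then walks the root element's children. It also modifies the tree by adding an attribute. The company name "Acme" and the `country` attribute are invented for the illustration.

```python
from xml.dom import minidom

# Hypothetical XML snippet, parsed from a string rather than a file
xml_text = "<company><name>Acme</name><staff id='1'/></company>"
doc = minidom.parseString(xml_text)

root = doc.documentElement  # the outermost element, <company>
print(root.tagName)  # company

# Each element's children are nodes in the tree; keep only element nodes
child_tags = [c.tagName for c in root.childNodes if c.nodeType == c.ELEMENT_NODE]
print(child_tags)  # ['name', 'staff']

# The tree can also be changed in place (hypothetical attribute)
root.setAttribute("country", "IN")
print(root.toxml())
```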
Python's standard library includes `xml.dom.minidom`, which makes parsing XML with the DOM API straightforward. As an example, we will use the following sample XML document (sample.xml):
<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>
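To run the examples in this article, you first need sample.xml in the working directory. The sketch below is one way to create it from Python. It writes the sample document to disk and then checks that the file parses and contains three `<staff>` elements. Note that the XML declaration must be the very first thing in the file, so the string starts immediately after the opening quotes.

```python
from xml.dom import minidom

# Contents of sample.xml; the declaration must start at the first character
SAMPLE_XML = """<?xml version="1.0"?>
<company>
    <name>GeeksForGeeks Company</name>
    <staff id="1">
        <name>Amar Pandey</name>
        <salary>8.5 LPA</salary>
    </staff>
    <staff id="2">
        <name>Akbhar Khan</name>
        <salary>6.5 LPA</salary>
    </staff>
    <staff id="3">
        <name>Anthony Walter</name>
        <salary>3.2 LPA</salary>
    </staff>
</company>
"""

with open("sample.xml", "w", encoding="utf-8") as f:
    f.write(SAMPLE_XML)

# Sanity check: the file parses and holds three <staff> elements
doc = minidom.parse("sample.xml")
print(len(doc.getElementsByTagName("staff")))  # 3
```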
Now let's parse this XML with Python. The code below prints the company name, followed by the id, name, and salary of each staff member:
from xml.dom import minidom

# Parse the XML file into a DOM Document object
doc = minidom.parse("sample.xml")

# getElementsByTagName() returns a NodeList of matching elements in document
# order; the first <name> element is the company name
name = doc.getElementsByTagName("name")[0]
print(name.firstChild.data)

# Loop over every <staff> element and read its attribute and child elements
staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s"
          % (staff_id, name.firstChild.data, salary.firstChild.data))
Output:
GeeksForGeeks Company
id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
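In the code above, `doc.getElementsByTagName("name")[0]` gives the company name only because that `<name>` comes first in document order. The method searches all descendants, so it also returns the `<name>` elements nested inside each `<staff>`. The sketch below shows this on a hypothetical, trimmed-down document. It also shows one way to select only the root's direct `<name>` child.

```python
from xml.dom import minidom

# Hypothetical document with one company <name> and two nested staff <name>s
doc = minidom.parseString(
    "<company><name>ACME</name>"
    "<staff id='1'><name>A</name></staff>"
    "<staff id='2'><name>B</name></staff></company>"
)

# getElementsByTagName() searches all descendants in document order
all_names = doc.getElementsByTagName("name")
print(len(all_names))  # 3

# To get only the company's own <name>, filter the root's direct children
root = doc.documentElement
company_names = [n for n in root.childNodes
                 if n.nodeType == n.ELEMENT_NODE and n.tagName == "name"]
print(company_names[0].firstChild.data)  # ACME
```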
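The same output can also be produced with a user-defined helper, getNodeText(), which joins the data of every text-node child of an element. Unlike name.firstChild.data, it returns an empty string instead of raising an AttributeError when an element has no children: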
from xml.dom import minidom

doc = minidom.parse("sample.xml")

# User-defined helper: join the text of all direct text-node children
def getNodeText(node):
    result = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            result.append(child.data)
    return "".join(result)

name = doc.getElementsByTagName("name")[0]
print("Company Name : %s\n" % getNodeText(name))

staffs = doc.getElementsByTagName("staff")
for staff in staffs:
    staff_id = staff.getAttribute("id")
    name = staff.getElementsByTagName("name")[0]
    salary = staff.getElementsByTagName("salary")[0]
    print("id:%s, name:%s, salary:%s"
          % (staff_id, getNodeText(name), getNodeText(salary)))
Output:
Company Name : GeeksForGeeks Company

id:1, name:Amar Pandey, salary:8.5 LPA
id:2, name:Akbhar Khan, salary:6.5 LPA
id:3, name:Anthony Walter, salary:3.2 LPA
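The difference between getNodeText() and firstChild.data only shows up on less tidy input. The sketch below uses a hypothetical `<staff>` record with an empty `<salary/>` element. It also appends a second text node to `<name>` with `createTextNode()`. The helper is the same one defined in the previous example, written here as a one-line join. The record values ("New Hire", " (intern)", id 4) are made up for the illustration.

```python
from xml.dom import minidom

def getNodeText(node):
    # Same logic as the helper above: join all direct text-node children
    return "".join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)

# Hypothetical record with an empty <salary/> element
doc = minidom.parseString("<staff id='4'><name>New Hire</name><salary/></staff>")

salary = doc.getElementsByTagName("salary")[0]
print(salary.firstChild)          # None, so salary.firstChild.data would fail
print(repr(getNodeText(salary)))  # ''

# Append a second text node; firstChild.data sees only the first one
name = doc.getElementsByTagName("name")[0]
name.appendChild(doc.createTextNode(" (intern)"))
print(name.firstChild.data)  # New Hire
print(getNodeText(name))     # New Hire (intern)
```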