How to Parse and Modify XML in Python?

XML stands for Extensible Markup Language. It was designed to store and transport data, and to be both human- and machine-readable. Accordingly, its design goals emphasize simplicity, generality, and usability across the Internet.

Note: For more information, refer to XML | Basics

Here we assume that the XML file is already saved on your machine. Please read the comments in the code for a clear understanding.

XML File:

[Image: python-parse-xml]



Let us save the above XML file as “test.xml”. Before going further, note that XML has no predefined tags, unlike HTML. The author of an XML document defines both the tags and the document structure. Now we need to parse this file and modify it using Python. We will use the “minidom” module (xml.dom.minidom) for this task. It is part of the Python 3 standard library, so nothing has to be installed; importing it is enough.
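Because the XML listing itself is not reproduced in the text, the sketch below shows what such a file might contain. The tag names and values are taken from the reading example and its output in this article; the exact layout (a flat list of tags under a “note” root) is an assumption, not the original file.

```python
import xml.dom.minidom as md  # shipped with the Python standard library

# Hypothetical contents for "test.xml". The layout is an assumption;
# only the tag names and values come from the article's example output.
SAMPLE_XML = """<?xml version="1.0" ?>
<note>
    <fname>Jack</fname>
    <lname>Shelby</lname>
    <favgame>Football</favgame>
    <player name="Messi"/>
    <player name="Ronaldo"/>
    <player name="Mbappe"/>
</note>
"""

# parseString() parses XML held in a string instead of a file.
doc = md.parseString(SAMPLE_XML)
root = doc.documentElement

# Keep only element children; the whitespace between tags shows up
# as text nodes, which are skipped here.
child_tags = [n.tagName for n in root.childNodes
              if n.nodeType == n.ELEMENT_NODE]
first_name = doc.getElementsByTagName("fname")[0].firstChild.nodeValue

print(root.tagName)
print(child_tags)
print(first_name)
```

Saving the string above as “test.xml” should be enough to try the file-based examples in this article, under the same layout assumption.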

Reading XML

First we will be reading the contents of the XML file and then we will learn how to modify the XML file.
Example

import xml.dom.minidom as md


def main():

    # Parse the XML file and store the resulting
    # document in the "file" object. Pass the path
    # of your XML file to the parse() function.
    file = md.parse("test.xml")

    # nodeName returns the type of the node
    # (for the document object it is "#document").
    print(file.nodeName)

    # firstChild.tagName returns the name of the
    # first tag. Here it is "note".
    print(file.firstChild.tagName)

    firstname = file.getElementsByTagName("fname")

    # Print the first name.
    print("Name: " + firstname[0].firstChild.nodeValue)

    lastname = file.getElementsByTagName("lname")

    # Print the last name.
    print("Surname: " + lastname[0].firstChild.nodeValue)

    favgame = file.getElementsByTagName("favgame")

    # Print the favourite game.
    print("Favourite Game: " + favgame[0].firstChild.nodeValue)

    # Print the value of an attribute (here the
    # "name" attribute of every "player" tag).
    players = file.getElementsByTagName("player")

    for player in players:
        print(player.getAttribute("name"))


if __name__ == "__main__":
    main()

Output:

#document
note
Name: Jack
Surname: Shelby
Favourite Game: Football
Messi
Ronaldo
Mbappe
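One caveat about file.firstChild.tagName: it only works when the root element is the very first node of the document. The sketch below uses a hypothetical document that starts with a comment, to show how firstChild and documentElement can differ. The sample XML is an assumption made for illustration.

```python
import xml.dom.minidom as md

# Hypothetical document: a comment comes before the root element.
xml_text = "<!-- player profile --><note><fname>Jack</fname></note>"
doc = md.parseString(xml_text)

# firstChild is the comment node here, not the root element.
first_node_name = doc.firstChild.nodeName

# documentElement refers to the root element itself.
root_name = doc.documentElement.tagName

print(first_node_name)
print(root_name)
```

When the shape of the input file is not fully under your control, documentElement is probably the safer way to reach the root tag.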

In the above Python code, we used firstname[0] and lastname[0] to print the first name and last name. This is because getElementsByTagName() returns a list of matching elements, and here there is only one “fname” and one “lname” tag. When the same tag occurs several times, we can loop over the list as shown below.
XML:

[Image: python-parse-xml-2]

Python

import xml.dom.minidom as md


def main():

    file = md.parse("test.xml")
    names = file.getElementsByTagName("fname")

    # Print the text of every "fname" tag.
    for name in names:
        print(name.firstChild.nodeValue)


if __name__ == "__main__":
    main()

Output:

Jack
John
Harry
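The loop above assumes that every “fname” tag contains text. A tag without content has no child node, so firstChild is None and reading nodeValue from it raises an error. A minimal sketch of a more defensive loop follows; the helper name texts_of and the sample XML are hypothetical and not part of minidom.

```python
import xml.dom.minidom as md


def texts_of(doc, tag):
    """Return the text of every <tag> element, using "" for empty ones."""
    values = []
    for element in doc.getElementsByTagName(tag):
        node = element.firstChild
        # An empty element such as <fname/> has no child node at all.
        if node is not None and node.nodeType == node.TEXT_NODE:
            values.append(node.nodeValue)
        else:
            values.append("")
    return values


# Hypothetical sample with two filled tags and one empty tag.
xml_text = "<note><fname>Jack</fname><fname>John</fname><fname/></note>"
doc = md.parseString(xml_text)

names = texts_of(doc, "fname")
missing = texts_of(doc, "lname")  # no such tag in the sample

print(names)
print(missing)
```

A tag name that does not occur simply gives an empty list, so the caller can check the length before indexing with [0].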

Modifying XML

We now have a basic idea of how to parse and read the contents of an XML file using Python. Next, let us learn how to modify an XML file.
XML File:

[Image: python-parse-xml-3]

Let us add the following:

  • Height
  • Languages known by Jack

We will also delete the “hobby” tag and change the age to 29.
Python Code (Modifying XML):

import xml.dom.minidom as md


def main():

    file = md.parse("test.xml")

    height = file.createElement("height")

    # Store the height (180 cm) in the "val" attribute.
    height.setAttribute("val", "180 cm")

    # Add the "height" tag under the root tag
    # of the "file" object.
    file.firstChild.appendChild(height)

    lan = ["English", "Spanish", "French"]

    # Create a separate "lang" tag for each
    # language and add it to the "file" object.
    for language in lan:
        lang = file.createElement("lang")
        lang.setAttribute("lng", language)
        file.firstChild.appendChild(lang)

    delete = file.getElementsByTagName("hobby")

    # Delete all occurrences of a particular
    # tag (here "hobby").
    for node in delete:
        parent = node.parentNode
        parent.removeChild(node)

    # Modify the value of a tag (here "age").
    file.getElementsByTagName("age")[0].childNodes[0].nodeValue = "29"

    # Write the changes in the "file" object to
    # the "test.xml" file. The with block closes
    # the file, so no explicit close() is needed.
    xml_text = file.toxml()
    with open("test.xml", "w", encoding="utf-8") as fs:
        fs.write(xml_text)


if __name__ == "__main__":
    main()

Output:

[Image: python-parse-xml-4]
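The code above stores the height in an attribute. If you would rather keep the value as the text of the tag, createTextNode() can be used instead. The following is a minimal in-memory sketch of the same three operations (add, delete, modify). The sample XML, including the original age and hobby values, is hypothetical.

```python
import xml.dom.minidom as md

# Hypothetical input; the age and hobby values are made up.
xml_text = "<note><fname>Jack</fname><age>28</age><hobby>Chess</hobby></note>"
doc = md.parseString(xml_text)
root = doc.documentElement

# Add <height>180 cm</height>, storing the value as text content.
height = doc.createElement("height")
height.appendChild(doc.createTextNode("180 cm"))
root.appendChild(height)

# Delete every "hobby" tag through its parent node.
for hobby in doc.getElementsByTagName("hobby"):
    hobby.parentNode.removeChild(hobby)

# Change the text of the first "age" tag.
doc.getElementsByTagName("age")[0].firstChild.nodeValue = "29"

# Serialize only the root element (no XML declaration).
result = root.toxml()
print(result)
```

Whether an attribute or text content fits better depends on how the file is consumed; both forms are valid XML.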

The last 3 lines of main() simply convert the “file” object into XML text using the toxml() method and write it to the “test.xml” file. If you do not want to overwrite the original file and only want to print the modified XML, replace those 3 lines with:

print(file.toxml())
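toxml() returns the document as a compact string. For output that is easier to read, minidom also provides toprettyxml(), which adds indentation and line breaks. A short sketch, assuming a small hypothetical document that has no whitespace between its tags:

```python
import xml.dom.minidom as md

# Hypothetical document written without whitespace between tags.
doc = md.parseString('<note><fname>Jack</fname><height val="180 cm"/></note>')

# indent is the string used for each nesting level.
pretty = doc.toprettyxml(indent="  ")
print(pretty)
```

If the parsed file already contains line breaks and indentation, those are kept as text nodes, so toprettyxml() may produce extra blank lines.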


