Modify XML files with Python



Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The design goals of XML focus on simplicity, generality, and usability across the Internet. It is a textual data format with strong support via Unicode for different human languages. Although the design of XML focuses on documents, the language is widely used to represent arbitrary data structures such as those used in web services.

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. To parse, search, or modify an XML file we use the xml.etree.ElementTree module. It has two main classes. ElementTree represents the whole XML document as a tree. Element represents a single node in this tree. Reading and writing the whole document are done at the ElementTree level. Interactions with a single XML element and its sub-elements are done at the Element level.
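The two levels can be sketched as follows. The sample XML and the variable names are illustrative assumptions and are not taken from the article's files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only for illustration.
root = ET.fromstring("<library><book>Dune</book></library>")

# ElementTree wraps the whole document; Element is a single node in it.
tree = ET.ElementTree(root)

print(type(tree).__name__)      # ElementTree
print(type(root).__name__)      # Element
print(tree.getroot() is root)   # True
```

Here fromstring returns an Element directly, so the ElementTree wrapper is only needed when the document has to be handled as a whole, for example to write it to a file.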

Properties of Element:

Property          Description
Tag               String identifying what kind of data the element represents. Can be accessed using elementname.tag.
Attributes        Any number of attributes, stored as a Python dictionary. Can be accessed using elementname.attrib.
Text string       String information regarding the element. Can be accessed using elementname.text.
Child string      Optional child elements string information.
Child elements    Any number of child elements under a particular element. len(elementname) gives their count.
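A short sketch of reading these properties from one element. The city document below is a hypothetical example made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document, used only for illustration.
city = ET.fromstring('<city name="Pune" code="PN">Hello<park/><lake/></city>')

print(city.tag)      # city
print(city.attrib)   # {'name': 'Pune', 'code': 'PN'}
print(city.text)     # Hello
print(len(city))     # 2

# Child elements can be visited by iterating over the element itself.
for child in city:
    print(child.tag)
```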

PARSING:



We can parse XML data from a string or from an XML document. In the following, xml.etree.ElementTree is imported as ET.
1. ET.parse('filename').getroot() - ET.parse('filename') creates a tree, and .getroot() then extracts its root.

2. ET.fromstring(stringname) - Creates a root directly from a string of XML data.

Example 1:
XML document:

<?xml version="1.0"?>
<!--COUNTRIES is the root element--> 
<COUNTRIES>
    <country name="INDIA">
        <neighbor name="Dubai" direction="W"/>
    </country>
    <country name="Singapore">
        <neighbor name="Malaysia" direction="N"/>
    </country>
</COUNTRIES>


Python Code:

# importing the module.
import xml.etree.ElementTree as ET
XMLexample_stored_in_a_string ='''<?xml version ="1.0"?>
<COUNTRIES>
    <country name ="INDIA">
        <neighbor name ="Dubai" direction ="W"/>
    </country>
    <country name ="Singapore">
        <neighbor name ="Malaysia" direction ="N"/>
    </country>
</COUNTRIES>
'''
# parsing from a file; assumes the XML document above
# is saved as xmldocument.xml in the working directory.
tree = ET.parse('xmldocument.xml')
root = tree.getroot()
# parsing from the string.
stringroot = ET.fromstring(XMLexample_stored_in_a_string)
# printing both roots.
print(root)
print(stringroot)


Output:

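Both statements print the COUNTRIES root element, in the form <Element 'COUNTRIES' at 0x...>. The memory address varies between runs.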

Element methods:
1) Element.iter('tag') - Iterates over all elements in the subtree (the element itself, its children, their children, and so on) that have the given tag.
2) Element.findall('tag') - Finds only elements with the given tag that are direct children of the current element.
3) Element.find('tag') - Finds the first direct child with the given tag.
4) Element.get('attrname') - Returns the value of the named attribute of the element.
5) Element.text - Gives the text of the element.
6) Element.attrib - Returns all the attributes present, as a dictionary.
7) Element.tag - Returns the element name.
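The difference between iter, findall, and find can be sketched on a small nested document. The XML below is a hypothetical example made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested document, used only for illustration.
root = ET.fromstring("<a><item id='1'/><group><item id='2'/></group></a>")

# iter() walks the whole subtree, so it sees both item elements.
all_ids = [item.get("id") for item in root.iter("item")]

# findall() only looks at direct children of root.
direct_ids = [item.get("id") for item in root.findall("item")]

# find() returns the first matching direct child, or None if there is none.
missing = root.find("nothing")

print(all_ids)      # ['1', '2']
print(direct_ids)   # ['1']
print(missing)      # None
```

Because find returns None when nothing matches, it is safer to check the result before reading .text or calling .get on it.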

Example 2:

import xml.etree.ElementTree as ET
XMLexample_stored_in_a_string ='''<?xml version ="1.0"?>
<States>
    <state name ="TELANGANA">
        <rank>1</rank>
        <neighbor name ="ANDHRA" language ="Telugu"/>
        <neighbor name ="KARNATAKA" language ="Kannada"/>
    </state>
    <state name ="GUJARAT">
        <rank>2</rank>
        <neighbor name ="RAJASTHAN" direction ="N"/>
        <neighbor name ="MADHYA PRADESH" direction ="E"/>
    </state>
    <state name ="KERALA">
        <rank>3</rank>
        <neighbor name ="TAMILNADU" direction ="S" language ="Tamil"/>
    </state>
</States>
'''
# parsing from the string.
root = ET.fromstring(XMLexample_stored_in_a_string)
# printing the attributes of every 'neighbor' tag in the tree.
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
# finding each state tag, its name attribute and its rank child.
for state in root.findall('state'):
    rank = state.find('rank').text
    name = state.get('name')
    print(name, rank)


Output:

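{'name': 'ANDHRA', 'language': 'Telugu'}
{'name': 'KARNATAKA', 'language': 'Kannada'}
{'name': 'RAJASTHAN', 'direction': 'N'}
{'name': 'MADHYA PRADESH', 'direction': 'E'}
{'name': 'TAMILNADU', 'direction': 'S', 'language': 'Tamil'}
TELANGANA 1
GUJARAT 2
KERALA 3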

MODIFYING:
An XML document can also be modified, mostly through Element methods.
Methods:
1) Element.set('attrname', 'value') - Sets or modifies an element attribute.
2) ET.SubElement(parent, 'new_childtag') - Creates a new child tag under the parent. SubElement is a function of the module, not a method of Element.
3) ElementTree.write('filename.xml') - Writes the XML tree to a file. write belongs to ElementTree, not to Element.
4) Element.attrib.pop('attrname') - Deletes a particular attribute. attrib is a dictionary, so pop works on it.
5) Element.remove(child) - Deletes a complete child tag.
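The operations above can be sketched on an in-memory tree, with no file needed. The menu document and the names first and note are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory document, used only for illustration.
root = ET.fromstring(
    '<menu><food id="1"><price>5</price></food><food id="2"/></menu>'
)

first = root[0]
first.set("special", "yes")            # add or overwrite an attribute
note = ET.SubElement(first, "note")    # append a new child element
note.text = "fresh"
first.attrib.pop("id")                 # delete an attribute
root.remove(root[1])                   # delete the second food element

# Serialize to a string to inspect the result.
print(ET.tostring(root, encoding="unicode"))
# <menu><food special="yes"><price>5</price><note>fresh</note></food></menu>
```

To save the result instead of printing it, the root can be wrapped as ET.ElementTree(root) and written with its write method.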

Example 3:
XML Document:

<?xml version="1.0"?>
<breakfast_menu>
    <food>
        <name itemid="11">Belgian Waffles</name>
        <price>5.95</price>
        <description>Two of our famous Belgian Waffles 
with plenty of real maple syrup</description>
        <calories>650</calories>
    </food>
    <food>
        <name itemid="21">Strawberry Belgian Waffles</name>
        <price>7.95</price>
        <description>Light Belgian waffles covered 
with strawberries and whipped cream</description>
        <calories>900</calories>
    </food>
    <food>
        <name itemid="31">Berry-Berry Belgian Waffles</name>
        <price>8.95</price>
        <description>Light Belgian waffles covered with 
an assortment of fresh berries and whipped cream</description>
        <calories>900</calories>
    </food>
    <food>
        <name itemid="41">French Toast</name>
        <price>4.50</price>
        <description>Thick slices made from our 
homemade sourdough bread</description>
        <calories>600</calories>
    </food>
</breakfast_menu>


Python Code:

import xml.etree.ElementTree as ET

# parsing the XML document shown above.
mytree = ET.parse('xmldocument.xml.txt')
myroot = mytree.getroot()

# iterating through the price values.
for prices in myroot.iter('price'):
    # updates the price value.
    prices.text = str(float(prices.text) + 10)
    # creates a new attribute.
    prices.set('newprices', 'yes')

# creating a new tag under the parent.
# myroot[0] here is the first food tag.
ET.SubElement(myroot[0], 'tasty')
for temp in myroot.iter('tasty'):
    # giving the value as YES.
    temp.text = 'YES'

# deleting attributes in the xml.
# attrib returns a dictionary, so pop can be used.
# removes the itemid attribute in the name tag of
# the second food tag.
myroot[1][0].attrib.pop('itemid')

# to remove a tag completely we use the remove function.
# completely removes the third food tag.
myroot.remove(myroot[2])

# writing the modified tree to a new file.
mytree.write('output.xml')


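Output:

The file output.xml contains the modified menu: every price is raised by 10 and carries the attribute newprices="yes", the first food tag has a new <tasty>YES</tasty> child, the name tag of the second food tag no longer has an itemid attribute, and the third food tag (Berry-Berry Belgian Waffles) is removed.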