Extensible Markup Language, commonly known as XML is a language designed specifically to be easy to interpret by both humans and computers altogether. The language defines a set of rules used to encode a document in a specific format. In this article, methods have been described to read and write
XML files in python.
Note: In general, the process of reading the data from an XML file and analyzing its logical components is known as Parsing. Therefore, when we refer to reading a
xml file we are referring to parsing the XML document.
In this article, we would take a look at two libraries that could be used for the purpose of
xml parsing. They are:
- BeautifulSoup used alongside the lxml xml parser
- Elementtree library.
Using BeautifulSoup alongside with lxml parser
For the purpose of reading and writing the
xml file we would be using a Python library named
BeautifulSoup. In order to install the library, type the following command into the terminal.
pip install beautifulsoup4
Beautiful Soup supports the HTML parser included in Python’s standard library, but it also supports a number of third-party Python parsers. One is the
lxml parser (used for parsing XML/HTML documents).
lxml could be installed by running the following command in the command processor of your Operating system:
pip install lxml
Firstly we will learn how to read from an XML file. We would also parse data stored in it. Later we would learn how to create an XML file and write data to it.
Reading Data From an XML File
Their are two steps required to parse a xml file:-
- Finding Tags
- Extracting from tags
XML File used:
Writing an XML File
xml file is a primitive process, reason for that being the fact that
xml files aren’t encoded in a special way. Modifying sections of a
xml document requires one to parse through it at first. In the below code we would modify some sections of the aforementioned
Elementree module provides us with a plethora of tools for manipulating XML files. The best part about it being its inclusion in the standard Python’s built-in library. Therefore, one does not have to install any external modules for the purpose. Due to the
xmlformat being an inherently hierarchical data format, it is a lot easier to represent it by a tree. The module provides
ElementTree provides methods to represent whole XML document as a single tree.
In the later examples, we would take a look at discrete methods to read and write data to and from XML files.
Reading XML Files
To read an XML file using ElementTree, firstly, we import the ElementTree class found inside
xml library, under the name
ET (common convension). Then passed the filename of the
xml file to the
ElementTree.parse() method, to enable parsing of our
xml file. Then got the root (parent tag) of our
xml file using
getroot(). Then displayed (printed) the root tag of our xml file (non-explicit way). Then displayed the attributes of the sub-tag of our parent tag using
root for the first tag of parent
attrib for getting it’s attributes. Then we displayed the text enclosed within the 1st sub-tag of the 5th sub-tag of the tag root.
Writing XML Files
Now, we would take a look at some methods which could be used to write data on an xml document. In this example we would create a
xml file from scratch.
To do the same, firstly, we create a root (parent) tag under the name of chess usnig the command
ET.Element('chess'). All the tags would fall underneath this tag, i.e. once a root tag has been defined, other sub-elements could be created underneath it. Then we created a subtag/subelement named Opening inside the chess tag using the command
ET.SubElement(). Then we created two more subtags which are underneath the tag Opening named E4 and D4. Then we added attributes to the E4 and D4 tags using
set() which is a method found inside
SubElement(), which is used to define attributes to a tag. Then we added text between the E4 and D4 tags using the attribute
text found inside the
SubElement function. In the end we converted the datatype of the contents we were creating from
'xml.etree.ElementTree.Element' to bytes object, using the command
ET.tostring() (even though the function name is tostring() in certain implementations it converts the datatype to `bytes` rather then `str`). Finally, we flushed the data to a file named
gameofsquares.xml which is a opened in `wb` mode to allow writing binary data to it. In the end, we saved the data to our file.
Attention geek! Strengthen your foundations with the Python Programming Foundation Course and learn the basics.
To begin with, your interview preparations Enhance your Data Structures concepts with the Python DS Course. And to begin with your Machine Learning Journey, join the Machine Learning – Basic Level Course