This article explains how to parse an XML file in Java using the StAX parser.
XML : XML stands for eXtensible Markup Language. It was designed to store and transport data, and to be both human- and machine-readable. Its design goals therefore emphasize simplicity, generality, and usability across the Internet.
Why StAX instead of SAX?
- SAX : SAX is a push-model API: the parser calls your handler, rather than your handler calling the parser. The SAX parser thus “pushes” events into your handler. With this model you have no control over how and when the parser iterates over the file. Once you start the parser, it runs all the way to the end of the document (unless the handler aborts it by throwing an exception), calling your handler for every XML event in the input.
SAX Parser --> Handler
- StAX : StAX is a pull-model API: your “handler” class calls the parser, not the other way around. Your code therefore controls when the parser moves on to the next event in the input. In other words, your handler “pulls” the XML events out of the parser, and it can stop parsing at any point. A StAX parser is generally used in place of a plain file reader when the input or database is supplied as an XML file, whether offline or online. The pull model can be summarized like this:
Handler --> StAX Parser
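To make the difference concrete, here is a minimal sketch that runs both models over the same small document. The class name PushVsPull, the XML constant and both method names are assumptions made for this illustration; only the standard JDK SAX and StAX APIs are used.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PushVsPull {
    // Hypothetical sample document, used only for this illustration.
    public static final String XML =
        "<company><name>Kunal Sharma</name><title>Student</title>"
        + "<phone>(202) 456-1414</phone></company>";

    // SAX (push): the parser drives and calls the handler for every
    // start tag until the whole document has been read.
    public static int countStartTagsWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // StAX (pull): our loop asks for the next event and simply stops
    // asking once the element named stopAt has been reached.
    public static int countStartTagsUntil(String xml, String stopAt) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                    if (reader.getLocalName().equals(stopAt)) {
                        break; // the rest of the document is never parsed
                    }
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SAX start tags seen: " + countStartTagsWithSax(XML));
        System.out.println("StAX start tags seen: " + countStartTagsUntil(XML, "name"));
    }
}
```

With this sample, the SAX handler is called for all four start tags, while the StAX loop stops after two (company and name), leaving the rest of the input unread.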
In addition, StAX can both read and write XML documents, while SAX can only read them. SAX parsers can validate a document against a DTD or schema, whereas the standard StAX API offers no comparable schema-validation method. Both report an error if the XML is not well-formed, for example when tags are nested incorrectly.
Idea of how the StAX parser works :
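As a rough illustration of the writing side, the sketch below builds a small document with XMLStreamWriter from the standard javax.xml.stream package. The class name StaxWriteSketch and the method buildCompanyXml are hypothetical, and the element names are borrowed from this article's sample file.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class StaxWriteSketch {
    // Hypothetical helper: writes a <company> element with one attribute
    // and one nested <name> element, and returns the XML as a String.
    public static String buildCompanyXml(String siteClass, String name)
            throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
            XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartElement("company");
        writer.writeAttribute("class", siteClass); // attributes follow the start tag
        writer.writeStartElement("name");
        writer.writeCharacters(name);              // text is escaped by the writer
        writer.writeEndElement();                  // closes <name>
        writer.writeEndElement();                  // closes <company>
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(buildCompanyXml("geeksforgeeks.org", "Kunal Sharma"));
    }
}
```

The writer mirrors the reader's event model: each call emits one piece of the document in order, so the code that produces XML reads much like the code that consumes it.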
Input File : This is a sample input file made by the author to show how the StAX parser is used. Save it as data.xml and run the code. Real XML database files are usually large and contain many tags nested within each other.
<company class="geeksforgeeks.org">
    <name>Kunal Sharma</name>
    <title>Student</title>
    <email>firstname.lastname@example.org</email>
    <phone>(202) 456-1414</phone>
</company>
Output :
name = geeksforgeeks.org Kunal Sharma Student email@example.com (202) 456-1414
How does StAX work in the above code?
After the eventReader is created through a factory (the factory pattern is used to obtain an XML file reader), parsing starts by reading the <…> tag. As soon as the <…> tag is encountered, a boolean variable is set to true to indicate that the tag has been opened. Tags are matched by checking whether the event is a start tag or an end tag; because <…> marks the start of an element, it is matched as a StartElement. Next comes the data-reading step: the character data is read when the event matches isCharacters, and only if the required start tag is open, that is, its boolean variable is true. Finally comes the closing of the element, indicated by the </…> tag. As soon as the parser encounters </…>, the code checks which element is currently open (set to true) and sets that element's boolean back to false, closing it.
In short, each element is handled in three steps: open the tag, read its data, and close the tag.
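The flag-based flow described above can be sketched as follows. This is not the article's original listing: the class name StaxFlagSketch, the SAMPLE constant, the parse method and the "label = value" line format are all assumptions, and only two of the sample file's elements are tracked to keep it short.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public class StaxFlagSketch {
    // Shortened stand-in for data.xml so the sketch is self-contained;
    // a FileReader on data.xml could be passed to parse() instead.
    public static final String SAMPLE =
        "<company class=\"geeksforgeeks.org\">"
        + "<name>Kunal Sharma</name><title>Student</title></company>";

    public static List<String> parse(Reader input) throws XMLStreamException {
        List<String> lines = new ArrayList<>();
        boolean inName = false;   // true while <name> is open
        boolean inTitle = false;  // true while <title> is open

        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Deliver each element's text as a single Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader eventReader = factory.createXMLEventReader(input);

        while (eventReader.hasNext()) {
            XMLEvent event = eventReader.nextEvent();
            if (event.isStartElement()) {
                // Step 1: a start tag opens an element, so raise its flag.
                StartElement start = event.asStartElement();
                String tag = start.getName().getLocalPart();
                if (tag.equals("company")) {
                    Attribute cls = start.getAttributeByName(new QName("class"));
                    if (cls != null) lines.add("class = " + cls.getValue());
                } else if (tag.equals("name")) {
                    inName = true;
                } else if (tag.equals("title")) {
                    inTitle = true;
                }
            } else if (event.isCharacters()) {
                // Step 2: read text only while a tracked element is open.
                String text = event.asCharacters().getData();
                if (inName) lines.add("name = " + text);
                else if (inTitle) lines.add("title = " + text);
            } else if (event.isEndElement()) {
                // Step 3: an end tag closes the element, so lower its flag.
                String tag = event.asEndElement().getName().getLocalPart();
                if (tag.equals("name")) inName = false;
                else if (tag.equals("title")) inTitle = false;
            }
        }
        eventReader.close();
        return lines;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : parse(new StringReader(SAMPLE))) {
            System.out.println(line);
        }
    }
}
```

Because the flags are checked before any text is recorded, whitespace between tags (which also arrives as Characters events in an indented file) is ignored rather than mistaken for data.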
This article is contributed by Kunal Sharma.