Java SAX Library
SAX (Simple API for XML) is the most widely adopted API for XML in Java and is considered the de facto standard. Although it started as a Java-only library, it is now a well-known API available in a variety of programming languages. It is an open source project and has switched to the SourceForge project infrastructure, which makes it easier to track open SAX issues outside of the high-volume xml-dev list. As of 01/10/2018, the current version is SAX 2.0 (its latest maintenance release is 2.0.2). SAX uses an event-driven, serial-access mechanism for reading XML documents, and it is frequently used by applets that need to access XML documents because it is among the fastest and least memory-intensive APIs available for parsing XML. The parser reports each item as it is read and keeps no record of the elements that came before, i.e. it is state independent.
Setting up DOM (Document Object Model) is easier than setting up SAX, and SAX is harder to visualize than DOM because its parser reports XML items as a sequence of events rather than as a tree. This also means you cannot go back to an earlier part of the document or rearrange its contents. Hence, user-oriented applications (those that display a document and let the user modify it) should use DOM instead of SAX.
However, there are many reasons to familiarize yourself with SAX, even if you are using DOM. Below are the main reasons why knowing SAX helps when working with DOM:
- Same Error Handling: SAX and DOM generate the same kinds of exceptions, so the error-handling code is virtually identical.
- Handling Validation Errors: By default, validation errors are ignored. If you want to throw an exception when a validation error occurs, you need to understand the SAX error handling mechanism.
- Converting Existing Data: DOM provides a mechanism for converting an existing data set into XML, but to take advantage of it you need a basic understanding of the SAX model.
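The point about validation errors can be illustrated with a minimal sketch. It assumes the JDK's built-in JAXP parser and DTD validation; the class names StrictValidationDemo and StrictHandler, the helper isValid, and the tiny note document are hypothetical and exist only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class StrictValidationDemo {

    // DefaultHandler.error() does nothing by default, so validation errors
    // would pass silently. This (hypothetical) handler rethrows them.
    static class StrictHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            throw e;
        }
    }

    // Returns true if the document parses cleanly against its own DTD.
    // Malformed (not well-formed) input also ends up in the catch branch,
    // because fatal errors are thrown as SAXParseException too.
    static boolean isValid(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        SAXParser parser = factory.newSAXParser();
        try {
            parser.parse(new InputSource(new StringReader(xml)), new StrictHandler());
            return true;
        } catch (SAXParseException e) {
            System.out.println("Line " + e.getLineNumber() + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Internal DTD: a <note> must contain exactly one <to> element.
        String dtd = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>";
        System.out.println(isValid(dtd + "<note><to>Ann</to></note>"));
        System.out.println(isValid(dtd + "<note><from>Bob</from></note>"));
    }
}
```

The same ErrorHandler interface can be registered on a DOM DocumentBuilder through setErrorHandler, which is why this mechanism is worth knowing even in DOM-based code.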
Why or when to use SAX?
The SAX event model is helpful when you want to convert existing data to XML: the key is to modify an existing application so that it delivers SAX events as it reads the data.
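As a rough sketch of that conversion idea, the code below reads comma-separated lines and reports them as SAX events to a ContentHandler; a JAXP TransformerHandler is used here as one possible receiver that serializes the events as XML. The class name CsvToSaxDemo, the element names people, person, name and city, and the two-column input format are all assumptions made for illustration.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

class CsvToSaxDemo {

    // Emits <name>text</name> style events for one field.
    static void emit(ContentHandler h, String name, String text) throws SAXException {
        h.startElement("", name, name, new AttributesImpl());
        h.characters(text.toCharArray(), 0, text.length());
        h.endElement("", name, name);
    }

    // Delivers SAX events for each "name,city" line (assumed to have
    // exactly two fields) and returns the XML produced by the receiver.
    static String toXml(String[] csvLines) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "people", "people", noAttrs);
        for (String line : csvLines) {
            String[] fields = line.split(",");
            handler.startElement("", "person", "person", noAttrs);
            emit(handler, "name", fields[0].trim());
            emit(handler, "city", fields[1].trim());
            handler.endElement("", "person", "person");
        }
        handler.endElement("", "people", "people");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXml(new String[] {"Ann,Oslo", "Bob,Lima"}));
    }
}
```

Because the data source only talks to the ContentHandler interface, the same event-producing code could feed any other SAX consumer instead of a serializer.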
SAX is fast and efficient, but its event model makes it most useful for state-independent filtering. The parser calls one method when an element tag is encountered and a different method when text is encountered. So, as long as the processing is state independent (meaning that it does not depend on the elements that have come before), SAX works fine.
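The callbacks described above can be observed with a small sketch that records each event as it arrives. The class name EventLogDemo, the helper events, and the sample document are hypothetical; the overridden methods are the standard DefaultHandler callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventLogDemo {

    // Parses the given XML and returns the events in the order received.
    static List<String> events(String xml) throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                log.add("start:" + qName); // called for every opening tag
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Called for text; skip whitespace-only chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName); // called for every closing tag
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return log;
    }

    public static void main(String[] args) throws Exception {
        events("<greeting><to>World</to></greeting>").forEach(System.out::println);
    }
}
```

Each callback here handles its event without looking at earlier ones, which is the state-independent style of processing that SAX suits best. Note that a parser is allowed to deliver long text in several characters() calls, so real handlers usually accumulate text in a buffer.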
Unlike DOM, SAX does not create an internal representation (tree structure) of the XML data. Instead, it simply sends the data to the application as it is read, and hence consumes less memory.
The SAX API acts like a serial I/O stream, and hence it is highly recommended for simple applications that need an XML parser.
Classes in the SAX library: A few classes in the SAX library make parsing work much easier. These are:
- HandlerBase: This class provides default implementations for DocumentHandler, ErrorHandler, DTDHandler, and EntityResolver. Parser writers can use it to provide a default implementation when the user does not specify handlers, and application writers can subclass it to simplify handler writing. HandlerBase belongs to SAX 1 and is deprecated in SAX 2.0, where DefaultHandler takes over the same role.
- InputSource: This class allows a SAX application to encapsulate information about an input source in a single object, which may include a public identifier, a system identifier, a byte stream (possibly with a specified encoding), and/or a character stream.
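To make the InputSource description concrete, here is a minimal sketch that wraps a byte stream together with an explicit encoding and a system identifier, then parses it. The class name InputSourceDemo, the helper names, and the file URI are hypothetical; the document is never read from that location because the byte stream is supplied directly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class InputSourceDemo {

    // Bundles a byte stream, its encoding, and a system identifier
    // into one InputSource object.
    static InputSource fromBytes(byte[] data, String encoding, String systemId) {
        InputSource source = new InputSource(new ByteArrayInputStream(data));
        source.setEncoding(encoding);  // tells the parser how to decode the bytes
        source.setSystemId(systemId);  // used for error messages and relative URIs
        return source;
    }

    // Parses the source and returns all character data concatenated.
    static String allText(InputSource source) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // The document has no XML declaration, so the encoding set on the
        // InputSource is what lets the parser decode the accented character.
        byte[] data = "<msg>caf\u00e9</msg>".getBytes(StandardCharsets.ISO_8859_1);
        InputSource source = fromBytes(data, "ISO-8859-1", "file:///example/msg.xml");
        System.out.println(source.getSystemId() + " -> " + allText(source));
    }
}
```

An InputSource can equally be built from a character stream (a Reader), in which case the encoding setting is not needed because the characters are already decoded.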
XML file to be parsed:
Java program to parse the file:
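A minimal sketch of such a program, assuming a small hypothetical library document that is embedded as a string so the example is self-contained. The class names ParseBooksDemo and BookHandler and the element names library, book and title are illustrative assumptions, and the handler extends the SAX 2 DefaultHandler class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParseBooksDemo {

    // Hypothetical sample document, standing in for an XML file on disk.
    static final String XML =
            "<library>"
          + "<book id=\"1\"><title>Dune</title></book>"
          + "<book id=\"2\"><title>Emma</title></book>"
          + "</library>";

    // Collects "id: title" for every book. SAX keeps no state itself, so
    // the handler remembers the current id and buffers text on its own.
    static class BookHandler extends DefaultHandler {
        final List<String> books = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private String currentId;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes attrs) {
            text.setLength(0); // start a fresh text buffer for each element
            if ("book".equals(qName)) {
                currentId = attrs.getValue("id");
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length); // may arrive in several chunks
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                books.add(currentId + ": " + text.toString().trim());
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BookHandler handler = new BookHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.books;
    }

    public static void main(String[] args) throws Exception {
        parse(XML).forEach(System.out::println);
    }
}
```

To read from an actual file instead, the call parser.parse(new java.io.File(path), handler) can be used with the same handler, where path is the location of the XML file.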