Taking control of your XML Parsing with StAX

The two most common XML parsing approaches both have their limitations. The DOM approach loads the whole XML document into memory as a tree, which is great if you need to walk through the tree to process its content, but is inefficient if the XML is large or you are only looking for a small part of it. The SAX approach allows you to inspect the XML ‘on the fly’ and get notified of events (matching parts of the XML you are looking for). This avoids having to load everything into memory as in the DOM approach, but it is a push-based approach: once parsing starts, the XML streams through until the end of the document, and you have no control over how or when the parser moves through the stream.
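To make the push-based model concrete, here is a minimal sketch using the standard JAXP SAX classes. The class name `SaxPushDemo`, the helper `elementNames`, and the sample XML are made up for illustration. Notice that the code only hands over a handler: the parser decides when each callback fires, and the handler has no way to ask it to wait.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names here are illustrative only.
class SaxPushDemo {

    // Collects the name of every element the parser pushes at the handler.
    static List<String> elementNames(String xml) throws Exception {
        List<String> seen = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called by the parser for each start tag, in document order.
                seen.add(qName);
            }
        };
        // parse() returns only after the whole document has been streamed through.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book><title>First</title></book>"
                   + "<book><title>Second</title></book></catalog>";
        System.out.println(elementNames(xml)); // [catalog, book, title, book, title]
    }
}
```

Even if only the first title is of interest, the handler still receives every later event in the document.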

java.net has a good introductory article today showing how to use the StAX API, which offers a pull-based approach. It avoids the limitations of DOM and SAX by letting you control the processing of the stream: your code asks the parser for the next event when it is ready for it. You can even terminate processing partway through once you have found the data you are looking for, so you are not forced to wait until the end of the stream.
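As a rough sketch of what pull-based parsing with early termination can look like, the example below uses the cursor API (`XMLStreamReader`) from `javax.xml.stream`. The class name `StaxFirstMatch`, the helper `firstElementText`, and the sample XML are assumptions for illustration and are not taken from the java.net article. It also assumes the target element contains only text, since `getElementText()` throws an exception if it meets a child element.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; the names here are illustrative only.
class StaxFirstMatch {

    // Returns the text of the first element with the given name, or null if there is none.
    static String firstElementText(String xml, String elementName)
            throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            // The loop pulls events one at a time; nothing happens until next() is called.
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    // Found it: return now and never ask for the remaining events.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<catalog><book><title>First</title></book>"
                   + "<book><title>Second</title></book></catalog>";
        System.out.println(firstElementText(xml, "title")); // First
    }
}
```

Because the application drives the loop, stopping is simply a matter of returning; the second book in the sample document is never requested from the reader.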
