Alvin's Big Data Notebook : Java XML Parser DOM vs. SAX

1. SAX (Simple API for XML)

In SAX, events are triggered while the XML is being parsed. When the parser encounters an opening tag (e.g. <something>), it fires a start-of-element event (startElement in the Java SAX API). Similarly, when it reaches the closing tag (</something>), it fires an end-of-element event (endElement). Using a SAX parser means you have to handle these events yourself and make sense of the data delivered with each one.
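As a rough illustration of the event flow described above, here is a minimal sketch using the JDK's built-in SAX parser. The class name SaxEventsDemo, the helper collectEvents, and the sample XML are made up for this example; only startElement, endElement and characters come from the SAX API itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records SAX events in the order they are fired.
class SaxEventsDemo {

    static List<String> collectEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                events.add("start:" + qName);   // fired on <something>
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);     // fired on </something>
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Note: a parser may deliver one text run in several chunks.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collectEvents("<note><to>Alvin</to></note>")) {
            System.out.println(event);
        }
    }
}
```

The handler never sees the document as a whole; anything you want to keep (here, the list of events) you have to store yourself.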

-Event-based parser (sequence of events).

-SAX parses the file as it reads it, i.e. node by node.

-Low memory footprint, since it does not keep the XML content in memory.

-SAX is read-only, i.e. you can’t insert or delete nodes.

-Use a SAX parser when the XML content is large.

-SAX reads the XML file from top to bottom and backward navigation is not possible.

-Generally faster than DOM at run time.

Pros

event based
memory efficient
faster than DOM
supports schema validation

Cons

No object model; you have to tap into the events and build one yourself
Single pass over the XML, forward only
read-only API
no XPath support
a little harder to use

2. DOM

In DOM (Document Object Model), no events are triggered while parsing. The entire XML is parsed, and a DOM tree of the nodes in the XML is generated and returned. Once parsing is done, the user can navigate the tree to access the data held in the various nodes.
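To make the contrast with SAX concrete, here is a small sketch that parses a string into a DOM tree with the JDK's DocumentBuilder and then walks it in both directions. The class name DomNavigateDemo, its helper methods, and the sample XML are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: builds a DOM tree, then navigates it.
class DomNavigateDemo {

    // Parses the whole XML string and returns the in-memory tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Text of the first element with the given tag name, or null if there is none.
    static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    // Name of the parent of the first element with the given tag name (backward navigation).
    static String parentName(Document doc, String tag) {
        Node node = doc.getElementsByTagName(tag).item(0);
        return node == null ? null : node.getParentNode().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<note><to>Alvin</to><from>Bob</from></note>");
        System.out.println(firstText(doc, "from"));
        System.out.println(parentName(doc, "to"));
    }
}
```

Because the whole tree is already in memory, the lookups can be done in any order and as often as needed, which a single SAX pass does not allow.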

-Tree-model parser (object based; a tree of nodes).

-DOM parses the whole file and builds the tree in memory before you can use it.

-Has memory constraints, since the whole document tree is held in memory.

-DOM is read and write (you can insert or delete nodes).

-If the XML content is small, prefer a DOM parser.

-Backward and forward navigation is possible when searching for tags and evaluating the information inside them, which makes the tree easy to navigate.

-Slower at run time.
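The read-and-write point in the list above can be sketched as follows: one helper inserts a node into a parsed tree and another deletes nodes by tag name. DomEditDemo, appendToRoot and removeAll are hypothetical names chosen for this example; the DOM calls themselves (createElement, appendChild, removeChild) are standard org.w3c.dom methods.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: inserts and deletes nodes in a DOM tree.
class DomEditDemo {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Inserts <tag>text</tag> as the last child of the root element.
    static Element appendToRoot(Document doc, String tag, String text) {
        Element element = doc.createElement(tag);
        element.setTextContent(text);
        doc.getDocumentElement().appendChild(element);
        return element;
    }

    // Deletes every element with the given tag name and returns how many were removed.
    static int removeAll(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        int removed = 0;
        // Walk backwards so removals do not shift the indexes still to be visited.
        for (int i = nodes.getLength() - 1; i >= 0; i--) {
            Node node = nodes.item(i);
            node.getParentNode().removeChild(node);
            removed++;
        }
        return removed;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<note><to>Alvin</to><draft>x</draft></note>");
        appendToRoot(doc, "from", "Bob");
        System.out.println("removed: " + removeAll(doc, "draft"));
        System.out.println("children of root: " + doc.getDocumentElement().getChildNodes().getLength());
    }
}
```

None of this is possible with SAX alone, since a SAX parser hands you events rather than a tree you can change.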

Pros