
Java/XML: Parsing XML with JAXP (SAX, DOM standard APIs)

Following my post on XSLT, the transformation part of the XSL recommendation, this post presents simple examples of parsing an XML stream with JAXP (Java API for XML Processing), a common interface for creating, parsing and manipulating XML documents using the standard SAX, DOM and XSLT APIs.

XML parsing
XML has become indispensable in information system architectures and in J2EE. Standardized by the W3C and used as a standard format for data exchange, XML documents are found throughout applications and databases, and are at the heart of EAI exchanges.

Consequently, knowledge of XML parsing APIs such as DOM and SAX is often necessary when developing a J2EE application. Understanding the differences, strengths and weaknesses of these APIs is important to avoid the performance problems they can cause. In short, SAX streams the document and keeps memory use low, while DOM loads the whole document into memory as a tree that can be navigated and modified.

To process XML documents, an application needs an XML parser to tokenize the XML stream and retrieve the data it contains. An XML parser is the program that sits between the application and the XML document: it reads the XML stream, checks that it is well-formed, and may validate the document against a DTD or an XML Schema (XSD).
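To make the well-formedness check concrete, here is a minimal sketch. The class WellFormedCheck and its isWellFormed helper are names invented for this illustration, not part of JAXP. It relies on the fact that a SAX parser reports a fatal error on malformed input and that DefaultHandler rethrows that error as a SAXException.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Hypothetical helper: returns true when the parser accepts the text as well-formed XML
    public static boolean isWellFormed(String xml) throws ParserConfigurationException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            // DefaultHandler ignores all content events; only parse errors matter here
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // A fatal parse error means the document is not well-formed
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<people><person ID=\"1\"/></people>")); // prints true
        System.out.println(isWellFormed("<people><person></people>"));          // prints false
    }
}
```

Note that this only checks well-formedness. Validation against a DTD or XSD is a separate step that has to be enabled explicitly.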

There are two standard APIs for parsing XML documents:
1. SAX (Simple API for XML)
2. DOM (Document Object Model)

JAXP (Java API for XML Processing) provides a common interface for creating, parsing and manipulating XML documents using the standard SAX, DOM and XSLT APIs.
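As a rough illustration of how these pieces fit together, the sketch below parses a document with the DOM builder, adds one element, and writes the result back out with an identity Transformer from the XSLT part of JAXP. The class JaxpRoundTrip and the helper addPerson are hypothetical names chosen for this example, and the exact serialized form may vary between implementations.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class JaxpRoundTrip {

    // Hypothetical helper: parse the XML, append one <person> element, serialize back to text
    public static String addPerson(String xml, String id) throws Exception {
        // Parsing: build a DOM tree from the input text
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Manipulating: create a new element and attach it to the root
        Element person = doc.createElement("person");
        person.setAttribute("ID", id);
        doc.getDocumentElement().appendChild(person);

        // Creating output: a Transformer without a stylesheet copies the tree as-is
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addPerson("<people/>", "42"));
    }
}
```

Each factory is obtained through newInstance(), so the application code does not depend on a particular parser or transformer implementation.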

XML document
Before moving on to the presentation and examples, here is the XML document people.xml used in the examples below:

<?xml version="1.0" encoding="UTF-8"?>
<people>
  <person ID="01245cdf45x">
    <title>M.</title>
    <name>Malcolm X</name>
    <name>Malik Shabazz</name>
    <name>Malcolm Little</name>
    <born>19 May 1925</born>
    <died>21 February 1965</died>
    <nationality>American</nationality>
  </person>
  <person ID="012qsabc3456002">
    <title>M.</title>
    <name>Mahatma Gandhi</name>
    <born>2 October 1869</born>
    <died>30 January 1948</died>
    <nationality>Indian</nationality>
  </person>
  <person ID="0457d7887897">
    <title>M.</title>
    <name>John F. Kennedy</name>
    <name>JFK</name>
    <name>Jack Kennedy</name>
    <born>29 May 1917</born>
    <died>22 November 1963</died>
    <nationality>American</nationality>
  </person>
</people>

SAX (Simple API for XML)
SAX is an event-driven API. A SAX parser reports a document to an application as a series of events, delivered through the callback methods of a handler. These callbacks are invoked as events occur during parsing: document start, document end, element start tags (which carry the element's attributes), element end tags, text content, entities, processing instructions, comments and others.

Below is a simple JAXP SAX parser that displays all persons in people.xml:

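import java.io.InputStream;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestParsingXmlWithSAX {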
	
	private String currentElement;
	private int peopleCount = 1;

	// Constructor
	public TestParsingXmlWithSAX() {
		try {
			// Create a SAX parser factory
			SAXParserFactory factory = SAXParserFactory.newInstance(); 
			
			// Obtain a SAX parser
			SAXParser saxParser = factory.newSAXParser();
			
			// XML Stream
			InputStream xmlStream = TestParsingXmlWithSAX.class.getResourceAsStream("people.xml");
			
			// Parse the given XML document using the callback handler
			saxParser.parse(xmlStream, new MySaxHandler()); 
			
		} catch (Exception e) {
			e.printStackTrace();
		}
	}

	// Entry main method
	public static void main(String[] args) {
		new TestParsingXmlWithSAX();
	}

	/*
	 * Inner class for the Callback Handlers.
	 */
	class MySaxHandler extends DefaultHandler {
		
		// Callback to handle element start tag
		@Override
		public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
			currentElement = qName;
			if (currentElement.equals("person")) {
				System.out.println("Person " + peopleCount);
				peopleCount++;
				String personId = attributes.getValue("ID");
				System.out.println("\tID:\t" + personId);
			}
		}

		// Callback to handle element end tag
		@Override
		public void endElement(String uri, String localName, String qName) throws SAXException {
			currentElement = "";
		}

		// Callback to handle the character text data inside an element
		@Override
		public void characters(char[] chars, int start, int length) throws SAXException {
			if (currentElement.equals("title")) {
				System.out.println("\tTitle:\t" + new String(chars, start, length));
				
			} else if (currentElement.equals("name")) {
				System.out.println("\tName:\t" + new String(chars, start, length));
			}
		}
	}
}

The console output would be:

Person 1
	ID:	01245cdf45x
	Title:	M.
	Name:	Malcolm X
	Name:	Malik Shabazz
	Name:	Malcolm Little
Person 2
	ID:	012qsabc3456002
	Title:	M.
	Name:	Mahatma Gandhi
Person 3
	ID:	0457d7887897
	Title:	M.
	Name:	John F. Kennedy
	Name:	JFK
	Name:	Jack Kennedy
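One caveat about the handler above: SAX allows a parser to deliver the text of a single element through several characters() calls, so printing directly inside characters() can split a value in two on some inputs. A common workaround, sketched here with a hypothetical class BufferedNameHandler and helper namesIn, is to accumulate the text in a buffer and use it in endElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedNameHandler extends DefaultHandler {

    private final StringBuilder text = new StringBuilder();
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start a fresh buffer for each element
    }

    @Override
    public void characters(char[] chars, int start, int length) {
        text.append(chars, start, length); // may be called several times for one text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            names.add(text.toString().trim()); // the complete text is known only here
        }
        text.setLength(0);
    }

    // Hypothetical helper: collect the text of every <name> element in the given XML
    public static List<String> namesIn(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BufferedNameHandler handler = new BufferedNameHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people><person ID=\"1\"><name>Malcolm X</name><name>Malik Shabazz</name></person></people>";
        System.out.println(namesIn(xml)); // prints [Malcolm X, Malik Shabazz]
    }
}
```

The buffer is reset on every start tag, which is enough for elements that contain only text, as in people.xml. Mixed content would need a stack of buffers.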

DOM (Document Object Model)
DOM is an object-oriented API. The DOM parser builds a tree structure in memory that represents the XML document, and the application can then manipulate the nodes of this tree. The DOM API defines the mechanisms for querying, traversing and manipulating this object model.

Below is a simple JAXP DOM parser that displays all persons in people.xml:

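import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TestParsingXmlWithDOM {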

	public static void main(String[] args) throws Exception {

		// Create a DOM parser factory
		DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
		// Obtain a DOM builder
		DocumentBuilder docBuilder = factory.newDocumentBuilder();

		// XML Stream
		InputStream xmlStream = TestParsingXmlWithDOM.class.getResourceAsStream("people.xml");
		
		// Parse the given XML document 
		// in order to build a DOM tree representing the XML document
		Document doc = docBuilder.parse(xmlStream);

		// Return all the person elements as NodeList
		//NodeList personNodes = doc.getElementsByTagName("person"); 
		// Return the root element
		//Element root = doc.getDocumentElement();  

		// Get a list of all elements in the document
		// The wild card * matches all tags
		NodeList list = doc.getElementsByTagName("*");

		int peopleCount = 0;
		for (int i = 0; i < list.getLength(); i++) {
			
			// Get the elements person (attribute ID), title, names...
			Element element = (Element) list.item(i);
			String nodeName = element.getNodeName();
			
			if (nodeName.equals("person")) {
				peopleCount++;
				System.out.println("PERSON " + peopleCount);
				String personId = element.getAttribute("ID");
				System.out.println("\tID:\t" + personId);
			
			} else if (nodeName.equals("title")) {
				System.out.println("\tTitle:\t" + element.getChildNodes().item(0).getNodeValue());

			} else if (nodeName.equals("name")) {
				System.out.println("\tName:\t" + element.getChildNodes().item(0).getNodeValue());
			}
		} // end-for
	}
}

The console output would be:

PERSON 1
	ID:	01245cdf45x
	Title:	M.
	Name:	Malcolm X
	Name:	Malik Shabazz
	Name:	Malcolm Little
PERSON 2
	ID:	012qsabc3456002
	Title:	M.
	Name:	Mahatma Gandhi
PERSON 3
	ID:	0457d7887897
	Title:	M.
	Name:	John F. Kennedy
	Name:	JFK
	Name:	Jack Kennedy
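The example above walks every element in document order. Since DOM keeps the whole tree in memory, it is also possible to query each person element for its own children, which is the approach hinted at by the commented-out getElementsByTagName("person") line. Here is a minimal sketch of that variant. The class PersonNames and the helper describe are hypothetical names for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PersonNames {

    // Hypothetical helper: one "ID: name1, name2" line per <person> element
    public static List<String> describe(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        List<String> lines = new ArrayList<>();
        NodeList persons = doc.getElementsByTagName("person");
        for (int i = 0; i < persons.getLength(); i++) {
            Element person = (Element) persons.item(i);

            // Query only the <name> descendants of this person
            NodeList names = person.getElementsByTagName("name");
            List<String> values = new ArrayList<>();
            for (int j = 0; j < names.getLength(); j++) {
                values.add(names.item(j).getTextContent());
            }
            lines.add(person.getAttribute("ID") + ": " + String.join(", ", values));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<people>"
                + "<person ID=\"a\"><name>Malcolm X</name><name>Malik Shabazz</name></person>"
                + "<person ID=\"b\"><name>Mahatma Gandhi</name></person>"
                + "</people>";
        for (String line : describe(xml)) {
            System.out.println(line);
        }
        // prints:
        // a: Malcolm X, Malik Shabazz
        // b: Mahatma Gandhi
    }
}
```

Grouping by person this way keeps each name attached to its owner, which the flat traversal only achieves implicitly through document order.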

Source: test_xml_parsing.zip

That’s all!!!

Huseyin OZVEREN
