Java/XML: Parsing XML with JAXP (SAX, DOM standard APIs)

Following my post on XSLT (part of the XSL recommendation), I present here some simple examples of parsing an XML stream with JAXP (Java API for XML Processing), which provides a common interface for creating, parsing and manipulating XML documents using the standard SAX, DOM and XSLT APIs.

XML parsing
XML has become indispensable in information system architectures and in J2EE. Standardized by the W3C and used as a standard format for data exchange, XML documents are present everywhere: in applications, in databases, and at the heart of EAI exchanges.

Consequently, knowledge of XML parsing APIs such as DOM and SAX is often necessary when developing a J2EE application. Understanding the differences, strengths and weaknesses of these APIs is important to avoid performance problems: for example, DOM keeps the whole document in memory as a tree, whereas SAX streams through it without building one.

To process XML documents, an application needs an XML parser to tokenize the XML stream and retrieve its data. An XML parser is the program that sits between the application and the XML document: it reads an XML stream, ensures that it is well-formed, and may validate the document against a DTD or an XML Schema (XSD).
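As a minimal sketch of the well-formedness check mentioned above, the snippet below feeds two in-memory strings to a JAXP SAX parser and reports whether each one parses. The class name WellFormedCheck and the method isWellFormed are assumptions made for this illustration; they are not part of the examples in this post.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only
public class WellFormedCheck {

    // Returns true if the given text is a well-formed XML document
    public static boolean isWellFormed(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // A bare DefaultHandler ignores the content events;
            // only the success or failure of the parse matters here
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // A well-formedness violation is reported as a SAXException
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Properly nested tags
        System.out.println(isWellFormed("<people><person ID=\"1\"/></people>")); // prints true
        // <person> is never closed
        System.out.println(isWellFormed("<people><person></people>"));           // prints false
    }
}
```

This sketch only checks well-formedness; validation against a DTD or XSD has to be switched on separately and is not shown here.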

There are two standard APIs for parsing XML documents:
1. SAX (Simple API for XML)
2. DOM (Document Object Model)

JAXP (Java API for XML Processing) provides a common interface for creating, parsing and manipulating XML documents using the standard SAX, DOM and XSLT APIs.
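To give an idea of the "creating" side of JAXP, here is a small sketch that builds a DOM document in memory and serializes it with an identity Transformer from the javax.xml.transform package. The class name JaxpCreateAndSerialize and the method buildPeopleXml are hypothetical names chosen for this illustration.

```java
import java.io.StringWriter;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical class, for illustration only
public class JaxpCreateAndSerialize {

    // Builds a one-person document in memory and returns it as XML text
    public static String buildPeopleXml() {
        try {
            // Create an empty DOM document
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

            // Build the tree: <people><person ID="..."><name>...</name></person></people>
            Element people = doc.createElement("people");
            doc.appendChild(people);
            Element person = doc.createElement("person");
            person.setAttribute("ID", "01245cdf45x");
            people.appendChild(person);
            Element name = doc.createElement("name");
            name.setTextContent("Malcolm X");
            person.appendChild(name);

            // A Transformer created without a stylesheet copies its source to its result
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildPeopleXml());
    }
}
```

The same Transformer API applies an XSLT stylesheet when one is passed to newTransformer; the identity form is used here only to keep the sketch short.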

XML document
Before the presentation and the examples, here is the XML document people.xml used in the examples below:

<?xml version="1.0" encoding="UTF-8"?>
<people>
  <person ID="01245cdf45x">
    <title>M.</title>
    <name>Malcolm X</name>
    <name>Malik Shabazz</name>
    <name>Malcolm Little</name>
    <born>19 May 1925</born>
    <died>21 February 1965</died>
    <nationality>American</nationality>
  </person>
  <person ID="012qsabc3456002">
    <title>M.</title>
    <name>Mahatma Gandhi</name>
    <born>2 October 1869</born>
    <died>30 January 1948</died>
    <nationality>Indian</nationality>
  </person>
  <person ID="0457d7887897">
    <title>M.</title>
    <name>John F. Kennedy</name>
    <name>JFK</name>
    <name>Jack Kennedy</name>
    <born>29 May 1917</born>
    <died>22 November 1963</died>
    <nationality>American</nationality>
  </person>
</people>

SAX (Simple API for XML)
SAX is an event-driven API. A SAX parser reports a document to an application as a series of events, delivered through the callback methods of a handler. These callbacks are invoked as the parser encounters the document start, the document end, element start tags, element end tags, attributes, text content, entities, processing instructions, comments and so on.
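To make the sequence of callbacks concrete, the following sketch records the name of each event in the order the parser delivers it. The class SaxEventTrace, the method trace and the event labels are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class, for illustration only
public class SaxEventTrace {

    // Parses the given XML text and returns the events in the order they were reported
    public static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startDocument() {
                events.add("startDocument");
            }

            @Override
            public void endDocument() {
                events.add("endDocument");
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("startElement:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("endElement:" + qName);
            }

            @Override
            public void characters(char[] chars, int start, int length) {
                // Skip whitespace-only text such as indentation between tags
                String text = new String(chars, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("characters:" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        for (String event : trace("<person ID=\"1\"><name>JFK</name></person>")) {
            System.out.println(event);
        }
    }
}
```

The trace begins with startDocument and ends with endDocument, and the start and end events of each element are nested in document order. Attributes do not get an event of their own: they arrive with startElement.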

Below is a simple JAXP SAX parser that displays all the persons in people.xml:

import java.io.InputStream;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestParsingXmlWithSAX {

    private int peopleCount = 1;

    // Constructor: parses people.xml and prints each person
    public TestParsingXmlWithSAX() {
        // XML stream, loaded from the classpath next to this class
        try (InputStream xmlStream = TestParsingXmlWithSAX.class.getResourceAsStream("people.xml")) {
            // Create a SAX parser factory
            SAXParserFactory factory = SAXParserFactory.newInstance();

            // Obtain a SAX parser
            SAXParser saxParser = factory.newSAXParser();

            // Parse the given XML document using the callback handler
            saxParser.parse(xmlStream, new MySaxHandler());

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Entry main method
    public static void main(String[] args) {
        new TestParsingXmlWithSAX();
    }

    /*
     * Inner class for the callback handlers.
     */
    class MySaxHandler extends DefaultHandler {

        // The parser may deliver the text of one element in several
        // characters() calls, so it is buffered until the end tag
        private final StringBuilder text = new StringBuilder();

        // Callback to handle element start tag
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            text.setLength(0);
            if (qName.equals("person")) {
                System.out.println("Person " + peopleCount);
                peopleCount++;
                String personId = attributes.getValue("ID");
                System.out.println("\tID:\t" + personId);
            }
        }

        // Callback to handle element end tag: the buffered text is now complete
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (qName.equals("title")) {
                System.out.println("\tTitle:\t" + text);

            } else if (qName.equals("name")) {
                System.out.println("\tName:\t" + text);
            }
            text.setLength(0);
        }

        // Callback to handle the character text data inside an element
        @Override
        public void characters(char[] chars, int start, int length) throws SAXException {
            text.append(chars, start, length);
        }
    }
}

The console output is:

Person 1
    ID: 01245cdf45x
    Title:  M.
    Name:   Malcolm X
    Name:   Malik Shabazz
    Name:   Malcolm Little
Person 2
    ID: 012qsabc3456002
    Title:  M.
    Name:   Mahatma Gandhi
Person 3
    ID: 0457d7887897
    Title:  M.
    Name:   John F. Kennedy
    Name:   JFK
    Name:   Jack Kennedy

DOM (Document Object Model)
DOM is an object-oriented API. The DOM parser builds a tree structure that represents the XML document, and the application can then manipulate the nodes of this tree. The DOM API defines the mechanisms for querying, traversing and manipulating this object model.
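The three operations named above (querying, traversing, manipulating) can be sketched on a small in-memory document as follows. The class DomTreeDemo and its methods parse and countElements are hypothetical helpers written for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class, for illustration only
public class DomTreeDemo {

    // Builds a DOM tree from XML text
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Traversing: recursively counts the element nodes at or below the given node
    public static int countElements(Node node) {
        int count = node.getNodeType() == Node.ELEMENT_NODE ? 1 : 0;
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            count += countElements(child);
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse(
                "<people><person ID=\"0457d7887897\"><name>John F. Kennedy</name></person></people>");

        // Querying: find the first person element and read its attribute
        Element person = (Element) doc.getElementsByTagName("person").item(0);
        System.out.println(person.getAttribute("ID")); // prints 0457d7887897

        // Manipulating: add a second name element to that person
        Element alias = doc.createElement("name");
        alias.setTextContent("JFK");
        person.appendChild(alias);

        // people, person and the two name elements
        System.out.println(countElements(doc)); // prints 4
    }
}
```

Because the whole tree is held in memory, the document can be read again or modified after parsing, which a SAX handler cannot do.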

Below is a simple JAXP DOM parser that displays all the persons in people.xml:

import java.io.InputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TestParsingXmlWithDOM {

    public static void main(String[] args) throws Exception {

        // Create a DOM parser factory
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Obtain a DOM builder
        DocumentBuilder docBuilder = factory.newDocumentBuilder();

        Document doc;
        // XML stream, loaded from the classpath next to this class
        try (InputStream xmlStream = TestParsingXmlWithDOM.class.getResourceAsStream("people.xml")) {
            // Parse the given XML document
            // in order to build a DOM tree representing the XML document
            doc = docBuilder.parse(xmlStream);
        }

        // Return all the person elements as NodeList
        //NodeList personNodes = doc.getElementsByTagName("person");
        // Return the root element
        //Element root = doc.getDocumentElement();

        // Get a list of all elements in the document
        // The wild card * matches all tags
        NodeList list = doc.getElementsByTagName("*");

        int peopleCount = 0;
        for (int i = 0; i < list.getLength(); i++) {

            // Get the elements person (attribute ID), title, names...
            Element element = (Element) list.item(i);
            String nodeName = element.getNodeName();

            if (nodeName.equals("person")) {
                peopleCount++;
                System.out.println("PERSON " + peopleCount);
                String personId = element.getAttribute("ID");
                System.out.println("\tID:\t" + personId);

            } else if (nodeName.equals("title")) {
                // getTextContent() returns the text inside the element
                System.out.println("\tTitle:\t" + element.getTextContent());

            } else if (nodeName.equals("name")) {
                System.out.println("\tName:\t" + element.getTextContent());
            }
        } // end-for
    }
}

The console output is:

PERSON 1
    ID: 01245cdf45x
    Title:  M.
    Name:   Malcolm X
    Name:   Malik Shabazz
    Name:   Malcolm Little
PERSON 2
    ID: 012qsabc3456002
    Title:  M.
    Name:   Mahatma Gandhi
PERSON 3
    ID: 0457d7887897
    Title:  M.
    Name:   John F. Kennedy
    Name:   JFK
    Name:   Jack Kennedy
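
Instead of scanning every element with the * wildcard, the DOM tree can also be queried person by person. The sketch below groups the names under each person ID; the class PeopleByPerson and the method namesById are hypothetical names chosen for this illustration, and the XML is passed as a string so that the example does not depend on people.xml being on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical class, for illustration only
public class PeopleByPerson {

    // Returns a map from each person's ID to that person's names, in document order
    public static Map<String, List<String>> namesById(String xml) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Query only the person elements
            NodeList persons = doc.getElementsByTagName("person");
            for (int i = 0; i < persons.getLength(); i++) {
                Element person = (Element) persons.item(i);

                // Query the name elements below this person only
                List<String> names = new ArrayList<>();
                NodeList nameNodes = person.getElementsByTagName("name");
                for (int j = 0; j < nameNodes.getLength(); j++) {
                    names.add(nameNodes.item(j).getTextContent());
                }
                result.put(person.getAttribute("ID"), names);
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<people>"
                + "<person ID=\"012qsabc3456002\"><name>Mahatma Gandhi</name></person>"
                + "<person ID=\"0457d7887897\"><name>John F. Kennedy</name><name>JFK</name></person>"
                + "</people>";
        namesById(xml).forEach((id, names) -> System.out.println(id + " -> " + names));
    }
}
```

Calling getElementsByTagName on an Element rather than on the Document restricts the search to that element's descendants, which is what keeps each person's names separate.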

Source: test_xml_parsing.zip

That’s all!!!

Huseyin OZVEREN
