Java and XML

Third Edition, January 2007
ISBN 978-0-596-10149-7
479 pages
EUR41.00



Table of Contents

Chapter 1: Introduction
Content preview
In the next two chapters, I’m going to give you a crash course in XML and constraints. Since there is so much material available on XML and related specifications, I’d rather cruise through this material quickly and get on to Java. For those of you who are completely new to XML, you might want to have a few of the following books around as reference:
XML in a Nutshell, by Elliotte Rusty Harold and W. Scott Means
Learning XML, by Erik Ray
Learning XSLT, by Michael Fitzgerald
XSLT, by Doug Tidwell
These are all O’Reilly books, and I have them scattered about my own workspace. With that said, let’s dive in.
End of content preview. The rest of this section cannot be viewed here.
XML 1.0
It all begins with the XML 1.0 Recommendation, which you can read in its entirety at http://www.w3.org/TR/REC-xml. Example 1-1 shows an XML document that conforms to this specification. I’ll use it to illustrate several important concepts.
Example 1-1. A typical XML document is long and verbose
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns="http://purl.org/rss/1.0/" xmlns:admin="http://webns.net/mvcb/"
         xmlns:l="http://purl.org/rss/1.0/modules/link/"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <!--Generated by Blogger v5.0-->
  <channel rdf:about="http://www.neilgaiman.com/journal/journal.asp">
    <title>Neil Gaiman's Journal</title>
    <link>http://www.neilgaiman.com/journal/journal.asp</link>
    <description>Neil Gaiman's Journal</description>
    <dc:date>2005-04-30T01:57:38Z</dc:date>
    <dc:language>en-US</dc:language>
    <admin:generatorAgent rdf:resource="http://www.blogger.com/" />
    <admin:errorReportsTo rdf:resource="mailto:rss-errors@blogger.com" />
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.neilgaiman.com/journal/2005/04/three-photographs.asp" />
        <rdf:li rdf:resource="http://www.neilgaiman.com/journal/2005/04/jetlag-morning.asp" />
        <rdf:li rdf:resource="http://www.neilgaiman.com/journal/2005/04/demon-days.asp" />
        <rdf:li rdf:resource="http://www.neilgaiman.com/journal/2005/04/more-from-mailbag.asp" />
        <rdf:li rdf:resource="http://www.neilgaiman.com/journal/2005/04/two-days.asp" />
        <rdf:li rdf:resource="http://www.neilgaiman.com/journal/2005/04/finishing-things.asp" />
      </rdf:Seq>
    </items>
  </channel>

  <!-- and so on... -->
</rdf:RDF>
For those of you who are curious, this is the RSS feed for Neil Gaiman’s blog (http://www.neilgaiman.com). It uses a lot of RSS syntax, which I’ll cover in Chapter 12 in detail.
A lot of this specification describes what is mostly intuitive. If you’ve done any HTML authoring, or SGML, you’re already familiar with the concept of elements (such as
XML 1.1
In February of 2004, the XML 1.1 specification was released by the World Wide Web Consortium (W3C; http://www.w3.org). If you don’t recall hearing much about XML 1.1, it’s no surprise; XML 1.1 was largely about Unicode conformance, and really didn’t affect XML as a whole that much, particularly for document authors and programmers not working with unusual character sets.
While XML was undergoing fairly minor maintenance updates, Unicode moved from Version 2.0 to 4.0. Since XML relies on Unicode for the characters allowed in XML element and attribute names, this had a ripple effect on document authors who wanted to use the new Unicode 4.0 characters in their documents. In XML 1.0, the specification had to explicitly permit characters to be in element and attribute names; as a result, new characters in later versions of Unicode were excluded for name usage by parsers. In XML 1.1—in an effort to avoid similar problems in the future—characters not explicitly forbidden are permitted. This means that if new characters are added in future Unicode versions, they can immediately be used in XML 1.1 documents.
If all of this doesn’t mean anything to you, then you probably don’t need to be too concerned about XML 1.1. Personally, I still type in version="1.0" and haven’t needed to change that yet. If you want to understand more about the intricacies of Unicode and XML 1.1, check out the complete specification at http://www.w3.org/TR/xml11.
All the tools and parsers used throughout this book will work with XML 1.0 and 1.1 documents.
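If you are curious which version a parser actually sees, a minimal sketch along these lines reports the version declared in a document. It assumes the JAXP parser bundled with the JDK; the XmlVersionDemo class and the tiny note documents are made up for illustration and are not from the book.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; not part of the book's examples.
class XmlVersionDemo {

  /** Parses the given XML text and returns the version from its XML declaration. */
  static String declaredVersion(String xml) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.parse(new InputSource(new StringReader(xml)));
    return doc.getXmlVersion();
  }

  public static void main(String[] args) throws Exception {
    // Both documents are illustrative; only the declaration differs.
    System.out.println(declaredVersion("<?xml version=\"1.0\"?><note>hi</note>"));
    System.out.println(declaredVersion("<?xml version=\"1.1\"?><note>hi</note>"));
  }
}
```

The only difference between the two inputs is the version in the XML declaration, which is the one thing most document authors ever need to touch.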
XML Transformations
One of the cooler things about XML is the ability to transform it into something else. With the wealth of web-capable devices these days (computers, personal organizers, phones, DVRs, etc.), you never know what flavor of markup you need to deliver. Sometimes HTML works, sometimes XHTML (the XML flavor of HTML) is required, sometimes the Wireless Markup Language (WML) is supported; and sometimes you need something else entirely. In all of these cases, though, the basic data being displayed is the same; it’s just the formatting and presentation that changes. A great technique is to store the data in an XML document, and then transform that XML into various formats for display.
As useful as XML transformations can be, though, they are not simple to implement. In fact, rather than trying to specify the transformation of XML in the original XML 1.0 specification, the W3C has put out three separate recommendations to define how XML transformations work.
Because these three specifications are tied together tightly and are almost always used in concert, there is rarely a clear distinction between them. This often makes for a discussion that is easy to understand, but not necessarily technically correct. For instance, the term XSLT, which refers specifically to XSL Transformations, is often applied to both XSL and XPath. In the same fashion, XSL is often used as a grouping term for all three technologies. In this section, I distinguish among the three recommendations, and remain true to the letter of the specifications outlining these technologies. However, in the interest of clarity, I use XSL and XSLT interchangeably to refer to the complete transformation process throughout the rest of the book. Although this may not follow the letter of these specifications, it certainly follows their spirit, and it avoids lengthy definitions of simple concepts when you already understand what I mean.
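To make the "store the data as XML, transform it for display" idea concrete, here is a small sketch that uses the JAXP transformation API (javax.xml.transform) to turn a channel document into an HTML heading. The TransformDemo class, the inline stylesheet, and the stripped-down channel document are all assumptions made for illustration; real stylesheets usually live in their own files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class; not part of the book's examples.
class TransformDemo {

  // Illustrative stylesheet: render the channel title as an HTML heading.
  static final String STYLESHEET =
      "<xsl:stylesheet version=\"1.0\""
    + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
    + "<xsl:output method=\"html\" indent=\"no\"/>"
    + "<xsl:template match=\"/channel\">"
    + "<h1><xsl:value-of select=\"title\"/></h1>"
    + "</xsl:template>"
    + "</xsl:stylesheet>";

  /** Applies the stylesheet above to the given XML text and returns the result. */
  static String toHtml(String xml) throws Exception {
    Transformer transformer = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
    StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(toHtml("<channel><title>Neil Gaiman's Journal</title></channel>"));
  }
}
```

Swapping in a different stylesheet would produce WML or XHTML from the same source document, which is the whole point of keeping data and presentation apart.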
XSL is the Extensible Stylesheet Language. It is defined as a language for expressing stylesheets. This broad definition is broken down into two parts:
And More...
Lest I mislead you into thinking that’s all that there is to XML, I want to make sure that you realize there are a multitude of other XML-related technologies. I can’t possibly get into them all in this chapter, or even in this book. You should take a quick glance at things like Cascading Style Sheets (CSS) and XHTML if you are working on web design. Document authors will want to find out more about XLink and XPointer. XQuery will be of interest to database programmers. In other words, there’s something XML for pretty much every technology space right now. Take a look at the W3C XML activity page at http://www.w3.org/XML and see what looks interesting.
Chapter 2: Constraints
It’s rare that you’ll be able to author XML without worrying about anyone else modifying your document, or anyone having to interpret the meaning of the document. The majority of the time, someone (or something) will have to figure out what your tags mean, what data is allowed within those tags, and how your document is structured. This is where constraint models come into play in the XML world. A constraint model defines the structure of your document and, to some degree, the data allowed within that structure.
In fact, if you take XML as being a data representation, you really can’t divorce a document (often called an instance) from its constraints (the schema). The instance document contains the data, and the schema gives form to that data. You can’t have one without the other; at least, not without introducing tremendous room for error. An instance document without a schema must be interpreted by the recipient; and do you really want him deciding what your elements and attributes meant?
There’s an argument that essentially goes like this: “Good XML should be structured so that it’s self-documenting.” That’s a good goal, but practically impossible. As a programmer, I often think my code is well documented and easily understood; but I’m assuming a certain level of expertise, and a certain approach to coding. Change just a few bits here and there, and someone else might reasonably interpret my “well-documented” code (or XML) completely differently than I might. Taking the time to write a schema solves this problem much more definitively.
There are three basic models for constraints in use today:
DTDs
Introduced as part of the XML 1.0 specification, DTDs are the oldest constraint model around in the XML world. They’re simple to use, but this simplicity comes at a price: DTDs are inflexible, and offer you little in the way of data type validation.
XML Schema (XSD)
XML Schema is the W3C’s anointed successor to DTDs. XML Schemas are literally orders of magnitude more flexible than DTDs, and offer an almost dizzying array of support for various data types. However, just as DTDs were simple and limited, XML Schemas are flexible, complex, and (some would argue) bloated. It takes a lot of work to write a good schema, even for 50- or 100-line XML documents. For this reason, there’s been a lot of dissatisfaction with XML Schema, even though it is widely used.
DTDs
A DTD defines how data is formatted. It must define each allowed element in an XML document, the allowed attributes, and—when appropriate—the acceptable attribute values for each element; it also indicates the nesting and occurrences of each element, and any external entities. DTDs can specify many other things about an XML document, but these basics are what I’ll focus on here.
This chapter is by no means an extensive treatment of DTDs, XML Schema, or RELAX NG. For more detail on all of these schema types, check out XML in a Nutshell by Elliotte Rusty Harold and W. Scott Means (O’Reilly), and RELAX NG by Eric van der Vlist (O’Reilly), both exhaustive works on XML and RELAX NG.
There’s remarkably little to a DTD’s semantics, although you will have to use a totally different syntax for notation than you do in XML (an annoyance corrected in both XML Schema and RELAX NG).

Elements

The bulk of the DTD is composed of ELEMENT definitions (covered in this section) and ATTLIST (attribute list) definitions (covered in the next section). An element definition begins with the standard <! opening of a DTD tag, followed by the ELEMENT keyword and then the name of the element. Following that name is the content model of the element. The content model is generally within parentheses and specifies what content can be included within the element. Take the item element, from the RSS 0.91 DTD (http://my.netscape.com/publish/formats/rss-0.91.dtd) as an example:
<!ELEMENT item (title | link | description)*>
This says that for any item element, there may be a title element, a link element, or a description element nested within that item. The “or” relationship is indicated by the pipe (|) symbol; the OR applies to all elements within a group, indicated by the parentheses. In other words, each time the grouping (title | link | description) is matched, one and only one of title, link, or description may appear. The asterisk after the grouping indicates a recurrence: the group may repeat zero or more times, so an item can hold any number of these three elements in any order. Table 2-1 lists the complete set of DTD recurrence modifiers.
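To see this content model enforced, you can hand a document with an internal DTD to a validating parser. The sketch below assumes the JAXP parser that ships with the JDK; the DtdContentModelDemo class, the #PCDATA declarations for the child elements, and the sample item documents are illustrative assumptions rather than excerpts from the RSS 0.91 DTD.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper class; not part of the book's examples.
class DtdContentModelDemo {

  // Internal DTD subset; only the item declaration comes from the text above.
  static final String PROLOG =
      "<?xml version=\"1.0\"?>"
    + "<!DOCTYPE item ["
    + "<!ELEMENT item (title | link | description)*>"
    + "<!ELEMENT title (#PCDATA)>"
    + "<!ELEMENT link (#PCDATA)>"
    + "<!ELEMENT description (#PCDATA)>"
    + "]>";

  /** Validates the given item markup against the DTD and returns any validity errors. */
  static List<String> validate(String itemMarkup) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setValidating(true); // turn on DTD validation
    DocumentBuilder builder = factory.newDocumentBuilder();
    List<String> problems = new ArrayList<>();
    builder.setErrorHandler(new ErrorHandler() {
      public void warning(SAXParseException e) { /* ignored in this sketch */ }
      public void error(SAXParseException e) { problems.add(e.getMessage()); }
      public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
    });
    builder.parse(new InputSource(new StringReader(PROLOG + itemMarkup)));
    return problems;
  }

  public static void main(String[] args) throws Exception {
    // Repeated and reordered children are fine because of the asterisk.
    System.out.println(validate("<item><link>x</link><title>A</title><title>B</title></item>"));
    // An undeclared child element is reported as a validity error.
    System.out.println(validate("<item><author>Neil</author></item>"));
  }
}
```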
XML Schema
XML Schema seeks to improve upon DTDs by adding more typing and quite a few more constructs than DTDs, as well as using XML as the constraint representation format. I’m going to spend relatively little time here talking about schemas, because they are a behind the scenes detail for Java and XML. In the chapters where you’ll be working with schemas, I’ll address any specific points you need to be aware of. However, the specification for XML Schema is so enormous that it would take up an entire book of explanation on its own. As a matter of fact, XML Schema by Eric van der Vlist (O’Reilly) is just that: an entire book on XML Schema.
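As a taste of the extra typing XML Schema offers, the following sketch validates a single typed element through the javax.xml.validation API. The SchemaTypingDemo class, the one-element schema, and the pages element are assumptions invented for this illustration; they are not taken from the book or from any real schema.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class; not part of the book's examples.
class SchemaTypingDemo {

  // Illustrative schema: a single element that must hold a positive integer.
  static final String XSD =
      "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
    + "<xsd:element name=\"pages\" type=\"xsd:positiveInteger\"/>"
    + "</xsd:schema>";

  /** Returns true if the given XML text is valid against the schema above. */
  static boolean isValid(String xml) throws Exception {
    Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
        .newSchema(new StreamSource(new StringReader(XSD)));
    try {
      schema.newValidator().validate(new StreamSource(new StringReader(xml)));
      return true;
    } catch (SAXException e) {
      // With no error handler registered, validity errors surface as exceptions.
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isValid("<pages>479</pages>"));
    System.out.println(isValid("<pages>lots</pages>"));
  }
}
```

A DTD could only say that pages contains character data; the schema can insist that the data is a number.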
Before getting into the actual schema constructs, take a look at a typical XML Schema root element:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:dw="http://www.ibm.com/developerWorks/"
    elementFormDefault="unqualified"
    attributeFormDefault="unqualified" version="4.0">
There’s quite a bit going on here, including two different namespace declarations. First, the XML Schema namespace itself is attached to the xsd prefix, allowing separation of XML Schema constructs from the elements and attributes being constrained. Next, the dw namespace is defined; this particular example is from the IBM DeveloperWorks XML article template, and dw is used for DeveloperWorks-specific constructs.
Then, the values of attributeFormDefault and elementFormDefault are set to "unqualified". With this setting, locally declared elements and attributes appear in XML instance documents without namespace qualification. Qualification is a fairly tricky idea, largely because attributes in XML do not fall into the default namespace; they must explicitly be assigned to a namespace. For a lot more on qualification, check out the relevant portion of the XML Schema specification at http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/structures.html#element-schema.
Finally, the version attribute is given a value of "4.0". This is used to indicate the version of this particular schema, not of the XML Schema specification being used. The namespace assigned to the
RELAX NG
RELAX NG is, in many senses, the rebel child in the constraint family. While DTDs and XML Schema are both W3C specifications (or at least part of a specification, in the case of DTDs), RELAX NG is not endorsed or “blessed” by the W3C. And, even though it has been developed underneath the OASIS umbrella (http://www.oasis-open.org/home/index.php), RELAX NG is still seen as almost a grassroots effort to compete with—or at least provide an alternative to—XML Schema. Whatever you think about the political standing of RELAX NG, though, any good XML programmer should have RELAX NG in her constraint toolkit.
RELAX NG, like XML Schema, is pure XML. You start out by nesting everything within a grammar element:
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- Content model for XML -->
</grammar>
This sets up the namespace for all the elements you use, which are of course all part of the RELAX NG syntax. datatypeLibrary tells the schema where to pull data types from (covered in the “Data types” section later) when you type elements and attributes. You don’t have to put this on your root element, but you’ll find that’s the best place to locate the reference; otherwise, you end up burying it somewhere in the middle of your schema, and that’s a maintenance pain.
As with XML Schema, you should always use the same URI for the namespace here (http://relaxng.org/ns/structure/1.0).
You’ll find that most of the RELAX NG constructs are pretty intuitive; I’ll run through the highlights.

Elements

You define elements using the element keyword, and nestings within an XML document are represented by nestings within the RELAX NG schema:
<element name="phonebook">
  <element name="entry">
    <element name="firstName">
      <text/>
    </element>
    <element name="lastName">
      <text/>
    </element>
    <!-- etc... -->
  </element>
</element>
In fact, you should already be seeing one of the cooler features of RELAX NG: its structure closely mirrors the structure of the document it’s constraining.
Chapter 3: SAX
XML is fundamentally about data; programming with XML, then, has to be fundamentally about getting at that data. That process, called parsing, is the basic task of the APIs I’ll cover in the next several chapters. This chapter describes how an XML document is parsed, focusing on the events that occur within this process. These events are important, as they are all points where application-specific code can be inserted and data manipulation can occur.
I’m also going to introduce you to one of the two core XML APIs in Java: SAX, the Simple API for XML (http://www.saxproject.org). SAX is what makes insertion of this application-specific code into events possible. The interfaces provided in the SAX package are an important part of any programmer’s toolkit for handling XML. Even though the SAX classes are small and few in number, they provide a critical framework for Java and XML to operate within. Solid understanding of how they help in accessing XML data is critical to effectively leveraging XML in your Java programs.
For the impatient, the other of those two core APIs is DOM. Coverage of DOM begins in Chapter 5.
Setting Up SAX
I’m increasingly of the “learning is best done by doing” philosophy, so I’m not going to hit you with a bunch of concept and theory before getting to code. SAX is a simple API, so you only need to understand its basic model, and how to get the API on your machine; beyond that, code will be your best teacher.
SAX uses a callback model for interacting with your code; you may also have heard this model called event-based programming. Whatever you call it, it’s a bit of a departure for object-oriented developers, so give it some time if you’re new to this type of programming.
In short, the parsing process is going to hum along, tearing through an XML document. Every time it encounters a tag, or comment, or text, or any other piece of XML, it calls back into your code, signaling that an event has occurred. Your code then has an opportunity to act, based on the details of that event.
For example, if SAX encounters the opening tag of an element, it fires off a startElement event. It provides information about that event, such as the name of the element, its attributes, and so on, and then your code gets to respond. You, as a programmer, have to write code for each event that is important to you—from the start of a document to a comment to the end of an element. This process is summed up in Figure 3-1.
Figure 3-1: The parsing process is controlled by the parser and your code listens for events, responding as they occur
What’s different about this model is that your code is not active, in the sense that it doesn’t ever instruct the parser, “Hey, go and parse the next element.” It’s passive, in that it waits to be called, and then leaps into action. This takes a little getting used to, but you’ll be an old hand by the end of the chapter.
Swing and AWT programmers, as well as EJB experts, are familiar with this approach to programming.
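To get a feel for the callback model before going further, here is a small sketch that records each event as the parser reports it. It relies on org.xml.sax.helpers.DefaultHandler and the JAXP SAXParserFactory from the JDK; the EventLogDemo class and the two-element sample document are assumptions for illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of the book's examples.
class EventLogDemo {

  /** Parses the XML text and returns the callbacks in the order the parser fired them. */
  static List<String> events(String xml) throws Exception {
    List<String> log = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override public void startDocument() { log.add("startDocument"); }
      @Override public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
        log.add("startElement " + qName);
      }
      @Override public void endElement(String uri, String localName, String qName) {
        log.add("endElement " + qName);
      }
      @Override public void endDocument() { log.add("endDocument"); }
    };
    // The parser drives the process; the handler only waits to be called.
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return log;
  }

  public static void main(String[] args) throws Exception {
    events("<a><b/></a>").forEach(System.out::println);
  }
}
```

Notice that nothing in this code asks for the next element; the handler methods simply run whenever the parser decides something has happened.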
Unsurprisingly, the SAX API is made up largely of interfaces that define these various callback methods. You would implement the ContentHandler
Parsing with SAX
Without spending any further time on the preliminaries, it’s time to code. As a sample to familiarize you with SAX, this chapter details the SAXTreeViewer class. This utility uses SAX to parse an XML document, and displays the document visually as a Swing JTree.
If you don’t know anything about Swing, don’t worry; I don’t focus on that, but just use it for visual purposes. The focus will remain on SAX, and how events within parsing can be used to perform customized action.
The first thing you need to do in any SAX-based application is get an instance of a class that implements the SAX org.xml.sax.XMLReader interface; remember, this is why you downloaded a SAX-compliant parser in the first place.
SAX provides the org.xml.sax.XMLReader interface for all SAX-compliant XML parsers to implement. For example, the Xerces SAX parser implementation, org.apache.xerces.parsers.SAXParser, implements the XMLReader interface. If you have access to the source of your parser, you should see the same interface implemented in your parser’s main SAX parser class. Each XML parser must have one class (and sometimes has more than one) that implements this interface, and that is the class you need to instantiate to allow for parsing XML:
// Instantiate a Reader
XMLReader reader =
  new org.apache.xerces.parsers.SAXParser();

// Do something with the parser
reader.parse(uri);
For newcomers to SAX, you may be wondering why XMLReader isn’t called Parser. In fact, it was in SAX 1.0, and then so many changes were introduced that the Parser interface had to be deprecated and a new one introduced under a different name. As a result, you’ll call the parse() method on the XMLReader interface.
This approach ties you tightly to your parser vendor, though; you can use SAX’s org.xml.sax.helpers.XMLReaderFactory to get away from this:
XMLReader reader = XMLReaderFactory.createXMLReader();
Just set the org.xml.sax.driver system property, and you can get your vendor’s XMLReader implementation, without importing your vendor’s classes:
java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser [MyClassName]
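Put together, a vendor-neutral program might look something like the sketch below. It assumes that the org.xml.sax.driver property is either set as shown or left unset, in which case the JDK falls back to its built-in parser; the VendorNeutralReaderDemo class and its element-counting handler are invented for illustration. Be aware that newer JDKs mark XMLReaderFactory as deprecated in favor of javax.xml.parsers.SAXParserFactory, hence the suppressed warning.

```java
import java.io.StringReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

// Hypothetical helper class; not part of the book's examples.
class VendorNeutralReaderDemo {

  /** Counts the elements in the XML text without naming any vendor class. */
  @SuppressWarnings("deprecation")
  static int countElements(String xml) throws Exception {
    // Honors -Dorg.xml.sax.driver=... when set, otherwise uses the default parser.
    XMLReader reader = XMLReaderFactory.createXMLReader();
    int[] count = {0};
    reader.setContentHandler(new DefaultHandler() {
      @Override public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
        count[0]++;
      }
    });
    reader.parse(new InputSource(new StringReader(xml)));
    return count[0];
  }

  public static void main(String[] args) throws Exception {
    System.out.println(countElements("<a><b/><b/></a>"));
  }
}
```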
Content Handlers
To let an application do something useful with XML data, you must register handlers with the SAX parser. A handler is nothing more than a set of callbacks that SAX defines; a group, if you will, of related events to which you might want to attach code.
There are four core handler interfaces defined by SAX 2.0: org.xml.sax.ContentHandler, org.xml.sax.ErrorHandler, org.xml.sax.DTDHandler, and org.xml.sax.EntityResolver.
In this chapter, I will discuss ContentHandler and ErrorHandler. I’ll leave discussion of DTDHandler and EntityResolver for the next chapter; it is enough for now to understand that EntityResolver and DTDHandler work just like the other handlers, but just group different behaviors.
Your classes implement one or more of these handlers and fill in the callback methods with working code (or, if you desire, no code at all; this effectively ignores a certain type of event). You then register your handler implementations using setContentHandler(), setErrorHandler(), setDTDHandler(), and setEntityResolver(), all on the XMLReader interface (see Figure 3-4). Then the reader invokes the callback methods on the appropriate handlers during parsing.
Figure 3-4: The handler classes are all passed into the XMLReader interface, and then used during parsing to trigger programmer-defined behaviors
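In code, registration amounts to four setter calls. The sketch below uses org.xml.sax.helpers.DefaultHandler, which implements all four handler interfaces with do-nothing methods, purely as a stand-in for your own classes; the HandlerRegistrationDemo class is an assumption for illustration, and it obtains its XMLReader through JAXP rather than from a specific vendor.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; not part of the book's examples.
class HandlerRegistrationDemo {

  /** Creates a reader and registers the given object for all four handler roles. */
  static XMLReader readerWith(DefaultHandler handler) throws Exception {
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setContentHandler(handler);  // elements, attributes, character data
    reader.setErrorHandler(handler);    // warnings and errors
    reader.setDTDHandler(handler);      // notations and unparsed entities
    reader.setEntityResolver(handler);  // external entity lookup
    return reader;
  }

  public static void main(String[] args) throws Exception {
    DefaultHandler handler = new DefaultHandler();
    XMLReader reader = readerWith(handler);
    System.out.println(reader.getContentHandler() == handler);
  }
}
```

In a real application you would typically pass four different objects, or one class that implements only the interfaces you care about.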
For the SAXTreeViewer example, start by implementing the ContentHandler interface. ContentHandler, as the name implies, details events related to the content of an XML document: elements, attributes, character data, etc. Add the following class to the end of your SAXTreeViewer.java source listing:
class JTreeHandler implements ContentHandler {

  /** Tree Model to add nodes to */
  private DefaultTreeModel treeModel;

  /** Current node to add sub-nodes to */
  private DefaultMutableTreeNode current;

  public JTreeHandler(DefaultTreeModel treeModel,
                      DefaultMutableTreeNode base) {
    this.treeModel = treeModel;
    this.current = base;
  }

  // ContentHandler callback implementations
}
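The preview stops before the callbacks are filled in, so here is one way they might look. This is only a sketch under my own assumptions, not the book's JTreeHandler: the TreeBuildingHandler class extends DefaultHandler to avoid spelling out every ContentHandler method, handles just startElement() and endElement(), and labels nodes with an "Element: " prefix that I chose for illustration.

```java
import java.io.StringReader;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the book's JTreeHandler; names and labels are assumptions.
class TreeBuildingHandler extends DefaultHandler {

  /** Tree model to add nodes to */
  private final DefaultTreeModel treeModel;

  /** Current node to add sub-nodes to */
  private DefaultMutableTreeNode current;

  TreeBuildingHandler(DefaultTreeModel treeModel, DefaultMutableTreeNode base) {
    this.treeModel = treeModel;
    this.current = base;
  }

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    // Add a node for this element, then descend into it.
    DefaultMutableTreeNode element = new DefaultMutableTreeNode("Element: " + qName);
    treeModel.insertNodeInto(element, current, current.getChildCount());
    current = element;
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    // Step back up to the parent once the element closes.
    current = (DefaultMutableTreeNode) current.getParent();
  }

  /** Parses the XML text and returns the root node of the resulting tree. */
  static DefaultMutableTreeNode buildTree(String xml) throws Exception {
    DefaultMutableTreeNode base = new DefaultMutableTreeNode("XML Document");
    DefaultTreeModel model = new DefaultTreeModel(base);
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), new TreeBuildingHandler(model, base));
    return base;
  }
}
```

The tree model classes work without a visible window, so you can exercise a handler like this before wiring it into a JTree.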
Error Handlers
In addition to providing the ContentHandler interface for handling parsing events, SAX provides an ErrorHandler interface that can be implemented to treat various error conditions that may arise during parsing (see Figure 3-10).
Figure 3-10: ErrorHandler defines only three methods, but how you implement these methods can have a huge impact on the user experience
This interface works in the same manner as the content handler already constructed, but defines only three callback methods: warning(), error(), and fatalError(). Through these three methods, all error conditions are handled and reported by SAX parsers.
Each method receives information about the error or warning that has occurred through a SAXParseException. This object holds the line number where the trouble was encountered, the URI of the document being treated (which could be the parsed document or an external reference within that document), and normal exception details such as a message and a printable stack trace. In addition, each of these methods can throw a SAXException. This may seem a bit odd at first: an exception handler that throws an exception? Keep in mind that each handler receives a parsing exception. This might be a warning that should not cause the parsing process to stop or an error that needs to be resolved for parsing to continue; however, the callback may need to perform system I/O or another operation that can throw another exception, and the method needs to be able to send any problems resulting from these actions up the application chain.
As an example, consider an error handler that receives error notifications and writes those errors to an error log. This callback method needs to be able to either append to or create an error log on the local filesystem. If a warning occurs within the process of parsing an XML document, the warning would be reported to this method. The intent of the warning is to give information to the callback and then continue parsing the document. However, if the error handler cannot write to the logfile, it should notify the parser and application that all parsing should stop. This can be done by catching any I/O exceptions and rethrowing these to the calling application, thus causing any further document parsing to stop. This common scenario is why error handlers must be able to throw exceptions (see Example 3-3).
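The logging scenario just described could be sketched roughly as follows. This is not the book's Example 3-3; the LoggingErrorHandler class, its message format, and the choice to write to a java.io.Writer are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Writer;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical error handler; not the book's Example 3-3.
class LoggingErrorHandler implements ErrorHandler {

  private final Writer log;

  LoggingErrorHandler(Writer log) {
    this.log = log;
  }

  /** Writes one line to the log; an I/O failure is rethrown to stop parsing. */
  private void record(String level, SAXParseException e) throws SAXException {
    try {
      log.write(level + " at line " + e.getLineNumber() + ": " + e.getMessage() + "\n");
      log.flush();
    } catch (IOException io) {
      throw new SAXException("Unable to write to error log", io);
    }
  }

  @Override
  public void warning(SAXParseException e) throws SAXException {
    record("WARNING", e); // parsing continues if the log write succeeds
  }

  @Override
  public void error(SAXParseException e) throws SAXException {
    record("ERROR", e);
  }

  @Override
  public void fatalError(SAXParseException e) throws SAXException {
    record("FATAL", e);
    throw e; // a fatal error always ends parsing
  }
}
```

You would register an instance with setErrorHandler() on your XMLReader; a warning then costs nothing more than a log line, unless the log itself is broken.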
Chapter 4: Advanced SAX
What you’ve seen regarding SAX so far is essentially the simplest way to process and parse XML. And while SAX is indeed named the Simple API for XML, it offers programmers much more than basic parsing and content handling. There is an array of settings that affect parser behavior, as well as several additional handlers for edge-case scenarios; if you need to specify exactly how strings should be interned, or what behavior should occur when a DTD declares a notation, or even differentiate between CDATA sections and regular text sections, SAX provides. In fact, you can even modify and write out XML using SAX (along with a few additional packages); SAX is a full-featured API, and this chapter will give you the lowdown on features that go beyond simple parsing.
End of content preview. The rest of this section is not viewable here.
Properties and Features
Content preview
I glossed over validation in the last chapter, and probably left you with a fair number of questions. When I cover JAXP in Chapter 7, you’ll see that you can use either a method (setValidating()) or a set of classes (javax.xml.validation) to handle validation; you might expect to call a similar method, setValidation() or something like it, to initiate validation in SAX. But then, there’s also namespace awareness, dealt with quite a bit in Chapter 2 (and Chapter 3, with respect to Q names and local names); maybe setNamespaceAwareness()? But what about schema validation? And setting the location of a schema to validate on, if the document doesn’t specify one? There’s also low-level behavior, like telling the parser what to do with entities (parse them? don’t parse them?), how to handle strings, and a lot more. As you can imagine, dealing with each of these could cause real API bloat, adding 20 or 30 methods to SAX’s XMLReader class. And, even worse, each time a new setting was needed (perhaps for the next type of constraint model supported? How about setRelaxNGSchema()?), the SAX API would have to add a method or two, and release a new version. Clearly, this isn’t a very effective approach to API design.
If this isn’t clear to you, check out Head First Design Patterns, by Elisabeth and Eric Freeman (O’Reilly). In particular, read up on Chapter 1 (pages 8 and 9), which details why it’s critical to encapsulate what varies.
To address the ever-changing need to affect parser behavior, without causing constant API change, SAX 2 defines a standard mechanism for setting parser behavior: through the use of properties and features.
In SAX, a property is a setting that requires passing in some Object argument for the parser to use; for instance, certain types of handlers are set by specifying a URI and supplying the Object that implements that handler’s interface. A feature is a setting that is either on (true) or off (false). Two obvious examples come to mind: namespace awareness and validation.
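To make the distinction concrete, here is a minimal sketch of setting two features and one property on an XMLReader. It assumes the reader is obtained through JAXP's SAXParserFactory (my choice for illustration, not necessarily how the chapter does it); the class name FeaturePropertyDemo is hypothetical, while the URIs are the standard SAX 2 identifiers.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class: features are booleans, properties take Objects.
class FeaturePropertyDemo {

  // Standard SAX 2 feature and property URIs
  static final String NAMESPACES =
      "http://xml.org/sax/features/namespaces";
  static final String VALIDATION =
      "http://xml.org/sax/features/validation";
  static final String DECL_HANDLER =
      "http://xml.org/sax/properties/declaration-handler";

  static XMLReader configure() throws Exception {
    // Assumption: the reader comes from the JAXP factory bundled with the JDK
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();

    // Features: on (true) or off (false)
    reader.setFeature(NAMESPACES, true);
    reader.setFeature(VALIDATION, false);

    // Property: an Object handed to the parser (here, a no-op DeclHandler)
    reader.setProperty(DECL_HANDLER, new DefaultHandler2());
    return reader;
  }

  public static void main(String[] args) throws Exception {
    XMLReader reader = configure();
    System.out.println("namespaces: " + reader.getFeature(NAMESPACES));
    System.out.println("validation: " + reader.getFeature(VALIDATION));
  }
}
```

A parser that doesn't recognize or support a given URI signals that with SAXNotRecognizedException or SAXNotSupportedException, so production code would normally catch those rather than let them propagate.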
SAX includes the methods needed for setting properties and features in the
End of content preview. The rest of this section is not viewable here.
Resolving Entities
Content preview
You’ve already seen how to interact with content in the XML document you’re parsing (using ContentHandler), and how to deal with error conditions (ErrorHandler). Both of these are concerned specifically with the data in an XML document. What I haven’t talked about is the process by which the parser goes outside of the document and gets data. For example, consider a simple entity reference in an XML document:
<FM>
  <P>Text placed in the public domain by Moby Lexical Tools, 1992.</P>
  <P>SGML markup by Jon Bosak, 1992-1994.</P>
  <P>XML version by Jon Bosak, 1996-1998.</P>
  <P>&usage-terms;</P>
</FM>
Your schema then indicates to the parser how to resolve that entity:
<!ENTITY usage-terms
    SYSTEM "http://www.newInstance.com/entities/usage-terms.xml">
At parse time, the usage-terms entity reference will be expanded (in this case, to “This work may be freely copied and distributed worldwide.”, as seen in Figure 4-1).
Figure 4-1: The usage-terms entity was resolved to a URI, which was then parsed and inserted into the document
However, there are several cases where you might not want this “default” behavior:
  • You don’t have network access, so you want the entity to resolve to a local copy of the referenced document (perhaps a version you’ve downloaded yourself).
  • You want to substitute your own content for the content specified in the schema.
You can short-circuit normal entity resolution using org.xml.sax.EntityResolver. This interface does exactly what it says: resolves entities. More important, it allows you to get involved in the entity resolution process. The interface defines only a single method, as shown in Figure 4-2.
Figure 4-2: There’s not much to the EntityResolver interface; just a single, albeit useful, method
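As a rough sketch of how such a resolver might look, the class below intercepts the usage-terms system ID from the example above and substitutes local content, returning null for everything else so the parser falls back to its default behavior. The class name LocalEntityResolver, the helper parseText(), and the choice to serve the replacement text from an in-memory string are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: serves one known system ID from local content.
class LocalEntityResolver implements EntityResolver {

  static final String USAGE_TERMS_URI =
      "http://www.newInstance.com/entities/usage-terms.xml";

  @Override
  public InputSource resolveEntity(String publicId, String systemId) {
    if (USAGE_TERMS_URI.equals(systemId)) {
      // Assumption: the local copy is just an in-memory string
      InputSource local = new InputSource(new StringReader(
          "This work may be freely copied and distributed worldwide."));
      local.setSystemId(systemId);
      return local;
    }
    // null tells the parser to resolve the entity the normal way
    return null;
  }

  // Parses the document and returns all character data it contains
  static String parseText(String xml) throws Exception {
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setEntityResolver(new LocalEntityResolver());

    StringBuilder text = new StringBuilder();
    reader.setContentHandler(new DefaultHandler() {
      @Override
      public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
      }
    });
    reader.parse(new InputSource(new StringReader(xml)));
    return text.toString();
  }
}
```

Because the resolver supplies the content itself, no network access is needed for that entity; swapping the StringReader for a FileReader on a downloaded copy would cover the offline case.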
To insert your own logic into the resolution process, create an implementation of this interface, and register it with your XMLReader instance through setEntityResolver(  ). Once that’s done, every time the reader comes across an entity reference, it passes the public ID and system ID for that entity to the
End of content preview. The rest of this section is not viewable here.
Notations and Unparsed Entities
Content preview
After a rather extensive look at EntityResolver, I’m going to cruise through DTDHandler (also in org.xml.sax). In almost nine years of extensive SAX and XML programming, I’ve used this interface only once—in writing JDOM (covered in Chapter 9)—and even then, it was a rather obscure case. Still, if you work with unparsed entities often, are into parser internals, or just want to get into every nook and cranny of the SAX API, then you need to know about DTDHandler. The interface is shown in all its simplicity in Figure 4-4.
Figure 4-4: This handler is concerned with the declaration of certain XML types, rather than the actual content of those entities (if and when they are resolved)
The DTDHandler interface allows you to receive notification when a reader encounters an unparsed entity or notation declaration. Of course, both of these events occur in DTDs, not XML documents, which is why this is called DTDHandler. The two methods listed in Figure 4-4 do exactly what you would expect. The first reports a notation declaration, including its name, public ID, and system ID. Remember the NOTATION structure in DTDs? (Flip back to Chapter 2 if you’re unclear.)
<!NOTATION jpeg SYSTEM "images/jpeg">
The second method provides information about an unparsed entity declaration, which looks as follows:
<!ENTITY stars_logo SYSTEM "http://www.nhl.com/img/team/dal38.gif"
                    NDATA jpeg>
In both cases, you can take action at these occurrences if you create an implementation of DTDHandler and register it with your reader through the XMLReader’s setDTDHandler(  ) method. This is generally useful when writing low-level applications that must reproduce XML content (such as an XML editor), or when you want to build up some Java representation of a DTD’s constraints (such as in a data binding implementation). In most other situations, it isn’t something you will need very often.
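Here is a small sketch of what registering a DTDHandler might look like. It simply records the two kinds of declarations it is told about; the class name RecordingDTDHandler and the parse() helper are hypothetical, and the reader is assumed to come from JAXP's SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical handler that logs notation and unparsed entity declarations.
class RecordingDTDHandler implements DTDHandler {

  final List<String> events = new ArrayList<>();

  @Override
  public void notationDecl(String name, String publicId, String systemId) {
    events.add("notation:" + name);
  }

  @Override
  public void unparsedEntityDecl(String name, String publicId,
                                 String systemId, String notationName) {
    events.add("unparsed:" + name + ":" + notationName);
  }

  // Parses the document and returns the declarations that were reported
  static List<String> parse(String xml) throws Exception {
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    RecordingDTDHandler handler = new RecordingDTDHandler();
    reader.setDTDHandler(handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return handler.events;
  }
}
```

Feeding it a document whose internal DTD subset contains the NOTATION and ENTITY declarations shown above should yield one event for each declaration.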
End of content preview. The rest of this section is not viewable here.
The DefaultHandler Class
Content preview
Because SAX is interface-driven, you have to do a lot of tedious work to get started with an XML-based application. For example, when you write your ContentHandler implementation, you have to implement each and every method of that interface, even if you aren’t inserting behavior into each callback. If you need an ErrorHandler, you add three more method implementations; using DTDHandler? That’s a few more. A lot of times, though, you’re writing lots of no-operation methods, as you only need to interact with a couple of key callbacks.
Fortunately, org.xml.sax.helpers.DefaultHandler can be a real boon in these situations. This class doesn’t define any behavior of its own; however, it does implement ContentHandler, ErrorHandler, EntityResolver, and DTDHandler, and provides empty implementations of each method of each interface. So you can have a single class (call it, for example, MyHandlerClass) that extends DefaultHandler. You then only override the callback methods you’re concerned with. You might implement startElement(  ), characters(  ), endElement(  ), and fatalError(  ), for example. In any combination of implemented methods, though, you’ll save tons of lines of code for methods you don’t need to provide action for, and make your code a lot clearer too. Then, the argument to setErrorHandler(  ), setContentHandler(  ), and setDTDHandler(  ) would be the same instance of this MyHandlerClass.
You can pass a DefaultHandler instance to setEntityResolver(  ) as well, although (as I’ve already said) I discourage mixing EntityResolver implementations in with these other handlers.
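A minimal sketch of the idea follows: a single subclass of DefaultHandler that overrides only the callbacks it cares about and is registered as both content handler and error handler. The class name ElementCounter and its fields are made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: overrides three callbacks, inherits no-ops for the rest.
class ElementCounter extends DefaultHandler {

  int elements = 0;
  final StringBuilder text = new StringBuilder();

  @Override
  public void startElement(String uri, String localName, String qName,
                           Attributes atts) {
    elements++;
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    text.append(ch, start, length);
  }

  @Override
  public void fatalError(SAXParseException e) throws SAXException {
    // Report where parsing broke, then stop the parse
    System.err.println("Fatal error at line " + e.getLineNumber());
    throw e;
  }

  static ElementCounter parse(String xml) throws Exception {
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    ElementCounter handler = new ElementCounter();
    // The same instance serves as content handler and error handler
    reader.setContentHandler(handler);
    reader.setErrorHandler(handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return handler;
  }
}
```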
End of content preview. The rest of this section is not viewable here.
Extension Interfaces
Content preview
SAX provides several extension interfaces. These are interfaces that SAX parsers are not required to support; you’ll find these interfaces in org.xml.sax.ext. In some cases, you’ll have to download these directly from the SAX web site (http://www.saxproject.org), although most parsers will include these in the parser download.
Because parsers aren’t required to support these handlers, never write code that absolutely depends on them, unless you’re sure you won’t be changing parsers. If you can provide enhanced features, but fall back to standard SAX, you’re in a much better position.
The first of these handlers is probably the most useful: org.xml.sax.ext.LexicalHandler. This handler provides methods that can receive notification of several lexical events in an XML document, such as comments, the start and end of entities, DTD declarations, and CDATA sections. In ContentHandler, these lexical events are essentially ignored, and you just get the data and declarations without notification of when or how they were provided.
This is not really a general-use handler, as most applications don’t need to know if text was in a CDATA section or not. However, if you are working with an XML editor, serializer, or other component that must know the exact format of the input document—and not just its contents—then the LexicalHandler can really help you out.
To see this guy in action, you first need to add an import statement for org.xml.sax.ext.LexicalHandler to your SAXTreeViewer.java source file. Once that’s done, you can add LexicalHandler to the implements clause in the nonpublic class JTreeHandler in that source file:
class JTreeHandler implements ContentHandler, ErrorHandler, LexicalHandler {
To get started, look at the first lexical event that might happen in processing an XML document: the start and end of a DTD reference or declaration. That triggers the startDTD(  ) and endDTD(  ) callbacks (I’ve coded up versions appropriate for SAXTreeViewer here):
public void startDTD(String name, String publicID,
                     String systemID)
  throws SAXException {

  DefaultMutableTreeNode dtdReference =
    new DefaultMutableTreeNode("DTD for '" + name + "'");
  if (publicID != null) {
    DefaultMutableTreeNode publicIDNode =
      new DefaultMutableTreeNode("Public ID: '" + publicID + "'");
    dtdReference.add(publicIDNode);
  }
  if (systemID != null) {
    DefaultMutableTreeNode systemIDNode =
      new DefaultMutableTreeNode("System ID: '" + systemID + "'");
    dtdReference.add(systemIDNode);
  }
  current.add(dtdReference);
}

public void endDTD( ) throws SAXException {
  // No action needed here
}
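Outside of SAXTreeViewer, the same callbacks can be tried in isolation. The sketch below is not the book's listing: it assumes a standalone logger built on org.xml.sax.ext.DefaultHandler2 (which supplies empty LexicalHandler methods) and registers it through the standard lexical-handler property. The class name LexicalEventLogger is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical logger for lexical events; DefaultHandler2 implements LexicalHandler.
class LexicalEventLogger extends DefaultHandler2 {

  final List<String> events = new ArrayList<>();

  @Override
  public void startDTD(String name, String publicId, String systemId) {
    events.add("startDTD:" + name);
  }

  @Override
  public void endDTD() {
    events.add("endDTD");
  }

  @Override
  public void comment(char[] ch, int start, int length) {
    events.add("comment:" + new String(ch, start, length).trim());
  }

  @Override
  public void startCDATA() {
    events.add("startCDATA");
  }

  @Override
  public void endCDATA() {
    events.add("endCDATA");
  }

  static List<String> parse(String xml) throws Exception {
    XMLReader reader =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    LexicalEventLogger logger = new LexicalEventLogger();
    // A LexicalHandler is registered as a property, not through a setter
    reader.setProperty(
        "http://xml.org/sax/properties/lexical-handler", logger);
    reader.setContentHandler(logger);
    reader.parse(new InputSource(new StringReader(xml)));
    return logger.events;
  }
}
```

Since support for this handler is optional, a parser that lacks it would reject the setProperty() call, which is a natural place to fall back to plain ContentHandler processing.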
End of content preview. The rest of this section is not viewable here.
Filters and Writers
Content preview
At this point, I want to diverge from the beaten path. There are a lot of additional features in SAX that can really turn you into a power developer, and take you beyond the confines of “standard” SAX. In this section, I’ll introduce you to two of these: SAX filters and writers. Using classes both in the standard SAX distribution and available separately from the SAX web site (http://www.saxproject.org), you can add some fairly advanced behavior to your SAX applications. This will also get you in the mindset of using SAX as a pipeline of events, rather than a single layer of processing.
First on the list is the org.xml.sax.XMLFilter interface that comes in the basic SAX download, and should be included with any parser distribution supporting SAX 2. This interface extends the XMLReader interface, and adds two new methods to it, as shown in Figure 4-8.
Figure 4-8: Extra methods defined by the XMLFilter interface
It might not seem like there is much to say here; what’s the big deal, right? Well, by allowing a hierarchy of XMLReaders through this filtering mechanism, you can build up a processing chain, or pipeline, of events. To understand what I mean by a pipeline, you first need to understand the normal flow of a SAX parse:
  1. Events in an XML document are passed to the SAX reader.
  2. The SAX reader and registered handlers pass events and data to an application.
What developers started realizing, though, is that it is simple to insert one or more additional links into this chain:
  1. Events in an XML document are passed to the SAX reader.
  2. The SAX reader performs some processing and passes information to another SAX reader.
  3. Repeat until all SAX processing is done.
  4. Finally, the SAX reader and registered handlers pass events and data to an application.
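The four steps above can be sketched with org.xml.sax.helpers.XMLFilterImpl, a convenience base class that implements XMLFilter and passes every event through unchanged. The filter below, a made-up UpperCaseFilter, sits between the real reader and the application's handler and rewrites element names on the way past; both class names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLFilter;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: one link in the pipeline that upper-cases element names.
class UpperCaseFilter extends XMLFilterImpl {

  UpperCaseFilter(XMLReader parent) {
    super(parent);
  }

  @Override
  public void startElement(String uri, String localName, String qName,
                           Attributes atts) throws SAXException {
    super.startElement(uri, localName.toUpperCase(),
                       qName.toUpperCase(), atts);
  }

  @Override
  public void endElement(String uri, String localName, String qName)
      throws SAXException {
    super.endElement(uri, localName.toUpperCase(), qName.toUpperCase());
  }
}

class FilterPipelineDemo {

  static List<String> elementNames(String xml) throws Exception {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader reader = factory.newSAXParser().getXMLReader();

    // reader -> filter -> application handler
    XMLFilter filter = new UpperCaseFilter(reader);
    List<String> names = new ArrayList<>();
    filter.setContentHandler(new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName,
                               Attributes atts) {
        names.add(qName);
      }
    });
    // Parsing is started on the outermost link of the chain
    filter.parse(new InputSource(new StringReader(xml)));
    return names;
  }
}
```

Further filters could be stacked by passing one filter as the parent of the next, since every XMLFilter is itself an XMLReader.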
It’s the middle two steps that create a pipeline, where one reader that performed specific processing passes its information on to another reader, repeatedly, instead of having to lump all code into one reader. When this pipeline is set up with multiple readers, modular and efficient programming results. And that’s what the
End of content preview. The rest of this section is not viewable here.
Chapter 5: DOM
Content preview
SAX is just one of several APIs that allow XML work to be done within Java. This chapter and the next will widen your API knowledge as I introduce the Document Object Model, commonly called the DOM. This API is quite a bit different from SAX, and complements the Simple API for XML in many ways. You’ll need both, as well as the other APIs and tools in the rest of this book, to be a competent XML developer.
Because DOM is fundamentally different from SAX, I’ll spend a good bit of time discussing the concepts behind DOM, and why it might be used instead of SAX for certain applications. Selecting any XML API involves tradeoffs, and choosing between DOM and SAX is certainly no exception. Then I’ll move on to possibly the most important topic: code. I’ll introduce you to a utility class that serializes DOM trees and will provide a pretty good look at the DOM structure and related classes. This will get you ready for some more advanced DOM work.
End of content preview. The rest of this section is not viewable here.
The Document Object Model
Content preview
The DOM, unlike SAX, has its origins in the World Wide Web Consortium (W3C; online at http://www.w3.org). Whereas SAX is public domain software, developed through long discussions on the XML-dev mailing list, DOM is a standard—just like the actual XML specification. The DOM is designed to represent the content and model of XML documents across all programming languages and tools. On top of that specification, there are several language bindings. These bindings exist for JavaScript, Java, CORBA, and other languages, allowing the DOM to be a cross-platform and cross-language specification.
In addition to being different from SAX in regard to standardization and language bindings, the DOM is organized into “levels” instead of versions. DOM Level 1 is an accepted recommendation, and you can view the completed specification at http://www.w3.org/TR/REC-DOM-Level-1. Level 1 details the functionality and navigation of content within a document.
A document in the DOM is not just limited to XML, but can be HTML or other content models as well.
DOM Level 2, which was finalized in November of 2000, adds core functionality to DOM Level 1. There are also several additional DOM modules and options aimed at specific content models, such as XML, HTML, and CSS. These less-generic modules begin to “fill in the blanks” left by the more general tools provided in DOM Level 1. You can view the current DOM Level 2 Recommendation at http://www.w3.org/TR/DOM-Level-2-Core. This is actually the recommendation for the DOM Core; all the supplemental modules are represented by their own specifications:
DOM Level 2 Views (http://www.w3.org/TR/DOM-Level-2-Views)
The Views module deals with interaction between an XML document and some type of stylesheet or presentation aspect. For instance, the same XML document could be styled by multiple CSS or XSL stylesheets; each of the resulting documents would be a view. It turns out that this module isn’t that useful, as Java tools for document transformation are plentiful; most parsers won’t support this module.
End of content preview. The rest of this section is not viewable here.
Serialization
Content preview
Typically, I’d come up with some clever example for using DOM at this point, and use it to demonstrate how the API works. However, DOM leaves a rather gaping hole, and filling that hole proves to be a good DOM tutorial, as well as having practical value. This hole, of course, is serialization. Serialization is the process of taking an XML document in memory, represented as a DOM tree, and writing it to disk (or to a stream).
If you’re lucky enough to have a parser that implements the DOM Level 3 Load and Save module, then outputting a DOM tree isn’t a problem for you. Most parsers don’t provide that support—or slap experimental all over it—and it becomes a real problem for DOM programming.
Before you can serialize a DOM tree representing some XML, though, you need to read that XML in the first place. Since you’ll usually be reading XML from a file, I’ll show you how to do just that. Example 5-1 is a sample class that takes an XML filename, and loads the document into a DOM tree, represented by the org.w3c.dom.Document interface.
Example 5-1. This test class reads in an XML document and loads it into a DOM tree
package javaxml3;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.xml.sax.InputSource;
import org.w3c.dom.Document;

// Parser import
import org.apache.xerces.parsers.DOMParser;

public class SerializeTester {

  // File to read XML from
  private File inputXML;

  // File to serialize XML to
  private File outputXML;

  public SerializeTester(File inputXML) {
    this.inputXML = inputXML;
  }

  public void test(OutputStream outputStream) 
    throws Exception {

    DOMParser parser = new DOMParser(  );

    // Get the DOM tree as a Document object

    // Serialize
  }

  public static void main(String[] args) {
    if (args.length != 2) {
      System.out.println(
        "Usage: java javaxml3.SerializeTester " +
        "[XML document to read] " +
        "[filename to write output to]");
      return;
    }

    try {
      SerializeTester tester = 
        new SerializeTester(new File(args[0]));
      tester.test(new FileOutputStream(new File(args[1])));
    } catch (Exception e) {
      e.printStackTrace( );
    }
  }
}
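For readers who want to try the load-then-serialize round trip before the placeholders above are filled in, here is a sketch that swaps in different tools: it uses the JAXP DocumentBuilder to load the tree and an identity Transformer to write it back out, rather than the Xerces DOMParser and the serializer this section goes on to develop. The class name JaxpRoundTrip is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: JAXP stand-in for loading and serializing a DOM tree.
class JaxpRoundTrip {

  // Loads XML text into an org.w3c.dom.Document
  static Document load(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));
  }

  // Writes the tree back out using an identity transformation
  static String serialize(Document doc) throws Exception {
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    identity.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    Document doc = load("<item id=\"1\"><name>Guitar</name></item>");
    System.out.println(serialize(doc));
  }
}
```

This hides all of the tree walking inside the transformer, so it is convenient but teaches little about the DOM structure itself, which is the point of writing a serializer by hand.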
End of content preview. The rest of this section is not viewable here.
Modifying and Creating XML
Content preview
The biggest limitation when using SAX for dealing with XML is that you cannot change any of the XML structure you encounter, at least not without using filters and writers. Those aren’t intended to be used for wholesale document changes anyway, so you’ll need to use another API when you want to modify XML. DOM fits the bill nicely, as it provides XML creation and modification facilities.
In working with DOM, the process of creating an XML document is quite different from changing an existing one, so I’ll take them one at a time. This section gives you a fairly realistic example to mull over. If you’ve ever been to an online auction site like eBay, you know that the most important aspects of the auction are the ability to find items, and the ability to find out about items. These functions depend on a user entering in a description of an item, and the auction using that information. The better auction sites allow users to enter in some basic information as well as actual HTML descriptions, which means savvy users can bold, italicize, link, and add other formatting to their items’ descriptions. This provides a good case for using DOM.
To get started, a little bit of groundwork is needed. Example 5-3 shows a servlet that displays a simple HTML form that takes basic information about an item to be listed on an auction site. This would obviously be dressed up more for a real site, but you get the idea.
Example 5-3. This servlet-generated form submits the data it collects to itself
package javaxml3;

import java.io.File;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// DOM imports
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

// Parser import
import org.apache.xerces.dom.DOMImplementationImpl;

public class UpdateItemServlet extends HttpServlet {

  // Directory that item listings are written to
  private String outputDir;

  public void init(ServletConfig config) throws ServletException {
    super.init(config);
    outputDir = config.getInitParameter("OutputDirectory");
    if (outputDir == null) outputDir = "";
  }

  public void doGet(HttpServletRequest req, HttpServletResponse res)
    throws ServletException, IOException {

    // Set the content type before obtaining the writer
    res.setContentType("text/html");
    PrintWriter out = res.getWriter( );

    // Assumption: the listing as shown never defines target; since the
    // form submits to itself, the request's own URI is used here
    String target = req.getRequestURI( );

    // Output HTML        
    out.println("<html>");
    out.println(" <head><title>Input/Update Item Listing</title></head>");
    out.println(" <body>");
    out.println("  <h1 align='center'>Input/Update Item Listing</h1>");
    out.println("  <p align='center'>");
    out.println("   <form method='POST' action='" + target + "'>");
    out.println("    Item ID (Unique Identifier): <br />");
    out.println("    <input name='id' type='text' maxLength='10' />" +
        "<br /><br />");
    out.println("    Item Name: <br />");
    out.println("    <input name='name' type='text' maxLength='50' />" +
        "<br /><br />");
    out.println("    Item Description: <br />");
    out.println("    <textarea name='description' rows='10' cols='30' " +
        "wrap='wrap' ></textarea><br /><br />");
    out.println("    <input type='reset' value='Reset Form' />&nbsp;&nbsp;");
    out.println("    <input type='submit' value='Add/Update Item' />");
    out.println("   </form>");
    out.println("  </p>");
    out.println(" </body>");
    out.println("</html>"); 
    out.close( );
  }
}
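The DOM imports in the listing hint at where this is headed: turning the submitted form fields into an XML document. As a rough sketch of that creation step, and not the book's own code, the class below builds a small item document from the three form values. The element and attribute names (item, id, name, description) and the class name ItemDocumentBuilder are assumptions, and the empty Document comes from JAXP rather than the Xerces DOMImplementationImpl imported above.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical builder: creates an item document from three form values.
class ItemDocumentBuilder {

  static Document build(String id, String name, String description)
      throws Exception {
    // Assumption: an empty Document is obtained through JAXP
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .newDocument();

    // Root element carries the unique identifier as an attribute
    Element item = doc.createElement("item");
    item.setAttribute("id", id);
    doc.appendChild(item);

    // Child elements hold the name and description as text nodes
    Element nameElement = doc.createElement("name");
    nameElement.appendChild(doc.createTextNode(name));
    item.appendChild(nameElement);

    Element descriptionElement = doc.createElement("description");
    descriptionElement.appendChild(doc.createTextNode(description));
    item.appendChild(descriptionElement);

    return doc;
  }
}
```

Note that every node is created by the Document and only then attached to its parent; nodes created by one Document can't simply be appended to another.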
End of content preview. The rest of this section is not viewable here.
Namespaces
Content preview
So far, I’ve basically punted on the issue of XML namespaces. Happily, DOM does support namespaces, so let’s get into that now. This support is achieved through two methods on the Node interface: getPrefix(  ) and getNamespaceURI(  ). Additionally, all of the creation methods have namespace-aware versions available. So, instead of calling createElement(  ), you call createElementNS(  ).
In each of these new namespace-aware methods, the first argument is the namespace URI, and the second is the qualified name of the element, attribute, etc. Note that I said qualified; this means that if you want to use a namespace URI of http://www.ajaxian.com and a prefix of ajax on an element called blog-entry, you would call createElementNS("http://www.ajaxian.com", "ajax:blog-entry"). This is very important, and remembering to use that prefix will save you a lot of time down the road. Calling getPrefix(  ) on that new element will return "ajax".
If you want the element in the default namespace (with no prefix), just pass in the element name (the local name), and you’re all set. Calling getPrefix(  ) on a default-namespaced element returns null, by the way, as it does on an element not in any namespace.
The prefix tells you very little about whether an element is in a namespace. Elements with a default namespace (and no prefix) have the same return value from getPrefix(  ) as elements not in any namespace.
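The three cases just described (prefixed, default-namespaced, and not in any namespace) can be compared side by side in a short sketch. The class name NamespaceDemo is hypothetical, and the empty Document is assumed to come from a namespace-aware JAXP factory.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo of createElementNS() versus createElement().
class NamespaceDemo {

  static final String AJAXIAN = "http://www.ajaxian.com";

  // Returns { prefixed, default-namespaced, no-namespace } elements
  static Element[] create() throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    Document doc = factory.newDocumentBuilder().newDocument();

    // Qualified name includes the prefix
    Element prefixed = doc.createElementNS(AJAXIAN, "ajax:blog-entry");
    // Local name only: element is in the namespace, but has no prefix
    Element defaulted = doc.createElementNS(AJAXIAN, "blog-entry");
    // Not namespace-aware at all
    Element plain = doc.createElement("blog-entry");

    return new Element[] { prefixed, defaulted, plain };
  }

  public static void main(String[] args) throws Exception {
    for (Element e : create()) {
      System.out.println(e.getNodeName()
          + " prefix=" + e.getPrefix()
          + " namespace=" + e.getNamespaceURI());
    }
  }
}
```

Checking getNamespaceURI(  ) rather than getPrefix(  ) is what actually distinguishes the second element from the third.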
Rather than simply list all the new namespace-aware methods, let’s look at some real code. Here’s the bulk of the
End of content preview. The rest of this section is not viewable here.
Chapter 6: DOM Modules
Content preview
Chapter 5 introduced and detailed the DOM API, and specifically what is called the DOM core. This is the portion of DOM that is most used, as it handles basic XML reading, as well as document creation. However, there are times when basic XML isn’t enough—whether you’re working with XML, or writing a document editor, or trying to serialize XML using the latest DOM APIs. In these specialized cases, you will often find a DOM module that can help.
I summarized the complete set of DOM specifications, including DOM modules, in Chapter 5. In this chapter, I’ll detail each module, and show you how you can use these modules in your applications.
Since DOM Level 3 is still new and largely unsupported, I’ve split coverage of these modules depending on the DOM Level they are based on. Most current parsers support at least a few of the DOM Level 2 modules, and a few will support beta versions of the DOM Level 3 modules.
End of content preview. The rest of this section is not viewable here.
Checking for Module Support
Content preview
As a brief refresher (and so you’re not constantly flipping back to Chapter 5), Table 6-1 lists the DOM modules.
Table 6-1: Each module has a specific name used to query a parser for module support
Specification | Module name | Summary of purpose
DOM Level 2 Core | XML | Extends the DOM Level 1 specification; deals with basic DOM structures like Element, Attr, Document, etc.
DOM Level 2 Views | Views | Provides a model for scripts to dynamically update a DOM structure
DOM Level 2 Events | Events | Defines an event model for programs and scripts to use in working with DOM
DOM Level 2 Style | CSS | Provides a model for CSS based on the DOM Core and DOM Views specifications
DOM Level 2 Traversal and Range | Traversal/Range | Defines extensions to the DOM for traversing a document and identifying the range of content within that document
DOM Level 2 HTML | HTML | Extends the DOM to provide interfaces for dealing with HTML structures in a DOM format
DOM Level 3 Core | XML | Expands DOM Level 2 to provide bootstrapping of DOM implementations and support for XML InfoSet
DOM Level 3 Load & Save | LS | Defines DOM extensions for loading and writing XML documents to a persistent storage mechanism, like a filesystem, in a vendor-neutral manner
DOM Level 3 Validation | Validation | Allows DOM trees to be validated (in memory) and checked for validity as new Nodes are added to the tree
DOM parsers are not required to implement these modules, so you need to verify that the features you want to use are supported by your XML parser. The DOMImplementation interface provides the hasFeature(  ) method for just that purpose, as seen in Example 6-1. You will need to change the name of your vendor’s DOMImplementation class, but other than that adjustment, it should work for any parser.
Example 6-1. Supply the module name and a version to the hasFeature(  ) method to see if a particular DOM parser implementation supports a certain module
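The listing itself isn't part of this preview, so what follows is only a sketch of the same idea, not the book's Example 6-1. To avoid naming a vendor class, it assumes the DOMImplementation is obtained through JAXP; the class name ModuleSupportCheck is hypothetical, and the module names and versions are taken from Table 6-1.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

// Hypothetical checker: asks the DOM implementation about each module.
class ModuleSupportCheck {

  // Module name and version pairs, following Table 6-1
  static final String[][] MODULES = {
    { "XML", "2.0" }, { "Views", "2.0" }, { "Events", "2.0" },
    { "CSS", "2.0" }, { "Traversal", "2.0" }, { "Range", "2.0" },
    { "HTML", "2.0" }, { "XML", "3.0" }, { "LS", "3.0" },
    { "Validation", "3.0" }
  };

  static Map<String, Boolean> check() throws Exception {
    // Assumption: JAXP supplies the implementation, so no vendor import
    DOMImplementation impl = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .getDOMImplementation();

    Map<String, Boolean> support = new LinkedHashMap<>();
    for (String[] module : MODULES) {
      support.put(module[0] + " " + module[1],
                  impl.hasFeature(module[0], module[1]));
    }
    return support;
  }

  public static void main(String[] args) throws Exception {
    check().forEach((module, supported) ->
        System.out.println(module + ": "
            + (supported ? "supported" : "not supported")));
  }
}
```

The results vary from parser to parser, which is exactly why the check is worth running before relying on a module.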
End of content preview. The rest of this section is not viewable here.
DOM Level 2 Modules
Content preview
I’ll start with the DOM Level 2 modules. You should expect to find support for most of these (with the repeated exception of HTML-related modules) in most modern DOM-compliant parsers.
First up on the list is the DOM Level 2 Traversal module. This module provides tree-walking capability in a highly customizable manner. In particular, the DOM Traversal module is useful when you don’t know (or aren’t sure about) the structure of an XML document you’re parsing.
The whole of the traversal module is contained within the org.w3c.dom.traversal package. Just as everything within core DOM begins with a Document interface, everything in DOM Traversal begins with the org.w3c.dom.traversal.DocumentTraversal interface. This interface provides two methods:
NodeIterator createNodeIterator(Node root, int whatToShow, NodeFilter filter,
                                boolean expandEntityReferences);

TreeWalker createTreeWalker(Node root, int whatToShow, NodeFilter filter,
                            boolean expandEntityReferences);
Most DOM implementations that support traversal choose to have their org.w3c.dom.Document implementation class implement the DocumentTraversal interface as well; in Xerces, you can use the default Document implementation, and you’re all set. DocumentTraversal is shown along with the rest of the traversal classes in Figure 6-1.
Figure 6-1: DOM Traversal module
There are just three other interfaces to worry about (all in the org.w3c.dom.traversal package); all focus on selecting certain DOM nodes, and working with the results of that selection. NodeFilter does just what it sounds like it does: provides a means of selecting only certain nodes based on filtering criteria. Using a NodeIterator provides a list view of the nodes iterated over, and the TreeWalker interface provides a tree view of that same data.
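As a quick sketch of the list view, the code below casts a parsed Document to DocumentTraversal and iterates over every element beneath the root. It assumes, as described above, that the parser's Document implementation also implements DocumentTraversal (true of the Xerces-derived parser bundled with the JDK, but worth verifying with hasFeature(  ) elsewhere). The class name TraversalDemo is hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.xml.sax.InputSource;

// Hypothetical demo: flat, document-order iteration over element nodes.
class TraversalDemo {

  static List<String> elementNames(String xml) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(xml)));

    // Assumption: the Document implementation supports DOM Traversal
    DocumentTraversal traversal = (DocumentTraversal) doc;

    // Show only elements; no custom NodeFilter; expand entity references
    NodeIterator iterator = traversal.createNodeIterator(
        doc.getDocumentElement(), NodeFilter.SHOW_ELEMENT, null, true);

    List<String> names = new ArrayList<>();
    for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
      names.add(n.getNodeName());
    }
    return names;
  }
}
```

Passing a NodeFilter instead of null is where the customization comes in: its acceptNode(  ) method decides, node by node, what the iterator returns.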

Selecting nodes

One of the more popular applications in today’s web-centric world is a spider, or crawler, that searches and indexes web pages. Google has also begun to add more and more power to its search engine (
End of content preview. The rest of this section cannot be viewed here.
DOM Level 3 Modules
DOM Level 3 seems to be the point at which the specification maintainers got very practical about the API. While modules like Traversal, Range, Events, and HTML are nice, they’re not the sorts of things you’ll find yourself using every day, at least in most programming environments. However, the ability to validate a DOM tree in memory, as well as to write out XML documents, is something you’re more likely to need every hour, let alone just once in a while. Fortunately, these key improvements are being adopted fairly quickly, so expect widespread DOM Level 3 support within the next year.
Personally, I’m as excited about the Load and Save module as I am about anything that has come out of DOM since the first edition of this book came out seven years ago. In short, this module allows you to excise the following line from your code, once and for all:
import org.apache.xerces.parsers.DOMParser;
Now, I’m as much a fan of Xerces as anyone, but I just don’t like vendor-specific code in my classes. I’d much rather configure code with system properties, and be able to change parsers, processors, and the like all on the fly. Load and Save (LS) fills this need nicely.

Reading XML documents

There are quite a few classes involved in loading a DOM tree; they’re all in the org.w3c.dom.ls package, and shown in Figure 6-8.
Figure 6-8: DOM Load and Save module
I’m not going to cover every nuance of each of these, but instead will focus on what most of you care about: loading a DOM tree without using Xerces (or some other parser) directly.
First, find an instance of org.w3c.dom.bootstrap.DOMImplementationRegistry; this class was covered in the last chapter and is critical for using the LS module:
DOMImplementationRegistry registry =
  DOMImplementationRegistry.newInstance(  );
Remember, you can request DOM implementations from this registry via the getDOMImplementation(  ) method; just use the "LS" string to get an LS-capable implementation.
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
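As a rough sketch of where this is heading, the following class loads a document from a string and writes it back out using only org.w3c.dom interfaces. It assumes the DOM implementation found by the registry supports the "LS" feature (the parser bundled with current JDKs does); the class and method names are hypothetical.

```java
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

class LoadSaveSketch {

    /** Ask the registry for an LS-capable implementation; no vendor import needed. */
    static DOMImplementationLS lsImplementation() throws Exception {
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        return (DOMImplementationLS) registry.getDOMImplementation("LS");
    }

    /** Load: build a DOM tree from XML held in a string. */
    static Document load(String xml) throws Exception {
        DOMImplementationLS impl = lsImplementation();
        LSParser parser = impl.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        LSInput input = impl.createLSInput();
        input.setStringData(xml);
        return parser.parse(input);
    }

    /** Save: serialize a DOM tree back to a string. */
    static String save(Document doc) throws Exception {
        LSSerializer serializer = lsImplementation().createLSSerializer();
        return serializer.writeToString(doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = load("<book><title>Java and XML</title></book>");
        System.out.println(doc.getDocumentElement().getTagName());
        System.out.println(save(doc));
    }
}
```

Note that nothing here names Xerces or any other parser; swapping implementations becomes a matter of configuration rather than code.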
End of content preview. The rest of this section cannot be viewed here.
Chapter 7: JAXP
With SAX and DOM, there aren’t a whole lot of XML problems you run into that you can’t solve. Loading, reading, and writing XML are all handled by these APIs, and you can even avoid vendor-specific code with the tricks you’ve already seen in previous chapters. However, Java remains a Sun creation (as much as I’d love to see the language go open source), and as a rule, Sun is going to provide an API for anything it sees as common—the thinking, I suppose, is that if a programmer is going to work with Java, he should be using Sun software as much as possible.
Along those lines, Sun provides JAXP for working with XML. Although initially a very small API that handled only parsing, the latest version of JAXP provides everything you find in SAX and DOM, as well as a few extras, and JAXP makes vendor neutrality much easier than using DOM or SAX directly. In this chapter, I’ll walk you through JAXP piece by piece, from parsing to validation to transformations.
End of content preview. The rest of this section cannot be viewed here.
More Than an API
Before you get too far into working with JAXP, you need to understand a little bit about exactly what JAXP is. Sun calls it the Java API for XML Processing, although it might better be known as the Java Abstraction Layer for XML Processing. JAXP doesn’t provide any original functionality, but instead sits on top of existing APIs—most notably SAX and DOM, which of course you’re already familiar with, as well as TrAX and a few other APIs which you’ll learn about in this chapter.
For parsing XML, JAXP allows you to make method calls that affect either SAX parsing or DOM processing. As you’ll see shortly, you can either work with SAX and DOM through the JAXP layer, or use JAXP to obtain a SAX or DOM parser and then interact directly with those APIs.
For those of you who think you’ll never use SAX or DOM, this should help you change your mind. Anyone who uses JAXP is going to need to have at least passing familiarity with SAX and DOM, and if you want to get the most out of JAXP, you had better know these underlying APIs inside and out. JAXP doesn’t replace SAX or DOM; it simply supplements them.
In addition to providing a Sun-endorsed means of operating upon XML, recent versions of Sun’s JDK and JRE come bundled with JAXP. For example, Java 5.0 includes JAXP 1.3 (the very latest and greatest) alongside other standard Java APIs like Swing, AWT, and Collection classes. Even more important, the servers and systems you deploy on will all have JAXP support as long as they have a recent version of Java running on them. This guarantee makes it a lot simpler to write XML applications, and know they’ll run normally on various servers.
In the same vein, JAXP provides a full-featured validation API. Unlike SAX and DOM, though, JAXP breaks out most of its validation functionality into a separate package and set of classes. With this separation comes a lot more flexibility, allowing you to work with DTDs, XML Schema, or even RELAX NG schemas, all while staying within the JAXP framework.
JAXP provides for XML transformations in addition to parsing and validation. You can process XSL stylesheets, apply them to XML documents, and even re-process and validate the output. While you’re already familiar with the APIs that underlie JAXP’s parsing, the XML transformations API will be new to many of you. With this API, called TrAX (the Transformation API for XML), you can handle all the transformation tasks you’ll probably ever run into.
End of content preview. The rest of this section cannot be viewed here.
Parsing XML
XML is pretty much useless unless you can get at the data it represents, so any good API begins with the parsing process. JAXP uses SAX and DOM to get the job done, so this section is largely about how JAXP interacts with those APIs; you should already have the intricacies of SAX and DOM down, and I’ll leave out repeated discussions of how these underlying APIs work.
JAXP makes heavy use of the factory model. In general, you’ll get a factory for the type of parsing you want to use with a static method on the factory itself. Then, you perform optional configuration on the factory, and obtain a parser. Once you’ve got the parser, you can set some more options, and then actually parse XML. This is the same process used in both SAX and DOM (as well as TrAX), and it’s illustrated by the simple flowchart in Figure 7-1.
Figure 7-1: The JAXP parsing process is the same for both SAX and DOM
Working with JAXP and SAX is really just a matter of plugging in the right class and interface names. Along those lines, you need to familiarize yourself with the javax.xml.parsers package.
By now, I’m assuming you’ve got a current version of Java and the JDK. As I write this text, Java 5.0 is still somewhat new, but I imagine by the time you’re holding this book, it will be fairly commonplace. I recommend moving to Java 5.0 (even if you choose not to take advantage of its new features), simply for JAXP 1.3. That’s the version of JAXP covered in this chapter, and throughout the rest of this book. If you can’t move to Java 5.0 for any reason, don’t worry, JAXP 1.3 is available as a separate download from https://jaxp.dev.java.net.
There are only two classes to worry about, as well as an Exception and an Error. It can’t get much simpler than that, can it? The UML for these is shown in Figure 7-2.
Figure 7-2: The SAX portion of JAXP uses a factory, a parser, an Exception, and an Error, all of which are almost completely self-explanatory
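The factory-then-parser sequence described above can be sketched for SAX as follows. This is only an illustration under simple assumptions: the XML comes from a string, the handler just records element names, and the class name JaxpSaxSketch is invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class JaxpSaxSketch {

    static List<String> elementNames(String xml) throws Exception {
        // 1. Get a factory through the static method on the factory class
        SAXParserFactory factory = SAXParserFactory.newInstance();

        // 2. Optional configuration on the factory
        factory.setNamespaceAware(true);

        // 3. Obtain a parser (may throw ParserConfigurationException)
        SAXParser parser = factory.newSAXParser();

        // 4. Parse, handing events to an ordinary SAX handler
        final List<String> names = new ArrayList<String>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(localName);
            }
        });
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<a><b/><c/></a>"));
    }
}
```

Only javax.xml.parsers and org.xml.sax types appear in the code; which parser actually does the work is decided when newInstance(  ) runs.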

Creating a parser factory

Review Figure 7-1, and it should be obvious that you’re going to start with
End of content preview. The rest of this section cannot be viewed here.
Processing XSL
Since JAXP 1.1, JAXP has been the Java API for XML Processing; this replaces the 1.0 version, which was the Java API for XML Parsing. Much of this change is due to the addition of TrAX (yes, it’s an API-happy world). Via TrAX, JAXP offers vendor-neutral XML document transformations. This is a welcome feature, as XSL processors have even greater variance across vendors than their XML parser counterparts.
Thanks to the JAXP expert group, and in particular Scott Boag and Michael Kay, two XSL processor gurus, JAXP and TrAX offer a wide array of features and options, and provide complete support for almost all XML transformations. All this is sheltered under the javax.xml.transform package (and a few subpackages); Figure 7-5 shows the complete set of JAXP/TrAX classes and interfaces (omitting subpackage class definitions).
Figure 7-5: There are many classes in TrAX, but you typically use only one or two
Like the parsing portion of JAXP, performing XML transformations requires just a few basic steps:
  1. Obtain a TransformerFactory.
  2. Retrieve a Transformer.
  3. Perform transformations on XML documents.
This is summarized in Figure 7-6.
Figure 7-6: The XML transformation process looks a lot like the SAX and DOM parsing process, involving a transformation factory
For XML transformations, the factory you want is javax.xml.transform.TransformerFactory. This class is analogous to SAXParserFactory and DocumentBuilderFactory, both of which you’ve already seen. Obtaining a factory instance is a piece of cake:
TransformerFactory factory = TransformerFactory.newInstance(  );
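For orientation, here is a minimal sketch of all three steps in one place. The stylesheet, the input document, and the class name TraxSketch are invented for illustration; both are read from strings so the example stands alone.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TraxSketch {

    /** Hypothetical stylesheet: turns <greeting to="..."/> into plain text. */
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>Hello, <xsl:value-of select='/greeting/@to'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    static String transform(String xml) throws Exception {
        // 1. Obtain a TransformerFactory
        TransformerFactory factory = TransformerFactory.newInstance();

        // 2. Retrieve a Transformer for the stylesheet
        Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(STYLESHEET)));

        // 3. Perform the transformation
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<greeting to='world'/>"));
    }
}
```

With the sample input shown in main(  ), the result is the text "Hello, world!".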
Once you’ve got the factory, you can set a number of options:
Error listeners setErrorListener(  )/getErrorListener(  )
Defined in the javax.xml.transform package, the ErrorListener interface allows problems in transformations to be caught and handled programmatically.
URI resolvers setURIResolver(  )/getURIResolver(  )
End of content preview. The rest of this section cannot be viewed here.
XPath
With the release of JAXP 1.3, a rich XPath API was added to JAXP. The API was designed to be object model neutral, meaning that, assuming the proper classes exist, your code can evaluate XPath expressions on XML objects created by any XML object model as well as return the API-appropriate types for nodes and sets of nodes. In addition to DOM, it is possible to obtain implementations of JAXP XPath interfaces that work with document objects created with JDOM, dom4j, and XOM, among others. (The JDOM and dom4j object models are discussed in Chapters 9 and 10, respectively.) The standard JAXP distribution, however, only includes support for DOM Document objects.
This section is not an exhaustive look at XPath. It specifically discusses the XPath API within JAXP. For more information on XPath, please check out XPath and XPointer by John E. Simpson (O’Reilly). As an additional caveat, some of the examples use expressions that are more verbose than necessary for illustrative purposes.
The core interface for the JAXP XPath API is javax.xml.xpath.XPath. This interface defines several methods named evaluate(  ) for evaluating an XPath expression against an XML document that has already been parsed into a document object, or against an instance of org.xml.sax.InputSource if the document has not yet been parsed. The XPath interface also supports compiling an expression into an XPathExpression object. This functionality, similar to the Templates objects from TrAX, allows you to avoid the overhead of repeated compilation if you are going to use the same expression repeatedly. Unlike Templates objects, however, XPathExpression objects are documented in the JAXP API as not thread-safe, so a single instance should not be used by multiple threads simultaneously. Figure 7-12 contains a UML diagram of the JAXP XPath interfaces. I am also including NamespaceContext, which isn’t strictly an XPath class (in fact, it’s in the javax.xml.namespace package whereas the rest of these interfaces are in the javax.xml.xpath package), but I do discuss it in this section.
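A short sketch of both styles, direct evaluation and a compiled expression, might look like this. The catalog document, the expressions, and the class name XPathSketch are illustrative assumptions; the unparsed document is supplied as an InputSource.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {

    /** Direct evaluation; the result is returned as a String by default. */
    static String firstTitle(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate("/catalog/book[1]/title",
                              new InputSource(new StringReader(xml)));
    }

    /** Compiled expression, asking for a numeric result type. */
    static int countBooks(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = xpath.compile("count(/catalog/book)");
        Double count = (Double) expr.evaluate(
                new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        return count.intValue();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<catalog><book><title>A</title></book>"
                   + "<book><title>B</title></book></catalog>";
        System.out.println(firstTitle(xml));
        System.out.println(countBooks(xml));
    }
}
```

Compiling pays off only when the same expression is evaluated many times; for a one-off query, evaluate(  ) on the XPath object is simpler.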
End of content preview. The rest of this section cannot be viewed here.
XML Validation
Along with XPath support, JAXP 1.3 added an entirely new validation framework. Previously, validation was handled by invoking setValidating(  ) on either a SAXParserFactory or a DocumentBuilderFactory:
factory.setValidating(true);
This approach, while functional, left a lot to be desired. It relied on the document being parsed to specify the schema to validate against, which can be problematic; it’s common for documents to omit a DOCTYPE or schema reference, and yet you still may want to validate that document against a schema you have on hand. Additionally, setValidating(  ) is ambiguous as to the constraint type being used. Is the document to be validated against a DTD? an XML Schema? Can RELAX NG schemas be used? What if the document references both a DTD and XML Schema? These are all questions that prompted the creation of a new JAXP package, javax.xml.validation (shown in Figure 7-13).
This should already start to make some sense; classes like Schema and SchemaFactory look a lot like the SAXParser/SAXParserFactory and DocumentBuilder/DocumentBuilderFactory combinations from SAX and DOM. In fact, you begin—as you do with the other JAXP factory classes—by creating a new SchemaFactory via the newInstance(  ) method:
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);  
Figure 7-13: Most of the classes in javax.xml.validation are related to internal processing; you’ll usually use only SchemaFactory, Schema, and Validator
JAXP hardwires each SchemaFactory instance to a particular type of schema, so you’ll need to supply this method with a constant representing the schema variant you want to use. These come from another new JAXP class, javax.xml.XMLConstants; Table 7-2 shows the constants supported for use with validation.
Table 7-2: Constants supported for use with validation
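To show how SchemaFactory, Schema, and Validator fit together, here is a small sketch using the W3C XML Schema constant. The one-element schema, the sample documents, and the class name ValidationSketch are made up; the point is that the schema is chosen by the code, not by the document being validated.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class ValidationSketch {

    /** Hypothetical schema: a single <count> element holding an integer. */
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='count' type='xs:int'/>"
      + "</xs:schema>";

    static boolean isValid(String xml) throws Exception {
        // The factory is tied to one schema language via the constant
        SchemaFactory schemaFactory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            // With no ErrorHandler set, validation errors surface as SAXException
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<count>42</count>"));
        System.out.println(isValid("<count>forty-two</count>"));
    }
}
```

Neither sample document carries a DOCTYPE or schema reference, which is exactly the situation setValidating(  ) could not handle.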
End of content preview. The rest of this section cannot be viewed here.
Chapter 8: Pull Parsing With StAX
The two APIs we’ve examined thus far—SAX and DOM—take two different approaches to XML document parsing. A SAX parser notifies your code, through predefined interfaces, of various events as the parser traverses the XML document. DOM creates a tree structure in memory that is then returned to your code as one whole piece.
This chapter looks at an additional API—StAX—that uses yet a third approach for XML parsing commonly referred to as pull parsing. Pull parsing is similar to SAX in that your code interacts with the document as it is being read by the parser. The difference lies in how this interaction occurs. As the name implies, when you use a pull parser, your code asks the parser for the next event. Your code need not implement any special interfaces, as is necessary with SAX. As a result, code that uses a pull parser may be more concise and easier to read than the corresponding SAX code.
In addition, StAX provides a set of classes for writing XML documents, something SAX doesn’t handle at all. And unlike with DOM or any other tree-based API, the document does not have to remain in memory while it is being built.
We will also look at an alternative pull parser API—XmlPull—which was the predecessor to StAX but continues to be useful in memory-constrained applications, specifically those that use J2ME.
End of content preview. The rest of this section cannot be viewed here.
StAX Basics
StAX is an acronym for Streaming API for XML. It is Java Specification Request (JSR) 173, sponsored by BEA with the goal of standardizing the various pull parser implementations that had been created in the absence of a Java or W3C standard. StAX provides interfaces for parsing XML documents as well as producing them. The JSR should actually be titled “Streaming APIs for XML” because StAX encompasses two distinct APIs. The specification refers to these as the cursor API and the event iterator API. According to the specification, the objective of the cursor API is “[t]o allow users to read and write XML as efficiently as possible,” whereas for the event iterator API, it’s “to be easy to use, event based, easy to extend, and allow easy pipelining.” This implies a greater difference between the APIs than actually exists, as we’ll see throughout this chapter.
The specific interfaces for the cursor API are XMLStreamReader and XMLStreamWriter. For the event iterator API, these interfaces are XMLEventReader and XMLEventWriter. All of these interfaces are in the package javax.xml.stream.
In the cursor API interfaces, methods on the reader or writer object itself allow the developer to obtain information or add new content to the XML document. This is referred to as the cursor API, as it is similar to how database cursors work. In the event iterator API, you obtain event objects from the reader or add event objects to the writer. This strongly typed event object contains only the methods appropriate for that type of event. In most implementations, the XMLEventReader implementation uses XMLStreamReader under the hood and, likewise, XMLEventWriter uses XMLStreamWriter.
The final release of the StAX specification, API, and JavaDocs can be downloaded from http://jcp.org/en/jsr/detail?id=173.
Whether using the cursor or event interfaces, StAX defines the same set of events that will occur while traversing the document. As part of the API, each of these is assigned an
End of content preview. The rest of this section cannot be viewed here.
StAX Factories
To obtain an instance of any of the four primary StAX interfaces mentioned above, you’ll use one of two factory classes: javax.xml.stream.XMLInputFactory and javax.xml.stream.XMLOutputFactory. To obtain an instance of a factory class, call its static newInstance(  ) method; for example, on the abstract class XMLInputFactory:
XMLInputFactory inputFactory = XMLInputFactory.newInstance(  );
The following steps determine which implementation of StAX is returned by the newInstance(  ) method:
  1. Check the javax.xml.stream.XMLInputFactory system property.
  2. Look for a file named xml.stream.properties in the lib subdirectory of the JRE. This file is in the standard properties file syntax and defines the property javax.xml.stream.XMLInputFactory.
  3. Look for a resource named META-INF/services/javax.xml.stream.XMLInputFactory in the classpath.
If these steps look familiar, that’s because it’s the same process used by JAXP.
In general, an implementation’s jar file will provide the META-INF/services file. The first two options are useful when you want to provide your own implementation of the interfaces or if you have multiple implementations in your classpath and need to be explicit about which one to use.
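One quick way to see which implementation the lookup settled on is simply to print the concrete class names. This is a small sketch; the class name StaxFactorySketch and the com.example.MyInputFactory name in the comment are hypothetical.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

class StaxFactorySketch {

    /** Name of the concrete class chosen by the lookup steps. */
    static String inputFactoryClass() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    static String outputFactoryClass() {
        return XMLOutputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The first lookup step could be exercised by launching with, e.g.:
        //   java -Djavax.xml.stream.XMLInputFactory=com.example.MyInputFactory ...
        // (com.example.MyInputFactory is a made-up class name)
        System.out.println("XMLInputFactory:  " + inputFactoryClass());
        System.out.println("XMLOutputFactory: " + outputFactoryClass());
    }
}
```

The output depends entirely on what is on your classpath and which Java runtime you use, so no particular class name should be expected.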
End of content preview. The rest of this section cannot be viewed here.
Parsing with StAX
Reading an XML document with the two StAX reader interfaces is relatively similar. Both XMLStreamReader and XMLEventReader provide an interface similar to java.util.Iterator. XMLEventReader extends Iterator, whereas XMLStreamReader has methods named hasNext(  ) and next(  ), just as Iterator does, but the next(  ) method returns an int, not an Object. Because of this relation to Iterator, the primary use of either interface looks like one of the event loops in Examples 8-1 and 8-2.
Example 8-1. Basic XMLStreamReader event loop
while (streamReader.hasNext(  )) {
    int eventTypeID = streamReader.next(  );
    // do something
}
Example 8-2. Basic XMLEventReader event loop
while (eventReader.hasNext(  )) {
    XMLEvent event = (XMLEvent) eventReader.next(  );
    // do something with event
}
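Filled in with a concrete task, the two loops might look like the following sketch, which collects the local name of every start element. It assumes the readers are created from a java.io.Reader through XMLInputFactory; the class and method names are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.events.XMLEvent;

class StaxReadSketch {

    /** Cursor API: ask the reader itself about the current event. */
    static List<String> namesWithCursor(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        while (reader.hasNext()) {
            int eventTypeID = reader.next();
            if (eventTypeID == XMLStreamConstants.START_ELEMENT) {
                names.add(reader.getLocalName());
            }
        }
        reader.close();
        return names;
    }

    /** Event iterator API: each event is its own strongly typed object. */
    static List<String> namesWithEvents(String xml) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<String>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(namesWithCursor("<a><b/><c/></a>"));
        System.out.println(namesWithEvents("<a><b/><c/></a>"));
    }
}
```

The event version uses nextEvent(  ) rather than next(  ) so that no cast from Object is needed.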
As described above, javax.xml.stream.XMLInputFactory is used to create instances of XMLStreamReader and XMLEventReader. XMLInputFactory has six overloaded methods named createXMLStreamReader(  ) for creating XMLStreamReader instances and seven overloaded methods named createXMLEventReader(  ) for creating XMLEventReader instances (the seventh creates an XMLEventReader that wraps an already-created XMLStreamReader). The parameters that can be passed to these create methods are:
  • A java.io.InputStream
  • A java.io.InputStream and a character encoding
  • A java.io.InputStream and a system ID to use for resolving relative URIs
  • A java.io.Reader
  • A java.io.Reader and a system ID to use for resolving relative URIs
  • A javax.xml.transform.Source
The last of these, javax.xml.transform.Source, is optional. If an implementation does not provide support for Source inputs, both createXMLEventReader(  ) and createXMLStreamReader(  ) will throw a java.lang.UnsupportedOperationException. One case of an implementation that does not support
End of content preview. The rest of this section cannot be viewed here.
Document Output with StAX
The StAX specification states that the first design goal for StAX is to provide “symmetrical APIs for reading and writing XML using a streaming paradigm.” This is a significant difference from SAX, which provides an API for reading only. Writing XML documents with StAX solves the fundamental problem with using DOM or any DOM-like API—you do not have to create the entire document in memory before being able to serialize it. Instead, you write events, using the same event vocabulary we’ve already discussed in this chapter, to a writer object that is attached to an output stream. The writer object will flush the character representation of those events to the output stream as necessary, or when your code requests it by calling the flush(  ) method. As a result, it is possible to create massive documents with StAX with a limited amount of memory, something that isn’t possible with DOM.
As with the reading APIs, there are two main interfaces for document output: XMLStreamWriter and XMLEventWriter. Instances of these are created using the static newInstance(  ) method of the abstract class XMLOutputFactory. The concrete implementation of XMLOutputFactory returned by newInstance(  ) is determined using the same process described in the section “StAX Factories” earlier in this chapter. Once you have obtained an instance of XMLOutputFactory, instances of XMLStreamWriter and XMLEventWriter are obtained by invoking methods named createXMLStreamWriter(  ) and createXMLEventWriter(  ), respectively. As with the createXMLStreamReader(  ) and createXMLEventReader(  ) methods of XMLInputFactory, these writer creation methods have several overloaded versions. There are overloaded methods for each that accept:
  • A java.io.Writer
  • A java.io.OutputStream
  • A java.io.OutputStream and a character set encoding
  • A javax.xml.transform.Result
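As a minimal sketch of the first overload in the list above, the following writes a tiny document to a java.io.Writer with the cursor-style XMLStreamWriter. The element and attribute names and the class name StaxWriteSketch are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class StaxWriteSketch {

    static String writeGreeting(String to) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        // Each call emits one event; nothing is held in a tree
        writer.writeStartDocument();
        writer.writeStartElement("greeting");
        writer.writeAttribute("to", to);
        writer.writeCharacters("hello");
        writer.writeEndElement();
        writer.writeEndDocument();

        // Push any buffered characters to the underlying Writer
        writer.flush();
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeGreeting("world"));
    }
}
```

The exact form of the XML declaration can differ between implementations, so code that inspects the output should not depend on it.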
As with XMLInputFactory and javax.xml.transform.Source, support by XMLOutputFactory for javax.xml.transform.Result
End of content preview. The rest of this section cannot be viewed here.
Factory Properties
As with SAX and DOM, StAX readers and writers can be configured by changing both settings defined in the specification and any implementation-specific settings that a parser vendor may create for their implementation. In StAX, these settings are referred to as properties. Properties are set using the setProperty(  ) method of XMLInputFactory and XMLOutputFactory. Values of properties, which may or may not have been set by your code, can be retrieved using the getProperty(  ) method. This method can be invoked both on factories and the readers and writers they have created. Once set on a factory, properties affect all readers and writers subsequently created by the various create methods.
An interesting difference between SAX and StAX is that, where SAX uses features that get set to true or false as well as properties that get set to a java.lang.Object, StAX only has properties that are set to an Object. As a result, several of the properties defined by the StAX specification are set to a java.lang.Boolean. So where you might use true to set a feature in SAX, you would use Boolean.TRUE in StAX.
If you’re using Java 5, you can use the autoboxing language feature to eliminate this difference, as the Java compiler will create the same bytecode from these two method invocations:
// works with any version of Java
factory.setProperty("javax.xml.stream.isValidating", Boolean.TRUE);
// only with Java 5 or higher
factory.setProperty("javax.xml.stream.isValidating", true);
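Here is a small sketch of setting and reading back a Boolean property on a factory. It uses the isCoalescing property through the XMLInputFactory.IS_COALESCING constant rather than isValidating, on the assumption that coalescing is more widely supported; the isPropertySupported(  ) guard and the class name StaxPropertySketch are illustrative choices.

```java
import javax.xml.stream.XMLInputFactory;

class StaxPropertySketch {

    /** Sets isCoalescing on a fresh factory and returns the value read back. */
    static Object coalescingAfterSet() {
        XMLInputFactory factory = XMLInputFactory.newInstance();

        // Guard: only set the property if this implementation supports it
        if (factory.isPropertySupported(XMLInputFactory.IS_COALESCING)) {
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        }

        // Readers created from this factory afterwards inherit the setting
        return factory.getProperty(XMLInputFactory.IS_COALESCING);
    }

    public static void main(String[] args) {
        System.out.println("isCoalescing = " + coalescingAfterSet());
    }
}
```

Using the constants defined on XMLInputFactory instead of raw strings avoids typos in the property names.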
Table 8-5 contains a list of the properties of XMLInputFactory that are set using java.lang.Boolean values. Most of these have their default values specified in the StAX specification. Support for some of these properties is optional. Like DOM, there is a separate mechanism to test if a property is supported by an implementation through the isPropertySupported(  ) method. If you try to set a property that is not supported,
End of content preview. The rest of this section cannot be viewed here.
Common Issues with StAX
Here are a couple of errors you might run into:
Provider com.bea.xml.stream.MXParserFactory not found
This indicates that you have only the API JAR in your classpath, not an actual implementation. The default implementation of javax.xml.stream.XMLInputFactory is com.bea.xml.stream.MXParserFactory.
Current state of the parser is X . But expected state is Y
This indicates you are attempting to call a method on XMLStreamReader that is not valid for the current event. See Table 8-3 for a list of valid methods for each event.
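The second kind of error can be provoked on purpose, as in this sketch, which calls getText(  ) while the reader is positioned on a start element. The exact message text varies by implementation, so the code only checks for the IllegalStateException; the class name StaxStateSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxStateSketch {

    /** Returns true if getText() is rejected while on a START_ELEMENT event. */
    static boolean getTextOnStartElementFails() throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader("<a>text</a>"));

        // Move from the initial START_DOCUMENT state to <a>
        int eventTypeID = reader.next();
        if (eventTypeID != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("unexpected event " + eventTypeID);
        }

        try {
            reader.getText();   // only valid on text-like events
            return false;
        } catch (IllegalStateException expected) {
            return true;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(getTextOnStartElementFails());
    }
}
```

Checking the event type before calling event-specific methods, as the loop examples earlier in this chapter do, avoids the problem.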
End of content preview. The rest of this section cannot be viewed here.
XmlPull
The XmlPull API was a predecessor to the StAX specification. It defines a simple API similar to the StAX cursor API. Full information about the XmlPull API can be found at http://www.xmlpull.org. The advantage the XmlPull API has over StAX is that because it’s a much smaller API, it’s suitable for memory-constrained environments, such as mobile devices. One implementation of XmlPull is available in a 9 KB JAR file. The API JAR file from the StAX specification is more than 25 KB alone. If you’re building a server application, this difference is irrelevant, but if you’re developing a game to be run on a mobile phone where the JAR file must be under 100 KB, this is a big difference. Example 8-23 contains a version of the Tree Builder application written with the XmlPull API.
Example 8-23. XmlPullTreeViewer
package javaxml3;

import java.awt.BorderLayout;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import javax.swing.JFrame;
import javax.swing.JScrollPane;
import javax.swing.JTree;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeModel;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

public class XmlPullTreeViewer extends JFrame {
    /** The base tree to render */
    private JTree jTree;

    /** Tree model to use */
    DefaultTreeModel defaultTreeModel;

    public XmlPullTreeViewer(  ) {
        // Handle Swing setup
        super("XmlPull Tree Viewer");
        setSize(800, 450);
        // setSize(600, 200);
    }

    public void init(File file) throws XmlPullParserException, IOException {
        DefaultMutableTreeNode base = new DefaultMutableTreeNode(
                "XML Document: " + file.getAbsolutePath(  ));

        // Build the tree model
        defaultTreeModel = new DefaultTreeModel(base);
        jTree = new JTree(defaultTreeModel);

        // Construct the tree hierarchy
        buildTree(defaultTreeModel, base, file);

        // Display the results
        getContentPane(  ).add(new JScrollPane(jTree), BorderLayout.CENTER);
    }

    // Swing-related variables and methods, including
    // setting up a JTree and basic content pane

    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.out.println("Usage: java javaxml3.XmlPullTreeViewer "
                        + "[XML Document]");
                return;
            }
            XmlPullTreeViewer viewer = new XmlPullTreeViewer(  );
            File f = new File(args[0]);

            viewer.init(f);
            viewer.setVisible(true);
        } catch (Exception e) {
            e.printStackTrace(  );
        }
    }

    public void buildTree(DefaultTreeModel treeModel,
            DefaultMutableTreeNode current, File file)
            throws XmlPullParserException, IOException {
        FileInputStream inputStream = new FileInputStream(file);
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance(  );
        factory.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, true);
        XmlPullParser parser = factory.newPullParser(  );
        parser.setInput(inputStream, null);

        parseRestOfDocument(parser, current);
    }

    private void parseRestOfDocument(XmlPullParser parser,

            DefaultMutableTreeNode current) throws XmlPullParserException,

            IOException {



        int type = parser.getEventType(  );

        while (type != XmlPullParser.END_DOCUMENT) {

            switch (type) {

            case XmlPullParser.START_TAG:



                DefaultMutableTreeNode element = new DefaultMutableTreeNode(

                        "Element: " + parser.getName(  ));

                current.add(element);

                current = element;



                // Determine namespace

                if (parser.getNamespace(  ) != null) {

                    String prefix = parser.getPrefix(  );

                    if (prefix == null || "".equals(prefix)) {

                        prefix = "[None]";

                    }

                    DefaultMutableTreeNode namespace = new DefaultMutableTreeNode(

                            "Namespace: prefix = '" + prefix + "', URI = '"

                                    + parser.getNamespace(  ) + "'");

                    current.add(namespace);

                }



                if (parser.getAttributeCount(  ) > 0) {

                    for (int i = 0; i < parser.getAttributeCount(  ); i++) {

                        DefaultMutableTreeNode attrib = new DefaultMutableTreeNode(

                                "Attribute (name = '"

                                        + parser.getAttributeName(i)

                                        + "', value = '"

                                        + parser.getAttributeValue(i) + "')");

                        String attURI = parser.getAttributeNamespace(i);

                        if (!"".equals(attURI)) {

                            String attPrefix = parser.getAttributePrefix(i);

                            if (attPrefix == null || attPrefix.equals("")) {

                                attPrefix = "[None]";

                            }

                            DefaultMutableTreeNode an = new DefaultMutableTreeNode(

                                    "Namespace: prefix = '" + attPrefix

                                            + "', URI = '" + attURI + "'");

                            attrib.add(an);

                        }

                        current.add(attrib);

                    }

                }



                break;

            case XmlPullParser.END_TAG:

                current = (DefaultMutableTreeNode) current.getParent(  );

                break;

            case XmlPullParser.TEXT:

                if (!parser.isWhitespace(  )) {

                    DefaultMutableTreeNode data = new DefaultMutableTreeNode(

                            "Character Data: '" + parser.getText(  ) + "'");

                    current.add(data);

                }

                break;

            case XmlPullParser.IGNORABLE_WHITESPACE:

                // let's ignore this

                break;

            case XmlPullParser.COMMENT:

                DefaultMutableTreeNode comment = new DefaultMutableTreeNode(

                        "Comment: '" + parser.getText(  ) + "'");

                current.add(comment);

                break;

            default:

                System.out.println(type);

            }

            type = parser.next(  );

        }

    }

}
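If you want to try the same pull-style loop without the XmlPull library or Swing on hand, a rough equivalent can be written with StAX (javax.xml.stream), which ships with the JDK. To be clear, StAX is a different API from the XmlPull one used in the listing above; the class name StaxPullSketch and the label strings are just illustrative choices, and this is a minimal sketch rather than a replacement for the viewer.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxPullSketch {

    /** Pulls events one at a time and records a label for each one of interest. */
    static List<String> describe(String xml) {
        List<String> labels = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Ask for adjacent character data to be delivered as a single event
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    labels.add("Element: " + reader.getLocalName());
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        labels.add("Attribute (name = '"
                                + reader.getAttributeLocalName(i)
                                + "', value = '"
                                + reader.getAttributeValue(i) + "')");
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                    if (!reader.isWhiteSpace()) {
                        labels.add("Character Data: '" + reader.getText() + "'");
                    }
                    break;
                case XMLStreamConstants.COMMENT:
                    labels.add("Comment: '" + reader.getText() + "'");
                    break;
                default:
                    // END_ELEMENT, END_DOCUMENT, and the rest need no label here
                    break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read the document", e);
        }
        return labels;
    }

    public static void main(String[] args) {
        String xml = "<channel id=\"1\"><title>Neil</title></channel>";
        for (String label : describe(xml)) {
            System.out.println(label);
        }
    }
}
```

The shape is the same as in the viewer: ask for the next event, switch on its type, and keep going until the document ends.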
End of preview. The rest of this section is not available here.
Chapter 9: JDOM
JDOM provides a means of accessing an XML document within Java through a tree structure, and in that respect is somewhat similar to the DOM. However, it was built specifically for Java (remember the discussion in Chapter 5 on language bindings for the DOM?), so it is in many ways more intuitive to a Java developer than DOM. I’ll describe these aspects of JDOM throughout this chapter; I’ll also describe some specific cases in which to use SAX, DOM, or JDOM. And for the complete set of details on JDOM, you should check out the web site at http://www.jdom.org.
Additionally, and importantly, JDOM is an open source API. You have the ability to suggest and implement changes yourself. If you find that you like JDOM but are annoyed by one little thing, you can help investigate solutions to your problem.
The Basics
Chapters 5 and 6 should have given you a pretty good understanding of dealing with XML tree representations. So when I say that JDOM also provides a tree-based representation of an XML document, that gives you a starting point for understanding how JDOM behaves. To help you see how the classes in JDOM match up to XML structures, take a look at Figure 9-1, which shows a UML model of JDOM’s core classes.
Figure 9-1: UML model of core JDOM classes
As you can see, the names of the classes tell the story. At the core of the JDOM structure is the Document object, which is both the representation of an XML document, and a container for all the other JDOM structures. Element represents an XML element, Attribute an attribute, Text and CDATA represent character data within Element objects, and so on down the line.
Another important item to take note of is that you don’t see any list classes like SAX’s Attributes class or DOM’s NodeList and NamedNodeMap classes. This is a nod to Java developers; it was decided that using Java collections (java.util.List, java.util.Map, etc.) would provide a familiar and simple API for XML usage. DOM must serve across languages (remember Java language bindings in Chapter 5?), and can’t take advantage of language-specific things like Java collections. For example, when invoking the getAttributes(  ) method on the Element class, you get back a List; you can of course operate upon this List just as you would any other Java List, without looking up new methods or syntax. The List objects returned by JDOM are “live,” so a call such as element.getAttributes(  ).clear(  ) will remove all the attributes from the element object.
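For contrast, here is roughly what the DOM side of that comparison looks like with nothing but JDK classes. This is a sketch of the DOM approach only (no JDOM involved); the class and method names are made up for illustration. Notice that NamedNodeMap brings its own getLength(  ) and item(  ) methods, and that the Map built from it is a snapshot rather than a live view.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomAttributesSketch {

    /** Parses a string and returns the root element as a DOM Element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    /** Copies DOM attributes into a java.util.Map; the copy is not live. */
    static Map<String, String> attributesOf(Element element) {
        Map<String, String> result = new LinkedHashMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            result.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        return result;
    }

    /** Removing every attribute through DOM takes one call per name. */
    static void clearAttributes(Element element) {
        for (String name : attributesOf(element).keySet()) {
            element.removeAttribute(name);
        }
    }
}
```

With a live java.util.List, as described above for JDOM, the last method collapses to a single clear(  ) call on the returned list.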
Another basic tenet of JDOM that is different from DOM, and not as visible, is that JDOM is an API of concrete classes. In other words, Element, Attribute, ProcessingInstruction, Comment, and the rest are all classes that can be directly instantiated using the new keyword. This generally makes JDOM document construction code much simpler than the corresponding DOM code, since you don’t need to create a
End of preview. The rest of this section is not available here.
PropsToXML
To put some real code to the task of learning JDOM, let me introduce the PropsToXML class. This class is a utility that takes a standard Java properties file and converts it to an XML equivalent. Many developers out there have requested a means of doing exactly this; it often allows legacy applications using properties files to easily convert to using XML without the overhead of manually converting the configuration files.
If you have never worked with Java properties files, they are essentially files with name-value pairs that can be read easily with some Java classes (for instance, the java.util.Properties class). These files often look similar to Example 9-1, and in fact, I will use this example properties file throughout the rest of the chapter. Incidentally, it’s from the Enhydra application server.
Example 9-1. A typical Java properties file
#

# Properties added to System properties

#



# sax parser implementing class

org.xml.sax.parser="org.apache.xerces.parsers.SAXParser"



#

# Properties used to start the server

#



# Class used to start the server

org.enhydra.initialclass=org.enhydra.multiServer.bootstrap.Bootstrap



# initial arguments passed to the server (replace command line args)

org.enhydra.initialargs="./bootstrap.conf"



# Classpath for the parent top enhydra classloader

org.enhydra.classpath="."



# separator for the classpath above

org.enhydra.classpath.separator=":"
No big deal here, right? Well, using an instance of the Java Properties class, you can load these properties into the object (using the load(InputStream inputStream) method) and then deal with them like a Hashtable. In fact, the Properties class extends the Hashtable class in Java; nice, huh? The problem is that many people write these files like the example with names separated by a period (.) to form a sort of hierarchical structure. In the example, you would have a top level (the properties file itself), then the org node, and under it the xml and enhydra
End of preview. The rest of this section is not available here.
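The hierarchy implied by dotted names such as org.enhydra.classpath can be made concrete with a short sketch. This is not the PropsToXML class from this chapter; it uses the JDK's DOM and TrAX classes instead of JDOM, and the class name DottedPropsSketch and the root element name properties are my own assumptions. It also sidesteps a real wrinkle: a name that is both a value and a prefix (classpath and classpath.separator in Example 9-1) ends up with text and a child element in the same element.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class DottedPropsSketch {

    /** Reads properties-file text; quotes around values are kept as-is. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read properties", e);
        }
        return props;
    }

    /** Turns each dotted name into a path of nested elements. */
    static Document toDocument(Properties props) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("properties");
            doc.appendChild(root);
            // Sort the names so the output order is predictable
            for (String key : new TreeSet<>(props.stringPropertyNames())) {
                Element current = root;
                for (String part : key.split("\\.")) {
                    current = childNamed(doc, current, part);
                }
                current.appendChild(doc.createTextNode(props.getProperty(key)));
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build the document", e);
        }
    }

    /** Finds the child element with this name, creating it if needed. */
    private static Element childNamed(Document doc, Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        Element created = doc.createElement(name);
        parent.appendChild(created);
        return created;
    }

    /** Serializes the tree with an identity transformation. */
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize", e);
        }
    }
}
```

Calling toXml(toDocument(parse(text))) on a couple of org.enhydra lines gives one org element, one enhydra element beneath it, and a leaf element per property.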
XMLProperties
Let’s take things to the next logical step and look at reading XML. Continuing with the example of converting a properties file to XML, you are now probably wondering how to access the information in your XML file. Luckily, there’s a solution for that, too! In this section, for the sake of explaining how JDOM reads XML, I want to introduce a new utility class, XMLProperties. This class is essentially an XML-aware version of the Java Properties class; in fact, it extends that class. This class allows access to an XML document through the typical property-access methods like getProperty(  ) and propertyNames(  ). In other words, it allows Java-style access (using the Properties class) to XML-style storage. In my opinion, this is the best combination you can get.
To accomplish this task, you can start by creating an XMLProperties class that extends the java.util.Properties class. With this approach, making things work becomes simply a matter of overriding the load(  ), save(  ), and store(  ) methods. The first of these, load(  ), reads in an XML document and loads the properties within that document into the superclass object.
Don’t mistake this class for an all-purpose XML-to-properties converter: it will only read in XML that is in the format detailed earlier in this chapter. In other words, properties are elements with either textual or attribute values, but not both. I’ll cover both approaches, but you will have to choose one or the other. Don’t try to take all your XML documents, read them in, and expect things to work as planned!
The second method, save(  ), is actually deprecated in Java 2, as it doesn’t expose any error information; still, it needs to be overridden for Java 1.1 users. To facilitate this, the implementation in XMLProperties simply calls store(  ). And store(  ) handles the task of writing the properties information out to an XML document. Example 9-6 is a good start at this, and provides a skeleton within which to work.
Example 9-6. The skeleton of the XMLProperties class
End of preview. The rest of this section is not available here.
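Example 9-6 itself isn't reproduced in this preview. As a rough stand-in for the idea of overriding load(  ), and explicitly not the book's XMLProperties class, here is a minimal sketch that uses the JDK's DOM parser rather than JDOM. The class name XmlBackedProperties and the fromXml(  ) helper are hypothetical, and only the element-text format is handled (no attribute values). For what it's worth, java.util.Properties has also had its own loadFromXML(  ) and storeToXML(  ) methods since Java 5, though they use a flat format rather than nested elements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

class XmlBackedProperties extends Properties {

    private static final long serialVersionUID = 1L;

    /** Reads nested elements and stores each leaf under a dotted name. */
    @Override
    public synchronized void load(InputStream in) throws IOException {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(in).getDocumentElement();
            // The root element is only a container; names start below it
            for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    collect((Element) n, n.getNodeName());
                }
            }
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Could not read XML properties", e);
        }
    }

    /** Recurses into child elements; an element without any is a property. */
    private void collect(Element element, String key) {
        boolean hasChildElements = false;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                collect((Element) n, key + "." + n.getNodeName());
            }
        }
        if (!hasChildElements) {
            setProperty(key, element.getTextContent().trim());
        }
    }

    /** Convenience for trying the class out on a string. */
    static XmlBackedProperties fromXml(String xml) {
        XmlBackedProperties props = new XmlBackedProperties();
        try {
            props.load(new ByteArrayInputStream(
                    xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }
}
```

Because the class extends Properties, callers keep using getProperty("org.enhydra.classpath") and never see an XML type.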
More JDOM Classes
The following sections discuss additional JDOM classes.
This section will briefly cover namespace support in JDOM with the Namespace class. This class acts as both an instance variable and a factory within the JDOM architecture. When you need to create a new namespace, either to create a new element or attribute or for searching for existing elements and attributes, you use the static getNamespace(  ) methods on this class:
// Create namespace with prefix

Namespace schemaNamespace = 

    Namespace.getNamespace("xsd", "http://www.w3.org/2001/XMLSchema");



// Create namespace without prefix

Namespace javaxml3Namespace =

    Namespace.getNamespace("http://www.oreilly.com/javaxml3");
As you can see, there is a version for creating namespaces with prefixes and one for creating namespaces without prefixes, in which case the namespace URI is set as the default namespace. Either version can be used, with the resulting Namespace object then supplied to the various JDOM methods:
// Create element with namespace

Element schema = new Element("schema", schemaNamespace);



// Search for children in the specified namespace

List chapterElements = contentElement.getChildren("chapter", javaxml3Namespace);



// Declare a new namespace on this element

catalogElement.addNamespaceDeclaration(

    Namespace.getNamespace("tng", "http://www.truenorthguitars.com"));
These are all fairly self-explanatory. Also, when XML serialization is performed with the various outputters (SAXOutputter, DOMOutputter, and XMLOutputter), the namespace declarations are automatically handled and added to the resulting XML.
One final note: in JDOM, namespace comparison is based solely on URI. That is, two Namespace objects are equal if their URIs are equal, regardless of prefix. This is in keeping with the letter and spirit of the XML Namespace specification, which indicates that two elements are in the same namespace if their URIs are identical, regardless of prefix. Look at this XML document fragment:
<guitar xmlns="http://www.truenorthguitars.com">

  <ni:owner xmlns:ni="http://www.newInstance.com">

    <ni:name>Brett McLaughlin</ni:name>

    <tng:model xmlns:tng="http://www.truenorthguitars.com">Model 1</tng:model>

    <backWood>Madagascar Rosewood</backWood>

  </ni:owner>

</guitar>
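If you'd like to check the URI-only rule for yourself, the comparison can be sketched with the JDK's namespace-aware DOM parser. This isn't JDOM's Namespace class, just the same idea expressed with getNamespaceURI(  ); the class name and helper methods are assumptions of mine, and the string is a well-formed copy of the fragment above.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceEqualitySketch {

    static final String TNG = "http://www.truenorthguitars.com";
    static final String NI = "http://www.newInstance.com";

    static final String FRAGMENT =
            "<guitar xmlns=\"" + TNG + "\">"
          + "<ni:owner xmlns:ni=\"" + NI + "\">"
          + "<ni:name>Brett McLaughlin</ni:name>"
          + "<tng:model xmlns:tng=\"" + TNG + "\">Model 1</tng:model>"
          + "<backWood>Madagascar Rosewood</backWood>"
          + "</ni:owner>"
          + "</guitar>";

    /** Parses with namespace processing switched on. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the fragment", e);
        }
    }

    /** First element with the given namespace URI and local name. */
    static Element first(Document doc, String uri, String localName) {
        return (Element) doc.getElementsByTagNameNS(uri, localName).item(0);
    }

    /** Two elements share a namespace when their URIs match; prefixes are ignored. */
    static boolean sameNamespace(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI());
    }
}
```

Here guitar (no prefix), tng:model, and backWood all come out in the same namespace, while ni:name does not.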
End of preview. The rest of this section is not available here.
JDOM and Factories
As noted earlier in this chapter, the ability to have some form of factories allows greater flexibility in how your XML is modeled in Java. Take a look at the simple subclass of JDOM’s Element class shown in Example 9-12.
Example 9-12. Subclassing the JDOM Element class
package javaxml3;



import org.jdom.Element;

import org.jdom.Namespace;



public class ORAElement extends Element {



    private static final Namespace ORA_NAMESPACE =

        Namespace.getNamespace("ora", "http://www.oreilly.com");



    public ORAElement(String name) {

        super(name, ORA_NAMESPACE);

    }



    public ORAElement(String name, Namespace ns) {

        super(name, ORA_NAMESPACE);

    }



    public ORAElement(String name, String uri) {

        super(name, ORA_NAMESPACE);

    }



    public ORAElement(String name, String prefix, String uri) {

        super(name, ORA_NAMESPACE);

    }

}

This is about as simple a subclass as you could come up with. It is somewhat similar to the NamespaceFilter class from Chapter 4 in that it disregards whatever namespace is actually supplied to the element (even if there isn’t a namespace supplied!), and sets the element’s namespace to the one defined by the URI http://www.oreilly.com with the prefix ora. This is a simple case, but it gives you an idea of what is possible, and serves as a good example for this section.
Once you’ve got a custom subclass, the next step is actually using it. As I already mentioned, JDOM considers having to create all objects with factories a bit over the top. Simple element creation in JDOM works like this:
// Create a new Element

Element element = new Element("guitar");
Things remain equally simple with a custom subclass:
// Create a new Element, typed as an ORAElement

Element oraElement = new ORAElement("guitar");
The element is dropped into the O’Reilly namespace because of the custom subclass. Additionally, this method is more self-documenting than using a factory. It is clear at any point exactly what classes are being used to create objects. Compare that to this code fragment:
End of preview. The rest of this section is not available here.
Common Issues with JDOM
The following sections discuss some issues you may encounter when working with JDOM.
Although I stated it previously, it is worth repeating that JDOM is not an XML parser. It uses an external parser through a builder class. As a result, what frequently appears to be a JDOM issue is actually a problem with the underlying processor. Be sure you understand which parser you are using or specify the parser class directly with the appropriate constructor of SAXBuilder.
First and foremost, you should realize that JDOM isn’t DOM. It doesn’t wrap DOM, and doesn’t provide extensions to DOM. In other words, the two have no technical relation to each other. Realizing this basic truth will save you a lot of time and effort; there are many articles out there today that talk about getting the DOM interfaces to use JDOM, or avoiding JDOM because it hides some of DOM’s methods. These statements are more likely to confuse than clarify. You don’t need to have the DOM interfaces, and DOM calls (like appendChild(  ) or createDocument(  )) simply won’t work on JDOM. Sorry, wrong API!
Another interesting facet of JDOM, and one that has raised some controversy, is the return values from methods that retrieve element content. For example, the various getChild(  ) methods on the Element class may return a null value. I mentioned this, and demonstrated it, in the PropsToXML example code. The gotcha occurs when instead of checking if an element exists (as was the case in the example code), you assume that an element already exists. This is most common when some other application or component sends you XML, and your code expects it to conform to a certain format (be it a DTD, XML Schema, or simply an agreed-upon standard). For example, take a look at the following code:
Document doc = otherComponent.getDocument(  );

String price = doc.getRootElement(  ).getChild("item")

                                   .getChild("price")

                                   .getTextTrim(  );
The problem in this code is that if there is no
End of preview. The rest of this section is not available here.
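The null-return gotcha described above is easy to reproduce and easy to guard against. The sketch below is a stand-in written against the JDK's DOM classes, not JDOM: the child(  ) helper is a hypothetical method that mimics the "return null when the element is missing" behavior, and priceOrDefault(  ) checks each step instead of chaining the calls.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NullSafeLookupSketch {

    /** Parses a string and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the document", e);
        }
    }

    /** First child element with this name, or null if there is none. */
    static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Looks up item/price, falling back instead of throwing NullPointerException. */
    static String priceOrDefault(Element root, String fallback) {
        Element item = child(root, "item");
        if (item == null) {
            return fallback;
        }
        Element price = child(item, "price");
        if (price == null) {
            return fallback;
        }
        return price.getTextContent().trim();
    }
}
```

Whether a fallback value or an exception with a clear message is the better response depends on how much you trust the component sending you the XML.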
Chapter 10: dom4j
Like JDOM, which was explored in the last chapter, dom4j is designed to be a Java-specific alternative to DOM: a document object model that is targeted only at Java (thus the repeated use of the letter J) and isn’t constrained by language neutrality in the way DOM is. Since dom4j and JDOM share this common goal, portions of the APIs look similar. However, the two APIs do differ on a key design principle in that dom4j is built around a set of core interfaces, whereas JDOM is class-oriented. What this means in practice is that there are various implementations of the core dom4j interfaces that provide different functionality. Through this, dom4j’s behavior can be tuned to match the needs of your application.
Also like JDOM, dom4j is an open source project with a vibrant user community that you can join to receive assistance with the API and contribute to the future of dom4j. Full details on dom4j can be found at http://www.dom4j.org.
Overview
With that brief introduction to dom4j, let us begin by looking at the interfaces and classes that make up dom4j. We’ll start with the core interfaces and then examine some of the special features those interfaces have that set dom4j apart from other similar APIs.
As I mentioned above, dom4j is built around a set of core interfaces. These interfaces describe the structure and content of an XML document. Figure 10-1 contains a UML model of these core interfaces.
Figure 10-1: UML model of dom4j core interfaces
As you can see from the model diagram, dom4j has several levels of interfaces. Every interface ultimately extends the Node interface, which defines common functionality for all components of an XML document and is analogous to org.w3c.dom.Node. The CharacterData and Branch interfaces similarly define common functionality for nodes that contain text and nodes that contain other nodes, respectively.

Factories

Since the core of dom4j is a set of interfaces, you use a factory object to obtain implementations of these interfaces. The default factory class is org.dom4j.DocumentFactory. Figure 10-2 contains the class diagram for DocumentFactory. The various create methods enable you to create instances of the corresponding dom4j interface. Calling createElement(  ) returns an Element instance, createAttribute(  ) returns an Attribute instance, and so on. The createXPath(  ), createXPathFilter(  ), and createPattern(  ) methods are slightly different in that they return objects that operate on Node objects; these creation methods will be explored in greater depth later in this chapter.
Figure 10-2: DocumentFactory
DocumentFactory returns instances of classes in the org.dom4j.tree package such as DefaultElement
End of preview. The rest of this section is not available here.
Reading and Writing with dom4j
Document input and output is probably where JDOM and dom4j are closest. Both treat input and output in two ways: as a means of reading and writing XML documents from and to sources such as files, URLs, and String objects, and as a way of interfacing with other XML APIs. Both JDOM and dom4j, for example, have classes (SAXWriter for dom4j and SAXOutputter for JDOM) for firing SAX event method calls based on the structure of a Document object.
One additional, critical thing that JDOM and dom4j have in common is that neither is an XML parser. I mentioned this in the last chapter, but it’s worth repeating: both JDOM and dom4j use a parser object provided by some other package. Both can use different parsers (SAX, DOM, StAX, etc.), but most commonly, SAX is used. In the case of SAX and DOM, by default, both JDOM and dom4j will use the SAX or DOM parser retrieved through the JAXP factories as described in Chapter 7. This means that in dom4j, like JDOM, if you run into parsing problems, it’s likely that the source of your problem is the underlying SAX parser.
As noted above, dom4j is not an XML parser and must use a separate parser to produce Document objects. In general, you will use a SAX parser through the dom4j class org.dom4j.io.SAXReader. A call to one of SAXReader’s read(  ) methods will create an instance of org.xml.sax.XMLReader and pass it an implementation of the ContentHandler interface that has calls to DocumentFactory to create the dom4j object tree. The code to parse a java.io.File looks something like:
// assume we got a path as a command-line argument

File file = new File(args[0]);

SAXReader reader = new SAXReader(  );

Document doc = reader.read(file);
Through various constructor arguments, it’s possible to create a SAXReader instance that does validation, uses an alternate DocumentFactory implementation, or uses a specific SAX implementation. In addition, there are a variety of setter methods (setValidating(  ), setDocumentFactory(  ), etc.) to set these properties and others on the
End of preview. The rest of this section is not available here.
Document Traversal
After parsing an XML document, you generally need to find some piece of information contained within the document. dom4j provides several different options for moving through the Document object and its children.
Just as in DOM and JDOM, dom4j’s Document and Element interfaces have a variety of methods for getting child nodes. In dom4j’s case, the basic methods to get child nodes are actually contained within the Branch interface. Figure 10-4 contains a UML diagram containing the Branch, Document, and Element interfaces. For clarity, the methods to add and remove nodes have been removed.
Figure 10-4: Node access methods on Branch, Document, and Element interfaces
After looking at DOM and JDOM, some of these method names may seem a bit unusual: attributes(  ) versus getAttributes(  ), content(  ) versus getChildNodes(  ) and getContent(  ), etc. dom4j does not consistently follow the JavaBeans method naming conventions. But past these naming differences, these methods are largely the same as what we have already seen in those APIs. Another cosmetic difference is that to access a namespace-qualified element or attribute in dom4j, you create a QName object encapsulating both the local name and the namespace. Compare this to both DOM and JDOM where getElementsByTagNameNS(  ) and getChildren(  ) both accept the local name and namespace as two separate parameters.
Using these methods, it is possible to easily write code that, for example, outputs the value of an attribute named location on all of an Element’s children:
public void outputLocationAttributes(Element parent) {

    for (Iterator it = parent.elementIterator(  ); it.hasNext(  ); ) {

        Element child = (Element) it.next(  );

        String value = child.attributeValue("location");

        if (value == null) {

            System.out.println("No location attribute");

        } else {

            System.out.println("Location attribute value is "  + value);

        }

    }

}
End of preview. The rest of this section is not available here.
Transformations
dom4j has ample support for XML transformations. dom4j objects can be used either with XSL transformations through JAXP or with dom4j’s rule-based transformation classes. In both cases, you encapsulate the logic used to transform a document and then apply that logic to multiple documents.
Document objects created with dom4j can be used as the source or result of transformations done through TrAX, the transformation API that is part of the JAXP specification discussed in Chapter 7. This is done with the classes org.dom4j.io.DocumentSource and org.dom4j.io.DocumentResult. These implement the javax.xml.transform.Source and javax.xml.transform.Result interfaces, respectively. DocumentSource and DocumentResult can be used together (where the input and output of a transformation are both dom4j Document objects) or independently (for example, a dom4j Document as the input and a String as the output). Example 10-3 contains sample code transforming the contents of an XML file to a dom4j Document object.
Example 10-3. Transformation from a file to an org.dom4j.Document object using TrAX
TransformerFactory factory = TransformerFactory.newInstance(  );

Transformer transformer = factory.

       newTransformer(new StreamSource("stylesheet.xsl"));

StreamSource in = new StreamSource("input.xml");

DocumentResult out = new DocumentResult(  );

transformer.transform(in, out);

Document resultDocument = out.getDocument(  );
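The same TrAX pattern can be tried without dom4j on the classpath by swapping in the JDK's own tree-based Result, javax.xml.transform.dom.DOMResult. To be clear, this is a substitution on my part: you get back an org.w3c.dom.Document rather than a dom4j Document, and the inline stylesheet, input, and class name below are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

class TraxToTreeSketch {

    /** A tiny stylesheet that turns each book's name attribute into a title. */
    static final String STYLESHEET =
            "<xsl:stylesheet version=\"1.0\""
          + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"/catalog\">"
          + "<titles><xsl:for-each select=\"book\">"
          + "<title><xsl:value-of select=\"@name\"/></title>"
          + "</xsl:for-each></titles>"
          + "</xsl:template>"
          + "</xsl:stylesheet>";

    /** Runs the transformation and hands back the result as a tree. */
    static Document transform(String stylesheet, String input) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(
                    new StreamSource(new StringReader(stylesheet)));
            StreamSource in = new StreamSource(new StringReader(input));
            // With no node supplied, the transformer creates a Document for us
            DOMResult out = new DOMResult();
            transformer.transform(in, out);
            return (Document) out.getNode();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Document result = transform(STYLESHEET,
                "<catalog><book name=\"Java and XML\"/></catalog>");
        System.out.println(result.getDocumentElement().getNodeName());
    }
}
```

The Transformer neither knows nor cares which kind of tree it is filling in; only the Result class passed to transform(  ) changes.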
dom4j includes an API for defining a transformation entirely with Java. These transformations are written with a series of org.dom4j.rule.Rule objects contained within an org.dom4j.rule.Stylesheet object. A Rule object is composed of an implementation of the org.dom4j.rule.Pattern interface, which governs what nodes a Rule applies to, and an implementation of the org.dom4j.rule.Action interface, which performs some action upon the matched nodes. The two implementations of the Pattern interface included with the dom4j distribution are org.dom4j.rule.pattern.NodeTypePattern
End of preview. The rest of this section is not available here.
Special-Purpose Factories
In addition to the default DocumentFactory used throughout this chapter, dom4j includes some subclasses of DocumentFactory that both demonstrate the rationale behind dom4j’s interface-based design and provide useful functionality. This is not an exhaustive look at all the DocumentFactory classes in the dom4j distribution. For a full list, please take a look at the dom4j Javadocs at http://www.dom4j.org/apidocs. In general, these factories can be used when creating new dom4j objects directly or with a builder object. To use a DocumentFactory subclass with a builder, either pass the DocumentFactory to the builder’s constructor or its setDocumentFactory(  ) method:
// create a SAXReader with DOMDocumentFactory as its factory

SAXReader reader = new SAXReader(DOMDocumentFactory.getInstance(  ));



// parse something



// now switch the factory to BeanDocumentFactory

reader.setDocumentFactory(BeanDocumentFactory.getInstance(  ));
The factory org.dom4j.dom.DOMDocumentFactory and the classes it produces are perhaps the best case for the interface-based design used by both DOM and dom4j. The instances produced by DOMDocumentFactory implement the corresponding interfaces from both dom4j and W3C DOM: the result of a call to createDocument(  ) implements both org.dom4j.Document and org.w3c.dom.Document; createElement(  ) returns an object that implements both org.dom4j.Element and org.w3c.dom.Element; and so on.
This is useful when working with classes that use the DOM interfaces. Suppose, for example, that you had created a dom4j Element and wanted to pass it to the following interface:
public interface ElementProcessor {

    void doSomething(org.w3c.dom.Element element);

}
You could use the DOMWriter class to create copies of your objects that implement the DOM interfaces. But using DOMDocumentFactory allows you to simply pass your Element object to the doSomething(  ) method, by casting it to the org.w3c.dom.Element interface:
public org.dom4j.Element create(String name, ElementProcessor processor) {

    DocumentFactory factory = DOMDocumentFactory.getInstance(  );

    Element element = factory.createElement(name);

    processor.doSomething((org.w3c.dom.Element) element);

    return element;

}
End of preview. The rest of this section is not available here.
Chapter 11: Data Binding with JAXB
As I mentioned at the end of the last chapter, data binding is an XML processing technique that eliminates references to XML nodes from your code. Instead of working with elements and attributes, your code uses classes named Customer and PurchaseOrder. This first means that you have to define the structure of your XML documents using a schema, typically either an XML Schema or a DTD. To bind this schema to specific Java classes, which could include generating those classes from the schema, you’ll use a data binding framework. These are generally composed of code-generation tools to build Java classes from a schema and a runtime library that converts an XML document into a tree of Java objects (and vice versa). There are many data binding frameworks available for Java. In this chapter, we’ll look specifically at one: the Java Architecture for XML Binding (JAXB).
End of content preview. The rest of this section is not viewable here.
Data Binding Basics
Content preview
Before getting into the specifics of JAXB, it will be helpful to take a look at the concepts that underlie data binding in general. Fundamentally, data binding is similar to the document object model APIs we’ve discussed (DOM, JDOM, and dom4j) in that it defines an association, referred to as a binding, between an XML document and a tree of Java objects. A tree of Java objects can be created from an XML document and vice versa. The difference is that with data binding, the Java objects mapped to the document are instances not of generic interfaces representing elements and attributes (and comments, processing instructions, etc.), but of specific classes that have a meaning beyond the XML document. In part to indicate this difference, with data binding you don’t “parse” or “serialize” documents. Instead, you unmarshall an XML document into Java objects and marshall Java objects into an XML document. The components that sit between objects and XML documents are called marshallers and unmarshallers. This relationship is shown in Figure 11-1.
Figure 11-1: Marshallers and unmarshallers
Let’s take a look at what we can do with a fictional data binding framework and the XML document in Example 11-1.
Example 11-1. A person XML document
<?xml version="1.0"?>
<person xmlns="http://www.example.com/person">
    <firstName>Lola</firstName>
    <lastName>Arbuckle</lastName>
</person>
Using DOM, outputting the first name looks something like:
// A DocumentBuilder is obtained from a factory instance, not from a static method
DocumentBuilder documentBuilder =
        DocumentBuilderFactory.newInstance(  ).newDocumentBuilder(  );
Document doc = documentBuilder.parse(new File("lola.xml"));
Element element = doc.getDocumentElement(  );
NodeList firstNames = element.getElementsByTagName("firstName");
// Take the first match from the NodeList
Element firstName = (Element) firstNames.item(0);
System.out.println(firstName.getTextContent(  ));
With a data binding framework, we can write much simpler code, as in Example 11-2.
Example 11-2. Unmarshalling to a Person object
Unmarshaller unmarshaller = DataBindingFactory.newUnmarshaller(  );
Person person = (Person) unmarshaller.unmarshal(new File("lola.xml"));
System.out.println(person.getFirstName(  ));
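To make the idea of a binding more concrete, here is a minimal sketch of what an unmarshaller does on your behalf, written by hand against DOM for the person document in Example 11-1. The class names PersonBindingSketch and Person, and the unmarshal(  ) helper, are hypothetical; they are not part of JAXB or any other framework, and a real framework would generate or configure this mapping rather than hardcode it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical hand-written binding for the person document in Example 11-1.
// Nothing here belongs to JAXB or any other real framework.
class PersonBindingSketch {

    // Namespace declared on the <person> root element.
    static final String PERSON_NS = "http://www.example.com/person";

    // A plain domain class; callers never see DOM nodes.
    static class Person {
        private final String firstName;
        private final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getFirstName() { return firstName; }
        String getLastName() { return lastName; }
    }

    // "Unmarshals" XML text into a Person by reading two child elements.
    static Person unmarshal(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return new Person(childText(root, "firstName"), childText(root, "lastName"));
    }

    // Text of the first descendant with this local name.
    // Assumes the element is present; a real binding would validate first.
    private static String childText(Element parent, String localName) {
        return parent.getElementsByTagNameNS(PERSON_NS, localName)
                .item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person xmlns=\"http://www.example.com/person\">"
                + "<firstName>Lola</firstName>"
                + "<lastName>Arbuckle</lastName>"
                + "</person>";
        Person person = unmarshal(xml);
        System.out.println(person.getFirstName());
    }
}
```

The calling code in main(  ) has the same shape as Example 11-2: it asks for a Person and never touches an Element.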
End of content preview. The rest of this section is not viewable here.
Introducing JAXB
Content preview
The Java Architecture for XML Binding has been developed through the Java Community Process (JCP). There are two Java Specification Requests associated with JAXB: JSR 31 and JSR 222 define JAXB 1.0 and JAXB 2.0, respectively. JAXB has a few different web sites associated with it, listed in Table 11-1.
Main site: http://java.sun.com/webservices/jaxb
Reference implementation site: https://jaxb.dev.java.net
JSR 31: http://jcp.org/en/jsr/detail?id=31
JSR 222: http://jcp.org/en/jsr/detail?id=222
JAXB 1.0 defines a standardized API for marshalling and unmarshalling as well as a validation API. The specification also defines how a schema compiler binds a schema to its Java representation. However, it does not specify how a schema compiler is invoked. As a result, implementations are free to package their schema compiler in any way, but generally you will see a shell script, an Ant task, or both. Although JAXB 1.0 applications are portable in the sense that the behavior of implementations of the Marshaller, Unmarshaller, and Validator interfaces is defined in the specification, Java classes and interfaces generated by a JAXB 1.0 schema compiler are not portable between JAXB implementations.
JAXB 1.0 requires implementations to support only a subset of W3C XML Schema. Implementations are free to support additional features of W3C XML Schema and additional schema languages, including DTDs. The specific features for which support is not required are listed in the JAXB 1.0 specification, available from the JSR 31 web site, listed in Table 11-1.
For each namespace defined in a schema, a JAXB 1.0 schema compiler will produce a package containing a set of Java interfaces, a class named ObjectFactory, and, if necessary, classes for any enumerations defined in the schema. The compiler will also produce implementation classes for these interfaces, usually in a separate implementation package. I will detail the interfaces and classes created in the section “Compiling a Schema” later in this chapter. In JAXB 1.0, it is not possible to bind an arbitrary Java class, even one that adheres to JavaBeans naming conventions, to an XML representation. Classes to be marshalled must be generated by the JAXB schema compiler.
End of content preview. The rest of this section is not viewable here.
Using JAXB
Content preview
The APIs for JAXB 1.0 and JAXB 2.0 are relatively similar even though the implementations are quite different. At the core of both APIs are interfaces called Marshaller and Unmarshaller, both in the javax.xml.bind package. A factory class called JAXBContext (also in javax.xml.bind) exists to create instances of these interfaces. Figure 11-11 contains a UML model for the core JAXB 1.0 API.
Figure 11-11: JAXB 1.0 core API
The significant changes between the JAXB 1.0 versions of these interfaces and the JAXB 2.0 versions relate to the use of JAXP validation. Specifically, the setValidating(  ) and isValidating(  ) methods of the Unmarshaller interface are deprecated. Instead, the Marshaller and Unmarshaller interfaces now have methods called setSchema(  ) and getSchema(  ), which deal with instances of javax.xml.validation.Schema, discussed in Chapter 7. In addition, as mentioned above, both Marshaller and Unmarshaller now have methods that accept the reader and writer interfaces from the StAX API discussed in Chapter 8.
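As a reminder of where a javax.xml.validation.Schema object comes from, the following sketch compiles a small W3C XML Schema with the JAXP validation API and checks a document against it. It uses only JAXP, so it runs without a JAXB implementation on the classpath. The schema text and the names SchemaSketch, compileSchema(  ), and isValid(  ) are assumptions made for illustration; the schema is my own guess at a schema for the person document, not one taken from the JAXB specification.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Illustrative only: builds the kind of Schema object that the JAXB 2.0
// setSchema() methods accept. No JAXB classes are used here.
class SchemaSketch {

    // Hypothetical schema for the person document.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\""
            + " targetNamespace=\"http://www.example.com/person\""
            + " elementFormDefault=\"qualified\">"
            + "<xs:element name=\"person\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"firstName\" type=\"xs:string\"/>"
            + "<xs:element name=\"lastName\" type=\"xs:string\"/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    // Compiles the schema text into a reusable Schema object.
    static Schema compileSchema() throws SAXException {
        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(new StreamSource(new StringReader(XSD)));
    }

    // Returns true if the document validates, false on a validation error.
    static boolean isValid(Schema schema, String xml) throws IOException {
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Schema schema = compileSchema();
        String xml = "<person xmlns=\"http://www.example.com/person\">"
                + "<firstName>Lola</firstName>"
                + "<lastName>Arbuckle</lastName>"
                + "</person>";
        System.out.println("valid: " + isValid(schema, xml));
    }
}
```

With JAXB 2.0, the Schema returned by compileSchema(  ) is the object you would hand to setSchema(  ) instead of validating separately.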
To obtain an instance of the Marshaller, Unmarshaller, or Validator interfaces, you first need to obtain an instance of JAXBContext. To do so, call one of JAXBContext’s static newInstance(  ) methods. In JAXB 1.0, both newInstance(  ) methods accept a colon-separated list of package names. The second newInstance(  ) method also accepts a ClassLoader object that will be used to load the classes in those packages. The newInstance(  ) method searches for classes within these packages to create the context path, that is, the list of classes that instances created by the JAXBContext object are able to marshall, unmarshall, and validate. In JAXB 2.0, there are additional newInstance(  ) methods that allow you to pass in a list of Class objects, using the Java 5 varargs language feature. When passing one or more classes to
End of content preview. The rest of this section is not viewable here.
Other Binding Frameworks
Content preview
Despite the standardization effort around JAXB, there are numerous other Java-XML data binding frameworks available, including many under some sort of open source license. Before leaving data binding behind and moving on to the next chapter, I want to touch on two of these non-JAXB frameworks, albeit briefly.
XMLBeans was originally written by BEA, but is now a project of the Apache Software Foundation. The web site for XMLBeans is http://xmlbeans.apache.org. XMLBeans is under active development, with the most recent version (2.2.0) released in June of 2006. XMLBeans is fairly unique among Java-XML data binding frameworks in that it stores the full XML infoset. This allows you to perform round-tripping of XML: unmarshall a document and then marshall the resulting object and know that the input and output will be identical, including things like processing instructions and comments. However, XMLBeans (unlike JAXB 2.0) requires that all bound classes be created with its schema compiler.
Castor is one of the oldest Java-XML data binding frameworks that are still in active development. Its web site is http://www.castor.org. The latest version (1.0.1) was released in July 2006. Castor actually does much more than Java-XML data binding, including a full implementation of Java Data Objects (JDO). The Castor developers are currently hard at work implementing the new Java Persistence API (JPA). Although it is not done with annotations and is more limited than JAXB 2.0, Castor does support binding classes that were not created by its schema compiler.
End of content preview. The rest of this section is not viewable here.
Chapter 12: Content Syndication with RSS
Content preview
The next few chapters of this book will discuss some specific XML applications rather than the generalized toolkits for processing XML documents examined in previous chapters. The first such application is content syndication. Traditional content syndication by companies such as the Associated Press and Reuters for news, King World and DiC Entertainment for entertainment, and King Features Syndicate and United Feature Syndicate for editorial columns and comic strips is a business-to-business enterprise. Many distributors (newspapers, radio stations, television stations, etc.) do not have the resources to pay the salary of an Oprah Winfrey or a Scott Adams, so they license this content from a content syndicator for a fraction of the overall cost.
With the advent of the Web, content syndication changed in four dramatic ways:
  • The Web empowered thousands, if not millions, of content providers: basically anyone with a web site.
  • Electronic distribution of content in the form of content feeds all but eliminated the barrier to entry for new content syndicators and enabled content providers to become their own syndicators.
  • Thousands of new distribution outlets were opened, many in need of some level of syndicated content.
  • Unlike newspapers or television and radio stations, these new distribution outlets are not limited by geography and thus are able to compete with one another. Because the barriers to becoming a distribution outlet are so low, some syndicators began direct-to-consumer distribution, competing with their commercial customers.
So what does this have to do with Java and XML? Well, although there are a variety of means for syndicating content on the Web, the vast majority use XML. And as a Java developer, you may need to create a syndication feed and/or ingest one or more feeds. This ingesting could occur with a single feed to display syndicated content on a web site; or you may need to write an application to ingest several feeds to produce an aggregated view of those feeds’ content. In recent years, web content syndication has consolidated around a family of XML formats referred to under the umbrella name of RSS.
End of content preview. The rest of this section is not viewable here.
What Is RSS?
Content preview
RSS is an application of XML that defines a sequenced list of content. RSS calls this list a channel. Within the channel are one or more items. These items are usually located at a URL. The feed also contains metadata about the channel and each item; the feed can specify an image to be used as the logo of the channel, a description of each item, and so on. RSS was originally created by Netscape for use on its My Netscape portal. Users needed to be able to add channels of content to their portals, and Netscape wanted a consistent way to represent those channels. Thus, the first version of RSS was born as Version 0.9 in March of 1999. In this initial specification, the letters RSS stood for RDF Site Summary. Since then, RSS has been used as an acronym for two additional terms:
  • Rich Site Summary
  • Really Simple Syndication
In addition, some people involved with the development of RSS now claim that it is not an acronym at all.
Nine different specifications have been released under the name RSS. These can be separated into those that are based on the Resource Description Framework (RDF) and those that aren’t, as seen in Table 12-1.
Table 12-1: RSS variants
Based on RDF | Not based on RDF
End of content preview. The rest of this section is not viewable here.
Creating an RSS Feed
Content preview
Based on the examples and information above, you should be able to create RSS documents using the tools explored earlier in this book. DOM, JDOM, dom4j, and StAX can all be used to create XML documents and thus can be used to create RSS documents. You don’t even need to use an XML library. For example, the Java blogging application blojsom (available from http://www.blojsom.com) generates its RSS feeds using the Velocity scripting language. If you find yourself creating multiple different RSS feeds, you may find it helpful to use an RSS library that contains an RSS data model so that you deal with classes named Channel and Item instead of Document and Element. Because feeds are represented in this data model, an RSS library can be used to output or input feeds targeting a variety of RSS formats with a consistent data model. One open source RSS library is ROME: RSS and Atom Utilities.
ROME is an RSS library supporting the full range of RSS and Atom formats:
  • RSS 0.9
  • RSS 0.91 Netscape
  • RSS 0.91 Userland
  • RSS 0.92
  • RSS 0.93
  • RSS 0.94
  • RSS 1.0
  • RSS 2.0
  • Atom 0.3
  • Atom 1.0
ROME supports parsing and generating feeds as well as converting between one format and another. ROME is downloadable from its project web site, https://rome.dev.java.net/. As of the time of writing, the current version of ROME is 0.8 beta. ROME uses the JDOM library we examined in Chapter 9, and you will need to include the JDOM JAR file, along with the ROME JAR file, in your classpath.
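As a point of comparison with the library approach, here is a minimal sketch of the plain-XML route mentioned at the start of this section: writing a small RSS 2.0 document with the StAX XMLStreamWriter. The class RssWriterSketch, its nested Item class, and the sample titles and URLs are made up for illustration, and only a handful of RSS 2.0 elements (title, link, description, item) are written.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Illustrative only: a hand-rolled RSS 2.0 writer built on StAX.
class RssWriterSketch {

    // Hypothetical minimal item model: just a title and a link.
    static class Item {
        final String title;
        final String link;

        Item(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    // Writes <rss version="2.0"><channel>...</channel></rss> to a string.
    static String writeFeed(String title, String link, String description,
            List<Item> items) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writer.writeStartDocument("1.0");
        writer.writeStartElement("rss");
        writer.writeAttribute("version", "2.0");
        writer.writeStartElement("channel");
        writeTextElement(writer, "title", title);
        writeTextElement(writer, "link", link);
        writeTextElement(writer, "description", description);
        for (Item item : items) {
            writer.writeStartElement("item");
            writeTextElement(writer, "title", item.title);
            writeTextElement(writer, "link", item.link);
            writer.writeEndElement(); // item
        }
        writer.writeEndElement(); // channel
        writer.writeEndElement(); // rss
        writer.writeEndDocument();
        writer.close();
        return out.toString();
    }

    // Writes <name>text</name>; the writer escapes markup characters in text.
    private static void writeTextElement(XMLStreamWriter writer, String name,
            String text) throws XMLStreamException {
        writer.writeStartElement(name);
        writer.writeCharacters(text);
        writer.writeEndElement();
    }

    public static void main(String[] args) throws Exception {
        List<Item> items = Arrays.asList(
                new Item("First post", "http://www.example.com/1"),
                new Item("Second post", "http://www.example.com/2"));
        System.out.println(writeFeed("Example Feed", "http://www.example.com/",
                "A sample channel", items));
    }
}
```

This works for one format, but every additional RSS or Atom variant would need its own writing code, which is the gap a library with a format-independent data model is meant to fill.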

ROME data models

ROME includes three different data models: an RSS data model in the package com.sun.syndication.feed.rss, an Atom data model in the package com.sun.syndication.feed.atom, and a format-independent data model in the package com.sun.syndication.feed.synd. Figure 12-1 contains a UML diagram of the interfaces in the format-independent model.
Figure 12-1: ROME format-independent data model
For each of these interfaces, ROME includes an implementation class: SyndFeedImpl for the SyndFeed interface,
End of content preview. The rest of this section is not viewable here.
Reading an RSS Feed
Content preview
Because RSS is an XML application, any XML library from SAX through dom4j is capable of parsing RSS documents. However, due to the sheer number of RSS versions, an RSS library can be especially useful if you need to parse RSS feeds. (For this same reason, data binding is a poor technique to use with RSS.)
Just as there are SyndFeedOutput and WireFeedOutput classes to output syndicated feeds, ROME includes classes called SyndFeedInput and WireFeedInput to input feeds. These input classes can read feeds from a variety of sources:
  • A java.io.File object
  • A java.io.Reader object
  • An org.xml.sax.InputSource object
  • An org.w3c.dom.Document object
  • An org.jdom.Document object
When parsing a feed, WireFeedInput determines what type of feed it is by looking at elements, attributes, and namespaces defined in the feed. SyndFeedInput delegates parsing to WireFeedInput and then converts the resulting WireFeed object to a SyndFeed object.
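To give a feel for what this kind of detection involves, the sketch below inspects the root element of a feed with namespace-aware DOM and guesses the format from its name, namespace, and version attribute. This is not ROME’s actual detection code; the class FeedTypeSketch and the detect(  ) method are hypothetical, and only three families of feed are recognized.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative only: a simplified stand-in for feed-type detection.
// It does not reproduce the rules ROME itself applies.
class FeedTypeSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String RSS10_NS = "http://purl.org/rss/1.0/";
    static final String ATOM10_NS = "http://www.w3.org/2005/Atom";

    // Returns a label such as "RSS 2.0", "RSS 1.0", "Atom 1.0", or "unknown".
    static String detect(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        String ns = root.getNamespaceURI();
        String name = root.getLocalName();

        // Non-RDF RSS: an <rss> root in no namespace with a version attribute.
        if ("rss".equals(name) && ns == null) {
            return "RSS " + root.getAttribute("version");
        }
        // RSS 1.0: an rdf:RDF root containing a channel in the RSS 1.0 namespace.
        if ("RDF".equals(name) && RDF_NS.equals(ns)
                && root.getElementsByTagNameNS(RSS10_NS, "channel").getLength() > 0) {
            return "RSS 1.0";
        }
        // Atom 1.0: a <feed> root in the Atom namespace.
        if ("feed".equals(name) && ATOM10_NS.equals(ns)) {
            return "Atom 1.0";
        }
        return "unknown";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(detect("<rss version=\"2.0\"><channel/></rss>"));
        System.out.println(detect("<feed xmlns=\"http://www.w3.org/2005/Atom\"/>"));
    }
}
```

Even this reduced version needs a separate rule per format, which suggests why delegating the job to a library is attractive when many RSS versions are in play.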
To demonstrate these feed reading capabilities, let’s build a simple command-line RSS and Atom aggregator. Our aggregator will be passed a list of feed URLs on the command line and output the entries in those feeds in a single list. The user can then select a single entry from the list to see the title, description, and link for that entry. Because we want this aggregator to support both RSS and Atom, we’ll use SyndFeedInput and the classes in the com.sun.syndication.feed.synd package. Example 12-5 contains the skeleton code for our SimpleAggregator class.
Example 12-5. Framework aggregator code
package javaxml3;

import java.util.List;

import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.io.SyndFeedInput;

public class SimpleAggregator {

    private SyndFeedInput feedInput;

    public SimpleAggregator(  ) {
        feedInput = new SyndFeedInput(  );
    }

    private void run(String[] args) {
        System.out.println("Welcome to the Simple Aggregator.");

        List allEntries = loadFeeds(args);

        System.out.println("Done loading feeds.");
        System.out.println(  );

        System.out.println("Please choose an entry below:");

        outputMenu(allEntries);

        int choice = 
End of content preview. The rest of this section is not viewable here.
Modules with ROME
Content preview
Just as RSS is extensible through modules, ROME supports an extension mechanism called modules. ROME modules are used to represent any element outside of the base set of elements defined for the SyndFeed, SyndEntry, Feed, Channel, Entry, and Item objects. We actually already used one of the modules that are included with the ROME distribution: Dublin Core. Because RSS 1.0 does not support any sort of date element either within a channel or item, the RSS 1.0 feed produced by our example feed generator contained the Dublin Core date element to define the date for the feed:
    <dc:date>2006-08-03T02:51:45Z</dc:date>
When our SyndFeed object was converted to an RSS 1.0 Channel object, the Dublin Core module was used to store this date value. And when WireFeedOutput was given this Channel object, it saw that the Dublin Core module was present and had the Dublin Core module add this additional element to the output XML document.
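At the XML level, what a module handles is simply an element from another namespace sitting inside the feed. The sketch below reads the Dublin Core date from an RSS 1.0 document by hand with namespace-aware DOM; it is meant only to show what the module saves you from writing. The class DublinCoreDateSketch, the firstDcDate(  ) method, and the sample feed are assumptions, and none of this is ROME code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: looks up a Dublin Core element by namespace and local name.
class DublinCoreDateSketch {

    // Dublin Core elements namespace, bound to the dc prefix in the feed.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Returns the text of the first dc:date element, or null if there is none.
    static String firstDcDate(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList dates = doc.getElementsByTagNameNS(DC_NS, "date");
        return dates.getLength() == 0 ? null : dates.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String feed =
                "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
                + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\""
                + " xmlns=\"http://purl.org/rss/1.0/\">"
                + "<channel rdf:about=\"http://www.example.com/\">"
                + "<title>Example</title>"
                + "<dc:date>2006-08-03T02:51:45Z</dc:date>"
                + "</channel>"
                + "</rdf:RDF>";
        System.out.println(firstDcDate(feed));
    }
}
```

The lookup keys on the namespace URI rather than the dc prefix, because a feed is free to bind that namespace to any prefix it likes.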
In addition to the Dublin Core and RSS 1.0 Syndication modules included with the ROME distribution, many useful modules are listed on the ROME Wiki, located at http://wiki.java.net/bin/view/Javawsxml/Rome. Currently, these ROME modules support the following RSS modules:
  • RSS 1.0 Content
  • iTunes
  • Slashcode
  • Google Base
  • Creative Commons
  • MediaRSS
  • GeoRSS
  • Apple iPhoto Photocast
  • A9 Open Search
ROME modules are packaged in regular JAR files. By simply adding the JAR file to your classpath, the module discovery mechanism used by the feed parser will be able to find it. We’ll see how this happens later in the section “Creating a ROME Module.” When creating a feed or entry object, it’s necessary to add an instance of a module object. A module can be applied to a feed-level object (SyndFeed, Feed, and Channel), an entry-level object (
End of content preview. The rest of this section is not viewable here.
Chapter 13: XML As Presentation
Content preview
So far, we’ve primarily looked at XML as a low-level enabling technology: end users won’t know if you’re using regular properties files or the XML properties files shown in Chapter 9. And when the XML documents we discussed are meant to be shared between applications, those applications are generally data-centric server applications. The last chapter explored examples in which XML was used in a client-server context: RSS and Atom feeds are delivered directly to clients, in this case RSS and Atom aggregators. (Of course, there are also server-based RSS aggregators like NewsGator (http://www.newsgator.com) and My Yahoo! (http://my.yahoo.com).) But that is a limited case tied to the specific vocabularies of RSS and Atom. In this chapter, we’ll look at more generic cases of using XML as part of the presentation technology in a web application.
I need to make a few assumptions here. First, I’m going to assume you have read the prior chapters. As with the ROME library used in the last chapter, in this chapter we’re going to be using some of the libraries covered earlier, most significantly DOM. Second, I’m assuming that you have some familiarity with various web technologies such as HTML, JavaScript, Java servlets, and JavaServer Pages (JSP). Along the same lines, I’m assuming you know how to set up a Java servlet container (such as Apache Tomcat) or can get someone’s help to do so. If you want to learn how to write a Java web application, this chapter won’t help. So, if you’re not at least a little familiar with the above technologies, I’d highly recommend putting this book down, picking up a different one, and coming back here when you’re ready. Great books on Java web technology include Java Servlet Programming by Jason Hunter (O’Reilly) and JavaServer Pages by Hans Bergsten (O’Reilly).
End of content preview. The rest of this section is not viewable here.
XML and the Model-View-Controller Pattern
Content preview
When I refer to XML as a presentation technology, I am referring primarily to the view in an application using a Model-View-Controller (MVC) architecture. Model-View-Controller is a software architecture originally documented as a pattern for traditional client applications (like those created with Swing) but has been widely adopted as an architecture for web applications. In short, an MVC application separates an application into three main areas:
Model
The raw data and business rules of an application
View
The user-visible rendition of the model
Controller
Functionality that receives requests from users, interprets those requests, interacts with the model, and provides the view with any necessary model objects
In more concrete terms, an MVC web application written with Java servlets and JSPs could process a request in four steps:
  1. A servlet (the controller) receives the request and parses it.
  2. The servlet calls some methods on a data access object (the model).
  3. The servlet passes model data objects to a JSP page for rendering.
  4. The JSP page outputs an HTML page including data from the model objects.
There are a number of Java web MVC frameworks available that provide much of the base code necessary in any web application. Popular examples include Apache Struts (http://struts.apache.org), Spring MVC (http://www.springframework.org), JavaServer Faces (http://java.sun.com/javaee/javaserverfaces), and Tapestry (http://tapestry.apache.org).
XML can be used in several places in an MVC web application. Most MVC frameworks make heavy use of XML for internal configuration. More interesting for our purposes is where XML is used to contain the data passed between the view and the controller. Instead of the controller passing one or more model objects to the view, the controller constructs an XML representation of the model objects and passes the XML document to the view. In some cases, your application is responsible for delivering XML; the “view” is simply serializing the XML document as the HTTP response. In others, the view is some form of server-side transformation from the controller-supplied XML to a different XML syntax or to HTML. In addition, the use of XML to transfer model data between the controller and the view allows us to move any necessary transformations from the server to users’ client applications (usually web browsers).
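The hand-off described above can be sketched without a servlet container. In the following example, one method plays the controller’s part by building a DOM document from model objects, and another plays the simplest possible view by serializing that document. The Book class, the element names, and the method names are assumptions chosen for illustration; in a real application the controller would be a servlet and the output would go to the HTTP response.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative only: controller builds XML from the model, view serializes it.
class BookXmlViewSketch {

    // Hypothetical model object.
    static class Book {
        final String title;
        final String author;

        Book(String title, String author) {
            this.title = title;
            this.author = author;
        }
    }

    // Controller side: turn model objects into an XML representation.
    static Document toXml(List<Book> books) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("books");
        doc.appendChild(root);
        for (Book book : books) {
            Element bookElement = doc.createElement("book");
            root.appendChild(bookElement);
            appendText(doc, bookElement, "title", book.title);
            appendText(doc, bookElement, "author", book.author);
        }
        return doc;
    }

    private static void appendText(Document doc, Element parent, String name,
            String text) {
        Element child = doc.createElement(name);
        child.setTextContent(text);
        parent.appendChild(child);
    }

    // View side: the simplest "view" just serializes the document as-is.
    static String render(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<Book> books = Arrays.asList(
                new Book("Ajax Hacks", "Bruce W. Perry"),
                new Book("Java Servlet Programming", "Jason Hunter"));
        System.out.println(render(toXml(books)));
    }
}
```

Because render(  ) only sees a Document, it could be swapped for an XSLT-based view without touching toXml(  ).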
End of content preview. The rest of this section is not viewable here.
Transforming to HTML with JSP
Content preview
If, instead of XML, the expected responses from our web application are HTML documents, we have a number of options for transforming an XML representation of model objects to HTML. Although basic JSP does not have any special XML capabilities, the JavaServer Pages Standard Tag Library (JSTL) contains several custom tags for working with XML documents. With these tags, we can transform XML to HTML with either XPath or XSLT. In addition, JSTL includes a tag to parse an XML document into a DOM object, to which the other tags can then be applied.
The JSTL is a collection of JSP custom tags meant to address a variety of basic needs when writing JSP pages. It includes an XML tag library that contains a series of tags used for processing XML documents within JSP pages:
out
Evaluates an XPath expression and outputs the result
parse
Parses a string into a DOM Document object
set
Evaluates an XPath expression and saves the result as a local JSP variable
if
Executes the tag’s body if an XPath expression returns the Boolean value true
choose/when/otherwise
Provides functionality similar to the Java switch/case/default language construct using XPath expressions
forEach
Iterates over a list of DOM nodes
transform
Performs an XSLT transformation
Because all of these tags (with the exception of parse) accept a DOM node as a starting point, we can modify the renderBooks(  ) method from Example 13-1 to save the DOM Document object as a request attribute and dispatch to a file named booklist2.jsp:
private void renderBooks(Document doc, HttpServletRequest request,
        HttpServletResponse response) throws IOException, ServletException {

    request.setAttribute("xml", doc);
    RequestDispatcher dispatcher = getServletContext(  )
            .getRequestDispatcher("/booklist2.jsp");
    dispatcher.include(request, response);
}
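The XPath evaluation that tags such as out, if, and forEach perform against a DOM node can be approximated in plain Java with the JAXP XPath API. The sketch below is not how JSTL is implemented; it only mirrors the behavior described in the tag list above. It assumes a document with a books root element containing book elements, each with a title child, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: plain-Java analogues of the JSTL XML tags' XPath use.
class XPathTagsSketch {

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Roughly what an "if" test does: evaluate an expression as a Boolean.
    static boolean hasBooks(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (Boolean) xpath.evaluate("boolean(/books/book)", doc,
                XPathConstants.BOOLEAN);
    }

    // Roughly "forEach" over /books/book with an "out" of title in the body.
    static List<String> titles(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList books = (NodeList) xpath.evaluate("/books/book", doc,
                XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            // The expression is relative to the current book node.
            result.add(xpath.evaluate("title", books.item(i)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<books>"
                + "<book><title>Ajax Hacks</title></book>"
                + "<book><title>LDAP System Administration</title></book>"
                + "</books>");
        if (hasBooks(doc)) {
            for (String title : titles(doc)) {
                System.out.println(title);
            }
        }
    }
}
```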
End of content preview. The rest of this section is not viewable here.
Using XSLT
Content preview
As seen in prior chapters, XSLT is a powerful tool for transforming an XML document from one XML syntax to another as well as transforming from XML to HTML. One interesting option enabled by the use of XSLT is offloading the transformation processing onto client applications, as most modern web browsers support XSLT. However, there are definite downsides to this offloading. Unless you’re building applications for a limited and controlled audience, you will have little to no control as to how fast your users’ client applications run. As a result, different users could have widely different experiences. Of course, this is also somewhat true with regular HTML. That being said, client-side transformations are a useful addition to any XML developer’s toolbox.
There are two main methods for performing a client-side transformation: processing instructions and client-side scripting.

Using processing instructions

The simplest way to request an XSL transformation is to include an xml-stylesheet processing instruction between the XML declaration and the document’s root element. For example, if a web browser receives a document such as the one in Example 13-6, it will make a request for http://www.example.com/books.xsl and use that stylesheet to transform this document for display.
Example 13-6. XML document with reference to stylesheet
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.example.com/books.xsl"?>
<books>
  <book>
    <title>Ajax Hacks</title>
    <author>Bruce W. Perry</author>
    <pubDate>March 2006</pubDate>
  </book>
  <book>
    <title>LDAP System Administration</title>
    <author>Gerald Carter</author>
    <pubDate>March 2003</pubDate>
  </book>
  <book>
    <title>Java Servlet Programming</title>
    <author>Jason Hunter</author>
    <pubDate>April 2001</pubDate>
  </book>
</books>
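If the document is being built with DOM on the server, the processing instruction can be added programmatically before the document is serialized. The following is a minimal sketch under that assumption; the class StylesheetPiSketch and its methods are hypothetical, and the href value is assumed not to contain quote characters.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.ProcessingInstruction;

// Illustrative only: adds an xml-stylesheet instruction ahead of the root.
class StylesheetPiSketch {

    // Inserts the processing instruction as a sibling before the root element.
    static void addStylesheet(Document doc, String href) {
        ProcessingInstruction pi = doc.createProcessingInstruction(
                "xml-stylesheet", "type=\"text/xsl\" href=\"" + href + "\"");
        doc.insertBefore(pi, doc.getDocumentElement());
    }

    // Serializes the document with an identity transformation.
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        doc.appendChild(doc.createElement("books"));
        addStylesheet(doc, "http://www.example.com/books.xsl");
        System.out.println(serialize(doc));
    }
}
```

The instruction has to be inserted at the document level, before the root element; appending it inside books would put it in the wrong place for a browser to act on.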
The xml-stylesheet processing instruction can be limited to apply only when the XML document is to be displayed on a particular type of device. For example, many RSS feeds now use this processing instruction to display an HTML page when viewed through a browser. For example, the RSS feed from
End of content preview. The rest of this section is not viewable here.
Ajax
Content preview
Ajax is a name for a group of related web development patterns used to create interactive web applications. In a traditional, non-Ajax web application, users browse through the application page by page: each user action results in a request from the web browser for a new page. However, in an Ajax application, user actions result in the updating of a portion of the page based on a small amount of data transferred between the browser and the web server asynchronously. As a result, the application appears much more responsive and consumes far less bandwidth. Although the term Ajax was coined in 2005, similar techniques have been in use since frames were introduced to HTML in the mid-1990s. In those early applications, hidden frames and the IFRAME tag, in Internet Explorer, were used to load HTML documents using JavaScript. These HTML documents contained JavaScript that changed the appearance of the page. In addition, similar functionality could be achieved using Java Applets and browser plug-ins.
Although asynchronous capabilities existed with frames, these capabilities were fairly error-prone. For example, if the user clicked the browser’s back button, the application could be put into an invalid state. Alternatives to using frames such as Java Applets or browser plug-ins had their own browser compatibility and security issues. As a result, Ajax applications were not common and, where they did exist, were designed for specific user bases or platforms. This all changed when Microsoft introduced the XMLHttpRequest object in Internet Explorer 5 in 1999.
End of content preview. The rest of this section is not viewable here.
Flash
Content preview
Since its introduction in 1996 under the name FutureSplash, Adobe Flash (formerly Macromedia Flash) has become the de facto standard for web animation and interactivity. With each subsequent release of the Flash platform, it has become more full-featured to the point where it can now be considered a platform for any kind of interactive application. The Flash platform is currently composed of three different components:
  • A file format named SWF (files in this format are commonly called Flash movies)
  • Applications for displaying SWF files, packaged as a standalone executable (Flash Player) and a browser plug-in
  • Authoring tools
The specification for the SWF file format is available on Adobe’s web site. However, the license terms prohibit the use of the specification to create alternatives to the Flash Player. It can be used to write applications that create SWF files. As a result, authoring tools for SWF files range from Adobe’s tools (Flash Professional, Flash Standard, and Flex) to open source command-line compilers like MTASC.
Almost every version of Flash has included some level of scripting support. Flash 5, released in 2000, introduced a new scripting language named ActionScript.
ActionScript is based on ECMAScript and thus resembles JavaScript. Flash 7 (a.k.a. Flash MX 2004) introduced ActionScript 2.0 in 2003 supporting language features more commonly associated with Java than JavaScript such as class inheritance, interfaces, and strong typing. ActionScript 3.0 was introduced with Flash Player 9 in 2006 and continues this trend with better exception support, true runtime typing, a new API for XML, and regular expression support.
In addition to the language itself, ActionScript has an active developer community. You can produce Javadoc-style documentation with as2api and perform unit tests with AS2Unit. For links to these and other open source tools, check out the web site http://www.osflash.org.
In March 2004, Macromedia introduced the Flex server application as an alternative development platform for Flash. Unlike Flash Professional and Flash Standard, which use a binary file format called FLA and then compile the FLA file into a SWF file, Flex uses an XML file format called MXML, which is compiled into a SWF file. MXML files represent both the display and functionality of an application. Example 13-14 contains an example MXML file. You can see the result of compiling this MXML in Figure 13-11. Even if you’ve never used ActionScript, Flash, or Flex, it is pretty obvious what this application does.
End of content preview. The rest of this section is not available here.
Chapter 14: Looking Forward
It’s almost time to wrap up the journey through Java and XML. I hope you’ve had fun. Before I leave you to mull over all the possibilities, I want to finish up with a few pointers to interesting XML-related technologies that we weren’t able to discuss in this book.
End of content preview. The rest of this section is not available here.
XML Appliances
The various XML processing libraries discussed in this book have been implemented entirely in software, sometimes as part of the Java Runtime Environment (JRE) and sometimes as separate libraries. In addition to these options, there are also solutions for processing XML using specialized hardware. In some cases, this hardware is packaged as an add-in card that is installed in a server. In other cases, the hardware is a separate box that is accessed over a network. Regardless of how it’s packaged, your application uses a specialized library to offload processing onto the XML hardware. With most hardware, these libraries include implementations of the JAXP interfaces discussed in Chapter 7, in which case your code may not need to be changed to take advantage of the hardware. XML appliances are made by companies such as DataPower and Sarvega, subsidiaries of IBM and Intel, respectively, as well as smaller companies like Layer 7 Technologies and Reactivity.
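The reason code may not need to change is that JAXP hides the concrete parser behind factory classes. The following is a minimal sketch of that idea, not a vendor's actual integration: it uses only the JDK's bundled implementation, and the class name JaxpNeutralParse and the sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class; nothing here names a specific implementation.
class JaxpNeutralParse {
    // Parses XML with whichever JAXP implementation the factory lookup selects.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<order id=\"42\"><item>widget</item></order>");
        System.out.println("Root element: " + doc.getDocumentElement().getTagName());
        // The concrete class name shows which implementation was picked up.
        System.out.println("Factory: "
            + DocumentBuilderFactory.newInstance().getClass().getName());
    }
}
```

With standard JAXP, a different implementation is typically selected through the javax.xml.parsers.DocumentBuilderFactory system property or a service entry on the classpath. How a particular appliance library registers itself is an assumption here and will depend on the vendor.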
End of content preview. The rest of this section is not available here.
XML Databases
As you work with XML documents, you may find yourself needing to manage collections of documents. XML databases (sometimes called XML-native databases) are built for just this task. You can query a collection using XPath or XQuery (see below). Another specification, XUpdate, defines how collections get updated, although most XML databases support a variety of mechanisms for adding and updating documents. There are a variety of available XML databases, both open source and commercial.
In addition, many relational database servers support an XML datatype. With columns of this type, XML queries can be combined with traditional, relational queries. The disadvantage of XML support in relational databases is that, in general, the support has been bolted on and not fully integrated into the software.
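To give a feel for the XPath queries mentioned above, here is a small sketch that evaluates an XPath expression with the JDK's javax.xml.xpath package. It runs against a single in-memory document rather than a real XML database, whose client API would differ from product to product; the class name, the catalog data, and the titlesSince() helper are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathQuerySketch {
    // Hypothetical sample data standing in for one document of a collection.
    static final String CATALOG =
        "<catalog>"
        + "<book year=\"2001\"><title>First Title</title></book>"
        + "<book year=\"2006\"><title>Second Title</title></book>"
        + "</catalog>";

    // Returns the titles of all books whose year attribute is >= the given year.
    static List<String> titlesSince(String xml, int year) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/catalog/book[@year >= " + year + "]/title";
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            titles.add(nodes.item(i).getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(titlesSince(CATALOG, 2005));
    }
}
```

An XML database would usually accept the same kind of expression but evaluate it across every document in a collection, typically with the help of indexes.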
End of content preview. The rest of this section is not available here.
XQuery
XQuery is a query language for extracting data from XML documents. It is similar in purpose to SQL (Structured Query Language). XQuery is related to XSLT (both use XPath extensively and can be used to accomplish some of the same things), but XQuery queries are not written in XML, as XSLT stylesheets are. At the time of writing, the specification for XQuery is not yet finalized, although it is getting very close. This specification is available at http://www.w3.org/TR/xquery. You can also learn more about XQuery from the web site http://www.xquery.com. A standard Java API for XQuery is being developed under JSR 225 using the name XQuery API for Java (XQJ).
End of content preview. The rest of this section is not available here.
Fast Infoset
Fast Infoset is an alternative encoding of the XML Information Set. Normally an XML document is written in plain text, just as we’ve seen throughout this book. With Fast Infoset, however, a binary file format that is not human-readable is used. In exchange for the loss of human readability, Fast Infoset-encoded documents are significantly smaller and require less processing effort to parse. The Fast Infoset specification is defined by two different standards bodies: the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO). There is an open source Java implementation of Fast Infoset at https://fi.dev.java.net. This implementation is included in the Java Web Services Developer Pack discussed in Chapter 11.
End of content preview. The rest of this section is not available here.
And Many More...
I could go on, but who knows what will happen between now and the time you’re reading this? Without a doubt, something new and exciting will be developed. The best I can do is point you to a few web sites:
XML.com
O’Reilly’s XML news and information site. In addition to articles written for the site, XML.com contains links to XML-related entries throughout the blogosphere.
JRoller.com
JRoller.com is one of the larger Java blog sites. JRoller.com hosts thousands of Java-oriented blogs.
TheServerSide.com
TheServerSide.com is a premier enterprise Java news site.
Oracle’s XML Technology Center
Located at http://www.oracle.com/technology/tech/xml/index.html, Oracle’s XML site has a variety of technical articles and tutorials.
IBM developerWorks
developerWorks is IBM’s developer community site. The XML section, located at http://www-128.ibm.com/developerworks/xml, contains tutorials, documentation, podcasts, and various pieces of software for download. IBM frequently uses developerWorks to preview technologies coming out of its vast research facilities.
If you keep up to date using these sites and the thousands of other related sites on the Web, you’ll be aware of the next big thing before it’s here. And if you have the inclination (and the time), all of these sites have healthy user communities and more are always welcome.
End of content preview. The rest of this section is not available here.
Appendix 1: SAX Features and Properties
This appendix describes the SAX 2.0 standard features and properties. Although a vendor’s parsing software can add features and properties for vendor-specific functionality, this list represents the core set of functionality that any SAX 2.0-compliant parser implementation should support.
To be precise, these are drawn from the SAX 2.0.2 release 3. However, any SAX 2.x parser should provide these features and properties—or, at worst, recognize them and throw a SAXNotSupportedException.

  • Core Features
  • Core Properties

End of content preview. The rest of this section is not available here.
Core Features
The core set of features supported by SAX 2.0 XMLReader implementations is listed here. These features can be set through setFeature(), and the value of a feature can be obtained through getFeature(). Any feature can be read-only or read/write; features also may be modifiable only when parsing is occurring, or only when parsing is not occurring. For more information on SAX features and properties, refer to Chapter 4.
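As a quick illustration of setFeature() and getFeature(), the sketch below tries to set a feature and then reads it back, treating an unrecognized or unsupported feature as "no answer" rather than a fatal error. The class name FeatureToggle and its helper methods are hypothetical; the feature URI is the external general entities feature from this appendix, and the reader is simply whatever SAX parser the JDK provides.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

class FeatureToggle {
    static final String EXTERNAL_GENERAL_ENTITIES =
        "http://xml.org/sax/features/external-general-entities";

    // Obtains an XMLReader from the default JAXP SAX implementation.
    static XMLReader newReader() throws Exception {
        return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    }

    // Sets the feature, then reads it back. Returns null if the parser
    // does not recognize the feature or cannot set it right now.
    static Boolean trySetFeature(XMLReader reader, String uri, boolean value) {
        try {
            reader.setFeature(uri, value);
            return reader.getFeature(uri);
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader = newReader();
        System.out.println("external-general-entities now: "
            + trySetFeature(reader, EXTERNAL_GENERAL_ENTITIES, false));
    }
}
```

Catching the two exception types separately would let you tell "never heard of this feature" apart from "known, but not changeable in the current state", which matters for features that are read-only during parsing.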
This feature tells a parser whether or not to process external general entities, such as:
 <!ENTITY copyright    SYSTEM "legal/copyright.xml">
URI: http://xml.org/sax/features/external-general-entities
Access: read/write
Default: unspecified; always true if the parser is validating (see the “Validation” section)
This feature tells a parser whether or not to process external parameter entities, used to define DTDs by a system and/or public ID (rather than directly in an XML document by location):
<!DOCTYPE book [
  <!ENTITY % book SYSTEM "http://www.newInstance.com/dtd/book.dtd">
  %book;
]>
URI: http://xml.org/sax/features/external-parameter-entities
Access: read/write
Default: unspecified; always true if the parser is validating (see the “Validation” section)
This feature reports whether a document is standalone, declared via the standalone attribute in the XML declaration:
<?xml version="1.0" standalone="yes"?>
URI: http://xml.org/sax/features/is-standalone
Access: read-only during parsing; not available otherwise
Default: not applicable
This feature is a bit of an aberration; it’s available only during parsing, and must be queried after the startDocument() callback has been fired. Additionally, you can’t set this feature on a parser; it has no meaning outside of the parsing context.
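Because the value only exists while a parse is underway, the query has to be made from inside a callback. The sketch below asks for the feature in the first startElement() call, which is safely after startDocument(). The class name StandaloneCheck is hypothetical, and whether your parser supports this feature at all is an assumption, so the code returns null when it does not.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class StandaloneCheck {
    static final String IS_STANDALONE = "http://xml.org/sax/features/is-standalone";

    // Returns the standalone flag, or null if the parser does not expose it.
    static Boolean isStandalone(String xml) throws Exception {
        final XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        final Boolean[] result = new Boolean[1];
        final boolean[] asked = new boolean[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (asked[0]) {
                    return; // only query once, on the root element
                }
                asked[0] = true;
                try {
                    result[0] = reader.getFeature(IS_STANDALONE);
                } catch (SAXException e) {
                    // Not recognized or not supported: leave the result as null.
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" standalone=\"yes\"?><book/>";
        System.out.println("standalone: " + isStandalone(xml));
    }
}
```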
This feature tells a parser that parameter entity reporting (when the entities start, and when they stop) should be handled by a LexicalHandler (for more on LexicalHandlers, see Chapter 4 and the “Lexical Handler” property section).
URI:
End of content preview. The rest of this section is not available here.
Core Properties
Properties provide a way to deal with objects used in the parsing process, particularly when dealing with handlers such as LexicalHandler and DeclHandler that are not in the core set of SAX 2.0 handlers (EntityResolver, DTDHandler, ContentHandler, and ErrorHandler). Any property can be read-only or read/write; properties also may be modifiable only when parsing is occurring, or only when parsing is not occurring.
This property allows the setting and retrieval of a DeclHandler implementation to be used for handling of constraints within a DTD.
URI: http://xml.org/sax/properties/declaration-handler
Datatype: org.xml.sax.ext.DeclHandler
Access: read/write
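Here is a minimal sketch of registering a DeclHandler through this property. It collects the names of the elements declared in an internal DTD subset. The class name DeclHandlerSketch and the sample document are hypothetical; DefaultHandler2 from org.xml.sax.ext is used only as a convenient base class that already implements DeclHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class DeclHandlerSketch {
    static final String DECL_HANDLER =
        "http://xml.org/sax/properties/declaration-handler";

    // Returns the names of all elements declared in the document's DTD.
    static List<String> declaredElements(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        final List<String> names = new ArrayList<>();
        reader.setProperty(DECL_HANDLER, new DefaultHandler2() {
            @Override
            public void elementDecl(String name, String model) {
                names.add(name); // model holds the content model as a string
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE book ["
            + "<!ELEMENT book (title)>"
            + "<!ELEMENT title (#PCDATA)>"
            + "]><book><title>Sample</title></book>";
        System.out.println(declaredElements(xml));
    }
}
```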
This property returns the version string of the XML document, indicated by the version attribute in the XML declaration:
<?xml version="1.0"?>
Like “Standalone,” this property acts a bit abnormally; it’s available only during parsing, and must be queried after the startDocument() callback has been fired.
URI: http://xml.org/sax/properties/document-xml-version
Datatype: String
Access: read-only during parsing; not available otherwise
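Reading this property follows the same pattern as any other mid-parse query: call getProperty() from within a callback. The sketch below does so in endDocument(), which still runs while the parse is in progress. The class name XmlVersionCheck is hypothetical, and support for the property is assumed rather than guaranteed, so the method returns null if the parser rejects the request.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class XmlVersionCheck {
    static final String XML_VERSION =
        "http://xml.org/sax/properties/document-xml-version";

    // Returns the document's XML version string, or null if unavailable.
    static String xmlVersion(String xml) throws Exception {
        final XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        final String[] version = new String[1];
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void endDocument() {
                try {
                    Object value = reader.getProperty(XML_VERSION);
                    version[0] = (value == null) ? null : value.toString();
                } catch (SAXException e) {
                    // Not recognized or not supported: leave the result as null.
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return version[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(xmlVersion("<?xml version=\"1.0\"?><book/>"));
    }
}
```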
When parsing is occurring, this property retrieves the current DOM node (if a DOM iterator is being used). When parsing is not occurring, it retrieves the root DOM node.
Most of the parsers I used in testing for this book did not support this property except in very special cases; you shouldn’t rely on it providing useful information in the general case.
URI: http://xml.org/sax/properties/dom-node
Datatype: org.w3c.dom.Node
Access: read-only when parsing; read/write when not parsing
This property allows the setting and retrieval of a LexicalHandler implementation to be used for handling of comments and DTD references within an XML document.
URI: http://xml.org/sax/properties/lexical-handler
Datatype: org.xml.sax.ext.LexicalHandler
Access: read/write
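As a sketch of how this property is used, the following registers a LexicalHandler that collects the text of every comment in a document. The class name CommentCollector and the sample input are hypothetical; DefaultHandler2 is used as a base class because it supplies empty implementations of the other LexicalHandler callbacks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

class CommentCollector {
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    // Returns the trimmed text of each comment, in document order.
    static List<String> comments(String xml) throws Exception {
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        final List<String> found = new ArrayList<>();
        reader.setProperty(LEXICAL_HANDLER, new DefaultHandler2() {
            @Override
            public void comment(char[] ch, int start, int length) {
                found.add(new String(ch, start, length).trim());
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<channel><!--Generated by Blogger v5.0--><title/></channel>";
        System.out.println(comments(xml));
    }
}
```

A plain ContentHandler never sees comments, which is why the handler has to be attached through the property rather than through setContentHandler().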
This property retrieves the literal characters in the XML document that triggered the current event, that is, the event being processed when the property is read.
End of content preview. The rest of this section is not available here.
	
