The Structure of an XML Document in Java

An XML document should start with a header such as

<?xml version="1.0"?>

or

<?xml version="1.0" encoding="UTF-8"?>

Strictly speaking, a header is optional, but it is highly recommended.
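To see that the header really is optional, here is a minimal sketch that uses the DOM parser shipped with the JDK. The class name HeaderDemo and the helper rootName are made up for this illustration; the sketch only checks that both forms parse to the same root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class HeaderDemo {
    // Parses an XML string and returns the tag name of its root element.
    // Checked parser exceptions are wrapped to keep the sketch short.
    static String rootName(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTagName();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // With a header
        System.out.println(rootName("<?xml version=\"1.0\" encoding=\"UTF-8\"?><config/>"));
        // Without a header: still a well-formed document
        System.out.println(rootName("<config/>"));
    }
}
```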

The header can be followed by a document type definition (DTD), such as

<!DOCTYPE web-app PUBLIC

"-//Sun Microsystems, Inc.//DTD Web Application 2.2//EN"

"http://java.sun.com/j2ee/dtds/web-app_2_2.dtd">

DTDs are an important mechanism to ensure the correctness of a document, but they are not required. We will discuss them later in this chapter.

Finally, the body of the XML document contains the root element, which can contain other elements. For example,

<?xml version="1.0"?>

<!DOCTYPE config . . .>

<config>

<entry id="title">

<font>

<name>Helvetica</name>

<size>36</size>

</font>

</entry>

</config>

An element can contain child elements, text, or both. In the preceding example, the font element has two child elements, name and size. The name element contains the text “Helvetica”.
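As a rough sketch of how this structure looks from Java, the following code parses the example with the JDK's DOM parser and walks from the root element to the font element and its children. The class name ConfigStructureDemo and the helpers parse and textOf are hypothetical, and the DOCTYPE line is left out because the example elides its contents.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class ConfigStructureDemo {
    // The sample document, written without whitespace between elements
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<config>"
        + "<entry id=\"title\">"
        + "<font><name>Helvetica</name><size>36</size></font>"
        + "</entry>"
        + "</config>";

    // Parses an XML string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Returns the text of the first descendant of parent with the given tag name.
    static String textOf(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) {
        Element root = parse(XML).getDocumentElement();
        Element font = (Element) root.getElementsByTagName("font").item(0);
        System.out.println(root.getTagName());    // config
        System.out.println(textOf(font, "name")); // Helvetica
        System.out.println(textOf(font, "size")); // 36
    }
}
```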

XML elements can contain attributes, such as

<size unit="pt">36</size>

There is some disagreement among XML designers about when to use elements and when to use attributes. For example, it would seem easier to describe a font as

<font name="Helvetica" size="36"/>

compared to

<font>

<name>Helvetica</name>

<size>36</size>

</font>

However, attributes are much less flexible. Suppose you want to add units to the size value. If you use attributes, you will have to add the unit to the attribute value:

<font name="Helvetica" size="36 pt"/>

Ugh! Now you have to parse the string “36 pt”, just the kind of hassle that XML was designed to avoid. Adding an attribute to the size element is much cleaner:

<font>

<name>Helvetica</name>

<size unit="pt">36</size>

</font>
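The difference shows up in the code that reads the two forms. The following sketch, again based on the JDK's DOM parser, compares them: with the element form, the number and the unit arrive as separate pieces, whereas the attribute form forces the program to split the string itself. The class FontSizeDemo and its helper methods are invented for this comparison.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class FontSizeDemo {
    static final String ELEMENT_FORM =
        "<font><name>Helvetica</name><size unit=\"pt\">36</size></font>";
    static final String ATTRIBUTE_FORM =
        "<font name=\"Helvetica\" size=\"36 pt\"/>";

    // Parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Element form: the text of <size> is just the number.
    static int sizeFromElement(Element font) {
        Element size = (Element) font.getElementsByTagName("size").item(0);
        return Integer.parseInt(size.getTextContent().trim());
    }

    // Element form: the unit is a separate attribute of <size>.
    static String unitFromElement(Element font) {
        Element size = (Element) font.getElementsByTagName("size").item(0);
        return size.getAttribute("unit");
    }

    // Attribute form: "36 pt" must be split by hand into number and unit.
    static String[] splitSizeAttribute(Element font) {
        return font.getAttribute("size").trim().split("\\s+");
    }

    public static void main(String[] args) {
        Element font = parseRoot(ELEMENT_FORM);
        System.out.println(sizeFromElement(font)); // 36
        System.out.println(unitFromElement(font)); // pt

        String[] parts = splitSizeAttribute(parseRoot(ATTRIBUTE_FORM));
        System.out.println(Integer.parseInt(parts[0]) + " / " + parts[1]); // 36 / pt
    }
}
```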

A commonly used rule of thumb is that attributes should be used only to modify the interpretation of a value, not to specify values. If you find yourself engaged in a metaphysical discussion about whether a particular setting is a modification of the interpretation of a value or not, just say “no” to attributes and use elements throughout. Many useful XML documents don’t use attributes at all.

Elements and text are the “bread and butter” of XML documents. Here are a few other markup instructions that you might encounter:

  • Character references have the form &#decimalValue; or &#xhexValue;. For example, the é character can be denoted with either of the following:

&#233; &#xE9;

  •  Entity references have the form &name;. The entity references

&lt; &gt; &amp; &quot; &apos;

have predefined meanings: the less-than, greater-than, ampersand, quotation mark, and apostrophe characters. You can define other entity references in a DTD.

  • CDATA sections are delimited by <![CDATA[ and ]]>. They are a special form of character data. You can use them to include strings that contain characters such as < > & without having them interpreted as markup, for example:

<![CDATA[< & > are my favorite delimiters]]>

CDATA sections cannot contain the string ]]>. Use this feature with caution! It is too often used as a back door for smuggling legacy data into XML documents.

  • Processing instructions are instructions for applications that process XML documents. They are delimited by <? and ?>, for example

<?xml-stylesheet href="mystyle.css" type="text/css"?>

The header at the start of an XML document has the same form as a processing instruction:

<?xml version="1.0"?>

  • Comments are delimited by <!-- and -->, for example <!-- This is a comment. -->

Comments should not contain the string --. Comments should only be information for human readers. They should never contain hidden commands; use processing instructions for commands.
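The following sketch shows how these constructs look to a program after parsing. It feeds a small document containing character references, entity references, a CDATA section, a processing instruction, and a comment to the JDK's DOM parser. The class MarkupDemo, its helper methods, and the element names a, b, c, and d are placeholders chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class MarkupDemo {
    static final String XML =
        "<?xml version=\"1.0\"?>"
        + "<?xml-stylesheet href=\"mystyle.css\" type=\"text/css\"?>"
        + "<doc>"
        + "<!-- This is a comment. -->"
        + "<a>&#233;</a>"                 // decimal character reference
        + "<b>&#xE9;</b>"                 // hexadecimal character reference
        + "<c>&lt; &amp; &gt;</c>"        // predefined entity references
        + "<d><![CDATA[< & > are my favorite delimiters]]></d>"
        + "</doc>";

    // Parses an XML string into a DOM tree.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    // Returns the text of the first element with the given tag name.
    static String text(Document doc, String tag) {
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    // Returns the first processing instruction directly under the document, or null.
    static ProcessingInstruction firstInstruction(Document doc) {
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                return (ProcessingInstruction) n;
            }
        }
        return null;
    }

    // Returns the text of the first comment directly under the root element, or null.
    static String firstComment(Document doc) {
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.COMMENT_NODE) {
                return n.getNodeValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // Both character references denote the same character
        System.out.println(text(doc, "a").equals(text(doc, "b"))); // true
        System.out.println(text(doc, "c")); // < & >
        System.out.println(text(doc, "d")); // < & > are my favorite delimiters
        System.out.println(firstInstruction(doc).getTarget()); // xml-stylesheet
        System.out.println(firstComment(doc));
    }
}
```

Note that the program never sees the references or the CDATA delimiters themselves, only the resulting characters.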

Source: Horstmann Cay S. (2019), Core Java. Volume II – Advanced Features, Pearson; 11th edition.
