SQL and XML: XML Basics

To understand the interactions between XML and SQL, you need a basic understanding of XML and how it is used. If you already understand or use XML, feel free to skip this section and go on to the next. If you are not familiar with XML, this section provides a simple introduction, based on some examples of XML documents.

Figure 25-1 shows a typical XML representation of a text document, a portion of Part II of this book. This example has little to do with data processing or SQL, but it shows XML in its original environment, and it illustrates key XML concepts. Each component part of the text document in the figure is represented by a corresponding XML element with the simple structure shown in Figure 25-2. The element is identified by an opening tag, which contains the name of the element type, enclosed between less-than (<) and greater-than (>) symbols.

In Figure 25-1, paragraphs are identified by an opening <para> tag, and headers are identified by an opening <header> tag. The end of each element is identified by a closing tag, which again contains the name of the element type, preceded by a slash (/) character and enclosed between less-than and greater-than symbols. In Figure 25-1, paragraphs end with a </para> tag and headers end with a </header> tag. Between the opening and closing tags is the content of the element. Most of the content in Figure 25-1 is text, which is written directly between the tags. Quotation marks are needed only around attribute values (described later in this section); there you can use single or double quotes, as long as the same type of quotation mark begins and ends the value.

Figure 25-1 shows the hierarchy of elements typical of most XML documents. At the top level is the part element. Its contents are not text, but other elements—a sequence of chapter elements. Each chapter element contains a title element, possibly some introductory para elements, and then a series of section elements. Each section element contains a header element and one or more para elements, possibly interspersed with some figure elements and some table elements. Each para element has only text as its contents.
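The hierarchy described above can be explored with a short program. The following is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. The sample document is hypothetical: it follows the element names given in the text (part, chapter, title, section, header, para, figure), but its contents are invented for illustration and are not taken from Figure 25-1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modeled on the structure described for Figure 25-1;
# the titles and paragraph text are invented for illustration.
SAMPLE = """<?xml version="1.0"?>
<part>
  <chapter>
    <title>Chapter title</title>
    <para>Introductory paragraph.</para>
    <section>
      <header>Section header</header>
      <para>First paragraph of the section.</para>
      <figure></figure>
      <para>Second paragraph of the section.</para>
    </section>
  </chapter>
</part>"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each level of indentation in the printed outline corresponds to one level of nesting in the document, with the part element at the top.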

In addition to the element hierarchy, Figure 25-1 shows some examples of attributes, another fundamental XML structure. An attribute is associated with a specific XML element, and describes some characteristic of the element. Each attribute has an attribute name and a value. In Figure 25-1, the chapter element has an attribute called chapNum whose value is the chapter number associated with that particular content. The chapter element has another attribute called revStatus whose value indicates whether the chapter is in its original draft, being rewritten, or in final form. Individual <header> elements in Figure 25-1 also have an attribute called hdrLevel that indicates whether the header is top level (level 1) or lower level (level 2 or 3).
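As a rough illustration of how attributes are attached to elements, the sketch below parses a small fragment and reads the attributes back. The attribute names chapNum, revStatus, and hdrLevel come from the text; the values, including the revStatus value "final", are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: attribute names follow the text, values are invented.
# Note that one hdrLevel value uses single quotes and the other double quotes.
SAMPLE = """<chapter chapNum="5" revStatus="final">
  <section>
    <header hdrLevel="1">Top-level header</header>
    <para>Some text.</para>
    <header hdrLevel='2'>Lower-level header</header>
  </section>
</chapter>"""

chapter = ET.fromstring(SAMPLE)

# Attributes of an element are exposed as a mapping of name to string value.
chapter_attrs = dict(chapter.attrib)

# Collect the hdrLevel attribute of every header element, in document order.
header_levels = [h.get("hdrLevel") for h in chapter.iter("header")]

# An element with no attributes has an empty mapping.
para_attrs = dict(chapter.find("section/para").attrib)

print(chapter_attrs, header_levels, para_attrs)
```

Attribute values are always delivered as text, so a program that needs the chapter number as an integer has to convert it explicitly.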

The first line of the XML document in Figure 25-1 identifies it as an XML 1.0 document. Every other part of the document describes the element structure, element contents, or attributes of elements. XML documents can become considerably more complex, but these fundamental components are the ones that are important for XML/database interaction. Note that element names and attribute names are case-sensitive. An element named bookPart and one named bookpart are not considered the same element. This differs from the usual SQL convention, in which table and column names are case-insensitive.
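Case sensitivity is easy to confirm with a parser. This minimal sketch uses a hypothetical one-line document containing both a bookPart and a bookpart element, and shows that a lookup by name matches only the exact spelling.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two sibling elements whose names differ only in case.
doc = ET.fromstring("<book><bookPart>A</bookPart><bookpart>B</bookpart></book>")

upper_matches = [e.text for e in doc.findall("bookPart")]
lower_matches = [e.text for e in doc.findall("bookpart")]
no_match = doc.find("BOOKPART")  # None: no element has this exact name

# A closing tag must match its opening tag exactly, including case.
try:
    ET.fromstring("<bookPart>text</bookpart>")
    mismatch_accepted = True
except ET.ParseError:
    mismatch_accepted = False

print(upper_matches, lower_matches, no_match, mismatch_accepted)
```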

One additional XML shorthand notation is omitted from Figure 25-1 for clarity, but is very useful in practice. For an element that has attributes but no content of its own, the end of the element can be marked inside the opening tag itself, by placing a slash just before the greater-than symbol. Using this convention, this element from Figure 25-1:

<figure figNum="5-1"></figure>

can be instead represented as:

<figure figNum="5-1" />
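To a parser, the two notations are equivalent. The sketch below, which assumes Python's standard xml.etree.ElementTree module, parses both forms and compares the results.

```python
import xml.etree.ElementTree as ET

# The same element written with an explicit closing tag and with the shorthand.
long_form = ET.fromstring('<figure figNum="5-1"></figure>')
short_form = ET.fromstring('<figure figNum="5-1" />')

same_name = long_form.tag == short_form.tag
same_attributes = long_form.attrib == short_form.attrib
# Neither form has child elements or text content.
both_empty = (
    len(long_form) == 0
    and len(short_form) == 0
    and not long_form.text
    and not short_form.text
)

print(same_name, same_attributes, both_empty)
```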

The XML specification defines certain rules that every XML document must follow. It dictates that elements within an XML document must be strictly nested within one another. The closing tag for a lower-level element must appear before the closing tag for a higher-level element that contains it. The standard also dictates that an attribute must be uniquely named within its element; it is illegal to have two attributes with the same name attached to a single element. XML documents that obey the rules are described as well-formed XML documents.
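One practical way to test for well-formedness is to hand the document to an XML parser and see whether it is accepted. The following is a minimal sketch along those lines; the helper name is_well_formed and the three sample strings are hypothetical, chosen to exercise the two rules just described.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False if it reports an error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Inner element closed before the outer element that contains it.
properly_nested = "<section><para>text</para></section>"
# Outer element closed while the inner element is still open.
badly_nested = "<section><para>text</section></para>"
# The same attribute name used twice on one element.
duplicate_attr = '<chapter chapNum="5" chapNum="6"></chapter>'

for sample in (properly_nested, badly_nested, duplicate_attr):
    print(is_well_formed(sample), sample)
```

Only the first sample is accepted; the parser rejects the other two because they break the nesting rule and the unique-attribute rule, respectively.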

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.
