SQL and XML: XML for Data

Although the roots of XML are in documents and document processing, XML can be quite useful for representing the structured data commonly found in data processing applications as well. Figure 25-3 shows a typical XML document from the data processing world, a very simplified purchase order. This is quite a different type of document than the book excerpt in Figure 25-1, but the key components of the document are the same. Instead of a chapter, the top-level element is a purchaseOrder. Its contents, like those of the chapter, are subelements—a customerNumber, an orderNumber, an orderDate, and an orderItem. The orderItem in turn is composed of further subelements. Figure 25-3 also shows some business terms associated with the purchase order as attributes of the terms element. The ship attribute specifies how the order is to be shipped. The bill attribute specifies the credit terms for the order.

It should be obvious that the simple XML purchase order document in Figure 25-3 has a strong relationship to the ORDERS table in the sample database. You may want to compare it to the structure of the ORDERS table shown in Appendix A (Figure A-5). The lowest-level elements in the document mostly match the individual columns of the ORDERS table, except for the terms element. The top-level element in the document represents an entire row of the table. The transformation between a group of documents like the one in Figure 25-3 and a set of rows in the ORDERS table is a straightforward, mechanical one, which can be automatically performed by a simple computer program.

Unlike the ORDERS table, the XML document imposes one middle level of hierarchy, grouping together the information about the ordered product—the manufacturer ID, product ID, quantity, and total amount. In a real-world purchase order, this group of data items might be repeated several times, forming multiple line items on the order. The XML document could be easily extended to support this structure, by adding a second or third orderItem element after the first one. The sample database cannot be so easily extended. To support orders with multiple line items, the ORDERS table would probably be split into two tables: one holding the order header information (order number, date, customer-ID, and so on), and the other holding individual order line items.

1. XML and SQL

The SGML origins give XML several unique and useful characteristics, which have strong parallels to the SQL language:

  • Descriptive approach. XML approaches document structure by telling what each element of a document is, rather than how to process it. You may recall this is also a characteristic of SQL, which focuses on which data is requested rather than how to retrieve it.
  • Building blocks. XML documents are built up from a very small number of basic building blocks, including two fundamental concepts, elements and attributes. There are some strong (but not perfect) parallels between an XML element and a SQL table, and between an XML attribute and a SQL column.
  • Document types. XML defines and validates documents as conforming to specific document types that parallel real-world documents, such as a purchase order document or a business reply letter document or a vacation request document. Again, there are strong parallels to SQL, where tables represent different types of real-world entities.

Although there are some strong parallels between XML and SQL, there are also some very strong differences:

  • Document vs. data orientation. The core concepts of XML arise out of typical document structures. XML is text-centric, and it implements a strong distinction between the content itself (the elements of a document) and characteristics of the content (attributes). The core concepts of SQL arise out of typical data processing record structures. It is data-centric, with a range of data types (in their binary representations), and its structures (tables and columns) focus on content (data). This mismatch between the fundamental XML and SQL models can cause some conflicts or difficult choices when using them together.
  • Hierarchical vs. tabular structure. Natural XML structures are hierarchical, reflecting the hierarchy of elements in most types of documents (for example, a book contains chapters, chapters contain sections, and sections contain a heading, paragraphs, and figures). The structures are also flexible and variable. One section may contain five paragraphs and a single figure, the next one three paragraphs and two figures, and the next one six paragraphs and no figures.

In contrast, SQL structures are tabular, not hierarchical, and they reflect the records typical of data processing applications. SQL structures are also quite rigid. Every row of a table contains exactly the same columns, in the same order. Each column has the same data type in every row. There are no optional columns; every column must appear in every row. These differences can also cause conflicts when using XML and SQL together.

  • Objects vs. operations. The core purpose of the XML language is to represent objects. If you take a meaningful piece of XML text and ask “What does this represent?” the answer will be an object: a paragraph, a purchase order, or a customer address, for example. The SQL language has a broader purpose, but most of it is focused on manipulating objects. If you take a meaningful piece of SQL text and ask “What does this represent?” the answer will usually be an operation on an object: creating an object, deleting an object, finding one or more objects, or updating object contents. These differences make the two languages fundamentally complementary in their purpose and use.

2. Elements vs. Attributes

The relational model offers only one way to represent data values within the database— as values of individual columns within individual rows of a table. The XML document model offers two ways to represent data:

  • Elements. An element within an XML document has contents, and the contents can include a data value in the form of text for that element. When represented in this way, the data value is a fundamental part of the XML document hierarchy; the hierarchy is built up from elements. Often, an element containing a data value will be a leaf node in the XML document tree; that element will be a child of higher-level elements, but it will not itself have any children. This will almost always be true of elements that represent data that comes from a relational database. However, XML does support mixed elements, which contain a combination of text (content) and other subelements.
  •  Attributes. An element within an XML document may have one or more named attributes, and each attribute has a text value. The attributes are attached to an element within the XML hierarchy but are not the content of the element. The names of different attributes of an element must be different, so you can’t have two attributes with the same name. Also, XML treats the order of the attributes of an element as insignificant; they can appear in any order. This differs from the XML treatment of elements, which have a definite position within an XML document, and where the difference between the first, second, and third child elements of a higher-level element is significant.

The existence of two different ways to represent data in XML means that there are two different legitimate ways to express the contents of a relational database as XML. These two rows of data:

might be represented by this XML document when elements are used to represent column values:

<?xml version=”1.0″?>

<queryResults>

<row>

<orderNum>112963</orderNum>

<mfr>ACI</mfr>

<product>41004</product>

<qty>28</qty>

<amount>3276.00</amount>

</row>

<row>

<orderNum>112983</orderNum>

<mfr>ACI</mfr>

<product>41004</product>

<qty>3</qty>

<amount>702.00</amount>

</row>

</queryResults>

and would be represented by this XML document when attributes are used:

<?xml version=”1.0″?>

<queryResults>

<row orderNum=”112963″

mfr=”ACI”

product=”41004″

qty=”28″

amount=”3276.00″>

</row>

<row orderNum=”112983″

mfr=”ACI”

product=”41004″

qty=”3

amount=”702.00″>

</row>

</queryResults>

As you might expect, there are strong advocates for both the element-representation and the attribute-representation methods, with strongly held beliefs. Advocates of the element approach make these arguments:

  • Elements are more fundamental to the XML model than attributes; they are the carriers of content in all markup languages (HTML, XML, SGML, etc.), and the content of the database (column values) should be represented as content in XML.
  • Element order matters, and in some cases, so does the ordering of data in a DBMS (for example, when identifying a column by number in a query specification or when using a column number to retrieve query results with an API).
  • Elements provide a uniform way of representing column data, regardless of whether the column has a simple, atomic data type (integer, string) or more complex, compound, user-defined data types supported by the object-relational extensions and SQL3. Attributes don’t provide this capability. (Attribute values are atomic.)

Advocates of the attribute approach make these arguments:

  • Attributes are a fundamental match for the columns in the relational model. Individual rows represent entities, so they should be mapped into elements. Column values describe attributes of the entity (row) in which they appear; they should be represented as attribute values in XML.
  • The restriction of unique attribute names within an element matches the uniqueness required of column names within a table. The unordered nature of attributes matches the unordered nature of columns in the fundamental relational model. (The places where column position is used are shortcuts for convenience, not fundamental to the underlying relations.)
  • The attribute representation is more compact, since column names appear only once in the XML form, not as both opening and closing tags. This is a practical advantage when storing or transmitting XML.

Both the element-centric and attribute-centric styles are found in today’s XML and SQL products. The choice depends on the preferences of the document author and the conventions of the organization using XML with SQL. In addition, standards imposed by industry bodies for document exchange using XML may dictate one style or the other.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *