SQL and XML: XML and Queries

SQL provides a powerful and very useful query facility for finding, transforming, and retrieving structured data from relational databases, so it’s natural to seek a similar query capability for finding, transforming, and retrieving structured data from XML documents. The earliest efforts to define a query and transformation capability were embodied in a pair of specifications—eXtensible Stylesheet Language Transformation (XSLT) and XML Path Language (XPath). Like XmL itself, these specifications have their roots in document processing

XSLT focuses on transforming an XML document, as shown in Figure 25-8. A stylesheet governs the transformation, selecting which elements of the input XML document are to be transformed and dictating how they are modified and combined with other elements to produce the output XML document. One popular use for XSLT is to transform a single, generic version of a web page into various forms that are appropriate for rendering on different screen sizes and display devices.

Within the XSLT language, it is often necessary to select individual elements or groups of elements to be transformed, or to navigate through the element hierarchy to specify data to be combined from parent and child elements. XPath originally emerged as a part of the XSLT language for element selection and navigation. It quickly became apparent that XPath was useful for other applications as well, and the specification was split out of XSLT to stand on its own. In the early days of XML, XPath was the de facto query capability for XML documents.

More recently, industry attention has focused on some of the deficiencies of XPath as a full query language. A W3C working group was formed to specify a query facility under the working name XML Query, or XQuery. As the specification passed through various drafts, the XSL working group (responsible for XSLT and XPath) and the XQuery working groups joined forces. At this writing, both XQuery 1.0 and XPath 2.0 are in Working Draft stage, and the two languages are tightly linked, with common syntax and semantics wherever possible.

A full description of XQuery and XPath is beyond the scope of this book. Because XQuery 1.0 has only Working Draft and not Official Recommendation status, it is still subject to change as it is reviewed by the W3C membership. However, a brief review of XQuery concepts and a few examples will illustrate the relationship to SQL, and these fundamentals are unlikely to change as XQuery 1.0 moves toward official standard status.

1. XQuery Concepts

If the underlying data model beneath the SQL language is the row/column table, the underlying data model beneath XQuery is a tree-structured hierarchy of nodes that represent an XML document. XQuery actually uses a finer-grained tree structure than the element hierarchy of XML documents and XML Schema. These XQuery nodes are relevant for database-style queries:

  • Element node. This type of XQuery node represents an element itself.
  • Text node. This type of node represents element contents. It is a child of the corresponding element node.
  • Attribute node. This type of node represents an attribute and attribute value for an element. It is a child of the corresponding element node.
  • Document node. This is a specialized element node that represents the top, or root level, of a document.

To navigate through an item tree and identify one or more items for processing, XQuery uses a path expression. In many ways, a path expression plays the same role for XQuery as the SQL3 query expression, described in Chapter 24, plays for SQL.

A path expression identifies an individual node in the XQuery item tree by specifying the sequence of steps through the tree hierarchy that is needed to reach the node. XQuery path expressions come in two types:

  • Rooted path expression. A rooted path expression starts at the top (the root) of the item tree, and steps down through the hierarchy to reach the target node. Within the book document in Figure 25-1, the rooted path expression /bookPart / chapter/section/para navigates down to an individual paragraph within a section of a chapter.
  • Relative path expression. A relative path expression starts the current node of the item tree (the node where processing is currently focused) and steps up and/or down through the hierarchy to reach the target node. Within the book document in Figure 25-1, the relative path expression section/para navigates down to a specific paragraph if the current node is a chapter node.

The steps within a path can specify motion downward within the node tree to child nodes that represent subelements, element contents, or element attributes. The steps can also specify upward motion to the parent of a node. With each step, you can specify a node test that must be passed to continue on the path to the target element. Table 25-2 shows some typical path expressions and the navigation path that they specify.

Like SQL, XQuery is a set-oriented language. It is optimized to work on an XQuery sequence, an ordered collection of zero or more items. The items themselves might be elements, attributes, or data values. XQuery operations tend to take sequences as their input and produce sequences as their output. A simple atomic data item is usually treated as if it were a one-item sequence.

XQuery also resembles SQL in being a strongly typed language. The working draft of the XQuery specification is evolving to align the XQuery data types with those specified in XML Schema, which were described earlier in this chapter in the “XML Schema” section. Like SQL3, XQuery provides constructors to build up more complex data values.

XQuery differs substantially from SQL in being an expression-oriented rather than a statement-oriented language. Casually stated, everything in XQuery is an expression, which is evaluated to produce a value. Path expressions are one type of XQuery expression, and they produce a sequence of nodes as their result. Other expressions may combine literal values, function calls, arithmetic and Boolean expressions, and the typical parenthesized combinations of these to form arbitrarily complex expressions. Expressions can also combine sequences of nodes, using set operations like the union or intersection of sets, which match the corresponding SQL set operations.

Named variables in XQuery are denoted by a leading dollar sign ($) in their names. For example, $orderNum, $currentOffice, and $c would be valid XQuery variable names. Variables can be used freely in XQuery expressions to combine their variables with literals and other variable values and node values. Variables receive new values through function calls, and by assignment in for or let expressions.

2. Query Processing in XQuery

XQuery path expressions can provide the XML equivalent of the simple SQL SELECT statement with a WHERE clause. Assume that a collection of XML documents contains the XML equivalent of the contents of the sample database, with the top-level documents named with the names of the tables in the sample database and the individual row structures named with the singular equivalents of those names (e.g., the OFFICES document contains individual OFFICE elements to represent the rows of the OFFICES table, and so on). Here are some query requests and their corresponding path expressions:

Identify the offices managed by employee number 108.

/offices/office[mgr=108]

Find all offices with sales over target.

/offices/office[sales > target]

Find all orders for manufacturer ACI with amounts over $30,000.

/orders/order[mfr = ‘ACI’ and amount > 35000.00]

Because the sample database is a shallow row/column structure, the XML hierarchy is only three levels deep. To illustrate the query possibilities in more hierarchical XML documents, consider once again the book document in Figure 25-1. Here are some query requests and their corresponding path expressions:

Find all components of chapters that have draft status.

/book/part/chapter[revStatus=’draft’]/*

Get the third paragraph of the second chapter of part 2.

/book/part[2]/chapter[2]/para[3]

These expressions don’t give you the same control over query results as the SELECT list provides in SQL queries. They also don’t provide the equivalent of SQL cursors for row-by-row processing. XQuery provides both of these capabilities through For/Let/ Where/Returns expressions (FLWR expressions, pronounced “flower”). An example is the best way to illustrate the capability. Once again, assume a set of XML documents structured to resemble the sample database, as in the previous examples. This query implements a two-table join and generates three specific columns of query results:

List the salesperson’s name, order date, and amount of all orders under $5000, sorted by amount.

<smallOrders> {

for $o in document(“orders.xml”)//order[amount < 5000.00],

$r in

document(“salesreps.xml”)//salesreps[empl_num=$o/order/rep]

return

<smallOrder> {

$r/name,

$o/order_date,

$o/amount

}

</smallOrder>

sortby(amount)

}

</smallOrders>

At the outer level, the contents of the smallOrders element are specified by the XQuery expression enclosed in the outer braces. The for expression uses two variables to iterate through two documents, corresponding to the ORDERS and SALESREPS tables. These two variables effectively implement a join between the two tables (documents). The predicates at the end of each line following the for keyword correspond to the SQL WHERE clause. The predicate in the first line restricts the query to orders with amounts over $5000. The predicate in the second row implements the join, using the $o variable to link rows in the SALESREPS table (document) with rows in the ORDERS table (document).

The return part of the for expression specifies which elements should be returned as the results of the expression evaluation. It corresponds to the select list in a SQL query. The returned value will be an XML sequence of smallOrder elements, and each element comes from one corresponding element in the source tables (documents). Once again, the iteration variables are used to qualify the specific path to the element whose contents are to be returned. Finally, the sortby part of the expression functions in the same way as the corresponding ORDER BY clause of a SQL query.

There are a few additional query-processing capabilities not illustrated in this example. You can use a let expression within the for iteration to capture additional variable values within the for loop that you may need in predicates or other expressions. An if…then…else expression supports conditional execution. Aggregate functions support grouped XQuery queries, corresponding to the SQL summary queries described in Chapter 8. With these capabilities, the flexibility of XQuery is comparable to that of SQL query expressions. However, as you can see from the example, the style of the query expression is quite different, reflecting both the expression orientation and the very strong navigational orientation of XQuery and XML documents.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *