Advanced Queries in SQL2 *

The SQL queries described thus far in Chapters 6-9 are the mainstream capabilities provided by most SQL implementations. The combination of features they represent— column selection in the SELECT clause, row selection criteria in the WHERE clause, multitable joins in the FROM clause, summary queries in the GROUP BY and HAVING clauses, and subqueries for more complex requests—give the user a powerful set of data retrieval and data analysis capabilities. However, database experts have pointed out many limitations of these mainstream query capabilities, including these:

  • No decision making within queries. Suppose you wanted to generate a two-column report from the sample database showing the name of each sales office and either its annual sales target or its year-to-date sales, whichever is larger. With standard SQL query features, this is hard to do. Or suppose you had a database that kept track of sales by quarter (four columns of data for each office) and wanted to write a program that displayed offices and their sales for a specific (user-supplied) quarter. Again, this program is more difficult to write using standard SQL queries. You must include four separate SQL queries (one for each quarter), and the program logic must select which query to run, based on user input. This simple case isn’t too difficult, but in a more general case, the program could become much more complex.

  • Limited use of subqueries. The simplest example of this limitation is the SQL1 restriction that a subquery can appear only on the right side of a comparison test in a WHERE The database request “list the offices where the sum of the salesperson’s quotas is greater than the office target” is most directly expressed as this query:

SELECT OFFICE FROM OFFICES

WHERE (SELECT SUM(QUOTA)

FROM SALESREPS

WHERE REP_OFFICE = OFFICE) > TARGET

But this isn’t a legal SQL1 statement. Instead, you must turn the inequality around:

SELECT OFFICE FROM OFFICES

WHERE TARGET > (SELECT SUM(QUOTA)

  FROM SALESREPS

  WHERE REP_OFFICE = OFFICE)

In this simple example, it isn’t hard to turn the logic around, but the restriction is a nuisance at best, and it does prevent you from comparing the results of two subqueries, for example.

  • Limited-row expressions. Suppose you wanted to list the suppliers, item numbers, and prices for a set of products that are substitutes for one another. Conceptually, these are a set of products whose identification (a manufacturer- ID/product-ID pair) matches one of a set of values, and it would be natural to write the query using a set membership test as:

SELECT MFR_ID, PRODUCT_ID, PRICE

FROM PRODUCTS

WHERE (MFR_ID, PRODUCT_ID) IN ((‘ACI’,41003),(‘BIC’,41089), …)

The SQL1 standard doesn’t permit this kind of set membership test. Instead, you must construct the query as a long set of individual comparisons, connected by ANDs and ORs.

  • Limited-table expressions. SQL allows you to define a view like this one for large orders:

SELECT *

FROM PRODUCTS

WHERE AMOUNT > 10000

and then use the view as if it were a real table in the FROM clause of a query to find out which products, in which quantities, were ordered in these large orders:

 SELECT MFR, PRODUCT, SUM(QTY)

FROM BIGORDERS

GROUP BY MFR, PRODUCT

Conceptually, SQL should let you substitute the view definition right into the query, like this:

 SELECT MFR, PRODUCT, SUM(QTY)

FROM (SELECT * FROM ORDERS WHERE AMOUNT > 10000)

GROUP BY MFR, PRODUCT

But the SQL1 standard doesn’t allow a subquery in this position in the WHERE clause. Yet clearly, the DBMS should be able to determine the meaning of this query, since it must basically do the same processing to interpret the BIGORDERS view definition.

As these examples show, the SQL1 standard and mainstream DBMS products that implement to this level of the standard are relatively restrictive in their permitted use of expressions involving individual data items, sets of data items, rows, and tables. The SQL2 standard includes a number of advanced query capabilities that are focused on removing these restrictions and making the SQL language more general. The spirit of these SQL2 capabilities tends to be that a user should be able to write a query expression that makes sense and have the query expression be a legal SQL query. Because these SQL2 capabilities constitute a major expansion of the language over the SQL1 standard, most of them are required only at a full level of the standard.

1. Scalar-Valued Expressions (SQL2)

The simplest extended query capabilities in SQL2 are those that provide more data manipulation and calculation power involving individual data values (called scalars in the SQL2 standard). Within the SQL language, individual data values tend to have three sources:

  • The value of an individual column within an individual row of a table
  • A literal value, such as 125.7 or ABC
  • A user-supplied data value, entered into a program

In this SQL query:

 SELECT NAME, EMPL_NUM, HIRE_DATE, (QUOTA * .9)

FROM SALESREPS

WHERE (REP_OFFICE = 13) OR TITLE = ‘VP SALES’

the column names NAME, EMPL_NUM, HIRE_DATE, and QUOTA generate individual data values for each row of query results, as do the column names REP_OFFICE and TITLE in the WHERE clause. The numbers .9 and 13 and the character string “VP SALES” similarly generate individual data values. If this SQL statement appeared within an embedded SQL program (described in Chapter 17), the program variable offic e_num might contain an individual data value, and the query might appear as:

 SELECT NAME, EMPL_NUM, HIRE_DATE, (QUOTA * .9)

FROM SALESREPS

WHERE (REP_OFFICE = :office_num) OR TITLE = ‘VP SALES’

As this query and many previous examples have shown, individual data values can be combined in simple expressions, like the calculated value QUOTA * .9. To these basic SQL1 expressions, SQL2 adds the CAST operator for explicit data type conversion, the CASE operator for decision making, the NULLIF operation for conditionally creating a NULL value, and the COALESCE operator for conditionally creating non-NULL values.

2. The cast Expression (SQL2)

The SQL standard has fairly restrictive rules about combining data of different types in expressions. It specifies that the DBMS shall automatically convert among very similar data types, such as 2-byte and 4-byte integers. However, if you try to compare numbers and character data, for example, the standard says that the DBMS should generate an error. The standard considers this an error condition even if the character string contains numeric data. You can, however, explicitly ask the DBMS to convert among data types using the CAST expression, whose syntax is shown in Figure 9-9.

The CAST expression tends to be of little importance when you are typing SQL statements directly into an interactive SQL interface. However, it can be critical when using SQL from within a programming language whose data types don’t match the data types supported by the SQL standard. For example, the CAST expression in the SELECT clause of this query converts the values for REP_OFFICE (integers in the sample database) and HIRE_DATE (a date in the sample database) into character strings for the returned query results:

SELECT NAME, CAST REP_OFFICE AS VARCHAR, HIRE_DATE AS VARCHAR

FROM SALESREPS

The CAST expression can generally appear anywhere that a scalar-valued expression can appear within a SQL statement. In this example, it’s used in the WHERE clause to convert a character string customer number into an integer, so that it can be compared with the data in the database:

 SELECT PRODUCT, QTY, AMOUNT

FROM ORDERS

WHERE CUST = CAST ‘2107’ AS INTEGER

Instead of specifying a data type in the CAST expression, you can specify a SQL2 domain. Domains are specific collections of legal data values that can be defined in the database under the SQL2 standard. They are fully described in Chapter 11 because of the role they play in SQL data integrity. Note that you can also generate a NULL value of the appropriate data type for use in SQL expressions using the CAST expression.

The most common uses for the CAST expression are:

  • To convert data from within a database table where the column is defined with the wrong data type. For example, when a column is defined as a character string, but you know it actually contains numbers (that is, strings of digits) or dates (strings that can be interpreted as a month/day/year).
  • To convert data from data types supported by the DBMS that are not supported by a host programming language. For example, most host programming languages do not have explicit date and time data types and require that date/time values be converted into character strings for handling by a program.
  • To eliminate differences between data types in two different tables. For example, if an order date is stored in one table as DATE data, but a product availability date is stored in a different table as a character string, you can still compare the columns from the two tables by CASTing one of the columns into the data type of the other. Similarly, if you want to combine data from two different tables with a UNION operation, their columns must have identical data types. You can achieve this by CASTing the columns of one of the tables.

3. The case Expression (SQL2)

The SQL2 CASE expression provides for limited decision making within SQL expressions. Its basic structure, shown in Figure 9-10, is similar to the IF…THEN…ELSE statement found in many programming languages. When the DBMS encounters a CASE expression, it evaluates the first search condition, and if it is TRUE, then the value of the CASE expression is the value of the first result expression. If the result of the first search condition is not TRUE, the DBMS proceeds to the second search condition and checks whether it is TRUE. If so, the value of the CASE expression is the value of the second result expression, and so on.

Here is a simple example of the use of the CASE expression. Suppose you want to do an A/B/C analysis of the customers from the sample database according to their credit limits. The A customers are the ones with credit limits over $60,000, the B customers are those with limits over $30,000 and the C customers are the others. Using SQL1, you would have to retrieve customer names and credit limits from the database and then rely on an application program to look at the credit limit values and assign an A, B, or C rating. Using a SQL2 CASE expression, you can have the DBMS do the work for you:

 SELECT COMPANY, CASE WHEN CREDIT_LIMIT > 60000 THEN ‘A’

   WHEN CREDIT_LIMIT > 30000 THEN ‘B’

   ELSE ‘C’

FROM CUSTOMERS

For each row of query results, the DBMS evaluates the CASE expression by first comparing the credit limit to $60,000, and if the comparison is TRUE, returns an A in the second column of query results. If that comparison fails, the comparison to $30,000 is made and returns a B if this second comparison is TRUE. Otherwise, the third column of query results will return a C.

This is a very simple example of a CASE expression. The results of the CASE expression are all literals here, but in general, they can be any SQL expression. Similarly, there is no requirement that the tests in each WHEN clause are similar, as they are here.

The CASE expression can also appear in other clauses of a query. Here is an example of a query where it’s useful in the WHERE clause. Suppose you want to find the total of the salesperson’s sales, by office. If a salesperson is not yet assigned to an office, that person should be included in the total for his or her manager’s office. Here is a query that generates the appropriate office groupings:

 SELECT CITY, SUM(SALES)

FROM OFFICES, SALESREPS

WHERE OFFICE =

CASE WHEN (REP_OFFICE IS NOT NULL) THEN REP_OFFICE

ELSE (SELECT REP_OFFICE

FROM SALESREPS AS MGRS

WHERE MGRS.EMPL NUM = MANAGER)

The SQL2 standard provides a shorthand version of the CASE expression for the common situation where you want to compare a test value of some kind to a sequence of data values (usually literals). This version of the CASE syntax is shown in Figure 9-11. Instead of repeating a search condition of the form:

test_value = value1

in each WHEN clause, it lets you specify the test_valuecalculation once. For example, suppose you wanted to generate a list of all of the offices, showing the names of their managers and the cities and states where they are located. The sample database doesn’t include state names, so the query must generate this information itself. Here is a query, with a CASE expression in the SELECT list, that does the job:

SELECT NAME, CITY, CASE OFFICE WHEN 11 THEN ‘New York’

   WHEN 12 THEN ‘Illinois’
   WHEN 13 THEN ‘Georgia’
   WHEN 21 THEN ‘California’
   WHEN 22 THEN ‘Colorado’

FROM OFFICES, SALESREPS

WHERE MGR = EMPL_NUM

4. The coalesce Expression (SQL2)

One of the most common uses for the decision-making capability of the CASE expression is for handling NULL values within the database. For example, it’s frequently desirable to have a NULL value from the database represented by some literal value (such as the word “missing”) or by some default value when using SQL to generate a report. Here is a report that lists the salespeople and their quotas. If a salesperson has not yet been assigned a quota, assume that the salesperson’s actual year-to-date sales should be listed instead. If for some reason the actual year-to-date sales are also NULL (unknown), then a zero amount should be listed. The CASE statement generates the desired IF.. .THEN… ELSE logic:

 SELECT NAME, CASE WHEN (QUOTA IS NOT NULL) THEN QUOTA

WHEN (SALES IS NOT NULL) THEN SALES

ELSE 0.00

FROM SALESREPS

This type of NULL-handling logic is needed frequently, so the SQL2 standard includes a specialized form of the CASE expression, the COALESCE expression, to handle it. The syntax for the COALESCE expression is shown in Figure 9-12. The processing rules for the COALESCE expression are very straightforward. The DBMS examines the first value in the list. If its value is not NULL, it becomes the value of the COALESCE expression. If the first value is NULL, the DBMS moves to the second value and checks to see whether it is NULL. If not, it becomes the value of the expression. Otherwise, the DBMS moves to the third value, and so on. Here is the same example just given, expressed with the COALESCE expression instead of a CASE expression:

 SELECT NAME, COALESCE (QUOTA, SALES, 0.00)

FROM SALESREPS

As you can see by comparing the two queries, the simplicity of the COALESCE syntax makes it easier to see, at a glance, the meaning of the query. However, the operation of the two queries is identical. The COALESCE expression adds simplicity, but no new capability, to the SQL2 language.

5. The nullif Expression (SQL2)

Just as the COALESCE expression is used to eliminate NULL values when they are not desired for processing, sometimes you may need to create NULL values. In many data processing applications (especially older ones that were developed before relational databases were popular), missing data is not represented by NULL values. Instead, some special code value that is otherwise invalid is used to indicate that the data is missing.

For example, suppose that in the sample database, the situation where a salesperson had not yet been assigned a manager was indicated by a zero (0) value in the MANAGER column instead of a NULL value. In some circumstances, you will want to detect this situation within a SQL query and substitute the NULL value for the zero “code.” The NULLIF expression, shown in Figure 9-13, is used for this purpose. When the DBMS encounters a NULLIF expression, it examines the first value (usually a column name) and compares it to the second value (usually the code value used to indicate missing data). If the two values are equal, the expression generates a NULL value. Otherwise, the expression generates the first value.

Here is a query that handles the case where missing office numbers are represented by a zero:

SELECT CITY, SUM(SALES)

FROM OFFICES, SALESREPS

WHERE OFFICE = (NULLIF REP_OFFICE, 0)

GROUP BY CITY

Together, the CASE, COALESCE, and NULLIF expressions provide a solid decision-making logic capability for use within SQL statements. They fall far short of the complete logical flow constructs provided by most programming languages (looping, branching, and so on) but do provide for much greater flexibility in query expressions. The net result is that more processing work can be done by the DBMS and reflected in query results, leaving less work to be done by the human user or the application program.

6. Row-Valued Expressions (SQL2)

Although columns and the scalar data values they contain are the atomic building blocks of a relational database, the structuring of columns into rows that represent real-world entities, such as individual offices or customers or orders, is one of the most important features of the relational model. The SQL1 standard, and most mainstream commercial database products, certainly reflect this row/column structure, but they provide very limited capability to actually manipulate rows and groups of rows. Basically, SQL1 operations allow you to insert a row into a table, or to retrieve, update, or delete groups of rows from a database (using the SELECT, UPDATE, or DELETE statements).

The SQL2 standard goes well beyond these capabilities, allowing you to generally use rows in SQL expressions in much the same way that you can use scalar values. It provides a syntax for constructing rows of data. It allows row-valued subqueries. And it defines row-valued meanings for the SQL comparison operators and other SQL structures.

7. The Row-Value Constructor (SQL2)

SQL2 allows you to specify a row of data values by using a row-value constructor expression, whose syntax is shown in Figure 9-14. In its most common form, the row constructor is a comma-separated list of literal values, or expressions. For example, here is a row-value constructor for a row of data whose structure matches the OFFICES table in the sample database:

(23, ‘San Diego’, ‘Western’, NULL, DEFAULT, 0.00)

The result of this expression is a single row of data with six columns. The NULL keyword in the fourth column position indicates that the fourth column in the constructed row should contain a NULL (unknown) value. The DEFAULT keyword in the fifth column position indicates that the fifth column in the constructed row should contain the default value for the column. This keyword may appear in a row-value constructor only in certain situations—for example, when the row-value constructor appears in an INSERT statement to add a new row to a table.

When a row constructor is used in the WHERE clause of a SQL statement, column names can also appear as individual data items within the row constructor, or as part of an expression within the row constructor. For example, consider this query:

List the order number, quantity, and amount of all orders for ACI-41002 widgets.

 SELECT ORDER_NUM, QTY, AMOUNT

FROM ORDERS

WHERE (MFR, PRODUCT) = (‘ACI’,’41002′)

Under the normal rules of SQL query processing, the WHERE clause is applied to each row of the ORDERS table, one by one. The first row-value constructor in the WHERE clause (to the left of the equal sign) generates a two-column row, containing the manufacturer code and the product number for the current order being considered. The second row-value constructor (to the right of the equal sign) generates a two- column row, containing the (literal) manufacturer code ACI and product number 41002. The equal sign is now comparing two rows of values, not two scalar values.

The SQL2 standard defines this type of row-valued comparison for equality, which is processed by comparing, pairwise, each of the columns in the two rows. The result of the comparison is TRUE only if all of the pairwise column comparisons are TRUE. Of course, it’s possible to write the query without the row-value constructors, like this:

List the order number, quantity, and amount of all orders for ACI-41002 widgets.

 SELECT ORDER_NUM, QTY, AMOUNT

FROM ORDERS

WHERE (MFR = ‘ACI’) AND (PRODUCT = ‘41002’)

and in this simple example, the meaning of the query is probably equally clear with either form. However, row-value constructors can be very useful in simplifying the appearance of more complex queries, and they become even more useful when combined with row­valued subqueries.

8. Row-Valued Subqueries (SQL2)

As described throughout the earlier parts of this chapter, the SQL1 standard provides a subquery capability for expressing more complex database queries. The subquery takes the same form as a SQL query (that is, a SELECT statement), but a SQL1 subquery must be scalar-valued—that is, it must produce a single data value as its query results. The value generated by the subquery is then used as part of an expression within the main SQL statement that contains the subquery. This use of subqueries is supported by the major enterprise-class relational database systems today.

The SQL2 standard dramatically expands the subquery facility, including support for row-valued subqueries. A row-valued subquery returns not just a single data item, but a row of data items, which can be used in SQL2 expressions and compared to other rows. For example, suppose you wanted to show the order numbers and dates for all of the orders placed against the highest-priced product in the sample database. A logical way to start building the appropriate SQL query is to find an expression that will give you the identity (manufacturer ID and product ID) of the high-priced product in question. Here is a query that finds the right product:

Find the manufacturer ID and product ID of the product with the highest unit price.

 SELECT MFR_ID, PRODUCT_ID FROM PRODUCTS

WHERE PRICE = (SELECT MAX(PRICE)

FROM PRODUCTS)

Ignoring the possibility of a tie for the most expensive product for a moment, this query will generate a single row of query results, consisting of two columns. Using SQL2’s row-valued subquery capability, you can embed this entire query as a subquery within a SELECT statement to retrieve the order information:

List the order numbers and dates of all orders placed for the highest-priced product.

SELECT ORDER_NUM, ORDER_DATE FROM ORDERS

WHERE (MFR, PRODUCT) = (SELECT MFR_ID, PRODUCT_ID

FROM PRODUCTS

WHERE PRICE = (SELECT MAX(PRICE)

FROM PRODUCTS))

The top-level WHERE clause in this query contains a row-valued comparison. On the left side of the equal sign is a row-value constructor consisting of two column names. Each time the WHERE clause is examined to carry out the top-level query, the value of this row-valued expression is a manufacturer-ID/product-ID pair from a row of the ORDERS table. On the right side of the equal sign is the subquery that generates the identity of the product with the highest dollar value. The result of this subquery is again a row value, with two columns, whose data types match those of the row-valued expression on the left side of the equal sign.

It’s possible to express this query without the row-valued subquery, but the resulting query will be much less straightforward:

List the order numbers and dates of all orders placed for the highest-priced product.

SELECT ORDER_NUM, ORDER_DATE

FROM ORDERS

WHERE (MFR = (SELECT MFR_ID

FROM PRODUCTS

WHERE PRICE = (SELECT MAX(PRICE)

FROM PRODUCTS)))

AND (PRODUCT = (SELECT PRODUCT_ID

FROM PRODUCTS

WHERE PRICE = (SELECT MAX(PRICE)

FROM PRODUCTS)))

Instead of a single row-valued comparison in the WHERE clause, the resulting query has two separate scalar-valued comparisons, one for the manufacturer ID and one for the product ID. Because the comparison must be split, the lower-level subquery to find the maximum price must be repeated twice as well. Overall, the form of the query using the row-valued expression is a more direct translation of the English-language request, and it’s easier to read and understand.

9. Row-Valued Comparisons (SQL2)

The most common use of row-valued expressions in the WHERE or HAVING clause is within a test for equality, as illustrated by the last few examples. A constructed row (often consisting of column values from a candidate row of query results) is compared to another constructed row (perhaps a row of subquery results or a row of literal values), and if the rows are equal, the candidate row is included in the query results. The SQL2 standard also provides for row-valued forms of the inequality comparison tests and the range test. When comparing two rows for inequality, SQL2 uses the same rules that it would use if the columns were being used to sort the rows. It compares the contents of the first column in the two rows, and if they are unequal, uses them to order the rows. If they are equal, the comparison moves to the second column, and then the third, and so on. Here are the resulting comparisons for some three-column constructed rows derived from the ORDERS table:

(‘ACI’,’41002′,54) < (‘REI’,’2A44R’,5)—based on first column

(‘ACI’,’41002′,54) < (‘ACI’,’41003′,35)—based on second column

(‘ACI’,’41002′,10) < (‘ACI’,’41002′,54)—based on third column

10. Table-Valued Expressions (SQL2)

In addition to its extended capabilities for expressions involving simple scalar data values and row values, the SQL2 standard dramatically extends the SQL capabilities for table processing. It provides a mechanism for constructing a table of data values in place within a SQL statement. It allows table-valued subqueries and extends the SQL1 subquery tests to handle them. It also allows subqueries to appear in many more places within a SQL statement—for example, a subquery can appear in the FROM clause of a SELECT statement as of its source tables. Finally, it provides expanded capabilities for combining tables, including the UNION, INTERSECTION, and DIFFERENCE operations.

10.1. The Table-Value Constructor (SQL2)

SQL2 allows you to specify a table of data values within a SQL statement by using a table-value constructor expression, whose syntax is shown in Figure 9-15. In its simplest form, the table-value constructor is a comma-separated list of row-value constructors, each of which contains a comma-separated set of literals that form individual column values. For example, the SQL2 INSERT statement uses a table-value constructor as the source of the data to be inserted into a database. While the SQL1 INSERT statement (described in Chapter 10) allows you to insert only a single row of data, the next SQL2 INSERT statement inserts three rows into the OFFICES table.

Add three offices to the OFFICES table.

INSERT INTO OFFICES (OFFICE,CITY,REGION,MGR,SALES)

VALUES (23, ‘San Diego’, ‘Western’, 108, 0.00),

  (24, ‘Seattle’, ‘Western’, 104, 0.00),

  (14, ‘Boston’, ‘Eastern, NULL, 0.00)

Note that the individual rows in the table-value constructor are not restricted to contain only literal values. The source of a data value can be a scalar-valued subquery, or an entire row can be the result of a row-valued subquery. Although it doesn’t make much sense in the sample database, this is a legal SQL2 INSERT statement that illustrates these capabilities:

Add three offices to the OFFICES table.

INSERT INTO OFFICES (OFFICE,CITY,REGION,MGR,SALES)

VALUES (23, ‘San Diego’, ‘Western’, 108, 0.00),

  (24, ‘Seattle’, ‘Western’, (SELECT MANAGER

   FROM SALESREPS

   WHERE EMPL_NUM = 105), 0.00),

  (SELECT ‘BOSTON’, ‘EASTERN’, REGION, MGR, 0.00

FROM OFFICES

WHERE OFFICE = 12)

Like the preceding example, the VALUES clause in this INSERT statement generates a three-row table to be inserted. The first row is specified with literal values. In the second row, the fourth column is specified as a scalar-valued subquery that retrieves the manager of employee number 105. In the third row, the entire row is generated by a row-valued subquery. In this case, three of the column values in the subquery’s SELECT clause are actually literal values, but the third and fourth columns are produced by the subquery, which retrieves the manager and region for the New York office (number 12).

10.2. Table-Valued Subqueries (SQL2)

Just as SQL2 expands the use of scalar subqueries into row-valued subqueries, it also extends the SQL subquery facility to support table-valued subqueries—that is, subqueries that return a full table of results. One useful role for table-valued subqueries is within the WHERE or HAVING clause, where it is combined with extended forms of the subquery tests. For example, suppose you wanted to list the descriptions and prices of all products with orders exceeding $20,000 in the sample database. Perhaps the most straightforward way to express this request is in this SQL2 statement that uses a table-valued subquery:

List the description and price of all products with individual orders over $20,000.

SELECT DESCRIPTION, PRICE

FROM PRODUCTS

WHERE (MFR_ID,PRODUCT_ID) IN (SELECT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 20000.00)

The top-level query is a straightforward statement of the English-language request—it asks for the description and price of those products whose identification (as in previous examples, a manufacturer-ID/product-ID pair) matches some set of products. This is expressed as a subquery set membership test in the WHERE clause. The subquery generates a two-column table of subquery results, which are the identifications of the products that meet the stated order size criterion.

It’s certainly possible to express this query in other ways. From the discussion in Chapter 7, you probably recognize that it can be stated as a join of the PRODUCTS and ORDERS tables with a compound search condition:

List the description and price of all products with individual orders over $20,000.

SELECT DESCRIPTION, PRICE

FROM PRODUCTS, ORDERS

WHERE (MFR_ID = MFR)

AND (PRODUCT_ID = PRODUCT)

AND (AMOUNT > 20000.00)

This is an equally valid statement of the query, but it’s a lot further removed from the English-language request, and therefore more difficult to understand for most people. As queries become more complex, the ability to use table-valued subqueries becomes even more useful to simplify and clarify SQL requests.

10.3. The SQL2 Query Specification

The SQL2 standard formalizes the definition of what we have loosely been calling a SELECT statement or a query in the last three chapters into a basic building block called a query specification. For a complete understanding of the SQL2 table expression capabilities in the next section, it’s useful to understand this formal definition. The form of a SQL2 query specification is shown in Figure 9-16. Its components should be familiar from the earlier chapters:

  • A select list specifies the columns of query results. Each column is specified by an expression that tells the DBMS how to calculate its value. The column can be assigned an optional alias with the AS
  • The keywords ALL or UNIQUE control duplicate row elimination in the query results.
  • The FROM clause specifies the tables that contribute to the query results.
  • The WHERE clause describes how the DBMS should determine which rows are included in the query results and which should be discarded.
  • The GROUP BY and HAVING clauses together control the grouping of individual query results rows in a grouped query, and the selection of row groups for inclusion or exclusion in the final results.

The query specification is the basic query building block in the SQL2 standard. Conceptually, it describes the process of combining data from the tables in the FROM clause into a row/column table of query results. The value of the query specification is a table of data. In the simplest case, a SQL2 query consists of a simple query specification. In a slightly more complex case, a query specification is used to describe a subquery, which appears within another (higher-level) query specification. Finally, query specifications can be combined using table-valued operations to form general-purpose query expressions, as described in the next section.

11. Query Expressions (SQL2)

The SQL2 standard defines a query expression as the full, general-purpose way that you can specify a table of query results in the SQL2 language. The basic building blocks you can use to create a query expression are the following:

  • A query specification, as described in the preceding section (SELECT.. .FROM…). Its value is a table of query results.
  • A table-value constructor, as previously described (VALUES .). Its value is a table of constructed values.
  • An explicit table reference (TABLE tblname). Its value is the contents of the named table.

Using these building blocks, SQL2 lets you combine their table values using the following operations:

  •  JOIN. SQL2 provides explicit support for full cross-product joins (cross joins), natural joins, inner joins, and all types of outer joins (left, right, and full), as described in Chapter 7. A JOIN operation takes two tables as its input and produces a table of combined query results according to the join specification.
  •  UNION. The SQL2 UNION operation provides explicit support for merging the rows of two compatible tables (that is, two tables having the same number of columns and with corresponding columns having the same data types). The UNION operation takes two tables as its input and produces a single merged table of query results.
  • DIFFERENCE. The SQL2 EXCEPT operation takes two tables as its input and produces as its output a table containing the rows that appear in the first table but that do not appear in another table—that is, the rows that are missing from the second table. Conceptually, the EXCEPT operation is like table subtraction. The rows of the second table are taken away from the rows of the first table, and the answer is the remaining rows of the first table.
  • INTERSECT. The SQL2 INTERSECT operation takes two tables as its input and produces as its output a table containing the rows that appear in both input tables.

12. SQL2 union, intersect, and difference Operations

The SQL2 UNION, INTERSECT, and DIFFERENCE operations provide set operations for combining two input tables to form an output table. All three of the operations require that the two input tables be union-compatible—they must have the same number of columns, and the corresponding columns of each table must have identical data types. Here are some simple examples of SQL2 query expressions involving UNION, INTERSECT, and DIFFERENCE operations based on the sample database:

Show all products for which there is an order over $30,000 or more than $30,000 worth of inventory on hand.

(SELECT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 30000.00)

UNION

(SELECT MFR_ID, PRODUCT_ID

FROM PRODUCTS

WHERE (PRICE * QTY_ON_HAND) > 30000)

Show all products for which there is an order over $30,000 and more than $30,000 worth of inventory on hand.

(SELECT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 30000.00)

INTERSECT

(SELECT MFR_ID, PRODUCT_ID

FROM PRODUCTS

WHERE (PRICE * QTY_ON_HAND) > 30000)

Show all products for which there is an order over $30,000 except for those products that sell for under $1000.

(SELECT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 30000.00)

EXCEPT

(SELECT MFR_ID, PRODUCT_ID

FROM PRODUCTS

WHERE PRICE < 100.00)

By default, the UNION, INTERSECT, and EXCEPT operations eliminate duplicate rows during their processing. This is usually the desired result, as it is in these examples, but occasionally, you may need to suppress the elimination of duplicate rows. You can do this by specifying the UNION ALL, INTERSECT ALL, or EXCEPT ALL forms of the operations.

Note each of these examples produces a two-column table of query results. The results come from two different source tables within the database—the ORDERS table and the PRODUCTS table. However, the columns selected from these tables have the same corresponding data types, so they can be combined using these operations. In the sample database, the corresponding columns have different names in the two tables. (The manufacturer-ID column is named MFR in the ORDERS table but named MFR_ID in the PRODUCTS table.)

However, corresponding columns such as these will often have the same name in each of the tables being combined. As a convenience, SQL2 lets you specify the corresponding columns in a CORRESPONDING clause attached to the UNION, INTERSECT, or EXCEPT operation. Here is the preceding UNION example, changed for the situation where the ORDERS and PRODUCTS tables have parallel column names for manufacturer ID and product ID:

Show all products for which there is an order over $30,000 or more than $30,000 worth of inventory on hand.

 (SELECT *

FROM ORDERS

WHERE AMOUNT > 30000.00)

UNION CORRESPONDING BY (MFR, PRODUCT)

(SELECT *

FROM PRODUCTS

WHERE (PRICE * QTY_ON_HAND) > 30000)

In a case like this one where all of the corresponding (that is, identically named) columns from the two tables participate in the UNION operation, SQL2 even allows you to leave off the explicit list of column names:

Show all products for which there is an order over $30,000 or more than $30,000 worth of inventory on hand.

 (select *

FROM ORDERS

WHERE AMOUNT > 30000.00)

UNION CORRESPONDING

(SELECT *

FROM PRODUCTS

WHERE (PRICE * QTY_ON_HAND) > 30000)

Finally, it’s worth noting that the column alias capability of the query specification can be used to rename or assign names to the columns from the individual query results that are being combined with the UNION operation. If we eliminate the assumption that the PRODUCTS and ORDERS tables use the same column names, it’s still possible to use the CORRESPONDING form of the UNION operation in this query simply by renaming the columns in one of the tables:

Show all products for which there is an order over $30,000 or more than $30,000 worth of inventory on hand.

 (SELECT *

FROM ORDERS

WHERE AMOUNT > 30000.00)

UNION CORRESPONDING

(SELECT MFR_ID AS MFR, PRODUCT_ID AS PRODUCT

FROM PRODUCTS

WHERE (PRICE * QTY_ON_HAND) > 30000)

In this simple example, there is not much advantage in this construct, but in the more general case where the individual queries involve calculated columns or are grouped queries, the CORRESPONDING clause and column aliases can help to clarify the meaning of the query.

13. Query Expressions in the from Clause

SQL2 query expressions provide a much more powerful and flexible method for generating and combining tables of query results than the simple subquery and UNION operations provided by the SQL1 standard. To make query expressions even more useful and more general-purpose, the SQL2 standard allows them to appear almost anywhere that a table reference could appear in a SQL1 query. In particular, a query expression can appear in place of a table name in the FROM clause. Here is a simple example of a SQL2 query for the sample database that uses this feature:

Show the names and total outstanding orders of all customers with credit limits over $50,000.

SELECT COMPANY, TOT_ORDERS

FROM CUSTOMER, (SELECT CUST, SUM(AMOUNT) AS TOT_ORDERS

FROM ORDERS

GROUP BY CUST),

WHERE (CREDIT_LIMIT > 50000.00)

AND (CUST_NUM = CUST)

The second “table name” in the FROM clause of the main query is not a table name at all, but a full-blown query expression. In fact, the expression could have been much more complex, involving UNION or JOIN operations. When a query expression appears in the FROM clause, as it does here, the DBMS conceptually carries it out first, before any other processing of the query, and creates a temporary table of the query results generated by the query expression. In this case, this temporary table consists of two columns, listing each customer number and the total of orders for that customer number. This temporary table then acts as one of the source tables for the main query. In this example, its contents are joined to the CUSTOMER table to obtain the company name and generate the answer to the main question.

There are many other ways in which this query could be written. The entire query could be written as one top-level grouped query that joins the CUSTOMER and ORDERS table. The join operation could be made explicit with a SQL2 JOIN operator, and then the results of the join could be grouped in the top-level query. As this example shows, one of the benefits of the SQL2 query expression capabilities is that they typically provide several different ways to obtain the same query results.

The general philosophy behind the SQL2 capabilities in this area is that the SQL language should provide the flexibility to express a query in the most natural form. The underlying DBMS must be able to take the query, however expressed, break it down into its fundamentals, and then determine the most efficient way to carry out the query. This internal query execution plan may be quite different than the apparent plan called for by the actual SQL statement, but as long as it produces the same query results, the net effect is to shift the optimization workload from the human user or programmer to the DBMS.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *