Simple Queries in SQL: Rules for Single-Table Query Processing

Single-table queries are generally simple, and it’s usually easy to understand the meaning of a query just by reading the SELECT statement. As queries become more complex, however, it’s important to have a more precise “definition” of the query results that will be produced by a given SELECT statement. The following steps describe the procedure for generating the results of a SQL query that includes the clauses described in this chapter.

As these steps show, the query results produced by a SELECT statement are specified by applying each of its clauses, one by one. The FROM clause is applied first (selecting the table containing data to be retrieved). The WHERE clause is applied next (selecting specific rows from the table). The SELECT clause is applied next (generating the specific columns of query results and eliminating duplicate rows, if requested). Finally, the ORDER BY clause is applied to sort the query results.

To generate the query results for a SELECT statement, follow these steps:

  1. Start with the table named in the FROM
  2. If there is a WHERE clause, apply its search condition to each row of the table, retaining those rows for which the search condition is TRUE, and discarding those rows for which it is FALSE or NULL.
  3. For each remaining row, calculate the value of each item in the select list to produce a single row of query results. For each column reference, use the value of the column in the current row.
  4. If SELECT DISTINCT is specified, eliminate any duplicate rows of query results that were produced.
  5. If there is an ORDER BY clause, sort the query results as specified.

The rows generated by this procedure comprise the query results.

These “rules” for SQL query processing will be expanded several times in the next three chapters to include the remaining clauses of the SELECT statement.

1. Combining Query Results (union) *

Occasionally, it’s convenient to combine the results of two or more queries into a single table of query results. SQL supports this capability through the UNION feature of the SELECT statement. Figure 6-15 illustrates how to use the UNION operation to satisfy the following request:

List all the products where the price of the product exceeds $2,000 or where more than $30,000 of the product has been ordered in a single order.

The first part of the request can be satisfied with the top query in the figure:

List all the products whose price exceeds $2,000.

SELECT MFR_ID, PRODUCT_ID

FROM PRODUCTS

WHERE PRICE > 2000.00

MFR_ID  PRODUCT_ID

—————–

ACI      4100Y

REI      2A44L

ACI      4100Z

REI      2A44R

Similarly, the second part of the request can be satisfied with the bottom query in the figure:

List all the products where more than $30,000 of the product has been ordered in a single order.

SELECT DISTINCT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 30000.00

MFR  PRODUCT

———–

IMM  775C

REI  2A44L

REI  2A44R

As shown in Figure 6-15, the UNION operation produces a single table of query results that combines the rows of the top query results with the rows of the bottom query results. The SELECT statement that specifies the UNION operation looks like this:

List all the products where the price of the product exceeds $2,000 or where more than $30,000 of the product has been ordered in a single order.

SELECT MFR_ID, PRODUCT_ID

FROM   PRODUCTS

WHERE  PRICE > 2000.00

UNION

SELECT DISTINCT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 30000.00

ACI      4100Y

ACI      4100Z

IMM       775C

REI      2A44L

REI      2A44R

There are severe restrictions on the tables that can be combined bya UNION operation:

  • The two tables must contain the same number of columns.
  • The data type of each column in the first table must be the same as the data type of the corresponding column in the second table.
  • Neither of the two tables can be sorted with the ORDER BY However, the combined query results can be sorted, as described in the following section.

Note that the column names of the two queries combined by UNION do not have to be identical. In the preceding example, the first table of query results has columns named MFR_ID and PRODUCT_ID, while the second table of query results has columns named MFR and PRODUCT. Because the columns in the two tables can have different names, the columns of query results produced by the UNION operation are unnamed.

The ANSI/ISO SQL standard specifies a further restriction on a SELECT statement that participates in a UNION operation. It permits only column names or an all-columns specification (SELECT *) in the select list and prohibits expressions in the select list. Most commercial SQL implementations relax this restriction and permit simple expressions in the select list. However, many SQL implementations do not allow the SELECT statements to include the GROUP BY or HAVING clauses, and some do not allow column functions in the select list (prohibiting summary queries as described in Chapter 8). In fact, some SQL implementations do not support the UNION operation at all.

2. Unions and Duplicate Rows *

Because the UNION operation combines the rows from two sets of query results, it would tend to produce query results containing duplicate rows. For example, in the query of Figure 6-15, product REI-2A44L sells for $4500.00, so it appears in the top set of query results. There is also an order for $31,500.00 worth of this product in the ORDERS table, so it also appears in the bottom set of query results. By default, the UNION operation eliminates duplicate rows as part of its processing. Thus, the combined set of query results contains only one row for product REI-2A44L.

If you want to retain duplicate rows in a UNION operation, you can specify the ALL keyword immediately following the word UNION. This form of the query produces two duplicate rows for product REI-2A44L:

List all the products where the price of the product exceeds $2,000 or where more than $30,000 of the product has been ordered in a single order.

SELECT MFR_ID, PRODUCT_ID

FROM PRODUCTS

WHERE PRICE > 2000.00

UNION ALL

SELECT DISTINCT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 30000.00

ACI 4100Y

REI 2A44L

ACI 4100Z

REI 2A44R

IMM 775C

REI 2A44L

REI 2A44R

Note that the default duplicate row handling for the UNION operation and for the simple SELECT statement is exactly opposite. For the SELECT statement, SELECT ALL (duplicates retained) is the default. To eliminate duplicate rows, you must explicitly specify SELECT DISTINCT. For the UNION operation, UNION (duplicates eliminated) is the default. To retain duplicate rows, you must explicitly specify UNION ALL.

Database experts have criticized the handling of duplicate rows in SQL and point to this inconsistency as an example of the problems. The reason for the inconsistency is that the SQL defaults were chosen to produce the correct behavior most of the time:

  • In practice, most simple SELECT statements do not produce duplicate rows, so the default is no duplicate elimination.
  • In practice, most UNION operations would produce unwanted duplicate rows, so the default is duplicate elimination.

Eliminating duplicate rows from query results is a very time-consuming process, especially if the query results contain a large number of rows. If you know, based on the individual queries involved, that a UNION operation cannot produce duplicate rows, you should specifically use the UNION ALL operation because the query will execute much more quickly.

3. Unions and Sorting *

The ORDER BY clause cannot appear in either of the two SELECT statements combined bya UNION operation. It wouldn’t make much sense to sort the two sets of query results anyway, because they are fed directly into the UNION operation and are never visible to the user. However, the combined set of query results produced by the UNION operation can be sorted by specifying an ORDER BY clause after the second SELECT statement. Since the columns produced by the UNION operation are not named, the ORDER BY clause must specify the columns by column number.

Here is the same products query as that shown in Figure 6-15, with the query results sorted by manufacturer and product number:

List all the products where the price of the product exceeds $2,000 or where more than $30,000 of the product has been ordered in a single order, sorted by manufacturer and product number.

SELECT MFR_ID, PRODUCT_ID

FROM PRODUCTS

WHERE PRICE > 2000.00 UNION

SELECT DISTINCT MFR, PRODUCT

FROM ORDERS

WHERE AMOUNT > 30000.00

ORDER BY 1, 2

ACI      4100Y

ACI      4100Z

IMM     775C

REI      2A44L

REI      2A44R

4. Multiple UNIONS*

You can use the UNION operation repeatedly to combine three or more sets of query results, as shown in Figure 6-16. The union of Table B and Table C in the figure produces a single, combined table. This table is then combined with Table A in another UNION operation. The query in the figure is written this way:

SELECT *

FROM A

UNION (SELECT * FROM B UNION SELECT * FROM C)

Bill

Mary

George

Fred

Sue

Julia

Harry

The parentheses in the query indicate which UNION operation should be performed first. In fact, if all of the UNIONs in the statement eliminate duplicate rows, or if all of them retain duplicate rows, the order in which they are performed is unimportant. These three expressions are completely equivalent:

A UNION (B UNION C)

(A UNION B) UNION C

(A UNION C) UNION B

and produce seven rows of query results. Similarly, the following three expressions are completely equivalent and produce twelve rows of query results, because the duplicates are retained:

A UNION ALL (B UNION ALL C)

(A UNION ALL B) UNION ALL C

(A UNION ALL C) UNION ALL B

However, if the unions involve a mixture of UNION and UNION ALL, the order of evaluation matters. If this expression:

A UNION ALL B UNION C

is interpreted as:

A UNION ALL (B UNION C)

then it produces ten rows of query results (six from the inner UNION, plus four rows from Table A). However, if it is interpreted as:

(A UNION ALL B) UNION C

then it produces only four rows, because the outer UNION eliminates all duplicate rows. For this reason, it’s always a good idea to use parentheses in UNIONs of three or more tables to specify the order of evaluation intended.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *