Using SQL Subqueries

A subquery is a query within a query. The results of the subquery are used by the DBMS to determine the results of the higher-level query that contains the subquery. In the simplest forms of a subquery, the subquery appears within the WHERE or HAVING clause of another SQL statement. Subqueries provide an efficient, natural way to handle query requests that are themselves expressed in terms of the results of other queries. Here is an example of such a request:

List the offices where the sales target for the office exceeds the sum of the individual salespeople’s quotas.

The request asks for a list of offices from the OFFICES table, where the value of the TARGET column meets some condition. It seems reasonable that the SELECT statement that expresses the query should look something like this:

SELECT CITY FROM OFFICES WHERE TARGET > ???

The value ” ??? ” needs to be filled in and should be equal to the sum of the quotas of the salespeople assigned to the office in question. How can you specify that value in the query? From Chapter 8, you know that the sum of the quotas for a specific office (say, office number 21) can be obtained with this query:

SELECT SUM(QUOTA)

FROM SALESREPS

WHERE REP_OFFICE = 21

But it would be inefficient to have to type in this query, write down the results, and then type in the previous query with the correct amount. How can you put the results of this query into the earlier query in place of the question marks? It would seem reasonable to start with the first query and replace the “???” with the second query, as follows:

 SELECT CITY

FROM OFFICES

WHERE TARGET > (SELECT SUM(QUOTA)

FROM SALESREPS

WHERE REP_OFFICE = OFFICE)

In fact, this is a correctly formed SQL query. For each office, the inner query (the subquery) calculates the sum of the quotas for the salespeople working in that office.

The outer query (the main query) compares the office’s target to the calculated total and decides whether to add the office to the main query results. Working together, the main query and the subquery express the original request and retrieve the requested data from the database.

SQL subqueries typically appear as part of the WHERE clause or the HAVING clause. In the WHERE clause, they help to select the individual rows that appear in the query results. In the HAVING clause, they help to select the row groups that appear in the query results.

1. What Is a Subquery?

Figure 9-1 shows the form of a SQL subquery. The subquery is enclosed in parentheses, but otherwise it has the familiar form of a SELECT statement, with a FROM clause and optional WHERE, GROUP BY, and HAVING clauses. The form of these clauses in a subquery is identical to that in a SELECT statement, and they perform their normal functions when used within a subquery. There are, however, a few differences between a subquery and an actual SELECT statement:

  • In the most common uses, a subquery must produce a single column of data as its query results. This means that a subquery almost always has a single select item in its SELECT clause.
  • The ORDER BY clause cannot be specified in a subquery. The subquery results are used internally by the main query and are never visible to the user, so it makes little sense to sort them anyway.
  • Column names appearing in a subquery may refer to columns of tables in the main query. These outer references are described in detail later in the “Outer References” section.
  • in most implementations, a subquery cannot be the UNION of several different SELECT statements; only a single SELECT is allowed. (The SQL2 standard allows much more powerful query expressions and relaxes this restriction, as described later in the section “Advanced Queries in SQL2.”)

2. Subqueries in the where Clause

Subqueries are most frequently used in the WHERE clause of a SQL statement. When a subquery appears in the WHERE clause, it works as part of the row selection process. The very simplest subqueries appear within a search condition and produce a value that is used to test the search condition. Here is an example of a simple subquery:

List the salespeople whose quota is less than 10 percent of the company wide sales target.

SELECT NAME

FROM SALESREPS

WHERE QUOTA < (.1 * (SELECT SUM(TARGET) FROM OFFICES))

NAME

———

Bob Smith

In this case, the subquery calculates the sum of the sales targets for all of the offices to determine the companywide target, which is multiplied by 10 percent to determine the cutoff sales quota for the query. That value is then used in the search condition to check each row of the SALESREPS table and find the requested names. In this simple case, the subquery produces the same value for every row of the SALESREPS table; the QUOTA value for each salesperson is compared to the same companywide number. In fact, you could carry out this query by first performing the subquery, to calculate the cutoff quota amount ($275,000 in the sample database), and then carry out the main query using this number in a simple WHERE clause:

WHERE QUOTA < 275000

It’s more convenient to use the subquery, but it’s not essential. Usually, subqueries are not this simple. For example, consider once again the query from the previous section:

List the offices where the sales target for the office exceeds the sum of the individual salespeople’s quotas.

SELECT CITY
FROM OFFICES
WHERE TARGET > (SELECT SUM(QUOTA)

  FROM SALESREPS

  WHERE REP OFFICE = OFFICE)

CITY

——-

Chicago

Los Angeles

In this (more typical) case, the subquery cannot be calculated once for the entire query. The subquery produces a different value for each office, based on the quotas of the salespeople in that particular office. Figure 9-2 shows conceptually how SQL carries out the query. The main query draws its data from the OFFICES table, and the WHERE clause selects which offices will be included in the query results. SQL goes through the rows of the OFFICES table one by one, applying the test stated in the WHERE clause.

The WHERE clause compares the value of the TARGET column in the current row to the value produced by the subquery. To test the TARGET value, SQL carries out the subquery, finding the sum of the quotas for salespeople in the current office. The subquery produces a single number, and the WHERE clause compares the number to the TARGET value, selecting or rejecting the current office based on the comparison. As the figure shows, SQL carries out the subquery repeatedly, once for each row tested by the WHERE clause of the main query.

3. Outer References

Within the body of a subquery, it’s often necessary to refer to the value of a column in the current row of the main query. Consider once again the query from the previous sections:

List the offices where the sales target for the office exceeds the sum of the individual salespeople’s quotas.

SELECT CITY

FROM OFFICES

WHERE TARGET > (SELECT SUM(QUOTA)

  FROM SALESREPS

  WHERE REP_OFFICE = OFFICE)

The role of the subquery in this SELECT statement is to calculate the total quota for those salespeople who work in a particular office—specifically, the office currently being tested by the WHERE clause of the main query. The subquery does this by scanning the SALESREPS table. But notice that the OFFICE column in the WHERE clause of the subquery doesn’t refer to a column of the SALESREPS table; it refers to a column of the OFFICES table, which is a part of the main query. As SQL moves through each row of the OFFICES table, it uses the OFFICE value from the current row when it carries out the subquery.

The OFFICE column in this subquery is an example of an outer reference, which is a column name that does not refer to any of the tables named in the FROM clause of the subquery in which the column name appears. Instead, the column name refers to a column of a table specified in the FROM clause of the main query. As the previous example shows, when the DBMS examines the search condition in the subquery, the value of the column in an outer reference is taken from the row currently being tested by the main query.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *