Correlated SQL Subqueries

In concept, SQL performs a subquery over and over again—once for each row of the main query. For many subqueries, however, the subquery produces the same results for every row or row group. Here is an example:

List the sales offices whose sales are below the average target.

SELECT CITY FROM OFFICES

WHERE SALES < (SELECT AVG(TARGET)

  FROM OFFICES)

CITY

——-

Denver

Atlanta

In this query, it would be silly to perform the subquery five times (once for each office). The average target doesn’t change with each office; it’s completely independent of the office currently being tested. As a result, SQL can handle the query by first performing the subquery, yielding the average target ($550,000), and then converting the main query into:

 SELECT CITY

FROM OFFICES

WHERE SALES < 550000.00

Commercial SQL implementations automatically detect this situation and use this shortcut whenever possible to reduce the amount of processing required by a subquery. However, the shortcut cannot be used if the subquery contains an outer reference, as in this example:

List all of the offices whose targets exceed the sum of the quotas of the salespeople who work in them:

SELECT CITY

WHERE TARGET > (SELECT SUM(QUOTA)

  FROM SALESREPS

  WHERE REP OFFICE = OFFICE)

CITY

——-

Chicago

Los Angeles

For each row of the OFFICES table to be tested by the WHERE clause of the main query, the OFFICE column (which appears in the subquery as an outer reference) has a different value. Thus, SQL has no choice but to carry out this subquery five times—once for each row in the OFFICES table. A subquery containing an outer reference is called a correlated subquery because its results are correlated with each individual row of the main query. For the same reason, an outer reference is sometimes called a correlated reference.

A subquery can contain an outer reference to a table in the FROM clause of any query that contains the subquery, no matter how deeply the subqueries are nested. A column name in a fourth-level subquery, for example, may refer to one of the tables named in the FROM clause of the main query, or to a table named in the FROM clause of the second-level subquery or the third-level subquery that contains it. Regardless of the level of nesting, an outer reference always takes on the value of the column in the current row of the table being tested.

Because a subquery can contain outer references, there is even more potential for ambiguous column names in a subquery than in a main query. When an unqualified column name appears within a subquery, SQL must determine whether it refers to a table in the subquery’s own FROM clause, or to a FROM clause in a query containing the subquery. To minimize the possibility of confusion, SQL always interprets a column reference in a subquery using the nearest FROM clause possible. To illustrate this point, in this example, the same table is used in the query and in the subquery:

List the salespeople who are over 40 and who manage a salesperson over quota.

SELECT NAME

FROM SALESREPS

WHERE AGE > 40

AND EMPL_NUM IN (SELECT MANAGER

FROM SALESREPS

WHERE SALES > QUOTA)

NAME

——

Sam Clark

Larry Fitch

The MANAGER, QUOTA, and SALES columns in the subquery are references to the SALESREPS table in the subquery’s own FROM clause; SQL does not interpret them as outer references, and the subquery is not a correlated subquery. SQL can perform the subquery first in this case, finding the salespeople who are over quota and generating a list of the employee numbers of their managers. SQL can then turn its attention to the main query, selecting managers whose employee numbers appear in the generated list.

If you want to use an outer reference within a subquery like the one in the previous example, you must use a table alias to force the outer reference. This request, which adds one more qualifying condition to the previous one, shows how:

List the managers who are over 40 and who manage a salesperson who is over quota and who does not work in the same sales office as the manager.

SELECT NAME

FROM SALESREPS MGRS

WHERE AGE > 40

AND MGRS.EMPL_NUM IN (SELECT MANAGER

FROM SALESREPS EMPS

WHERE EMPS.QUOTA > EMPS.SALES

AND EMPS.REP_OFFICE <> MGRS.REP_OFFICE)

NAME

——

Sam Clark

Larry Fitch

The copy of the SALESREPS table used in the main query now has the tag MGRS, and the copy in the subquery has the tag EMPS. The subquery contains one additional search condition, requiring that the employee’s office number does not match that of the manager. The qualified column name MGRS.OFFICE in the subquery is an outer reference, and this subquery is a correlated subquery.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *