Summary Queries in SQL: Group Search Conditions (HAVING Clause)

Just as the WHERE clause can be used to select and reject the individual rows that participate in a query, the HAVING clause can be used to select and reject row groups. The format of the HAVING clause parallels that of the WHERE clause, consisting of the keyword HAVING followed by a search condition. The HAVING clause thus specifies a search condition for groups.

An example provides the best way to understand the role of the HAVING clause. Consider this query:

What is the average order size for each salesperson whose orders total more than $30,000?

SELECT REP, AVG(AMOUNT)
FROM ORDERS
GROUP BY REP
HAVING SUM(AMOUNT) > 30000.00

REP   AVG(AMOUNT)
—- ————
105   $7,865.40
106   $16,479.00
107   $11,477.33
108   $8,376.14

Figure 8-5 shows graphically how SQL carries out the query. The GROUP BY clause first arranges the orders into groups by salesperson. The HAVING clause then eliminates any group where the total of the orders in the group does not exceed $30,000. Finally, the SELECT clause calculates the average order size for each of the remaining groups and generates the query results.

The search conditions you can specify in the HAVING clause are the same ones used in the WHERE clause, as described in Chapters 6 and 9. Here is another example of the use of a group search condition:

For each office with two or more people, compute the total quota and total sales for all salespeople who work in the office.

 SELECT CITY, SUM(QUOTA), SUM(SALESREPS.SALES)

FROM OFFICES, SALESREPS

WHERE OFFICE = REP_OFFICE GROUP BY CITY HAVING COUNT(*) >= 2

CITY          SUM(QUOTA)   SUM(SALESREPS.SALES)
———— ———— ———————
Chicago      $775,000.00    $735,042.00
Los Angeles  $700,000.00    $835,915.00
New York     $575,000.00    $692,637.00

The following steps show the rules for SQL query processing, expanded once again to include group search conditions.

To generate the query results for a SELECT statement:

  1. If the statement is a UNION of SELECT statements, apply Steps 2 through 7 to each of the statements to generate their individual query results.
  2. Form the product of the tables named in the FROM If the FROM clause names a single table, the product is that table.
  3. If there is a WHERE clause, apply its search condition to each row of the product table, retaining those rows for which the search condition is TRUE (and discarding those for which it is FALSE or NULL).
  4. If there is a GROUP BY clause, arrange the remaining rows of the product table into row groups, so that the rows in each group have identical values in all of the grouping columns.
  5. If there is a HAVING clause, apply its search condition to each row group, retaining those groups for which the search condition is TRUE (and discarding those for which it is FALSE or NULL).
  6. For each remaining row (or row group), calculate the value of each item in the select list to produce a single row of query results. For a simple column reference, use the value of the column in the current row (or row group). For a column function, use the current row group as its argument if GROUP BY is specified; otherwise, use the entire set of rows.
  7. If SELECT DISTINCT is specified, eliminate any duplicate rows of query results that were produced.
  8. If the statement is a UNION of SELECT statements, merge the query results for the individual statements into a single table of query results. Eliminate duplicate rows unless UNION ALL is specified.
  9. If there is an ORDER BY clause, sort the query results as specified.

The rows generated by this procedure comprise the query results.

Following this procedure, SQL handles the query in the previous example as follows:

  1. Joins the OFFICES and SALESREPS tables to find the city where each salesperson works.
  2. Groups the resulting rows by office.
  3. Eliminates groups with two or fewer rows—these represent offices that don’t meet the HAVING clause criterion.
  4. Calculates the total quota and total sales for each group.

Here is one more example, which uses all of the SELECT statement clauses:

Show the price, quantity on hand, and total quantity on order for each product where the total quantity on order is more than 75 percent of the quantity on hand.

SELECT DESCRIPTION, PRICE, QTY_ON_HAND, SUM(QTY)

FROM PRODUCTS, ORDERS WHERE MFR = MFR_ID

AND PRODUCT = PRODUCT_ID

GROUP BY MFR_ID, PRODUCT_ID, DESCRIPTION, PRICE, QTY_ON_HAND HAVING SUM(QTY) > (.75 * QTY_ON_HAND)

ORDER BY QTY_ON_HAND DESC

DESCRIPTION        PRICE     QTY_ON_HAND  SUM(QTY)

—————- ———- ———— ———
Reducer            $355.00      38          32
Widget Adjuster    $25.00       37          30
Motor Mount        $243.00      15          16
Right Hinge        $4,500.00    12          15
500-lb Brace       $1,425.00     5          22

To process this query, SQL conceptually performs the following steps:

  1. Joins the ORDERS and PRODUCTS tables to find the description, price, and quantity on hand for each product ordered.
  2. Groups the resulting rows by manufacturer and product ID.
  3. Eliminates groups where the quantity ordered (the total of the QTY column for all orders in the group) is less than 75 percent of the quantity on hand.
  4. Calculates the total quantity ordered for each group.
  5. Generates one summary row of query results for each group.
  6. Sorts the query results so that products with the largest quantity on hand appear first.

As described previously, DESCRIPTION, PRICE, and QTY_ON_HAND must be specified as grouping columns in this query solely because they appear in the select list. They actually contribute nothing to the grouping process, because the MFR_ID and PRODUCT_ID completely specify a single row of the PRODUCTS table, automatically making the other three columns single-valued per group.

1. Restrictions on Group Search Conditions

The HAVING clause is used to include or exclude row groups from the query results, so the search condition it specifies must be one that applies to the group as a whole rather than to individual rows. This means that an item appearing within the search condition in a HAVING clause can be:

  • A constant
  • A column function, which produces a single value summarizing the rows in the group
  • A grouping column, which by definition has the same value in every row of the group
  • An expression involving combinations of these

In practice, the search condition in the HAVING clause will always include at least one column function. If it did not, the search condition could be moved to the WHERE clause and applied to individual rows. The easiest way to figure out whether a search condition belongs in the WHERE clause or in the HAVING clause is to remember how the two clauses are applied:

  • The WHERE clause is applied to individual rows, so the expressions it contains must be computable for individual rows.
  • The HAVING clause is applied to row groups, so the expressions it contains must be computable for a group of rows.

2. Null Values and Group Search Conditions

Like the search condition in the WHERE clause, the HAVING clause search condition can produce one of three results:

  • If the search condition is TRUE, the row group is retained, and it contributes a summary row to the query results.
  • If the search condition is FALSE, the row group is discarded, and it does not contribute a summary row to the query results.
  • If the search condition is NULL, the row group is discarded, and it does not contribute a summary row to the query results.

The anomalies that can occur with NULL values in the search condition are the same as those for the WHERE clause and have been described in Chapter 6.

3. HAVING Without GROUP BY

The HAVING clause is almost always used in conjunction with the GROUP BY clause, but the syntax of the SELECT statement does not require it. If a HAVING clause appears without a GROUP BY clause, SQL considers the entire set of detailed query results to be a single group. In other words, the column functions in the HAVING clause are applied to one, and only one, group to determine whether the group is included or excluded from the query results, and that group consists of all the rows. The use of a HAVING clause without a corresponding GROUP BY clause is seldom seen in practice.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *