SQL Considerations for Multitable Queries

The multitable queries described thus far have not required any special SQL syntax or language features beyond those described for single-table queries. However, some multitable queries cannot be expressed without the additional SQL language features described in the following sections. Specifically:

  • Qualified column names are sometimes needed in multitable queries to eliminate ambiguous column references.
  • All-column selections(SELECT *) have a special meaning for multitable queries.
  • Self-joins can be used to create a multitable query that relates a table to itself.
  • Table aliase scan be used in the FROM clause to simplify qualified column names and allow unambiguous column references in self-joins.

1. Qualified Column Names

The sample database includes several instances where two tables contain columns with the same name. The OFFICES table and the SALESREPS table, for example, both have a column named SALES. The column in the OFFICES table contains year-to-date sales for each office; the one in the SALESREPS table contains year-to-date sales for each salesperson. Normally, there is no confusion between the two columns, because the FROM clause determines which of them is appropriate in any given query, as in these examples:

Show the cities where sales exceed target.

SELECT CITY, SALES

FROM OFFICES

WHERE SALES > TARGET

Show all salespeople with sales over $350,000.

 SELECT NAME, SALES

FROM SALESREPS

WHERE SALES > 350000.00

However, here is a query where the duplicate names cause a problem:

Show the name, sales, and office for each salesperson.

SELECT NAME, SALES, CITY

FROM SALESREPS, OFFICES

WHERE REP_OFFICE = OFFICE

Error: Ambiguous column name “SALES”

Although the English description of the query implies that you want the SALES column in the SALESREPS table, the SQL query is ambiguous. The DBMS has no way of knowing whether you want the SALES column from the SALESREPS table or the one from the OFFICES table, since both are contributing data to the query results. To eliminate the ambiguity, you must use a qualified column name to identify the column. Recall from Chapter 5 that a qualified column name specifies the name of a column and the table containing the column. The qualified names of the two SALES columns in the sample database are:

OFFICES.SALES and SALESREPS.SALES

A qualified column name can be used in a SELECT statement anywhere that a column name is permitted. The table specified in the qualified column name must, of course, match one of the tables specified in the FROM list. Here is a corrected version of the previous query that uses a qualified column name:

Show the name, sales, and office for each salesperson.

 SELECT NAME, SALESREPS.SALES, CITY

FROM SALESREPS, OFFICES

WHERE REP_OFFICE = OFFICE

Using qualified column names in a multitable query is always a good idea. The disadvantage, of course, is that they make the query text longer. When using interactive SQL, you may want to first try a query with unqualified column names and let SQL find any ambiguous columns. If SQL reports an error, you can edit your query to qualify the ambiguous columns.

2. All-Column Selections

As discussed in Chapter 6, SELECT * can be used to select all columns of the table named in the FROM clause. In a multitable query, the asterisk selects all columns of all tables in the FROM clause. The following query, for example, would produce fifteen columns of query results—the nine columns from the SALESREPS table followed by the six columns from the OFFICES table:

Tell me all about salespeople and the offices where they work.

 SELECT *

FROM SALESREPS, OFFICES

WHERE REP_OFFICE = OFFICE

Obviously, the SELECT * form of a query becomes much less practical when there are two, three, or more tables in the FROM clause.

Many SQL dialects treat the asterisk as a special kind of wildcard column name that is expanded into a list of columns. In these dialects, the asterisk can be qualified with a table name, just like a qualified column reference. In the following query, the select item SALESREPS.* is expanded into a list containing only the columns found in the SALESREPS table:

Tell me all about salespeople and the places where they work.

SELECT SALESREPS.*, CITY, REGION

FROM SALESREPS, OFFICES

WHERE REP_OFFICE = OFFICE

The query would produce eleven columns of query results—the nine columns of the SALESREPS table, followed by the two other columns explicitly requested from the OFFICES table. This type of “qualified all-columns” select item is supported in many, but not all brands of SQL-based DBMS. It was not allowed by the SQL1 standard but is part of the ANSI/ISO SQL2 specification.

3. Self-Joins

Some multitable queries involve a relationship that a table has with itself. For example, suppose you want to list the names of all salespeople and their managers. Each salesperson appears as a row in the SALESREPS table, and the MANAGER column contains the employee number of the salesperson’s manager. It would appear that the MANAGER column should be a foreign key for the table that holds data about managers. In fact it is—it’s a foreign key for the SALESREPS table itself!

If you tried to express this query like any other two-table query involving a foreign key/primary key match, it would look like this:

SELECT NAME, NAME

FROM SALESREPS, SALESREPS

WHERE MANAGER = EMPL_NUM

This SELECT statement is illegal because of the duplicate reference to the SALESREPS table in the FROM clause. You might also try eliminating the second reference to the SALESREPS table:

SELECT NAME, NAME

FROM SALESREPS

WHERE MANAGER = EMPL_NUM

This query is legal, but it won’t do what you want it to do! It’s a single-table query, so SQL goes through the SALESREPS table one row at a time, applying the search condition:

MANAGER = EMPL_NUM

The rows that satisfy this condition are those where the two columns have the same value—that is, rows where a salesperson is their own manager. There are no such rows, so the query would produce no results—not exactly the data that the English-language statement of the query requested.

To understand how SQL solves this problem, imagine there were two identical copies of the SALESREPS table, one named EMPS, containing employees, and one named MGRS, containing managers, as shown in Figure 7-9. The MANAGER column of the EMPS table would then be a foreign key for the MGRS table, and the following query would work:

List the names of salespeople and their managers.

 SELECT EMPS.NAME, MGRS.NAME

FROM EMPS, MGRS

WHERE EMPS.MANAGER = MGRS.EMPL_NUM

Because the columns in the two tables have identical names, all of the column references are qualified. Otherwise, this looks like an ordinary two-table query.

SQL uses exactly this “imaginary duplicate table” approach to join a table to itself. Instead of actually duplicating the contents of the table, SQL lets you simply refer to it by a different name, called a table alias. Here’s the same query, written using the aliases EMPS and MGRS for the SALESREPS table:

List the names of salespeople and their managers.

The FROM clause assigns a different alias to each of the two “copies” of the SALESREPS table that are involved in the query by specifying the alias name immediately after the actual table name. As the example shows, when a FROM clause contains a table alias, the alias must be used to identify the table in qualified column references. Of course, it’s really only necessary to use an alias for one of the two table occurrences in this query. It could just as easily have been written:

 SELECT SALESREPS.NAME, MGRS.NAME

FROM SALESREPS, SALESREPS MGRS

WHERE SALESREPS.MANAGER = MGRS.EMPL_NUM

Here the alias MGRS is assigned to one “copy” of the table, while the table’s own name is used for the other copy.

Here are some additional examples of self-joins:

List salespeople with a higher quota than their manager.

 SELECT SALESREPS.NAME, SALESREPS.QUOTA, MGRS.QUOTA

FROM SALESREPS, SALESREPS MGRS

WHERE SALESREPS.MANAGER = MGRS.EMPL_NUM

AND SALESREPS.QUOTA > MGRS.QUOTA

List salespeople who work in different offices than their manager, showing the name and office where each works.

4. Table Aliases

As described in the previous section, table aliases are required in queries involving self-joins. However, you can use an alias in any query. For example, if a query refers to another user’s table, or if the name of a table is very long, the table name can become tedious to type as a column qualifier. This query, which references the BIRTHDAYS table owned by the user named SAM:

List names, quotas, and birthdays of salespeople.

 SELECT SALESREPS.NAME, QUOTA, SAM.BIRTHDAYS.BIRTH_DATE

FROM SALESREPS, BIRTHDAYS

WHERE SALESREPS.NAME = SAM.BIRTHDAYS.NAME

becomes easier to read and type when the aliases S and B are used for the two tables:

List names, quotas, and birthdays of salespeople.

SELECT S.NAME, S.QUOTA, B.BIRTH_DATE

FROM SALESREPS S, SAM.BIRTHDAYS B

WHERE S.NAME = B.NAME

Figure 7-10 shows the basic form of the FROM clause for a multitable SELECT statement, complete with table aliases. The clause has two important functions:

  • The FROM clause identifies all of the tables that contribute data to the query results. Any columns referenced in the SELECT statement must come from one of the tables named in the FROM clause. (There is an exception for outer references contained in a subquery, as described in Chapter 9.)
  • The FROM clause specifies the tag that is used to identify the table in qualified column references within the SELECT statement. If a table alias is specified, it becomes the table tag; otherwise, the table’s name, exactly as it appears in the FROM clause, becomes the tag.

The only requirement for table tags in the FROM clause is that all of the table tags in a given FROM clause must be distinct from each other. The SQL2 specification optionally allows the keyword AS to appear between a table name and table alias. While this makes the FROM clause easier to read, it may not yet be supported in your specific SQL implementation. (Note that the SQL2 specification uses the term correlation name to refer to what we have called a table alias. The function and meaning of a correlation name are exactly as described here; many SQL products use the term alias, and it is more descriptive of the function that a table alias performs. The SQL2 standard specifies a similar technique for designating alternate column names, and in that situation the column alias name is actually called an alias in the standard.)

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *