Multitable Queries in SQL: Outer Joins *

The SQL join operation combines information from two tables by forming pairs of related rows from the two tables. The row pairs that make up the joined table are those where the matching columns in each of the two tables have the same value. If one of the rows of a table is unmatched in this process, the join can produce unexpected results, as illustrated by these queries:

List the salespeople and the offices where they work.

SELECT NAME, REP_OFFICE FROM SALESREPS

NAME           REP_OFFICE

——————–

Bill Adams       13

Mary Jones       11

Sue Smith        21

Sam Clark        11

Bob Smith        12

Dan Roberts      12

Tom Snyder       NULL

Larry Fitch      21

Paul Cruz        12

Nancy Angelli    22

List the salespeople and the cities where they work.

SELECT NAME, CITY

FROM SALESREPS, OFFICES

WHERE REP_OFFICE = OFFICE

NAME          CITY

—————–

Mary Jones   New York

Sam Clark    New York

Bob Smith    Chicago

Paul Cruz    Chicago

Dan Roberts  Chicago

Bill Adams   Atlanta

Sue Smith    Los Angeles

Larry Fitch  Los Angeles

Nancy Angelli Denver

Based on the English-language descriptions of these two queries, you would probably expect them to produce the same number of rows. But the first query includes a row for each of the ten salespeople, while the second query produces only nine. Why? Because Tom Snyder is currently unassigned and has a NULL value in the REP_OFFICE column (which is the matching column for the join). This NULL value doesn’t match any of the office numbers in the OFFICES table, so Tom’s row in the SALESREPS table is unmatched. As a result, it “vanishes” in the join. The standard SQL join thus has the potential to lose information if the tables being joined contain unmatched rows.

Based on the English-language version of the request, you would probably expect the second query to produce results like these:

List the salespeople and the cities where they work.

SELECT NAME, CITY

FROM SALESREPS, OFFICES

WHERE REP OFFICE *= OFFICE

NAME          CITY

——————

Tom Snyder     NULL

Mary Jones     New York

Sam Clark      New York

Bob Smith      Chicago

Paul Cruz      Chicago

Dan Roberts    Chicago

Bill Adams     Atlanta

Sue Smith      Los Angeles

Larry Fitch    Los Angeles

Nancy Angelli  Denver

These query results are generated by using a different type of join operation, called an outer join (indicated by the “*=” notation in the WHERE clause). The outer join is an extension of the standard join described earlier in this chapter, which is sometimes called an inner join. The SQL1 standard specifies only the inner join; it does not include the outer join. The earlier IBM SQL products also support only the inner join. However, the outer join is a well-understood and useful part of the relational database model, and it has been implemented in many non-IBM SQL products, including the flagship database products from Microsoft, Sybase, Oracle, and IBM’s Informix. The outer join is also the most natural way to express a certain type of query request, as shown in the remainder of this section.

To understand the outer join well, it’s useful to move away from the sample database and consider the two simple tables in Figure 7-12. The GIRLS table lists five girls and the cities where they live; the BOYS table lists five boys and the cities where they live. To find the girl/boy pairs who live in the same city, you could use this query, which forms the inner join of the two tables:

List the girls and boys who live in the same city.

The inner join produces four rows of query results. Notice that two of the girls (Anne and Nancy) and two of the boys (James and George) are not represented in the query results. These rows cannot be paired with any row from the other table, and so they are missing from the inner join results. Two of the unmatched rows (Anne and James) have valid values in their CITY columns, but they don’t match any cities in the opposite table. The other two unmatched rows (Nancy and George) have NULL values in their CITY columns, and by the rules of SQL NULL handling, the NULL value doesn’t match any other value (even another NULL value).

Suppose you wanted to list the girl/boy pairs who share the same cities and include the unmatched girls and boys in the list. The outer join of the GIRLS and BOYS tables produces exactly this result. The following list shows the procedure for constructing the outer join, and the outer join is shown graphically in Figure 7-12.

  1. Begin with the inner join of the two tables, using matching columns in the normal way.
  2. For each row of the first table that is not matched by any row in the second table, add one row to the query results, using the values of the columns in the first table, and assuming a NULL value for all columns of the second table.
  3. For each row of the second table that is not matched by any row in the first table, add one row to the query results, using the values of the columns in the second table, and assuming a NULL value for all columns of the first table.
  4. The resulting table is the outer join of the two tables.

Here is the SQL statement that produces the outer join:

List girls and boys in the same city, including any unmatched girls or boys.

The outer join of the two tables contains eight rows. Four of the rows are identical to those of the inner join between the two tables. Two other rows, for Anne and Nancy, come from the unmatched rows of the GIRLS table. These rows have been NULL- extended by matching them to an imaginary row of all NULLs in the BOYS table, and adding them to the query results. The final two rows, for James and George, come from the unmatched rows of the BOYS table. These rows have also been NULL-extended by matching them to an imaginary row of all NULLs in the GIRLS table and then adding them to the query results.

As this example shows, the outer join is an “information-preserving” join. Every row of the BOYS table is represented in the query results (some more than once). Similarly, every row of the GIRLS table is represented in the query results (again, some more than once).

1. Left and Right Outer Joins *

Technically, the outer join produced by the previous query is called the full outer join of the two tables. Both tables are treated symmetrically in the full outer join. Two other well-defined outer joins do not treat the two tables symmetrically.

The left outer join between two tables is produced by following Step 1 and Step 2 in the previous numbered list but omitting Step 3. The left outer join thus includes NULL-extended copies of the unmatched rows from the first (left) table but does not include any unmatched rows from the second (right) table. Here is a left outer join between the GIRLS and BOYS tables:

List girls and boys in the same city and any unmatched girls.

The query produces six rows of query results, showing the matched girl/boy pairs and the unmatched girls. The unmatched boys are missing from the results.

Similarly, the right outer join between two tables is produced by following Step 1 and Step 3 in the previous numbered list but omitting Step 2. The right outer join thus includes NULL-extended copies of the unmatched rows from the second (right) table but does not include the unmatched rows of the first (left) table. Here is a right outer join between the GIRLS and BOYS tables:

List girls and boys in the same city and any unmatched boys.

This query also produces six rows of query results, showing the matched girl/ boy pairs and the unmatched boys. This time the unmatched girls are missing from the results.

As noted before, the left and right outer joins do not treat the two joined tables symmetrically. It is often useful to think about one of the tables being the “major” table (the one whose rows are all represented in the query results) and the other table being the “minor” table (the one whose columns contain NULL values in the joined query results). In a left outer join, the left (first-mentioned) table is the major table, and the right (later-named) table is the minor table. The roles are reversed in a right outer join (right table is major, left table is minor).

In practice, the left and right outer joins are more useful than the full outer join, especially when joining data from two tables using a parent/child (primary key/ foreign key) relationship. To illustrate, consider once again the sample database. We have already seen one example involving the SALESREPS and OFFICES table. The REP_OFFICE column in the SALESREPS table is a foreign key to the OFFICES table; it tells the office where each salesperson works, and it is allowed to have a NULL value for a new salesperson who has not yet been assigned to an office. Tom Snyder is such a salesperson in the sample database. Any join that exercises this SALESREPS-to-OFFICES relationship and expects to include data for Tom Snyder must be an outer join, with the SALESREPS table as the major table. Here is the example used earlier:

List the salespeople and the cities where they work.

SELECT NAME, CITY

FROM SALESREPS, OFFICES
WHERE REP_OFFICE *= OFFICE

NAME          CITY

——————-

Tom Snyder    NULL

Mary Jones    New York

Sam Clark     New York

Bob Smith     Chicago

Paul Cruz     Chicago

Dan Roberts   Chicago

Bill Adams    Atlanta

Sue Smith     Los Angeles

Larry Fitch   Los Angeles

Nancy Angelli Denver

Note in this case (a left outer join), the “child” table (SALESREPS, the table with the foreign key) is the major table in the outer join, and the “parent” table (OFFICES) is the minor table. The objective is to retain rows containing NULL foreign key values (like Tom Snyder’s) from the child table in the query results, so the child table becomes the major table in the outer join. It doesn’t matter whether the query is actually expressed as a left outer join (as it was previously) or as a right outer join like this:

List the salespeople and the cities where they work.

SELECT NAME, CITY

FROM SALESREPS, OFFICES

WHERE OFFICE =* REP_OFFICE

NAME           CITY

——————-

Tom Snyder     NULL

Mary Jones     New York

Sam Clark      New York

Bob Smith      Chicago

Paul Cruz      Chicago

Dan Roberts    Chicago

Bill Adams     Atlanta

Sue Smith      Los Angeles

Larry Fitch    Los Angeles

Nancy Angelli  Denver

What matters is that the child table is the major table in the outer join.

There are also useful joined queries where the parent is the major table and the child table is the minor table. For example, suppose the company in the sample database opens a new sales office in Dallas, but initially the office has no salespeople assigned to it. If you want to generate a report listing all of the offices and the names of the salespeople who work there, you might want to include a row representing the Dallas office. Here is the outer join query that produces those results:

List the offices and the salespeople who work in each one.

SELECT CITY, NAME

FROM OFFICES, SALESREPS
WHERE OFFICE *= REP_OFFICE

CITY          NAME

————————

New York      Mary Jones

New York      Sam Clark

Chicago       Bob Smith

Chicago       Paul Cruz

Chicago       Dan Roberts

Atlanta       Bill Adams

Los Angeles   Sue Smith

Los Angeles   Larry Fitch

Denver        Nancy Angelli

Dallas        NULL

In this case, the parent table (OFFICES) is the major table in the outer join, and the child table (SALESREPS) is the minor table. The objective is to insure that all rows from the OFFICES table are represented in the query results, so it plays the role of major table. The roles of the two tables are precisely reversed from the previous example. Of course, the row for Tom Snyder, which was included in the query results for the earlier example (when SALESREPS was the major table), is missing from this set of query results because SALESREPS is now the minor table.

2. Outer Join Notation *

Because the outer join was not part of the SQL1 standard and was not implemented in early IBM SQL products, the DBMS vendors who support the outer join have used various notations in their SQL dialects. The “* = *” notation used in the earlier examples of this section is used by SQL Server. This notation indicates an outer join by appending an asterisk (*) to the comparison test in the WHERE clause that defines the join condition. To indicate the full outer join between two tables, TBL1 and TBL2, on columns COL1 and COL2, an asterisk (*) is placed before and after the standard join operator. The resulting full outer join comparison test looks like this:

WHERE COL1 *=* COL2

To indicate a left outer join of TBL1 to TBL2 , only the leading asterisk is specified, giving a comparison test like this:

WHERE COL1 *= COL2

To indicate a right outer join of TBL1 to TBL2 , only the trailing asterisk is specified, giving a comparison test like this:

WHERE COL1 =* COL2

An outer join can be used with any of the comparison operators using the same notation. For example, a left outer join of TBL1 to TBL2 using a greater than or equal (>=) comparison would produce a comparison test like this:

WHERE COL1 *>= COL2

Oracle also supports the outer join operation but uses a different notation. This notation indicates the outer join in the WHERE clause by including a parenthesized plus sign following the column whose table is to have the imaginary NULL row added (that is, the minor table in the outer join). The left outer join of TBL1 to TBL2 produces a search condition that looks like this:

WHERE COL1 = COL2 (+)

and the right outer join of TBL1 to TBL2 produces a search condition that looks like this:

WHERE COL1 (+) = COL2

Note that the plus sign appears on the opposite side of the comparison from where the asterisk appears in the SQL Server notation. Oracle does not support a full outer join, but as indicated earlier, this does not diminish the practical usefulness of the Oracle outer join capability.

Although both of these outer join notations are relatively convenient, they’re also somewhat deceiving. Recall that the rules for multitable SQL query processing begin by examining the FROM clause of a query and conceptually building the product of the two (or more) tables. Only after the product table is constructed does the DBMS start eliminating rows that do not meet the WHERE clause search condition. But with the SQL Server or Oracle notation, the FROM clause doesn’t tell the DBMS whether to build a product table that is only the inner join or one that includes the NULL-extended rows of an outer join. To determine this, the DBMS must “look ahead” to the WHERE clause. A more serious problem is that a join between two tables may involve more than one pair of matching columns, and it’s not clear how the notation should be used when there are two or three matching column pairs.

Other problems with the outer join notation arise when it is extended to three or more tables. It’s easy to extend the notion of an outer join to three tables:

TBL1 OUTER-JOIN TBL2 OUTER-JOIN TBL3

This is a perfectly legitimate set of database operations according to the theory of relational databases. But the result depends upon the order in which the outer join operations are performed. The results of:

(TBL1 OUTER-JOIN TBL2) OUTER-JOIN TBL3

will in general be different from the results of:

TBL1 OUTER-JOIN (TBL2 OUTER-JOIN TBL3)

Using either the SQL Server or Oracle notations, it’s impossible to specify the evaluation order of the outer joins. Because of this, the results produced by the outer join of three or more tables depend upon the specifics of the DBMS implementation.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *