Multitable Queries in SQL: Joins and the SQL2 Standard

Outer joins posed a problem for the writers of the SQL2 standard. Because outer joins are the only way to represent some extremely useful queries, it was important that the SQL2 standard include support for outer joins. In addition, outer joins were supported in many commercial SQL products and were becoming a more important part of the SQL language. However, the methods used to represent outer joins varied widely among the different SQL products, as shown in the preceding sections. Furthermore, the methods used to denote outer joins in commercial products all had deficiencies and had been chosen more because of their minor impact on the SQL language than because of their clarity or correctness.

Against this background, the SQL2 standard specified a brand new method for supporting outer joins, which was not based on the established notation of a popular SQL product. The SQL2 specification puts the support for outer joins into the FROM clause, with an elaborate syntax that allows the user to specify exactly how the source tables for a query are to be joined together. The outer join support in the SQL2 standard has two distinct advantages. First, the SQL2 standard can express even the most complex of joins. Second, existing database products can support the SQL2 extensions to SQL1 and retain support for their own proprietary outer join syntax without conflict. IBM’s DB2 relational database, for example, has added support for most, but not all, of the new SQL2 join syntax at this writing. It’s reasonable to expect that most of the major DBMS brands will follow, and that the SQL2-style join features will become a part of the SQL mainstream over the next several years.

The advantages of the SQL2 expanded join support come at the expense of some significant added complexity for what had previously been one of the simpler parts of the SQL language. In fact, the expanded join support is part of a much larger expansion of query capabilities in SQL2 which add even more capability and complexity. The other expanded features include set operations on query results (union, intersection, and differences of tables) and much richer query expressions that manipulate rows and tables and allow them to be used in subqueries. The expanded join-related capabilities are described in this section. The other expanded capabilities are described in the next chapter, after the discussion of basic subqueries.

1. Inner Joins in SQL2 *

Figure 7-13 shows a simplified form of the extended SQL2 syntax for the FROM clause. It’s easiest to understand all of the options provided by considering each type of join, one by one, starting with the basic inner join and then moving to the various forms of outer join. The standard inner join of the GIRLS and BOYS tables can be expressed in SQL1 language:

I SELECT *

FROM GIRLS, BOYS

WHERE GIRLS.CITY = BOYS.CITY

This is still an acceptable statement in SQL2. The writers of the SQL2 standard really couldn’t have made it illegal without “breaking” all of the millions of multitable SQL queries that had already been written by the early 1990s. But the SQL2 standard specifies an alternative way of expressing the same query:

SELECT *

FROM GIRLS INNER JOIN BOYS

ON GIRLS.CITY = BOYS.CITY

Note that the two tables to be joined are explicitly connected by a JOIN operation, and the search condition that describes the join is now specified in an ON clause within the FROM clause. The search condition following the keyword ON can be any search condition that specifies the criteria used to match rows of the two joined tables. The columns referenced in the search condition must come only from the two joined tables. For example, assume that the BOYS table and the GIRLS table were each extended by adding an AGE column. Here is a join that matches girl/boy pairs in the same city and also requires that the boy and girl in each pair be the same age:

SELECT *

FROM GIRLS INNER JOIN BOYS

ON (GIRLS.CITY = BOYS.CITY)

AND (GIRLS.AGE = BOYS.AGE)

In these simple two-table joins, the entire contents of the WHERE clause simply moved into the ON clause, and the ON clause doesn’t add any functionality to the SQL language. However, recall from earlier in this chapter that in an outer join involving three tables or more, the order in which the joins occur affect the query results. The ON clause provides detailed control over how these multitable joins are processed, as described later in this chapter.

The SQL2 standard permits another variation on the simple inner join query between the GIRLS and BOYS tables. Because the matching columns in the two tables have the same names and are being compared for equality (which is often the case), an alternative form of the ON clause, specifying a list of matching column names, can be used:

SELECT *

FROM GIRLS INNER JOIN BOYS

USING (CITY, AGE)

The USING clause specifies a comma-separated list of the matching column names, which must be identical in both tables. It is completely equivalent to the ON clause that specifies each matching column pair explicitly, but it’s a lot more compact and therefore easier to understand. Of course, if the matching columns have different names in the BOYS table and GIRLS table, then an ON clause or a WHERE clause with an equals test must be used. The ON clause must also be used if the join does not involve equality of the matching columns. For example, if you wanted to select girl/boy pairs where the girl was required to be older than the boy, you must use an ON clause to specify the join:

 SELECT *

FROM GIRLS INNER JOIN BOYS

ON (GIRLS.CITY = BOYS.CITY

AND GIRLS.AGE > BOYS.AGE)

There is one final variation on this simple query that illustrates another feature of the SQL2 FROM clause. A join between two tables where the matching columns are exactly those specific columns from the two tables that have identical names is called a natural join, because usually this is precisely the most “natural” way to join the two tables. The query selecting girl/boy pairs who live in the same city and have the same age can be expressed as a natural join using this SQL2 query:

 SELECT *

FROM GIRLS NATURAL INNER JOIN BOYS

If the NATURAL keyword is specified, the ON and USING clauses may not be used in the join specification, because the natural join specifically defines the search condition to be used to join the tables—all of the columns with identical column names in both tables are to be matched.

The SQL2 standard assumes that the “default” join between two tables is an inner join. You can omit the keyword INNER from any of the preceding examples, and the resulting query remains a legal SQL2 statement with the same meaning.

2. Outer Joins in SQL2 *

The SQL2 standard provides complete support for outer joins using the same clauses described in the preceding section for inner joins and additional keywords. For example, the full outer join of the GIRLS and BOYS tables (without the AGE columns) is generated by this query:

SELECT *

FROM GIRLS FULL OUTER JOIN BOYS

ON GIRLS.CITY = BOYS.CITY

As explained earlier in this chapter, the query results will contain a row for each matched girl/boy pair, as well as one row for each unmatched boy, extended with NULL values in the columns from the other, unmatched table. SQL2 allows the same variations for outer joins as for inner joins; the query could also have been written:

SELECT *

FROM GIRLS NATURAL FULL OUTER JOIN BOYS

or

SELECT *

FROM GIRLS FULL OUTER JOIN BOYS

USING (CITY)

Just as the keyword INNER is optional in the SQL2 language, the SQL2 standard also allows you to omit the keyword OUTER. The preceding query could also have been written:

SELECT *

FROM GIRLS FULL JOIN BOYS

USING (CITY)

The DBMS can infer from the word FULL that an outer join is required.

By specifying LEFT or RIGHT instead of FULL, the SQL2 language extends quite naturally to left or right outer joins. Here is the left outer join version of the same query:

SELECT *

FROM GIRLS LEFT OUTER JOIN BOYS

USING (CITY)

As described earlier in the chapter, the query results will include matched girl/boy pairs and NULL-extended rows for each unmatched row in the GIRLS table (the “left” table in the join), but the results do not include unmatched rows from the BOYS table. Conversely, the right outer join version of the same query, specified like this:

SELECT *

FROM GIRLS RIGHT OUTER JOIN BOYS

USING (CITY)

includes boy/girl pairs and unmatched rows in the BOYS table (the “right” table in the join) but does not include unmatched rows from the GIRLS table.

3. Cross Joins and Union Joins in SQL2 *

The SQL2 support for extended joins includes two other methods for combining data from two tables. A cross join is another name for the Cartesian product of two tables, as described earlier in this chapter. A union join is closely related to the full outer join; its query results are a subset of those generated by the full outer join.

Here is a SQL2 query that generates the complete product of the GIRLS and BOYS tables:

 SELECT *

FROM GIRLS CROSS JOIN BOYS

By definition, the Cartesian product (also sometimes called the cross product, hence the name “CROSS JOIN”) contains every possible pair of rows from the two tables. It “multiplies” the two tables, turning tables of, for example, three girls and two boys into a table of six (3×2 = 6) boy/girl pairs. No “matching columns” or “selection criteria” are associated with the cross products, so the ON clause and the USING clause are not allowed. Note that the cross join really doesn’t add any new capabilities to the SQL language. Exactly the same query results can be generated with an inner join that specifies no matching columns. So the preceding query could just as well have been written as:

 SELECT *

FROM GIRLS, BOYS

The use of the keywords CROSS JOIN in the FROM clause simply makes the cross join more explicit. In most databases, the cross join of two tables by itself is of very little practical use. Its usefulness really comes as a building block for more complex query expressions that start with the cross product of two tables and then use SQL2 summary query capabilities (described in the next chapter) or SQL2 set operations to further manipulate the results.

The union join in SQL2 combines some of the features of the UNION operation (described in the previous chapter) with some of the features of the join operations described in this chapter. Recall that the UNION operation effectively combines the rows of two tables, which must have the same number of columns and the same data types for each corresponding column. This query, which uses a simple UNION operation:

SELECT *

FROM GIRLS

UNION ALL

SELECT *

FROM BOYS

when applied to a three-row table of girls and a two-row table of boys yields a five-row table of query results. Each row of query results corresponds precisely to either a row of the GIRLS table or a row of the BOYS table from which it was derived. The query results have two columns, NAME and CITY, because the GIRLS and BOYS tables each have these two columns.

The union join of the GIRLS and BOYS tables is specified by this SQL2 query:

 SELECT *

FROM GIRLS

UNION JOIN BOYS

The query results again have five rows, and again each row of results is contributed by exactly one of the rows in the GIRLS table or the BOYS table. But unlike the simple union, these query results have four columns—all of the columns of the first table plus all of the columns of the second table. In this aspect, the union join is like all of the other joins. For each row of query results contributed by the GIRLS table, the columns that come from the GIRLS table receive the corresponding data values; the other columns (those that come from the BOYS table) have NULL values. Similarly, for each row of query results contributed by the BOYS table, the columns that come from the BOYS table receive the corresponding data values; the other columns (this time, those that come from the GIRLS table) have NULL values.

Another way of looking at the results of the union join is to compare them to the results of a full outer join of the GIRLS and BOYS tables. The union join results include the NULL-extended rows of data from the GIRLS table and the NULL-extended rows of data from the BOYS table, but they do not include any of the rows generated by matching columns. Referring back to the definition of an outer join in Figure 7-14, the union join is produced by omitting Step 1 and following Step 2 and Step 3.

Finally, it’s useful to examine the relationships between the sets of rows produced by the cross join, the various types of outer joins, and the inner join shown in Figure 7-14. When joining two tables, TBL1 with m rows and TBL2 with n rows, the figure shows that:

  • The cross join will contain m x n rows, consisting of all possible row pairs from the two tables.
  • TBL1 INNER JOIN TBL2 will contain some number of rows, r, which is less than m x n. The inner join is strictly a subset of the cross join. It is formed by eliminating those rows from the cross join that do not satisfy the matching condition for the inner join.
  • The left outer join contains all of the rows from the inner join, plus each unmatched row from TBL1, NULL-extended.
  • The right outer join also contains all of the rows from the inner join, plus each unmatched row from TBL2, NULL-extended.
  • The full outer join contains all of the rows from the inner join, plus each unmatched row from TBL1, NULL-extended, plus each unmatched row from TBL2, NULL-extended. Roughly speaking, its query results are equal to the left outer join “plus” the right outer join.
  • The union join contains all of the rows of TBL1, NULL-extended, plus all of the rows of TBL2, NULL-extended. Roughly speaking, its query results are the full outer join “minus” the inner join.x

4. Multitable Joins in SQL2

An important advantage of the SQL2 notation is that it allows very clear specification of three-table or four-table joins. To build these more complex joins, any of the join expressions shown in Figure 7-13 and described in the preceding sections can be enclosed in parentheses. The resulting join expression can itself be used in another join expression, as if it were a simple table. Just as SQL allows you to combine mathematical
operations (+, -, *, and /) with parentheses and build more complex expressions, the SQL2 standard allows you to build more complex join expressions in the same way.

To illustrate multitable joins, assume that a new PARENTS table has been added to the database containing the GIRLS and BOYS example we have been using. The PARENTS table has three columns:

CHILD         Matches the NAME column in the GIRLS or BOYS table

TYPE         Specifies FATHER or MOTHER

PNAME       First name of the parent

A row in the GIRLS or BOYS table can have two matching rows in the PARENTS table, one specifying a MOTHER and one a FATHER, or it can have only one of these rows, or it can have no matching rows if no data on the child’s parents is available. The GIRLS, BOYS, and PARENTS tables together provide a rich set of data for some multitable join examples.

For example, suppose you wanted to make a list of all of the girls, along with the names of their mothers and the names of the boys who live in the same city. Here is one query that produces the list:

SELECT GIRLS.NAME, PNAME, BOYS.NAME

FROM ((GIRLS JOIN PARENTS

ON PARENT.CHILD = NAME)

JOIN BOYS

ON (GIRLS.CITY = BOYS.CITY))

WHERE TYPE = “MOTHER”

Because both of these joins are inner joins, any girl who does not have a boy living in the same city or any girl who does not have a mother in the database will not show up in the query results. This may or may not be the desired result. To include those girls without a matching mother in the database, you would change the join between the GIRLS and the PARENTS table to a left outer join, like this:

SELECT GIRLS.NAME, PNAME, BOYS.NAME

FROM ((GIRLS LEFT JOIN PARENTS

ON PARENT.CHILD = NAME)

JOIN BOYS

ON (GIRLS.CITY = BOYS.CITY))

WHERE (TYPE = “MOTHER”) OR (TYPE IS NULL)

This query will include all of the girl/boy pairs, regardless of whether the girls have a mother in the database, but it will still omit girls who do not live in a city with any of the boys. To include these girls as well, the second join must also be converted to a left outer join:

 SELECT GIRLS.NAME, PNAME, BOYS.NAME

FROM ((GIRLS LEFT JOIN PARENTS

ON PARENT.CHILD = NAME)

LEFT JOIN BOYS

ON (GIRLS.CITY = BOYS.CITY))

WHERE (TYPE = “MOTHER”) OR (TYPE IS NULL)

Note that the NULL-extension of the GIRLS rows by the outer join with their mothers also creates some additional complication in the WHERE clause. The girls without matching mothers will generate rows with not only a NULL mother’s name (PNAME) column but also a NULL value in the TYPE column. The simple selection criterion:

WHERE (TYPE = “MOTHER”)

would generate an “unknown” result for these rows, and they will not be included in the query results. But the entire reason for using the left outer join was to make certain they were included! To solve this problem, the WHERE clause is expanded to also test for, and allow, rows where the parent type is NULL.

As one final example, suppose you want to generate a girl/boy listing again, but this time you want to include the name of the boy’s father and the girl’s mother in the query results. This query requires a four-table join (BOYS, GIRLS, and two copies of the PARENTS table, one for joining to the boys information to get father names and one for joining to the girls information to obtain mother names). Again the potential for unmatched rows in the joins means there are several possible “right” answers to the query. Suppose, as before, that you want to include all girls and boys in the boy/girl pairing, even if the boy or girl does not have a matching row in the PARENTS table. You need to use outer joins for the (BOYS join PARENTS) and (GIRLS join PARENTS) parts of the query, but an inner join for the (BOYS join GIRLS) part of the query. This SQL2 query yields the desired results:

 SELECT GIRLS.NAME, MOTHERS.PNAME, BOYS.NAME, FATHERS.PNAME

FROM ((GIRLS LEFT JOIN PARENTS AS MOTHERS

ON ((CHILD = GIRLS.NAME) AND (TYPE = “MOTHER”)))

JOIN (BOYS LEFT JOIN PARENTS AS FATHERS

ON ((CHILD = BOYS.NAME) AND (TYPE = “FATHER”))))

USING (CITY)

This query solves the WHERE-clause test problem in a different way—by moving the test for the TYPE of parent into the ON clause of the join specification. In this position, the test for appropriate TYPE of parent will be performed when the DBMS finds matching

columns to construct the join, before the NULL-extended rows are added to the outer join results. Because the PARENTS table is being used twice in the FROM clause, in two different roles, it’s necessary to give it two different table aliases so that the correct names can be specified in the select list.

As this example shows, even a four-join query like this one can become quite complex with the SQL2 syntax. However, despite the complexity, the SQL2 query does specify precisely the query that the DBMS is to carry out. There is no ambiguity about the order in which the tables are joined, or about which joins are inner or outer joins. Overall, the added capability is well worth the added complexity introduced by the extended SQL2 FROM clause.

Although none of the query examples included in this section had WHERE or ORDER BY clauses, they can be freely used with the extended FROM clause in SQL2. The relationship among the clauses is simple and remains as described earlier in this chapter. The processing specified in the FROM clauses occurs first, including any joins or unions. The join criteria specified in a USING or ON clause are applied as a part of the particular join specification where they appear. When processing of the FROM class is complete, the resulting table is used to apply the selection criteria in the WHERE clause. Thus, the ON clause specifies search criteria that apply to specific joins; the WHERE clause specifies search criteria that apply to the entire table resulting from these joins.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *