Multitable Queries in SQL: Simple Joins (Equi-Joins)

The process of forming pairs of rows by matching the contents of related columns is called joining the tables. The resulting table (containing data from both of the original tables) is called a join between the two tables. (A join based on an exact match between two columns is more precisely called an equi-join. Joins can also be based on other kinds of column comparisons, as described later in this chapter.)

Joins are the foundation of multitable query processing in SQL. All of the data in a relational database is stored in its columns as explicit data values, so all possible relationships between tables can be formed by matching the contents of related columns. Joins thus provide a powerful facility for exercising the data relationships in a database. In fact, because relational databases do not contain pointers or other mechanisms for relating rows to one another, joins are the only mechanism for exercising cross-table data relationships.

Because SQL handles multitable queries by matching columns, it should come as no surprise that the SELECT statement for a multitable query must contain a search condition that specifies the column match. Here is the SELECT statement for the query that was performed manually in Figure 7-2:

List all orders showing order number, amount, customer name, and the customer’s credit limit.

This looks just like the queries from the previous chapter, with two new features. First, the FROM clause lists two tables instead of just one. Second, the search condition:

CUST = CUST_NUM

compares columns from two different tables. We call these two columns the matching columns for the two tables. Like all search conditions, this one restricts the rows that appear in the query results. Because this is a two-table query, the search condition restricts the pairs of rows that generate the query results. In fact, the search condition specifies the same matching columns you used in the paper-and-pencil query processing. It actually captures the spirit of the manual column matching very well, saying:

“Generate query results only for pairs of rows where the customer number (CUST) in the ORDERS table matches the customer number (CUST_NUM) in the CUSTOMERS table.” Notice that the SELECT statement doesn’t say anything about how SQL should execute the query. There is no mention of “starting with orders” or “starting with customers.” Instead, the query tells SQL what the query results should look like and leaves it up to SQL to decide how to generate them.

1. Parent/Child Queries

The most common multitable queries involve two tables that have a natural parent/ child relationship. The query about orders and customers in the preceding section is an example of such a query. Each order (child) has an associated customer (parent), and each customer (parent) can have many associated orders (children). The pairs of rows that generate the query results are parent/child row combinations.

You may recall from Chapter 4 that foreign keys and primary keys create the parent/child relationship in a SQL database. The table containing the foreign key is the child in the relationship; the table with the primary key is the parent. To exercise the parent/child relationship in a query, you must specify a search condition that compares the foreign key and the primary key. Here is another example of a query that exercises a parent/child relationship, shown in Figure 7-3:

List each salesperson and the city and region where they work.

The SALESREPS (child) table contains REP_OFFICE, a foreign key for the OFFICES (parent) table. This relationship is used to find the correct OFFICES row for each salesperson, so that the correct city and region can be included in the query results.

Here’s another query involving the same two tables, but with the parent and child roles reversed, as shown in Figure 7-4.

List the offices and the names and titles of their managers.

The OFFICES (child) table contains MGR, a foreign key for the SALESREPS (parent) table. This relationship is used to find the correct SALESREPS row for each salesperson, so that the correct name and title of the manager can be included in the query results.

SQL does not require that the matching columns be included in the results of a multitable query. They are often omitted in practice, as in the two preceding examples. That’s because primary keys and foreign keys are often ID numbers (such as the office numbers and employee numbers in the examples), which humans find hard to remember, while the associated names (cities, regions, names, titles) are easier to understand. It’s quite common for ID numbers to be used in the WHERE clause to join two tables, and for more descriptive names to be specified in the SELECT clause to generate columns of query results.

2. Joins with Row Selection Criteria

The search condition that specifies the matching columns in a multitable query can be combined with other search conditions to further restrict the contents of the query results. Suppose you want to rerun the preceding query, showing only offices with large sales targets:

List the offices with a target over $600,000.

SELECT CITY, NAME, TITLE

FROM OFFICES, SALESREPS

WHERE MGR = EMPL_NUM

AND TARGET > 600000.00

CITY        NAME        TITLE

——————————-

Chicago     Bob Smith   Sales Mgr

Los Angeles Larry Fitch Sales Mgr

With the additional search condition, the rows that appear in the query results are further restricted. The first test (MGR=EMPL_NUM) selects only pairs of OFFICES and SALESREPS rows that have the proper parent/child relationship; the second test further selects only those pairs of rows where the office is above target.

3. Multiple Matching Columns

The ORDERS table and the PRODUCTS table in the sample database are related by a composite foreign key/primary key pair. The MFR and PRODUCT columns of the ORDERS table together form a foreign key for the PRODUCTS table, matching its MFR_ID and PRODUCT_ID columns, respectively. To join the tables based on this parent/child relationship, you must specify both pairs of matching columns, as shown in this example:

List all the orders, showing amounts and product descriptions.

The search condition in the query tells SQL that the related pairs of rows from the ORDERS and PRODUCTS tables are those where both pairs of matching columns contain the same values. Multicolumn joins involving two tables are less common than single­column joins and are usually found in queries involving compound foreign keys such as this one. There is no SQL restriction on the number of columns that are involved in the matching condition, but joins normally mirror the real-world relationships between entities represented in the database tables, and those relationships are usually embodied in one or just a few columns of the tables.

4. Queries with Three or More Tables

SQL can combine data from three or more tables using the same basic techniques used for two-table queries. Here is a simple example of a three-table join:

List orders over $25,000, including the name of the salesperson who took the order and the name of the customer who placed it.

This query uses two foreign keys in the ORDERS table, as shown in Figure 7-5. The CUST column is a foreign key for the CUSTOMERS table, linking each order to the customer who placed it. The REP column is a foreign key for the SALESREPS table, linking each order to the salesperson who took it. Informally speaking, the query links each order to its associated customer and salesperson.

Here is another three-table query that uses a different arrangement of parent/child relationships:

List the orders over $25,000, showing the name of the customer who placed the order and the name of the salesperson assigned to that customer.

Figure 7-6 shows the relationships exercised by this query. The first relationship again uses the CUST column from the ORDERS table as a foreign key to the CUSTOMERS table. The second uses the CUST_REP column from the CUSTOMERS table as a foreign key to the SALESREPS table. Informally speaking, this query links each order to its customer, and each customer to their salesperson.

It’s not uncommon to find three-table or even four-table queries used in production SQL applications. Even within the confines of the small, five-table sample database, it’s not too hard to find a four-table query that makes sense:

List the orders over $25,000, showing the name of the customer who placed the order, the customer’s salesperson, and the office where the salesperson works.

SELECT ORDER_NUM, AMOUNT, COMPANY, NAME, CITY

FROM ORDERS, CUSTOMERS, SALESREPS, OFFICES

WHERE CUST = CUST_NUM

AND CUST_REP = EMPL_NUM

AND REP_OFFICE = OFFICE

AND AMOUNT > 25000.00

Figure 7-7 shows the parent/child relationships in this query. Logically, it extends the join sequence of the previous example one more step, linking an order to its customer, the customer to their salesperson, and the salesperson to their office.

5. Other Equi-Joins

The vast majority of multitable queries are based on parent/child relationships, but SQL does not require that the matching columns be related as a foreign key and primary key. Any pair of columns from two tables can serve as matching columns, provided they have comparable data types. The next example demonstrates a query that uses a pair of dates as matching columns.

Find all orders received on days when a new salesperson was hired.

The results of this query come from pairs of rows in the ORDERS and SALESREPS tables where the ORDER_DATE happens to match the HIRE_DATE for the salesperson, as shown in Figure 7-8. Neither of these columns is a foreign key or a primary key, and the relationship between the pairs of rows is admittedly a strange one—the only thing the matched orders and salespeople have in common is that they happen to have the same dates. However, SQL happily joins the tables anyway.

Matching columns like the ones in this example generate a many-to-many relationship between the two tables. Many orders can share a single salesperson’s hire date, and more than one salesperson may have been hired on a given order’s order date. For example, note that three different orders (112968,112975, and 112979) were received on October 12,1989, and two different salespeople (Larry Fitch and Mary Jones) were hired that same day. The three orders and two salespeople produce six rows of query results.

This many-to-many relationship is different from the one-to-many relationship created by primary key/foreign key matching columns. The situation can be summarized as follows:

  • Joins that match primary keys to foreign keys always create one-to-many, parent/child relationships.
  • Other joins may also generate one-to-many relationships, if the matching column in at least one of the tables has unique values for all rows of the table.
  • In general, joins on arbitrary matching columns generate many-to-many relationships.

Note that these three different situations have nothing to do with how you write the SELECT statement that expresses the join. All three types of joins are written the same way—by including a comparison test for the matching column pairs in the WHERE clause. Nonetheless, it’s useful to think about joins in this way to understand how to turn an English-language request into the correct SELECT statement.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *