Simple Queries in SQL: Duplicate Rows (DISTINCT)

If a query includes the primary key of a table in its select list, then every row of query results will be unique (because the primary key has a different value in each row). If the primary key is not included in the query results, duplicate rows can occur. For example, suppose you made this request:

List the employee numbers of all sales office managers.

SELECT MGR

FROM OFFICES

MGR

108

106

104

105

108

The query results have five rows (one for each office), but two of them are exact duplicates of one another. Why? Because Larry Fitch manages both the Los Angeles and Denver offices, and his employee number (108) appears in both rows of the OFFICES table. These query results are probably not exactly what you had in mind.

If there are four different managers, you might have expected only four employee numbers in the query results.

You can eliminate duplicate rows of query results by inserting the keyword DISTINCT in the SELECT statement just before the select list. Here is a version of the previous query that produces the results you want:

List the employee numbers of all sales office managers.

SELECT DISTINCT MGR

FROM OFFICES

MGR

104

105

106

108

Conceptually, SQL carries out this query by first generating a full set of query results (five rows) and then eliminating rows that are exact duplicates of one another to form the final query results. The DISTINCT keyword can be specified regardless of the contents of the SELECT list (with certain restrictions for summary queries, as described in Chapter 8).

If the DISTINCT keyword is omitted, SQL does not eliminate duplicate rows. You can also specify the keyword ALL to explicitly indicate that duplicate rows are to be retained, but it is unnecessary since this is the default behavior.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *