SELECT Statement: Its Clauses and Functions in SQL Server

The Transact-SQL language has one basic statement for retrieving information from a database: the SELECT statement. With this statement, it is possible to query information from one or more tables of a database (or even from multiple databases). The result of a SELECT statement is another table, also known as a result set.

The simplest form of the SELECT statement contains a SELECT list with the FROM clause. (All other clauses are optional.) This form of the SELECT statement has the following syntax:

SELECT [ ALL |DISTINCT] column_list

FROM {tablel [tab_alias1] } ,…;

table1 is the name of the table from which information is retrieved. tab_alias1 provides an alias for the name of the corresponding table. An alias is another name for the corresponding table and can be used as a shorthand way to refer to the table or as a way to refer to two logical instances of the same physical table. Don’t worry; this will become clearer as examples are presented. column_list contains one or more of the following specifications:

  • The asterisk symbol (*), which specifies all columns of the named tables in the FROM clause (or from a single table when qualified, as in table2.*)
  • The explicit specification of column names to be retrieved
  • The specification column_name [AS] column_heading, which is a way to replace the name of a column or to assign a new name to an expression
  • An expression
  • A system or an aggregate function

NOTE In addition to the preceding specifications, there are other options that will be presented later in this chapter.

A SELECT statement can retrieve either columns or rows from a table. The first operation is called SELECT list (or projection), and the second one is called selection. The combination of both operations is also possible in a SELECT statement.

NOTE Before you start to execute queries in this chapter, re-create the entire sample database.

Example 6.1 shows the simplest retrieval form with the SELECT statement.

Example 6.1

Get full details of all departments:

USE sample;

SELECT dept_no, dept_name, location

FROM department;

The result is

The SELECT statement in Example 6.1 retrieves all rows and all columns from the department table. If you include all columns of a table in a SELECT list (as in Example 6.1), you can use * as shorthand. The column names serve as column headings of the resulting output.

The simplest form of the SELECT statement just described is not very useful for queries.

In practice, there are always several more clauses in a SELECT statement than in the statement shown in Example 6.1. The following is the syntax of a SELECT statement that references a table, with (almost) all possible clauses included:

SELECT select_list

[INTO new_table_]

FROM table

[WHERE search_condition]

[GROUP BY group_by_expression]

[HAVING search_condition]

[ORDER BY order_expression [ASC | DESC] ];

NOTE The clauses in the SELECT statement must be written in the syntactical order given in the preceding syntax—for example, the GROUP BY clause must come after the WHERE clause and before the HAVING clause. However, because the INTO clause is not as significant as the other clauses, it will be discussed later in the chapter, after the other clauses have been discussed.

The following subsections describe the clauses that can be used in a query, WHERE, GROUP BY, HAVING, and ORDER BY, as well as aggregate functions, the IDENTITY property, the sequences feature, set operators, and the CASE expression.

1. WHERE Clause

Often, it is necessary to define one or more conditions that limit the selected rows. The WHERE clause specifies a Boolean expression (an expression that returns a value of TRUE or FALSE) that is tested for each row to be returned (potentially). If the expression is true, then the row is returned; if it is false, it is discarded.

Example 6.2 shows the use of the WHERE clause.

Example 6.2

Get the names and numbers of all departments located in Dallas:

USE sample;

SELECT dept_name, dept_no

FROM department

WHERE location = ‘Dallas’;

The result is

In addition to the equal sign, the WHERE clause can contain other comparison operators, including the following:

Example 6.3 shows the use of a comparison operator in the WHERE clause.

Example 6.3

Get the last and first names of all employees with employee numbers greater than or equal to 15000:

USE sample;

SELECT emp_lname, emp_fname

FROM employee

WHERE emp_no >= 15000;

The result is

An expression can also be a part of the condition in the WHERE clause, as Example 6.4 shows.

Example 6.4

Get the project names for all projects with a budget > 60000 £. The current rate of exchange is 0.51 £ per $1.

USE sample;

SELECT project_name FROM project

WHERE budget*0.51 > 60000;

The result is

Comparisons of strings (that is, values of data types CHAR, VARCHAR, NCHAR, or NVARCHAR) are executed in accordance with the collating sequence in effect (the “sort order” specified when the Database Engine was installed). If two strings are compared using ASCII code (or any other code), each of the corresponding (first, second, third, and so on) characters will be compared. One character is lower in priority than the other if it appears in the code table before the other one. Two strings of different lengths are compared after the shorter one is padded at the right with blanks, so that the length of both strings is equal. Numbers compare algebraically. Values of temporal data types (such as DATE, TIME, and DATETIME) compare in chronological order.

1.1. Boolean Operators

WHERE clause conditions can either be simple or contain multiple conditions. Multiple conditions can be built using the Boolean operators AND, OR, and NOT. The behavior of these operators was described in Chapter 4 using truth tables.

If two conditions are connected by the AND operator, rows are retrieved for which both conditions are true. If two conditions are connected by the OR operator, all rows of a table are retrieved in which either the first or the second condition (or both) is true, as shown in Example 6.5.

Example 6.5

Get the employee numbers for all employees who work for either project pi or project p2 (or both):

USE sample;

SELECT project_no, emp_no

FROM works_on

WHERE project_no = ‘p1’

OR project_no = ‘p2’;

The result is

The result of Example 6.5 contains some duplicate values of the emp_no column. To eliminate this redundant information, use the DISTINCT option, as shown here:

USE sample;

SELECT DISTINCT emp_no

FROM works_on

WHERE project_no = ‘p1’

OR project_no = ‘p2’;

In this case, the result is

Note that the DISTINCT option can be used only once in a SELECT list, and it must precede all column names in that list. Therefore, Example 6.6 is wrong.

Example 6.6 (Example of an Illegal Statement)

USE sample;

SELECT emp_fname, DISTINCT emp_no

FROM employee

WHERE emp_lname = ‘Moser’;

The result is

Server: Msg 156, Level 15, State 1, Line 1

Incorrect syntax near the keyword ‘DISTINCT’.

NOTE When there is more than one column in the SELECT list, the DISTINCT clause displays all rows where the combination of columns is distinct.

The WHERE clause may include any number of the same or different Boolean operations. You should be aware that the three Boolean operations have different priorities for evaluation: the NOT operation has the highest priority, AND is evaluated next, and the OR operation has the lowest priority. If you do not pay attention to these different priorities for Boolean operations, you will get unexpected results, as Example 6.7 shows.

Example 6.7

USE sample;

SELECT emp_no, emp_fname, emp_lname

FROM employee

WHERE emp_no = 25348 AND emp_lname = ‘Smith’

OR emp_fname = ‘Matthew’ AND dept_no = ‘d1’;

SELECT emp_no, emp_fname, emp_lname

FROM employee
WHERE ((emp_no = 25348 AND emp_lname = ‘Smith’)
OR emp_fname =’Matthew’) AND dept_no = ‘d1’;

The result is

As the results of Example 6.7 show, the two SELECT statements display two different result sets. In the first SELECT statement, the system evaluates both AND operators first (from the left to the right), and then evaluates the OR operator. In the second SELECT statement, the use of parentheses changes the operation execution, with all expressions within parentheses being executed first, in sequence from left to right. As you can see, the first statement returned one row, while the second statement returned zero rows.

The existence of several Boolean operations in a WHERE clause complicates the corresponding SELECT statement and makes it error prone. In such cases, the use of parentheses is highly recommended, even if they are not necessary. The readability of such SELECT statements will be greatly improved, and possible errors can be avoided.

Here is the first SELECT statement from Example 6.7, modified using the recommended form:

USE sample;

SELECT emp_no, emp_fname, emp_lname

FROM employee

WHERE (emp_no = 25348 AND emp_lname = ‘Smith’)

OR (emp_fname = ‘Matthew’ AND dept_no = ‘d1’);

The third Boolean operator, NOT, changes the logical value of the corresponding condition. The truth table for NOT in Chapter 4 shows that the negation of the TRUE value is FALSE and vice versa; the negation of the NULL value is also NULL.

Example 6.8 shows the use of the NOT operator.

Example 6.8

Get the employee numbers and last names of all employees who do not belong to the department d2:

USE sample

SELECT emp_no, emp_lname

FROM employee

WHERE NOT dept_no = ‘d2’;

The result is

In this case, the NOT operator can be replaced by the comparison operator <> (not equal).

NOTE This book uses the operator <> (instead of !=) to remain consistent with the ANSI SQL standard.

2. IN and BETWEEN Operators

An IN operator allows the specification of two or more expressions to be used for a query search. The result of the condition returns TRUE if the value of the corresponding column equals one of the expressions specified by the IN predicate.

Example 6.9 shows the use of the IN operator.

Example 6.9

Get the first and last names and the corresponding employee number for every employee whose employee number equals 29346, 28559, or 25348:

USE sample;

SELECT emp_no, emp_fname, emp_lname

FROM employee

WHERE emp_no IN (29346, 28559, 25348);

The result is

An IN operator is equivalent to a series of conditions, connected with one or more OR operators. (The number of OR operators is equal to the number of expressions following the IN operator minus one.)

The IN operator can be used together with the Boolean operator NOT, as shown in Example 6.10. In this case, the query retrieves rows that do not include any of the listed values in the corresponding columns.

Example 6.10

Get all columns for every employee whose employee number is neither 10102 nor 9031:

USE sample;

SELECT emp_no, emp_fname, emp_lname, dept_no

FROM employee

WHERE emp_no NOT IN (10102, 9031);

The result is

In contrast to the IN operator, which specifies each individual value, the BETWEEN operator specifies a range, which determines the lower and upper bounds of qualifying values. Example 6.11 provides an example.

Example 6.11

Get the names and budgets for all projects with a budget between $95,000 and $120,000, inclusive:

USE sample;

SELECT project_name, budget

FROM project

WHERE budget BETWEEN 95000 AND 120000;

The result is

The BETWEEN operator searches for all values in the range inclusively; that is, qualifying values can be between or equal to the lower and upper boundary values.

The BETWEEN operator is logically equal to two individual comparisons, which are connected with the Boolean operator AND. Example 6.11 is equivalent to Example 6.12.

Example 6.12

USE sample;

SELECT project_name, budget

FROM project

WHERE budget >= 95000 AND budget <= 120000;

Like the BETWEEN operator, the NOT BETWEEN operator can be used to search for column values that do not fall within the specified range. The BETWEEN operator can also be applied to columns with character and date values.

The two SELECT statements in Example 6.13 show a query that can be written in two different, but equivalent, ways.

Example 6.13

Get the names of all projects with a budget less than $100,000 and greater than $150,000:

USE sample;

SELECT project_name

FROM project

WHERE budget NOT BETWEEN 100000 AND 150000;

The result is

Using comparison operators, the query looks different:

USE sample;

SELECT project_name

FROM project

WHERE budget < 100000 OR budget > 150000;

NOTE Although the English phrasing of the requirements, “Get the names of all projects with budgets that are less than $100,000 and greater than $150,000,” suggests the use of the AND operator in the second SELECT statement presented in Example 6.13, the logical meaning of the query demands the use of the OR operator, because if you use AND instead of OR, you will get no results at all. (The reason is that there cannot be a budget that is at the same time less than $100,000 and greater than $150,000.) Therefore, the second query in the example shows a possible problem that can appear between English phrasing of an exercise and its logical meaning.

2.1. Queries Involving NULL Values

A NULL in the CREATE TABLE statement specifies that a special value called NULL (which usually represents unknown or not applicable values) is allowed in the column. These values differ from all other values in a database. The WHERE clause of a SELECT statement generally returns rows for which the comparison evaluates to TRUE. The concern, then, regarding queries is, how will comparisons involving NULL values be evaluated in the WHERE clause?

All comparisons with NULL values will return FALSE (even when preceded by NOT). To retrieve the rows with NULL values in the column, Transact-SQL includes the operator feature IS NULL. This specification in a WHERE clause of a SELECT statement has the following general form:

column IS [NOT] NULL

Example 6.14 shows the use of the IS NULL operator.

Example 6.14

Get employee numbers and corresponding project numbers for employees with unknown jobs who work on project p2:

USE sample;

SELECT emp_no, project_no

FROM works_on

WHERE project_no = ‘p2’

AND job IS NULL;

The result is

Because all comparisons with NULL values return FALSE, Example 6.15 shows syntactically correct, but logically incorrect, usage of NULL.

Example 6.15

USE sample;

SELECT project_no, job

FROM works_on

WHERE job <> NULL;

The result is

The condition “column IS NOT NULL” is equivalent to the condition “NOT (column IS NULL).”

The system function ISNULL allows a display of the specified value as substitution for NULL (see Example 6.16).

Example 6.16

USE sample;

SELECT emp_no, ISNULL(job, ‘Job unknown’) AS task

FROM works_on

WHERE project_no = ‘p1’;

The result is

Example 6.16 uses a column heading called task for the job column.

3. LIKE Operator

LIKE is an operator that is used for pattern matching; that is, it compares column values with a specified pattern. The data type of the column can be any character or date. The general form of the LIKE operator is

column [NOT] LIKE ‘pattern’

pattern may be a string or expression (including columns of tables) and must be compatible with the data type of the corresponding column. For the specified column, the comparison between the value in a row and the pattern evaluates to TRUE if the column value matches the pattern expression.

Certain characters within the pattern-called wildcard characters—have a specific interpretation. Two of them are

  • % The percent sign specifies any sequence of zero or more characters.
  • _ The underscore specifies any single character.

Example 6.17 shows the use of the wildcard characters % and _.

Example 6.17

Get the first and last names and numbers of all employees whose first name contains the letter a as the second character:

USE sample;

SELECT emp_fname, emp_lname, emp_no

FROM employee

WHERE emp_fname LIKE ‘_a%’;

The result is

In addition to the percent sign and the underscore, Transact-SQL supports other characters that have a special meaning when used with the LIKE operator. These characters ([, ], and A) are demonstrated in Examples 6.18 and 6.19.

Example 6.18

Get full details of all departments whose location begins with a character in the range C through F:

USE sample;

SELECT dept_no, dept_name, location

FROM department

WHERE location LIKE ‘[C-F]%’;

The result is

As shown in Example 6.18, the square brackets, [ ], delimit a range or list of characters.

The order in which characters appear in a range is defined by the collating sequence, which is determined during the system installation.

The character a specifies the negation of a range or a list of characters. This character has this meaning only within a pair of square brackets, as shown in Example 6.19.

Example 6.19

Get the numbers and first and last names of all employees whose last name does not begin with the letter J, K, L, M, N, or O and whose first name does not begin with the letter E or Z:

USE sample;

SELECT emp_no, emp_fname, emp_lname

FROM employee

WHERE emp_lname LIKE ‘[J-O]%’

AND emp_fname LIKE ‘[*EZ]%’;

The result is

The condition “column NOT LIKE ‘pattern’” is equivalent to the condition “NOT (column LIKE ‘pattern’)”

Example 6.20 shows the use of the LIKE operator (together with NOT).

Example 6.20

Get full details of all employees whose first name does not end with the character n:

USE sample;

SELECT emp_no, emp_fname, emp_lname

FROM employee

WHERE emp_fname NOT LIKE ‘%n’;

The result is

Any of the wildcard characters (%, _, [, ], or ^) enclosed in square brackets stands for itself. An equivalent feature is available through the ESCAPE option. Therefore, both SELECT statements in Example 6.21 have the same meaning.

Example 6.21

USE sample;

SELECT project_no, project_name

FROM project

WHERE project_name LIKE ‘%[_]%’;

SELECT project_no, project_name

FROM project

WHERE project_name LIKE ‘%!_%’ ESCAPE ‘!’;

The result is

Both SELECT statements search for the underscore as an actual character in the column project_name. In the first SELECT statement, this search is established by enclosing the sign _ in square brackets. The second SELECT statement uses a character (in Example 6.21, the character !) as an escape character. The escape character overrides the meaning of the underscore as the wildcard character and leaves it to be interpreted as an ordinary character. (The result contains no rows because there are no project names that include the underscore character.)

NOTE The SQL standard supports the use of only %, and the ESCAPE operator. For this reason, if any wildcard character must stand for itself, using the ESCAPE operator instead of a pair of square brackets is recommended.

4. GROUP BY Clause

The GROUP BY clause defines one or more columns as a group such that all rows within any group have the same values for those columns. Example 6.22 shows the simple use of the GROUP BY clause.

Example 6.22

Get all jobs of the employees:

USE sample;

SELECT job

FROM works_on

GROUP BY job;

The result is

In Example 6.22, the GROUP BY clause builds different groups for all possible values (NULL, too) appearing in the job column.

NOTE There is a restriction regarding the use of columns in the GROUP BY clause. Each column appearing in the SELECT list of the query must also appear in the GROUP BY clause. This restriction does not apply to constants and to columns that are part of an aggregate function. (Aggregate functions are explained in the next subsection.) This makes sense, because only columns in the GROUP BY clause are guaranteed to have a single value for each group.

A table can be grouped by any combination of its columns. Example 6.23 shows the grouping of rows of the works_on table using two columns.

Example 6.23

Group all employees using their project numbers and jobs:

USE sample;

SELECT project_no, job

FROM works_on

GROUP BY project_no, job;

The result is

The result of Example 6.23 shows that there are nine groups with different combinations of project numbers and jobs. The only two groups that contain more than one row are

The sequence of the column names in the GROUP BY clause does not need to correspond to the sequence of the names in the SELECT list.

4.1. Aggregate Functions

Aggregate functions are functions that are used to get summary values. All aggregate functions can be divided into several groups:

  • Convenient aggregate functions
  • Statistical aggregate functions
  • User-defined aggregate functions
  • Analytic aggregate functions

The first three types are described in the following sections, while analytic aggregate functions are explained in detail in Chapter 24.

4.2. Convenient Aggregate Functions

The Transact-SQL language supports six aggregate functions:

  • MIN
  • MAX
  • SUM
  • AVG
  • COUNT
  • COUNT_BIG

All aggregate functions operate on a single argument, which can be either a column or an expression. The only exception is the second form of the COUNT and COUNT_BIG functions, COUNT(*) and COUNT_BIG(*). The result of each aggregate function is a constant value, which is displayed in a separate column of the result.

The aggregate functions appear in the SELECT list, which can include a GROUP BY clause. If there is no GROUP BY clause in the SELECT statement, and the SELECT list includes at least one aggregate function, then no simple columns can be included in the SELECT list (other than as arguments of an aggregate function). Therefore, Example 6.24 is wrong.

Example 6.24 (Example of an Illegal Statement)

USE sample;

SELECT emp_lname, MIN(emp_no)

FROM employee;

The emp_lname column of the employee table must not appear in the SELECT list of Example 6.24 because it is not the argument of an aggregate function. On the other hand, all column names that are not arguments of an aggregate function may appear in the SELECT list if they are used for grouping.

The argument of an aggregate function can be preceded by one of two keywords:

  • ALL Indicates that all values of a column are to be considered (ALL is the default value)
  • DISTINCT Eliminates duplicate values of a column before the aggregate function is applied

MIN and MAX Aggregate Functions The aggregate functions MIN and MAX compute the lowest and highest values in the column, respectively. If there is a WHERE clause, the MIN and MAX functions return the lowest or highest of values from selected rows. Example 6.25 shows the use of the aggregate function MIN.

Example 6.25

Get the lowest employee number:

USE sample;

SELECT MIN(emp_no) AS min_employee_no

FROM employee;

The result is

The result of Example 6.25 is not user friendly. For instance, the name of the employee with the lowest number is not known. As already shown, the explicit specification of the emp_name column in the SELECT list is not allowed. To retrieve the name of the employee with the lowest employee number, use a subquery, as shown in Example 6.26, where the inner query contains the SELECT statement of the previous example.

Example 6.26

Get the number and the last name of the employee with the lowest employee number:

USE sample;

SELECT emp_no, emp_lname FROM employee WHERE emp_no =

(SELECT MIN(emp_no)

FROM employee);

The result is

Example 6.27 shows the use of the aggregate function MAX.

Example 6.27

Get the employee number of the manager who was entered last in the works_on table:

USE sample;

SELECT emp_no

FROM works_on

WHERE enter_date =

(SELECT MAX(enter_date)

FROM works_on

WHERE job = ‘Manager’);

The result is


The argument of the functions MIN and MAX can also be a string value or a date. If the argument has a string value, the comparison between all values will be provided using the actual collating sequence. For all arguments of temporal data types, the earliest date specifies the lowest value in the column and the latest date specifies the highest value in the column.

The DISTINCT option cannot be used with the aggregate functions MIN and MAX. All NULL values in the column that are the argument of the aggregate function MIN or MAX are always eliminated before MIN or MAX is applied.

SUM Aggregate Function The aggregate function SUM calculates the sum of the values in the column. The argument of the function SUM must be numeric. Example 6.28 shows the use of the SUM function.

Example 6.28

Calculate the sum of all budgets of all projects:

USE sample;

SELECT SUM(budget) sum_of_budgets

FROM project;

The result is

The aggregate function in Example 6.28 groups all values of the projects’ budgets and determines their total sum. For this reason, the query in Example 6.28 (as does each analog query) implicitly contains the grouping function. The grouping function from Example 6.28 can be written explicitly in the query, as shown in Example 6.29.

Example 6.29

USE sample;

SELECT SUM(budget) sum_of_budgets

FROM project GROUP BY();

The use of this syntax for the GROUP BY clause is recommended because it defines a grouping explicitly. (Chapter 24 describes several other GROUP BY features.)

The use of the DISTINCT option eliminates all duplicate values in the column before the function SUM is applied. Similarly, all NULL values are always eliminated before SUM is applied.

AVG Aggregate Function The aggregate function AVG calculates the average of the values in the column. The argument of the function AVG must be numeric. All NULL values are eliminated before the function AVG is applied. Example 6.30 shows the use of the AVG aggregate function.

Example 6.30

Calculate the average of all budgets with an amount greater than $100,000:

USE sample;

SELECT AVG(budget) avg_budget

FROM project

WHERE budget > 100000;

The result is

COUNT and COUNT_BIG Aggregate Functions The aggregate function COUNT has two different forms:

COUNT([DISTINCT] col_name)

COUNT(*)

The first form calculates the number of values in the col_name column. When the DISTINCT keyword is used, all duplicate values are eliminated before COUNT is applied. This form of COUNT does not count NULL values for the column.

Example 6.31 shows the use of the first form of the aggregate function COUNT.

Example 6.31

Count all different jobs in each project:

USE sample;

SELECT project_no, COUNT(DISTINCT job) job_count

FROM works_on

GROUP BY project_no;

The result is

As can be seen from the result of Example 6.31, all NULL values are eliminated before the function COUNT(DISTINCT job) is applied.

The second form of the function COUNT, COUNT(*), counts the number of rows in the table. Or, if there is a WHERE clause in the SELECT statement, COUNT(*) returns the number of rows for which the WHERE condition is true. In contrast to the first form of the function COUNT, the second form does not eliminate NULL values, because this function operates on rows and not on columns. Example 6.32 shows the use of COUNT(*).

Example 6.32

Get the number of each job in all projects:

USE sample;

SELECT job, COUNT(*) job_count

FROM works_on

GROUP BY job;

The result is

The COUNT_BIG function is analogous to the COUNT function. The only difference between them is in relation to their return values: COUNT_BIG always returns a value of the BIGINT data type, while the COUNT function always returns a value of the INTEGER data type.

4.3. Statistical Aggregate Functions

The following aggregate functions belong to the group of statistical aggregate functions:

  • VAR Computes the variance of all the values listed in a column or expression
  • VARP Computes the variance for the population of all the values listed in a column or expression
  • STDEV Computes the standard deviation (which is computed as the square root of the corresponding variance) of all the values listed in a column or expression
  • STDEVP Computes the standard deviation for the population of all the values listed in a column or expression

Examples showing statistical aggregate functions will be provided in Chapter 24.

4.4. User-Defined Aggregate Functions

The Database Engine also supports the implementation of user-defined aggregate functions. Using these functions, you can implement and deploy aggregate functions that do not belong to aggregate functions included with the system. These functions are a special case of user-defined functions, which will be described in detail in Chapter 8.

5. HAVING Clause

The HAVING clause defines the condition that is then applied to groups of rows. Hence, this clause has the same meaning to groups of rows that the WHERE clause has to the content of the corresponding table. The syntax of the HAVING clause is

HAVING condition

where condition contains aggregate functions or constants.

Example 6.33 shows the use of the HAVING clause with the aggregate function COUNT(*).

Example 6.33

Get project numbers for all projects employing fewer than four persons:

USE sample;

SELECT project_no

FROM works_on

GROUP BY project_no

HAVING COUNT(*) < 4;

The result is

In Example 6.33, the system uses the GROUP BY clause to group all rows according to existing values in the project_no column. After that, it counts the number of rows in each group and selects those groups with three or fewer rows.

The HAVING clause can also be used without aggregate functions, as shown in Example 6.34.

Example 6.34

Group rows of the works_on table by job and eliminate those jobs that do not begin with the letter M:

USE sample;

SELECT job

FROM works_on

GROUP BY job

HAVING job LIKE ‘M%’;

The result is

The HAVING clause can also be used without the GROUP BY clause, although doing so is uncommon in practice. In such a case, all rows of the entire table belong to a single group.

6. ORDER BY Clause

The ORDER BY clause defines the particular order of the rows in the result of a query. This clause has the following syntax:

ORDER BY {[col_name | col_number [ASC | DESC]]},…

The col_name column defines the order. col_number is an alternative specification that identifies the column by its ordinal position in the sequence of all columns in the SELECT list (1 for the first column, 2 for the second one, and so on). ASC indicates ascending order and DESC indicates descending order, with ASC as the default value.

NOTE The columns in the ORDER BY clause need not appear in the SELECT list. However, the ORDER BY columns must appear in the SELECT list if SELECT DISTINCT is specified. Also, this clause cannot reference columns from tables that are not listed in the FROM clause.

As the syntax of the ORDER BY clause shows, the order criterion may contain more than one column, as shown in Example 6.35.

Example 6.35

Get department numbers and employee names for employees with employee numbers < 20000, in ascending order of last and first names:

USE sample;

SELECT emp_fname, emp_lname, dept_no

FROM employee

WHERE emp_no < 20000

ORDER BY emp_lname, emp_fname;

The result is

It is also possible to identify the columns in the ORDER BY clause by the ordinal position of the column in the SELECT list. The ORDER BY clause in Example 6.35 could be written in the following form:

ORDER BY 2,1

The use of column numbers instead of column names is an alternative solution if the order criterion contains any aggregate function. (The other way is to use column headings, which then appear in the ORDER BY clause.) However, using column names rather than numbers in the ORDER BY clause is recommended, to reduce the difficulty of maintaining the query if any columns need to be added or deleted from the SELECT list. Example 6.36 shows the use of column numbers.

Example 6.36

For each project number, get the project number and the number of all employees, in descending order of the employee number:

USE sample;

SELECT project_no, COUNT(*) emp_quantity

FROM works_on

GROUP BY project_no

ORDER BY 2 DESC

The result is

The Transact-SQL language orders NULL values at the beginning of all values if the order is ascending and orders them at the end of all values if the order is descending.

7. Using ORDER BY to Support Paging

If you want to display rows on the current page, you can either implement that in your application or instruct the database server to do it. In the former case, all rows from the database are sent to the application, and the application’s task is to retrieve the rows needed for printing and to display them. In the latter case, only those rows needed for the current page are selected from the server side and displayed. As you might guess, server-side paging generally provides better performance, because only the rows needed for printing are sent to the client.

The Database Engine supports two clauses in relation to server-side paging: OFFSET and FETCH. Example 6.37 shows the use of these two clauses.

Example 6.37

Get the business entity ID, job title, and birthday for all female employees from the AdventureWorks database in ascending order of job title. Display the third page. (Ten rows are displayed per page.)

USE AdventureWorks;

SELECT BusinessEntityID, JobTitle, BirthDate

FROM HumanResources.Employee

WHERE Gender = ‘F’

ORDER BY JobTitle

OFFSET 20 ROWS

FETCH NEXT 10 ROWS ONLY;

NOTE You can find further examples concerning the OFFSET clause in Chapter 24 (see Examples 24.24 and 24.25).

The OFFSET clause specifies the number of rows to skip before starting to return the rows. This is evaluated after the ORDER BY clause is evaluated and the rows are sorted. The FETCH NEXT clause specifies the number of rows to retrieve. The parameter of this clause can be a constant, an expression, or a result of a query. FETCH NEXT is analogous to FETCH FIRST.

The main purpose of server-side paging is to implement generic page forms, using variables. This can be done using a batch. The corresponding example can be found in Chapter 8 (see Example 8.5).

8. SELECT Statement and IDENTITY Property

The IDENTITY property allows you to specify a counter of values for a specific column of a table. Columns with numeric data types, such as TINYINT, SMALLINT, INT, and BIGINT, can have this property. The Database Engine generates values for such columns sequentially, starting with an initial value. Therefore, you can use the IDENTITY property to let the system generate unique numeric values for the table column of your choice. (The default value for the initial value and increment is 1.)

Each table can have only one column with the IDENTITY property. The table owner can specify the starting number and the increment value, as shown in Example 6.38.

Example 6.38

USE sample;

CREATE TABLE product

(product_no INTEGER IDENTITY(10000,1) NOT NULL,

product_name CHAR(30) NOT NULL,

price MONEY);

SELECT $identity

FROM product

WHERE product_name = ‘Soap’;

The result could be

The product table is created first in Example 6.38. This table has the product_no column with the IDENTITY property. The values of the product_no column are automatically generated by the system, beginning with 10000 and incrementing by 1 for every subsequent value: 10000, 10001, 10002, and so on.

Some system functions and variables are related to the IDENTITY property. Example 6.38 uses the $identity variable. As the result set of Example 6.38 shows, this variable automatically refers to the column with the IDENTITY property.

To find out the starting value and the increment of the column with the IDENTITY property, you can use the IDENT_SEED and IDENT_INCR functions, respectively, in the following way:

USE sample;

SELECT IDENT_SEED(‘product’), IDENT_INCR(‘product’);

As you already know, the system automatically sets identity values. If you want to supply your own values for particular rows, you must set the IDENTITY_INSERT option to ON before the explicit value will be inserted:

SET IDENTITY INSERT table name ON

NOTE Because the IDENTITY_INSERT option can be used to specify any values for a column with the IDENTITY property, IDENTITY does not generally enforce uniqueness. Use the UNIQUE or PRIMARY KEY constraint for this task.

If you insert values after the IDENTITY_INSERT option is set to ON, the system presumes that the next value is the incremented value of the highest value that exists in the table at that moment.

9. CREATE SEQUENCE Statement

Using the IDENTITY property has several significant disadvantages, the most important of which are the following:

  • You can use it only with the specified table.
  • You cannot obtain the new value before using it.
  • You can specify the IDENTITY property only when the column is created.

For these reasons, the Database Engine offers another solution called sequences. A sequence has the same semantics as the IDENTITY property but doesn’t have the limitations from the preceding list. Therefore, a sequence is an independent database feature that enables you to specify a counter of values for different database objects, such as columns and variables.

Sequences are created using the CREATE SEQUENCE statement. The CREATE SEQUENCE statement is specified in the SQL standard and is implemented in other relational database systems, such as IBM Db2 and Oracle.

Example 6.39 shows how sequences can be specified.

Example 6.39

USE sample;

CREATE SEQUENCE dbo.Sequence1

AS INT

START WITH 1 INCREMENT BY 5

MINVALUE 1 MAXVALUE 256

CYCLE;

The values of the sequence called Sequence1 in Example 6.39 are automatically generated by the system, beginning with 1 and incrementing by 5 for every subsequent value. Therefore, the START clause specifies the initial value, while the INCREMENT clause defines the incremental value. (The incremental value can be positive or negative.)

The following two optional clauses, MINVALUE and MAXVALUE, are directives, which specify a minimal and maximum value for a sequence object. (Note that MINVALUE must be less than or equal to the start value, while MAXVALUE cannot be greater than the upper boundary for the values of the data type used for the specification of the sequence.) The CYCLE clause specifies that the object should restart from the minimum value (or maximum value, for descending sequence objects) when its minimum (or maximum) value is exceeded. The default value for this clause is NO CYCLE, which means that an exception will be thrown if its minimum or maximum value is exceeded.

The main property of a sequence is that it is table-independent; that is, it can be used with any database object, such as a table’s column or variable. (This property positively affects storage and, therefore, performance. You do not need storage for a specified sequence; only the last-used value is stored.)

To generate new sequence values, you can use the NEXT VALUE FOR expression. Example 6.40 shows the use of this expression.

Example 6.40

USE sample;

SELECT NEXT VALUE FOR dbo.sequencel;

SELECT NEXT VALUE FOR dbo.sequence1;

The result is

You can use the NEXT VALUE FOR expression to assign the results of a sequence to a variable or to a column. Example 6.41 shows how you can use this expression to assign the results to a table’s column.

Example 6.41

USE sample;

CREATE TABLE dbo.table1

(column1 INT NOT NULL PRIMARY KEY, column2 CHAR(10));

INSERT INTO dbo.table1 VALUES (NEXT VALUE FOR dbo.sequence1, ‘A’);

INSERT INTO dbo.table1 VALUES (NEXT VALUE FOR dbo.sequence1, ‘B’);

Example 6.41 first creates a table called tablel with two columns. The following two INSERT statements insert two rows in this table. (For the syntax of the INSERT statement, see Chapter 7.) The first column has values 11 and 16, respectively. (These two values are subsequent values, following the generated values in Example 6.40.)

Example 6.42 shows how you can use the catalog view called sys.sequences to check the current value of the sequence, without using it. (Catalog views are described in detail in Chapter 9.)

Example 6.42

USE sample;

SELECT current_value

FROM sys.sequences

WHERE name = ‘sequence!’;

NOTE Generally, you use the NEXT VALUE FOR expression in the INSERT statement (see Chapter 7) to let the system insert generated values. You can also use the NEXT VALUE FOR expression as part of a multirow query by using the OVER clause (see Example 24.8 in Chapter 24).

The ALTER SEQUENCE statement modifies the properties of an existing sequence. One of the most important uses of this statement is in relation to the RESTART WITH clause, which “reseeds” a given sequence. Example 6.43 shows the use of the ALTER SEQUENCE statement to reinitialize (almost) all properties of the existing sequence called sequence1.

Example 6.43

USE sample;

ALTER SEQUENCE dbo.sequencel

RESTART WITH 100

INCREMENT BY 50

MINVALUE 50

MAXVALUE 10000

NO CYCLE;

To drop a sequence, use the DROP SEQUENCE statement.

9.1. Set Operators

In addition to the operators described earlier in the chapter, three set operators are supported in the Transact-SQL language:

  • UNION
  • INTERSECT
  • EXCEPT

NOTE The three set operators discussed in this section have different priorities for evaluation: the INTERSECT operator has the highest priority, EXCEPT is evaluated next, and the UNION operator has the lowest priority. If you do not pay attention to these different priorities, you will get unexpected results when you use several set operators together.

9.2. UNION Operator

The result of the union of two sets is the set of all elements appearing in either or both of the sets. Accordingly, the union of two tables is a new table consisting of all rows appearing in either one or both of the tables.

The general form of the UNION operator is

select_1 UNION [ALL] select_2 {[UNION [ALL] select_3]}…

select_1, select_2,… are SELECT statements that build the union. If the ALL option is used, all resulting rows, including duplicates, are displayed. The ALL option has the same meaning with the UNION operator as it has in the SELECT list, with one difference: the ALL option is the default in the SELECT list, but it must be specified with the UNION operator to display all resulting rows, including duplicates.

The sample database in its original form is not suitable for a demonstration of the UNION operator. For this reason, this section introduces a new table, employee_enh, which is identical to the existing employee table, up to the additional domicile column. The domicile column contains the place of residence of every employee.

The new employee_enh table has the following form:

Creation of the employee_enh table provides an opportunity to show the use of the INTO clause of the SELECT statement. SELECT INTO has two different parts. First, it creates the new table with the columns corresponding to the columns listed in the SELECT list. Second, it inserts the existing rows of the original table into the new table. (The name of the new table appears with the INTO clause, and the name of the source table appears in the FROM clause of the SELECT statement.)

Example 6.44 shows the creation of the employee_enh table.

Example 6.44

USE sample;

SELECT emp_no, emp_fname, emp_lname, dept_no

INTO employee_enh

FROM employee;

ALTER TABLE employee_enh

ADD domicile CHAR(25) NULL;

In Example 6.44, SELECT INTO generates the employee_enh table and inserts all rows from the initial table (employee) into the new one. Finally, the ALTER TABLE statement appends the domicile column to the employee_enh table.

NOTE In SQL Server 2016 and earlier, SELECT INTO creates a new table and stores it always into the default filegroup. Since SQL Server 2017, you can use the additional ON keyword of SELECT INTO to load a table into a non-default filegroup. For instance, If you want to load the employee_ enh table in the Employee_FSGroup filegroup (see Example 5.18), the SELECT statement in Example 6.44 will be this:

SELECT emp_no, emp_fname, emp_lname, dept_no

INTO employee_enh ON Employee_FSGroup

FROM employee;

After the execution of Example 6.44, the domicile column contains no values. The values can be added using SQL Server Management Studio (see Chapter 3) or the following UPDATE statements:

USE sample;
UPDATE employee_enh SET domicile = ‘San Antonio’
   WHERE emp_no = 25348;
UPDATE employee_enh SET domicile = ‘Houston’
   WHERE emp_no = 10102;
UPDATE employee_enh SET domicile = ‘San Antonio’
   WHERE emp_no = 18316;
UPDATE employee_enh SET domicile = ‘Seattle’
   WHERE emp_no = 29346;
UPDATE employee_enh SET domicile = ‘Portland’
   WHERE emp_no = 2581;
UPDATE employee_enh SET domicile = ‘Tacoma’
   WHERE emp_no = 9031;
UPDATE employee_enh SET domicile = ‘Houston’
   WHERE emp_no = 28559;

Example 6.45

USE sample;
SELECT domicile
FROM employee_enh
UNION
SELECT location
FROM department;

Two tables can be connected with the UNION operator if they are compatible with each other. This means that both the SELECT lists must have the same number of columns, and the corresponding columns must have compatible data types. (For example, INT and SMALLINT are compatible data types.)

The ordering of the result of the union can be done only if the ORDER BY clause is used with the last SELECT statement, as shown in Example 6.46. The GROUP BY and HAVING clauses can be used with the particular SELECT statements, but not with the union itself.

Example 6.46

Get the employee number for employees who either belong to department d1 or entered their project before 1/1/2017, in ascending order of employee number:

USE sample;

SELECT emp_no

FROM employee

WHERE dept_no = ‘d1’

UNION

SELECT emp_no

FROM works_on

WHERE enter_date < ‘01.01.2017’

ORDER BY 1;

The result is

The OR operator can be used instead of the UNION operator if all SELECT statements connected by one or more UNION operators reference the same table. In this case, the set of the SELECT statements is replaced through one SELECT statement with the set of OR operators.

10. INTERSECT and EXCEPT Operators

The two other set operators are INTERSECT, which specifies the intersection, and EXCEPT, which defines the difference operator. The intersection of two tables is the set of rows belonging to both tables. The difference of two tables is the set of all rows, where the resulting rows belong to the first table but not to the second one. Example 6.47 shows the use of the INTERSECT operator.

Example 6.47

USE sample;

SELECT emp_no

FROM employee

WHERE dept_no = ‘d1’

INTERSECT

SELECT emp_no

FROM works_on

WHERE enter_date < ‘01.01.2018’;

The result is

Example 6.48 shows the use of the EXCEPT set operator.

Example 6.48

USE sample;

SELECT emp_no

FROM employee

WHERE dept_no = ‘d3’

EXCEPT

SELECT emp_no

FROM works_on

WHERE enter_date > ‘01.01.2018’;

The result is

 

11. CASE Expressions

In database application programming, it is sometimes necessary to modify the representation of data. For instance, a person’s status can be coded using the values 1, 2, and 3 (for female, male, and child, respectively). Such a programming technique can reduce the time for the implementation of a program. The CASE expression in the Transact-SQL language makes this type of encoding easy to implement.

NOTE CASE does not represent a statement (as in most programming languages) but an expression. Therefore, the CASE expression can be used (almost) everywhere where the Transact-SQL language allows the use of an expression.

The CASE expression has two different forms:

  • Simple CASE expression
  • Searched CASE expression

The syntax of the simple CASE expression is

CASE expression_1

{WHEN expression_2 THEN result_1} …

[ELSE result_n]

END

A Transact-SQL statement with the simple CASE expression looks for the first expression in the list of all WHEN clauses that match expression_1 and evaluates the corresponding THEN clause. If there is no match, the ELSE clause is evaluated. Example 6.49 shows the use of the simple CASE expression.

Example 6.49

USE AdventureWorks;

GO

SELECT ProductNumber, Category =

CASE ProductLine

WHEN ‘R’ THEN ‘Road’
WHEN ‘M’ THEN ‘Mountain’
WHEN ‘T’ THEN ‘Touring’
WHEN ‘S’ THEN ‘Other sale items’
ELSE ‘Not for sale’

END,

Name
FROM Production.Product;

Example 6.49 uses the product table from the production schema of the AdventureWorks database. Depending on the abbreviated value stored in the ProductLine column of this table, the query in Example 6.49 displays the full name of each column value under the Category­heading. (The result of this example is too large to be displayed.)

The syntax of the searched CASE expression is

CASE

{WHEN condition_1 THEN result_1} …

[ELSE result_n]

END

A Transact-SQL statement with the searched CASE expression looks for the first expression that evaluates to TRUE. If none of the WHEN conditions evaluates to TRUE, the value of the ELSE expression is returned. Example 6.50 shows the use of the searched CASE expression.

Example 6.50

USE sample;

SELECT project_name,

CASE

WHEN budget > 0 AND budget < 100000 THEN 1

WHEN budget >= 100000 AND budget < 200000 THEN 2

WHEN budget >= 200000 AND budget < 300000 THEN 3 ELSE 4

END budget_weight FROM project;

The result is

In Example 6.50, budgets of all projects are weighted, and the calculated weights (together with the name of the corresponding project) are displayed.

Source: Petkovic Dusan (2020), Microsoft SQL Server 2019: A Beginner’s Guide, Seventh Edition-McGraw-Hill Education.

Leave a Reply

Your email address will not be published. Required fields are marked *