SQL Data Integrity: Simple Validity Checking

The SQL1 standard provides limited support for restricting the legal values that can appear in a column. When a table is created, each column in the table is assigned a data type, and the DBMS ensures that only data of the specified type is introduced into the column. For example, the EMPL_NUM column in the SALESREPS table is defined as an INTEGER, and the DBMS will produce an error if an INSERT or UPDATE statement tries to store a character string or a decimal number in the column.

However, the SQL1 standard and many commercial SQL products do not provide a way to restrict a column to certain specific data values. The DBMS will happily insert a SALESREPS row with an employee number of 12345, even though employee numbers in the sample database have three digits by convention. A hire date of December 25 would also be accepted, even though the company is closed on Christmas day.

Some commercial SQL implementations provide extended features to check for legal data values. In DB2, for example, each table in the database can be assigned a corresponding validation procedure, a user-written program to check for valid data values. DB2 invokes the validation procedure each time a SQL statement tries to change or insert a row of the table, and gives the validation procedure the proposed column values for the row. The validation procedure checks the data and indicates by its return value whether the data is acceptable. The validation procedure is a conventional program (written in S/370 assembler or PL/I, for example), so it can perform whatever data value checks are required, including range checks and internal consistency checks within the row. However, the validation procedure cannot access the database, so it cannot be used to check for unique values or foreign key/primary key relationships.

SQL Server also provides a data validation capability by allowing you to create a rule that determines what data values can legally be entered into a particular column. SQL Server checks the rule each time an INSERT or UPDATE statement is attempted for the table that contains the column. Unlike DB2’s validation procedures, SQL Server rules are written in the Transact-SQL dialect that is used by SQL Server. For example, here is a Transact-SQL statement that establishes a rule for the QUOTA column in the SALESREPS table:

 CREATE RULE QUOTA_LIMIT

AS @VALUE BETWEEN 0.00 AND 500000.00

This rule prevents you from inserting or updating a quota to a negative value or to a value greater than $500,000. As shown in the example, SQL Server allows you to assign the rule a name (QUOTA_LIMIT, in this example). Like DB2 validation procedures, however, SQL Server rules may not reference columns or other database objects.

The SQL2 standard provides extended support for validity checking through two different features—column check constraints and domains. Both give the database creator a way to tell the DBMS how to determine whether a data value is valid. The check-constraint feature specifies the data validity test for a single column. The domain feature lets you specify the validity test once, and then reuse it in the definition of many different columns whose legal data values are the same.

1. Column Check Constraints (SQL2)

A SQL2 check constraint is a search condition, like the search condition in a WHERE clause, that produces a true/false value. When a check constraint is specified for a column, the DBMS automatically checks the value of that column each time a new row is inserted or a row is updated to ensure that the search condition is true. If not, the INSERT or UPDATE statement fails. A column check constraint is specified as part of the column definition within the CREATE TABLE statement, described in Chapter 13.

Consider this excerpt from a CREATE TABLE statement, modified from the definition of the demo database to include three check constraints:

 CREATE TABLE SALESREPS

 (EMPL_NUM INTEGER NOT NULL

  CHECK (EMPL_NUM BETWEEN 101 AND 199),

 AGE INTEGER

CHECK (AGE >= 21),

.

.

.

QUOTA MONEY

CHECK (MONEY >= 0.0)

.

.

.

The first constraint (on the EMPL_NUM column) requires that valid employee numbers be three-digit numbers between 101 and 199. The second constraint (on the AGE column) similarly prevents hiring of minors. The third constraint (on the QUOTA column) prevents a salesperson from having a quota target less than $0.00.

All three of these column check constraints are very simple examples of the capability specified by the SQL2 standard. In general, the parentheses following the keyword CHECK can contain any valid search condition that makes sense in the context of a column definition. With this flexibility, a check constraint can compare values from two different columns of the table, or even compare a proposed data value against other values from the database. These capabilities are more fully described in the “Advanced Constraint Capabilities” section later in this chapter.

2. Domains (SQL2)

A SQL2 domain generalizes the check-constraint concept and allows you to easily apply the same check constraint to many different columns within a database. A domain is a collection of legal data values. You specify a domain and assign it a domain name using the SQL2 CREATE DOMAIN statement, described in Chapter 13. As with the check-constraint definition, a search condition is used to define the range of legal data values. For example, here is a SQL2 CREATE DOMAIN statement to create the domain VALID_EMPLOYEE_ID, which includes all legal employee numbers:

CREATE DOMAIN VALID_EMPLOYEE_ID INTEGER

CHECK (VALUE BETWEEN 101 AND 199)

After the VALID_EMPLOYEE_ID domain has been defined, it may be used to define columns in database tables instead of a data type. Using this capability, the example CREATE TABLE statement for the SALESREPS table would appear as:

CREATE TABLE SALESREPS

(EMPL_NUM VALID_EMPLOYEE_ID,

AGE INTEGER

CHECK (AGE >= 21),

.

.

.

QUOTA MONEY

CHECK (MONEY >= 0.0)

.

.

.

The advantage of using the domain is that if other columns in other tables also contain employee numbers, the domain name can be used repeatedly, simplifying the table definitions. The OFFICES table contains such a column:

 CREATE TABLE OFFICES

(OFFICE INTEGER NOT NULL,

   CITY VARCHAR(15) NOT NULL,

 REGION VARCHAR(10) NOT NULL,

    MGR VALID_EMPLOYEE_ID,

 TARGET MONEY,

  SALES MONEY NOT NULL

.

.

.

Another advantage of domains is that the definition of valid data (such as valid employee numbers, in this example) is stored in one central place within the database. If the definition changes later (for example, if the company grows and employee numbers in the range 200-299 must be allowed), it is much easier to change one domain definition than to change many column constraints scattered throughout the database. In a large enterprise database, there may literally be hundreds of defined domains, and the benefits of SQL2 domains for change management can be very substantial.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *