SQL Data Integrity: Entity Integrity

A table’s primary key must have a unique value for each row of the table; otherwise the database loses its integrity as a model of the outside world. For example, if two rows of the SALESREPS table both had the value 106 in their EMPL_NUM column, it would be impossible to tell which row really represented the real-world entity associated with that key value: Bill Adams, who is employee number 106. For this reason, the requirement that primary keys have unique values is called the entity integrity constraint.

Support for primary keys was not found in the first commercial SQL databases, but it is now very common. It was added to DB2 in 1988 and to the original ANSI/ISO SQL standard in an intermediate update, before the full SQL2 standard appeared. You specify the primary key as part of the CREATE TABLE statement, described in Chapter 13. The sample database definition in Appendix A includes primary key definitions for all of its tables, following the ANSI/ISO standard syntax.

When a primary key is specified for a table, the DBMS automatically checks the uniqueness of the primary key value for every INSERT and UPDATE statement performed on the table. An attempt to insert a row with a duplicate primary key value or to update a row so that its primary key would be a duplicate will fail with an error message.
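The check described above can be sketched as follows. This is a minimal illustration, assuming a cut-down SALESREPS table with only the EMPL_NUM and NAME columns; the column types and the name 'Pat Example' are assumptions made for the example, not the Appendix A definition.

```sql
-- Simplified SALESREPS table (assumed column types; the sample
-- database's real table has more columns).
CREATE TABLE SALESREPS (
    EMPL_NUM INTEGER     NOT NULL,
    NAME     VARCHAR(15) NOT NULL,
    PRIMARY KEY (EMPL_NUM)
);

-- Succeeds: no existing row has EMPL_NUM 106.
INSERT INTO SALESREPS (EMPL_NUM, NAME) VALUES (106, 'Bill Adams');

-- Fails with a duplicate-key error: 106 is already in use.
INSERT INTO SALESREPS (EMPL_NUM, NAME) VALUES (106, 'Pat Example');

-- Succeeds: 107 is a new key value.
INSERT INTO SALESREPS (EMPL_NUM, NAME) VALUES (107, 'Pat Example');

-- Fails: the update would give two rows the key value 106.
UPDATE SALESREPS SET EMPL_NUM = 106 WHERE EMPL_NUM = 107;
```

The exact error text and error code vary from one DBMS to another; only the rejection of the statement is common to all of them.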

1. Other Uniqueness Constraints

It is sometimes appropriate to require a column that is not the primary key of a table to contain a unique value in every row. For example, suppose you wanted to restrict the data in the SALESREPS table so that no two salespeople could have exactly the same name in the table. You could achieve this goal by imposing a uniqueness constraint on the NAME column. The DBMS enforces a uniqueness constraint in the same way that it enforces the primary key constraint. Any attempt to insert or update a row in the table that violates the uniqueness constraint will fail.

The ANSI/ISO SQL standard uses the CREATE TABLE statement to specify uniqueness constraints for columns or combinations of columns. However, uniqueness constraints were implemented in DB2 long before the publication of the ANSI/ISO standard, and DB2 made them a part of its CREATE INDEX statement. This statement is one of the SQL database administration statements that deal with the physical storage of the database on disk. Normally, the SQL user doesn’t have to worry about these statements at all; they are used only by the database administrator.

Many commercial SQL products followed the original DB2 practice rather than the ANSI/ISO standard for uniqueness constraints and required the use of a CREATE INDEX statement. Subsequent versions of DB2 added a uniqueness constraint to the CREATE TABLE statement. Most of the other commercial vendors have followed the same path, and now support the ANSI/ISO syntax for the uniqueness constraint.
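The two approaches can be compared side by side. The following is a sketch only, reusing a simplified SALESREPS table with assumed column types; the index name SALESREPS_NAME_IDX is hypothetical.

```sql
-- ANSI/ISO style: declare the uniqueness constraint with the table.
CREATE TABLE SALESREPS (
    EMPL_NUM INTEGER     NOT NULL,
    NAME     VARCHAR(15) NOT NULL,
    PRIMARY KEY (EMPL_NUM),
    UNIQUE (NAME)
);

-- Older index-based style: an alternative to the UNIQUE clause above,
-- not an addition to it. The index name is made up for this example.
CREATE UNIQUE INDEX SALESREPS_NAME_IDX ON SALESREPS (NAME);
```

With either form in place, inserting a second salesperson named 'Bill Adams' would be rejected. In practice only one of the two would be used for a given column.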

2. Uniqueness and NULL Values

NULL values pose a problem when they occur in the primary key of a table or in a column that is specified in a uniqueness constraint. Suppose you tried to insert a row with a primary key that was NULL (or partially NULL, if the primary key is composed of more than one column). Because of the NULL value, the DBMS cannot conclusively decide whether the primary key duplicates one that is already in the table. The answer must be “maybe,” depending on the “real” value of the missing (NULL) data.

For this reason, the SQL standard requires that every column that is part of a primary key be declared NOT NULL. The original standard applied the same restriction to every column named in a uniqueness constraint; later versions of the standard, and many DBMS products, permit NULL values in such columns, so the behavior depends on the product. Where both restrictions apply, they ensure that columns that are supposed to contain unique data values in each row of a table actually do contain unique values.
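As an illustration of the primary key rule, here is a minimal sketch using a hypothetical ORDER_ITEMS table with a two-column primary key. The table, its columns, and the values are assumptions made for the example and are not part of the sample database.

```sql
-- Hypothetical table: both key columns are declared NOT NULL.
CREATE TABLE ORDER_ITEMS (
    ORDER_NUM INTEGER NOT NULL,
    LINE_NUM  INTEGER NOT NULL,
    QTY       INTEGER,
    PRIMARY KEY (ORDER_NUM, LINE_NUM)
);

-- Accepted: both parts of the key are present.
INSERT INTO ORDER_ITEMS (ORDER_NUM, LINE_NUM, QTY) VALUES (1001, 1, 5);

-- Rejected: the key is partially NULL, so the DBMS could not decide
-- whether it duplicates (1001, 1).
INSERT INTO ORDER_ITEMS (ORDER_NUM, LINE_NUM, QTY) VALUES (1001, NULL, 5);
```

QTY is not part of the key, so it may be left NULL without affecting the uniqueness check.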

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.
