Creating SQL Database: Table Definitions

The most important structure in a relational database is the table. In a multiuser production database, the major tables are typically created once by the database administrator and then used day after day. As you use the database, you will often find it convenient to define your own tables to store personal data or data extracted from other tables. These tables may be temporary, lasting for only a single interactive SQL session, or more permanent, lasting weeks or months. In a personal computer database, the table structure is even more fluid. Because you are both the user and the database administrator, you can create and destroy tables to suit your own needs, without worrying about other users.

1. Creating a Table (create table)

The CREATE TABLE statement, shown in Figure 13-1, defines a new table in the database and prepares it to accept data. The various clauses of the statement specify the elements of the table definition. The syntax diagram for the statement appears complex because there are so many parts of the definition to be specified and so many options for each element. In addition, some of the options are available in some DBMS brands or in the SQL2 standard, but not in other brands. In practice, creating a new table is relatively straightforward.

When you execute a CREATE TABLE statement, you become the owner of the newly created table, which is given the name specified in the statement. The table name must be a legal SQL name, and it must not conflict with the name of one of your existing tables. The newly created table is empty, but the DBMS prepares it to accept data added with the INSERT statement.

1.1. Column Definitions

The columns of the newly created table are defined in the body of the CREATE TABLE statement. The column definitions appear in a comma-separated list enclosed in parentheses. The order of the column definitions determines the left-to-right order of the columns in the table. In the CREATE TABLE statements supported by the major DBMS brands, each column definition specifies the following:

Column name. Used to refer to the column in SQL statements. Every column in the table must have a unique name, but the names may duplicate those of columns in other tables.
Data type. Identifies the kind of data that the column stores. Data types were discussed in Chapter 5. Some data types, such as VARCHAR and DECIMAL, require additional information, such as the length or number of decimal places in the data. This additional information is enclosed in parentheses following the keyword that specifies the data type.
Required data. Determines whether the column contains required data, and prevents NULL values from appearing in the column; otherwise, NULL values are allowed.
Default value. Uses an optional default value for the column when an INSERT statement for the table does not specify a value for the column.

The SQL2 standard allows several additional parts of a column definition, which can be used to require that the column contains unique values, to specify that the column is a primary key or a foreign key or to restrict the data values that the column may contain.

These are single-column versions of capabilities provided by other clauses in the CREATE TABLE statement and are described as part of that statement in the following sections.

Here are some simple CREATE TABLE statements for the tables in the sample database:

Define the OFFICES table and its columns

CREATE TABLE OFFICES
(OFFICE INTEGER NOT NULL,
CITY VARCHAR(15) NOT NULL,
REGION VARCHAR(10) NOT NULL,
MGR INTEGER,
TARGET MONEY,
SALES MONEY NOT NULL)

Define the ORDERS table and its columns.

The CREATE TABLE statement for a given table can vary slightly from one DBMS brand to another, because each DBMS supports its own set of data types and uses its own keywords to identify them in the column definitions. In addition, the SQL2 standard allows you to specify a domain instead of a data type within a column definition. (Domains were described in Chapter 11.) A domain is a specific collection of valid data values, which is defined within the database and assigned a name. The domain definition is based on one of the DBMS’ supported data types but performs additional data-value checking that restricts the legal values. For example, if this domain definition appeared in a SQL2-compliant database:

CREATE DOMAIN VALID_OFICE_ID INTEGER

CHECK (VALUE BETWEEN 11 AND 99)

then the OFFICES table definition could be modified to:

Define the OFFICES table and its columns.

CREATE TABLE OFFICES

(OFFICE VALID_OFFICE_ID NOT NULL,

CITY VARCHAR(15) NOT NULL,

REGION VARCHAR(10) NOT NULL,

MGR INTEGER,

TARGET MONEY,

SALES MONEY NOT NULL)

and the DBMS would automatically check any newly inserted rows to insure that their office numbers fall in the designated range. Domains are particularly effective when the same legal data values restrictions apply to many different columns within the database. In the sample database, office numbers appear in the OFFICES and the SALESREPS table, and the VALID_OFFICE_ID domain would be used to define the columns in both of these tables. In a real-world database, there may be dozens or hundreds of such columns whose data is drawn from the same domain.

1.2. Missing and Default Values

The definition of each column within a table tells the DBMS whether the data for the column is allowed to be missing—that is, whether the column is allowed to have a NULL value. In most of the major DBMS brands and in the SQL standard, the default is to allow missing data for a column. If the column must contain a legal data value for every row of a table, then its definition must include the NOT NULL clause. The Sybase DBMS products and Microsoft SQL Server use the opposite convention, assuming that NULL values are not allowed unless the column is explicitly declared NULL or the default nullability mode defined for the database is set to allow NULLs by default.

The SQL2 standard and many of the major SQL DBMS products support default values for columns. If a column has a default value, it is specified within the column definition. For example, here is a CREATE TABLE statement for the OFFICES table that specifies default values:

Define the OFFICES table with default values (ANSI/ISO syntax).

CREATE TABLE OFFICES

(OFFICE INTEGER NOT NULL,

CITY VARCHAR(15) NOT NULL,

REGION VARCHAR(10) NOT NULL DEFAULT ‘Eastern’,

MGR INTEGER DEFAULT 106,

TARGET MONEY DEFAULT NULL,

SALES MONEY NOT NULL DEFAULT 0.00)

With this table definition, only the office number and the city need to be specified when you insert a new office. The region defaults to Eastern, the office manager to Sam Clark (employee number 106), the sales to zero, and the target to NULL. Note that the target would default to NULL even without the DEFAULT NULL specification.

1.3. Primary and Foreign Key Definitions

In addition to defining the columns of a table, the CREATE TABLE statement identifies the table’s primary key and the table’s relationships to other tables in the database. The PRIMARY KEY and FOREIGN KEY clauses handle these functions. These clauses have been supported by the IBM SQL databases for some time and have been added to the ANSI/ISO specification. Most major SQL products support them.

The PRIMARY KEY clause specifies the column or columns that form the primary key for the table. Recall from Chapter 4 that this column (or column combination) serves as a unique identifier for each row of the table. The DBMS automatically requires that the primary key value be unique in every row of the table. In addition, the column definition for every column in the primary key must specify that the column is NOT NULL.

The FOREIGN KEY clause specifies a foreign key in the table and the relationship that it creates to another (parent) table in the database. The clause specifies:

The column or columns that form the foreign key, all of which are columns of the table being created.
The table that is referenced by the foreign key. This is the parent table in the relationship; the table being defined is the child.
An optional name for the relationship. The name is not used in any SQL data manipulation statements, but it may appear in error messages and is required if you want to be able to drop the foreign key later.
How the DBMS should treat a NULL value in one or more columns of the foreign key, when matching it against rows of the parent table.
An optional delete rule for the relationship (CASCADE, SET NULL, SET DEFAULT, or NO ACTION as described in Chapter 11), which determines the action to take when a parent row is deleted.
An optional update rule for the relationship as described in Chapter 11, which determines the action to take when part of the primary key in a parent row is updated.

An optional check constraint, which restricts the data in the table so that its rows meet a specified search condition.

Here is an expanded CREATE TABLE statement for the ORDERS table, which includes a definition of its primary key and the three foreign keys that it contains:

Define the ORDERS table with its primary and foreign keys.

CREATE TABLE ORDERS
(ORDER_NUM INTEGER NOT NULL,
ORDER_DATE DATE NOT NULL,
CUST INTEGER NOT NULL,
REP INTEGER,
MFR CHAR(3) NOT NULL,
PRODUCT CHAR(5) NOT NULL,
QTY INTEGER NOT NULL,
AMOUNT MONEY NOT NULL,
PRIMARY KEY (ORDER_NUM),
CONSTRAINT PLACEDBY
FOREIGN KEY (CUST)
REFERENCES CUSTOMERS
ON DELETE CASCADE,
CONSTRAINT TAKENBY
FOREIGN KEY (REP)
REFERENCES SALESREPS
ON DELETE SET NULL,
CONSTRAINT ISFOR
FOREIGN KEY (MFR, PRODUCT)
REFERENCES PRODUCTS
ON DELETE RESTRICT)

Figure 13-2 shows the three relationships created by this statement and the names it assigns to them. In general, it’s a good idea to assign a relationship name, because it helps to clarify the relationship created by the foreign key. For example, each order was placed by the customer whose number appears in the CUST column of the ORDERS table. The relationship created by this column has been given the name PLACEDBY.

When the DBMS processes the CREATE TABLE statement, it checks each foreign key definition against the definition of the table that it references. The DBMS makes sure that the foreign key and the primary key of the referenced table agree in the number of columns they contain and their data types. The referenced table must already be defined in the database for this checking to succeed.

Note that the FOREIGN KEY clause also specifies the delete and update rules that are to be enforced for the parent/child table relationship that it creates. Delete and update rules, and the actions that can trigger them, are described in Chapter 11. The DBMS enforces the default rules (NO ACTION) if no rule is explicitly specified.

If you want to create two or more tables from a referential cycle (like the OFFICES and SALESREPS tables in the sample database), you cannot include the foreign key definition in the first CREATE TABLE statement because the referenced table does not yet exist. The DBMS will reject the attempted CREATE TABLE statement with an error saying that the table definition refers to an undefined table. Instead, you must create the first table without its foreign key definition and add the foreign key later using the ALTER TABLE statement. (The SQL2 standard and several of the major DBMS products offer a different solution to this problem with the CREATE SCHEMA statement, which creates an entire set of tables at once. This statement and the other database objects that are included within a SQL2 schema are described later in the “SQL2 Schemas” section.)

1.4. Uniqueness Constraints

The SQL2 standard specifies that uniqueness constraints are also defined in the CREATE TABLE statement, using the UNIQUE clause shown in Figure 13-1. Here is a CREATE TABLE statement for the OFFICES table, modified to require unique CITY values:

Define the OFFICES table with a uniqueness constraint.

CREATE TABLE OFFICES
(OFFICE INTEGER NOT NULL,
CITY VARCHAR(15) NOT NULL,
REGION VARCHAR(10) NOT NULL,
MGR INTEGER,
TARGET MONEY,
SALES MONEY NOT NULL,
PRIMARY KEY (OFFICE),
CONSTRAINT HASMGR
FOREIGN KEY (MGR)
REFERENCES SALESREPS
ON DELETE SET NULL,
UNIQUE (CITY))

If a primary key, foreign key, uniqueness constraint, or check constraint involves a single column, the ANSI/ISO standard permits a shorthand form of the definition. The primary key, foreign key, uniqueness constraint, or check constraint is simply added to the end of the column definition, as shown in this example:

Define the OFFICES table with a uniqueness constraint (ANSI/ISO syntax).

CREATE TABLE OFFICES
(OFFICE INTEGER NOT NULL PRIMARY KEY,
CITY VARCHAR(15) NOT NULL UNIQUE,
REGION VARCHAR(10) NOT NULL,
MGR INTEGER REFERENCES SALESREPS,
TARGET MONEY,
SALES MONEY NOT NULL)

Several of the major DBMS brands, including SQL Server, Informix, Sybase, and DB2, support this shorthand.

1.5. Check Constraints

Another SQL2 data integrity feature, the check constraint (described in Chapter 11) is also specified in the CREATE TABLE statement. A check constraint specifies a check condition (identical in form to a search condition in a SQL query) that is checked every time an attempt is made to modify the contents of the table (with an INSERT, UPDATE, or DELETE statement). If the check condition remains TRUE after the modification, it is allowed; otherwise, the DBMS disallows the attempt to modify the data and returns an error. The following is a CREATE TABLE statement for the OFFICES table, with a very simple check condition to make sure the TARGET for the office is greater than $0.00.

Define the OFFICES table with a uniqueness constraint.

You can optionally specify a name for the check constraint, which will be used by the DBMS when it reports an error if the constraint is violated. Here is a slightly more complex check constraint for the SALESREPS table to enforce the rule “Salespeople whose hire date is later than January 1, 1988, shall not be assigned quotas higher than $300,000.” The CREATE TABLE statement names this constraint QUOTA_CAP:

CREATE TABLE SALESREPS
(EMPL_NUM INTEGER NOT NULL,
NAME VARCHAR (15) NOT NULL,

.
.
.

ONSTRAINT WORKSIN
FOREIGN KEY (REP_OFFICE)
REFERENCES OFFICES
ON DELETE SET NULL
CONSTRAINT QUOTA_CAP CHECK ((HIRE_DATE < “01-JAN-88”) OR
(QUOTA <= 300000)))

This check constraint capability is supported by many of the major DBMS brands.

1.6. Physical Storage Definition *

The CREATE TABLE statement typically includes one or more optional clauses that specify physical storage characteristics for a table. Generally, these clauses are used only by the database administrator to optimize the performance of a production database. By their nature, these clauses are very specific to a particular DBMS. Although they are of little practical interest to most SQL users, the different physical storage structures provided by various DBMS products illustrate their different intended applications and levels of sophistication.

Most of the personal computer databases provide very simple physical storage mechanisms. Many personal computer database products store an entire database within a single Windows file, or use a separate Windows file for each database table. They may also require that the entire table or database be stored on a single physical disk volume.

Multiuser databases typically provide more sophisticated physical storage schemes to support improved database performance. For example, Ingres allows the database administrator to define multiple named locations, which are physical directories where database data can be stored. The locations can be spread across multiple disk volumes to take advantage of parallel disk input/output operations. You can optionally specify one or more locations for a table in the Ingres CREATE TABLE statement:

CREATE TABLE OFFICES (table-definition)

WITH LOCATION = (AREA1, AREA2, AREA3)

By specifying multiple locations, you can spread a table’s contents across several disk volumes for greater parallel access to the table.

Sybase offers a similar approach, allowing the database administrator to specify multiple named logical database devices that are used to store data. The correspondence between Sybase’s logical devices and the actual physical disk drives of the computer system is handled by a Sybase utility program, and not within the SQL language. The Sybase CREATE DATABASE statement can then specify that a database should be stored on one or more database devices:

CREATE DATABASE OPDATA

ON DBFILE1, DBFILE2, DBFILE3

Within a given database device, Sybase then allows the database administrator to define logical segments, using one of the Sybase system-provided stored procedures. Finally, a Sybase CREATE TABLE statement can specify the segment where a table’s data is to be stored:

CREATE TABLE OFFICES (table-definition)

ON SEGMENT SEG1A

DB2 offers a similarly comprehensive scheme for managing physical storage, based on the concepts of tablespaces and nodegroups. A tablespace is a logical-level storage container, whereas nodegroups are defined more specifically in terms of physical storage. When you create a DB2 table, you can optionally assign it to a specific tablespace:

CREATE TABLE OFFICES (table-definition)

IN ADMINDB.OPSPACE

Unlike Sybase, DB2 puts most of the management of these storage entities within the SQL language itself, through the CREATE TABLESPACE and CREATE NODEGROUP statements. A consequence is that these statements include operating system-dependent specifications of filenames and directories, which vary from one supported DB2 operating system to another. Other clauses specify the DB2 buffer pool to be used, the overhead and transfer rate of the storage medium, and other characteristics closely related to the physical storage medium. DB2 uses this information in its performance optimization algorithms.

2. Removing a Table (drop table)

Over time, the structure of a database grows and changes. New tables are created to represent new entities, and some old tables are no longer needed. You can remove an unneeded table from the database with the DROP TABLE statement, shown in Figure 13-3.

The table name in the statement identifies the table to be dropped. Normally, you will be dropping one of your own tables and will use an unqualified table name. With proper permission, you can also drop a table owned by another user by specifying a qualified table name. Here are some examples of the DROP TABLE statement:

The CUSTOMERS table has been replaced by two new tables, CUST_INFO and ACCOUNT_INFO, and is no longer needed.

DROP TABLE CUSTOMERS

Sam gives you permission to drop his table, named BIRTHDAYS.

DROP TABLE SAM.BIRTHDAYS

When the DROP TABLE statement removes a table from the database, its definition and all of its contents are lost. There is no way to recover the data, and you would have to use a new CREATE TABLE statement to recreate the table definition. Because of its serious consequences, you should use the DROP TABLE statement with care.

The SQL2 standard requires that a DROP TABLE statement include either CASCADE or RESTRICT, which specifies the impact of dropping a table on other database objects (such as views, described in Chapter 14) that depend on the table. If CASCADE is specified, the DROP TABLE statement fails if other database objects reference the table. Most commercial DBMS products accept the DROP TABLE statement with no option specified.

3. Changing a Table Definition (alter table)

After a table has been in use for some time, users often discover that they want to store additional information about the entities represented in the table. In the sample database, for example, you might want to:

Add the name and phone number of a key contact person to each row of the CUSTOMERS table, as you begin to use it for contacting customers
Add a minimum inventory-level column to the PRODUCTS table, so the database can automatically alert you when stock of a particular product is low
Make the REGION column in the OFFICES table a foreign key for a newly created REGIONS table, whose primary key is the region name
Drop the foreign key definition linking the CUST column in the ORDERS table to the CUSTOMERS table, replacing it with two foreign key definitions linking the CUST column to the newly created CUST_INFO and ACCOUNT_INFO tables

Each of these changes, and some others, can be handled with the ALTER TABLE statement, shown in Figure 13-4. As with the DROP TABLE statement, you will normally use the ALTER TABLE statement on one of your own tables. With proper permission, however, you can specify a qualified table name and alter the definition of another user’s table. As shown in the figure, the ALTER TABLE statement can:

Add a column definition to a table
Drop a column from a table
Change the default value for a column
Add or drop a primary key for a table
Add or drop a new foreign key for a table
Add or drop a uniqueness constraint for a table
Add or drop a check constraint for a table

The clauses in Figure 13-4 are specified in the SQL standard. Many DBMS brands lack support for some of these clauses or offer clauses unique to the DBMS, which alters other table characteristics. The SQL2 standard restricts each ALTER TABLE statement to a single table change. To add a column and define a new foreign key, for example, requires two separate ALTER TABLE statements. Several DBMS brands relax this restriction and allow multiple action clauses in a single ALTER TABLE statement.

3.1. Adding a Column

The most common use of the ALTER TABLE statement is to add a column to an existing table. The column definition clause in the ALTER TABLE statement is just like the one in the CREATE TABLE statement, and it works the same way. The new column is added to the end of the column definitions for the table, and it appears as the rightmost column in subsequent queries. The DBMS normally assumes a NULL value for a newly added column in all existing rows of the table. If the column is declared to be NOT NULL with a default value, the DBMS instead assumes the default value.

Note that you cannot simply declare the new column NOT NULL, because the DBMS would assume NULL values for the column in the existing rows, immediately violating the constraint! (When you add a new column, the DBMS doesn’t actually go through all of the existing rows of the table adding a NULL or default value. Instead, it detects the fact that an existing row is too short for the new table definition when the row is retrieved, and extends it with a NULL or default value before displaying it or passing it to your program.)

Some sample ALTER TABLE statements that add new columns are:

Add a contact name and phone number to the CUSTOMERS table.

ALTER TABLE CUSTOMERS

ADD CONTACT_NAME VARCHAR(30)

ALTER TABLE CUSTOMERS

ADD CONTACT_PHONE CHAR(10)

Add a minimum inventory-level column to the PRODUCTS table.

ALTER TABLE PRODUCTS

ADD MIN_QTY INTEGER NOT NULL WITH DEFAULT 0

In the first example, the new columns will have NULL values for existing customers. In the second example, the MIN_QTY column will have the value zero (0) for existing products, which is appropriate.

When the ALTER TABLE statement first appeared in SQL implementations, the only major structures within a table were the column definitions, and it was very clear what the ADD clause meant. Since then, tables have grown to include primary and foreign key definitions and constraints, and the ADD clauses for these types of objects specify which type of object is being added. For consistency with these other ADD/DROP clauses, the SQL2 standard includes the optional keyword COLUMN after the keyword ADD. With this addition, the preceding example becomes:

Add a minimum inventory-level column to the PRODUCTS table.

ALTER TABLE PRODUCTS

ADD COLUMN MIN_QTY INTEGER NOT NULL WITH DEFAULT 0

3.2. Dropping a Column

The ALTER TABLE statement can be used to drop one or more columns from an existing table when they are no longer needed. Here is an example that drops the HIRE_DATE column from the SALESREPS table:

Drop a column from the SALESREPS table.

ALTER TABLE SALESREPS

DROP HIRE_DATE

The SQL2 standard forces you to issue a separate ALTER TABLE statement if you want to drop several columns, but several of the major DBMS brands allow you to drop multiple columns with a single statement.

Note that dropping a column can pose the same kinds of data-integrity issues that were described in Chapter 11 for database update operations. For example, if you drop a column that is a primary key in some relationship, the foreign key columns that refer to the dropped column become invalid. A similar problem can arise if you drop a column that is referenced in a check constraint—the column that provides the data value for
checking the constraint is now gone. A similar problem is created in views that are defined based on the dropped column.

The SQL2 standard deals with these issues the same way it handled the potential data-integrity problems posed by DELETE and UPDATE statements—with a drop rule (actually called a drop behavior in the standard) that operates just like the delete rules and update rules. You can specify one of two drop rules:

RESTRICT. If any other objects in the database (foreign keys, constraints, and so on) depend on the column to be dropped, the ALTER TABLE statement fails with an error and the column is not dropped.
CASCADE. Any other objects in the database (foreign keys, constraints, and so on) that depend on the column are also dropped as a cascaded effect of the ALTER TABLE statement.

The CASCADE effect can cause quite dramatic changes in the database; therefore, use it with care. It’s usually a better idea to use the RESTRICT mode (explicitly drop the dependent foreign keys and constraints, using the appropriate ALTER or DROP statements) before dropping the column.

3.3. Changing Primary and Foreign Keys

The other common use for the ALTER TABLE statement is to change or add primary key and foreign key definitions for a table. Since primary key and foreign key support is being provided in new releases of several SQL-based database systems, this form of the ALTER TABLE statement is particularly useful. It can be used to inform the DBMS about intertable relationships that already exist in a database, but which have not been explicitly specified before.

Unlike column definitions, primary key and foreign key definitions can be added and dropped from a table with the ALTER TABLE statement. The clauses that add primary key and foreign key definitions are exactly the same as those in the CREATE TABLE statement, and they work the same way. The clauses that drop a primary key or foreign key are straightforward, as shown in the following examples. Note that you can drop a foreign key only if the relationship that it creates was originally assigned a name. If the relationship was unnamed, there is no way to specify it in the ALTER TABLE statement. In this case, you cannot drop the foreign key unless you drop and re-create the table, using the procedure described for dropping a column.

Here is an example that adds a foreign key definition to an existing table:

Make the REGION column in the OFFICES table a foreign key for the newly created REGIONS table, whose primary key is the region name.

ALTER TABLE OFFICES

ADD CONSTRAINT INREGION

FOREIGN KEY (REGION)

REFERENCES REGIONS

Here is an example of an ALTER TABLE statement that modifies a primary key. Note that the foreign key corresponding to the original primary key must be dropped because it is no longer a foreign key for the altered table:

Change the primary key of the OFFICES table.

ALTER TABLE SALESREPS

DROP CONSTRAINT WORKSIN

FOREIGN KEY (REP_OFFICE)

REFERENCES OFFICES