Creating SQL Database: Indexes (CREATE/DROP INDEX)

One of the physical storage structures that is provided by most SQL-based database management systems is an index, which is a structure that provides rapid access to the rows of a table based on the values of one or more columns. Figure 13-6 shows the PRODUCTS table and two indexes that have been created for it. One of the indexes provides access based on the DESCRIPTION column. The other provides access based on the primary key of the table, which is a combination of the MFR_ID and PRODUCT_ ID columns.

The DBMS uses the index as you might use the index of a book. The index stores data values and pointers to the rows where those data values occur. In the index the data values are arranged in ascending or descending order, so that the DBMS can quickly search the index to find a particular value. It can then follow the pointer to locate the row containing the value.

The presence or absence of an index is completely transparent to the SQL user who accesses a table. For example, consider this SELECT statement:

Find the quantity and price for size 4 widgets.

SELECT QTY_ON_HAND, PRICE

FROM PRODUCTS

WHERE DESCRIPTION = ‘Size 4 Widget’

The statement doesn’t say whether there is an index on the DESCRIPTION column, and the DBMS will carry out the query in either case.

If there were no index for the DESCRIPTION column, the DBMS would be forced to process the query by sequentially scanning the PRODUCTS table, row by row, examining the DESCRIPTION column in each row. To make sure it had found all of the rows that satisfied the search condition, it would have to examine every row in the table. For a large table with thousands or millions of rows, the scan of the table could take minutes or hours.

With an index for the DESCRIPTION column, the DBMS can locate the requested data with much less effort. It searches the index to find the requested value (“Size 4 widget”) and then follows the pointer to find the requested row(s) of the table. The index search is very rapid because the index is sorted and its rows are very small. Moving from the index to the row(s) is also very rapid because the index tells the DBMS where on the disk the row(s) are located.

As this example shows, the advantage of having an index is that it greatly speeds the execution of SQL statements with search conditions that refer to the indexed column(s). One disadvantage of having an index is that it consumes additional disk space. Another disadvantage is that the index must be updated every time a row is added to the table and every time the indexed column is updated in an existing row. This imposes additional overhead on INSERT and UPDATE statements for the table.

In general, it’s a good idea to create an index for columns that are used frequently in search conditions. Indexing is also more appropriate when queries against a table are more frequent than inserts and updates. Most DBMS products always establish an index for the primary key of a table, because they anticipate that access to the table will most frequently be via the primary key.

Most DBMS products also automatically establish an index for any column (or column combination) defined with a uniqueness constraint. The DBMS must check the value of such a column in any new row to be inserted, or in any update to an existing row, to make certain that the value does not duplicate a value already contained in the table. Without an index on the column(s), the DBMS would have to sequentially search through every row of the table to check the constraint. With an index, the DBMS can simply use the index to find a row (if it exists) with the value in question, which is a much faster operation than a sequential search.

In the sample database, these columns are good candidates for additional indexes:

The COMPANY column in the CUSTOMERS table should be indexed if customer data is often retrieved by company name.
The NAME column in the SALESREPS table should be indexed if data about salespeople is often retrieved by salesperson name.
The REP column in the ORDERS table should be indexed if orders are frequently retrieved based on the salesperson who took them.
The CUST column in the ORDERS table should similarly be indexed if orders are frequently retrieved based on the customer who placed them.
The MFR and PRODUCT columns, together, in the ORDERS table should be indexed if orders are frequently retrieved based on the product ordered.

The SQL2 standard doesn’t talk about indexes or how to create them. It treats database indexes as an implementation detail, which is outside of the core, standardized SQL language. However, the use of indexes is essential to achieve adequate performance in any sizeable enterprise-class database.

In practice, most popular DBMS brands (including Oracle, Microsoft SQL Server, Informix, Sybase, and DB2) support indexes through some form of the CREATE INDEX statement, shown in Figure 13-7. The statement assigns a name to the index and specifies the table for which the index is created. The statement also specifies the column(s) to be indexed and whether they should be indexed in ascending or descending order. The DB2 version of the CREATE INDEX statement, shown in Figure 13-7, is the most straightforward. Its only option is the keyword UNIQUE, which is used to specify that the combination of columns being indexed must contain a unique value for every row of the table.

The following is an example of a CREATE INDEX statement that builds an index for the ORDERS table based on the MFR and PRODUCT columns and that requires combinations of columns to have a unique value.

Create an index for the ORDERS table.

CREATE UNIQUE INDEX ORD_PROD_IDX

ON ORDERS (MFR, PRODUCT)

In most major DBMS products, the CREATE INDEX statement includes additional DBMS-specific clauses that specify the disk location for the index and for performancetuning parameters. Typical performance parameters include the size of the index pages, the percentage of free space that the index should allow for new rows, the type of index to be created, whether it should be clustered (an arrangement that places the physical data rows on the disk medium in the same sequence as the index), and so on. These options make the CREATE INDEX statement quite DBMS-specific in actual use.

Some DBMS products support two or more different types of indexes, which are optimized for different types of database access. For example, a B-tree index uses a tree structure of index entries and index blocks (groups of index entries) to organize the data values that it contains into ascending or descending order. This type of index provides efficient searching for a single value or for a range of values, such as the search required for an inequality comparison operator or a range test (BETWEEN) operation.

A different type of index, a hash index, uses a randomizing technique to place all of the possible data values into a moderate number of buckets within the index. For example, if there are 10,000,000 possible data values, an index with 500 hash buckets might be appropriate. Since a given data value is always placed into the same bucket, the DBMS can search for that value simply by locating the appropriate bucket and searching within it. With 500 buckets, the number of items to be searched is reduced, on average, by a factor of 500. This makes hash indexes very fast when searching for an exact match of a data value. But the assignment of values to buckets does not preserve the order of data values, so a hash index cannot be used for inequality or range searches.

Other types of indexes are appropriate for other specific DBMS situations. For example, a variation of the B-tree index, known as a T-tree index, is optimized for in-memory databases. A bit-map index is useful when there is a relatively small number of possible data values. When a DBMS supports multiple index types, the CREATE INDEX statement not only defines and creates the index, but defines its type as well.

If you create an index for a table and later decide that it is not needed, the DROP INDEX statement removes the index from the database. The statement removes the index created in the previous example:

Drop the index created earlier.

DROP INDEX ORD_PROD_IDX

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply Cancel reply

Login