File Organization: Indexing

An index is a collection of data entries which is used to locate a record in a file. Index table records consist of two parts, the first part consists of value of prime or non-prime attributes of file record known as indexing field and, the second part consists of a pointer to the location where the record is physically stored in memory. In general, index table is like the index of a book, that consists of the name of topic and the page number. During searching of a file record, index is searched to locate the record memory address instead of searching a record in secondary memory. On the basis of properties that affect the efficiency of searching, the indexes can be classified into two categories.

Ordered indexing
Hashed indexing.

1. Ordered Indexing

In ordered indexing, records of file are stored in some sorted order in physical memory. The values in the index are ordered (sorted) so that binary search can be performed on the index. Ordered indexes can be divided into two categories.

Dense indexing
Sparse indexing.

1.1. Dense and Sparse Indexing

Dense index : In dense indexing there is a record in index table for each unique value of the search-key attribute of file and a pointer to the first data record with that The other records with the same value of search-key attribute are stored sequentially after the first record. The order of data entries in the index differs from the order of data records as shown in Figure 3.20.

Advantages of Dense index : The advantages of dense index are:

It is efficient technique for small and medium sized data
Searching is comparatively fast and

Disadvantages of Dense index : The disadvantages of dense index are:

Index table is large and require more memory
Insertion and deletion is comparatively
In-efficient for large data

Sparse index : On contrary, in sparse indexing there are only some records in index table for unique values of the search-key attribute of file and a pointer to the first data record with that To search a record in sparse index we search for a value that is less than or equal to value in index for which we are looking. After getting the first record, linear search is performed to retrieve the desired record. There is at most one sparse index since it is not possible to build a sparse index that is not clustered.

Advantages of Sparse index : The advantages of sparse index are:

1. Index table is small and hence save memory space (specially in large files).
2. Insertion and deletion is comparatively

Disadvantages of Sparse index : The disadvantages of sparse index are:

1. Searching is comparatively slower, since index table is searched and then linear search is performed inside secondary memory.

1.2. Clustered and Non-Clustered Indexes

Clustered index : In clustering, index file records are stored physically in order on a non-prime key attribute that does not have a unique value for each The non- prime key field is known as clustering field and index in known as clustering index. It is same as dense index. A file can have at most one clustered index as it can be clustered on at most one search key attribute. It may be sparse.
Non-clustered index : An index that is not clustered is known as non-clustered A data file can have more than one non-clustered index.

1.3. Primary and Secondary Index

Primary index : A primary index consists of all prime-key attributes of a table and a pointer to physical memory address of the record of data To retrieve a record on the basis of all primary key attributes, primary index is used for fast searching. A binary search is done on index table and then directly retrieve that record from physical memory. It may be sparse.

Advantages of Primary index : The major advantages of primary index are:

1. Search operation is very
2. Index table record is usually
3. A primary index is guaranteed not to

Disadvantages of Primary index : The major disadvantages of primary index are:

1. There is only one primary index of a To search a record on less than all prime-key attributes, linear search is performed on index table.
2. To create a primary index of an existing table, records should be in some sequential order otherwise database is required to be adjusted.
Secondary index : A secondary index provides a secondary means of accessing a data A secondary index may be on a candidate key field or on non-prime key attributes of a table. To retrieve a record on the basis of non-prime key attributes, secondary index can be used for fast searching. Secondary index must be dense with a index entry for every search key value and a pointer to every record in a file.

Advantages of Secondary index : The major advantages of secondary index are:

1. Improve search time if search on non-prime key
2. A data file can have more than one secondary

Disadvantages of Secondary index : The major disadvantages of secondary index are:

1. A secondary index usually needs more storage space.
2. Search time is more than primary index.
3. They impose a significant overhead on the modification of database.

1.4. Single and Multilevel Indexes

Single level indexes : A single stage index for a data file is known as single level A single level index cannot be divided. It is useful in small and medium size data files. If the file size is bigger, then single level, indexing is not a efficient method. Searching is faster than other indexes for small size data files.
Multilevel indexes : A single index for a large size data file increases the size of index table and increases the search time that results in slower The idea behind multilevel indexes is that, a single level index is divided into multiple levels, which reduces search time.

In multilevel indexes, the first level index consists of two fields, the first field consists of a value of search key attributes and a second field consists of a pointer to the block (or second level index) which consists that values and so on.

To search a record in multilevel index, binary search is used to find the largest of all the small values or equal to the one that needs to be searched. The pointer points to a block of the inner index. After reaching to the desired block, the desired record is searched (in case of two-level indexing) otherwise again the largest of the small values or equal to the one that needs to be searched and so on.

Benefits of multilevel indexes are they reduce search time significantly for large size data files.

2. Hashed Indexing

To overcome the disadvantages of ordered indexing, a hash index can be created for a data file. Hashing allow us to avoid accessing an index structure. A hashed index consists of two fields, the first field consists of search key attribute values and second field consists of pointer to the hash file structure. Hashed indexing is based on values of records being uniformly distributed using a hashed function.

Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Database Management System, 2nd Edition-University Science Press (2017)

Basic Database

File Organization: Indexing

1. Ordered Indexing

1.1. Dense and Sparse Indexing

1.2. Clustered and Non-Clustered Indexes

1.3. Primary and Secondary Index

1.4. Single and Multilevel Indexes

2. Hashed Indexing

Leave a Reply Cancel reply

1. Ordered Indexing

1.1. Dense and Sparse Indexing

1.2. Clustered and Non-Clustered Indexes

1.3. Primary and Secondary Index

1.4. Single and Multilevel Indexes

2. Hashed Indexing

Leave a Reply Cancel reply

Login