File Organization: Indexing

An index is a collection of data entries which is used to locate a record in a file. Index table records consist of two parts, the first part consists of value of prime or non-prime attributes of file record known as indexing field and, the second part consists of a pointer to the location where the record is physically stored in memory. In general, index table is like the index of a book, that consists of the name of topic and the page number. During searching of a file record, index is searched to locate the record memory address instead of searching a record in secondary memory. On the basis of properties that affect the efficiency of searching,  the  indexes  can be  classified  into  two  categories.

  1. Ordered indexing
  2. Hashed indexing.

1. Ordered Indexing

In ordered indexing, records of file are stored in some sorted order in physical memory. The values in the index are ordered (sorted) so that binary search can be performed on the index. Ordered  indexes  can  be  divided into  two  categories.

  1. Dense indexing
  2. Sparse indexing.

1.1. Dense and Sparse Indexing

  • Dense index : In dense indexing there is a record in index table for each unique value of the search-key attribute of file and a pointer to the first data record with that The other records with the same value of search-key attribute are stored sequentially after the first record. The order of data entries in the index differs from the order of data records as shown  in Figure  3.20.

Advantages of  Dense index  The advantages  of  dense index  are:

  1. It is efficient technique for small and  medium sized data
  2. Searching is comparatively fast and

Disadvantages of  Dense index  The disadvantages  of  dense index  are:

  1. Index table is large and require more memory
  2. Insertion and deletion is comparatively
  3. In-efficient for large data

  • Sparse index : On contrary, in sparse indexing there are only some records in index table for unique values of the search-key attribute of file and a pointer to the first data record with that To search a record in sparse index we search for a value that is less than or equal to value in index for which we are looking. After getting the first record, linear search is performed to retrieve the desired record. There is at most one sparse index since it is not possible to build a sparse index that is not clustered.

Advantages of  Sparse index  The advantages  of sparse  index  are:

    1. Index table is small and hence  save memory  space (specially in  large files).
    2. Insertion and deletion is comparatively

Disadvantages of  Sparse index  The disadvantages  of sparse  index  are:

    1. Searching is comparatively slower, since index table is searched and then linear search is performed inside  secondary memory.

1.2. Clustered and Non-Clustered Indexes

  • Clustered index : In clustering, index file records are stored physically in order on a non-prime key attribute that does not have a unique value for each The non- prime key field is known as clustering field and index in known as clustering index. It is same as dense index. A file can have at most one clustered index as it can be clustered on at most one  search  key  attribute. It  may  be  sparse.
  • Non-clustered index : An index that is not clustered is known as non-clustered A data file can have  more  than one  non-clustered  index.

1.3.  Primary and Secondary Index

  • Primary index : A primary index consists of all prime-key attributes of a table and a pointer to physical memory address of the record of data To retrieve a record on the basis of all primary key attributes, primary index is used for fast searching. A binary search is done on index table and then directly retrieve that record from physical memory.  It may  be sparse.

Advantages of  Primary index  The major  advantages  of primary  index  are:

    1. Search operation is very
    2. Index table record is usually
    3. A primary index is guaranteed not to

Disadvantages of  Primary index  The major  disadvantages  of primary  index  are:

    1. There is only one primary index of a To search a record on less than all prime-key attributes, linear  search  is  performed on  index  table.
    2. To create a primary index of an existing table, records should be in some sequential order otherwise database  is required  to be adjusted.
  • Secondary index : A secondary index provides a secondary means of accessing a data A secondary index may be on a candidate key field or on non-prime key attributes of a table. To retrieve a record on the basis of non-prime key attributes, secondary index can be used for fast searching. Secondary index must be dense with a index entry for every search key value and a  pointer to every  record in a  file.

Advantages of  Secondary index  The major  advantages  of secondary  index  are:

    1. Improve search time if search on non-prime key
    2. A data file can have more than one secondary

Disadvantages of Secondary index : The major disadvantages of secondary index are:

    1. A secondary index usually  needs more  storage space.
    2. Search time is more than primary index.
    3. They impose a significant overhead on the modification of database.

1.4. Single and Multilevel Indexes

  • Single level indexes : A single stage index for a data file is known as single level A single level index cannot be divided. It is useful in small and medium size data files. If the file size is bigger, then single level, indexing is not a efficient method. Searching is faster than  other indexes  for  small  size data  files.
  • Multilevel indexes : A single index for a large size data file increases the size of index table and increases the search time that results in slower The idea behind multilevel indexes is that, a single level index is divided into multiple levels, which reduces search time.

In multilevel indexes, the first level index consists of two fields, the first field consists of a value of search key attributes and a second field consists of a pointer to the block (or second level index) which consists that values and so on.

To search a record in multilevel index, binary search is used to find the largest of all the small values or equal to the one that needs to be searched. The pointer points to a block of the inner index. After reaching to the desired block, the desired record is searched (in case of two-level indexing) otherwise again the largest of the small values or  equal  to  the  one  that  needs  to be  searched  and  so  on.

Benefits of multilevel indexes are they reduce search time significantly for large size data files.

2.  Hashed Indexing

To overcome the disadvantages of ordered indexing, a hash index can be created for a data file. Hashing allow us to avoid accessing an index structure. A hashed index consists of two fields, the first field consists of search key attribute values and second field consists of pointer to the hash file structure. Hashed indexing is based on values of records being uniformly distributed using a hashed function.

Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Database Management System, 2nd Edition-University Science Press (2017)

Leave a Reply

Your email address will not be published. Required fields are marked *