File Organization Techniques

A file organization is a way of arranging the records in a file when the file is stored on secondary storage (disk, tape etc). There are different types of file organizations that are used by applications. The operations to be performed and the selection of storage device are the major factors that influence the choice of a particular file organization. The different types of file  organizations  are  as follows:

  1. Heap file organization
  2. Sequential file organization
  3. Indexed—Sequential file organization
  4. Hashing or Direct file

1. Heap File Organization

In this file organization, the records are stored in the file, in the order in which they are inserted. All the new records are stored at the end of the file. This file organization is also called PILE FILE. This organization is generally used with additional access paths, like secondary indexes. Inserting a new record is very fast and efficient. But searching a record using any search condition involves a linear search through the file, which is comparatively more time consuming. Deleting a record from a heap file is not efficient as deletion of records resulted in wastage of storage space. It is generally used to store small files or in cases where data is difficult to organize. It is also used when data is collected at one place prior to  processing.

Advantages of Heap File Organization : The major advantages of Heap file organization are as  follows:

  1. Insertion of new record is fast and
  2. The filling factor of this file organization is 100%.
  3. Space is fully utilized and

Disadvantage of Heap File Organization: The major disadvantages of Heap file organization are as  follows:

  1. Searching and accessing of records is very
  2. Deletion of many records result in wastage of
  3. Updation cost of data is comparatively
  4. It has limited

2. Sequential File Organization

In sequential file organization, records are stored in a sequential order according to the “search key”. A Search key is an attribute or a set of attributes which are used to serialize the records. It is not necessary that search key must be primary key.

It is the simplest method of file organization. Sequential method is based on tape model. Devices who support sequential access are magnetic tapes, cassettes, card readers etc. Editors and compilers  also  use  this  approach  to access  files.

Structure of a sequential file is shown in Figure 3.3. The records are stored in sequential order one after another. To reach at the consequitive record from any record pointers are used. The  Pointers are  used  for fast  retrieval  of records.

To read any record from the file start searching from the very first record with the help of search key. Sequential file organization gives records in sorted form. This organization is used in  small  size  files.

File Operations  on  Sequentially  Organized  Files

Various operations that can be performed on sequential files are as follows:

  • Creating a sequential flle : To create a new file, first free space is searched in memory. After searching, enough space for file, allocate that space to the new file. Name and physical location of new file is entered in file system directory.

  • Open an existing flle : To open any file, enter the name of This name is searched in file system directory. After matching the name, records in the file are transferred from secondary memory to primary memory. Now, the file is ready to read/write.
  • Closing a flle : When task over file is completed then close that The memory space allocated for that file in primary memory is deallocated. If file was opened automatically with any programme then it will be closed automatically after ending that programme.
  • Reading a sequential flle : To read any record from a sequential file it is necessary to start from beginning of file until record is find or EOF is You cannot go directly to any particular record. Only linear search can be used to search any record.

Note: In sequential file system, update, delete, insert and append cannot be performed on any record directly. There is  a  procedure  to perform  all  these  operations.

 Procedure to Update File:

Step 1. Original file is known as Old Master file. Make a new file which is known as New  Master  file  as shown  in  Figure  3.6.

Step 2. To update nth record, First copy all the previous n–1 records to new master file.

Step 3. Make necessary updations to nth record.

Step 4. Now copy all remaining records including updated nth record to New master file and  old  master  file  is deleted.

The major problem is that it is very costly to make New Master file for every single updation. So  the modified  updation  procedure is  as follows  :

Modification in Procedure : Use a temporary file known as transaction file. The following procedure is followed to make a transaction file and then updating the records.

    1. Collect all the requests for
    2. Copy only those records in Transaction file from old Master file on which updation is to be
    3. Update the
    4. Now start copying records from Old Master file and Transaction file to New Master file accordingly to the primary
    5. At last old master file and transaction file are
  • Adding or Insertion a new record : To add or insert a new records in an existing sequential file, the  transaction  file  is used.

Ex. Consider, the old master file emp-records as shown in Figure 3.7.

The records that need to be added are collected in Transaction file as shown in Figure 3.8.

The EMP-ID is primary key. Now, add records in New Master file as shown in Figure 3.9. If primary key of record in Old Master file is less than record in Transaction file then first copy record of Old Master File into New Master file and vice versa.

This process is repeated until EOF is reached in both files.

The Old Master file and transaction file are deleted after adding the records.

  • Deletion of records : Suppose you want to delete some records from Emp-records (as shown in Figure 3.9) which are no more required. Then emp-records shown in Figure 3.9 is taken as old master file. Records that need to be deleted are stored in Transaction file.

Primary Key of Old Master file is matched with primary key of each record in Transaction file. If primary key is not matched then record is copied from Old Master file to New Master file otherwise discard that record. The process is repeated until EOF of the Old Master file is reached. After this process, Old Master file and transaction file  are  deleted.

  • Modiflcation of records : Suppose you want to modify any record of employee. Consider Emp-records in Figure 3.11 as Old Master file. Records that need to be modified are stored in Transaction file with changed values.

Primary key of Old Master file is matched with primary key of each record in Transaction file. If primary key is matched then modified record from Transaction file is copied into New Master file otherwise record from Old Master file is copied into New Master file. The process is repeated until EOF in Old Master file is reached. After this process, Old Master file and  transaction  file  are deleted.

Advantages : The advantages of sequential  files are as follows:

  1. It is easy to understand.
  2. Efficient file system for small size files.
  3. Construction and reconstruction of files are much easier in comparison to other file systems.
  4. Supports tape media, editors and compilers.
  5. It contains sorted records.

Disadvantages : The disadvantages of sequential files are as follows:

  1. Inefficient file system for medium and large size files.
  2. Updations and maintenance are not easy.
  3. Inefficient use of storage space because of fixed size blocks.
  4. Linear search takes more time.
  5. Before updations all transactions are stored sequentially.

3. Index Sequential File Organization

Index sequential file organization is used to overcome the disadvantages of sequential file organization. It also preserves the advantages of sequential access. This organization enables fast searching of records with the use of index. Some basic terms associated with index sequential file  organization  are  as follows:

Block : Block is a unit  of storage in which records are  saved.

Index : Index is a table with  a search key by which block of a record can be find.

Pointer : Pointer is a variable which points from index entry to starting address of block.

To manipulate any record, search key of index is entered to find the starting address of block and then required  record is searched sequentially  within the block.

3.1. Components of an Indexed Sequential File

  1. Prime area : During the creation of index sequential file, records are written in prime Records are maintained according to any key. Hence, prime area must be a sequential file.
  2. Overflow area : Overflow area is used during addition of new There are two types of  overflow  area:
    1. Cylinder overflow area : This area consists of free area on each cylinder which is reserved for overflow records for that particular
    2. Independent overflow area : This area is used to store overflow records from
  3. Record characteristics : Usually fixed length records are
  4. Indexes : Index is a collection of entries and each entry corresponds to block of
    1. First level index  : The  lowest  level  index is  known  as first  level
    2. Higher level index : Higher level indexing is used when first level index becomes too
    3. Cylinder index : The indexes can be made according to the hardware boundaries. Cylinder index entries consists one  entry for  each
    4. Master index : The highest level of index is known as master

3.2.  Operations on Index Sequential Files

  1. Creating a index sequential flle : After allocating free space and make necessary entries in file system directory all records are written into prime area in sequential manner according to  key
  2. Opening and closing an existing flle : It is same as sequential file
  3. Reading records from a index sequential flle (or searching any record) : To search any record enter the key value of record then first search the block of record in index then search record  sequentially within  the
  4. Modiflcation of records : To modify a record, first search that record and then modify Updated record is stored at same position if it has same key value and size equal to the original record. Otherwise original record is deleted and new record is inserted.
  5. Deletion of records : To delete a record, first search that Specific codes are used to indicate deleted records. Records consists of flags. Specific code is inserted to the flag  of that  record  that  needs to  be  deleted.
  6. Insertion of records : To insert a record, first find the desired block according to key value. Then confirm that record does not exist already. The record is first placed in overflow area  and  then copied  to  the desired  block.

Advantages : The advantages  of Index sequential files  are as follows:

  1. Efficient file system for medium and large size
  2. Easy to
  3. Easy to maintain than direct
  4. Efficient use of storage
  5. Searching of records are
  6. Maintain advantages of sequential file

Disadvantages : The disadvantages  of Index sequential files  are as follows:

  1. Inefficient file system for small size
  2. It is expensive
  3. Typical structure than sequential
  4. Indexes need additional storage
  5. Performance degradation r.t. growth of files.

4. Hashing

Hashing is a technique by which key field is converted into address of physical location or record by  using  any  function,  known as  hash  function.

There are various hashing techniques. These are as follows:

  • Mid square method : In mid square hash method, first compute the square of key value and then take the central digits, which is the address of that Suppose you have 100 records in one file and the key value is 81. First compute square of 81 which is (81)2 = 6561. After that take central digits of 6561. So, address of record having key value 81 is 56.
  • Folding method : In this method of hashing, first make partitions of key value and then fold
    • Boundary folding : In this technique various parts of key value are aligned by folding them as folding a At last digits are added to get address as shown in Figure  3.15.

    • Shift folding : In this technique various parts of key value are shifted laterally or you can say that  they are  simply  added as  shown  in Figure  16.

  • Division method : In division method, divide the key value with any number and take quotient as address of that Mostly prime number is taken as divisor. Consider a record having key value 550 which is divided by 5 gives 110 as quotient which is taken  as address of  that  record.
  • Division-remainder method : In division-remainder method, divide the key value with any number (Prime number) and take remainder as address of that The divisor must be greater than total number of records and small then all key values. Suppose there are 100 records in any file. A record has key value 660 which is divided by 109 gives 6 as remainder which is taken as address of that record.
  • Radix transformation method (Radix conversion method) : In radix conversion method, first key is converted into binary Then this string is partitioned into small strings. Then these strings are converted into decimal format. Suppose, a record has key value 269 whose binary equivalent is 100001101. Take small strings of 3 bits each. These are 100, 001, 101. Now convert them into decimal format which gives 4, 1, 5. These digits are treated as numbers with radix r. Suppose r is 5. Then address is

4 × 52 + 1 × 51 + 5 × 50 = 110

  • Polynomial conversion method : In polynomial conversion method every digit of key value is taken as coefficient of After obtaining that polynomial, divide it by another polynomial. The remainder polynomial gives the basis of address of that record.
  • Truncation method : In truncation method, some digits of key value are They may be left most digits, right most digits or central digits. The remaining digits are taken as address of record. Suppose a record is having key value 543189. Then truncate first two left most digits, which gives 3189 as address for that record.
  • Conversion using digital gates : In this method, convert key value into binary format then apply different operations on these digits and at last convert the result into decimal format  as  shown  in  Figure  3.17.

4.1. Collision and Synonyms

The main disadvantage of hashing is collision.

Collision : A collision occurs when a hash function f mapped more than one key value into same  physical  address.

Synonym : The keys which are mapped into same location are known as Synonyms. Consider the folding method in which key 248610 gives address 129 and key 925005 also gives address  129.  This is  a  collision and  key  248610 and  925005  are synonyms.

4.2. Techniques to Avoid Collisions

There are two main collision resolution technique :

  1. Open Addressing
  2. Overflow Chaining.

Both techniques are based on the fact that in case of collision, the synonym key record is write on another location.

(i) Open Addressing : In open addressing, the next free or empty position is searched to store synonym key If there is no empty position within a block then empty position in next block is searched. There are various probing methods in open addressing but the simplest one is  linear  probing.

  • Linear probing : Linear probing is the simplest technique which is based on cyclic proble sequence. Suppose, there are total n locations in a block. Suppose a key k is mapped into location  d which  is  not  empty then  the  searching sequence  becomes

d, d + 1, d + 2, d + 3, …, n, 1, 2, 3, …, d – 1

If there is no free location in that block then, start searching in next consecutive block. It is also known as Consecutive Spill Method.

Consider the example of relation Employee (Dept-ID, Name) where Dept-ID is key field for  mapping.

  • For employee Anand, Dept-ID is mapped into
  • For employee Rakesh, Dept-ID is mapped into
  • For employee Piyush, Dept-ID is mapped into
  • For employee Deepak, Dept-ID is mapped into A collision occurs because location 2 is not empty. So, searching is started from location 3. Location 4 is empty which is given to Deepak.
  • For employee Manoj, again Dept-ID is mapped into 2. A collision occurs because location 2 is not So, searching is started for an empty location. Here Block-I is full so searching is started in consecutive block, Block-II and location 5 is given to Manoj.

Disadvantages : The disadvantages of linear  probing are as follows:

  1. Creation of cluster of records (clustering effect).
  2. Searching time for free location is

b. Rehashing : To avoid clustering effect in linear probing more than one hashing functions can be If first hash function results collision then another hash function is used to resolve collision.  But this method  results extra overhead.

  • Overflow Chaining : In this method, there are two areas for storing records, primary area in which record is stored in normal conditions, if any collision occurs then synonym key record is stored in overflow  area.

Consider the example of relation employee shown in Figure 3.18. Now the employee Deepak and  Manoj  are stored  as  shown  in Figure  3.19.

Advantages : The advantages  of overflow chaining are  as follows:

  1. Less searching time
  2. No clustering
  3. More efficient than open addressing

5. Direct File Organization

To meet the requirement to access records randomly direct file organization is used. In direct file organization records can be stored anywhere in storage area but can be accessed directly, without any sequential searching. It overcomes the drawbacks of sequential, index sequential and B-trees  file  organization.

For an efficient organization and direct access of individual record, some mapping or transformation procedure is needed that converts key field of a record into its physical storage location.

Actually, direct file organization depends upon hashing that provides the base of mapping procedure. To overcome the drawbacks of hashing algorithm, collision resolution technique is needed.  Devices  that  support direct  access  are  CD’s, Floppy  etc.

Direct file organization is also known as Random File Organization. Choice of hashing algorithm and collision resolution technique is crucial point in direct file organization.

File Operations  on  Direct  Files

  • Creating a direct flle : After searching and allocating free space for file, necessary entries in system directory are In direct file organization, a hashing algorithm for mapping procedure and any collision resolution technique to avoid collisions during mapping are specified. The key field is also specified (It may not be primary key).
  • Open an existing file and closing a file are same as in other file
  • Searching (Reading or retrieving) from direct flle : To read any record from direct file, just enter the key field of that With the help of hashing algorithm that key field is mapped into physical location of that record. In case of any collision, collision resolution technique  is  used.
  • Updation of records in direct flle:
    1. Adding a new record : To add a new record in direct file, specify its key With the help of mapping procedure and collision resolution technique, get the free address location for  that  record.
    2. Deleting record from direct flle : To delete a record, first search that record and after searching, change  its  status  code  to  deleted or
    3. Modify any record : To modify any record, first search that record, then make the necessary Then re-write the modified record to the same location.

Advantages : The advantages of direct files are as follows:

  1. Records are not needed to be sorted in order during addition.
  2. It gives fastest retrieval of records.
  3. It gives efficient use of memory.
  4. Operations on direct file are fast so there is no need to collect same type of operations in a file,  as  in  sequential  file system.
  5. Searching time depends upon mapping procedure not logarithm of the number of search keys as  in  B-trees.
  6. Supports fast storage devices.

Disadvantages : The disadvantages of direct files are as follows:

  1. Wastage of storage space (Clustering) if hashing algorithm is not chosen
  2. It does not support sequential storage
  3. Direct file system is complex and hence
  4. Extra overhead due to collision resolution

Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Database Management System, 2nd Edition-University Science Press (2017)

Leave a Reply

Your email address will not be published. Required fields are marked *