A file organization is a way of arranging the records in a file when the file is stored on secondary storage (disk, tape etc). There are different types of file organizations that are used by applications. The operations to be performed and the selection of storage device are the major factors that influence the choice of a particular file organization. The different types of file organizations are as follows:
- Heap file organization
- Sequential file organization
- Indexed—Sequential file organization
- Hashing or Direct file
1. Heap File Organization
In this file organization, the records are stored in the file, in the order in which they are inserted. All the new records are stored at the end of the file. This file organization is also called PILE FILE. This organization is generally used with additional access paths, like secondary indexes. Inserting a new record is very fast and efficient. But searching a record using any search condition involves a linear search through the file, which is comparatively more time consuming. Deleting a record from a heap file is not efficient as deletion of records resulted in wastage of storage space. It is generally used to store small files or in cases where data is difficult to organize. It is also used when data is collected at one place prior to processing.
Advantages of Heap File Organization : The major advantages of Heap file organization are as follows:
- Insertion of new record is fast and
- The filling factor of this file organization is 100%.
- Space is fully utilized and
Disadvantage of Heap File Organization: The major disadvantages of Heap file organization are as follows:
- Searching and accessing of records is very
- Deletion of many records result in wastage of
- Updation cost of data is comparatively
- It has limited
2. Sequential File Organization
In sequential file organization, records are stored in a sequential order according to the “search key”. A Search key is an attribute or a set of attributes which are used to serialize the records. It is not necessary that search key must be primary key.
It is the simplest method of file organization. Sequential method is based on tape model. Devices who support sequential access are magnetic tapes, cassettes, card readers etc. Editors and compilers also use this approach to access files.
Structure of a sequential file is shown in Figure 3.3. The records are stored in sequential order one after another. To reach at the consequitive record from any record pointers are used. The Pointers are used for fast retrieval of records.
To read any record from the file start searching from the very first record with the help of search key. Sequential file organization gives records in sorted form. This organization is used in small size files.
File Operations on Sequentially Organized Files
Various operations that can be performed on sequential files are as follows:
- Creating a sequential flle : To create a new file, first free space is searched in memory. After searching, enough space for file, allocate that space to the new file. Name and physical location of new file is entered in file system directory.
- Open an existing flle : To open any file, enter the name of This name is searched in file system directory. After matching the name, records in the file are transferred from secondary memory to primary memory. Now, the file is ready to read/write.
- Closing a flle : When task over file is completed then close that The memory space allocated for that file in primary memory is deallocated. If file was opened automatically with any programme then it will be closed automatically after ending that programme.
- Reading a sequential flle : To read any record from a sequential file it is necessary to start from beginning of file until record is find or EOF is You cannot go directly to any particular record. Only linear search can be used to search any record.
Note: In sequential file system, update, delete, insert and append cannot be performed on any record directly. There is a procedure to perform all these operations.
Procedure to Update File:
Step 1. Original file is known as Old Master file. Make a new file which is known as New Master file as shown in Figure 3.6.
Step 2. To update nth record, First copy all the previous n–1 records to new master file.
Step 3. Make necessary updations to nth record.
Step 4. Now copy all remaining records including updated nth record to New master file and old master file is deleted.
The major problem is that it is very costly to make New Master file for every single updation. So the modified updation procedure is as follows :
Modification in Procedure : Use a temporary file known as transaction file. The following procedure is followed to make a transaction file and then updating the records.
-
- Collect all the requests for
- Copy only those records in Transaction file from old Master file on which updation is to be
- Update the
- Now start copying records from Old Master file and Transaction file to New Master file accordingly to the primary
- At last old master file and transaction file are
- Adding or Insertion a new record : To add or insert a new records in an existing sequential file, the transaction file is used.
Ex. Consider, the old master file emp-records as shown in Figure 3.7.
The records that need to be added are collected in Transaction file as shown in Figure 3.8.
The EMP-ID is primary key. Now, add records in New Master file as shown in Figure 3.9. If primary key of record in Old Master file is less than record in Transaction file then first copy record of Old Master File into New Master file and vice versa.
This process is repeated until EOF is reached in both files.
The Old Master file and transaction file are deleted after adding the records.
- Deletion of records : Suppose you want to delete some records from Emp-records (as shown in Figure 3.9) which are no more required. Then emp-records shown in Figure 3.9 is taken as old master file. Records that need to be deleted are stored in Transaction file.
Primary Key of Old Master file is matched with primary key of each record in Transaction file. If primary key is not matched then record is copied from Old Master file to New Master file otherwise discard that record. The process is repeated until EOF of the Old Master file is reached. After this process, Old Master file and transaction file are deleted.
- Modiflcation of records : Suppose you want to modify any record of employee. Consider Emp-records in Figure 3.11 as Old Master file. Records that need to be modified are stored in Transaction file with changed values.
Primary key of Old Master file is matched with primary key of each record in Transaction file. If primary key is matched then modified record from Transaction file is copied into New Master file otherwise record from Old Master file is copied into New Master file. The process is repeated until EOF in Old Master file is reached. After this process, Old Master file and transaction file are deleted.
Advantages : The advantages of sequential files are as follows:
- It is easy to understand.
- Efficient file system for small size files.
- Construction and reconstruction of files are much easier in comparison to other file systems.
- Supports tape media, editors and compilers.
- It contains sorted records.
Disadvantages : The disadvantages of sequential files are as follows:
- Inefficient file system for medium and large size files.
- Updations and maintenance are not easy.
- Inefficient use of storage space because of fixed size blocks.
- Linear search takes more time.
- Before updations all transactions are stored sequentially.
3. Index Sequential File Organization
Index sequential file organization is used to overcome the disadvantages of sequential file organization. It also preserves the advantages of sequential access. This organization enables fast searching of records with the use of index. Some basic terms associated with index sequential file organization are as follows:
Block : Block is a unit of storage in which records are saved.
Index : Index is a table with a search key by which block of a record can be find.
Pointer : Pointer is a variable which points from index entry to starting address of block.
To manipulate any record, search key of index is entered to find the starting address of block and then required record is searched sequentially within the block.
3.1. Components of an Indexed Sequential File
- Prime area : During the creation of index sequential file, records are written in prime Records are maintained according to any key. Hence, prime area must be a sequential file.
- Overflow area : Overflow area is used during addition of new There are two types of overflow area:
- Cylinder overflow area : This area consists of free area on each cylinder which is reserved for overflow records for that particular
- Independent overflow area : This area is used to store overflow records from
- Record characteristics : Usually fixed length records are
- Indexes : Index is a collection of entries and each entry corresponds to block of
- First level index : The lowest level index is known as first level
- Higher level index : Higher level indexing is used when first level index becomes too
- Cylinder index : The indexes can be made according to the hardware boundaries. Cylinder index entries consists one entry for each
- Master index : The highest level of index is known as master
3.2. Operations on Index Sequential Files
- Creating a index sequential flle : After allocating free space and make necessary entries in file system directory all records are written into prime area in sequential manner according to key
- Opening and closing an existing flle : It is same as sequential file
- Reading records from a index sequential flle (or searching any record) : To search any record enter the key value of record then first search the block of record in index then search record sequentially within the
- Modiflcation of records : To modify a record, first search that record and then modify Updated record is stored at same position if it has same key value and size equal to the original record. Otherwise original record is deleted and new record is inserted.
- Deletion of records : To delete a record, first search that Specific codes are used to indicate deleted records. Records consists of flags. Specific code is inserted to the flag of that record that needs to be deleted.
- Insertion of records : To insert a record, first find the desired block according to key value. Then confirm that record does not exist already. The record is first placed in overflow area and then copied to the desired block.
Advantages : The advantages of Index sequential files are as follows:
- Efficient file system for medium and large size
- Easy to
- Easy to maintain than direct
- Efficient use of storage
- Searching of records are
- Maintain advantages of sequential file
Disadvantages : The disadvantages of Index sequential files are as follows:
- Inefficient file system for small size
- It is expensive
- Typical structure than sequential
- Indexes need additional storage
- Performance degradation r.t. growth of files.
4. Hashing
Hashing is a technique by which key field is converted into address of physical location or record by using any function, known as hash function.
There are various hashing techniques. These are as follows:
- Mid square method : In mid square hash method, first compute the square of key value and then take the central digits, which is the address of that Suppose you have 100 records in one file and the key value is 81. First compute square of 81 which is (81)2 = 6561. After that take central digits of 6561. So, address of record having key value 81 is 56.
- Folding method : In this method of hashing, first make partitions of key value and then fold
- Boundary folding : In this technique various parts of key value are aligned by folding them as folding a At last digits are added to get address as shown in Figure 3.15.
-
- Shift folding : In this technique various parts of key value are shifted laterally or you can say that they are simply added as shown in Figure 16.
- Division method : In division method, divide the key value with any number and take quotient as address of that Mostly prime number is taken as divisor. Consider a record having key value 550 which is divided by 5 gives 110 as quotient which is taken as address of that record.
- Division-remainder method : In division-remainder method, divide the key value with any number (Prime number) and take remainder as address of that The divisor must be greater than total number of records and small then all key values. Suppose there are 100 records in any file. A record has key value 660 which is divided by 109 gives 6 as remainder which is taken as address of that record.
- Radix transformation method (Radix conversion method) : In radix conversion method, first key is converted into binary Then this string is partitioned into small strings. Then these strings are converted into decimal format. Suppose, a record has key value 269 whose binary equivalent is 100001101. Take small strings of 3 bits each. These are 100, 001, 101. Now convert them into decimal format which gives 4, 1, 5. These digits are treated as numbers with radix r. Suppose r is 5. Then address is
4 × 52 + 1 × 51 + 5 × 50 = 110
- Polynomial conversion method : In polynomial conversion method every digit of key value is taken as coefficient of After obtaining that polynomial, divide it by another polynomial. The remainder polynomial gives the basis of address of that record.
- Truncation method : In truncation method, some digits of key value are They may be left most digits, right most digits or central digits. The remaining digits are taken as address of record. Suppose a record is having key value 543189. Then truncate first two left most digits, which gives 3189 as address for that record.
- Conversion using digital gates : In this method, convert key value into binary format then apply different operations on these digits and at last convert the result into decimal format as shown in Figure 3.17.
4.1. Collision and Synonyms
The main disadvantage of hashing is collision.
Collision : A collision occurs when a hash function f mapped more than one key value into same physical address.
Synonym : The keys which are mapped into same location are known as Synonyms. Consider the folding method in which key 248610 gives address 129 and key 925005 also gives address 129. This is a collision and key 248610 and 925005 are synonyms.
4.2. Techniques to Avoid Collisions
There are two main collision resolution technique :
- Open Addressing
- Overflow Chaining.
Both techniques are based on the fact that in case of collision, the synonym key record is write on another location.
(i) Open Addressing : In open addressing, the next free or empty position is searched to store synonym key If there is no empty position within a block then empty position in next block is searched. There are various probing methods in open addressing but the simplest one is linear probing.
- Linear probing : Linear probing is the simplest technique which is based on cyclic proble sequence. Suppose, there are total n locations in a block. Suppose a key k is mapped into location d which is not empty then the searching sequence becomes
d, d + 1, d + 2, d + 3, …, n, 1, 2, 3, …, d – 1
If there is no free location in that block then, start searching in next consecutive block. It is also known as Consecutive Spill Method.
Consider the example of relation Employee (Dept-ID, Name) where Dept-ID is key field for mapping.
- For employee Anand, Dept-ID is mapped into
- For employee Rakesh, Dept-ID is mapped into
- For employee Piyush, Dept-ID is mapped into
- For employee Deepak, Dept-ID is mapped into A collision occurs because location 2 is not empty. So, searching is started from location 3. Location 4 is empty which is given to Deepak.
- For employee Manoj, again Dept-ID is mapped into 2. A collision occurs because location 2 is not So, searching is started for an empty location. Here Block-I is full so searching is started in consecutive block, Block-II and location 5 is given to Manoj.
Disadvantages : The disadvantages of linear probing are as follows:
- Creation of cluster of records (clustering effect).
- Searching time for free location is
b. Rehashing : To avoid clustering effect in linear probing more than one hashing functions can be If first hash function results collision then another hash function is used to resolve collision. But this method results extra overhead.
- Overflow Chaining : In this method, there are two areas for storing records, primary area in which record is stored in normal conditions, if any collision occurs then synonym key record is stored in overflow area.
Consider the example of relation employee shown in Figure 3.18. Now the employee Deepak and Manoj are stored as shown in Figure 3.19.
Advantages : The advantages of overflow chaining are as follows:
- Less searching time
- No clustering
- More efficient than open addressing
5. Direct File Organization
To meet the requirement to access records randomly direct file organization is used. In direct file organization records can be stored anywhere in storage area but can be accessed directly, without any sequential searching. It overcomes the drawbacks of sequential, index sequential and B-trees file organization.
For an efficient organization and direct access of individual record, some mapping or transformation procedure is needed that converts key field of a record into its physical storage location.
Actually, direct file organization depends upon hashing that provides the base of mapping procedure. To overcome the drawbacks of hashing algorithm, collision resolution technique is needed. Devices that support direct access are CD’s, Floppy etc.
Direct file organization is also known as Random File Organization. Choice of hashing algorithm and collision resolution technique is crucial point in direct file organization.
File Operations on Direct Files
- Creating a direct flle : After searching and allocating free space for file, necessary entries in system directory are In direct file organization, a hashing algorithm for mapping procedure and any collision resolution technique to avoid collisions during mapping are specified. The key field is also specified (It may not be primary key).
- Open an existing file and closing a file are same as in other file
- Searching (Reading or retrieving) from direct flle : To read any record from direct file, just enter the key field of that With the help of hashing algorithm that key field is mapped into physical location of that record. In case of any collision, collision resolution technique is used.
- Updation of records in direct flle:
- Adding a new record : To add a new record in direct file, specify its key With the help of mapping procedure and collision resolution technique, get the free address location for that record.
- Deleting record from direct flle : To delete a record, first search that record and after searching, change its status code to deleted or
- Modify any record : To modify any record, first search that record, then make the necessary Then re-write the modified record to the same location.
Advantages : The advantages of direct files are as follows:
- Records are not needed to be sorted in order during addition.
- It gives fastest retrieval of records.
- It gives efficient use of memory.
- Operations on direct file are fast so there is no need to collect same type of operations in a file, as in sequential file system.
- Searching time depends upon mapping procedure not logarithm of the number of search keys as in B-trees.
- Supports fast storage devices.
Disadvantages : The disadvantages of direct files are as follows:
- Wastage of storage space (Clustering) if hashing algorithm is not chosen
- It does not support sequential storage
- Direct file system is complex and hence
- Extra overhead due to collision resolution
Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Database Management System, 2nd Edition-University Science Press (2017)