The major factors that affect the choice of file organization are as follows:
- Access type : In order to search a record, whether random access or sequential access is required.
- Access time : The total time taken by file organization to find a particular record.
- File size : Choice is also dependent upon file If file size is large then choose direct access otherwise choose sequential access.
- Overhead : Each technique has some overhead (it may be space overhead, time overhead etc.). Any technique giving fast access may waste more space then slower techniques.
- Updation time : Time required to add, delete and modify any record also plays important role in efficiency.
- Complexity : If technique is fast then it is complex and Whether funds are available to adopt new techniques.
- Availability of hardware : The hardware that supports file Example tape reader supports only sequential file organization.
SOLVED PROBLEMS
Problem 1. A patient record consists of the following fixed-length fields: the patient’s date of birth, social-security number, and patient ID, each 10 bytes long. It also has the following variable-length fields: name, address, and patient history. In addition, pointers within a record require 4 bytes, and the record length is a 4-byte integer.
- How many bytes, exclusive of the space needed for the variable fields, are needed for the record? You may assume that no alignment of fields is required.
- Assuming that the variable-length fields: name, address, and history, each have a length that is uniformly For the name, the range is 10–50 bytes; for address it is 20-80 bytes, and for history it is 0–1000 bytes. What is the average length of a patient record?
- Suppose that the patient records are augmented by an additional repeating field that represents cholesterol Each cholesterol test requires 16 bytes for a date and an integer result of the test. Show the layout of patient records if (a) the repeating tests are kept with the record itself; (b) the tests are stored on a separate block, with pointers to them in the record.
Solution. (a) 3*10 + 2*4 + 4 = 42 bytes.
(b) 42 + (10 + 50)/2 + (20 + 80)/2 + (0 + 1000)/2 = 622 bytes.
(c)
Problem 2. A relational database system holds three relations: C (companies), P (products) and M (models) with the following characteristics:
Relation C (company):
- Tuples are stored as fixed length, fixed format records, length 400 bytes.
- There are 10,000 C tuples.
- Tuples contain key attribute N (company number), length 20 bytes; other fields and record header make up rest.
Relation P (product):
- Tuples are stored as fixed length, fixed format records, length 150 bytes.
- There are 30,000 P tuples.
- Tuples contain attribute N (the company number who makes the product), length 20 bytes; other fields and record header make up rest.
- Tuples also contain attribute I (product identifier), length 20 bytes.
Relation M (model):
- Tuples are stored as fixed length, fixed format records, length 100
- There are 150,000 M
- Tuples contain attribute I (the identifier of the product involved), length 20 bytes, and an attribute M.P (the price of the product), length 20 bytes; other fields and record header make up rest.
We have the following assumptions:
While the number of products associated with each company varies, for evaluation purposes we may assume that each company has 3 products, and each product has 5 model records associated with it. Thus, you can assume that there are 15 model records for each company record.
The records are to be stored in a collection of 8 kilobyte (8192 bytes) disk blocks that have been reserved to exclusively hold C, P, or M records, or combinations of those records, and indexes over the records. (That is, there are no other types of records in the blocks we are discussing in this problem.) Each block uses 50 bytes for its header; records are not spanned.
Two disk organization strategies are being considered:
1. Sequential
- All the company (C) records are placed sequentially (ordered by company number) in one subset of blocks.
- Product (P) records are separate in another set of Products are ordered by company number.
- Finally, model (M) records are in a third set of blocks, ordered by product identifier.
2. Clustered
- For each company (C) record, the 3 products for that company (C.N = N) reside in the same block.
- Similarly, the 15 model records for those companies are in the same block.
- The company records are sequenced by company number.
Answer the following two questions:
- For each storage organization, compute the total number of disk blocks required to hold all three
- Imagine that we are told there are two main query types:
- Table scan: scan all of the company (C) records in order by company
- Join: For a given company number N, get the company record followed by all its model records. That is, get all model (M) records with M.I = P.I for each P record with P.N = C.N.
Which storage organization would you prefer for each query type? Why?
Solution. (a) Sequential:
Clustered:
- A company record, plus its associated product records, plus their associated model records, is 400 + 3 × 150 + 15 × 100 = 2350 bytes. Each block holds 8142/2350 = 3 companies and associated product and model tuples. There are 10,000 company tuples, so 10,000/3 = 3334 blocks.
(b) For table scan, the sequential organization is better. This is because all of the company tuples can be read in order without reading any product or model This reduces the number of I/Os.
For the join, the clustered organization is better. This is because a single block contains the answer to the join. Under the sequential organization, at least 3 and as many as 9 blocks may have to be read.
Problem 3. Consider an internal hash table using an array of size 10. Using the hash function h(k) = k mod 10, show the result of inserting the elements 1,4,11,14,3,6,16,22,40,43 when open addressing is used for collision resolution.
Solution. 40, 1, 11, 3, 4, 14, 6, 16, 22, 43
Problem 4. Consider a B-Tree of order m = 10.
- Which is the minimum number of nodes in such tree?
- Which is the maximum number of nodes in such tree?
- Which is the minimum number of keys that must be stored in such tree?
- Which is the maximum number of keys that can be stored in such tree?
Solution.
(a) 1 + [5**(h–1) –1]/2
(b) (10**h –1)/9
(c) 2*(5**(h–1))–1
(d) 10**h –1
Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Database Management System, 2nd Edition-University Science Press (2017)