Factors Affecting Choice of File Organization

The major factors that affect the choice of file organization are as follows:

  1. Access type : In order to search a record, whether random access or sequential access is required.
  2. Access time : The total time taken by file organization to find a particular record.
  3. File size : Choice is also dependent upon file If file size is large then choose direct access otherwise  choose  sequential  access.
  4. Overhead : Each technique has some overhead (it may be space overhead, time overhead etc.). Any technique giving fast access may waste more space then slower techniques.
  5. Updation time : Time required to add, delete and modify any record also plays important role in efficiency.
  6. Complexity : If technique is fast then it is complex and Whether funds are available to  adopt  new  techniques.
  7. Availability of hardware : The hardware that supports file Example tape reader supports only  sequential  file organization.

SOLVED PROBLEMS

Problem 1. A patient record consists of the following fixed-length fields: the patient’s date of birth, social-security number, and patient ID, each 10 bytes long. It also has the following variable-length fields: name, address, and patient history. In addition, pointers within a record require 4 bytes, and  the record length is  a 4-byte integer.

  1. How many bytes, exclusive of the space needed for the variable fields, are needed for the record? You may assume that no alignment of fields is required.
  2. Assuming that the variable-length fields: name, address, and history, each have a length that is uniformly For the name, the range is 10–50 bytes; for address it is 20-80 bytes, and for history it is 0–1000 bytes. What is the average length of a  patient  record?
  3. Suppose that the patient records are augmented by an additional repeating field that represents cholesterol Each cholesterol test requires 16 bytes for a date and an integer result of the test. Show the layout of patient records if (a) the repeating tests are kept with the record itself; (b) the tests are stored on a separate block, with pointers to them  in  the  record.

Solution. (a) 3*10 + 2*4 + 4 = 42 bytes.

(b) 42 + (10 + 50)/2 + (20 + 80)/2 + (0 + 1000)/2 = 622 bytes.

(c)

Problem 2. A relational database system holds three relations: C (companies), P (products) and M (models) with the following characteristics:

Relation C (company):

  • Tuples are stored as fixed length, fixed format records, length 400 bytes.
  • There are 10,000 C tuples.
  • Tuples contain key attribute N (company number), length 20 bytes; other fields and record header make up rest.

Relation P (product):

  • Tuples are stored as fixed length, fixed format records, length 150 bytes.
  • There are 30,000 P tuples.
  • Tuples contain attribute N (the company number who makes the product), length 20 bytes; other fields and record header make up rest.
  • Tuples also contain attribute I (product identifier), length 20 bytes.

Relation M (model):

  • Tuples are stored as fixed length, fixed format records, length 100
  • There are 150,000 M
  • Tuples contain attribute I (the identifier of the product involved), length 20 bytes, and an attribute M.P (the price of the product), length 20 bytes; other fields and record header  make  up rest.

We have the following assumptions:

While the number of products associated with each company varies, for evaluation purposes we may assume that each company has 3 products, and each product has 5 model records associated with it. Thus, you can assume that there are 15 model records for  each  company record.

The records are to be stored in a collection of 8 kilobyte (8192 bytes) disk blocks that have been reserved to exclusively hold C, P, or M records, or combinations of those records, and indexes over the records. (That is, there are no other types of records in the blocks we are discussing in this problem.) Each block uses 50 bytes for its  header;  records  are not  spanned.

Two disk organization strategies are being considered:

1.  Sequential

  • All the company (C) records are placed sequentially (ordered by company number) in one subset of blocks.
  • Product (P) records are separate in another set of Products are ordered by company number.
  • Finally, model (M) records are in a third set of blocks, ordered by product identifier.

2.  Clustered

  • For each company (C) record, the 3 products for that company (C.N = N) reside in the same block.
  • Similarly, the 15 model records for those companies are in the same block.
  • The company records are sequenced by company number.

Answer the following two questions:

  1. For each storage organization, compute the total number of disk blocks required to hold all three
  2. Imagine that we are told there are two main query types:
    • Table scan: scan all of the company (C) records in order by company
    • Join: For a given company number N, get the company record followed by all its model records. That is, get all model (M) records with M.I = P.I for each P record with  P.N  = C.N.

Which storage organization would  you prefer for each  query type? Why?

Solution. (a) Sequential:

Clustered:

  • A company record, plus its associated product records, plus their associated model records,  is  400  +  3  ×  150  +  15  ×  100  =  2350  bytes.  Each  block  holds 8142/2350  =  3  companies  and  associated  product  and  model  tuples.  There  are 10,000  company  tuples,  so  10,000/3  =  3334  blocks.

(b) For table scan, the sequential organization is better. This is because all of the company tuples can be read in order without reading any product or model This reduces the number of I/Os.

For the join, the clustered organization is better. This is because a single block contains the answer to the join. Under the sequential organization, at least 3 and as many as 9  blocks  may  have  to  be  read.

Problem 3. Consider an internal hash table using an array of size 10. Using the hash function h(k) = k mod 10, show the result of inserting the elements 1,4,11,14,3,6,16,22,40,43 when open  addressing  is  used  for collision  resolution.

Solution. 40, 1, 11, 3, 4, 14, 6, 16, 22, 43

Problem 4.  Consider a  B-Tree of  order = 10.

  1. Which is the minimum  number of  nodes  in such  tree?
  2. Which is the maximum  number of  nodes in  such  tree?
  3. Which is the minimum number of  keys that  must be stored  in such  tree?
  4. Which is the maximum number of  keys that  can be stored  in such tree?
Solution.

(a) 1 + [5**(h–1) –1]/2

(b) (10**h –1)/9

(c) 2*(5**(h–1))–1

(d) 10**h –1

Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Database Management System, 2nd Edition-University Science Press (2017)

Leave a Reply

Your email address will not be published. Required fields are marked *