Comparison of Parallel and Distributed Databases (DDB’s)

SOLVED PROBLEMS

Problem 1. Given two very large tables R(A, B) and S(A, C) where all attributes are integers. The data in each table is randomly distributed across three servers N1, N2, and N3. Explain how a parallel DBMS can compute S in parallel using all three servers.

Solution. A common implementation of parallel join is through hashing. The algorithm works as follows:

  • On each server N1 through N3, the local DBMS instance will read the local data for relation It will apply a hash function to the join attribute R.A and it will send the tuple to one of N1 though N3 based on the hash value.

For example, we could compute R.A%3 and send the tuple to N1 if the result is 0,  N2  if  the  result is  1  and  N3  is the  result  is  2.

  • The local DBMS instance on each server will then perform the same operation on the local  data for  relation S.
  • Now, all tuples from R and S with the same value of the join attribute are located on the same The DBMS instance on each server can thus compute a local join  between  the  two  relations.

Problem 2. The following data structure and constraints exist for a magazine publishing company.

  • The company publishes one regional magazine each in Punjab (PB), Haryana (HR), Rajasthan (RJ), and Delhi (DL).
  • The company has 300,000 customers (subscribers) distributed throughout the four states as given above.
  • On the first of each month, an annual subscription INVOICE is printed and sent to each customer whose subscription is due for renewal. The INVOICE entity contains a REGION attribute to indicate the state (PB, HR, RJ, DL) in which the customer resides:

CUSTOMER (CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_CITY, CUS_STATE, CUS_ZIP, CUS_SUBSDATE)

INVOICE (INV_NUM, INV_REGION, CUS_NUM, INV_DATE, INV_TOTAL)

The company’s management has decided to decentralize the management of the subscriptions in its four regional subsidiaries. Each subscription site will handle its own customer and invoice data. The company’s management, however, wants to have access to customer and invoice data to generate annual reports and to issue ad hoc queries, such as:

  • List all current customers by
  • List all new customers by
  • Report all invoices by customer  and by

Given the above conditions and  the requirements, answer the  following questions:

  1. What recommendations will you make regarding the type and characteristics of the required database  system?
  2. What type of data fragmentation is needed for each table?
  3. What must be the criteria used to partition each database?
  4. Design the database Show an example with node names, location, fragment names, attribute names, and demonstration data.
  5. What type of distributed database operations must be supported at each remote site?
  6. What type of distributed database operations must be supported at the headquarters site?
Solution.

(a) The Magazine Publishing Company requires a distributed system with distributed database The distributed system will be distributed among the company locations in Punjab, Haryana, Rajasthan and Delhi.

The DDBMS must be able to support distributed transparency features, such as fragmentation transparency, replica transparency, transaction transparency, and performance transparency. Heterogeneous capability is not a mandatory feature since we assume there is no existing DBMS in place and that the company wants to standardize  on  a  single  DBMS.

(b) The database must be horizontally partitioned, using the STATE attribute for the CUSTOMER table and  the  REGION attribute  for  the  INVOICE table.

(c) The following fragmentation segments reflect the criteria used to partition each database:

Horizontal Fragmentation of the CUSTOMER Table by State

Horizontal Fragmentation of the INVOICE Table by Region

(d) Note the following fragments:

(e) To answer this question, you must first draw a map of the locations, the fragments at each location, and the type of transaction or request support required to access the data in the distributed database.

Given the problem’s specifications, you conclude that no interstate access of CUSTOMER or INVOICE data is required. Therefore, no distributed database access is required in the four nodes. For the headquarters, the manager wants to be able to access the data in all four nodes through a single SQL request. Therefore, the  DDBMS must  support distributed  requests.

(f) See the answer for part (e).

Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Basic Database Management System, 2nd Edition-University Science Press (2017)

Leave a Reply

Your email address will not be published. Required fields are marked *