Denormalization in Databases

A denormalized data model is not the same as a data model that has never been normalized. Denormalization should take place only after a satisfactory level of normalization has been achieved, and after any constraints and/or rules required to deal with the anomalies inherent in the design have been created.

During the normalization process, tables are decomposed into more tables. The more tables there are, the more joins the queries have to perform, and these joins have a negative impact on performance.

Denormalization is the process of attempting to optimize the read performance of a database by adding redundant data or by grouping data. In some cases, denormalization helps cover up the inefficiencies inherent in relational databases.
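To make this concrete, the sketch below contrasts the two access paths using Python's built-in sqlite3 module. The customers/orders schema, the redundant customer_name column, and the sample values are hypothetical illustrations of the idea, not a prescription.

    import sqlite3

    # Minimal sketch: a normalized customers/orders pair, plus a redundant
    # customer_name column added to orders by denormalization. The whole
    # schema is a hypothetical illustration.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            amount REAL,
            customer_name TEXT  -- redundant copy added by denormalization
        );
        INSERT INTO customers VALUES (1, 'Alice');
        INSERT INTO orders VALUES (100, 1, 25.0, 'Alice');
    """)

    # Normalized access path: a join is needed to show the customer's name.
    joined = con.execute("""
        SELECT o.id, c.name, o.amount
        FROM orders o JOIN customers c ON c.id = o.customer_id
    """).fetchall()

    # Denormalized access path: the same answer from a single table.
    flat = con.execute("SELECT id, customer_name, amount FROM orders").fetchall()
    print(joined == flat)  # True: both return [(100, 'Alice', 25.0)]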

A normalized design will often store different but related pieces of information in separate logical tables. If these relations are stored physically as separate disk files, completing a database query that draws information from several relations can be slow. If many relations are joined, it may be prohibitively slow. There are two strategies for dealing with this. The preferred method is to keep the logical design normalized, but allow the DBMS to store additional redundant information on disk to optimize query response. In this case it is the DBMS software’s responsibility to ensure that any redundant copies are kept consistent.

The more usual approach is to denormalize the logical data design. This achieves the same improvement in query response, but now the database designer has the responsibility to ensure that the denormalized database does not become inconsistent. This is done by adding constraints to the database that specify how the redundant copies of information must be kept synchronized.
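A common way to enforce such a rule is a trigger that propagates every change to each redundant copy. The following minimal sketch, again using sqlite3 with a hypothetical customers/orders schema, keeps the copied customer name synchronized.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id),
                             customer_name TEXT);  -- redundant copy

        INSERT INTO customers VALUES (1, 'Alice');
        INSERT INTO orders VALUES (100, 1, 'Alice');

        -- The constraint as a trigger: whenever a name changes, every
        -- redundant copy of it in orders is updated in the same step.
        CREATE TRIGGER sync_customer_name
        AFTER UPDATE OF name ON customers
        BEGIN
            UPDATE orders SET customer_name = NEW.name
            WHERE customer_id = NEW.id;
        END;
    """)
    con.execute("UPDATE customers SET name = 'Alicia' WHERE id = 1")
    print(con.execute("SELECT customer_name FROM orders").fetchone())  # ('Alicia',)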

Definition: Denormalization can be described as a process for reducing the degree of normalization with the aim of improving query processing performance.

Or

Denormalization is the process of putting one fact in numerous places.

One of the main purposes of denormalization is to reduce the number of physical tables that must be accessed to retrieve the desired data by reducing the number of joins needed to derive a query answer. This speeds data retrieval at the expense of data modification. Denormalization has been utilized in many strategic database implementations to boost database performance and reduce query response times. One of the most useful areas for applying denormalization techniques is in data warehousing implementations for data mining transactions.

The primary goals of denormalization are to improve query performance and to present the end user with a less complex and more user-oriented view of data. This is in part accomplished by reducing the number of physical tables and reducing the number of actual joins necessary to derive the answer to a query.

1. The Need for Denormalization

As discussed earlier, a relational design is denormalized to enhance performance. Even so, there are many indicators by which you can identify systems and tables that are potential candidates for denormalization. These are:

  1. Many critical queries and reports exist that need to be processed in an on-line environment and rely upon data from more than one table.
  2. Many repeating groups exist which need to be processed in a group instead of individually.
  3. Many calculations need to be applied to one or many columns before queries can be successfully answered.
  4. Tables need to be accessed in different ways by different users during the same timeframe.
  5. Certain columns are queried a large percentage of the time, which makes them candidates for denormalization.

2. Issues to be Considered when Deciding on Denormalization

When deciding whether or not to denormalize, you have to analyze the data access requirements of the applications and their actual performance characteristics. In many cases, good indexing and other solutions solve many performance-related problems. The basic issues that must be examined when considering denormalization are:

  1. What are the critical transactions, and what is the expected response time?
  2. How often are the transactions executed?
  3. What tables or columns do the critical transactions use? How many rows do they access each time?
  4. What is the mix of transaction types: select, insert, update, and delete?
  5. What is the usual sort order?
  6. What are the concurrency expectations?
  7. How big are the most frequently accessed tables?
  8. Do any processes compute summaries?
  9. Where is the data physically located?

3. Advantages of Denormalization

Denormalization has the following advantages:

  1. Denormalization can improve performance by minimizing the need for joins.
  2. Denormalization can improve performance by reducing the number of foreign keys on tables.
  3. Denormalization can improve performance by reducing the number of indexes, saving storage space and reducing data modification time.
  4. Denormalization can improve performance by precomputing aggregate values, that is, computing them at data modification time rather than at select time (see the sketch after this list).
  5. In some cases, denormalization can improve performance by reducing the number of tables.
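To illustrate the fourth point, the sketch below maintains a precomputed running total at data modification time, so selects read it directly instead of summing detail rows. The accounts/transactions schema and the balance column are hypothetical.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL DEFAULT 0);
        CREATE TABLE transactions (id INTEGER PRIMARY KEY,
                                   account_id INTEGER REFERENCES accounts(id),
                                   amount REAL);

        -- Each insert pays a small extra cost to keep the precomputed
        -- aggregate current, so selects never need SUM() over detail rows.
        CREATE TRIGGER add_to_balance
        AFTER INSERT ON transactions
        BEGIN
            UPDATE accounts SET balance = balance + NEW.amount
            WHERE id = NEW.account_id;
        END;

        INSERT INTO accounts (id) VALUES (1);
        INSERT INTO transactions VALUES (1, 1, 40.0), (2, 1, 2.5);
    """)
    # The aggregate is read directly rather than computed at select time.
    print(con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone())
    # (42.5,)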

4. Disadvantages of Denormalization

Denormalization has the following disadvantages:

  1. It usually speeds retrieval but can slow data modification.
  2. It is always application-specific and needs to be re-evaluated if the application changes.
  3. It can increase the size of tables.
  4. In some instances, it simplifies coding but in some others, it makes coding more complex.

Source: Gupta, Satinder Bal and Mittal, Aditya (2017), Introduction to Basic Database Management System, 2nd Edition, University Science Press.
