OLAP (On-Line Analytical Processing)

OLAP can be defined as the interactive process of creating, managing and analyzing data, as well as reporting on the data. This data being usually perceived and manipulated as though it were stored in a multi-dimensional array. OLAP allows users to perform quick and effective analysis on large amounts of data. The data are stored in a multi-dimensional fashion that more closely models real business data. OLAP also allows users to access summary data faster and easier. OLAP applications present the end user with information rather than just data. They make it easy for users to identify patterns or trends in the data very quickly, without the  need  for  them  to search  through  huge  data.

OLAP systems are data warehouse front-end software tools to make aggregate data available efficiently, for advanced analysis, to managers of an enterprise. The analysis often requires resource intensive aggregations processing and therefore it becomes necessary to implement a special database (data warehouse) to improve OLAP response time. It is essential that an OLAP system provides facilities for a manager to pose adhoc complex queries to obtain the information that he/she requires. OLAP applications move into areas such as forecasting and data mining, allowing users to answer questions such as “What are our predicted costs for next year?” and “Show me our most successful salesman”.

Definition: OLAP is the dynamic enterprise analysis required to create, manipulate, animate and synthesize information from exegetical, contemplative and formulaic data analysis models.

Or

OLAP is  a  fast analysis  of  shared multidimensional  information  for advanced  analysis.

Or

OLAP, which is software technology that enables analysts, managers and executives to gain insight into data through fast, consistent, interactive access to a wide variety of possible views of information that, has been transformed from raw data to reflect that real dimensional of the  enterprise  as understood  by  the user.

1. Codd’s OLAP Characteristics

The most  important  characteristics OLAP systems  listed by  Codd  are as  follows:

  1. Multidimensional conceptual view: As noted above, this is central characteristic of an OLAP By requiring a multidimensional view, it is possible to carry out operations like slice and dice.
  2. Accessibility: The OLAP software should be sitting between data sources (g. data warehouse) and an  OLAP front-end.
  1. Batch extraction vs interpretive: An OLAP system should provide multidimensional data staging plus precalculation of aggregates in large multidimensional databases.
  2. Multi-user support: Since the OLAP system is shared, the OLAP software should provide many normal database operations including retrieval, update, concurrency control, integrity and security.
  3. Storing OLAP results: OLAP results data should be kept separate from source data. Read-write OLAP applications should not be implemented directly on live transaction data if OLAP source systems are supplying information to the OLAP system directly.
  4. Extraction of missing values: The OLAP system should distinguish missing values from zero A large data cube may have a large number of zeros as well as some missing values. If a distinction is not made between zero values and missing values, the aggregates are likely to be computed incorrectly.
  5. Treatment of missing values: An OLAP system should ignore all missing values regardless of their source. Correct aggregate values will be computed once the missing values are ignored.
  6. Uniform reporting performance: Increasing the number of dimensions or database size should not significantly degrade the reporting performance of the OLAP system. This is a  good  objective  although it  may  be  difficult  to achieve  in practice.
  7. Generic dimensionality: An OLAP system should treat each dimension as equivalent in both is structure and operational capabilities. Additional operational capabilities may be granted to selected dimensions but such additional functions should be grantable to any dimension.
  8. Unlimited dimensions and aggregation levels: An OLAP system should allow unlimited dimensions and aggregation Practically, the number of dimensions is rarely more than 10 and the number of hierarchies rarely more than six.

2. Difference between OLTP and OLAP

OLPT and OLAP are complementing technologies. The major differences between two OLTP and OLAP are  as  follows:

3. OLAP Operations

The most  common operations  of  OLAP systems  are  as follows:

(a) Roll-up (b) Drill-down, drill-up and Drill-across   (c) Slice and dice   (d) Pivot or rotate

(a) Roll-up: Roll-up is like zooming out on the data It is required when the user needs further abstraction or less detail. This operation performs further aggregations on the data.

(b) Drill-down, drill-up and Drill-across: Drill-down is like zooming in on the data and is therefore the reverse of roll-up. It is an appropriate operation when the user needs further details or when the user wants to partition more finely or wants to focus on some particular values of certain Drill-down adds more details to the data. The hierarchy defined on a dimension may be involved in drill-down. Roll-up and drill-down operations do not remove any events but change the level of granularity of a particular dimension.

Drill-up refers to the process of selecting a child field and displaying its parent field.

Drill-across describes  any changes  to the  sectioned fields.

(c) Slice and dice: This term is used to describe the process used to retrieve and view data stored in an OLAP Since data can be displayed in a two-dimensional format, the multi-dimensional cube must be restricted into flat “slices” of data. To pick a particular orientation of data in a cube, the user is actually “slicing and dicing” the data  in  order  to view  a  simple  flat  layout.

Slice: A slice is a subset of the cube corresponding to a single value for one or more members of the dimensions. For example, a slice operation is performed when the user wants a selection on one dimension of a three-dimensional cube resulting in a two-dimensional site.

In slicing, first step is to make a selection on one dimension of the given cube is performed which resulted in a sub-cube. The second step reduces the dimensionality of the cubes. The Third step sets one or more dimensions to specific values and keeps a subset of dimensions for selected values

Dice: The dice operation is similar to slice but dicing does not involve reducing the number of dimensions. A dice is obtained by performing a selection on two or more dimensions.

In dicing, the first step is to define a sub-cube by performing a selection of one or more dimensions. The second step refers to range select condition on one dimension, or to select condition on more than one dimension. The third step reduces the number of member  values  of  one  or  more  dimensions.

(d) Pivot or Rotate: The pivot operation is used when the user wishes to re-orient the view of the data It may involve swapping the rows and columns, or moving one of the  row  dimensions  into  the  column  dimension.

4. Types of OLAP Systems

There are three different ways of physically storing data that is held within an OLAP cube. These are ROLAP, MOLAP and HOLAP. Each method presents data as a cube but uses different underlying  technology  to achieve  the  results.

4.1. ROLAP

ROLAP is Relational On-Line Analytical Processing. ROLAP systems work primarily from the data that resides in a relational database, where the base data and dimension tables are stored as relational tables. OLAP describes OLAP applications that store all of the cube data, both base and high-level in relational tables. The application hides the presence of the tables by presenting the data in a cube layout. The multidimensional views are generated by combining base and aggregate data tables together with complicated SQL statements. This model permits multidimensional analysis of data as this enables users to perform a function equivalent to that of the traditional OLAP slicing and dicing feature. This is achieved thorough use of any SQL reporting tool to extract or ‘query’ data directly from the data warehouse.

Advantages of  ROLAP:  The  major  advantages  of  ROLAP are:

  1. One advantage of ROLAP over the other styles of OLAP analytic tools is that it is deemed to be more scalable in handling huge amounts of ROLAP sits on top of relational databases therefore enabling it to leverage several functionalities that a relational database is  capable  of.
  2. Since the data is kept in the relational database instead of on the OLAP server, you can view the  data  in  almost  real
  3. ROLAP is efficient in  managing both  numeric  and textual
  4. As the data is kept in the relational database, it allows for much larger amounts of data, which can mean  better
  5. It also permits users to “drill down” to the leaf details or the lowest level of a hierarchy

Disadvantages of  ROLAP:  The  major  disadvantages  of  ROLAP are:

  1. Major disadvantage of ROLAP is the performance. This type gives the poorest query performance because no objects  benefit  from multidimensional
  2. Other major disadvantage of ROLAP are that ROLAP applications display a slower performance as compared to other style of OLAP tools as in general calculations are performed inside the
  3. ROLAP is dependent on use of SQL for data manipulation this means it may not be ideal for performance of some calculations that are not easily translatable into an SQL

4.2. MOLAP

MOLAP stands for Multidimensional On-Line Analytical Processing. MOLAP is a classic form of OLAP. One of the major distinctions of MOLAP against a ROLAP is that data are pre-summarized and are stored in an optimized format in a multidimensional cube, instead of relational database. In MOLAP data are structured into proprietary formats in accordance with a client’s reporting requirements with the calculations pre-generated on the cubes.

This MOLAP applications store all of the cube data, both base and high-level in proprietary multidimensional data files. The application copies the base data from the underlying table into a multidimensional data format and then evaluates the consolidated values.

The multidimensional data views are automatically present in this method and performance is often very quick, particularly if the cubes are small enough to fit into RAM. More typically, the data  is  stored  in large disk  files.

MOLAP is the best OLAP tool to use in making analysis reports since this enables users to easily reorganize or rotate the cube structure to view different aspects of data. This is done by way of slicing and dicing. MOLAP analytic tool are also capable of performing complex calculations. Since calculations are predefined upon cube creation, this result in the faster return of computed data. MOLAP systems also provide users the ability to quickly write back data into a data set.

Advantages of  MOLAP:  The major  advantages  of  MOLAP are:

  1. Excellent performance since pre-aggregation provides quicker response time.
  2. The data is compressed in MOLAP so it takes up less space. In comparison to ROLAP, MOLAP is considerably less heavy on hardware due to compression techniques. Thus MOLAP is more optimized for fast query performance and retrieval of summarized information.
  3. Availability of extensive libraries of complex functions for OLAP analyses.
  4. Optimal for slice and dice operations.
  5. Performs better than ROLAP when data is dense.
  6. Since the data is stored on the OLAP server in optimized format, queries are faster than ROLAP.
  7. The browsing of Cube is fastest using MOLAP.

Disadvantages of  MOLAP:  The  major disadvantages  of  MOLAP are:

  1. The issue of sparsity e. In MOLAP, in general more than 90% of cells are empty.
  2. Scalability problem e. MOLAP can handle limited amount of data, since all calculations are performed when the cube is built. Therefore, it is not commonly used above 20–50 GB.
  3. Another disadvantage of this method is the duplication of base data that occurs when it is copied  into the  cube, requiring  extra disk  space and  processing time
  4. It is difficult to change dimension without re-aggregation.
  5. The MOLAP approach also introduces data  redundancy.
  6. There are certain MOLAP products that encounter difficulty in updating models with dimensions with very high cardinality.
  7. The data must be  copied  and moved  into data stores.
  8. It requires additional investment since cube technology is often proprietary and does not already exist  in organizations.
  9. It lacks security and  administration features  which  RDBMSs can bring.

4.3. HOLAP

Hybrid OLAP or HOLAP describes OLAP applications that store high-level data in proprietary multidimensional data files, but leave the underlying base data in the original data tables. HOLAP is the product of the attempt to incorporate the best features of MOLAP and ROLAP into a single architecture. The cube drives the multidimensional views, so the application requires a robust link between the multidimensional data file and the relational table that  stores  the  base  data  beneath  it.

HOLAP tried to bridge the technology gap of both products by enabling access or use to both multidimensional database (MDDB) and RDBMS data stores. HOLAP systems stores larger quantities of detailed data in the relational tables while the aggregations are stored in the pre-calculated cubes. HOLAP also has the capacity to “drill through” from the cube down to  the  relational  tables  for  delineated  data.

Advantages of  HOLAP:  The major  advantages  of HOLAP are:

  1. It combined advantages of both MOLAP and ROLAP.
  2. This method has the advantage of not requiring duplication of the base data, resulting in time and disk space savings.
  3. It can combine the ROLAP technology for sparse regions and MOLAP for dense Also ROLAP for storing the detailed data and MOLAP for higher-level summary data.
  4. HOLAP is best used when large amounts of aggregations are queried often with little detail It offers high performance and lower storage requirements.
  5. The cubes are smaller than MOLAP since the detail data is kept in the relational database
  6. The processing time is less than MOLAP since only aggregations are stored in multidimensional format.
  7. HOLAP has better scalability, quick data processing and flexibility in accessing of data sources.
  8. Low latency since processing takes place when changes occur and detail data is kept in the relational database.

Disadvantages of  HOLAP:  The  major disadvantages  of  HOLAP are:

  1. It is complex as HOLAP server must support both MOLAP and ROLAP engines and tools to combine both storage engines and operations.
  2. Functionality overlaps between storage and optimization techniques in ROLAP and MOLAP engines.
  3. It is as slow as ROLAP when you have to access leaf level data.
  4. The HOLAP has to be processed, when new records inserted.

Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Basic Database Management System, 2nd Edition-University Science Press (2017)

Leave a Reply

Your email address will not be published. Required fields are marked *