Definition: Data warehouse is a collection of data designed to support management decision- making.
Another definition is : Data warehousing is the process, whereby, organizations extract meaning from their informational assets through the use of data warehouses. (Barguin, 1996).
Another definition is : A data warehouse is a subject oriented, integrated, time variant, nonvolatile collection of data used in support of management decision making processes. (Inmon and Hackathorn, 1994). The meaning of the key terms in this definition is as follows :
- Subject oriented : In data warehouse, data is organized and optimized according to specific subjects or areas of interest of the organization rather than simply as computer files. Examples of major subject areas include Customers, Products, Accounts, Transactions etc. It provides capability to provide answers to various queries coming from various functional areas within an organization.
- Integrated : Data warehouse is A single source of information for and about understanding multiple areas of interest. It provides information about a variety of subjects at one place. The input data comes from various sources in inconsistent form. Data warehouse refines the data to make it consistent and provides a unified view of overall organisational data to users. So, data warehouse is a centralized depository of data of the entire organisation which helps in better understanding of organisation’s operations for strategic business opportunities and hence increase decision making capabilities.
- Non-volatile : Data warehouse contains stable information that doesn’t change each time an operational process is The data in the data warehouse are loaded and refreshed from operational systems, but cannot be changed by end users. New data is always added as a supplement to database, rather than a replacement.
- Time-variant : The data in Data warehouse is only accurate and valid at some point in time or over some time Data contains a time-dimension so that they may be used as a historical record of business’. i.e., sales statistics of previous week.
- Accessible : The primary purpose of a data warehouse is to provide readily accessible information to end users.
A number of separate technologies have come together to make Data warehousing possible to implement. However, it may be deployed physically, the data warehouse may be viewed as a single, consistent state of information with appropriate tools to provide valuable information about a business.
1. Distinctive Characteristics of Data Warehouses
The data warehouses have many characteristics that make them different from others.
- It typically integrates several resources g., sales databases from various regions/states/ years.
- It requires more historical data than generally maintained in operational databases.
- It must be optimized for access to very large amounts of data.
- It is mostly read-accessed and rarely write-accessed.
- Data may be more coarse grained than in operational databases.
- Data warehouses are maintained separately from operational data.
- It is based on client-server architecture.
- It provides multi-user support.
- It is capable of handling dynamic sparse matrices.
- It provides multidimensional conceptual view.
- It supports unrestricted cross-dimensional operations.
- It maintains transparency.
- It provides consistent and flexible reporting performance.
- Its having unlimited dimensions and aggregation levels.
2. Difference between Database and Data Warehouse
There are many differences in database and data warehouse. These differences are in the organization and data stored in both. The various differences are as follows:
3. Data Warehouse Architecture
The hardware, software and data resources required to construct the data warehouse depends upon the organization that wants to construct it. The needs and resources available, forces the decisions of the organization regarding the architecture of a particular data warehouse. There are many phases, that are common to all the data warehouses regardless of the organization or the design selected. The most common phases are acquisition of data, storage of data and data access. The general architecture of a data warehouse is shown in Figure 13.1.
Acquisition of Data : All the data warehouses must have a source from where the data is acquired. Most of the data in the data warehouse is derived from the operational data of the organization. The required data is extracted, filtered, translated and integrated into the data storage environment.
Storage of Data : The large amounts of operational data that is historical in nature are defined, indexed and then partitioned to allow for economic and efficient access.
Data Access : A number of data mining applications allow many users throughout the organization to retrieve, analyze, query and generate reports. The ability to access data is fundamental to the concept of data warehouse in the organization.
4. Data Warehouse Components
There are mainly six components of a data warehouse. These are as follows:
- Summarized data
- Operational data-store
- Integration/Transformation programs
- Detailed data
- Meta data
- Archives
- Summarized data : The raw data generated by a transaction-processing system may be too large to store However, many queries can be answered by just maintaining the summary data obtained by aggregation on a relation, rather than maintain the entire relation. Summary data is classified into two categories—Lightly summarized and Highly summarized.
- Lightly summarized data : This represents data distilled from current detailed It is summarized according to some unit of time and always resides on disk.
- Highly summarized data : This represents data distilled from lightly summarized It is more compact and easily accessible and resides on disk.
- Operational data store : Operational databases are the source data for the data Operational data store is a repository of operational data.
- Integration/transformation programs : The integration and transformation programs convert the operational data that is applications specific into enterprise The major functions performed by these programs are as follows :
- Reformatting, re-evaluation or changing key structures.
- Adding time elements.
- Default values identification.
- Providing logic to choose between multiple data sources.
- Summarizing, tallying and merging data from multiple sources.
These programs are modified when operational or data warehouse environments change to reflect the changes.
- Detailed data : Detailed data is of two types—Older detail data and current detail The older detail data represent data that is not very recent, may be as old as ten years or longer. It is voluminous and most frequently stored on mass storage such as tape. The current detail data represent data of a recent nature and always has a shorter time horizon than older detail data. It can be voluminous; it is almost always stored on disk to permit faster access.
- Meta data : Meta data is data about Meta data for data warehouse users are part of the data warehouse itself and controls access and analysis of the data warehouse contents. The meta data repository is a key data warehouse component. It contains both technical and business meta data. The technical meta data cover details about acquisition, processing, storage structure, data descriptions, warehouse operations and maintenance and access support functionality. The business meta data covers the relevant business rules and organizational details supporting the warehouse.
- Archives : These contain old or historical data of significant interest and have value to the It is generally used for forecasting and trend analysis, thus, these archives store old data and the meta data that describe the characteristics of the old data.
5. Advantages of Data Warehouse
The data warehouse has many advantages for an enterprise. The most important one are as follows.
- Effective decision making : The major benefit of a data warehouse is its ability to analyze and execute business decisions based on data from multiple By using data warehouse, one can look at past trends and may be do some predictions of what is going to happen in the future.
- Increases the productivity of business analysts : The data warehouse can provide analysts with pre-calculated reports and graphs, that increase the productivity of business analysts.
- An enterprise can maintain better customer relationships by correlating all customer data through a single data warehouse.
- It provides supplementing disaster recovery plans with another data back up source.
- Business and information re-engineering : By knowing what information is important to the enterprise, that is possible by using data warehousing, the re-engineering efforts become more directional and have Also the data warehouse development is the effective first step in re-engineering the enterprise’s legacy system.
6. Disadvantages/Limitations of Data Warehouse
The data warehouse have some limitations, these are as follows:
- The data warehouse is very expensive solution and generally found in large firms.
- Performance tuning is hard due to very large size of the data warehouse.
- The cost of maintaining the data warehouse is very high.
- Data warehouses has a high demand of various resources.
- Scalability can be a problem with the data warehouse.
- Complexity of integration in data warehouse.
- Data warehouse is query intensive.
Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Basic Database Management System, 2nd Edition-University Science Press (2017)