Data Warehousing Concepts in SQL

One of the foundations of data warehousing is the notion that databases for transaction processing and databases for business analysis serve very different needs. The core focus of an online transaction processing (OLTP) database is to support the basic day-to-day functions of an organization. In a manufacturing company, OLTP databases support the taking of customer orders, ordering of raw materials, management of inventory, billing of customers, and similar functions. Their heaviest users are the applications used by order-processing clerks, production workers, warehouse staff, and the like. By contrast, the core focus of a business intelligence (BI) database is to support business decision making through data analysis and reporting. Its heaviest users are typically product managers, production planners, and marketing professionals.

Table 21-1 highlights the significant differences in OLTP and business intelligence application profiles and the database workloads they produce. A typical OLTP transaction processing a customer’s order might involve these database accesses:

  • Read a row of the customer table to verify the proper customer number.
  • Check the credit limit for that customer.
  • Read a row of the inventory table to verify that a product is available.
  • Insert a new row in an order table and an order-items table to record the customer’s order.
  • Update the row of the inventory table to reflect the decreased quantity available.

The workload presents a large volume of short, simple database requests that typically read, write, or update individual rows and then commit a transaction. The same type of workload is presented by all of the most frequent types of transactions, such as:

  • Retrieving the price of a product
  • Checking the quantity of product available
  • Deleting an order
  • Updating a customer address
  • Raising a customer’s credit limit

In contrast, a typical business analysis transaction (generating an order analysis report) might involve these database accesses:

  • Join information from the orders, order-items, products, and customers tables
  • Summarize the detail from the orders table by product in a summary query
  • Compute the total order quantities for each product
  • Sort the resulting information by customer

This workload presents a single, long-running query that is read-intensive. It processes many rows of the database (in this case, every order item) and involves computing totals and averages and summarizing data. These characteristics are typical of almost all business analysis queries, such as:

  • Which regions had the best performance last quarter?
  • How did sales by product last quarter compare to last year?
  • What is the trend line for a particular product’s sales?
  • Which customers are buying the highest-growth products?
  • Which characteristics do those customers share?

The difference between the business intelligence and the OLTP workloads is substantial and makes it difficult or impossible for a single DBMS to serve both types of applications.

1. Components of a Data Warehouse

Figure 21-1 shows the architecture of a data warehousing environment. There are three key components:

  • Warehouse loading tools. Typically, a suite of programs that extract data from corporate transaction-processing systems (relational databases, mainframe and minicomputer files, legacy databases), process the data, and load it into the warehouse. This process typically involves substantial cleanup of the transaction data, filtering it, reformatting it, and loading it on a bulk basis into the warehouse.
  • A warehouse database. Typically, a relational database optimized for storing vast quantities of data, bulk loading data at high speeds, and supporting complex business analysis queries.
  • Data analysis tools. Typically, a suite of programs for performing statistical and time series analysis, doing “what if” analysis, and presenting the results in graphical form.

Vendors in the data warehousing market have tended to concentrate in one of these component areas. Several vendors build product suites that focus on the warehouse­loading process and challenges. A different group of vendors have focused on data analysis. There has been some vendor consolidation in each of these areas, but both remain areas of focus for individual independent software companies, including several whose revenues are in the $100 million range.

Specialized warehouse databases were also the target of several startup companies early in the data warehousing market. Over time, the major enterprise DBMS vendors also moved to address this area. Some developed their own specialized warehouse databases; others added warehouse databases to their product line by acquiring smaller companies that produced them. Today, the database component in the figure is almost always a specialized SQL-based warehouse DBMS supplied by one of the major enterprise database vendors.

2. The Evolution of Data Warehousing

The initial focus of data warehousing was the creation of huge, enterprisewide collections of all of the enterprise’s accumulated data. By creating such a warehouse of data, almost any possible question about historical business practices could be posed. Many companies started down the road to creating warehouses with this approach, but success rates were low. Large, enterprisewide warehouses generally proved too difficult to create, too big, and too unwieldy to use in practice.

The focus eventually turned to smaller data warehouses focused on specific areas of a business that could most benefit from in-depth data analysis. The term data mart was coined to describe these smaller (but still often massive) data warehouses. With the advent of multiple data marts within enterprises, a recent area of focus has been on management of distributed data marts. In particular, there is a large potential for duplication of effort in the data cleansing and reformatting process when multiple marts are drawing data from the same production databases. The emerging answer seems to be a coordinated approach to data transformation for distributed marts, rather than a return to huge centralized warehouses.

Data warehousing, and more recently data marts, have grown to prominence in many different industries. They are most widely (and aggressively) used in industries where better information about business trends can be used to make decisions that save or generate large amounts of money. For example:

  • High-volume manufacturing. Analysis of customer purchase trends, seasonality, and so on can help the company plan its production and lower its inventory levels, saving money for other purposes.
  • Packaged goods. Analysis of promotions (coupons, advertising campaigns, direct mail, and so on) and the response of consumers with different demographics can help to determine the most effective way to reach prospective customers, saving millions of dollars in advertising and promotion costs.
  • Telecommunications. Analysis of customer calling patterns can help to create more attractive pricing and promotional plans, perhaps attracting new customers from a competitor.
  • Airlines. Analysis of customer travel patterns is critical to yield management, the process of setting airfares and associated restrictions on available airline seats to maximize profitability.
  • Financial services. Analysis of customer credit factors and comparing them to historical customer profiles can help to make better decisions about which customers are creditworthy.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *