SQL Network Applications and Database Architecture

Innovations in computer networking have been closely linked to many of the innovations in relational database architectures and SQL over the years. Powerful minicomputers with mainframe network connections (such as Digital’s VAX family) were the first popular platform for SQL-based databases. They offered a platform for decision support, based on data offloaded from mainframe systems. They also supported local data processing applications, for capturing business data and uploading it to corporate mainframe applications.

UNIX-based servers and powerful local area networks (such as Sun’s server products) drove another wave of DBMS growth and innovation. This era of databases and networks gave birth to the client/server architecture that dominated enterprise data processing in the 1990s. Later, the rise of enterprisewide networks and applications (such as Enterprise Resource Planning) created a need for a new level of database scalability and distributed database capability. Today, the exploding popularity of the Internet is driving still another wave of innovation, as very high peak-load transaction rates and personalized user interaction drive database caching and main-memory database technologies.

1. Client/Server Applications and Database Architecture

When SQL-based databases were first deployed on minicomputer systems, the database and application architecture was very simple—all of the processing, from screen display (presentation) to calculation and data processing (business logic) to database access occurred on the minicomputer’s CPU. The advent of powerful personal computers and server platforms drove a major change in that architecture, for several reasons.

The graphical user interface (GUI) of popular PC office automation software (spreadsheets, word processors, and so on) set a new standard for ease of use, and companies demanded the same style of interface from corporate applications. Supporting a GUI is processor-intensive and demands a high-bandwidth path from the processor to the display memory that holds the screen image. While some protocols emerged for running a GUI over the LAN (the X-windows protocol), the best place to run a production application’s presentation-layer code was clearly on the PC itself.

Economics was also a factor. Personal computer systems were much cheaper, on a cost-per-processing-power basis, than minicomputers or UNIX-based servers. If more of the processing for a business application could take place on lower-cost PCs, the overall hardware cost of deploying an application would be reduced. This was an argument for moving not just the presentation layer, but much of the business logic layer, onto the PC as well.

Driven by these and other factors, the first client/server architectures emerged, shown in Figure 23-15. Many PC-based applications are still being deployed today using this architecture. SQL plays a key role as the client/server language. Requests are sent from the application logic (on the PC) to the DBMS (on the server) expressed in SQL statements. The answers come back across the network in the form of SQL completion status codes (for database updates) or SQL query results (for information requests).

2. Client/Server Applications with Stored Procedures

Whenever an application is split across two or more networked computer systems, as in Figure 23-15, one of the major issues is the interface between the two halves of the split application. Each interaction across this interface now generates network traffic, and the network is always the slowest part of the overall system, both in its data transmission capacity (bandwidth) and in round-trip messaging delays (latency).

With the architecture shown in Figure 23-15, each database access (that is, each SQL statement) generates at least one round trip across the network.

In an OLTP application, typical transactions may require as many as a dozen individual SQL statements. For example, to take a customer’s order for a single product in the simple structure of the sample database, the order-processing application might:

  • Retrieve the customer number based on the customer name (single-row SELECT)
  • Retrieve the customer’s credit limit to verify creditworthiness (single-row SELECT)
  • Retrieve product information, such as price and quantity available (single-row SELECT)
  • Add a row to the ORDERS table for the new order (INSERT)
  • Update the product information to reflect the lower quantity available (UPDATE)
  • Update the customer’s credit limit, reducing the available credit (UPDATE)
  • Commit the entire transaction (COMMIT)

for a total of seven round trips between the application and the database. In a real-world application, the number of database accesses might be two or three times this amount. As transaction volumes grow, the amount of network traffic can be very significant.

Database stored procedures provide an alternative architecture that can dramatically reduce the amount of network traffic, as shown in Figure 23-16. A stored procedure within the database itself incorporates the sequence of steps and the decision-making logic required to carry out all of the database operations associated with the transaction. Basically, part of the business logic that formerly resided within the application itself has been pushed across the network onto the database server. Instead of sending individual SQL statements to the DBMS, the application calls the stored procedure, passing the customer name, the product to be ordered, and the quantity desired. If all goes well, the stored procedure returns successfully. If a problem arises (such as lack of available product or a customer credit problem), a returned error code and message describes it. By using the stored procedure, the network traffic is reduced to a single client/server interaction.

There are several other advantages to using stored procedures, but the reduction in network traffic is one of the major ones. It was a major selling advantage of Sybase SQL Server when it was first introduced and helped to position Sybase as a DBMS specialized for high-performance OLTP applications. With the popularity of stored procedures, every major general-purpose enterprise DBMS now offers this capability.

3. Enterprise Applications and Data Caching

Today, major applications from the large packaged enterprise software vendors are all based on SQL and relational databases. Examples include large Enterprise Resource Planning (ERP), Supply Chain Management (SCM), Human Resources Management (HRM), Customer Relationship Management (CRM), financial management, and other packages from vendors such as SAP, BAAN, PeopleSoft, Vantive, Clarify, Siebel Systems, i2 Technologies, Manugistics, and others. These large-scale applications typically run on large UNIX-based server systems and place a heavy workload on the supporting DBMS. To isolate the applications and DBMS processing, and apply more total processing power to the application, they often use a three-tier architecture shown in Figure 23-17.

Even with the use of stored procedures to minimize network traffic, the network and database access demands of the most data-intensive of these enterprise applications can outstrip the available network bandwidth and DBMS transaction rates. For example, consider a supply chain planning application that helps a manufacturing company determine the parts that it must order from suppliers. To generate a complete plan, the application must examine every open order and apply the product bill-of-materials to it.

A complex product might involve hundreds of parts, some of which are themselves subassemblies consisting of dozens or hundreds of parts.

If written using straightforward programming techniques, the planning application must perform a database inquiry to determine the parts makeup of every product, and then every subassembly, for every order, and it will accumulate the total needed information in the planning database for each of these parts. Using this technique, the application will take hours to process the hundreds of thousands of orders that may be currently on the books. In fact, the application will probably run so long that it cannot possibly complete its work during the typical overnight low-volume batch processing window of time during which the company normally runs such applications.

To deliver acceptable performance, all data-intensive enterprise applications employ caching techniques, pulling the data forward out of the database server, closer to the application. In most cases, the application uses relatively primitive caching techniques. For example, it might read the bill-of-materials once and load it into main-memory data tables within the application program. By eliminating the heavily repeated product-structure queries, the program can dramatically improve its performance.

Recently, enterprise application vendors have begun to use more complex caching techniques. They may replicate the most heavily accessed data (the hot data) in a duplicate database table, on the same system as the application itself. Main-memory databases offer an even higher-performance alternative and are already being used where there is a relatively small amount of hot data (tens to hundreds of megabytes). With the advent of 64-bit operating system architectures and continuing declines in memory prices, it is becoming practical to cache larger amounts of data (several gigabytes or tens of gigabytes).

Advanced caching and replication will become more important in response to emerging business requirements. Leading-edge manufacturing companies want to move toward real-time planning, where incoming customer orders and changes immediately impact production plans. They want to offer more customized products, in more configurations, to more closely match customer desires. These and similar trends will continue to raise the volume and complexity of database access.

4. High-Volume Internet Data Management

High-volume Internet applications are also driving the trend to database caching and replication in networked database architectures. For example, financial services firms are competing for online brokerage clients by offering more and more advanced real-time stock reporting and analysis capabilities. The data management to support this application involves real-time data feeds (to insure that pricing and volume information in the database is current) and peak-load database inquiries of tens of thousands of transactions per second. Similar volume demands are found in applications for managing and monitoring high-volume Internet sites. The trend to personalize web sites (determining on the fly which banner ads to display, which products to feature, and so on) and measure the effectiveness of such personalization is another trend driving peak-load data access and data capture rates.

The Web has already shown to be an effective architecture for dealing with these types of peak-load Internet volume demands—through web site caching. Copies of heavily accessed web pages are pulled forward in the network and replicated. As a result, the total network capacity for serving web pages is increased, and the amount of network traffic associated with those page hits is reduced. Similar architectures are beginning to emerge for high-volume Internet database management, as shown in Figure 23-18. In this case, an Internet information services application caches hot data, such as the most recent news and financial information, in a very high-performance main-memory database from a vendor such as TimesTen Performance Software. It also stores summary user profile information in a main-memory database, which is used to personalize users’ experiences as they interact with the web site.

As Figure 23-18 shows, the methods for handling high-performance data management are beginning to follow those already established for high-performance web page management. The issues for databases are more complex because of database integrity issues, but the emerging techniques are similar—replication, high-volume read access, memory-resident databases, and highly fault-tolerant architectures. These demands will only grow as Internet traffic and personalization continues to increase, leading to more advanced network database architectures.

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.

Leave a Reply

Your email address will not be published. Required fields are marked *