SQL in the Next Decade

Predicting the path of the database market and SQL over the next five to ten years is a risky proposition. The computer market is in the midst of a major transition into an Internet-driven era. The early stages of that era, dominated by the World Wide Web and user/browser interaction, are giving way to a ubiquitous Internet used to deliver all communication services, information services, and e-business interaction. The emergence of the PC and its creation of the client/server era of the 1980s and 1990s illustrates how shifts in the underlying computer systems market can produce major changes in data management architectures. It’s likely that the Internet will have at least as large an impact on the data management architectures of the next ten years, if not a larger one. Nonetheless, several trends appear to be safe predictions for the future evolution of database management. They are discussed in the final sections of this chapter.

1. Distributed Databases

As more and more applications are used on an enterprisewide basis or beyond, the ability of a single, centralized database to support dozens of major applications and thousands of concurrent users will continue to erode. Instead, major corporate databases will become more and more distributed, with dedicated databases supporting the major applications and functional areas of the corporation. To meet the higher service levels required of enterprisewide or Internet-based applications, data must be distributed; but to ensure the integrity of business decisions and operations, the operation of these distributed databases must be tightly coordinated.

Another strain on centralized database architectures will be the continuing growth of mobile personal computers and other mobile information appliance devices. These devices are, by their nature, more useful if they can become an integral part of a distributed network. However, they are also only occasionally connected—they work in a sometimes-disconnected, sometimes-connected mode, using either wired or wireless networks. The databases at the heart of mobile applications must be able to operate in this occasionally connected environment.
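
As a concrete illustration, here is a minimal store-and-forward sketch in Java, using JDBC, of the kind of logic an occasionally connected application needs. The pending_changes table, its columns, and the surrounding class are hypothetical; a production design would also need conflict detection, which this sketch ignores:

    import java.sql.*;
    import java.util.*;

    // Store-and-forward for an occasionally connected client: changes are
    // queued in a local table while disconnected and replayed, in order,
    // against the central database when a connection becomes available.
    public class OfflineQueue {
        private final Connection local;  // embedded, always-available local database

        public OfflineQueue(Connection local) { this.local = local; }

        // Record a change locally, whether or not the network is up.
        public void recordChange(String sql) throws SQLException {
            try (PreparedStatement ps = local.prepareStatement(
                    "INSERT INTO pending_changes (change_sql) VALUES (?)")) {
                ps.setString(1, sql);
                ps.executeUpdate();
            }
        }

        // Replay queued changes in order, then clear the queue.
        public void synchronize(Connection central) throws SQLException {
            List<Integer> applied = new ArrayList<>();
            try (Statement q = local.createStatement();
                 ResultSet rs = q.executeQuery(
                     "SELECT id, change_sql FROM pending_changes ORDER BY id")) {
                while (rs.next()) {
                    try (Statement s = central.createStatement()) {
                        s.executeUpdate(rs.getString("change_sql"));
                    }
                    applied.add(rs.getInt("id"));
                }
            }
            try (PreparedStatement del = local.prepareStatement(
                    "DELETE FROM pending_changes WHERE id = ?")) {
                for (int id : applied) {
                    del.setInt(1, id);
                    del.executeUpdate();
                }
            }
        }
    }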

These trends will drive heavy demand for data distribution, database integration, data synchronization, data caching, data staging, and distributed database technology. A one-size-fits-all model of distributed data and transactions is inadequate for the highly distributed, anywhere/anytime environment that will emerge. Instead, some transactions will require absolute synchronization with a centralized master database, while others will demand support for long-duration transactions where synchronization may take hours or days. Developing ways to create and operate these distributed environments, without having them become a database administrator’s nightmare, will be a major challenge for DBMS vendors in the next decade, and a major source of revenues for the vendors that provide practical, relatively easy-to-use solutions.

2. Massive Data Warehousing

The last few years have demonstrated that companies that use database technology aggressively and treat their data as a valuable corporate asset can gain tremendous competitive advantage. The competitive success of Wal-Mart, for example, is widely attributed to its use of information technology (led by database technology) to track its inventory and sales on a daily basis, based on cash register transaction data. This allowed the company to minimize its inventory levels and closely manage its supplier relationships. Data mining techniques have allowed companies to discover unexpected trends and relationships based on their accumulated data—including the legendary discovery by one retailer that late-night sales of diapers were highly correlated with sales of beer.

It seems clear that companies will continue to accumulate as much information as they can on their customers, sales, inventories, prices, and other business factors. The Internet creates enormous new opportunities for this kind of information-gathering. Literally every customer or prospective customer’s interaction with a company’s web site, click-by-click, provides potential clues to the customer’s wants, needs, and behavior. That type of click-by-click information can easily generate tens of gigabytes of data or more per day on a busy web site. The databases to manage these massive quantities of data will need to support multilevel storage systems. They will need to rapidly import vast quantities of new data, and rapidly peel off large data subsets for analysis. Despite the high failure rate of data warehousing projects, the large potential payoffs in reduced operating costs and more on-target marketing and sales activities will continue to drive data warehousing growth.
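
To make the load-and-extract cycle concrete, here is a sketch in Java of both operations through JDBC. The clickstream table, its columns, and the use of CREATE TABLE ... AS SELECT (widely, but not universally, supported) are assumptions; real warehouse loads would normally go through the DBMS’s native bulk loader rather than row-at-a-time JDBC:

    import java.sql.*;

    // The two bulk operations a warehouse load needs: fast import of new
    // detail rows, and peeling off a subset for analysis.
    public class WarehouseLoad {
        // Batch-insert a day's clickstream records under a single commit.
        static void importClicks(Connection db, Iterable<String[]> rows)
                throws SQLException {
            db.setAutoCommit(false);
            try (PreparedStatement ps = db.prepareStatement(
                    "INSERT INTO clickstream (session_id, url, clicked_at) " +
                    "VALUES (?, ?, ?)")) {
                for (String[] r : rows) {
                    ps.setString(1, r[0]);
                    ps.setString(2, r[1]);
                    ps.setTimestamp(3, Timestamp.valueOf(r[2]));  // "yyyy-mm-dd hh:mm:ss"
                    ps.addBatch();
                }
                ps.executeBatch();
            }
            db.commit();
        }

        // Peel off today's rows into their own table for the analysts.
        static void extractSubset(Connection db) throws SQLException {
            try (Statement s = db.createStatement()) {
                s.executeUpdate(
                    "CREATE TABLE clicks_today AS " +
                    "SELECT session_id, url, clicked_at FROM clickstream " +
                    "WHERE clicked_at >= CURRENT_DATE");
            }
        }
    }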

Beyond the collection and warehousing of data, pressure will build to perform business analyses in real time. IS consulting groups are writing about the “zero-latency enterprise” or the “real-time enterprise” to describe an architecture in which customer interactions translate directly into changes in business plans with zero or very little delay. To meet this challenge, database systems will continue to take advantage of processor speed advances and multiprocessing technologies.

3. Ultra-High-Performance Databases

The emergence of an Internet-centric architecture is exposing enterprise data processing infrastructures to new peak-load demands that dwarf the workloads of just a few years ago. When databases primarily supported in-house applications used by a few dozen employees at a time, database performance issues may have produced employee frustration, but they did not really impact customers. The advent of call centers and other customer support applications produced a closer coupling between data management and customer satisfaction, but applications were still limited to at most hundreds of concurrent users (the people manning the phones in the call center).

With the Internet, the connection between a customer and the company’s databases becomes a direct one. Database performance problems translate directly into slow customer response times. Database unavailability translates directly into lost sales. Furthermore, databases and other parts of the data processing infrastructure are no longer buffered from peak-load transaction rates. If a financial services firm offers online trading or portfolio management, it will need to prepare for peak-load volumes on days of heavy stock price movement that may be 10 or 20 times the average daily volume. Similarly, an online retailer must gear up to support the heaviest end-of-year selling season, not just mid-March transaction rates.

The demands of e-commerce and real-time Internet information access are already producing peak-load transaction rates from the most popular Internet services that are one or two orders of magnitude higher than the fastest conventional disk-based RDBMS systems can sustain. To cope with these demands, companies will increasingly turn to distributed and replicated databases. They will pull hot data forward and cache it closer to the customer interaction within the network. To meet peak-load demands, they will use in-memory databases. This will, in turn, require new database support for deciding which data to cache, and which levels of synchronization and replication are appropriate. At first, these issues will apply only to the largest and highest-volume sites, but just as web page caching first became an accepted and then an essential technique for maintaining adequate web browser performance, hot data caching will become a mainstream Internet data management architecture as volumes grow.
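
A minimal read-through cache of the kind described above might look like the following Java sketch; the products table and query are hypothetical, and a real deployment would add expiration and invalidation on update:

    import java.sql.*;
    import java.util.concurrent.ConcurrentHashMap;

    // Hot rows are served from memory; misses fall through to the database
    // and populate the cache on the way back.
    public class HotDataCache {
        private final ConcurrentHashMap<Integer, String> cache =
            new ConcurrentHashMap<>();
        private final Connection db;

        public HotDataCache(Connection db) { this.db = db; }

        public String productName(int id) throws SQLException {
            String hit = cache.get(id);              // memory hit: no disk I/O
            if (hit != null) return hit;
            try (PreparedStatement ps = db.prepareStatement(
                    "SELECT name FROM products WHERE id = ?")) {
                ps.setInt(1, id);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) return null;
                    String name = rs.getString(1);
                    cache.put(id, name);             // keep the hot row in memory
                    return name;
                }
            }
        }
    }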

4. Internet and Network Services Integration

In the Internet era, database management will increasingly become just one more network service, and one that must be tightly integrated with other services, such as messaging, transaction services, and network management. In some of these areas, standards are well established, such as the XA standard for distributed transaction management. In others, standards are in their infancy or are just emerging, such as the SOAP standard for sending XML data over the Internet’s HTTP protocol and the UDDI standards for finding services in a distributed network environment.

The multitier architecture that is dominating Internet-centric applications also poses new questions about which roles should be played by the database manager and by other components of the overall information system. For example, when network transactions are viewed from the standpoint of distributed databases, a two-phase commit protocol, implemented in a proprietary way by a DBMS vendor, may provide a solution. When network transactions involve a combination of legacy applications (e.g., mainframe CICS transactions), relational database updates, and interapplication messages, the transaction management problem moves outside the database, and external mechanisms are required.
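
The external mechanism is typically a transaction manager driven through an API such as JTA. The following Java sketch shows one unit of work spanning two databases under two-phase commit; the JNDI names and tables are hypothetical, and an application server (or stand-alone transaction manager) must supply the XA plumbing:

    import java.sql.Connection;
    import java.sql.Statement;
    import javax.naming.InitialContext;
    import javax.sql.DataSource;
    import javax.transaction.UserTransaction;

    // One logical transaction across two XA-capable databases, coordinated
    // by an external transaction manager rather than by either DBMS.
    public class TransferFunds {
        public static void transfer() throws Exception {
            InitialContext ctx = new InitialContext();
            UserTransaction tx =
                (UserTransaction) ctx.lookup("java:comp/UserTransaction");
            DataSource orders  = (DataSource) ctx.lookup("jdbc/OrdersDB");
            DataSource billing = (DataSource) ctx.lookup("jdbc/BillingDB");

            tx.begin();                          // both connections enlist in the transaction
            try (Connection c1 = orders.getConnection();
                 Connection c2 = billing.getConnection();
                 Statement s1 = c1.createStatement();
                 Statement s2 = c2.createStatement()) {
                s1.executeUpdate("UPDATE orders SET status = 'BILLED' WHERE id = 42");
                s2.executeUpdate("INSERT INTO invoices (order_id) VALUES (42)");
                tx.commit();                     // two-phase commit across both databases
            } catch (Exception e) {
                tx.rollback();                   // either both changes commit or neither does
                throw e;
            }
        }
    }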

A similar trade-off surrounds the emergence of Java-based application servers as a middle-tier platform for executing business logic. Before the Internet era, stored procedures became the accepted DBMS technique for embedding business logic within the database itself. More recently, Java has emerged as a viable stored procedure language, an alternative to earlier, vendor-proprietary languages. Now application servers provide an alternative platform for business logic written in Java, in this case external to the database. It’s not yet clear how these two trends will be rationalized, and whether business logic will continue its migration into the database or will settle in an application server layer. Whichever trend predominates, tighter integration between database servers and application servers will be required. Several of the DBMS vendors now produce their own application servers, and it seems likely that they will provide the best integration within their own product lines. Whether this approach will prevail against a best-of-breed approach remains another open question.
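
For illustration, a Java stored procedure under several vendors’ schemes is simply a static method compiled into the server and bound to a SQL name; it reaches the database through the special in-process JDBC connection. The method, table, and exact binding syntax below are assumptions, since the details vary by vendor:

    import java.sql.*;

    // Business logic running inside the database server. The URL
    // "jdbc:default:connection" denotes the server-side, in-process
    // connection in several vendors' Java stored procedure environments.
    public class OrderLogic {
        public static void applyDiscount(int customerId, double pct)
                throws SQLException {
            Connection db =
                DriverManager.getConnection("jdbc:default:connection");
            try (PreparedStatement ps = db.prepareStatement(
                    "UPDATE orders SET total = total * (1 - ?) " +
                    "WHERE customer_id = ? AND status = 'OPEN'")) {
                ps.setDouble(1, pct);
                ps.setInt(2, customerId);
                ps.executeUpdate();
            }
            // A vendor-specific SQL binding might then look like:
            //   CREATE PROCEDURE apply_discount(IN cust INT, IN pct DOUBLE)
            //     LANGUAGE JAVA
            //     EXTERNAL NAME 'OrderLogic.applyDiscount';
        }
    }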

5. Embedded Databases

Relational database technology has reached into many parts of the computer industry, from small handheld devices to large mainframes. Databases underlie nearly all enterprise-class applications as the foundation for storing and managing their information. Lightweight database technology underlies an even broader range of applications. Directory services, a foundation technology for the new era of value-added data communications network services, are a specialized form of database technology. Lightweight, very-high-performance databases also form an integral part of telecommunications networks, enabling cellular networks, advanced billing schemes, smart messaging services, and similar capabilities.

These embedded database applications have traditionally been implemented using proprietary, custom-written data management code tightly integrated with the application. This application-specific approach produced the highest possible performance, but at the expense of an inflexible, hard-to-maintain data management solution. With declining memory prices and higher-performance processors, lightweight SQL-based relational databases are now able to economically support these applications.
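
The following Java sketch shows how little is involved: the database runs in-process with the application, reached through an ordinary JDBC URL, with no separate server to install or administer. It assumes the HSQLDB in-memory driver is on the classpath; any embedded engine with a JDBC driver looks essentially the same:

    import java.sql.*;

    // An embedded, in-process SQL database: no server, no administration.
    public class EmbeddedExample {
        public static void main(String[] args) throws SQLException {
            try (Connection db = DriverManager.getConnection(
                     "jdbc:hsqldb:mem:devicedb", "SA", "");  // HSQLDB's default account
                 Statement s = db.createStatement()) {
                s.executeUpdate("CREATE TABLE subscribers (" +
                                "id INT PRIMARY KEY, msisdn VARCHAR(20))");
                s.executeUpdate("INSERT INTO subscribers VALUES (1, '+15551234567')");
                try (ResultSet rs = s.executeQuery(
                        "SELECT msisdn FROM subscribers WHERE id = 1")) {
                    while (rs.next()) System.out.println(rs.getString(1));
                }
            }
        }
    }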

The advantages of a standards-based embedded database are substantial. Without a serious compromise in performance, an application can be developed in a more modular fashion, changes in database structure can be handled transparently, and new services and applications can be rapidly deployed atop existing databases. With these advantages, embedded database applications appear destined to be a new area of growth potential for SQL and relational database technology. As in so many other areas of information technology, the ultimate triumph of SQL-based databases may be that they disappear into the fabric of other products and services—invisible as a stand-alone component, but vital to the product or service that contains them.

6. Object Integration

The most significant unknown in the future evolution of SQL is how it will integrate with object-oriented technologies. Modern application development tools and methodologies are all based on object-oriented techniques. Two object-oriented languages, C++ and Java, dominate serious software development, for both client-side and server-side software. The core row/column principles of the relational data model and SQL, however, are rooted in a much earlier COBOL era of records and fields, not objects and methods.

The object database vendors’ solution to the relational/object mismatch has been the wholesale discarding of the relational model in favor of pure object database structures. But the lack of standards, steep learning curve, lack of simple query facilities, and other disadvantages have prevented pure object databases from having any significant market success to date. The relational database vendors have responded to the object database challenge by embracing object-oriented features, but the result has been a proliferation of nonstandard, proprietary database features and SQL extensions.

It’s clear that relational database technology and object technology must be more tightly integrated if relational databases are to remain an integral part of the next generation of applications. Several trends are visible today:

  • Java-based interfaces to RDBMSs, such as JDBC and embedded SQL for Java (SQLJ), will continue to grow rapidly in popularity; a minimal JDBC sketch appears after this list.
  • Java will become a more important stored procedure language for implementing business logic within an RDBMS. Virtually all of the major DBMS vendors have announced plans to support Java as an alternative to their proprietary stored procedure languages.
  • DBMS products will expand support for abstract, complex data types that exhibit object-oriented capabilities such as encapsulation and inheritance. Beyond high-level agreement on the need to store objects within a row/column structure, the specifics (nested tables, arrays, complex columns) vary dramatically.
  • The SQL:1999 standard for object-oriented extensions to SQL will influence vendor products, but slowly, as vendors continue to seek competitive advantages and user lock-in through proprietary object-oriented extensions.
  • Message-oriented interfaces, including database triggers that produce messages external to the DBMS for integration with other applications, will grow in importance, as the database becomes a more active component for integrating systems together.
  • XML will emerge as an important standard format for representing both data retrieved from a SQL database and data to be entered into or updated in a database.
  • DBMS vendors will offer SQL extensions to store and retrieve XML documents, and to search and retrieve their contents.
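
To make the first bullet concrete, here is a minimal JDBC sketch in Java; the driver URL, credentials, and offices table are placeholders:

    import java.sql.*;

    // A parameterized query through the JDBC call-level interface.
    public class JdbcExample {
        public static void main(String[] args) throws SQLException {
            try (Connection db = DriverManager.getConnection(
                     "jdbc:some_vendor://dbserver/sales", "scott", "tiger");
                 PreparedStatement ps = db.prepareStatement(
                     "SELECT city, target FROM offices WHERE region = ?")) {
                ps.setString(1, "Western");
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next())
                        System.out.println(rs.getString("city") + "  " +
                                           rs.getBigDecimal("target"));
                }
            }
        }
    }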

Whether these extensions to SQL and the relational model can successfully integrate the worlds of RDBMS and objects remains to be seen. The object-oriented database vendors continue to maintain that object capabilities bolted onto an RDBMS can’t provide the kind of transparent integration needed. Most of them have enthusiastically embraced XML as the newest wave of object technology. The enterprise DBMS vendors have announced and added substantial object-relational capabilities, and more recently, XML integration products and features, but it’s hard to determine how many of these capabilities are actually being used. In addition, the emergence of XML as an important Internet standard has given birth to a new round of database challengers, offering native XML databases. With all of these competing alternatives, the further integration of object technologies into the world of relational databases seems certain. The specific path that this evolution will take remains the largest unknown in the future of SQL.

