Transactions and Multiuser Processing in SQL

When two or more users concurrently access a database, transaction processing takes on a new dimension. Now the DBMS must not only recover properly from system failures or errors, but it must also ensure that the users’ actions do not interfere with one another. Ideally, each user should be able to access the database as if he or she had exclusive access to it, without worrying about the actions of other users. The SQL transaction model allows a SQL-based DBMS to insulate users from one another in this way.

The best way to understand how SQL handles concurrent transactions is to look at the problems that result if transactions are not handled properly. Although they can show up in many different ways, four fundamental problems can occur. The next four sections give a simple example of each problem.

1. The Lost Update Problem

Figure 12-6 shows a simple application where two users accept telephone orders from customers. The order-entry program checks the PRODUCTS file for adequate inventory before accepting the customer’s order. In the figure, Joe starts entering an order for 100 ACI-41004 widgets from his customer. At the same time, Mary starts entering her customer’s order for 125 ACI-41004 widgets. Each order-entry program does a query on the PRODUCTS file, and each finds that 139 widgets are in stock—more than enough to cover the customer’s request. Joe asks his customer to confirm the order, and his copy of the order-entry program updates the PRODUCTS file to show (139 – 100) = 39 widgets remaining for sale and inserts a new order for 100 widgets into the ORDERS table. A few seconds later, Mary asks her customer to confirm the order. Her copy of the order-entry program updates the PRODUCTS file to show (139 – 125) = 14 widgets remaining in stock and inserts a new order for 125 widgets into the ORDERS table.

The handling of the two orders has obviously left the database in an inconsistent state. The first of the two updates to the PRODUCTS file has been lost! Both customers’ orders have been accepted, but not enough widgets are in inventory to satisfy both orders. Further, the database shows that there are still 14 widgets remaining for sale. This example illustrates the lost update problem that can occur whenever two programs read the same data from the database, use the data as the basis for a calculation, and then try to update the data.
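The lost update arises from the read/calculate/write pattern itself. The sketch below illustrates it, along with a safer alternative that pushes the arithmetic into a single atomic statement. The table and column names (PRODUCTS, QTY_ON_HAND, MFR_ID, PRODUCT_ID) are assumptions for illustration, not a schema given in the text:

```sql
-- Unsafe pattern: both Joe's and Mary's programs run this sequence.
SELECT QTY_ON_HAND                       -- both queries return 139
  FROM PRODUCTS
 WHERE MFR_ID = 'ACI' AND PRODUCT_ID = '41004';

UPDATE PRODUCTS                          -- each program writes its own stale result;
   SET QTY_ON_HAND = 39                  -- Joe writes 39, Mary overwrites with 14,
 WHERE MFR_ID = 'ACI'                    -- and Joe's update is lost
   AND PRODUCT_ID = '41004';

-- Safer pattern: let the DBMS do the arithmetic atomically.
UPDATE PRODUCTS
   SET QTY_ON_HAND = QTY_ON_HAND - 100
 WHERE MFR_ID = 'ACI' AND PRODUCT_ID = '41004'
   AND QTY_ON_HAND >= 100;               -- updates no rows if stock is insufficient
```

With the atomic form, the second order would correctly fail the `QTY_ON_HAND >= 100` test after the first order was processed.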

2. The Uncommitted Data Problem

Figure 12-7 shows the same order-processing application as Figure 12-6. Joe again begins taking an order for 100 ACI-41004 widgets from his customer. This time, Joe’s copy of the order-processing program queries the PRODUCTS table, finds 139 widgets available, and updates the PRODUCTS table to show 39 widgets remaining after the customer’s order. Then Joe begins to discuss with the customer the relative merits of the ACI-41004 and ACI-41005 widgets.

In the meantime, Mary’s customer tries to order 125 ACI-41004 widgets. Mary’s copy of the order-processing program queries the PRODUCTS table, finds only 39 widgets available, and refuses the order. It also generates a notice telling the purchasing manager to buy more ACI-41004 widgets, which are in great demand. Now Joe’s customer decides not to order the size 4 widgets after all, and Joe’s order-entry program does a ROLLBACK to abort its transaction.

Because Mary’s order-processing program was allowed to see the uncommitted update of Joe’s program, the order from Mary’s customer was refused, and the purchasing manager will order more widgets, even though 139 of them are still in stock. The situation would have been even worse if Mary’s customer had decided to settle for the 39 available widgets. In this case, Mary’s program would have updated the PRODUCTS table to show zero units available. But when the ROLLBACK of Joe’s transaction occurred, the DBMS would have set the available inventory back to 139 widgets, even though 39 of them are committed to Mary’s customer.

The problem in this example is that Mary’s program has been allowed to see the uncommitted updates from Joe’s program and has acted on them, producing the erroneous results. The SQL2 standard refers to this as problem P1, also known as the dirty read problem. In the parlance of the standard, the data that Mary’s program has seen is dirty because it has not been committed by Joe’s program.
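The interleaving that produces the dirty read can be sketched as two SQL sessions. The schema is again an assumption, and the explicit BEGIN is shown only for clarity (standard SQL starts a transaction implicitly):

```sql
-- Joe's session
BEGIN;
UPDATE PRODUCTS
   SET QTY_ON_HAND = QTY_ON_HAND - 100   -- 139 becomes 39, but not yet committed
 WHERE MFR_ID = 'ACI' AND PRODUCT_ID = '41004';

-- Mary's session (on a DBMS that exposes uncommitted changes)
SELECT QTY_ON_HAND                       -- sees the uncommitted value, 39
  FROM PRODUCTS
 WHERE MFR_ID = 'ACI' AND PRODUCT_ID = '41004';
-- Mary's program refuses the 125-widget order based on this dirty value

-- Joe's session
ROLLBACK;                                -- QTY_ON_HAND reverts to 139; the value
                                         -- Mary acted on never officially existed
```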

3. The Inconsistent Data Problem

Figure 12-8 shows the order-processing application once more. Again, Joe begins taking an order for 100 ACI-41004 widgets from his customer. A short time later, Mary also begins talking to her customer about the same widgets, and her program does a single-row query to find out how many are available. This time Mary’s customer inquires about the ACI-41005 widgets as an alternative, and Mary’s program does a single-row query on that row.

Meanwhile, Joe’s customer decides to order the widgets, so his program updates that row of the database and does a COMMIT to finalize the order in the database. After considering the ACI-41005 widgets as an alternative, Mary’s customer decides to order the ACI-41004 widgets that Mary originally proposed. Her program does a new single-row query to get the information for the ACI-41004 widgets again. But instead of finding the 139 widgets that were in stock just a moment ago, the new query shows only 39 in stock.

In this example, unlike the preceding two, the status of the database has remained an accurate model of the real-world situation. There are only 39 ACI-41004 widgets left because Joe’s customer has purchased 100 of them. There is no problem with Mary having seen uncommitted data from Joe’s program—the order was complete and committed to the database. However, from the point of view of Mary’s program, the database did not remain consistent during her transaction. At the beginning of the transaction, a row contained certain data, and later in the same transaction, it contained different data, so external events have interfered with her consistent view of the database. This inconsistency can cause problems even if Mary’s program never tries to update the database based on the results of the first query.

For example, if the program is accumulating totals or calculating statistics, it cannot be sure that the statistics reflect a stable, consistent view of the data. The problem in this case is that Mary’s program has been allowed to see committed updates from Joe’s program that affect rows that it has already examined. The SQL2 standard refers to this problem as P2, also known as the nonrepeatable read problem. The name comes from the fact that Mary’s program can’t repeat the same read access to the database and obtain the same results.
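Mary's two single-row queries can be sketched as follows, using the same assumed schema. At the REPEATABLE READ isolation level, the DBMS would be required to return the same value both times:

```sql
-- Both queries run within one transaction in Mary's session.
SELECT QTY_ON_HAND
  FROM PRODUCTS
 WHERE MFR_ID = 'ACI' AND PRODUCT_ID = '41004';   -- returns 139

-- ...meanwhile, Joe's transaction updates the row and COMMITs...

SELECT QTY_ON_HAND
  FROM PRODUCTS
 WHERE MFR_ID = 'ACI' AND PRODUCT_ID = '41004';   -- now returns 39: the read
                                                  -- was not repeatable
```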

4. The Phantom Insert Problem

Figure 12-9 shows an order-processing application once more. This time, the sales manager runs a report program that scans the ORDERS table, printing a list of the orders from customers of Bill Adams and computing their total. In the meantime, a customer calls Bill to place an additional order for $5000. The order is inserted into the database, and the transaction is committed. A short time later, the sales manager’s program (still operating within its initial transaction) again scans the ORDERS table, running the very same query. This time, there is an additional order, and the total is $5000 higher than for the first query.

As in the previous example, the problem here is inconsistent data. The database remains an accurate model of the real-world situation, and its integrity is intact, but the same query executed twice during the same transaction yielded two different results.

In the previous example, the query was a single-row query, and the inconsistency in the data was caused by a committed UPDATE statement. A committed DELETE statement could cause the same kind of problem.

In the example of Figure 12-9, the problem is caused by a committed INSERT statement. The additional row did not participate in the first query, but it shows up, as if from nowhere, as a phantom row in the second query. As with the inconsistent data problem, the phantom insert problem can produce inconsistent and incorrect calculations. The SQL2 standard refers to this problem as P3 and also uses the name phantom to describe it.
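The sales manager's repeated query can be sketched as follows. The column names (REP and AMOUNT) are assumptions for illustration; note that no row examined by the first query changes, yet the result does:

```sql
-- Sales manager's transaction:
SELECT SUM(AMOUNT)
  FROM ORDERS
 WHERE REP = 'Bill Adams';    -- first total

-- ...another session inserts Bill's new $5,000 order and COMMITs...

SELECT SUM(AMOUNT)
  FROM ORDERS
 WHERE REP = 'Bill Adams';    -- total is now $5,000 higher; the inserted
                              -- row is a phantom within this transaction
```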

5. Concurrent Transactions

As the four multiuser examples show, when users share access to a database and one or more of them is updating data, there is a potential for database corruption.

SQL uses its transaction mechanism to eliminate this source of database corruption. In addition to the all-or-nothing commitment for the statements in a transaction, a SQL-based DBMS makes this commitment about transactions:

During a transaction, the user will see a completely consistent view of the database. The user will never see the uncommitted changes of other users, and even committed changes made by others will not affect data seen by the user in mid-transaction.

Transactions are thus the key to both recovery and concurrency control in a SQL database. The previous commitment can be restated explicitly in terms of concurrent transaction execution:

If two transactions, A and B, are executing concurrently, the DBMS ensures that the results will be the same as they would be if either (a) Transaction A were executed first, followed by Transaction B, or (b) Transaction B were executed first, followed by Transaction A.

This concept is known as the serializability of transactions. Effectively, it means that each database user can access the database as if no other users were concurrently accessing the database. In practice, dozens or hundreds of transactions may be concurrently executing within a large production database. The serializability concept can be directly extended to cover this situation. Serializability guarantees that, if some number, N, of concurrent transactions are executing, the results will be the same as if they had been executed in some sequence, one at a time. The concept does not specify which sequence of transactions must be used, only that the results must match the results of some sequence.

The fact that a DBMS insulates you from the actions of other concurrent users doesn’t mean, however, that you can forget all about the other users. In fact, the situation is quite the opposite. Because other users want to concurrently update the database, you should keep your transactions as short and simple as possible, to maximize the amount of parallel processing that can occur.

Suppose, for example, that you run a program that performs a sequence of three large queries. Since the program doesn’t update the database, it might seem that it doesn’t need to worry about transactions. It certainly seems unnecessary to use COMMIT statements. But in fact, the program should use a COMMIT statement after each query. Why? Recall that SQL automatically begins a transaction with the first SQL statement in a program. Without a COMMIT statement, the transaction continues until the program ends. Further, SQL guarantees that the data retrieved during a transaction will be self-consistent, unaffected by other users’ transactions. This means that once your program retrieves a row from the database, no other user can modify the row until your transaction ends, because you might try to retrieve the row again later in your transaction, and the DBMS must guarantee that you will see the same data. Thus, as your program performs its three queries, it will prevent other users from updating larger and larger portions of the database.
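For the three-query program just described, the recommended pattern might look like the sketch below. Each COMMIT ends the current transaction, so the DBMS can release whatever read locks it acquired, and a fresh transaction begins with the next statement. The queries themselves are placeholders:

```sql
SELECT COUNT(*) FROM ORDERS;       -- first large query (placeholder)
COMMIT;                            -- other users may now update the rows just read

SELECT COUNT(*) FROM PRODUCTS;     -- second large query (placeholder)
COMMIT;

SELECT COUNT(*) FROM CUSTOMERS;    -- third large query (placeholder)
COMMIT;
```

The trade-off is that each query now sees a separate consistent snapshot rather than one snapshot spanning all three, which is acceptable precisely because the program never updates the database.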

The moral of this example is simple: you must always worry about transactions when writing programs for a production SQL database. Transactions should always be as short as possible. “COMMIT early and COMMIT often” is good advice when you are using programmatic SQL.

In practice, implementing a strict multiuser transaction model can impose a substantial overhead on the operation of a database with dozens, hundreds, or thousands of concurrent users. In addition, the specifics of the application may not require the absolute isolation among the user programs that the SQL transaction model implies. For example, maybe the application designer knows that an order inquiry program has been designed so that it will never attempt to read and then reread a row of the database during a single transaction. In this case, the inconsistent data problem can’t occur because of the program structure. Alternatively, maybe the application designer knows that all of a program’s access to particular tables of a database is read-only. If the programmer can convey information like this to the DBMS, some of the overhead of SQL transactions can be eliminated.

The SQL1 standard did not address this database performance issue, and most of the major DBMS brands implemented proprietary schemes for enhancing the performance of SQL transactions. The SQL2 standard specified a new SET TRANSACTION statement whose function is to specify the level of SQL transaction-model support that an application needs. You don’t need to use the SET TRANSACTION statement for casual use of SQL or for relatively simple or low-volume SQL transaction processing. To fully understand its operation, it’s useful to understand the locking and other techniques used by commercial DBMS products to implement multiuser SQL transactions. The remainder of this chapter discusses locking, the performance-optimizing capabilities specified by the SQL2 standard, and the ways in which the various DBMS brands provide them.
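As a preview, a program that only reads data and can tolerate nonrepeatable reads (P2) but not dirty reads (P1) might issue a statement like the following before its first query. This is SQL2 syntax; the isolation levels a given DBMS actually supports vary by product:

```sql
SET TRANSACTION READ ONLY,
    ISOLATION LEVEL READ COMMITTED;
```

Declaring the transaction READ ONLY and relaxing its isolation level gives the DBMS permission to skip some of the locking overhead that full serializability would otherwise require.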

Source: Liang Y. Daniel (2013), Introduction to programming with SQL, Pearson; 3rd edition.
