To avoid concurrency-related problems and to maintain consistency, some rules (protocols) need to be defined so that the system can be protected from such situations and efficiency can be increased.
1. Lock-Based Protocols
To maintain consistency, modification of a data item that is presently being accessed by some transaction must be restricted for all other transactions. These restrictions can be enforced by applying locks on data items. To access any data item, a transaction has to obtain a lock on it.
1.1. Granularity of Locks
The level and type of information that a lock protects is called the locking granularity; in other words, it is the size of the data items that the data manager locks. In theory, a database server can lock as much as the entire database or as little as one column of data. The level of locking used depends on the typical access needs of transactions as well as on performance tradeoffs. Locking granularity affects performance, since more work is necessary to maintain and coordinate an increased number of locks. To achieve optimum performance, a locking scheme must balance the needs of concurrency and overhead. Locking can occur at the following levels:
- Coarse Granularity (Table, File, or Database Locks): The data manager could lock at a coarse granularity such as tables or files. If the locks are at coarse granularity, the data manager does not have to set many locks, because each lock covers a great amount of data. Thus, the overhead of setting and releasing locks is low. However, by locking large chunks of data, the data manager is usually locking more data than a transaction needs. For example, even if a transaction T accesses only a few records of a table, a data manager that locks at the granularity of tables will lock the whole table, thereby preventing other transactions from locking any other records of the table, most of which are not needed by transaction T. This reduces the number of transactions that can run concurrently, which both reduces throughput and increases the response time of transactions.
- Fine Granularity (Record or Field Locks): The data manager could lock at a fine granularity, such as records or fields. If the locks are at fine granularity, the data manager only locks the specific data actually accessed by a transaction. These locks do not artificially interfere with other transactions, as coarse-grained locks do. However, the data manager must now lock every piece of data accessed by a transaction, which can generate much locking overhead. For example, if a transaction issues an SQL query that accesses tens of thousands of records, a data manager that does record-granularity locking would set tens of thousands of locks, which can be quite costly. In addition to the record locks, locks on associated indexes are also needed, which compounds the problem. There is a fundamental tradeoff between the amount of concurrency and locking overhead, depending on the granularity of locking. Coarse-grained locking has low overhead but low concurrency. Fine-grained locking has high concurrency but high overhead.
- Intermediate Granularity (Page Locks): The data manager could lock at an intermediate granularity, such as pages. A disk page is the equivalent of a disk block, which may be described as a section of a disk. A page has a fixed size, such as 4K, 8K, 16K, etc. A table may span several pages, and a page may contain several rows of one or more tables. Page-level locks are the most frequently used locks in multi-user DBMSs. This gives a moderate degree of concurrency with a moderate amount of locking overhead. It works well in systems that do not need to run at high transaction rates, and hence are unaffected by the reduced concurrency, or ones where transactions frequently access many records per page, so that page locks are not artificially locking more data than transactions actually access. It also simplifies the recovery algorithms for Commit and Abort. However, for high-performance transaction processing, record locking is needed, because there are too many cases where concurrent transactions need to lock different records on the same page.
1.2. Concurrency Control Manager (CCM)
The concurrency control manager synchronizes the concurrent access of database transactions to shared objects in the database. It is responsible for maintaining the guarantees regarding the effects of concurrent access to the shared database. It makes sure that the result of transactions running in parallel is identical to the result of some serial execution of the same transactions.
The concurrency control manager grants locks to different transactions. The concurrency control manager is basically a program.
1.3. Types of Lock
There are two types of locks that can be placed on a data item by a transaction.
- Shared lock : A shared lock is a read-only lock. If a transaction Ti has obtained a shared lock on data item A, then Ti can read A but cannot modify A. It is denoted by S, and Ti is said to be in shared lock mode.
- Exclusive lock : An exclusive lock is a read-write lock. If a transaction Ti has obtained an exclusive lock on data item A, then Ti can both read and modify (write) A. It is denoted by X, and Ti is said to be in exclusive lock mode.
1.4. Compatible Lock Modes
Suppose transaction Ti has obtained a lock on data item V in mode A. If transaction Tj can also obtain a lock on V in mode B, then A is compatible with B, i.e., the two lock modes are compatible. The compatibility matrix of the shared and exclusive lock modes is shown in Figure 8.6.
The table shows that if a transaction Ti holds a shared lock on data item V, then any other transaction Tj can obtain only a shared lock on V at the same time. All other possible combinations are incompatible.
S – lock(A) — Shared lock on data item A
X – lock(A) — Exclusive lock on data item A
read(A) — Read operation on A
write(A) — Write operation on A
unlock(A) — A is free now.
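The compatibility matrix of Figure 8.6 can be modelled as a small lookup table. A minimal sketch in Python; the names `COMPATIBLE` and `compatible` are illustrative, not from the text:

```python
# Compatibility of lock modes (Figure 8.6): only S-S is compatible.

COMPATIBLE = {
    ("S", "S"): True,   # two readers may share the item
    ("S", "X"): False,  # a writer must wait for the reader
    ("X", "S"): False,  # a reader must wait for the writer
    ("X", "X"): False,  # a writer must wait for another writer
}

def compatible(held: str, requested: str) -> bool:
    """Can a lock in mode `requested` be granted on a data item
    already locked in mode `held`?"""
    return COMPATIBLE[(held, requested)]

print(compatible("S", "S"))  # True: Tj may also share-lock V
print(compatible("X", "S"))  # False: all other combinations block
```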
A transaction Ti can lock only those data items that it uses. If any data item V is not free, then Ti is made to wait until V is released. It is not desirable to unlock a data item immediately after its final access.
Example. Consider a banking system in which you have to transfer 500 from account A to account B. Transaction T1 in Figure 8.7 shows the transfer of the amount, and T2 shows the sum of accounts A and B. Initially, account A has 1000 and B has 300.
FIGURE 8.7. Transactions T1 and T2 for banking example.
The possible schedule for T1 and T2 is shown in Figure 8.8.
In the schedule shown in Figure 8.8, T2 gives the wrong result, i.e., 800 instead of 1300, because unlock(A) is performed immediately after A's final access.
1.5. Improvement in Basic Structure of Locking
Keep all locks until the end of the transaction. The modified transactions T1 and T2 are shown in Figure 8.9.
Transaction T2 cannot obtain a lock on A until A is released by T1, but by that time changes have been made in both accounts.
Advantages : The advantages of locking are:
- It maintains consistency.
- It solves all the concurrency problems.
Disadvantages : The disadvantages of locking are:
- It leads to a new problem that is Deadlock.
Two possible schedules S1 and S2 for transactions T1 and T2 of Figure 8.9 are shown in Figure 8.10(a), (b).
In S1, the correct result is obtained, but S2 leads to deadlock because both T1 and T2 are in a waiting state, waiting for the release of B and A respectively (shared mode and exclusive mode are incompatible).
1.6. Two-phase Locking Protocol
The two-phase locking protocol guarantees serializability. In this protocol, every transaction issues lock and unlock requests in two phases.
- Growing phase : In this phase, a transaction can obtain new locks but cannot release any lock.
- Shrinking or contracting phase : In this phase, a transaction can release locks but cannot obtain any new lock.
Lock Point : In the growing phase, the point at which a transaction has obtained its final lock is called the lock point.
In the two-phase locking protocol, all transactions can be ordered according to their lock points. Transactions T1 and T2 in Figure 8.9 follow the two-phase locking protocol.
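The two-phase rule can be enforced mechanically: a transaction flips from its growing phase to its shrinking phase at its first unlock, after which any further lock request is a protocol violation. A minimal sketch; the class and attribute names are illustrative:

```python
# Sketch of the two-phase rule: once a transaction releases any lock,
# it enters its shrinking phase and may not acquire more locks.

class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after an unlock")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True   # the growing phase is over
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("A"); t.lock("B")   # growing phase
t.unlock("A")              # lock point passed; shrinking phase begins
try:
    t.lock("C")            # violates the protocol
except RuntimeError as e:
    print(e)               # prints the violation message
```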
- Basic Two-phase Locking Protocol : The technique described above is known as the basic two-phase locking protocol.
Advantages : The advantages of the basic two-phase locking protocol are:
- It ensures serializability.
Disadvantages : The disadvantages of the basic two-phase locking protocol are:
- It may cause deadlock.
- It may cause cascading rollback.
- Strict Two-phase Locking Protocol : In the strict two-phase locking protocol, all exclusive locks are kept until the transaction commits or aborts.
Advantages : The advantages of the strict 2PL protocol are:
- There is no cascading rollback in the strict two-phase locking protocol.
Example. Consider partial schedules S1 (in basic two-phase locking) and S2 (in strict two-phase locking) as shown in Figure 8.11.
In S1, there is a cascading rollback of T4 due to the rollback of T3; but in S2, all exclusive locks are kept till the end, which avoids cascading rollback, as shown in Figure 8.11(a), (b).
- Rigorous Two-phase Locking Protocol : In the rigorous two-phase locking protocol, all locks are kept until the transaction commits. It means both shared and exclusive locks cannot be released till the end of the transaction.
1.7. Lock Conversion
To improve the efficiency of the two-phase locking protocol, lock modes can be converted.
- Upgrade : Conversion of shared mode to exclusive mode is known as upgrade.
- Downgrade : Conversion of exclusive mode to shared mode is known as downgrade.
Consider transaction T5 as shown in Figure 8.12.
In T5, an exclusive lock on A is needed only after some time, so T5 first obtains a shared lock on A and then upgrades it. During this period, any other transaction can also obtain a shared lock on A, which increases efficiency.
2. Graph Based Protocols
In graph-based protocols, additional information is needed for each transaction to know how it will access the data items of the database. These protocols are used where two-phase protocols are not applicable, but they require additional information that is not required in two-phase protocols. In graph-based protocols, a partial ordering is used.
Consider a set of data items D = {d1, d2, …, di, dj, …, dz}. If di → dj, then any transaction T that wants to access both di and dj has to access di first and then dj.
Here, the tree protocol is used, in which D is viewed as a directed acyclic graph known as the database graph.
Rules for tree protocol:
- A transaction T can lock any data item as its first lock.
- After the first lock, if T wants to lock any other data item v, then T must first hold a lock on v's parent.
- T cannot release all exclusive locks before commit or abort.
- Suppose T has obtained a lock on data item v and then released it. Then T cannot obtain any lock on v again.
All schedules under graph-based protocols maintain serializability. Cascadelessness and recoverability are maintained by rule (3), but this reduces concurrency and hence reduces efficiency.
Alternate approach : Here, rule (3) is modified. The modified version is: (3) T can release any lock at any time.
But in the alternate approach there may be cascading rollbacks.
For each data item, if the transaction that performed the last write to it is still uncommitted, then record that transaction.
Commit dependency : A transaction Ti has a commit dependency upon Tj if Ti wants to read an uncommitted data item v and Tj is the most recent transaction that performed a write operation on v.
If any of these transactions aborts, Ti must also be aborted.
Consider the tree structured database graph as shown in Figure 8.13.
Suppose we want to perform the operations
G = G + 10 and H = H – 5
then a possible transaction is as shown in Figure 8.14.
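The tree-protocol rules can be sketched as follows. The parent map below is a hypothetical stand-in for the database graph of Figure 8.13 (the text only tells us that A, B and E lie on the path down to H), and the class name is illustrative:

```python
# Sketch of tree-protocol rules 1, 2, and 4. PARENT is a hypothetical
# stand-in for Figure 8.13's database graph.

PARENT = {"B": "A", "E": "B", "G": "E", "H": "E"}  # child -> parent

class TreeProtocolTxn:
    def __init__(self):
        self.locked = set()       # locks currently held
        self.ever_locked = set()  # rule 4: once released, never re-locked

    def lock(self, item):
        if item in self.ever_locked:
            raise RuntimeError("rule 4: cannot re-lock a released item")
        # Rule 1: the very first lock may be on any data item.
        # Rule 2: afterwards, the item's parent must be locked.
        if self.ever_locked and PARENT.get(item) not in self.locked:
            raise RuntimeError("rule 2: parent must be locked first")
        self.locked.add(item)
        self.ever_locked.add(item)

    def unlock(self, item):
        self.locked.discard(item)

# T6 must lock A, B, and E on the way down to G and H:
t6 = TreeProtocolTxn()
for v in ("A", "B", "E", "G", "H"):
    t6.lock(v)
t6.unlock("A")  # inner locks may be released before the transaction ends
```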
Advantages : The advantages of graph based protocols are:
- There is no deadlock.
- Unlike in two-phase locking, data items can be unlocked at any time, which reduces waiting time.
- More concurrency.
Disadvantages : Data items that are not used in a transaction may still have to be locked, which results in increased locking overhead (e.g., the locking of A, B, and E in order to lock H in T6 of Figure 8.14).
3. Timestamp-based Protocols
In two-phase and graph-based protocols, conflicting transactions are detected at run time. To order them in advance, timestamp-based protocols are used.
Note: (Conflicting transactions are those transactions that need the same data items during execution.)
For each transaction Ti, a unique fixed number, or timestamp, is assigned before it starts execution. It is denoted by TS(Ti). For any two transactions Ti and Tj, either TS(Ti) < TS(Tj) or TS(Ti) > TS(Tj); the timestamps can never be equal. Timestamps can be assigned by using:
- System clock : The system clock time is used as the timestamp of the transaction.
- Logical counter : The value of the counter is used as the timestamp of the transaction. The counter is incremented every time a new timestamp is assigned.
Two timestamps are associated with each data item v of database. These are:
- Write–TS(v) : The write timestamp of a data item v is the timestamp of the most recent transaction that executed write(v) successfully.
- Read–TS(v) : The read timestamp of a data item v is the timestamp of the most recent transaction that executed read(v) successfully.
3.1. The Timestamp-Ordering Protocol
To order conflicting read and write operations use the following rules:
1. For any transaction Ti, to execute read(v):
- If TS(Ti) < write–TS(v), then read(v) is rejected and Ti is rolled back, because Ti tries to read an old value of v which has been overwritten by some other transaction.
- If TS(Ti) ≥ write–TS(v), then read(v) is executed and read–TS(v) is set to the maximum of TS(Ti) and read–TS(v).
2. For any transaction Ti, to execute write(v):
- If TS(Ti) < read-TS(v), then Ti tries to write a value of v which has already been used by another transaction. Write(v) is rejected and Ti is rolled back (to avoid the inconsistent analysis problem).
- If TS(Ti) < write-TS(v), then Ti tries to write an obsolete value of v. So write(v) is rejected and Ti is rolled back (to avoid the lost update problem).
- Otherwise, write(v) is executed and write-TS(v) is set to TS(Ti).
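The read and write rules above can be sketched as two small functions. The names `Item`, `ts_read`, and `ts_write` are illustrative, and a rejected operation (which would roll the transaction back) is modelled by returning False:

```python
# Sketch of the timestamp-ordering rules. Each data item keeps its
# read and write timestamps; False means "reject and roll back".

class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def ts_read(ts, item):
    if ts < item.write_ts:        # would read an overwritten value
        return False              # reject: roll back the transaction
    item.read_ts = max(item.read_ts, ts)
    return True

def ts_write(ts, item):
    if ts < item.read_ts or ts < item.write_ts:
        return False              # reject: roll back the transaction
    item.write_ts = ts
    return True

v = Item()
print(ts_write(5, v))   # True: the write succeeds, write-TS(v) = 5
print(ts_read(3, v))    # False: TS(Ti) < write-TS(v), reader rolls back
```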
Advantages : The advantages of timestamp-based protocols are:
- It maintains serializability.
- It is free from deadlock.
- It is free from cascading rollbacks.
Disadvantages : The disadvantages of timestamp-based protocols are:
- Starvation is possible for long transactions that are forcefully restarted due to conflicting short transactions.
To avoid starvation, conflicting transactions can be blocked for some time to allow the long transactions to complete.
- Schedules are not recoverable.
3.2. Thomas Write Rule
To allow more concurrency in timestamp-based protocols, the timestamp-ordering protocol is slightly modified; the modification is known as the Thomas write rule.
In the Thomas write rule, the rules for the read operation remain unchanged; there is a slight variation in the write operation, as follows:
For any transaction Ti, to execute write(v).
- If TS(Ti) < read-TS(v), the rule is the same as in the timestamp-ordering protocol: write(v) is rejected and Ti is rolled back.
- If TS(Ti) < write-TS(v), then Ti tries to write an obsolete value of v. So there is no need to roll back Ti; the write operation is simply ignored.
Consider Figure 8.15, which shows the schedule for transactions T7 and T8, where TS(T7) < TS(T8).
So, T7 can successfully complete its read(v) operation. Then, T8 can successfully complete its write(v) operation. But when T7 attempts the write(v) operation, it is rejected because TS(T7) < write-TS(v) [write-TS(v) = TS(T8)].
According to the timestamp-ordering protocol, T7 would be rolled back, which is not required, since the value of v that T7 wants to write will never be used:
- Any transaction Ti with TS(Ti) < TS(T8) that wants to read the value of v [read(v)] will be rejected and rolled back, because TS(Ti) < write-TS(v).
- Any transaction Ti with TS(Ti) > TS(T8) that wants to execute read(v) must read the value written by T8, so the value written by T7 can never be read in any case.
So, the write(v) in T7 can simply be ignored or deleted.
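The modified write rule can be sketched as follows. The names are illustrative; `Item` holds the read and write timestamps of v, and the obsolete-write case now returns "ignore" instead of forcing a rollback:

```python
# Sketch of the Thomas write rule: the obsolete-write case is
# silently ignored rather than rolling the transaction back.

class Item:
    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0

def thomas_write(ts, item):
    if ts < item.read_ts:
        return "rollback"   # unchanged: a newer transaction already read v
    if ts < item.write_ts:
        return "ignore"     # obsolete write: skip it, no rollback
    item.write_ts = ts
    return "write"

# The T7/T8 schedule of Figure 8.15, with TS(T7) = 1 < TS(T8) = 2:
v = Item()
v.read_ts = 1    # T7 executed read(v)
v.write_ts = 2   # T8 executed write(v)
print(thomas_write(1, v))  # 'ignore': T7's obsolete write is skipped
```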
4. Validation Based Protocols
Validation-based protocols are used where most transactions are read-only and conflicts are low. In these protocols, every transaction Ti has to go through the following phases, depending upon its type:
- Read Phase : In the read phase, Ti reads all the data items and stores them in temporary variables (local variables of Ti). After reading, all write operations are made on the temporary variables instead of the actual database. Here the changes are local and are not applied to the actual database.
- Validation Phase : In the validation phase, a validation test is performed to determine whether the changes can be copied to the actual database.
- Write Phase : If Ti clears the validation test, then the actual database is updated according to the temporary variables. Here the actual changes are made in the database. The validation test ensures violation-free execution of transactions. To determine when to perform the validation test, timestamps are used. Each transaction Ti is associated with three timestamps. These are:
- Start(Ti) : It gives time when Ti starts execution.
- Validation(Ti) : It gives the time when Ti finishes its read phase and starts its validation phase.
- Finish(Ti) : It gives the time when Ti finishes its execution, or write phase. If any transaction Ti fails the validation test, then it is aborted and rolled back.
To clear the validation test, a transaction Tj must satisfy, for every transaction Ti such that TS(Ti) < TS(Tj), one of the following conditions:
- Finish(Ti) < Start(Tj) : there is no conflict, because the two transactions run in serial order.
- Start(Tj) < Finish(Ti) < Validation(Tj), and the set of data items written by Ti does not intersect the set of data items read by Tj : this condition ensures that the writes of Ti do not affect the read phase of Tj and that the actual writes of Ti and Tj to the database do not overlap.
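The validation test can be sketched as follows. The transaction records and the read/write-set intersection check are illustrative assumptions about how the timestamps and accessed items might be recorded:

```python
# Sketch of the validation test for an earlier Ti against a validating Tj.
# Each transaction is modelled as a dict of timestamps plus its read
# and write sets (assumed representation).

def validate(ti, tj):
    """Return True if Tj passes validation against an earlier Ti."""
    if ti["finish"] < tj["start"]:
        return True                      # serial order: no overlap at all
    if (tj["start"] < ti["finish"] < tj["validation"]
            and not (ti["write_set"] & tj["read_set"])):
        return True                      # Ti's writes don't touch Tj's reads
    return False

# A read-only T9 overlapping an updater T10, as in the Figure 8.16 example:
t9  = {"start": 1, "validation": 4, "finish": 5,
       "read_set": {"A", "B"}, "write_set": set()}
t10 = {"start": 2, "validation": 6, "finish": 7,
       "read_set": {"A", "B"}, "write_set": {"A", "B"}}
print(validate(t9, t10))   # True: T9 writes nothing, so T10 validates
```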
Example. Consider the schedule as shown in Figure 8.16.
Suppose, initially, A = 10 and B = 5. Then T9 displays A + B = 15, because T10 first writes the new values of A and B to its local variables and validates only after T9 has finished its execution.
Advantages : The advantages of validation based protocols are:
- It maintains serializability.
- It is free from cascading rollback.
- It has less overhead than other protocols.
Disadvantages : The disadvantages of validation based protocols are:
- Starvation of long transactions due to conflicting short transactions.
5. Multiversion Concurrency Control Protocols
The concurrency techniques discussed so far suffer from delays for read operations; sometimes the reads are even rejected. To overcome this disadvantage, multiversion protocols are used, which maintain different versions of data items.
In multiversion protocols each write operation write(v) creates a new version of v.
When any transaction wants to read v, the concurrency control manager selects the appropriate version of v for the read(v) operation.
This increases performance, because now there is no need to delay read operations.
5.1. Multiversion Timestamp Ordering
In multiversion techniques, timestamps can be used for ordering transactions. Each transaction Ti is associated with a unique and fixed timestamp TS(Ti).
Each version Vn of a data item V (n is the version number) has three fields:
- Content : This is the value of the data item for that particular version.
- Write-TS(Vn) : This is equal to the timestamp of the transaction that created Vn.
- Read-TS(Vn) : This is equal to the timestamp of most recent transaction that successfully executes read(Vn).
Multiversion timestamp ordering ensures serializability. Consider a transaction Ti, and let Vn be the version of V whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
- Read(Vn) by Ti : The contents of Vn are returned to the read request.
- Write(Vn) by Ti : Ti is rolled back if TS(Ti) < Read–TS(Vn). If TS(Ti) = Write–TS(Vn), the contents of Vn are overwritten; otherwise a new version of V is created with the new contents.
The database does not maintain all versions of V; old versions of V that are no longer needed are removed from the database.
Suppose two versions of V exist, i.e., Vm and Vn, both having write timestamps less than the timestamp of the oldest transaction in the system; then the older of the two versions Vm and Vn is deleted.
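Version selection for reads can be sketched as follows; `Version` and `mv_read` are illustrative names, and a reader simply picks the newest version no younger than itself, so it never waits:

```python
# Sketch of a multiversion read: select the version with the largest
# write timestamp <= TS(Ti) and bump that version's read timestamp.

class Version:
    def __init__(self, content, write_ts):
        self.content = content
        self.write_ts = write_ts
        self.read_ts = write_ts

def mv_read(ts, versions):
    """Return the content of the appropriate version of V for a reader
    with timestamp `ts`, updating that version's read timestamp."""
    vn = max((v for v in versions if v.write_ts <= ts),
             key=lambda v: v.write_ts)
    vn.read_ts = max(vn.read_ts, ts)
    return vn.content

# Versions of V written at timestamps 0, 5, and 9:
versions = [Version(10, 0), Version(20, 5), Version(30, 9)]
print(mv_read(7, versions))   # 20: the version written at ts 5 is the latest <= 7
```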
Advantages : Any read request never waits and never fails.
Disadvantages : The disadvantages of multiversion timestamp ordering are:
- Overhead due to updating the read timestamp on every read request.
- Conflicts between transactions are resolved through rollbacks.
5.2. Multiversion Two-phase Locking
In multiversion two-phase locking, advantages of both multiversion concurrency control techniques and two-phase locking technique are combined. It is also helpful to overcome the disadvantages of multiversion timestamp ordering.
A single timestamp is associated with every version of each data item. Here a timestamp counter (TS-counter) is used instead of the system clock or a logical counter. Whenever a transaction commits, the TS-counter is incremented by 1.
Read-only transactions are handled using multiversion timestamp ordering. These transactions are assigned a timestamp equal to the value of the TS-counter.
Updates are based upon the rigorous two-phase locking protocol.
Source: Gupta Satinder Bal, Mittal Aditya (2017), Introduction to Basic Database Management System, 2nd Edition-University Science Press (2017)