Databases
DBMS Architecture: Transactions and Concurrency Control
References
Transactions and Concurrency Control:
▪ Elmasri and Navathe, "Fundamentals of Database Systems", 7th Ed., Chapters 20, 21, 22.
▪ Coulouris, Dollimore, Kindberg and Blair, "Distributed Systems: Concepts and Design", 5th Ed., © Addison-Wesley 2012, Chapter 16.
DBMS Architecture
[Figure: DBMS architecture. User/software queries and updates reach the Query Compiler, and admin DDL reaches the DDL Compiler; query plans are run by the Execution Engine, which issues index/file/record requests to the Index/File/Record Manager and page requests (read/write pages) to the Buffer Manager and Storage Manager (disks). The Transaction Manager coordinates Logging/Recovery and Concurrency Control (lock table); buffers hold data pages, log pages, metadata, indexes and statistics.]
Modeling Perspective: data operations
■ A database is a collection of named data items
■ Granularity of data: a field, a record, or a whole disk block (the concepts below are independent of the granularity)
■ Any high-level query can be expressed as a sequence of read and write operations:
■ read(X): Reads a database item named X into a program variable. To simplify our notation, we
assume that the program variable is also named X.
■ write(X): Writes the value of program variable X into the database item named X.
Simple Database Model
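As an illustration (not part of the original slides), the read/write model can be sketched in a few lines of Python; the names (SimpleDB, read_item, write_item) are made up for the example.

```python
# Minimal sketch of the read(X)/write(X) model described above.
# All names are illustrative, not a real DBMS API.

class SimpleDB:
    """A database as a collection of named data items."""
    def __init__(self, items):
        self.items = dict(items)          # item name -> stored value

    def read_item(self, name):
        """read(X): copy database item X into a program variable."""
        return self.items[name]

    def write_item(self, name, value):
        """write(X): copy the program variable back into database item X."""
        self.items[name] = value

db = SimpleDB({"X": 100, "Y": 50})
X = db.read_item("X")   # read(X)
X = X - 10              # local computation on the program variable
db.write_item("X", X)   # write(X)
print(db.items)         # {'X': 90, 'Y': 50}
```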
Modeling Perspective: multiuser environment
■ DBMSs are multiuser environments (banks, companies)
■ The OS provides an abstraction layer for multiprogramming, but the concurrent execution of processes is actually interleaved (transparency).
■ (Desirable) side effect: since disk accesses are slow, multiple transactions can be executed concurrently, keeping the CPU always busy.
■ …
Simple Database Model: Environment
■ A Transaction is a logical unit of database processing within an executing program that includes one or more access operations (read = retrieval; write = insert or update; delete).
■ A transaction (set of operations) may be specified in a high level language like SQL and run interactively by an interpreter, or may be embedded within a program.
■ An application program may contain several transactions separated by the Begin and End transaction boundaries.
Simple Database Model: Transactions
Modeling Perspective: multiuser environment
■ …
■ A schedule (or history) S of several concurrent transactions T1,…,Tn is a total ordering of the operations of the Ti that preserves the order within each transaction:
■ given operations Oi and Oj in Tk such that Oi ≼ Oj in Tk, this ordering is preserved in S, too.
Simple Database Model: Schedule
Scheduling: Example (1)
Two schedules of the same transactions T1 (X ← X − N, Y ← Y + N) and T2 (X ← X + M):

A serial schedule (T1; T2):
  T1: read(X), X ← X − N, write(X), read(Y), Y ← Y + N, write(Y)
  T2: read(X), X ← X + M, write(X)

An interleaved schedule, in which T2's read, update and write of X execute between T1's operations on X and its operations on Y:
  T1: read(X), X ← X − N, write(X)
  T2: read(X), X ← X + M, write(X)
  T1: read(Y), Y ← Y + N, write(Y)
cobegin
  transaction T1();
  //
  transaction T2();
coend

For each Ti, begin and end are implicit.
A schedule represents a possible order of execution of the actions in the transactions involved.
Scheduling: Example (2)
The serial schedule (T1; T2) again:
  T1: read(X), X ← X − N, write(X), read(Y), Y ← Y + N, write(Y)
  T2: read(X), X ← X + M, write(X)

A second, non-serial schedule interleaves the same operations; in it, T2's read(X) and its later X ← X + M, write(X) are separated by operations of T1. Is this schedule acceptable?
Two relevant problems
1. Concurrency Control
▪ Since disk accesses are slow, multiple transactions can be performed concurrently, keeping the CPU always busy (see OS scheduling).
▪ Uncontrolled concurrency might lead to unexpected behaviours, e.g. the effect of a transaction can be completely undone by the execution of a subsequent transaction.
2. (Crash) Recovery
▪ A transaction has to be an atomic unit. For recovery purposes, the system needs to keep track of when the transaction starts, when it terminates, and whether it terminates successfully (commit) or halts unexpectedly (abort).
▪ It must ensure that other transactions are not affected by a possible abort, and such an abort must keep the database in a consistent state (e.g., think of a bank transfer).
ACID: Properties of Transactions
These transaction properties ensure both correct concurrent execution and crash recovery:
■ Atomicity: a transaction is an atomic unit of processing;
it is either performed in its entirety or not performed at all.
■ Consistency preservation: a correct execution of the transaction must take the database from one consistent state to another.
■ Isolation: A transaction should not make its updates visible to other transactions until it is committed.
■ Durability or permanency: Once a transaction changes the database and the changes are committed, these
changes must never be lost because of subsequent failure.
Consistency (preservation) and Isolation
■ Consistency must be ensured by the user implementing the transaction.
■ Isolation, with respect to the other competing transactions, is guaranteed by making sure that the outcome of the schedule is the same as that of a serial execution of the transactions.
■ This must be guaranteed even when the executions of the actions which form the transactions are interleaved.
■ However, the DBMS does not guarantee that the transactions are executed in a specific order.
Atomicity and Durability
■ A transaction aborts because of:
1. Local errors at the DBMS level: the DBMS stops the transaction.
2. Exception conditions detected by the transaction itself, which then terminates.
3. A system crash: a hardware, software or network error occurs during the transaction execution, preventing correct access to the database.
■ An unhandled abort may leave the database in an inconsistent state.
■ The DBMS guarantees the transactions' atomicity through a System Log acting as a journal of all the operations that have been made.
■ On an abort, the DBMS checks the System Log and rolls the DB back to the last consistent state prior to the aborted transaction (recovery).
■ Log files also guarantee durability.
Consistency preservation for Transactions
■ Example: the developer has defined some integrity constraints that the database must preserve (e.g., "the sum of the wages has to be less than the available budget").
■ A transaction:
▪ always starts in a consistent state (satisfying all the integrity constraints)
▪ may pass through inconsistent intermediate states
▪ must terminate successfully (commit) in a consistent state
[Figure: the space of DB states (S1…S7), divided into consistent states and states with constraint violations.]
Concurrency Control
Transaction: a statechart
■ Each transaction must terminate with one of the two following actions:
▪ commit (successful termination: the updates are stored), or…
▪ abort (faulty termination: the performed actions are rolled back)
Schedule: more definitions (1)
A schedule S of transactions T1,…,Tn:
▪ is complete if the operations in S are exactly those of T1,…,Tn, including a commit or abort operation as the last operation of each Ti;
▪ is serial if, for each Ti, all of its operations are executed consecutively (only one transaction at a time is active);
▪ is serializable if it has the same final state as a complete serial schedule in which all the transactions commit.
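A small illustrative sketch of these definitions; the representation of a schedule as a list of (transaction, operation, item) steps and the helper names are assumptions of the example, not part of the slides.

```python
# Sketch: a schedule as a list of (transaction, operation, item) steps,
# plus the "complete" and "serial" checks defined above.

def is_complete(schedule, transactions):
    """Every transaction appears and ends with a commit or abort operation."""
    for t in transactions:
        ops = [op for (txn, op, _) in schedule if txn == t]
        if not ops or ops[-1] not in ("commit", "abort"):
            return False
    return True

def is_serial(schedule):
    """Once a transaction stops appearing, it never appears again."""
    current, finished = None, set()
    for (txn, _, _) in schedule:
        if txn in finished:
            return False
        if txn != current:
            if current is not None:
                finished.add(current)
            current = txn
    return True

S = [("T1", "read", "X"), ("T1", "write", "X"), ("T1", "commit", None),
     ("T2", "read", "X"), ("T2", "write", "X"), ("T2", "commit", None)]
print(is_complete(S, {"T1", "T2"}), is_serial(S))   # True True
```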
Non-Serial Scheduling

T1 computes X ← X / N; T2 computes X ← X + M. Three schedules, three different results:

Serial schedule T1;T2:
  T1: read(X), X ← X / N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
  Result: X = X/N + M

Serial schedule T2;T1:
  T2: read(X), X ← X + M, write(X), commit
  T1: read(X), X ← X / N, write(X), commit
  Result: X = (X + M)/N

Non-serial schedule (T2 reads X before T1 writes it, and T2's write is applied last):
  T1: read(X), X ← X / N
  T2: read(X), X ← X + M
  T1: write(X), commit
  T2: write(X), commit
  Result: X = X + M (T1's update is lost)
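The three results can be reproduced with a toy simulation; everything below (the run helper and the concrete values X = 100, N = 2, M = 10) is illustrative only.

```python
# Sketch: replaying the three schedules above on a toy data store.

def run(schedule, X0, N, M):
    db = {"X": X0}
    local = {}                      # per-transaction program variables
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "update":
            local[txn] = local[txn] / N if txn == "T1" else local[txn] + M
        elif op == "write":
            db["X"] = local[txn]
    return db["X"]

T1 = [("T1", "read"), ("T1", "update"), ("T1", "write")]
T2 = [("T2", "read"), ("T2", "update"), ("T2", "write")]

serial_T1_T2 = T1 + T2
serial_T2_T1 = T2 + T1
lost_update  = [("T1", "read"), ("T2", "read"),        # both read the original X
                ("T1", "update"), ("T1", "write"),
                ("T2", "update"), ("T2", "write")]      # T2 overwrites T1's update

X, N, M = 100, 2, 10
print(run(serial_T1_T2, X, N, M))   # X/N + M      -> 60.0
print(run(serial_T2_T1, X, N, M))   # (X + M) / N  -> 55.0
print(run(lost_update,  X, N, M))   # X + M        -> 110 (T1's update is lost)
```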
Conflict
Two operations in a schedule are said to conflict if they satisfy all three of the following conditions (potentially leaving the database in an inconsistent state):
▪ they belong to different transactions
▪ they access the same item
▪ at least one operation is a write.
Depending on when the read and the write operation occur, we could have three distinct types of conflicts:
▪ Write-Read conflict
▪ Read-Write conflict
▪ Write-Write conflict
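Conflicts are also the basis of the classic precedence-graph test for conflict serializability. The sketch below (illustrative names, not taken from the slides) builds the graph from a schedule and reports whether the schedule is conflict-serializable.

```python
# Sketch: precedence graph and cycle check for conflict serializability.
from itertools import combinations

def precedence_graph(schedule):
    """Edge Ti -> Tj whenever an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for (ti, ai, xi), (tj, aj, xj) in combinations(schedule, 2):
        if ti != tj and xi == xj and "w" in (ai, aj):
            edges.add((ti, tj))
    return edges

def has_cycle(edges):
    nodes = {n for e in edges for n in e}
    colour = {n: 0 for n in nodes}            # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(n):
        colour[n] = 1
        for a, b in edges:
            if a == n and (colour[b] == 1 or (colour[b] == 0 and dfs(b))):
                return True
        colour[n] = 2
        return False
    return any(colour[n] == 0 and dfs(n) for n in nodes)

# Lost-update interleaving: both transactions read X before either writes it.
S = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"), ("T2", "w", "X")]
g = precedence_graph(S)                       # {('T1', 'T2'), ('T2', 'T1')}
print("conflict-serializable:", not has_cycle(g))   # False: the graph has a cycle
```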
Write-Read Conflict
The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A schedule with a write-read conflict: T2 reads (and writes, and commits) X after T1 has written it but before T1 commits:
  T1: read(X), X ← X − N, write(X)
  T2: read(X), X ← X + M, write(X), commit
  T1: commit
A transaction reads a value that has just been updated by a transaction that has not committed yet (e.g., T2 reads a value written by T1, which could still abort later on; thus T2's commit could bring the database into an inconsistent state).
Read-Write Conflict
The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A schedule with a read-write conflict: T2 reads X before T1 changes and commits it:
  T2: read(X)
  T1: read(X), X ← X − N, write(X), commit
  T2: X ← X + M, write(X), commit
A transaction reads a value before it is changed by another transaction (e.g., T2 reads a value before T1 commits its change; thus T2's committed update is based on an "old" value).
Write-Write Conflict (1)
An interleaved schedule in which both transactions read X and compute their updates before either write(X) is applied, so the final value of X depends on which write is executed last:
  T1: read(X), X ← X − N
  T2: read(X), X ← X + M
  T1: write(X), commit
  T2: write(X), commit
Depending on the schedule, the two write operations may occur in different orders, thus producing (two) possible different DB states.
For comparison, the serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
Write-Write Conflict (2)
Two interleavings of the same operations in which the two write(X) operations are applied in different orders, yielding two different final database states:
(a) T1's write(X) is applied last:
  T1: read(X), X ← X − N
  T2: read(X), X ← X + M, write(X), commit
  T1: write(X), commit
(b) T2's write(X) is applied last:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
A problem: transaction abort
The serial schedule T1;T2 (both transactions commit):
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A schedule in which T1 aborts after T2 has already read (and committed on) T1's uncommitted write:
  T1: read(X), X ← X − N, write(X)
  T2: read(X), X ← X + M, write(X), commit
  T1: abort
One transaction reads values coming from another transaction that later on aborts (write-read conflict).
T2 should be aborted alongside T1 but, if we do so, we violate the durability property of the already committed T2. All the actions performed/caused by T1 should be rolled back.
Violations
■ Lost Update: two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect (Write-Write Conflict).
■ Dirty Read (or Temporary Update): reading a value that has been updated by a transaction that has not committed yet and that hence could abort later on (Write-Read Conflict).
■ Nonrepeatable Read: a transaction T1 reads a value that is then updated by T2 before T1 completes, so a subsequent read by T1 returns a different value (Read-Write Conflict).
■ Phantom (or Inconsistent Read): subsequent retrievals of the same set of (unchanged) objects by the same transaction give different results.
Lost Update (Write-Write)
T2 reads X before T1's write reaches the database, and both then write X:
  T1: read(X), X ← X − N
  T2: read(X), X ← X + M
  T1: write(X), commit
  T2: write(X), commit
T2 reads X before T1 changes it in the database. The update from T1 is lost.
For comparison, the serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
Dirty Read (Write-Read)
The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A dirty read: T2 reads X after T1 has written it, and before T1 aborts:
  T1: read(X), X ← X − N, write(X)
  T2: read(X)
  T1: abort
  T2: ⋮
T1 updates X but fails before completing. The DB has to roll X back to its previous value; before such rollback, T2 reads the updated value and successfully commits its operation.
Nonrepeatable Read (Read-Write)

The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A nonrepeatable read: T2 reads X, then T1 updates X and commits, so any subsequent read of X by T2 returns a different value:
  T2: read(X)
  T1: read(X), X ← X − N, write(X), commit
  T2: ⋮
■ Interleaving is necessary, but not all the possible schedules can be allowed.
■ Some actions must be rolled back when transactions are aborted.
■ Which checks have to be carried out in order to avoid violations and conflicts?
Concurrency and Interleaving
Transaction support in SQL
■ Each SQL transaction has (at least) these three features:
■ Access mode:
▪ READ ONLY is used only for data retrieval
▪ READ WRITE allows SELECT, UPDATE, INSERT and DELETE statements
■ Diagnostic area size: the number of feedback conditions held simultaneously
■ Isolation level: defines how concurrent transactions interact with each other while performing operations over the database.
■ Setting the transactions’ options in SQL:
EXEC SQL SET TRANSACTION READ WRITE
DIAGNOSTIC SIZE 5
ISOLATION LEVEL SERIALIZABLE
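For illustration, the isolation level can also be set from application code through a database driver. The sketch below assumes a PostgreSQL database accessed with the psycopg2 driver and a hypothetical accounts table; adapt driver, names and credentials to your environment.

```python
# Hypothetical example: setting the isolation level from Python code.
import psycopg2

conn = psycopg2.connect(dbname="bank", user="app", password="secret")
cur = conn.cursor()

# psycopg2 opens a transaction implicitly at the first statement,
# so SET TRANSACTION applies to the transaction started here.
cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
conn.commit()      # make the transfer durable
conn.close()
```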
Shared and Exclusive Locks
▪ Locks are used to implement different isolation levels.
▪ Locks are defined over single objects.
▪ eXclusive locks, X(A):
▪ If an object A is exclusively locked, shared locks cannot be obtained.
▪ If an object A is exclusively locked, other exclusive locks cannot be obtained.
▪ Shared locks, S(A):
▪ Multiple shared locks can co-exist.
▪ If one or more shared locks over one object A already exist, exclusive locks cannot be obtained.
Example sequences of lock requests and operations:
T1: S(A), R(A), X(A), W(A), X(B), R(B), W(B), commit
T2: X(A), R(A), W(A), X(B), R(B), W(B), commit
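A minimal sketch of the S/X compatibility rules for a single object (the class and method names are made up for this example):

```python
# Sketch: shared/exclusive lock compatibility for a single object A.

class Lock:
    def __init__(self):
        self.shared_by = set()     # transactions holding S locks
        self.exclusive = None      # transaction holding the X lock, if any

    def acquire_shared(self, txn):
        # S is compatible with other S locks, but not with an X lock.
        if self.exclusive is not None and self.exclusive != txn:
            return False
        self.shared_by.add(txn)
        return True

    def acquire_exclusive(self, txn):
        # X is incompatible with any lock held by a different transaction.
        if (self.shared_by - {txn}) or self.exclusive not in (None, txn):
            return False
        self.exclusive = txn
        return True

A = Lock()
print(A.acquire_shared("T1"))      # True  (S lock granted)
print(A.acquire_shared("T2"))      # True  (S locks can co-exist)
print(A.acquire_exclusive("T2"))   # False (T1 still holds an S lock on A)
```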
Isolation Level (1)
■ Serializable: the transaction has to obtain a lock when reading or modifying each object of the DB (and keeps the locks until it terminates); table-level locks are also taken while performing table scans.
■ Repeatable read: the transaction has to obtain a lock when reading or modifying each object of the DB (and keeps the locks until it terminates); table-level locks are not taken.
■ Read Committed: the transaction requests exclusive locks before updating data and keeps them until it terminates; it requests shared locks for read operations, which are released right after the read.
■ Read Uncommitted: the transaction requests exclusive locks before updating data, but requests no locks at all for read operations.
Isolation Level (2)

Isolation Level    | Lost Update | Dirty Read | Unrepeatable Read | Phantom
-------------------+-------------+------------+-------------------+--------
READ UNCOMMITTED   | YES         | YES        | YES               | YES
READ COMMITTED     | YES         | NO         | YES               | YES
REPEATABLE READ    | NO          | NO         | NO                | YES
SERIALIZABLE       | NO          | NO         | NO                | NO
Concurrency Control Techniques
■ Different approaches can be adopted:
■ Force a serializable schedule by avoiding conflicts (conflict serializability) through data-locking protocols.
■ Run all the transactions concurrently, checking for conflicts before committing.
■ Assign a timestamp to transactions that have either written or read objects, and compare these values to determine how the operations have to be ordered in the schedule.
Concurrency Control Techniques: Locks
■ The DBMS must ensure that there are no conflicts and that aborted transactions trigger rollbacks.
■ Strict two-phase locking (Strict 2PL):
▪ Writes use eXclusive locks, reads use Shared locks.
▪ The transaction releases its eXclusive locks only at commit time.
■ Strict 2PL guarantees serializability:
▪ If transactions handle different objects, they can be freely interleaved.
▪ If at least two transactions want to handle the same object, then those transactions have to be run serially.
■ The lock manager tracks, for each object within the DB, the eXclusive and Shared locks through a lock table.
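A sketch of a strict-2PL-style lock manager with a lock table (illustrative and single-threaded; a real lock manager would queue/block waiting transactions rather than just return False):

```python
# Sketch: locks are requested before each read/write and released only at commit/abort.

class LockManager:
    def __init__(self):
        self.table = {}                    # object -> {"S": set of txns, "X": txn or None}

    def _entry(self, obj):
        return self.table.setdefault(obj, {"S": set(), "X": None})

    def lock_shared(self, txn, obj):
        e = self._entry(obj)
        if e["X"] not in (None, txn):
            return False                   # another txn holds X: caller must wait
        e["S"].add(txn)
        return True

    def lock_exclusive(self, txn, obj):
        e = self._entry(obj)
        if (e["S"] - {txn}) or e["X"] not in (None, txn):
            return False
        e["X"] = txn
        return True

    def release_all(self, txn):
        """Called at commit or abort time: strict 2PL releases everything at once."""
        for e in self.table.values():
            e["S"].discard(txn)
            if e["X"] == txn:
                e["X"] = None

lm = LockManager()
print(lm.lock_exclusive("T1", "A"))   # True: T1 may write A
print(lm.lock_shared("T2", "A"))      # False: T2 must wait until T1 commits
lm.release_all("T1")                  # T1 commits
print(lm.lock_shared("T2", "A"))      # True
```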
Deadlock Prevention
■ Locks may cause deadlocks, which must be either prevented or detected and resolved.
■ Common DBMSs usually assign each transaction a priority P(Ti) depending on the transaction start time (e.g., with a timestamp TS, we have P(Ti) = −TS(Ti)).
■ If Ti asks for a lock owned by Tj, the lock manager can implement one of these two policies:
▪ Wait-Die (an older transaction is allowed to wait for a younger one):
▪ P(Ti) > P(Tj) ⇒ Ti.wait();
▪ otherwise ⇒ Ti.abort(); Ti.run();
▪ Wound-Wait (a younger transaction is allowed to wait for an older one):
▪ P(Ti) > P(Tj) ⇒ Tj.abort(); Tj.run();
▪ otherwise ⇒ Ti.wait();
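The two policies can be sketched as pure functions of the two timestamps (illustrative only; following the slide's convention P(Ti) = −TS(Ti), a smaller timestamp means an older, higher-priority transaction):

```python
# Sketch: wait-die vs. wound-wait when Ti (requester) asks for a lock held by Tj (holder).

def wait_die(requester_ts, holder_ts):
    if requester_ts < holder_ts:          # Ti is older -> higher priority
        return "Ti waits"
    return "Ti aborts and is restarted (dies)"

def wound_wait(requester_ts, holder_ts):
    if requester_ts < holder_ts:          # Ti is older -> higher priority
        return "Tj aborts and is restarted (is wounded)"
    return "Ti waits"

# T1 started at time 10 (older), T2 at time 20 (younger).
print(wait_die(10, 20))     # older requester waits for the younger holder
print(wait_die(20, 10))     # younger requester dies
print(wound_wait(10, 20))   # older requester wounds the younger holder
print(wound_wait(20, 10))   # younger requester waits
```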
Concurrency Control: Deadlock Detection
■ If deadlocks are infrequent, we can choose to detect and resolve them, instead of implementing deadlock prevention, by checking whether a deadlock actually exists. There are two possible approaches:
▪ The lock manager maintains a waits-for graph to identify deadlock cycles. From time to time the graph is analysed, and deadlock cycles are resolved by aborting some transactions.
▪ If a transaction waits for a period longer than a given timeout, the system assumes that the transaction is deadlocked and aborts it.
Concurrency Control: Timestamp ordering
■ Each transaction Ti has a start-time timestamp, TS(Ti).
■ Conflicting operations must execute in timestamp order: an operation ai run by Ti may precede an operation aj run by Tj (✓) when the following conditions hold simultaneously:
▪ ai conflicts with aj run by Tj
▪ TS(Ti) < TS(Tj)
■ Otherwise Ti is aborted and run again with a greater TS.
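A minimal sketch of the basic timestamp-ordering rule, using per-item read/write timestamps; this is a standard formulation assumed for the example, not taken verbatim from the slides, and it ignores refinements such as the Thomas write rule.

```python
# Sketch: basic timestamp ordering for a single item.

class Item:
    def __init__(self):
        self.read_ts = 0     # largest TS of a transaction that read the item
        self.write_ts = 0    # largest TS of a transaction that wrote the item

def read(item, ts):
    if ts < item.write_ts:                 # a younger transaction already wrote it
        return "abort and restart with a new TS"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "abort and restart with a new TS"
    item.write_ts = ts
    return "ok"

X = Item()
print(read(X, ts=5))    # ok
print(write(X, ts=3))   # abort: an older write would arrive after a younger read
print(write(X, ts=7))   # ok
print(read(X, ts=6))    # abort: X was already written by a younger transaction
```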
Optimistic Concurrency Control: Validation-Based
■ The locking-based protocols adopt a pessimistic approach, preventing conflicts from occurring.
■ The optimistic approach assumes that transactions do not conflict (or that they rarely do).
■ A validation phase checks whether any of the transaction's updates violate serializability; if so, the transaction is aborted and then restarted later. Such a concurrency protocol has the following phases:
■ Read phase: a transaction can read values of committed data items.
However, updates are applied only to local copies (versions) of the data items (in the database cache).
■ Validation phase: when the transaction wants to commit, the DBMS checks that there are no conflicts. If there are conflicts, the transaction is aborted.
■ Write phase: On a successful validation, transactions’ updates are
applied to the database; otherwise, aborted transactions are restarted.
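A sketch of the validation step under a backward-validation assumption (the transaction's read set is checked against the write sets of transactions that committed while it was running); the names and sets are illustrative only.

```python
# Sketch: validation phase of an optimistic protocol (backward validation).

def validate(txn_read_set, overlapping_committed_write_sets):
    for ws in overlapping_committed_write_sets:
        if txn_read_set & ws:          # it read something rewritten in the meantime
            return False               # abort and restart
    return True                        # proceed to the write phase

T3_reads = {"X", "Y"}
committed_meanwhile = [{"Z"}, {"Y", "W"}]   # write sets of txns that committed during T3
print(validate(T3_reads, committed_meanwhile))   # False: Y was overwritten, T3 restarts
```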
Crash Recovery
Database Recovery
A DBMS recovery manager must ensure:
■ Atomicity: the operations performed by non-committed transactions are rolled back.
■ Persistency: the operations performed by committed transactions must survive system crashes.
■ A log (or journal) is maintained to keep track of all the transactions' operations that affect the "values" of the database.
■ The log is composed of log records. Each record has a unique identifier, the log sequence number (LSN), which is generated automatically and is monotonically increasing with respect to the operation time.
■ All the log records share some common fields:
■ Type: the type of the record
■ transID: the id of the transaction that performed the action described in Type.
■ prevLSN: the LSN of the previous log record belonging to the same transID
■ A log is a sequential, append-only file kept on disk; it is used to recover from transaction failures.
■ The log buffer (log tail) holds the last part of the log file, so that log entries are first appended to the log buffer in main memory.
The System Log
(Elmasri, Navathe)
■ [start_transaction,T]: Records that transaction T has started execution.
■ Page Update
■ [write_item,T,X,old_value,new_value]: Records that
transaction T has changed the value of database item X from old_value to new_value. It is written before the actual write operation on the DB.
■ [read_item,T,X]: Records that transaction T has read the value of database item X. Note that such records are not required by practical recovery protocols, but may be written for auditing purposes.
■ [commit,T]: Records that transaction T has completed successfully, and affirms that its effect can be committed (recorded permanently) to the database. The log buffer is written on the log file.
■ [abort,T]: Records that transaction T has been aborted.
The System Log: Types of Record Entries (1)
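An illustrative sketch of such log records in code (the field and function names are made up; a real log is an append-only file on disk, not a Python list):

```python
# Sketch: log records with the fields described above (LSN, type, transID, prevLSN, data).
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_lsn = count(1)                              # monotonically increasing LSNs

@dataclass
class LogRecord:
    rec_type: str                                 # start_transaction / write_item / commit / abort
    trans_id: str
    prev_lsn: Optional[int]                       # previous LSN of the same transaction
    data: dict = field(default_factory=dict)      # e.g. item, old_value, new_value
    lsn: int = field(default_factory=lambda: next(_next_lsn))

log, last_lsn = [], {}                            # the log tail and per-transaction last LSN

def append(rec_type, trans_id, **data):
    rec = LogRecord(rec_type, trans_id, last_lsn.get(trans_id), data)
    last_lsn[trans_id] = rec.lsn
    log.append(rec)

append("start_transaction", "T1")
append("write_item", "T1", item="X", old_value=100, new_value=90)
append("commit", "T1")
for r in log:
    print(r.lsn, r.rec_type, r.trans_id, r.prev_lsn, r.data)
```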
The System Log: Example
[Figure: (a) an example of some transaction codes; (b) a schedule of T1, T2 and T3; (c) the log file for the schedule in (b).]
Depending on the specific policies, other record types could be written:
■ End: it is mandatory when, after the commit or abort operation, some finalization operations have to be carried out.
■ Undo Page Update: when a transaction aborts, all of its updates have to be undone. For each undo operation, a Compensation Log Record (CLR) is written to the log.
The System Log: Types of Record Entries (2)
■ WAL is used with in-place updates (immediate or deferred): the appropriate log records must be permanently recorded in the log on disk before the changes are applied to the database (this ensures persistency).
■ The transaction is actually committed only when all of its log records have been completely and permanently written to disk (e.g., flushed from the log buffer).
■ WAL states that:
▪ The before-image of an item cannot be overwritten by its after-image in the database on disk until all UNDO log entries have been force-written to disk.
▪ The commit operation of a transaction cannot be completed until all the REDO and UNDO log records of that transaction are written to disk.
■ The actual number of log pages to be written at the end of a transaction is far lower than the number of data pages, since the log records do not contain entire tuples but only the pieces of data that have been updated.
Write-Ahead Logging (WAL) Protocol
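A sketch of the first WAL rule: the log must be force-written up to and including the record of a page's last update before the data page itself goes to disk (the pageLSN/flushedLSN bookkeeping below is a simplified assumption of the example).

```python
# Sketch: enforcing "log before data page" with a flushedLSN watermark.

flushed_lsn = 0          # highest LSN durably written to the log file on disk
log_buffer = []          # (lsn, record) pairs still in main memory (the log tail)

def flush_log(up_to_lsn):
    """Force-write the log tail to disk up to the given LSN."""
    global flushed_lsn
    log_buffer[:] = [(lsn, rec) for (lsn, rec) in log_buffer if lsn > up_to_lsn]
    flushed_lsn = max(flushed_lsn, up_to_lsn)

def write_page_to_disk(page_lsn):
    """WAL check: the log must cover the page's last update before the page goes out."""
    if page_lsn > flushed_lsn:
        flush_log(page_lsn)            # force-write the log first
    print(f"page with pageLSN={page_lsn} written to disk (flushedLSN={flushed_lsn})")

log_buffer.extend([(1, "[start_transaction,T1]"), (2, "[write_item,T1,X,100,90]")])
write_page_to_disk(page_lsn=2)         # forces LSNs 1-2 to disk before the data page
```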
ARIES (1)
▪ ARIES is an acronym for Algorithms for Recovery and Isolation Exploiting Semantics.
▪ ARIES is based on three concepts:
▪ Write-ahead logging (WAL): each update of a DB object has first to be written to the log file before it is actually applied to the DB.
▪ Repeating history (during Redo): all the actions of the database system prior to the crash are retraced, and the DB is brought back to the state it was in when the crash occurred. Transactions that were active at the time of the crash are then undone.
▪ Logging (changes) during Undo: this prevents repeating the already completed undo operations if a failure occurs during recovery and causes the recovery process to restart.
ARIES (2)
▪ ARIES defines a recovery algorithm which is run by the recovery manager after a system crash. There are three phases:
1. Analysis phase: identifies the "dirty" (updated) pages in the buffer and the set of transactions active at the time of the crash.
2. REDO phase (recovery): reapplies only the necessary updates from the log to the database. This means that updates that were already applied to disk are not re-applied.
3. UNDO phase: the log is scanned backward and the operations of the transactions that were active at the time of the crash but had not committed are undone, in reverse order.
ARIES (3)
In more detail:
1. Analysis phase:
a. Determine from which point in the log file the redo phase must start.
b. Determine which buffer pool pages contain modified data that had not yet been written to disk at the time of the crash.
c. Identify the transactions that were running at the time of the crash.
2. Redo phase: re-apply each update record or CLR in the log, starting from the record whose LSN is associated with the oldest dirty page in the buffer pool.
3. Undo phase: all the transactions identified in 1c must be undone.
■ A checkpoint is a snapshot of a DBMS state. By periodically creating those checkpoints, a DBMS could reduce the crash recovery time.
■ A master log record is maintained separately, in stable storage, to store the LSN of the latest checkpoint record that made it to disk.
■ ARIES creates checkpoints in three steps:
– A begin-checkpoint record is written to the log.
– An end-checkpoint record, containing both the transaction table and the dirty page table, is written to the log.
– After the end-checkpoint record is permanently written, the master log record is updated with the LSN of the begin-checkpoint record.
■ ARIES uses fuzzy checkpoints: checkpointing does not halt the DBMS and does not require writing out the buffer pool pages, hence it is inexpensive in terms of performance.
ARIES: Checkpointing