Databases
DBMS Architecture: Transactions and Concurrency Control
References
Transactions and Concurrency Control:
▪ Elmasri and Navathe, "Fundamentals of Database Systems", 7th Ed., Chapters 20, 21, 22.
▪ Coulouris, Dollimore, Kindberg and Blair, "Distributed Systems: Concepts and Design", 5th Ed., © Addison-Wesley 2012, Chapter 16.
DBMS Architecture
[Figure: DBMS architecture. User/software queries and updates reach the Query Compiler, and admin DDL reaches the DDL Compiler; query plans are run by the Execution Engine, which issues index/file/record requests to the Index/File/Record Manager and page requests (read/write pages) to the Buffer Manager and Storage Manager (disks). The Transaction Manager coordinates Logging/Recovery and Concurrency Control (lock table); buffers hold data pages, log pages, metadata, indexes and statistics.]
Modeling Perspective: data operations
■ A database is a collection of named data items
■ Granularity of data: a field, a record, or a whole disk block (the concepts below are independent of the granularity)
■ Any high-level query can be expressed as a sequence of read and write operations:
■ read(X): Reads a database item named X into a program variable. To simplify our notation, we
assume that the program variable is also named X.
■ write(X): Writes the value of program variable X into the database item named X.
Simple Database Model
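As an illustration (not part of the original slides), the read/write model can be sketched in a few lines of Python; the names (SimpleDB, read_item, write_item) are made up for the example.

```python
# Minimal sketch of the read(X)/write(X) model described above.
# All names are illustrative, not a real DBMS API.

class SimpleDB:
    """A database as a collection of named data items."""
    def __init__(self, items):
        self.items = dict(items)          # item name -> stored value

    def read_item(self, name):
        """read(X): copy database item X into a program variable."""
        return self.items[name]

    def write_item(self, name, value):
        """write(X): copy the program variable back into database item X."""
        self.items[name] = value

db = SimpleDB({"X": 100, "Y": 50})
X = db.read_item("X")   # read(X)
X = X - 10              # local computation on the program variable
db.write_item("X", X)   # write(X)
print(db.items)         # {'X': 90, 'Y': 50}
```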
Modeling Perspective: multiuser environment
■ DBMSs are multiuser environments (banks, companies)
■ The OS provides an abstraction layer for multiprogramming, but the concurrent execution of processes is actually interleaved (transparency).
■ (Desirable) side effect: since disk accesses are slow, multiple transactions can be executed concurrently, keeping the CPU always busy.
■ …
Simple Database Model: Environment
■ A Transaction is a logical unit of database processing within an executing program that includes one or more access operations (read = retrieval; write = insert or update; delete).
■ A transaction (set of operations) may be specified in a high level language like SQL and run interactively by an interpreter, or may be embedded within a program.
■ An application program may contain several transactions separated by the Begin and End transaction boundaries.
Simple Database Model: Transactions
Modeling Perspective: multiuser environment
■ …
■ A schedule (or history) S of several concurrent transactions T1,…,Tn is a total ordering of the operations of the Ti that preserves the order within each transaction:
■ given operations Oi and Oj in Tk such that Oi ≼ Oj in Tk, this ordering is preserved in S, too.
Simple Database Model: Schedule
Scheduling: Example (1)
Two schedules of the same transactions T1 (X ← X − N, Y ← Y + N) and T2 (X ← X + M):

A serial schedule (T1; T2):
  T1: read(X), X ← X − N, write(X), read(Y), Y ← Y + N, write(Y)
  T2: read(X), X ← X + M, write(X)

An interleaved schedule, in which T2's read, update and write of X execute between T1's operations on X and its operations on Y:
  T1: read(X), X ← X − N, write(X)
  T2: read(X), X ← X + M, write(X)
  T1: read(Y), Y ← Y + N, write(Y)
cobegin
  transaction T1();
  //
  transaction T2();
coend

For each Ti, begin and end are implicit.
A schedule represents a possible order of execution of the actions in the transactions involved.
Scheduling: Example (2)
The serial schedule (T1; T2) again:
  T1: read(X), X ← X − N, write(X), read(Y), Y ← Y + N, write(Y)
  T2: read(X), X ← X + M, write(X)

A second, non-serial schedule interleaves the same operations; in it, T2's read(X) and its later X ← X + M, write(X) are separated by operations of T1. Is this schedule acceptable?
Two relevant problems
1. Concurrency Control
▪ Since disk accesses are slow, multiple transactions can be performed concurrently, keeping the CPU always busy (see OS scheduling).
▪ Uncontrolled concurrency might lead to unexpected behaviours, e.g. the effect of a transaction can be completely undone by the execution of a subsequent transaction.
2. (Crash) Recovery
▪ A transaction has to be an atomic unit. For recovery purposes, the system needs to keep track of when the transaction starts, when it terminates, and whether it terminates successfully (commit) or halts unexpectedly (abort).
▪ It must ensure that other transactions are not affected by a possible abort, and such an abort must keep the database in a consistent state (e.g., think of a bank transfer).
ACID: Properties of Transactions
These transaction properties ensure both correct concurrent execution and crash recovery:
■ Atomicity: a transaction is an atomic unit of processing;
it is either performed in its entirety or not performed at all.
■ Consistency preservation: a correct execution of the transaction must take the database from one consistent state to another.
■ Isolation: A transaction should not make its updates visible to other transactions until it is committed.
■ Durability or permanency: Once a transaction changes the database and the changes are committed, these
changes must never be lost because of subsequent failure.
Consistency (preservation) and Isolation
■ Consistency must be ensured by the user implementing the transaction.
■ Isolation, with respect to the other competing transactions, is guaranteed by making sure that the outcome of the schedule is the same as that of a serial execution of the transactions.
■ This must be guaranteed even when the executions of the actions which form the transactions are interleaved.
■ However, the DBMS does not guarantee that the transactions are executed in a specific order.
Atomicity and Durability
■ A transaction aborts because of:
1. Local errors at the DBMS level: the DBMS stops the transaction.
2. Exception conditions detected by the transaction itself, which then terminates.
3. A system crash: a hardware, software or network error occurs during the transaction execution, preventing correct access to the database.
■ An unhandled abort may leave the database in an inconsistent state.
■ The DBMS guarantees the transactions' atomicity through a System Log acting as a journal of all the operations that have been made.
■ On an abort, the DBMS checks the System Log and rolls the DB back to the last consistent state prior to the aborted transaction (recovery).
■ Log files also guarantee durability.
Consistency preservation for Transactions
■ Example: the developer has defined some integrity constraints that the database must preserve (e.g., "the sum of the wages has to be less than the available budget").
■ A transaction:
▪ always starts in a consistent state (satisfying all the integrity constraints)
▪ may pass through inconsistent intermediate states
▪ must terminate successfully (commit) in a consistent state
[Figure: the space of DB states (S1…S7), divided into consistent states and states with constraint violations.]
Concurrency Control
Transaction: a statechart
■ Each transaction must terminate with one of the two following actions:
▪ commit (successful termination: the updates are stored), or…
▪ abort (faulty termination: the performed actions are rolled back)
Schedule: more definitions (1)
A schedule S of transactions T1,…,Tn:
▪ is complete if the operations in S are exactly those of T1,…,Tn, including a commit or abort operation as the last operation of each Ti;
▪ is serial if, for each Ti, all of its operations are executed consecutively (only one transaction at a time is active);
▪ is serializable if it has the same final state as a complete serial schedule in which all the transactions commit.
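A small illustrative sketch of these definitions; the representation of a schedule as a list of (transaction, operation, item) steps and the helper names are assumptions of the example, not part of the slides.

```python
# Sketch: a schedule as a list of (transaction, operation, item) steps,
# plus the "complete" and "serial" checks defined above.

def is_complete(schedule, transactions):
    """Every transaction appears and ends with a commit or abort operation."""
    for t in transactions:
        ops = [op for (txn, op, _) in schedule if txn == t]
        if not ops or ops[-1] not in ("commit", "abort"):
            return False
    return True

def is_serial(schedule):
    """Once a transaction stops appearing, it never appears again."""
    current, finished = None, set()
    for (txn, _, _) in schedule:
        if txn in finished:
            return False
        if txn != current:
            if current is not None:
                finished.add(current)
            current = txn
    return True

S = [("T1", "read", "X"), ("T1", "write", "X"), ("T1", "commit", None),
     ("T2", "read", "X"), ("T2", "write", "X"), ("T2", "commit", None)]
print(is_complete(S, {"T1", "T2"}), is_serial(S))   # True True
```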
Non-Serial Scheduling

T1 computes X ← X / N; T2 computes X ← X + M. Three schedules, three different results:

Serial schedule T1;T2:
  T1: read(X), X ← X / N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
  Result: X = X/N + M

Serial schedule T2;T1:
  T2: read(X), X ← X + M, write(X), commit
  T1: read(X), X ← X / N, write(X), commit
  Result: X = (X + M)/N

Non-serial schedule (T2 reads X before T1 writes it, and T2's write is applied last):
  T1: read(X), X ← X / N
  T2: read(X), X ← X + M
  T1: write(X), commit
  T2: write(X), commit
  Result: X = X + M (T1's update is lost)
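The three results can be reproduced with a toy simulation; everything below (the run helper and the concrete values X = 100, N = 2, M = 10) is illustrative only.

```python
# Sketch: replaying the three schedules above on a toy data store.

def run(schedule, X0, N, M):
    db = {"X": X0}
    local = {}                      # per-transaction program variables
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "update":
            local[txn] = local[txn] / N if txn == "T1" else local[txn] + M
        elif op == "write":
            db["X"] = local[txn]
    return db["X"]

T1 = [("T1", "read"), ("T1", "update"), ("T1", "write")]
T2 = [("T2", "read"), ("T2", "update"), ("T2", "write")]

serial_T1_T2 = T1 + T2
serial_T2_T1 = T2 + T1
lost_update  = [("T1", "read"), ("T2", "read"),        # both read the original X
                ("T1", "update"), ("T1", "write"),
                ("T2", "update"), ("T2", "write")]      # T2 overwrites T1's update

X, N, M = 100, 2, 10
print(run(serial_T1_T2, X, N, M))   # X/N + M      -> 60.0
print(run(serial_T2_T1, X, N, M))   # (X + M) / N  -> 55.0
print(run(lost_update,  X, N, M))   # X + M        -> 110 (T1's update is lost)
```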
Conflict
Two operations in a schedule are said to conflict if they satisfy all three of the following conditions (potentially leaving the database in an inconsistent state):
▪ they belong to different transactions
▪ they access the same item
▪ at least one operation is a write.
Depending on when the read and the write operation occur, we could have three distinct types of conflicts:
▪ Write-Read conflict
▪ Read-Write conflict
▪ Write-Write conflict
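Conflicts are also the basis of the classic precedence-graph test for conflict serializability. The sketch below (illustrative names, not taken from the slides) builds the graph from a schedule and reports whether the schedule is conflict-serializable.

```python
# Sketch: precedence graph and cycle check for conflict serializability.
from itertools import combinations

def precedence_graph(schedule):
    """Edge Ti -> Tj whenever an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    for (ti, ai, xi), (tj, aj, xj) in combinations(schedule, 2):
        if ti != tj and xi == xj and "w" in (ai, aj):
            edges.add((ti, tj))
    return edges

def has_cycle(edges):
    nodes = {n for e in edges for n in e}
    colour = {n: 0 for n in nodes}            # 0 = unvisited, 1 = on stack, 2 = done
    def dfs(n):
        colour[n] = 1
        for a, b in edges:
            if a == n and (colour[b] == 1 or (colour[b] == 0 and dfs(b))):
                return True
        colour[n] = 2
        return False
    return any(colour[n] == 0 and dfs(n) for n in nodes)

# Lost-update interleaving: both transactions read X before either writes it.
S = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"), ("T2", "w", "X")]
g = precedence_graph(S)                       # {('T1', 'T2'), ('T2', 'T1')}
print("conflict-serializable:", not has_cycle(g))   # False: the graph has a cycle
```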
Write-Read Conflict
The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A schedule with a write-read conflict: T2 reads (and writes, and commits) X after T1 has written it but before T1 commits:
  T1: read(X), X ← X − N, write(X)
  T2: read(X), X ← X + M, write(X), commit
  T1: commit
A transaction reads a value that has just been updated by a transaction that has not committed yet (e.g., T2 reads a value written by T1, which could still abort later on; thus T2's commit could bring the database into an inconsistent state).
Read-Write Conflict
The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A schedule with a read-write conflict: T2 reads X before T1 changes and commits it:
  T2: read(X)
  T1: read(X), X ← X − N, write(X), commit
  T2: X ← X + M, write(X), commit
A transaction reads a value before it is changed by another transaction (e.g., T2 reads a value before T1 commits its change; thus T2's committed update is based on an "old" value).
Write-Write Conflict (1)
An interleaved schedule in which both transactions read X and compute their updates before either write(X) is applied, so the final value of X depends on which write is executed last:
  T1: read(X), X ← X − N
  T2: read(X), X ← X + M
  T1: write(X), commit
  T2: write(X), commit
Depending on the schedule, the two write operations may occur in different orders, thus producing (two) possible different DB states.
For comparison, the serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
Write-Write Conflict (2)
Two interleavings of the same operations in which the two write(X) operations are applied in different orders, yielding two different final database states:
(a) T1's write(X) is applied last:
  T1: read(X), X ← X − N
  T2: read(X), X ← X + M, write(X), commit
  T1: write(X), commit
(b) T2's write(X) is applied last:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
A problem: transaction abort
The serial schedule T1;T2 (both transactions commit):
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A schedule in which T1 aborts after T2 has already read (and committed on) T1's uncommitted write:
  T1: read(X), X ← X − N, write(X)
  T2: read(X), X ← X + M, write(X), commit
  T1: abort
One transaction reads values coming from another transaction that later on aborts (write-read conflict).
T2 should be aborted alongside T1 but, if we do so, we violate the durability property of the already committed T2. All the actions performed/caused by T1 should be rolled back.
Violations
■ Lost Update: two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect (Write-Write Conflict).
■ Dirty Read (or Temporary Update): reading a value that has been updated by a transaction that has not committed yet and that hence could abort later on (Write-Read Conflict).
■ Nonrepeatable Read: a transaction T1 reads a value that is then updated by T2 before T1 completes, so a subsequent read by T1 returns a different value (Read-Write Conflict).
■ Phantom (or Inconsistent Read): subsequent retrievals of the same set of (unchanged) objects by the same transaction give different results.
Lost Update (Write-Write)
T2 reads X before T1's write reaches the database, and both then write X:
  T1: read(X), X ← X − N
  T2: read(X), X ← X + M
  T1: write(X), commit
  T2: write(X), commit
T2 reads X before T1 changes it in the database. The update from T1 is lost.
For comparison, the serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit
Dirty Read (Write-Read)
The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A dirty read: T2 reads X after T1 has written it, and before T1 aborts:
  T1: read(X), X ← X − N, write(X)
  T2: read(X)
  T1: abort
  T2: ⋮
T1 updates X but fails before completing. The DB has to roll X back to its previous value; before such rollback, T2 reads the updated value and successfully commits its operation.
Nonrepeatable Read (Read-Write)

The serial schedule T1;T2:
  T1: read(X), X ← X − N, write(X), commit
  T2: read(X), X ← X + M, write(X), commit

A nonrepeatable read: T2 reads X, then T1 updates X and commits, so any subsequent read of X by T2 returns a different value:
  T2: read(X)
  T1: read(X), X ← X − N, write(X), commit
  T2: ⋮
■ Interleaving is necessary, but not all the possible schedules can be allowed.
■ Some actions must be rolled back when transactions are aborted.
■ Which checks have to be carried out in order to avoid violations and conflicts?
Concurrency and Interleaving
Transaction support in SQL
■ Each SQL transaction has (at least) these three features:
■ Access mode:
▪ READ ONLY is used only for data retrieval
▪ READ WRITE allows SELECT, UPDATE, INSERT and DELETE statements
■ Diagnostic area size: the number of feedback conditions held simultaneously
■ Isolation level: defines how concurrent transactions interact with each other while performing operations over the database.
■ Setting the transactions’ options in SQL:
EXEC SQL SET TRANSACTION READ WRITE
DIAGNOSTIC SIZE 5
ISOLATION LEVEL SERIALIZABLE
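For illustration, the isolation level can also be set from application code through a database driver. The sketch below assumes a PostgreSQL database accessed with the psycopg2 driver and a hypothetical accounts table; adapt driver, names and credentials to your environment.

```python
# Hypothetical example: setting the isolation level from Python code.
import psycopg2

conn = psycopg2.connect(dbname="bank", user="app", password="secret")
cur = conn.cursor()

# psycopg2 opens a transaction implicitly at the first statement,
# so SET TRANSACTION applies to the transaction started here.
cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
cur.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
cur.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
conn.commit()      # make the transfer durable
conn.close()
```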
Shared and Exclusive Locks
▪ Locks are used to implement different isolation levels.
▪ Locks are defined over single objects.
▪ eXclusive locks, X(A):
▪ If an object A is exclusively locked, shared locks cannot be obtained.
▪ If an object A is exclusively locked, other exclusive locks cannot be obtained.
▪ Shared locks, S(A):
▪ Multiple shared locks can co-exist.
▪ If one or more shared locks over one object A already exist, exclusive locks cannot be obtained.
Example sequences of lock requests and operations:
T1: S(A), R(A), X(A), W(A), X(B), R(B), W(B), commit
T2: X(A), R(A), W(A), X(B), R(B), W(B), commit
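A minimal sketch of the S/X compatibility rules for a single object (the class and method names are made up for this example):

```python
# Sketch: shared/exclusive lock compatibility for a single object A.

class Lock:
    def __init__(self):
        self.shared_by = set()     # transactions holding S locks
        self.exclusive = None      # transaction holding the X lock, if any

    def acquire_shared(self, txn):
        # S is compatible with other S locks, but not with an X lock.
        if self.exclusive is not None and self.exclusive != txn:
            return False
        self.shared_by.add(txn)
        return True

    def acquire_exclusive(self, txn):
        # X is incompatible with any lock held by a different transaction.
        if (self.shared_by - {txn}) or self.exclusive not in (None, txn):
            return False
        self.exclusive = txn
        return True

A = Lock()
print(A.acquire_shared("T1"))      # True  (S lock granted)
print(A.acquire_shared("T2"))      # True  (S locks can co-exist)
print(A.acquire_exclusive("T2"))   # False (T1 still holds an S lock on A)
```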
Isolation Level (1)
■ Serializable: the transaction has to obtain a lock when reading or modifying each object of the DB (and keeps the locks until it terminates); table-level locks are also taken while performing table scans.
■ Repeatable read: the transaction has to obtain a lock when reading or modifying each object of the DB (and keeps the locks until it terminates); table-level locks are not taken.
■ Read Committed: the transaction requests exclusive locks before updating data and keeps them until it terminates; it requests shared locks for read operations, which are released right after the read.
■ Read Uncommitted: the transaction requests exclusive locks before updating data, but requests no locks at all for read operations.
Isolation Level (2)

Isolation Level    | Lost Update | Dirty Read | Unrepeatable Read | Phantom
-------------------+-------------+------------+-------------------+--------
READ UNCOMMITTED   | YES         | YES        | YES               | YES
READ COMMITTED     | YES         | NO         | YES               | YES
REPEATABLE READ    | NO          | NO         | NO                | YES
SERIALIZABLE       | NO          | NO         | NO                | NO
Concurrency Control Techniques
■ Different approaches can be adopted:
■ Force a serializable schedule by avoiding conflicts (conflict serializability) through data-locking protocols.
■ Run all the transactions concurrently, checking for conflicts before committing.
■ Assign a timestamp to transactions that have either written or read objects, and compare these values to determine how the operations have to be ordered in the schedule.
Concurrency Control Techniques: Locks
■ The DBMS must ensure that there are no conflicts and that aborted transactions trigger rollbacks.
■ Strict two-phase locking (Strict 2PL):
▪ Writes use eXclusive locks, reads use Shared locks.
▪ The transaction releases its eXclusive locks only at commit time.
■ Strict 2PL guarantees serializability:
▪ If transactions handle different objects, they can be freely interleaved.
▪ If at least two transactions want to handle the same object, then those transactions have to be run serially.
■ The lock manager tracks, for each object within the DB, the eXclusive and Shared locks through a lock table.
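A sketch of a strict-2PL-style lock manager with a lock table (illustrative and single-threaded; a real lock manager would queue/block waiting transactions rather than just return False):

```python
# Sketch: locks are requested before each read/write and released only at commit/abort.

class LockManager:
    def __init__(self):
        self.table = {}                    # object -> {"S": set of txns, "X": txn or None}

    def _entry(self, obj):
        return self.table.setdefault(obj, {"S": set(), "X": None})

    def lock_shared(self, txn, obj):
        e = self._entry(obj)
        if e["X"] not in (None, txn):
            return False                   # another txn holds X: caller must wait
        e["S"].add(txn)
        return True

    def lock_exclusive(self, txn, obj):
        e = self._entry(obj)
        if (e["S"] - {txn}) or e["X"] not in (None, txn):
            return False
        e["X"] = txn
        return True

    def release_all(self, txn):
        """Called at commit or abort time: strict 2PL releases everything at once."""
        for e in self.table.values():
            e["S"].discard(txn)
            if e["X"] == txn:
                e["X"] = None

lm = LockManager()
print(lm.lock_exclusive("T1", "A"))   # True: T1 may write A
print(lm.lock_shared("T2", "A"))      # False: T2 must wait until T1 commits
lm.release_all("T1")                  # T1 commits
print(lm.lock_shared("T2", "A"))      # True
```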
Deadlock Prevention
■ Locks may cause deadlocks, which must be either prevented or detected and resolved.
■ Common DBMSs usually assign each transaction a priority P(Ti) depending on the transaction start time (e.g., with a timestamp TS, we have P(Ti) = −TS(Ti)).
■ If Ti asks for a lock owned by Tj, the lock manager can implement one of these two policies:
▪ Wait-Die (an older transaction is allowed to wait for a younger one):
▪ P(Ti) > P(Tj) ⇒ Ti.wait();
▪ otherwise ⇒ Ti.abort(); Ti.run();
▪ Wound-Wait (a younger transaction is allowed to wait for an older one):
▪ P(Ti) > P(Tj) ⇒ Tj.abort(); Tj.run();
▪ otherwise ⇒ Ti.wait();
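The two policies can be sketched as pure functions of the two timestamps (illustrative only; following the slide's convention P(Ti) = −TS(Ti), a smaller timestamp means an older, higher-priority transaction):

```python
# Sketch: wait-die vs. wound-wait when Ti (requester) asks for a lock held by Tj (holder).

def wait_die(requester_ts, holder_ts):
    if requester_ts < holder_ts:          # Ti is older -> higher priority
        return "Ti waits"
    return "Ti aborts and is restarted (dies)"

def wound_wait(requester_ts, holder_ts):
    if requester_ts < holder_ts:          # Ti is older -> higher priority
        return "Tj aborts and is restarted (is wounded)"
    return "Ti waits"

# T1 started at time 10 (older), T2 at time 20 (younger).
print(wait_die(10, 20))     # older requester waits for the younger holder
print(wait_die(20, 10))     # younger requester dies
print(wound_wait(10, 20))   # older requester wounds the younger holder
print(wound_wait(20, 10))   # younger requester waits
```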
Concurrency Control: Deadlock Detection
■ If deadlocks are infrequent, we can choose to detect and resolve them, instead of implementing deadlock prevention, by checking whether a deadlock actually exists. There are two possible approaches:
▪ The lock manager maintains a waits-for graph to identify deadlock cycles. From time to time the graph is analysed, and deadlock cycles are resolved by aborting some transactions.
▪ If a transaction waits for a period longer than a given timeout, the system assumes that the transaction is deadlocked and aborts it.
Concurrency Control: Timestamp ordering
■ Each transaction Ti has a start-time timestamp, TS(Ti).
■ Conflicting operations must execute in timestamp order: an operation ai run by Ti may precede an operation aj run by Tj (✓) when the following conditions hold simultaneously:
▪ ai conflicts with aj run by Tj
▪ TS(Ti) < TS(Tj)
■ Otherwise Ti is aborted and run again with a greater TS.
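A minimal sketch of the basic timestamp-ordering rule, using per-item read/write timestamps; this is a standard formulation assumed for the example, not taken verbatim from the slides, and it ignores refinements such as the Thomas write rule.

```python
# Sketch: basic timestamp ordering for a single item.

class Item:
    def __init__(self):
        self.read_ts = 0     # largest TS of a transaction that read the item
        self.write_ts = 0    # largest TS of a transaction that wrote the item

def read(item, ts):
    if ts < item.write_ts:                 # a younger transaction already wrote it
        return "abort and restart with a new TS"
    item.read_ts = max(item.read_ts, ts)
    return "ok"

def write(item, ts):
    if ts < item.read_ts or ts < item.write_ts:
        return "abort and restart with a new TS"
    item.write_ts = ts
    return "ok"

X = Item()
print(read(X, ts=5))    # ok
print(write(X, ts=3))   # abort: an older write would arrive after a younger read
print(write(X, ts=7))   # ok
print(read(X, ts=6))    # abort: X was already written by a younger transaction
```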
Optimistic Concurrency Control: Validation-Based
■ The locking-based protocols adopt a pessimistic approach, preventing conflicts from occurring.
■ The optimistic approach assumes that transactions do not conflict (or that they rarely do).
■ A validation phase checks whether any of the transaction's updates violate serializability; if so, the transaction is aborted and then restarted later. Such a concurrency protocol has the following phases:
■ Read phase: a transaction can read values of committed data items.
However, updates are applied only to local copies (versions) of the data items (in the database cache).
■ Validation phase: when the transaction wants to commit, the DBMS checks that there are no conflicts. If there are conflicts, the transaction is aborted.
■ Write phase: On a successful validation, transactions’ updates are
applied to the database; otherwise, aborted transactions are restarted.
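A sketch of the validation step under a backward-validation assumption (the transaction's read set is checked against the write sets of transactions that committed while it was running); the names and sets are illustrative only.

```python
# Sketch: validation phase of an optimistic protocol (backward validation).

def validate(txn_read_set, overlapping_committed_write_sets):
    for ws in overlapping_committed_write_sets:
        if txn_read_set & ws:          # it read something rewritten in the meantime
            return False               # abort and restart
    return True                        # proceed to the write phase

T3_reads = {"X", "Y"}
committed_meanwhile = [{"Z"}, {"Y", "W"}]   # write sets of txns that committed during T3
print(validate(T3_reads, committed_meanwhile))   # False: Y was overwritten, T3 restarts
```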
Crash Recovery
Database Recovery
A DBMS recovery manager must ensure:
■ Atomicity: the operations performed by non-committed transactions are rolled back.
■ Persistency: the operations performed by committed transactions must survive system crashes.
■ A log (or journal) is maintained to keep track of all the transactions' operations that affect the "values" of the database.
■ The log is composed of log records. Each record has a unique identifier, the log sequence number (LSN), which is generated automatically and is monotonically increasing with respect to the operation time.
■ All the log records share some common fields:
■ Type: the type of the record
■ transID: the id of the transaction that performed the action described in Type.
■ prevLSN: the LSN of the previous log record belonging to the same transID
■ A log is a sequential, append-only file kept on disk; it is used to recover from transaction failures.
■ The log buffer (log tail) holds the last part of the log file, so that log entries are first appended to the log buffer in main memory.
The System Log
(Elmasri, Navathe)
■ [start_transaction,T]: Records that transaction T has started execution.
■ Page Update
■ [write_item,T,X,old_value,new_value]: Records that
transaction T has changed the value of database item X from old_value to new_value. It is written before the actual write operation on the DB.
■ [read_item,T,X]: Records that transaction T has read the value of database item X. Note that such records are not required by practical recovery protocols, but may be written for auditing purposes.
■ [commit,T]: Records that transaction T has completed successfully, and affirms that its effect can be committed (recorded permanently) to the database. The log buffer is written on the log file.
■ [abort,T]: Records that transaction T has been aborted.
The System Log: Types of Record Entries (1)
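An illustrative sketch of such log records in code (the field and function names are made up; a real log is an append-only file on disk, not a Python list):

```python
# Sketch: log records with the fields described above (LSN, type, transID, prevLSN, data).
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_next_lsn = count(1)                              # monotonically increasing LSNs

@dataclass
class LogRecord:
    rec_type: str                                 # start_transaction / write_item / commit / abort
    trans_id: str
    prev_lsn: Optional[int]                       # previous LSN of the same transaction
    data: dict = field(default_factory=dict)      # e.g. item, old_value, new_value
    lsn: int = field(default_factory=lambda: next(_next_lsn))

log, last_lsn = [], {}                            # the log tail and per-transaction last LSN

def append(rec_type, trans_id, **data):
    rec = LogRecord(rec_type, trans_id, last_lsn.get(trans_id), data)
    last_lsn[trans_id] = rec.lsn
    log.append(rec)

append("start_transaction", "T1")
append("write_item", "T1", item="X", old_value=100, new_value=90)
append("commit", "T1")
for r in log:
    print(r.lsn, r.rec_type, r.trans_id, r.prev_lsn, r.data)
```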
The System Log: Example
[Figure: (a) an example of some transaction codes; (b) a schedule of T1, T2 and T3; (c) the log file for the schedule in (b).]
Depending on the specific policies, other record types could be written:
■ End: it is mandatory when, after the commit or abort operation, some finalization operations have to be carried out.
■ Undo Page Update: when a transaction aborts, all of its updates have to be undone. For each undo operation, a Compensation Log Record (CLR) is written to the log.
The System Log: Types of Record Entries (2)
■ WAL is used with in-place updates (immediate or deferred): the appropriate log records must be permanently recorded in the log on disk before the changes are applied to the database (this ensures persistency).
■ The transaction is actually committed only when all of its log records have been completely and permanently written to disk (e.g., flushed from the log buffer).
■ WAL states that:
▪ The before-image of an item cannot be overwritten by its after-image in the database on disk until all UNDO log entries have been force-written to disk.
▪ The commit operation of a transaction cannot be completed until all the REDO and UNDO log records of that transaction are written to disk.
■ The actual number of log pages to be written at the end of a transaction is far lower than the number of data pages, since the log records do not contain entire tuples but only the pieces of data that have been updated.
Write-Ahead Logging (WAL) Protocol
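A sketch of the first WAL rule: the log must be force-written up to and including the record of a page's last update before the data page itself goes to disk (the pageLSN/flushedLSN bookkeeping below is a simplified assumption of the example).

```python
# Sketch: enforcing "log before data page" with a flushedLSN watermark.

flushed_lsn = 0          # highest LSN durably written to the log file on disk
log_buffer = []          # (lsn, record) pairs still in main memory (the log tail)

def flush_log(up_to_lsn):
    """Force-write the log tail to disk up to the given LSN."""
    global flushed_lsn
    log_buffer[:] = [(lsn, rec) for (lsn, rec) in log_buffer if lsn > up_to_lsn]
    flushed_lsn = max(flushed_lsn, up_to_lsn)

def write_page_to_disk(page_lsn):
    """WAL check: the log must cover the page's last update before the page goes out."""
    if page_lsn > flushed_lsn:
        flush_log(page_lsn)            # force-write the log first
    print(f"page with pageLSN={page_lsn} written to disk (flushedLSN={flushed_lsn})")

log_buffer.extend([(1, "[start_transaction,T1]"), (2, "[write_item,T1,X,100,90]")])
write_page_to_disk(page_lsn=2)         # forces LSNs 1-2 to disk before the data page
```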
ARIES (1)
▪ ARIES is an acronym for Algorithms for Recovery and Isolation Exploiting Semantics.
▪ ARIES is based on three concepts:
▪ Write-ahead logging (WAL): each update of a DB object has first to be written to the log file before it is actually applied to the DB.
▪ Repeating history (during Redo): all the actions of the database system prior to the crash are retraced, and the DB is brought back to the state it was in when the crash occurred. Transactions that were active at the time of the crash are then undone.
▪ Logging (changes) during Undo: this prevents repeating the already completed undo operations if a failure occurs during recovery and causes the recovery process to restart.
ARIES (2)
▪ ARIES defines a recovery algorithm which is run by the recovery manager after a system crash. There are three phases:
1. Analysis phase: identifies the "dirty" (updated) pages in the buffer and the set of transactions active at the time of the crash.
2. REDO phase (recovery): reapplies only the necessary updates from the log to the database. This means that updates that were already applied to disk are not re-applied.
3. UNDO phase: the log is scanned backward and the operations of the transactions that were active at the time of the crash but had not committed are undone, in reverse order.
ARIES (3)
In more detail:
1. Analysis phase:
a. Determine from which point in the log file the redo phase must start.
b. Determine which buffer pool pages contain modified data that had not yet been written to disk at the time of the crash.
c. Identify the transactions that were running at the time of the crash.
2. Redo phase: re-apply each update record or CLR in the log, starting from the record whose LSN is associated with the oldest dirty page in the buffer pool.
3. Undo phase: all the transactions identified in 1c must be undone.
■ A checkpoint is a snapshot of a DBMS state. By periodically creating those checkpoints, a DBMS could reduce the crash recovery time.
■ A master log record is maintained separately, in stable storage, to store the LSN of the latest checkpoint record that made it to disk.
■ ARIES creates checkpoints in three steps:
– A begin-checkpoint record is written to the log.
– An end-checkpoint record, containing both the transaction table and the dirty page table, is written to the log.
– After the end-checkpoint record is permanently written, the master log record is updated with the LSN of the begin-checkpoint record.
■ ARIES uses fuzzy checkpoints: checkpointing does not halt the DBMS and does not require writing out the buffer pool pages, hence it is inexpensive in terms of performance.
ARIES: Checkpointing