Database Isolation Levels and Data Anomalies

Database Isolation Levels

In MS SQL Server, there are four standard levels of isolation that correspond to the ANSI SQL standard:

Read uncommitted: This level of isolation allows transactions to read data that has been modified but not yet committed by other transactions. This can lead to inconsistent results, as a transaction may read data that is subsequently rolled back by another transaction.

Read committed: This level of isolation ensures that transactions can read only committed data. A transaction cannot read data that has been modified but not yet committed by another transaction. This provides a higher level of consistency than read uncommitted, but still allows for non-repeatable reads.

Repeatable read: This level of isolation ensures that a transaction can read the same data multiple times and get the same result each time, even if other transactions modify the data in the meantime. This prevents non-repeatable reads but can still result in phantom reads (where a transaction sees new rows that were inserted by other transactions after the transaction began).

Serializable: This level of isolation provides the highest level of consistency. It ensures that transactions are executed as if they were serialized (i.e., one after another), even though they may execute concurrently. This prevents non-repeatable reads and phantom reads but can result in lower concurrency and slower performance due to the need for locking.

In addition to these standard levels of isolation, MS SQL Server also provides two additional levels of isolation:

Snapshot isolation: This level of isolation provides a consistent read-only view of the database for each transaction. Each transaction sees a snapshot of the database as it existed at the beginning of the transaction, and any modifications made by other transactions are not visible.

Read committed snapshot isolation: This level of isolation provides a consistent read-only view of the database for each transaction, but uses a different mechanism than snapshot isolation. Instead of taking a snapshot of the database, it uses versioning to provide a consistent view of the data. This can provide better performance than snapshot isolation, but can also lead to increased overhead due to the need to maintain version information.

Data anomalies

Data anomalies refer to inconsistencies, errors or unexpected outcomes that arise in a database when data is entered, deleted, or updated in a way that violates the rules or constraints that govern the database. Data anomalies can cause data to be inaccurate, incomplete, or contradictory, and can ultimately lead to problems such as poor decision-making, lost revenue, or reputational damage.

Below is a diagram of what the default isolation levels are for some common database platforms along with the data anomalies that can occur with each isolation level.

P0, P1, P2, P3, P4, P4C, A5A, and A5B are specific types of data anomalies that can occur in a database. They are often used to categorize and prioritize data anomalies based on their severity and impact on the database.

P0 (Priority 0): P0 data anomalies are the most severe and urgent issues that need immediate attention. They can cause data loss, system crashes, or security breaches, and can lead to significant financial or reputational damage if not resolved quickly.

P1 (Priority 1): P1 data anomalies are high-priority issues that require attention as soon as possible. They can cause major disruptions to business processes, data inconsistencies, or incorrect reporting.

A Dirty Read occurs when one transaction reads a value that has been written by another still in-flight transaction. It is not enough to prevent only reads of values written by transactions that eventually rollback, you need to prevent reads of values from transactions that ultimately commit too.

P2 (Priority 2): P2 data anomalies are medium-priority issues that need to be addressed in a timely manner. They can cause moderate disruptions to business processes, data inaccuracies, or inconsistencies.

A Fuzzy or Non-Repeatable Read occurs when a value that has been read by a still in-flight transaction is overwritten by another transaction. Even without a second read of the value actually occurring this can still cause database invariants to be violated.

P3 (Priority 3): P3 data anomalies are low-priority issues that can be addressed at a later time. They may cause minor disruptions to business processes or minor data inconsistencies.

A Phantom occurs when a transaction does a predicate-based read (e.g. SELECT… WHERE P) and another transaction writes a data item matched by that predicate while the first transaction is still in flight. In the original ANSI SQL language only matching inserts were forbidden (the new entries being the phantoms), but in fact to be safe we need to prohibit any write: updates, deletes, and inserts.

P4 (Priority 4): P4 data anomalies are the least severe and can be considered minor issues that can be addressed during regular maintenance or updates.

It turns out that there is an anomaly that is not prevented by systems that defend against Dirty Reads and Dirty Writes, but that is prevented by systems that also protect against Fuzzy reads. It’s known as a Lost Update. Consider 2 transaction T1 and T2,

T1 wants to update the data of x. It first reads the value of x, then adds 1 to the value and writes it back to the database. But after reading x, T2 writes a new x and submits it successfully, while T1 adds 1 to the old x value. In this way, T2's update appears to be discarded to T1.

P4C (Priority 4 Compliance): P4C data anomalies are minor issues related to compliance regulations. Although they may not directly impact business processes, they need to be addressed to ensure that the database is compliant with legal or regulatory requirements.

P4C is a variation of the Lost Update phenomenon that involves a SQL cursor.

A5A (Availability 5 Availability): A5A data anomalies are issues related to system availability. They can cause system downtime or poor performance, leading to disruptions in business processes.

Read skew can occur when there are integrity constraints between two or more data items. Read skew causes inconsistent reads due to inconsistent state of the data being read.

A5B (Availability 5 Business): A5B data anomalies are issues related to business availability. They can cause disruptions in business processes, resulting in lost revenue or reputational damage.

Write skew is very similar. Suppose our constraint is x+y ≤ 100. And this time we interpose a transaction before T1 issues a write

Поиск по этому блогу

MEAL officer learning log