Learning Journal Week 5 - CST363
This week, we focused on database transactions and concurrency control. We learned that a transaction is a sequence of operations performed as a single, atomic unit to ensure the database moves from one consistent state to another. If a transaction completes successfully, its changes are committed; if it fails for any reason, any partial changes are rolled back, preserving data integrity. A primary reason for using transactions is to manage concurrency, which is when multiple programs or requests run at the same time and access the same data.
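The commit-or-rollback behavior described above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the `account` table and the transfer amounts are made up for the example, not from the course material.

```python
import sqlite3

# In-memory database for illustration; the account table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 100)")
conn.commit()

# Transfer 30 from account 1 to account 2 as one atomic unit:
# either both updates happen, or neither does.
try:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    conn.commit()      # both changes become permanent together
except sqlite3.Error:
    conn.rollback()    # on any failure, undo the partial work

print(list(conn.execute("SELECT id, balance FROM account ORDER BY id")))
# after a successful commit: [(1, 70), (2, 130)]
```

If either UPDATE raised an error, the rollback would leave both balances at 100, keeping the database in a consistent state.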
Without proper controls, concurrency can lead to several problems. We examined the "lost update" problem, where two transactions read the same value, and the second transaction's write overwrites the first, causing the first update to be lost. We also looked at "inconsistent reads," where one transaction reads data that is only partially updated by another, leading to incorrect results. Another issue is the "inconsistent write" (or "Bob and Alice" problem), where two transactions make decisions based on the same initial data but their interleaved updates lead to a state that violates business rules. Finally, we learned about the "phantom" problem, where rows seem to appear or disappear within a transaction because another transaction has inserted or deleted data.

To prevent these issues, databases use mechanisms like locking. Reads typically acquire a shared (S) lock, while writes require an exclusive (X) lock. This system can lead to deadlocks, which the database resolves by aborting one of the transactions. We also covered the different SQL isolation levels, from the most strict (SERIALIZABLE) to the least strict (READ UNCOMMITTED), which offer a trade-off between consistency and performance.
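The "lost update" interleaving can be replayed by hand. This sketch uses a single sqlite3 connection and simply performs the reads and writes in the problematic order; a real lost update would require two concurrent sessions, and the table and amounts are hypothetical.

```python
import sqlite3

# Replay the lost-update interleaving manually (hypothetical account table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# T1 and T2 both read the current balance -- both see 100.
t1_read = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
t2_read = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# T2 deposits 10 based on its read, then T1 deposits 50 based on ITS stale
# read. T1's write overwrites T2's, so the deposit of 10 is lost.
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (t2_read + 10,))
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (t1_read + 50,))
conn.commit()

final = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(final)  # 150, not the expected 160: T2's update was lost

# One fix: make the read-modify-write a single atomic statement, so the
# database serializes the two increments instead of the application.
conn.execute("UPDATE account SET balance = balance + 10 WHERE id = 1")
conn.commit()
```

With the atomic `UPDATE ... SET balance = balance + 10` form, both increments survive regardless of interleaving, which is exactly what the locking machinery enforces for concurrent sessions.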
In our reading, we explored the concept of a "slow index" from "Use the Index, Luke". The author clarifies that the slowness is not due to a "degenerated" or broken index, but rather from the full process involved in certain types of queries. An index lookup consists of three steps: the tree traversal, following the leaf node chain, and fetching the table data. While the initial tree traversal is extremely fast, the other two steps can be slow. If a query matches many non-unique entries, the database must follow the leaf node chain to find all of them, potentially reading a large part of the index. Subsequently, for each entry found, the database must perform a separate access to the table to retrieve the full row, which is often scattered across many different disk blocks. This combination of scanning numerous index leaf nodes (INDEX RANGE SCAN) and then performing many individual table accesses (TABLE ACCESS BY INDEX ROWID) for queries that return a large number of rows is what causes the index to seem slow.
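The book's execution plans come from Oracle, but the same three steps can be observed with any database's plan output. As a rough stand-in, this sketch uses SQLite's EXPLAIN QUERY PLAN to confirm that a non-unique lookup walks an index; the `employees` table, the `dept` column, and the index name are invented for the example.

```python
import sqlite3

# Hypothetical table with a non-unique indexed column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, dept TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_dept ON employees(dept)")
conn.executemany(
    "INSERT INTO employees (dept, name) VALUES (?, ?)",
    [("sales" if i % 2 else "eng", f"emp{i}") for i in range(1000)],
)
conn.commit()

# A non-unique lookup: the database traverses the index tree once, then
# follows the leaf-node chain for every matching entry, and finally fetches
# each row from the table -- the part that gets slow when many rows match.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employees WHERE dept = 'sales'"
).fetchall()
for row in plan:
    print(row[-1])  # the plan line should mention idx_dept
```

Here half the table matches, so even though the index is used, the database still has to visit many leaf nodes and fetch many scattered rows, which is the "slow index" effect the author describes.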