Chapter 5: Replication
1. Why Replication is Challenging
- If data never changes, replication is simple (copy once).
- The challenge arises when data changes, requiring trade-offs between synchronous and asynchronous replication and dealing with failed replicas.
2. Leader-Based Replication
- Structure: One leader handles all writes and sends updates to followers. Clients can read from any replica.
- Synchronous replication: Followers must confirm writes before success; slow or failed followers block writes.
- Asynchronous replication: Writes do not wait for all followers; faster but risks stale reads.
- Adding followers: Leader sends a snapshot and then all subsequent changes to new followers.
- Follower failure: Followers keep logs of missed changes and catch up after reconnecting.
- Leader failure: Followers elect a new leader, which risks lost updates or split-brain scenarios.
3. Write Replication Techniques
- Statement-based: Sends SQL statements; problematic if statements depend on environment (e.g., random values).
- Write-ahead log: Followers get append-only logs tightly coupled to the storage engine.
- Logical log replication: Decoupled logs allowing backward compatibility.
- Trigger-based: Custom replication logic in application code; flexible but error-prone and has overhead.
4. Replication Lag Problems
- Leader-based systems fit read-heavy workloads but usually require asynchronous replication.
- Read-after-write consistency: Guarantees users see their own writes immediately by reading from the leader. Downsides include routing reads to the leader’s data center and difficulties with multiple devices.
- Monotonic reads: Reads always occur from the same replica to avoid seeing newer writes before older ones. Replica failure requires rerouting.
- Consistent prefix reads: Writes applied in order are read in the same order by ensuring related writes happen on the same partition.
- Eventual consistency: Lagging reads can be stale; stronger consistency might be needed if lag becomes too large.
5. Multi-Leader Replication
- Leaders exist in multiple data centers to improve performance and availability.
- Each data center operates independently; asynchronous replication handles network issues.
- Challenges include concurrent writes causing conflicts.
- Multi-leader replication is risky for databases with features like auto-increment keys, triggers, or integrity constraints.
Conflict Resolution
- Use unique write IDs or replica IDs with “highest wins” rule.
- Merge conflicting values.
- Store conflicts explicitly.
Topologies
- All-to-all: No single point of failure but can have causality issues.
- Circular/star: Has single points of failure.
6. Leaderless Replication (Dynamo-style)
- Designed for write-intensive systems to avoid leader bottlenecks.
- Clients send writes and reads in parallel to multiple replicas.
- Reading multiple replicas detects inconsistencies; the client rewrites the majority value or background processes fix conflicts.
Parameters
- n = total replicas
- w = write quorum size
- r = read quorum size
-
Formula: w + r > n to ensure consistency and minimize stale reads.
- Concurrent writes must be merged safely.
- Optimized for eventual consistency; stronger guarantees require transactions or consensus protocols.
- Suitable for multi-datacenter deployment.
7. Handling Conflicts & Consistency in Leaderless Systems
- Last Write Wins (LWW): Uses timestamps to pick the latest write; converges quickly but may lose writes.
- Happens-before: Detects concurrency; only concurrent writes need resolution.
- Merge values: Keeps all conflicting data; the application decides how to merge or display them.
- Version vectors: Each replica maintains version counters; combined vectors detect conflicts and concurrency.