Chapter 5 - Designing Data-Intensive Applications

Syan S.P · April 8, 2025

Data-Intensive-Applications Distributed-Systems

Chapter 5: Replication

If data never changes, replication is simple (copy once).
The challenge arises when data changes, requiring trade-offs between synchronous and asynchronous replication and dealing with failed replicas.

Structure: One leader handles all writes and sends updates to followers. Clients can read from any replica.
Synchronous replication: Followers must confirm writes before success; slow or failed followers block writes.
Asynchronous replication: Writes do not wait for all followers; faster but risks stale reads.
Adding followers: Leader sends a snapshot and then all subsequent changes to new followers.
Follower failure: Followers keep logs of missed changes and catch up after reconnecting.
Leader failure: Followers elect a new leader, which risks lost updates or split-brain scenarios.

Statement-based: Sends SQL statements; problematic if statements depend on environment (e.g., random values).
Write-ahead log: Followers get append-only logs tightly coupled to the storage engine.
Logical log replication: Decoupled logs allowing backward compatibility.
Trigger-based: Custom replication logic in application code; flexible but error-prone and has overhead.

Leader-based systems fit read-heavy workloads but usually require asynchronous replication.
Read-after-write consistency: Guarantees users see their own writes immediately by reading from the leader. Downsides include routing reads to the leader’s data center and difficulties with multiple devices.
Monotonic reads: Reads always occur from the same replica to avoid seeing newer writes before older ones. Replica failure requires rerouting.
Consistent prefix reads: Writes applied in order are read in the same order by ensuring related writes happen on the same partition.
Eventual consistency: Lagging reads can be stale; stronger consistency might be needed if lag becomes too large.

Leaders exist in multiple data centers to improve performance and availability.
Each data center operates independently; asynchronous replication handles network issues.
Challenges include concurrent writes causing conflicts.
Multi-leader replication is risky for databases with features like auto-increment keys, triggers, or integrity constraints.

Designed for write-intensive systems to avoid leader bottlenecks.
Clients send writes and reads in parallel to multiple replicas.
Reading multiple replicas detects inconsistencies; the client rewrites the majority value or background processes fix conflicts.

n = total replicas
w = write quorum size
r = read quorum size
Formula: w + r > n to ensure consistency and minimize stale reads.
Concurrent writes must be merged safely.
Optimized for eventual consistency; stronger guarantees require transactions or consensus protocols.
Suitable for multi-datacenter deployment.

Last Write Wins (LWW): Uses timestamps to pick the latest write; converges quickly but may lose writes.
Happens-before: Detects concurrency; only concurrent writes need resolution.
Merge values: Keeps all conflicting data; the application decides how to merge or display them.
Version vectors: Each replica maintains version counters; combined vectors detect conflicts and concurrency.