Chapter 5 - Designing Data-Intensive Applications

Syan S.P · April 8, 2025

Chapter 5: Replication

1. Why Replication is Challenging

  • If data never changes, replication is simple (copy once).
  • The challenge arises when data changes, requiring trade-offs between synchronous and asynchronous replication and dealing with failed replicas.

2. Leader-Based Replication

  • Structure: One leader handles all writes and sends updates to followers. Clients can read from any replica.
  • Synchronous replication: Followers must confirm writes before success; slow or failed followers block writes.
  • Asynchronous replication: Writes do not wait for all followers; faster but risks stale reads.
  • Adding followers: Leader sends a snapshot and then all subsequent changes to new followers.
  • Follower failure: Followers keep logs of missed changes and catch up after reconnecting.
  • Leader failure: Followers elect a new leader, which risks lost updates or split-brain scenarios.
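The synchronous/asynchronous distinction above can be sketched in a few lines. This is a toy model with made-up names (`Leader`, `Follower`, `flush`), not a real database API: synchronous writes block until every follower applies the change, while asynchronous writes queue it for later replication.

```python
# Toy sketch of leader-based replication (illustrative names, not a real API).

class Follower:
    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value

class Leader:
    def __init__(self, followers):
        self.data = {}
        self.followers = followers
        self.pending = []  # async writes not yet replicated

    def write(self, key, value, synchronous=True):
        self.data[key] = value  # leader always applies locally first
        if synchronous:
            # Block until every follower confirms; a slow or failed
            # follower would stall this loop (the downside noted above).
            for f in self.followers:
                f.apply(key, value)
        else:
            # Return immediately; followers catch up later, so reads
            # from a follower may be stale in the meantime.
            self.pending.append((key, value))

    def flush(self):
        # Background replication of queued asynchronous writes.
        for key, value in self.pending:
            for f in self.followers:
                f.apply(key, value)
        self.pending.clear()
```

Between `write(..., synchronous=False)` and `flush()`, the leader and follower disagree; that window is exactly the replication lag discussed in section 4.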

3. Write Replication Techniques

  • Statement-based: Sends SQL statements; problematic if statements depend on environment (e.g., random values).
  • Write-ahead log: Ships the leader's append-only WAL to followers; its byte-level detail tightly couples leader and followers to the same storage engine and version.
  • Logical log replication: Row-level change records decoupled from storage-engine internals, allowing leader and followers to run different versions.
  • Trigger-based: Database triggers capture changes for custom replication logic; flexible but error-prone and adds overhead.
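The statement-based pitfall is easy to demonstrate: replaying the SQL text re-evaluates nondeterministic functions on each replica, whereas a logical (row-based) record ships the concrete result. A minimal sketch, with an illustrative record format:

```python
import random

# Statement-based replication replays the SQL text, so a nondeterministic
# function like random() is re-evaluated on each replica:
stmt = "INSERT INTO tokens VALUES (random())"
leader_value = random.random()    # evaluated on the leader
follower_value = random.random()  # re-evaluated on a follower: almost
                                  # certainly a different value

# Logical (row-based) replication ships the resulting row instead, so
# every replica applies the same concrete value. The record shape here
# is illustrative, not a real wire format:
logical_record = {
    "table": "tokens",
    "op": "insert",
    "row": {"value": leader_value},
}
```

The same divergence problem applies to `NOW()`, auto-increment counters, and statements whose effect depends on existing rows.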

4. Replication Lag Problems

  • Leader-based systems fit read-heavy workloads but usually require asynchronous replication.
  • Read-after-write consistency: Guarantees users see their own writes, e.g., by serving a user's recently written data from the leader. Downsides: reads may have to be routed to the leader's data center, and tracking "recent writes" across a user's multiple devices is hard.
  • Monotonic reads: A user's reads always go to the same replica (e.g., by hashing the user ID), so the user never sees data move backward in time, reading an older value after already having seen a newer one. If that replica fails, reads must be rerouted.
  • Consistent prefix reads: Writes applied in order are read in the same order by ensuring related writes happen on the same partition.
  • Eventual consistency: Lagging reads can be stale; stronger consistency might be needed if lag becomes too large.

5. Multi-Leader Replication

  • Leaders exist in multiple data centers to improve performance and availability.
  • Each data center operates independently; asynchronous replication handles network issues.
  • Challenges include concurrent writes causing conflicts.
  • Multi-leader replication is risky for databases with features like auto-increment keys, triggers, or integrity constraints.

Conflict Resolution

  • Give each write a unique ID (or use replica IDs) and apply a "highest ID wins" rule; replicas converge, but losing writes is possible.
  • Merge conflicting values.
  • Store conflicts explicitly.
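The first two strategies can be sketched as pure functions over the set of conflicting versions. These are illustrative helpers (the names `resolve_highest_wins` and `resolve_merge` are not from any library):

```python
def resolve_highest_wins(versions):
    """versions: list of (write_id, value) pairs.

    The version with the highest write ID survives; the other writes
    are silently discarded, which is why this rule can lose data.
    """
    return max(versions, key=lambda v: v[0])[1]

def resolve_merge(versions):
    """Alternative: keep every distinct conflicting value and let the
    application (or the user) merge them later."""
    return sorted({value for _, value in versions})
```

Example: `resolve_highest_wins([(3, "a"), (7, "b"), (5, "c")])` returns `"b"`, while `resolve_merge` on the same input preserves all three values.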

Topologies

  • All-to-all: No single point of failure, but replication messages can arrive out of order, violating causality.
  • Circular/star: Has single points of failure.

6. Leaderless Replication (Dynamo-style)

  • Designed for write-intensive systems to avoid leader bottlenecks.
  • Clients send writes and reads in parallel to multiple replicas.
  • Reading from multiple replicas detects inconsistencies; stale replicas are fixed by writing the newest value back to them (read repair) or by background anti-entropy processes.

Parameters

  • n = total replicas
  • w = write quorum size
  • r = read quorum size
  • Quorum condition: w + r > n guarantees that every read set overlaps every write set in at least one replica, so a read always contacts at least one replica holding the latest write.

  • Concurrent writes must be merged safely.
  • Optimized for eventual consistency; stronger guarantees require transactions or consensus protocols.
  • Suitable for multi-datacenter deployment.
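The quorum condition can be demonstrated with a toy in-memory store. This sketch (the class name `QuorumStore` and the single global version counter are simplifying assumptions, real systems version per key per write) simulates writes reaching only w of n replicas, yet reads of r replicas still find the latest value:

```python
import random

class QuorumStore:
    def __init__(self, n=3, w=2, r=2):
        assert w + r > n, "quorum condition violated"
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]  # key -> (version, value)
        self.version = 0

    def write(self, key, value):
        self.version += 1
        # A real client sends to all n replicas and succeeds once w
        # acknowledge; we simulate the worst case where only a random
        # subset of exactly w replicas received the write.
        for rep in random.sample(self.replicas, self.w):
            rep[key] = (self.version, value)

    def read(self, key):
        # Query r replicas and keep the highest-versioned answer.
        # Because w + r > n, at least one queried replica overlaps
        # with the latest write set.
        answers = [rep.get(key) for rep in random.sample(self.replicas, self.r)]
        answers = [a for a in answers if a is not None]
        return max(answers)[1] if answers else None
```

With n=3, w=2, r=2, any two replicas overlap any other two in at least one member, so `read` never returns a stale value no matter which subsets the random sampling picks.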

7. Handling Conflicts & Consistency in Leaderless Systems

  • Last Write Wins (LWW): Uses timestamps to pick the latest write; converges quickly but may lose writes.
  • Happens-before: Detects concurrency; only concurrent writes need resolution.
  • Merge values: Keeps all conflicting data; the application decides how to merge or display them.
  • Version vectors: Each replica maintains version counters; combined vectors detect conflicts and concurrency.
