Chapter 3: Storage and Retrieval
PART 1: Storage Engines & Log-Structured Approaches
1. Storage Engine Types
- Log-Structured (LSM Trees): Optimized for writes using append-only, immutable segments.
- Page-Oriented (B-Trees): Optimized for reads, using an update-in-place architecture.
2. Log-Structured Engines
- Use append-only logs for persistent storage.
- Data is written to segment files and later compacted to remove duplicates and retain the latest version.
- Hash Index: An in-memory map of key → byte offset in the segment file for fast lookups; requires that all keys fit in RAM (see the sketch after this list).
- Tombstones are used to mark deletions instead of removing data immediately.
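A minimal sketch of such an engine, assuming a single segment file, a plain dict as the hash index, and one comma-separated line per record; the file name db.log and the TOMBSTONE marker are illustrative choices, not from any particular system:

```python
import os

TOMBSTONE = "__tombstone__"    # illustrative deletion marker; real formats use a flag byte

class AppendOnlyStore:
    """Append-only log segment with an in-memory hash index of key -> byte offset."""

    def __init__(self, path="db.log"):
        self.path = path
        self.index = {}                     # key -> offset of the latest record
        open(self.path, "a").close()        # create the segment file if missing
        self._rebuild_index()               # one sequential scan on startup

    def _rebuild_index(self):
        with open(self.path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                key, _, value = line.decode().rstrip("\n").partition(",")
                if value == TOMBSTONE:
                    self.index.pop(key, None)   # a tombstone hides earlier versions
                else:
                    self.index[key] = offset

    def set(self, key, value):
        # Records are "key,value" lines; a real format would length-prefix binary data.
        offset = os.path.getsize(self.path)
        with open(self.path, "a") as f:
            f.write(f"{key},{value}\n")
        self.index[key] = offset

    def get(self, key):
        if key not in self.index:
            return None
        with open(self.path, "rb") as f:
            f.seek(self.index[key])
            return f.readline().decode().rstrip("\n").partition(",")[2]

    def delete(self, key):
        self.set(key, TOMBSTONE)   # append a tombstone; compaction drops the key later
        self.index.pop(key, None)

db = AppendOnlyStore()
db.set("user:1", "alice")
print(db.get("user:1"))     # alice
db.delete("user:1")
print(db.get("user:1"))     # None
```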
3. SSTables and LSM Trees
- SSTables (Sorted String Tables): Store sorted key-value pairs in immutable files.
- Use a memtable (an in-memory balanced tree) to buffer writes before flushing them to disk as an SSTable (see the sketch after this list).
- Sorted structure allows faster merging compared to unordered logs.
- Bloom filters help avoid unnecessary lookups in segments.
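A minimal sketch of the memtable-to-SSTable flow described above, assuming a plain dict stands in for the in-memory tree and that each segment holds one "key,value" line per entry; the flush threshold and file naming are arbitrary illustration choices (Bloom filters are omitted):

```python
import itertools

SEGMENT_ID = itertools.count()
MEMTABLE_LIMIT = 4                  # flush threshold; tiny, for illustration only

memtable = {}                       # real engines use a balanced tree or skip list

def put(key, value):
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        flush_memtable()

def flush_memtable():
    """Write the memtable out as an immutable, sorted segment file (an SSTable)."""
    path = f"sstable-{next(SEGMENT_ID):04d}.seg"
    with open(path, "w") as f:
        for key in sorted(memtable):          # sorted order is what makes it an SSTable
            f.write(f"{key},{memtable[key]}\n")
    memtable.clear()
    return path

for i in range(5):
    put(f"key{i}", str(i))          # the fourth put triggers a flush to disk
```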
Compaction Strategies
- Size-tiered compaction: Merge smaller segments into larger ones.
- Leveled compaction: Move data across levels while keeping non-overlapping key ranges per level.
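Both strategies rely on the same merge step. A sketch, assuming each segment is a list of (key, value) pairs sorted by key and that segments are ordered oldest to newest, so newer values win:

```python
import heapq

def compact(segments):
    """Merge sorted segments into one, keeping only the newest value per key."""
    # Tag each entry with a negated segment age so ties on key sort newest-first.
    tagged = [
        [(key, -age, value) for key, value in seg]
        for age, seg in enumerate(segments)
    ]
    merged = []
    for key, _neg_age, value in heapq.merge(*tagged):
        if merged and merged[-1][0] == key:
            continue                  # an entry for this key was already kept (it was newer)
        merged.append((key, value))
    return merged

old = [("a", "1"), ("b", "2")]
new = [("a", "9"), ("c", "3")]
print(compact([old, new]))   # [('a', '9'), ('b', '2'), ('c', '3')]
```

In a real engine the merged output is written to a new segment file, tombstoned keys are dropped, and the old segments are deleted.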
PART 2: B-Trees and Index Comparisons
1. B-Trees
- Store data in fixed-size pages.
- Organized as a tree with root, internal nodes, and leaf nodes.
- Well-suited for range queries and efficient reads.
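A sketch of how a lookup descends the tree, assuming pages are small in-memory dicts keyed by a page id; a real engine would instead read fixed-size pages (commonly 4 KB) from disk at each step:

```python
from bisect import bisect_right

# A leaf page holds key -> value records; an internal page holds sorted
# separator keys plus child page ids (one more child than keys).
pages = {
    "root": {"leaf": False, "keys": ["m"], "children": ["p1", "p2"]},
    "p1":   {"leaf": True, "records": {"apple": 1, "cherry": 2}},
    "p2":   {"leaf": True, "records": {"mango": 3, "peach": 4}},
}

def btree_get(key, page_id="root"):
    page = pages[page_id]
    if page["leaf"]:
        return page["records"].get(key)
    # Pick the child whose key range contains `key`.
    child_index = bisect_right(page["keys"], key)
    return btree_get(key, page["children"][child_index])

print(btree_get("peach"))   # 4
print(btree_get("zzz"))     # None
```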
2. B-Tree Operations
- Use update-in-place for writes; may split pages when inserting.
- Rely on a write-ahead log (WAL) for crash recovery (see the sketch after this list).
- Use latches (lightweight locks) to support concurrent access.
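A sketch of the WAL discipline, assuming a text log file (wal.log) and an in-memory dict standing in for an on-disk page; both names are illustrative:

```python
import os

WAL_PATH = "wal.log"
page = {}                        # stand-in for an on-disk B-tree page

def write(key, value):
    # 1. Append the intended change to the WAL and force it to stable storage...
    with open(WAL_PATH, "a") as wal:
        wal.write(f"{key},{value}\n")
        wal.flush()
        os.fsync(wal.fileno())
    # 2. ...only then modify the page in place.
    page[key] = value

def recover():
    """Replay the WAL after a crash to restore every change that reached the log."""
    if not os.path.exists(WAL_PATH):
        return
    with open(WAL_PATH) as wal:
        for line in wal:
            key, _, value = line.rstrip("\n").partition(",")
            page[key] = value
```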
3. Optimizations
- Sibling pointers on leaf nodes help with efficient range scans.
- Some systems use copy-on-write instead of in-place updates for crash safety and consistency.
- Systems like SQLite use B-trees for both table storage and indexes.
4. B-Trees vs LSM Trees
| Feature | B-Trees | LSM Trees |
|---|---|---|
| Best for | Read-heavy workloads | Write-heavy workloads |
| Write Strategy | Update-in-place | Append-only with compaction |
| Performance | More predictable (no background compaction) | Higher write throughput; lower write amplification on SSDs |
| Compression | Limited | Typically better |
| Transactional semantics | Easier: each key exists in exactly one place, so locks can be attached to the tree | Harder: the same key may exist in multiple segments |
PART 3: Advanced Indexes, In-Memory Databases, and Analytics
1. Secondary and Multi-Column Indexes
- Secondary Index: Multiple rows may share the same index key.
Storage Approaches
- Heap file: Rows live in a separate, unordered file; index entries point to their locations.
- Clustered index: The row itself is stored inside the index entry.
- Covering index: A middle ground; some of a row's columns are stored in the index so certain queries can be answered from the index alone.
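A sketch contrasting the three approaches on a toy table; the only difference is what the index entry holds (a position in a separate heap, the whole row, or just the columns a query needs):

```python
# One toy row: (id, name, email)
rows = [(1, "alice", "a@example.com"), (2, "bob", "b@example.com")]

# Heap file: the index maps key -> position of the row in a separate heap.
heap_file = list(rows)
heap_index = {row[0]: pos for pos, row in enumerate(heap_file)}

# Clustered index: the row itself is stored inside the index entry.
clustered_index = {row[0]: row for row in rows}

# Covering index: only the columns a query needs are stored in the index,
# so that query never has to touch the heap.
covering_index = {row[0]: row[1] for row in rows}   # covers "name" lookups

print(heap_file[heap_index[2]])     # (2, 'bob', 'b@example.com')
print(clustered_index[2])           # (2, 'bob', 'b@example.com')
print(covering_index[2])            # 'bob'
```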
Multi-Column Indexing
- Concatenated index: Index built on multiple columns in a defined order.
- Multi-dimensional index: Suited to spatial or geospatial data; commonly implemented by mapping coordinates onto a single dimension with a space-filling curve.
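A sketch of a concatenated index, assuming the column order (last_name, first_name); the sorted order answers queries on last_name alone, or on both columns, but not on first_name alone:

```python
from bisect import bisect_left, bisect_right

people = [("Smith", "Anna"), ("Adams", "Zoe"), ("Smith", "Bob"), ("Lee", "Kim")]

# The concatenated index: entries kept sorted by (last_name, first_name).
index = sorted(people)

def by_last_name(last):
    """Range scan over every entry whose first column equals `last`."""
    lo = bisect_left(index, (last,))
    hi = bisect_right(index, (last, "\uffff"))   # crude upper bound for the prefix
    return index[lo:hi]

print(by_last_name("Smith"))   # [('Smith', 'Anna'), ('Smith', 'Bob')]
```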
2. Fuzzy Search
- Use Levenshtein automata to find terms within a given edit distance of a query (approximate / fuzzy text matching).
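The automaton construction itself is involved; the sketch below only shows the underlying edit-distance measure as a standard dynamic program, not the automaton used by real search engines:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute (free if characters match)
            ))
        prev = curr
    return prev[len(b)]

print(levenshtein("kitten", "sitting"))   # 3
```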
3. In-Memory Databases
- For many datasets, RAM has become cheap enough to keep the entire database in memory.
Durability Options
- Logs, snapshots, battery-backed RAM, or replication.
- The main speedup comes from avoiding the overhead of encoding in-memory data structures into an on-disk format, not merely from avoiding disk reads.
- Can support advanced structures like sets and queues (e.g., Redis).
4. OLTP vs OLAP
- OLTP (Online Transaction Processing): Focused on fast, frequent, small-scale writes and reads.
- OLAP (Online Analytical Processing): Focused on large scans, aggregates, and historical analysis.
- ETL (Extract-Transform-Load) pipelines copy OLTP data into data warehouses so analytical queries do not impact transaction-processing performance.
5. Column-Oriented Storage
- Store data by column instead of by row.
- Ideal for analytics: read fewer columns, compress better.
Compression Strategies
- Bitmap encoding, often combined with run-length encoding (see the sketch after this list).
- Dictionary encoding.
- Sort order improves filtering and enhances compression ratios.
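A sketch of dictionary and bitmap encoding on a low-cardinality column (a hypothetical country column), with run-length encoding applied to each bitmap as a further compression step:

```python
from itertools import groupby

column = ["US", "US", "DE", "US", "FR", "DE", "DE"]

# Dictionary encoding: replace repeated strings with small integer ids.
dictionary = {value: i for i, value in enumerate(dict.fromkeys(column))}
encoded = [dictionary[v] for v in column]          # [0, 0, 1, 0, 2, 1, 1]

# Bitmap encoding: one bitmap per distinct value, one bit per row.
bitmaps = {
    value: [1 if v == value else 0 for v in column]
    for value in dictionary
}

def run_length(bits):
    """Run-length encode a bitmap as (bit, run length) pairs."""
    return [(bit, len(list(group))) for bit, group in groupby(bits)]

print(encoded)
print(bitmaps["DE"])               # [0, 0, 1, 0, 0, 1, 1]
print(run_length(bitmaps["DE"]))   # [(0, 2), (1, 1), (0, 2), (1, 2)]
```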
6. Writes in Column Stores
- Use an LSM-tree-style approach for writes (see the sketch after this list).
- Data is buffered in memory and then flushed to disk in batches.
- Some systems store the same data in multiple sort orders to optimize different query patterns.
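A sketch of this write path, assuming rows are buffered in a plain list and flushed as one file per column; the column names, batch size, and .col file naming are illustrative. Flushing the same batch sorted by a different column would produce a second sort order:

```python
buffer = []                      # in-memory row buffer (the LSM-style memtable)
COLUMNS = ["order_id", "product", "quantity"]
FLUSH_AT = 3                     # tiny batch size, for illustration

def insert(row):
    buffer.append(dict(zip(COLUMNS, row)))
    if len(buffer) >= FLUSH_AT:
        flush(sort_by="order_id")

def flush(sort_by):
    """Write the buffered rows to disk as one file per column, in sorted order."""
    batch = sorted(buffer, key=lambda r: r[sort_by])
    for col in COLUMNS:
        with open(f"{col}.col", "a") as f:        # column file: values only
            f.write("\n".join(str(r[col]) for r in batch) + "\n")
    buffer.clear()

insert((3, "pen", 10))
insert((1, "book", 2))
insert((2, "mug", 5))            # third insert triggers a sorted, columnar flush
```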