Chapter 2 - Designing Data-Intensive Applications
Syan S.P · December 5, 2024
Chapter 2 - Data Models and Query Languages
Data Models
Relational Model (SQL)
- Well established, with strong support for joins and for many-to-one and many-to-many relationships.
- Uses foreign keys to represent relationships.
- Employs rigid schemas (schema-on-write).
- Faces impedance mismatch with application objects, often requiring Object-Relational Mappers (ORMs).
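The relational points above can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The `regions` and `users` tables, their columns, and the sample rows are hypothetical and chosen only for illustration; the sketch shows a foreign key, a join, and the schema rejecting a row that violates it at write time.

```python
import sqlite3

# Hypothetical schema: table names, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE regions (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        region_id INTEGER NOT NULL REFERENCES regions(id)
    );
""")
conn.execute("INSERT INTO regions VALUES (1, 'Greater Seattle Area')")
conn.execute("INSERT INTO users VALUES (1, 'Alice', 1)")
conn.execute("INSERT INTO users VALUES (2, 'Bob', 1)")

# The join resolves the foreign key inside the database.
rows = conn.execute("""
    SELECT users.name, regions.name
    FROM users
    JOIN regions ON users.region_id = regions.id
    ORDER BY users.id
""").fetchall()

# Schema-on-write: a row pointing at a missing region is rejected on insert.
try:
    conn.execute("INSERT INTO users VALUES (3, 'Carol', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Because the region name lives in one row, renaming it is a single update, which is the usual argument for normalizing with foreign keys.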
Document Model (NoSQL)
- Stores self-contained documents, typically JSON-like structures.
- Offers schema flexibility (schema-on-read).
- Provides better data locality for queries that access whole documents.
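- Weak support for joins, which makes many-to-one and many-to-many relationships awkward.
- Uses document references instead of foreign keys; the application usually resolves them with follow-up queries.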
- Closer to application data structures.
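As a rough illustration of the document model, the sketch below stores a profile as one JSON document and resolves a document reference in application code. The `profile` fields, the `regions` lookup, and the `load_profile` helper are all hypothetical names invented for this example, not part of any particular database's API.

```python
import json

# Hypothetical profile document: one-to-many data (positions) is nested inline.
profile = {
    "user_id": 251,
    "name": "Alice",
    "region_id": "us:91",  # document reference, not enforced by the store
    "positions": [
        {"job_title": "Engineer", "organization": "Example Corp"},
        {"job_title": "Intern", "organization": "Sample Labs"},
    ],
}

# Stand-in for a second collection that the reference points into.
regions = {"us:91": {"name": "Greater Seattle Area"}}


def load_profile(raw):
    """Parse a stored document and resolve its region reference in the application."""
    doc = json.loads(raw)
    # Application-side "join": a second lookup instead of a database join.
    doc["region"] = regions.get(doc["region_id"], {}).get("name")
    return doc


stored = json.dumps(profile)    # the whole profile is written as one unit
loaded = load_profile(stored)   # and read back with a single fetch
```

The nested `positions` list shows the locality benefit: everything needed to render the profile arrives in one read. The `region_id` lookup shows the cost: the join moves into application code.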
Graph Model
- Ideal for highly interconnected data with many-to-many relationships.
- Represents entities as vertices and relationships as edges.
- Commonly used in social networks (e.g., Facebook).
- Query languages include Cypher, SPARQL, and Datalog.
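The vertex-and-edge structure can be sketched in plain Python. This is only a stand-in for what a graph database and a language such as Cypher would do; the `vertices`, `edges`, and `reachable` names and the sample data are assumptions made for the example.

```python
from collections import deque

# Hypothetical property graph: vertices carry properties, edges carry a label.
vertices = {
    1: {"label": "person", "name": "Alice"},
    2: {"label": "person", "name": "Bob"},
    3: {"label": "person", "name": "Carol"},
    4: {"label": "city", "name": "Seattle"},
}
edges = [
    (1, "follows", 2),   # (tail vertex, edge label, head vertex)
    (2, "follows", 3),
    (1, "lives_in", 4),
]


def reachable(start, label):
    """Return ids of vertices reachable from start by following edges with the given label."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for tail, edge_label, head in edges:
            if tail == current and edge_label == label and head not in seen:
                seen.add(head)
                queue.append(head)
    return seen


followed = reachable(1, "follows")
names = sorted(vertices[v]["name"] for v in followed)
```

The traversal follows a variable number of edges, which is the kind of query that is easy to state in a graph query language and clumsy to express with a fixed number of relational joins.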
Convergence
- Relational databases increasingly support JSON documents.
- Document databases add relational-like join capabilities.
- Data models are becoming more similar over time.
Query Languages
- Declarative Languages: Specify what data is desired without describing how to retrieve it (e.g., SQL, Cypher, SPARQL). These hide implementation details and enable optimizations.
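- Imperative Languages: Specify step-by-step how to retrieve data, as ordinary application code does when it loops over records. MapReduce sits in between: the query logic is written as code snippets that the framework calls repeatedly.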
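The contrast between the two styles can be sketched with the same query written both ways. The `animals` data and the function names are hypothetical; the declarative version uses Python's built-in sqlite3 module so that the database, not the caller, decides how to find the rows.

```python
import sqlite3

# Hypothetical data: (name, family) pairs.
animals = [
    ("Great white", "Sharks"),
    ("Dolphin", "Delphinidae"),
    ("Hammerhead", "Sharks"),
]


def get_sharks_imperative(rows):
    """Imperative: spell out the loop, the test, and the result list."""
    sharks = []
    for name, family in rows:
        if family == "Sharks":
            sharks.append(name)
    return sharks


def get_sharks_declarative(rows):
    """Declarative: state the condition and let the query engine choose the plan."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
    conn.executemany("INSERT INTO animals VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT name FROM animals WHERE family = 'Sharks' ORDER BY name"
    )
    names = [row[0] for row in result]
    conn.close()
    return names
```

The imperative version fixes the iteration order and access path in code. The SQL version leaves both to the engine, which is why adding an index later can speed it up without changing the query.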
MapReduce
- A parallel processing model for large-scale data queries.
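- The map function is called once per input record and emits key-value pairs; the reduce function is called once per key to aggregate that key's values. Both can be distributed across many machines.
- Requires pure functions without side effects, so the framework can run them anywhere, in any order, and rerun them after a failure.
- MongoDB’s aggregation pipeline is a declarative alternative that expresses similar queries.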
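The map and reduce steps described above can be sketched on a single machine. The `observations` data and the `map_reduce` driver are hypothetical; a real framework would spread the same two functions across many machines rather than run them in one loop.

```python
from collections import defaultdict

# Hypothetical input documents.
observations = [
    {"date": "1995-12-25", "family": "Sharks", "num_animals": 3},
    {"date": "1995-12-12", "family": "Sharks", "num_animals": 4},
    {"date": "1996-01-02", "family": "Sharks", "num_animals": 1},
    {"date": "1996-01-05", "family": "Whales", "num_animals": 2},
]


def map_fn(doc):
    """Pure function: emit (year-month, count) for one document."""
    yield doc["date"][:7], doc["num_animals"]


def reduce_fn(key, values):
    """Pure function: combine all values emitted for one key."""
    return sum(values)


def map_reduce(docs, query, map_fn, reduce_fn):
    """Single-process driver: filter, map, group by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        if query(doc):
            for key, value in map_fn(doc):
                groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


sharks_per_month = map_reduce(
    observations,
    lambda doc: doc["family"] == "Sharks",
    map_fn,
    reduce_fn,
)
```

Neither function reads or writes anything outside its arguments, which is what would let a framework rerun them on another machine after a failure.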
Key Concepts
- Schema-on-write (used by relational models) versus schema-on-read (used by document models).
- Data locality improves query performance when most of a document is needed at once; it helps less when only a small part is read or updated.
- Choosing the right data model depends on the use case:
  - Relational for data that relies on joins and many-to-one or many-to-many relationships.
  - Document for flexible schemas and data locality.
  - Graph for highly interconnected data where anything may relate to anything.
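The schema-on-write versus schema-on-read distinction listed under Key Concepts can be sketched with one format change, splitting a full name into a first name. The `first_name_on_read` and `migrate_on_write` helpers, the `users` table, and the sample data are hypothetical; the relational half uses Python's built-in sqlite3 module.

```python
import sqlite3


def first_name_on_read(doc):
    """Schema-on-read: old and new document shapes coexist; the reader adapts."""
    if "first_name" not in doc and "name" in doc:
        return doc["name"].split(" ")[0]
    return doc.get("first_name")


def migrate_on_write(conn):
    """Schema-on-write: change the schema, then rewrite existing rows to match."""
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    rows = conn.execute("SELECT id, name FROM users").fetchall()
    conn.executemany(
        "UPDATE users SET first_name = ? WHERE id = ?",
        [(name.split(" ")[0], user_id) for user_id, name in rows],
    )


# Document side: no migration, the code handles both shapes.
old_doc = {"name": "Ada Lovelace"}
new_doc = {"first_name": "Grace"}
read_results = [first_name_on_read(old_doc), first_name_on_read(new_doc)]

# Relational side: every row conforms to the new schema after the migration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada Lovelace')")
migrate_on_write(conn)
migrated = conn.execute("SELECT first_name FROM users").fetchall()
```

The trade-off is where the cost lands: schema-on-read pays a small check on every read, while schema-on-write pays once during the migration and then guarantees a uniform shape.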