Chapter 2 - Designing Data-Intensive Applications

Syan S.P · December 5, 2024

Chapter 2: Data Models and Query Languages

Data Models

Relational Model (SQL)

  • Well-known with strong support for joins and many-to-many relationships.
  • Uses foreign keys to represent relationships.
  • Employs rigid schemas (schema-on-write).
  • Suffers from an impedance mismatch between tables and application objects; Object-Relational Mappers (ORMs) reduce the translation code but do not remove the mismatch.
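
The points above can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The tables, column names, and sample rows are hypothetical and chosen only for illustration: a join table models a many-to-many relationship through foreign keys, and the schema rejects a row that violates it at write time.

```python
import sqlite3

# Hypothetical schema: users, organizations, and a join table linking them.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE organizations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE positions (
        user_id INTEGER NOT NULL REFERENCES users(id),
        org_id  INTEGER NOT NULL REFERENCES organizations(id),
        title   TEXT NOT NULL
    );
""")

conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany("INSERT INTO organizations VALUES (?, ?)", [(10, "Acme"), (20, "Initech")])
conn.executemany(
    "INSERT INTO positions VALUES (?, ?, ?)",
    [(1, 10, "Engineer"), (1, 20, "Advisor"), (2, 10, "Director")],
)

# Many-to-many: one user holds several positions, one organization has several users.
rows = conn.execute("""
    SELECT u.name, o.name, p.title
    FROM positions p
    JOIN users u ON u.id = p.user_id
    JOIN organizations o ON o.id = p.org_id
    ORDER BY u.name, o.name
""").fetchall()

# Schema-on-write: a row pointing at a non-existent user is rejected on insert.
try:
    conn.execute("INSERT INTO positions VALUES (?, ?, ?)", (99, 10, "Ghost"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

conn.close()
```

The application still has to turn the returned tuples into objects by hand, which is the translation layer an ORM would otherwise provide.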

Document Model (NoSQL)

  • Stores self-contained documents, typically JSON-like structures.
  • Offers schema flexibility (schema-on-read).
  • Provides better data locality for queries that access whole documents.
  • Has weak support for joins, which makes many-to-many relationships awkward to model.
  • Uses document references instead of foreign keys.
  • Closer to application data structures.
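
As a rough illustration of these points, the sketch below treats a JSON string as a stored document. The field names, the `regions` lookup table, and the `display_name` helper are all hypothetical. It shows nested one-to-many data read in a single step, a document reference resolved by the application, and schema-on-read handling of two document shapes.

```python
import json

# Hypothetical profile document: the one-to-many "positions" data is nested inline.
raw = """
{
  "user_id": 1,
  "name": "Ada Lovelace",
  "region_id": "region:1",
  "positions": [
    {"title": "Engineer", "organization": "Acme"},
    {"title": "Advisor", "organization": "Initech"}
  ]
}
"""

# Locality: one read returns the whole profile, no joins needed.
profile = json.loads(raw)
titles = [position["title"] for position in profile["positions"]]

# Document reference: region_id is resolved by a second lookup in application code,
# standing in for what a foreign key plus join would do in a relational database.
regions = {"region:1": "London"}
region_name = regions[profile["region_id"]]


def display_name(doc):
    """Schema-on-read: the reader copes with both old and new document shapes."""
    if "first_name" in doc:              # newer hypothetical shape
        return doc["first_name"]
    return doc["name"].split(" ")[0]     # older shape with a single name field


old_style = display_name(profile)
new_style = display_name({"first_name": "Grace", "last_name": "Hopper"})
```

Nothing stops a writer from storing a document with a different shape, so the burden of interpreting the structure moves into code like `display_name`.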

Graph Model

  • Ideal for highly interconnected data with many-to-many relationships.
  • Represents entities as vertices and relationships as edges.
  • Commonly used in social networks (e.g., Facebook).
  • Query languages include Cypher, SPARQL, and Datalog.
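
A property graph can be sketched with plain Python structures. The vertex names, edge labels, and helper functions below are hypothetical and are not the API of any graph database. The traversal follows a `lives_in` edge and then any number of `within` edges, which is the kind of variable-length path a language such as Cypher can express in a single pattern.

```python
from collections import deque

# Hypothetical graph: each edge is (tail vertex, label, head vertex).
edges = [
    ("ada", "lives_in", "london"),
    ("london", "within", "england"),
    ("england", "within", "europe"),
    ("grace", "lives_in", "new_york"),
    ("new_york", "within", "usa"),
    ("usa", "within", "north_america"),
]


def outgoing(vertex, label):
    """Return the heads of all edges leaving `vertex` with the given label."""
    return [head for tail, lbl, head in edges if tail == vertex and lbl == label]


def reachable(start, label):
    """Breadth-first traversal over edges with one label, excluding `start` itself."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in outgoing(current, label):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def lives_within(person, region):
    """Follow one lives_in edge, then zero or more within edges."""
    for place in outgoing(person, "lives_in"):
        if place == region or region in reachable(place, "within"):
            return True
    return False
```

Because the number of `within` hops is not known in advance, the same question in SQL would typically need a recursive common table expression.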

Convergence

  • Relational databases increasingly support JSON documents.
  • Document databases add relational-like join capabilities.
  • Data models are becoming more similar over time.

Query Languages

  • Declarative Languages: Specify what data is desired without describing how to retrieve it (e.g., SQL, Cypher, SPARQL). These hide implementation details and enable optimizations.
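  • Imperative Languages: Specify step-by-step how to retrieve data (e.g., looping over records in application code). MapReduce sits between the two styles: the query logic is written as code snippets that the processing framework calls repeatedly.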
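
To make the contrast concrete, the sketch below answers the same question both ways. The `animals` data and the `get_sharks` helper are hypothetical, and SQLite (via Python's built-in sqlite3 module) stands in for any SQL database.

```python
import sqlite3

# Hypothetical sample data: (name, family) pairs.
animals = [
    ("Great white", "Sharks"),
    ("Bottlenose dolphin", "Dolphins"),
    ("Hammerhead", "Sharks"),
]


def get_sharks(rows):
    """Imperative: spell out the loop, the test, and how the result is built."""
    sharks = []
    for name, family in rows:
        if family == "Sharks":
            sharks.append(name)
    return sharks


imperative_result = get_sharks(animals)

# Declarative: state the condition and let the database decide how to evaluate it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)", animals)
declarative_result = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM animals WHERE family = 'Sharks' ORDER BY name"
    )
]
conn.close()
```

The SQL version never says which order to scan rows in or whether to use an index, which is what leaves the database free to optimize or parallelize the query.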

MapReduce

  • A parallel processing model for large-scale data queries.
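  • The map function is called once per record and emits key-value pairs; the reduce function is called once per key and aggregates the values emitted for that key. Both can run in parallel across many machines.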
  • Requires pure functions without side effects.
  • MongoDB supports a limited form of MapReduce, and its aggregation pipeline offers a declarative alternative for the same kind of query.
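
The pattern can be sketched on a single machine in plain Python. The `map_reduce` driver, the function names, and the sample observations are hypothetical; a real framework would distribute the map and reduce calls across machines, but the contract is the same: pure functions, key-value pairs out of map, one reduce call per key.

```python
from collections import defaultdict

# Hypothetical sample documents: wildlife sightings.
observations = [
    {"timestamp": "1995-12-25", "family": "Sharks", "num_animals": 3},
    {"timestamp": "1995-12-12", "family": "Sharks", "num_animals": 4},
    {"timestamp": "1996-01-03", "family": "Dolphins", "num_animals": 7},
    {"timestamp": "1996-01-10", "family": "Sharks", "num_animals": 2},
]


def map_fn(doc):
    """Pure function: emits (year-month, count) for shark sightings only."""
    if doc["family"] == "Sharks":
        yield doc["timestamp"][:7], doc["num_animals"]


def reduce_fn(key, values):
    """Pure function: combines all values emitted for one key."""
    return sum(values)


def map_reduce(docs, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


totals = map_reduce(observations, map_fn, reduce_fn)
```

Since `map_fn` and `reduce_fn` depend only on their inputs, the driver could run them in any order, on any machine, and rerun them after a failure without changing the result.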

Key Concepts

  • Schema-on-write (used by relational models) versus schema-on-read (used by document models).
  • Data locality improves performance for queries that read a whole document at once; it helps less when only a small part of a large document is needed.
  • Choosing the right data model depends on the use case:
    • Relational for data that relies on joins and many-to-one or many-to-many relationships.
    • Document for flexible schemas and data locality.
    • Graph for rich many-to-many connections.
