Database Design at Scale

Design robust databases that scale to handle millions of users

110 minutes

8 Detailed Sections

Senior Level

Relational vs NoSQL: Choosing the Right Database Paradigm

The relational database model has dominated for more than 40 years, but the NoSQL movement took off around 2009 to address specific scalability and schema-flexibility challenges.

The choice between SQL and NoSQL isn't about one being "better"—it's about matching database characteristics to your use case. Relational databases (PostgreSQL, MySQL, Oracle) excel at complex queries, transactions, and strong consistency.

They use structured tables with predefined schemas, support JOINs across tables, and provide ACID guarantees. The relational model enforces data integrity through foreign keys and constraints.

This is ideal for financial systems, inventory management, or any scenario requiring strong consistency. However, RDBMSs struggle with horizontal scaling—sharding is complex and joins across shards are expensive.
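To make the ACID and integrity guarantees concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `users` and `accounts` tables, their columns, and the helper names are illustrative assumptions, not part of the original tutorial; a production system would use PostgreSQL or MySQL, but the behavior shown is the same in kind.

```python
import sqlite3

def open_bank():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE accounts ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL REFERENCES users(id),"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.execute("INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob')")
        conn.execute("INSERT INTO accounts VALUES (10, 1, 100), (20, 2, 50)")
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits on success, rolls back if any statement raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # If this debit violates the CHECK constraint, the credit above
            # is rolled back as well, so no money is created.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {account_id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

With this setup, `transfer(conn, 10, 20, 30)` succeeds, while `transfer(conn, 10, 20, 500)` returns False and leaves both balances untouched. Inserting an account whose `user_id` has no matching row in `users` raises `sqlite3.IntegrityError`, which is the foreign key doing its job.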

NoSQL databases sacrifice some relational features to achieve better scalability and flexibility.

Document stores (MongoDB, CouchDB) store semi-structured JSON-like documents, making them ideal for content management, user profiles, or product catalogs where each entity might have different fields.
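The "different fields per entity" idea can be sketched without any database at all. The snippet below models a collection as a list of Python dicts and adds a small equality-match helper; the `find` function and the sample products are hypothetical and only loosely resemble what a real document store's query API offers.

```python
import json

_MISSING = object()  # sentinel so that an absent field never equals a query value

def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field, _MISSING) == value for field, value in criteria.items())
    ]

# Two products in the same collection with different shapes.
products = [
    {"_id": 1, "type": "book", "title": "SQL Basics", "pages": 320},
    {"_id": 2, "type": "laptop", "title": "UltraBook", "ram_gb": 16,
     "ports": ["usb-c", "hdmi"]},
]

# Documents serialize naturally to JSON, which is how they travel to clients.
as_json = json.dumps(products)
```

A call such as `find(products, ram_gb=16)` matches only the laptop, because the book simply has no `ram_gb` field. No schema migration was needed to add that field.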

Wide-column stores (Cassandra, HBase) organize data by rows and columns but allow different columns per row. They are optimized for write-heavy workloads such as time-series data or messaging.
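As a rough mental model, a wide-column table can be pictured as a map from a partition key to rows kept sorted by a clustering key, where each row carries its own set of columns. The class below is an in-memory sketch under that assumption; the names `WideColumnTable`, `write`, and `read_range` are hypothetical, and real systems such as Cassandra add replication, on-disk storage, and compaction on top.

```python
import bisect
from collections import defaultdict

class WideColumnTable:
    """Toy model: partition key -> rows sorted by clustering key."""

    def __init__(self):
        # Each partition holds a sorted list of (clustering_key, columns) pairs.
        self._partitions = defaultdict(list)

    def write(self, partition_key, clustering_key, **columns):
        """Upsert: merge columns into an existing row or insert a new one."""
        rows = self._partitions[partition_key]
        keys = [key for key, _ in rows]  # O(n) here; fine for a sketch
        i = bisect.bisect_left(keys, clustering_key)
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i][1].update(columns)
        else:
            rows.insert(i, (clustering_key, dict(columns)))

    def read_range(self, partition_key, start, end):
        """Return rows in one partition with start <= clustering key <= end."""
        rows = self._partitions.get(partition_key, [])
        keys = [key for key, _ in rows]
        lo = bisect.bisect_left(keys, start)
        hi = bisect.bisect_right(keys, end)
        return rows[lo:hi]

readings = WideColumnTable()
readings.write("sensor-1", 100, temp=21.5)
readings.write("sensor-1", 300, temp=22.0, humidity=40)  # extra column on this row
readings.write("sensor-1", 200, temp=21.8)
```

Reading a time window from a single partition is cheap because the rows are already ordered. Anything that does not start from the partition key would need a scan, which reflects the limited query flexibility of this model.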

Key-value stores (Redis, DynamoDB) offer simple get/put operations with extremely high performance, perfect for caching or session storage.
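The get/put interface is small enough to sketch in a few lines. The following in-memory store with per-key expiry illustrates the session-storage use case; `TTLStore` and its parameters are hypothetical names, and the injectable `clock` exists only to make the expiry behavior easy to demonstrate.

```python
import time

class TTLStore:
    """Toy key-value store with optional per-key time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable so expiry can be simulated

    def put(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value

# Simulated clock: a one-element list whose value we advance by hand.
now = [0.0]
sessions = TTLStore(clock=lambda: now[0])
sessions.put("session:abc", {"user_id": 1}, ttl_seconds=30)
```

Every operation is a single dictionary lookup by key, which is why stores of this kind can be so fast. The trade-off is that there is no way to query by value without scanning everything.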

Graph databases (Neo4j, Amazon Neptune) excel at relationship-heavy data like social networks or fraud detection.
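A typical relationship-heavy query is "who is within two hops of this user?". The sketch below answers it with a breadth-first search over an adjacency list; the `within_hops` helper and the sample graph are illustrative assumptions, and a graph database would express the same traversal in its own query language.

```python
from collections import deque

def within_hops(graph, start, max_hops):
    """Return {node: hop distance} for nodes reachable in <= max_hops, excluding start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance

# Hypothetical friendship graph stored as an adjacency list.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice"],
    "dave": ["bob", "erin"],
    "erin": ["dave"],
}
```

Here `within_hops(friends, "alice", 2)` returns bob and carol at distance 1 and dave at distance 2, while erin is three hops away and is left out. In a relational schema the same question needs one self-join per hop, which is the cost graph databases are designed to avoid.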

Key Takeaways

ACID vs BASE: SQL guarantees Atomicity, Consistency, Isolation, Durability; NoSQL often uses Basically Available, Soft state, Eventual consistency

Schema Flexibility: NoSQL allows different fields per document; useful when schema evolves rapidly

Horizontal Scaling: NoSQL is designed for adding nodes; SQL databases traditionally scale vertically (bigger servers) and need read replicas or sharding to scale out

Query Complexity: SQL supports complex JOINs; NoSQL typically requires denormalization or multiple queries

Transactions: Modern NoSQL databases (MongoDB 4.0+, Cosmos DB) now support multi-document transactions

Common Pitfall: Using NoSQL for everything—many problems still best solved with SQL

Solution: Polyglot persistence—use different databases for different parts of your system

Real-World: Uber uses MySQL for trip data (transactions), Cassandra for time-series (GPS coordinates)

Performance: DynamoDB is designed to deliver single-digit millisecond latency for key-value lookups at any scale

GraphQL-Native: Databases like Dgraph provide native GraphQL interfaces, reducing the need for complex ORMs or mapping layers between the API and the data tier
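The Horizontal Scaling and Query Complexity takeaways can be illustrated together with a small hash-sharding sketch. The `shard_for` function and `ShardedUsers` class are hypothetical, and plain dicts stand in for database nodes; the point is only to show which lookups touch one shard and which must visit all of them.

```python
import hashlib

def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

class ShardedUsers:
    """Toy user table split across several in-memory shards by user_id."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, user_id, record):
        self.shards[shard_for(user_id, len(self.shards))][user_id] = record

    def get(self, user_id):
        # Lookup by shard key: exactly one shard is contacted.
        return self.shards[shard_for(user_id, len(self.shards))].get(user_id)

    def find_by_country(self, country):
        # Query on a non-shard-key field: scatter to every shard, then gather.
        return sorted(
            user_id
            for shard in self.shards
            for user_id, record in shard.items()
            if record["country"] == country
        )

users = ShardedUsers(num_shards=4)
for user_id, country in [(1, "DE"), (2, "US"), (3, "DE"), (4, "JP")]:
    users.put(user_id, {"country": country})
```

Adding capacity means adding shards, but note that simple modulo hashing remaps most keys when `num_shards` changes; consistent hashing is the usual remedy. The scatter-gather in `find_by_country` is also why joins and secondary lookups across shards are expensive.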

Visual Diagram


┌────────────────────────────────────────────┐
│       SQL vs NoSQL Trade-offs             │
├────────────────────────────────────────────┤
│                                            │
│ Relational (SQL):                          │
│  ┌─────────┬─────────┬─────────┐          │
│  │ user_id │  name   │  email  │ Schema   │
│  ├─────────┼─────────┼─────────┤          │
│  │    1    │  Alice  │ a@ex.com│ Fixed    │
│  │    2    │   Bob   │ b@ex.com│          │
│  └─────────┴─────────┴─────────┘          │
│  + Strong consistency (ACID)               │
│  + Complex queries (JOINs)                 │
│  + Data integrity (foreign keys)           │
│  - Hard to scale horizontally              │
│  - Schema migrations expensive             │
│  Use: Banking, inventory, analytics        │
│                                            │
│ Document Store (MongoDB):                  │
│  {id:1, name:"Alice", email:"a@ex.com"}   │
│  {id:2, name:"Bob", tags:["vip"]}         │
│  + Flexible schema per document            │
│  + Horizontal scaling (sharding)           │
│  + Developer friendly (JSON)               │
│  - Limited transactions (improving)        │
│  - No JOINs (embed or reference)          │
│  Use: CMS, catalogs, user profiles         │
│                                            │
│ Wide-Column (Cassandra):                   │
│  Row key → Column families → Columns      │
│  user:1 → profile{name, email, ...}       │
│  + Massive write throughput                │
│  + Linear scalability                      │
│  + Time-series optimized                   │
│  - Limited query flexibility               │
│  - Eventual consistency default            │
│  Use: Time-series, IoT, messaging          │
└────────────────────────────────────────────┘

Table of Contents

Relational vs NoSQL: Choosing the Right Database Paradigm

Key Takeaways

Visual Diagram

Schema Design: Normalization vs Denormalization

Indexing Strategies: The Key to Query Performance

Geospatial Indexing: Quadtrees, Geohashes, and Proximity Search

Database Scaling Strategies: Vertical, Horizontal, and Hybrid

Transaction Isolation Levels: Balancing Consistency and Performance

Database Replication: Patterns, Lag, and Consistency

Database Performance Optimization: Beyond Indexes