
Chapter 30: Infrastructure Scaling

Your code doesn't need to be perfect, but your infrastructure needs to be ready before your users tell you it isn't.

Why This Matters

  • 🏢 Owner: Infrastructure failures translate directly into revenue loss. A 1-hour outage at $500K MRR costs roughly $700 in lost revenue; at $5M MRR it's roughly $7,000, plus trust you can't buy back.
  • 💻 Dev: This is your domain. Knowing when to reach for read replicas vs. sharding vs. caching is the difference between a weekend deploy and a 3-month rewrite.
  • 📋 PM: You need to understand infrastructure constraints to set realistic timelines and make informed trade-offs with engineering.
  • 🎨 Designer: Page load time is a design problem. Widely cited industry studies suggest every 100ms of added latency costs roughly 1% in conversions. Infrastructure decisions directly shape the experience you design.

The Concept (Simple)

Analogy: The Highway System

Think of your infrastructure as a highway system:

  • Vertical scaling = widening the existing road (bigger server)
  • Horizontal scaling = building parallel roads (more servers)
  • Caching = building shortcuts and exits so not everyone goes downtown
  • Message queues = adding traffic lights so cars don't all arrive at once
  • CDN = building local branches so people don't have to drive to the main office
Traffic Growth vs. Infrastructure Response
──────────────────────────────────────────

Users    │                                          ╱ Horizontal
100K     │                                       ╱╱   Scaling
         │                                    ╱╱
 50K     │                              ╱╱╱╱╱
         │                      ╱╱╱╱╱╱╱╱
 10K     │              ╱╱╱╱╱╱╱              Vertical Scaling
         │        ╱╱╱╱╱╱                     Hits Ceiling Here ──→ ■
  1K     │  ╱╱╱╱╱╱
         │╱╱
  100    ├─────────────────────────────────────────────────
         Start     +6mo     +12mo    +18mo    +24mo

You almost always start by scaling vertically (bigger machines). When you hit that ceiling, you go horizontal. The trick is preparing to go horizontal before you need to.

How It Works (Detailed)

Infrastructure Scaling Architecture Overview

SCALED SaaS ARCHITECTURE
────────────────────────

Users ──→ CDN (Static Assets, Edge Cache)
           │
           ▼
      Load Balancer ──→ Rate Limiter
       ╱   │   ╲
      ╱    │    ╲
 ┌───▼─┐┌──▼──┐┌─▼────┐
 │App 1││App 2││App N │  ← Stateless App Servers
 └──┬──┘└──┬──┘└──┬───┘    (Auto-scaled)
    │      │      │
    ▼      ▼      ▼
 ┌─────────────────────┐   ┌──────────────────┐
 │   Redis Cache       │   │  Message Queue   │
 │   (Session + Data)  │   │  (Async Jobs)    │
 └─────────┬───────────┘   └────────┬─────────┘
           │                        │
           ▼                        ▼
 ┌──────────────────┐    ┌───────────────────┐
 │  Primary DB      │    │  Worker Servers   │
 │  (Writes)        │    │  (Background Jobs)│
 └────────┬─────────┘    └───────────────────┘
          │
   ┌──────┴──────┐
   ▼             ▼
┌────────┐  ┌────────┐
│Read    │  │Read    │  ← Read Replicas
│Replica │  │Replica │    (Scale reads independently)
│  1     │  │  2     │
└────────┘  └────────┘

┌────────────────────────────────┐
│  Object Storage (S3/GCS)       │ ← Files, backups,
│  + Blob Store for uploads      │   static assets
└────────────────────────────────┘

┌──────────────────────────────────┐
│  Observability Stack             │
│  Metrics │ Logs │ Traces │ Alerts│
└──────────────────────────────────┘
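The rate limiter in front of the app servers is often the first piece teams hand-roll. Below is a minimal token-bucket sketch; this is an illustration, not a production implementation (in practice the limiter usually lives in the load balancer, an API gateway, or Redis, keyed per client). The optional `now` parameter just makes the behavior easy to demonstrate deterministically.

```python
import time

class TokenBucket:
    """Allow `rate` requests per second with bursts up to `capacity`."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now=None):
        if now is None:
            now = time.monotonic()
        # Refill tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow(now=0.0) for _ in range(12)]
print(results.count(True))  # → 10: the burst is absorbed, then requests drop
```

The same shape works per user, per API key, or per IP: one bucket per key, stored in Redis so all app servers share the count.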

Database Scaling Strategies

Strategy Comparison Table

| Strategy | When to Use | Complexity | Read Perf | Write Perf | Cost |
|---|---|---|---|---|---|
| Vertical scaling | First bottleneck, < 1TB data | Low | +2–4x | +2–4x | $$ |
| Read replicas | Read-heavy (> 80% reads) | Medium | +5–10x | Same | $$ |
| Connection pooling | Many app instances, connection limits | Low | +2–3x | +2–3x | $ |
| Query optimization | Before any scaling (always do this first) | Low | +2–50x | +2–50x | $ |
| Table partitioning | Single large tables (> 100M rows) | Medium | +5–20x | +2–5x | $ |
| Functional sharding | Different domains, separable data | High | +5–10x | +5–10x | $$$ |
| Horizontal sharding | > 1TB, high write throughput needed | Very High | +10–50x | +10–50x | $$$$ |
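Read-replica routing usually lives in a thin layer between the app and the database driver; most ORMs expose hooks for it (Django's database routers, for example). The sketch below shows only the core decision, with placeholder strings standing in for real pooled connections:

```python
import itertools

class ReplicaRouter:
    """Send writes to the primary; round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def connection_for(self, sql):
        # Crude heuristic: only plain SELECTs go to a replica. Reads that
        # must see a just-committed write ("read your own writes") should
        # still hit the primary, because replication lags by design.
        if self._replicas and sql.lstrip().lower().startswith("select"):
            return next(self._replicas)
        return self.primary

router = ReplicaRouter("primary", ["replica-1", "replica-2"])
print(router.connection_for("SELECT * FROM users"))      # → replica-1
print(router.connection_for("UPDATE users SET name=?"))  # → primary
print(router.connection_for("SELECT 1"))                 # → replica-2
```

Replication lag is the catch: a dashboard query can safely be seconds stale, but a "show me the record I just saved" read cannot, so those reads stay on the primary.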

Database Scaling Decision Path

Database feeling slow?
        │
        ▼
┌──────────────────────┐
│ 1. Check your        │
│    queries first!    │──→  EXPLAIN ANALYZE everything
│    (Free)            │     Add missing indexes
└──────────┬───────────┘     Fix N+1 queries
           │
      Still slow?
           │
           ▼
┌──────────────────────┐
│ 2. Vertical scale    │──→  Upgrade CPU/RAM/IOPS
│    ($50–500/mo)      │     Usually buys 6–12 months
└──────────┬───────────┘
           │
      Still slow?
           │
           ▼
┌──────────────────────┐
│ 3. Add caching       │──→  Redis for hot data
│    ($50–200/mo)      │     Reduce DB reads by 60–90%
└──────────┬───────────┘
           │
      Still slow?
           │
           ▼
┌──────────────────────┐
│ 4. Read replicas     │──→  Route reads to replicas
│    ($200–1,000/mo)   │     Scale reads independently
└──────────┬───────────┘
           │
      Still slow?
           │
           ▼
┌──────────────────────┐
│ 5. Partition large   │──→  Partition by date/tenant
│    tables            │     Prune old partitions
└──────────┬───────────┘
           │
      Still slow?
           │
           ▼
┌──────────────────────┐
│ 6. Sharding          │──→  Last resort. High complexity.
│    ($$$$)            │     Consider managed DBs (PlanetScale,
└──────────────────────┘     CockroachDB, Vitess)
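Step 1 is the cheapest win on that path. Here is the idea demonstrated with Python's built-in sqlite3 as a stand-in (on Postgres or MySQL you would run EXPLAIN ANALYZE against the real query): once the index exists, the plan flips from a full table scan to an index search.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")
db.executemany("INSERT INTO orders (user_id, total) VALUES (?, ?)",
               [(i % 100, float(i)) for i in range(1000)])

query = "SELECT * FROM orders WHERE user_id = 42"

# Without an index, every row is examined.
before = db.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(before)   # e.g. "SCAN orders"

db.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")

# With the index, the planner jumps straight to the matching rows.
after = db.execute("EXPLAIN QUERY PLAN " + query).fetchone()[3]
print(after)    # e.g. "SEARCH orders USING INDEX idx_orders_user_id (user_id=?)"
```

Run the equivalent against your ten slowest production queries before spending a dollar on hardware; a missing index routinely turns a 500ms query into a 5ms one.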

Caching Strategy Layers

Layer 1: Browser Cache
┌──────────────────────────────────────────────────┐
│ Cache-Control headers, ETags, Service Workers    │
│ Latency: 0ms  │  Hit rate: 30–50%  │  Cost: Free │
└──────────────────────────────────────────────────┘
                        │ miss
                        ▼
Layer 2: CDN / Edge Cache
┌──────────────────────────────────────────────────┐
│ Cloudflare, CloudFront, Fastly                   │
│ Latency: 5–20ms │ Hit rate: 60–90% │ Cost: $     │
│ Best for: Static assets, public API responses    │
└──────────────────────────────────────────────────┘
                        │ miss
                        ▼
Layer 3: Application Cache (Redis/Memcached)
┌──────────────────────────────────────────────────┐
│ Session data, computed results, API responses    │
│ Latency: 1–5ms │ Hit rate: 70–95% │ Cost: $$     │
│ Best for: User-specific data, expensive queries  │
└──────────────────────────────────────────────────┘
                        │ miss
                        ▼
Layer 4: Database Query Cache
┌──────────────────────────────────────────────────┐
│ Materialized views, precomputed aggregates       │
│ Latency: 10–50ms │ Hit rate: varies │ Cost: $    │
│ Best for: Dashboard data, reports, analytics     │
└──────────────────────────────────────────────────┘
                        │ miss
                        ▼
Source: Primary Database
┌──────────────────────────────────────────────────┐
│ Full query execution                             │
│ Latency: 20–500ms │ Cost: $$$                    │
└──────────────────────────────────────────────────┘
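Layer 1 costs nothing but a header. The sketch below shows the ETag revalidation handshake; the function names are illustrative, not a real framework API. A repeat visitor sends the ETag back as If-None-Match, and a matching value means a 304 with no body, so the download is skipped entirely.

```python
import hashlib

def etag_for(body):
    # A strong ETag derived from the body; real servers often use
    # file mtime + size instead of hashing on every request.
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def respond(body, if_none_match=None):
    etag = etag_for(body)
    if if_none_match == etag:
        return 304, b""       # browser's cached copy is still valid
    return 200, body          # full response; the ETag rides along as a header

page = b"<html>pricing page</html>"
status, _ = respond(page)                  # first visit: full download
print(status)                              # → 200
status, _ = respond(page, etag_for(page))  # revisit with If-None-Match
print(status)                              # → 304
```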

Cache Invalidation Strategies

| Strategy | How It Works | Pros | Cons | Use When |
|---|---|---|---|---|
| TTL (time-based) | Cache expires after N seconds | Simple, predictable | Stale data during TTL | Data freshness tolerance > 30s |
| Write-through | Update cache on every write | Always consistent | Higher write latency | Consistency is critical |
| Write-behind | Queue cache updates async | Fast writes | Temporary inconsistency | High write throughput needed |
| Event-based | Invalidate on specific events | Precise control | Complex to implement | Complex dependency graphs |
| Cache-aside | App checks cache, fills on miss | Flexible, resilient | Cache stampede risk | General-purpose, most common |
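Cache-aside with a TTL (the most common combination from the table) fits in a few lines. A plain dict stands in for Redis here, and the injectable clock exists only to make expiry easy to demonstrate:

```python
import time

class CacheAside:
    """Cache-aside with TTL expiry; a dict stands in for Redis."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}          # key -> (value, expires_at)
        self.hits = self.misses = 0

    def get(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss or expired: read the source of truth, then fill the cache.
        self.misses += 1
        value = loader(key)
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        # Event-based invalidation: call this when the underlying data changes.
        self._store.pop(key, None)

def load(key):
    return {"id": key, "plan": "pro"}   # stands in for a slow DB query

fake_time = [0.0]
cache = CacheAside(ttl_seconds=30, clock=lambda: fake_time[0])

cache.get("user:1", load)        # miss: hits the "database"
cache.get("user:1", load)        # hit: served from memory
fake_time[0] = 31.0              # ...31 seconds later the TTL has expired
cache.get("user:1", load)        # miss again: reloaded and re-cached
print(cache.hits, cache.misses)  # → 1 2
```

The stampede risk in the table is visible here: if a hot key expires while a thousand requests are in flight, they all call `loader` at once; production caches mitigate this with locks or by serving the stale value while one request refreshes.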

Message Queues and Async Processing

Not everything needs to happen in the request-response cycle. Move heavy work to background processing.

SYNCHRONOUS (Before)                ASYNCHRONOUS (After)
─────────────────────              ─────────────────────

User Request                       User Request
    │                                  │
    ▼                                  ▼
Process Payment (2s)               Process Payment (2s)
    │                                  │
    ▼                                  ▼
Send Email (1s)                    Queue: Send Email ──→ Worker (later)
    │                                  │
    ▼                                  ▼
Generate PDF (3s)                  Queue: Gen PDF ──→ Worker (later)
    │                                  │
    ▼                                  ▼
Update Analytics (0.5s)            Queue: Analytics ──→ Worker (later)
    │                                  │
    ▼                                  ▼
Response (6.5s total)              Response (2s total) ✅

What to move to async queues:

  • Email / SMS / push notifications
  • PDF and report generation
  • Image / video processing
  • Analytics and event tracking
  • Webhook delivery
  • Search index updates
  • Data exports

Popular queue technologies:

| Technology | Best For | Complexity | Managed Options |
|---|---|---|---|
| Redis (Bull/BullMQ) | Simple job queues, < 10K jobs/min | Low | AWS ElastiCache, Upstash |
| RabbitMQ | Complex routing, pub/sub patterns | Medium | CloudAMQP, AWS MQ |
| AWS SQS | Serverless, high durability | Low | Fully managed |
| Apache Kafka | Event streaming, very high throughput | High | Confluent, AWS MSK |
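Whatever the backing technology, the shape is identical: the request handler enqueues and returns, and a worker drains the queue off the request path. A stdlib-only sketch, with queue.Queue standing in for Redis/SQS and a thread standing in for a worker process:

```python
import queue
import threading

jobs = queue.Queue()
sent = []

def worker():
    while True:
        job = jobs.get()
        if job is None:        # sentinel for shutdown
            break
        sent.append("email to " + job["to"])   # the slow work happens here
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_signup(email):
    jobs.put({"to": email})    # enqueue and move on; don't make the user wait
    return "202 Accepted"

print(handle_signup("ada@example.com"))   # → 202 Accepted
jobs.join()                               # demo only: wait for the worker
print(sent)                               # → ['email to ada@example.com']
```

The real systems add what this sketch lacks: persistence across restarts, retries with backoff, and dead-letter queues for jobs that keep failing.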

CDN and Edge Computing

WITHOUT CDN                           WITH CDN
──────────                            ────────

User (Tokyo) ──────────────────→ Server (US-East)
              3,200ms round trip

User (London) ─────────────────→ Server (US-East)
              1,800ms round trip

User (São Paulo) ──────────────→ Server (US-East)
              2,400ms round trip


User (Tokyo) ───→ Edge (Tokyo) = 40ms ✅
                      │
User (London) ──→ Edge (London) = 20ms ✅
                      │
User (São Paulo) → Edge (São Paulo) = 35ms ✅
                      │
              Origin (US-East) only on cache miss

CDN checklist:

  • [ ] Static assets (JS, CSS, images, fonts) served via CDN
  • [ ] Cache-Control headers properly configured
  • [ ] Cache busting strategy in place (content hashing)
  • [ ] API responses cached at edge where appropriate
  • [ ] Compression enabled (Brotli > gzip)
  • [ ] Image optimization (WebP/AVIF with fallbacks)
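The cache-busting item is what makes aggressive cache lifetimes safe. A sketch of content hashing (this is what bundlers like Vite and webpack do at build time; the header values shown are the common convention, not the only valid choice):

```python
import hashlib

def hashed_name(filename, content):
    # Fingerprint the filename with a hash of the content: new content
    # produces a new URL, so a long-lived cache can never serve a stale file.
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = filename.rpartition(".")
    return stem + "." + digest + "." + ext

print(hashed_name("app.js", b"console.log('v1')"))
print(hashed_name("app.js", b"console.log('v2')"))  # changed content, new name

# Headers to pair with the scheme:
HASHED_ASSET_HEADERS = {"Cache-Control": "public, max-age=31536000, immutable"}
HTML_HEADERS = {"Cache-Control": "no-cache"}   # entry point always revalidates
```

The HTML entry point stays revalidated on every request; it is tiny, and it is what points browsers at the newly hashed asset URLs after a deploy.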

Observability and Monitoring Stack

┌──────────────┐  ┌──────────────┐  ┌──────────────┐
│   METRICS    │  │     LOGS     │  │    TRACES    │
│              │  │              │  │              │
│ - CPU/Memory │  │ - App logs   │  │ - Request    │
│ - Request    │  │ - Error logs │  │   flow       │
│   rate/      │  │ - Audit logs │  │ - Service    │
│   latency    │  │ - Access     │  │   deps       │
│ - Error rate │  │   logs       │  │ - Latency    │
│ - Queue      │  │              │  │   breakdown  │
│   depth      │  │              │  │              │
│ - DB conns   │  │              │  │              │
└──────┬───────┘  └──────┬───────┘  └──────┬───────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌──────────────────────────────────────────────────┐
│              AGGREGATION LAYER                   │
│   Datadog / Grafana Cloud / New Relic            │
└────────────────────────┬─────────────────────────┘
                         │
                         ▼
┌──────────────────────────────────────────────────┐
│             ALERTING + DASHBOARDS                │
│   PagerDuty / OpsGenie / Slack                   │
└──────────────────────────────────────────────────┘

Monitoring Stack Components

| Component | Open Source Option | Managed Option | What It Tracks |
|---|---|---|---|
| Metrics | Prometheus + Grafana | Datadog, New Relic | CPU, memory, request rates, custom business metrics |
| Logging | ELK Stack (Elasticsearch, Logstash, Kibana) | Datadog Logs, Papertrail | Application events, errors, audit trails |
| Tracing | Jaeger, Zipkin | Datadog APM, Honeycomb | Request flow across services, latency breakdown |
| Error tracking | Sentry (self-hosted) | Sentry, Bugsnag | Exceptions, stack traces, user impact |
| Uptime | Blackbox Exporter | Pingdom, Better Uptime | Endpoint availability, SSL expiry, response time |
| Alerting | Alertmanager | PagerDuty, OpsGenie | On-call routing, escalation, incident management |

Key Metrics to Monitor

| Category | Metric | Warning Threshold | Critical Threshold |
|---|---|---|---|
| Availability | Uptime % | < 99.9% | < 99.5% |
| Latency | p50 response time | > 200ms | > 500ms |
| Latency | p99 response time | > 1s | > 3s |
| Errors | Error rate | > 0.5% | > 2% |
| Saturation | CPU utilization | > 70% sustained | > 90% |
| Saturation | Memory usage | > 80% | > 90% |
| Saturation | DB connections | > 70% of pool | > 90% of pool |
| Saturation | Disk usage | > 75% | > 90% |
| Queue | Queue depth | > 1,000 | > 10,000 |
| Queue | Processing latency | > 30s | > 5min |
| Business | Signup success rate | < 95% | < 85% |
| Business | Payment success rate | < 98% | < 95% |
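The thresholds above translate directly into alert rules. A sketch of the evaluation logic with a few rows from the table (in practice these rules live in your alerting tool, not in application code; the metric names here are illustrative):

```python
# (warning, critical, direction): "below" means lower values are worse
# (uptime, success rates); "above" means higher values are worse.
THRESHOLDS = {
    "uptime_pct":          (99.9, 99.5, "below"),
    "p99_latency_ms":      (1000, 3000, "above"),
    "error_rate_pct":      (0.5, 2.0, "above"),
    "cpu_pct":             (70, 90, "above"),
    "payment_success_pct": (98, 95, "below"),
}

def severity(metric, value):
    warn, crit, direction = THRESHOLDS[metric]
    if direction == "below":
        breached = lambda limit: value < limit
    else:
        breached = lambda limit: value > limit
    if breached(crit):
        return "critical"
    if breached(warn):
        return "warning"
    return "ok"

print(severity("p99_latency_ms", 1500))      # → warning
print(severity("error_rate_pct", 3.0))       # → critical
print(severity("payment_success_pct", 99))   # → ok
```

Route "warning" to a Slack channel and "critical" to the on-call pager; paging humans for warnings is how alert fatigue starts.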

In Practice

Scaling Strategy by Stage

| Stage | Users | Primary Strategy | Monthly Infra Budget |
|---|---|---|---|
| MVP | 0–100 | Single server, managed DB | $50–200 |
| Early traction | 100–1K | Vertical scaling, add caching | $200–800 |
| Growth | 1K–10K | Read replicas, CDN, queues | $800–3,000 |
| Scale | 10K–100K | Horizontal scaling, sharding prep | $3K–15K |
| At scale | 100K+ | Full distributed architecture | $15K–100K+ |

Common Infrastructure Scaling Mistakes

| Mistake | Why It Happens | What To Do Instead |
|---|---|---|
| Sharding at 1K users | "Netflix does it" | Use a bigger managed DB instance |
| No caching layer | "Data must be real-time" | Most data can tolerate 5–30s staleness |
| Kubernetes at seed stage | Resume-driven development | Use a PaaS (Railway, Render, Fly.io) |
| No monitoring until outage | "We'll add it later" | Set up basic monitoring on day one |
| Manual deployments | "Automation takes too long" | CI/CD pays for itself in 2 weeks |
| Single point of failure | "It's never gone down" | It will. Add redundancy for DB and critical services. |

Infrastructure Scaling Checklist

Phase 1: Foundation (Do This Now)
──────────────────────────────────
[ ] Managed database with automated backups
[ ] Basic monitoring (uptime, errors, latency)
[ ] CI/CD pipeline for automated deployments
[ ] HTTPS everywhere, security headers configured
[ ] Log aggregation (even basic CloudWatch/Papertrail)
[ ] Database connection pooling
[ ] Static assets on CDN

Phase 2: Growth-Ready (Before 1K Users)
────────────────────────────────────────
[ ] Redis cache layer for hot data
[ ] Background job processing (email, PDFs, etc.)
[ ] Read replica for reporting/analytics queries
[ ] Auto-scaling for application servers
[ ] Structured logging with correlation IDs
[ ] Error tracking (Sentry or equivalent)
[ ] Load testing completed at 5x current traffic

Phase 3: Scale (Before 10K Users)
──────────────────────────────────
[ ] Multi-AZ deployment for high availability
[ ] Database query performance dashboard
[ ] Distributed tracing across services
[ ] Rate limiting and abuse prevention
[ ] Automated alerting with on-call rotation
[ ] Disaster recovery plan tested
[ ] Cost optimization review (right-sizing instances)

Phase 4: At Scale (10K+ Users)
──────────────────────────────
[ ] Service decomposition where bottlenecks exist
[ ] Database partitioning or sharding strategy
[ ] Edge computing for latency-sensitive paths
[ ] Capacity planning and forecasting model
[ ] Chaos engineering / game days
[ ] Multi-region strategy (if global users)
[ ] SLA commitments backed by architecture

Key Takeaways

  • Optimize before you scale. Query optimization and caching solve 90% of performance problems at a fraction of the cost of new infrastructure.
  • Scale vertically first. It's simple, fast, and often sufficient far longer than engineers expect.
  • Cache aggressively. Most SaaS data can tolerate seconds of staleness. A well-designed caching layer can reduce DB load by 80–90%.
  • Move work to background queues. If the user doesn't need to see the result immediately, don't make them wait for it.
  • Monitor from day one. You can't improve what you can't measure, and you can't fix what you can't see.
  • Avoid resume-driven development. Kubernetes, microservices, and sharding are rarely needed below 10K users. Simplicity scales better than complexity.

Action Items

🏢 Owner:

  • [ ] Ensure the team has a monitoring dashboard you can understand (uptime, errors, response time)
  • [ ] Budget for infrastructure scaling based on the stage table above
  • [ ] Ask engineering: "What breaks at 3x current load? At 10x?"
  • [ ] Review when to scale before committing infrastructure spend

💻 Dev:

  • [ ] Implement the Phase 1 checklist if you haven't already
  • [ ] Run EXPLAIN ANALYZE on your top 10 slowest queries
  • [ ] Set up Redis caching for your most frequent read operations
  • [ ] Move email sending and PDF generation to background queues
  • [ ] Configure auto-scaling policies for your application tier
  • [ ] Set up alerts for the critical thresholds in the monitoring table above

📋 PM:

  • [ ] Understand current infrastructure limitations and communicate them to stakeholders
  • [ ] Factor infrastructure work into roadmap planning (it's not "overhead"; it's "keeping the lights on")
  • [ ] Track performance metrics as part of your SaaS metrics dashboard
  • [ ] Prioritize features that reduce infrastructure load (batch operations, smart defaults)

🎨 Designer:

  • [ ] Design loading states and skeleton screens for slower operations
  • [ ] Optimize images before upload (WebP format, responsive sizes)
  • [ ] Work with devs to understand which UI patterns are expensive (real-time updates, infinite scroll, complex filters)
  • [ ] Design graceful degradation for when services are slow or unavailable
  • [ ] Reference product design principles for performance-aware design patterns

The Product Builder's Playbook