Chapter 30: Infrastructure Scaling
Your code doesn't need to be perfect, but your infrastructure needs to be ready before your users tell you it isn't.
Why This Matters
- 💼 Owner: Infrastructure failures directly translate to revenue loss. A 1-hour outage at $500K MRR costs ~$700. At $5M MRR, it's ~$7,000, plus the trust you can't buy back.
- 💻 Dev: This is your domain. Understanding when to use read replicas vs. sharding vs. caching is the difference between a weekend deploy and a 3-month rewrite.
- 📋 PM: You need to understand infrastructure constraints to set realistic timelines and make informed trade-off decisions with engineering.
- 🎨 Designer: Page load time is a design problem. Every 100ms of added latency can cost roughly 1% in conversions. Infrastructure decisions directly impact the experience you design.
The Concept (Simple)
Analogy: The Highway System
Think of your infrastructure as a highway system:
- Vertical scaling = widening the existing road (bigger server)
- Horizontal scaling = building parallel roads (more servers)
- Caching = building shortcuts and exits so not everyone goes downtown
- Message queues = adding traffic lights so cars don't all arrive at once
- CDN = building local branches so people don't have to drive to the main office
Traffic Growth vs. Infrastructure Response

Users
100K │                                      ╱╱  Horizontal
     │                                    ╱╱    Scaling
 50K │                           ╱╱╱╱╱╱╱╱╱
     │                   ╱╱╱╱╱╱╱╱
 10K │            ╱╱╱╱╱╱╱          Vertical Scaling
     │      ╱╱╱╱╱╱                 Hits Ceiling Here ◀──
  1K │  ╱╱╱╱╱╱
     │╱╱
 100 └──────────────────────────────────────────────────
     Start      +6mo      +12mo      +18mo      +24mo

You always start by scaling vertically (bigger machines). When that ceiling hits, you go horizontal. The trick is preparing for horizontal before you need it.
How It Works (Detailed)
Infrastructure Scaling Architecture Overview
                     SCALED SaaS ARCHITECTURE

Users ──▶ CDN (Static Assets, Edge Cache)
              │
              ▼
         Load Balancer ◀── Rate Limiter
         ╱     │     ╲
   ┌───────┬───────┬───────┐
   │ App 1 │ App 2 │ App N │ ◀── Stateless App Servers
   └───┬───┴───┬───┴───┬───┘     (Auto-scaled)
       │       │       │
       ▼       ▼       ▼
┌─────────────────────┐     ┌──────────────────┐
│     Redis Cache     │     │  Message Queue   │
│   (Session + Data)  │     │   (Async Jobs)   │
└──────────┬──────────┘     └────────┬─────────┘
           │                         │
           ▼                         ▼
 ┌──────────────────┐       ┌──────────────────┐
 │    Primary DB    │       │  Worker Servers  │
 │     (Writes)     │       │ (Background Jobs)│
 └────────┬─────────┘       └──────────────────┘
          │
    ┌─────┴─────┐
    ▼           ▼
┌─────────┐ ┌─────────┐
│  Read   │ │  Read   │ ◀── Read Replicas
│ Replica │ │ Replica │     (Scale reads independently)
│    1    │ │    2    │
└─────────┘ └─────────┘

┌─────────────────────────────────────────┐
│ Object Storage (S3/GCS)                 │ ◀── Files, backups,
│ + Blob Store for uploads                │     static assets
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│           Observability Stack           │
│    Metrics · Logs · Traces · Alerts     │
└─────────────────────────────────────────┘

Database Scaling Strategies
Strategy Comparison Table
| Strategy | When to Use | Complexity | Read Perf | Write Perf | Cost |
|---|---|---|---|---|---|
| Vertical scaling | First bottleneck, < 1TB data | Low | +2-4x | +2-4x | $$ |
| Read replicas | Read-heavy (> 80% reads) | Medium | +5-10x | Same | $$ |
| Connection pooling | Many app instances, connection limits | Low | +2-3x | +2-3x | $ |
| Query optimization | Before any scaling (always do this first) | Low | +2-50x | +2-50x | $ |
| Table partitioning | Single large tables (> 100M rows) | Medium | +5-20x | +2-5x | $ |
| Functional sharding | Different domains, separable data | High | +5-10x | +5-10x | $$$ |
| Horizontal sharding | > 1TB, high write throughput needed | Very High | +10-50x | +10-50x | $$$$ |
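The query-optimization row deserves emphasis: the classic N+1 pattern alone can account for most of a slow endpoint. A minimal sketch of the fix, using an in-memory stand-in for the database (the `fetch_*` helpers and the `query_count` counter are hypothetical, purely for illustration):

```python
# N+1 queries vs. a single batched query, sketched against an
# in-memory "database". query_count stands in for real DB round trips.

ORDERS = [{"id": i, "customer_id": i % 3} for i in range(9)]
CUSTOMERS = {0: "Acme", 1: "Globex", 2: "Initech"}
query_count = 0

def fetch_customer(customer_id):          # one round trip per call
    global query_count
    query_count += 1
    return CUSTOMERS[customer_id]

def fetch_customers_bulk(customer_ids):   # one round trip total (IN (...))
    global query_count
    query_count += 1
    return {cid: CUSTOMERS[cid] for cid in customer_ids}

# N+1: 1 query for the orders, then 1 more per order for its customer
query_count = 0
n_plus_one = [(o["id"], fetch_customer(o["customer_id"])) for o in ORDERS]
n_plus_one_queries = query_count

# Batched: collect the IDs, fetch them all in one query
query_count = 0
by_id = fetch_customers_bulk({o["customer_id"] for o in ORDERS})
batched = [(o["id"], by_id[o["customer_id"]]) for o in ORDERS]
batched_queries = query_count

assert n_plus_one == batched
print(n_plus_one_queries, batched_queries)  # 9 1
```

Same results, 9 round trips vs. 1; in a real ORM this is the difference between lazy loading in a loop and an eager `JOIN` or `IN` query.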
Database Scaling Decision Path
Database feeling slow?
        │
        ▼
┌─────────────────────┐
│  1. Check your      │
│     queries first!  │──▶ EXPLAIN ANALYZE everything
│     (Free)          │    Add missing indexes
└─────────┬───────────┘    Fix N+1 queries
          │
     Still slow?
          │
          ▼
┌─────────────────────┐
│  2. Vertical scale  │──▶ Upgrade CPU/RAM/IOPS
│     ($50-500/mo)    │    Usually buys 6-12 months
└─────────┬───────────┘
          │
     Still slow?
          │
          ▼
┌─────────────────────┐
│  3. Add caching     │──▶ Redis for hot data
│     ($50-200/mo)    │    Reduce DB reads by 60-90%
└─────────┬───────────┘
          │
     Still slow?
          │
          ▼
┌─────────────────────┐
│  4. Read replicas   │──▶ Route reads to replicas
│     ($200-1000/mo)  │    Scale reads independently
└─────────┬───────────┘
          │
     Still slow?
          │
          ▼
┌─────────────────────┐
│  5. Partition large │──▶ Partition by date/tenant
│     tables          │    Prune old partitions
└─────────┬───────────┘
          │
     Still slow?
          │
          ▼
┌─────────────────────┐
│  6. Sharding        │──▶ Last resort. High complexity.
│     ($$$$)          │    Consider managed DBs (PlanetScale,
└─────────────────────┘    CockroachDB, Vitess)

Caching Strategy Layers
                      CACHING LAYERS

Layer 1: Browser Cache
┌────────────────────────────────────────────────────┐
│ Cache-Control headers, ETags, Service Workers      │
│ Latency: 0ms · Hit rate: 30-50% · Cost: Free       │
└────────────────────────────────────────────────────┘
         │ miss
         ▼
Layer 2: CDN / Edge Cache
┌────────────────────────────────────────────────────┐
│ Cloudflare, CloudFront, Fastly                     │
│ Latency: 5-20ms · Hit rate: 60-90% · Cost: $       │
│ Best for: Static assets, public API responses      │
└────────────────────────────────────────────────────┘
         │ miss
         ▼
Layer 3: Application Cache (Redis/Memcached)
┌────────────────────────────────────────────────────┐
│ Session data, computed results, API responses      │
│ Latency: 1-5ms · Hit rate: 70-95% · Cost: $$       │
│ Best for: User-specific data, expensive queries    │
└────────────────────────────────────────────────────┘
         │ miss
         ▼
Layer 4: Database Query Cache
┌────────────────────────────────────────────────────┐
│ Materialized views, precomputed aggregates         │
│ Latency: 10-50ms · Hit rate: varies · Cost: $      │
│ Best for: Dashboard data, reports, analytics       │
└────────────────────────────────────────────────────┘
         │ miss
         ▼
Source: Primary Database
┌────────────────────────────────────────────────────┐
│ Full query execution                               │
│ Latency: 20-500ms · Cost: $$$                      │
└────────────────────────────────────────────────────┘

Cache Invalidation Strategies
| Strategy | How It Works | Pros | Cons | Use When |
|---|---|---|---|---|
| TTL (Time-based) | Cache expires after N seconds | Simple, predictable | Stale data during TTL | Data freshness tolerance > 30s |
| Write-through | Update cache on every write | Always consistent | Higher write latency | Consistency is critical |
| Write-behind | Queue cache updates async | Fast writes | Temporary inconsistency | High write throughput needed |
| Event-based | Invalidate on specific events | Precise control | Complex to implement | Complex dependency graphs |
| Cache-aside | App checks cache, fills on miss | Flexible, resilient | Cache stampede risk | General-purpose, most common |
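Cache-aside with a TTL is the most common combination from the table above: the app checks the cache, and on a miss reads from the database and fills the cache with an expiry. A minimal Python sketch, using a dict in place of Redis (`get_user_from_db` is a hypothetical expensive read):

```python
# Cache-aside with a TTL, sketched with a dict instead of Redis.
import time

_cache = {}          # key -> (value, expires_at)
TTL_SECONDS = 30
db_reads = 0

def get_user_from_db(user_id):
    global db_reads
    db_reads += 1                         # stands in for a slow query
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    hit = _cache.get(key)
    if hit is not None and hit[1] > time.monotonic():
        return hit[0]                     # cache hit: no DB round trip
    value = get_user_from_db(user_id)     # miss: read through
    _cache[key] = (value, time.monotonic() + TTL_SECONDS)
    return value

first = get_user(42)    # miss -> hits the DB
second = get_user(42)   # hit  -> served from cache
assert first == second and db_reads == 1
```

In production, add jitter to TTLs or a lock around the fill step to blunt the cache-stampede risk noted in the table.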
Message Queues and Async Processing
Not everything needs to happen in the request-response cycle. Move heavy work to background processing.
SYNCHRONOUS (Before)              ASYNCHRONOUS (After)
────────────────────              ────────────────────
User Request                      User Request
     │                                 │
     ▼                                 ▼
Process Payment (2s)              Process Payment (2s)
     │                                 │
     ▼                                 ▼
Send Email (1s)                   Queue: Send Email ──▶ Worker (later)
     │                                 │
     ▼                                 ▼
Generate PDF (3s)                 Queue: Gen PDF ──▶ Worker (later)
     │                                 │
     ▼                                 ▼
Update Analytics (0.5s)           Queue: Analytics ──▶ Worker (later)
     │                                 │
     ▼                                 ▼
Response (6.5s total)             Response (2s total) ✓
What to move to async queues:
- Email / SMS / push notifications
- PDF and report generation
- Image / video processing
- Analytics and event tracking
- Webhook delivery
- Search index updates
- Data exports
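Webhook delivery in particular needs retries, because the receiving endpoint is outside your control. The usual approach is exponential backoff with jitter between attempts. A stdlib-only sketch (the `deliver` function is a hypothetical HTTP POST, rigged here to fail twice before succeeding):

```python
# Exponential backoff with full jitter for webhook delivery retries.
import random

def backoff_delays(attempts, base=1.0, cap=60.0):
    """Delay before each retry: random in [0, min(cap, base * 2^n))."""
    return [random.uniform(0, min(cap, base * 2 ** n)) for n in range(attempts)]

attempts = 0
def deliver(url, payload):
    global attempts
    attempts += 1
    return attempts >= 3           # simulate: succeed on the third try

def deliver_with_retries(url, payload, max_attempts=5):
    for delay in [0.0] + backoff_delays(max_attempts - 1):
        # A real worker would time.sleep(delay) here; skipped in the sketch.
        if deliver(url, payload):
            return True
    return False                   # exhausted: park in a dead-letter queue

assert deliver_with_retries("https://example.com/hooks", {"event": "paid"})
assert attempts == 3
```

Jitter matters: without it, every failed consumer retries at the same instant and hammers the recovering endpoint in waves.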
Popular queue technologies:
| Technology | Best For | Complexity | Managed Options |
|---|---|---|---|
| Redis (Bull/BullMQ) | Simple job queues, < 10K jobs/min | Low | AWS ElastiCache, Upstash |
| RabbitMQ | Complex routing, pub/sub patterns | Medium | CloudAMQP, AWS MQ |
| AWS SQS | Serverless, high durability | Low | Fully managed |
| Apache Kafka | Event streaming, very high throughput | High | Confluent, AWS MSK |
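Whatever the technology, the core pattern is the same: the request handler enqueues a job and responds immediately, and a worker drains the queue in the background. A stdlib-only Python sketch in the spirit of a Bull/BullMQ-style queue (no real broker or SMTP, just a thread and an in-process queue):

```python
# Moving slow work off the request path with a background worker thread.
import queue
import threading

jobs = queue.Queue()
sent_emails = []

def worker():
    while True:
        job = jobs.get()
        if job is None:            # shutdown signal
            break
        sent_emails.append(job)    # stands in for the slow SMTP call
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_signup(email):
    # ... create the account synchronously ...
    jobs.put({"type": "welcome_email", "to": email})   # enqueue, don't wait
    return {"status": "ok"}                            # respond immediately

resp = handle_signup("ada@example.com")
jobs.join()                        # wait for the worker (for the demo only)
assert resp == {"status": "ok"}
assert sent_emails == [{"type": "welcome_email", "to": "ada@example.com"}]
```

A real queue adds what this sketch omits: persistence across restarts, retries, and visibility into queue depth.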
CDN and Edge Computing
WITHOUT CDN                                WITH CDN
───────────                                ────────
User (Tokyo) ─────────────────▶ Server (US-East)
               3,200ms round trip
User (London) ────────────────▶ Server (US-East)
               1,800ms round trip
User (São Paulo) ─────────────▶ Server (US-East)
               2,400ms round trip

User (Tokyo) ─────▶ Edge (Tokyo) = 40ms ✓
User (London) ────▶ Edge (London) = 20ms ✓
User (São Paulo) ─▶ Edge (São Paulo) = 35ms ✓

Origin (US-East) only on cache miss

CDN checklist:
- [ ] Static assets (JS, CSS, images, fonts) served via CDN
- [ ] Cache-Control headers properly configured
- [ ] Cache busting strategy in place (content hashing)
- [ ] API responses cached at edge where appropriate
- [ ] Compression enabled (Brotli > gzip)
- [ ] Image optimization (WebP/AVIF with fallbacks)
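The first three checklist items work together: content-hashed filenames make aggressive Cache-Control safe, because any change to a file produces a new URL. A sketch of the idea (the helper names are illustrative, not from any framework):

```python
# Content-hashed filenames plus Cache-Control headers: hashed assets can
# be cached "forever" because changed content gets a new URL.
import hashlib

def hashed_name(path, content: bytes):
    digest = hashlib.sha256(content).hexdigest()[:8]
    stem, _, ext = path.rpartition(".")
    return f"{stem}.{digest}.{ext}"        # app.js -> app.1a2b3c4d.js

def cache_headers(path):
    if any(path.endswith(ext) for ext in (".js", ".css", ".woff2")):
        # immutable is safe only because filenames are content-hashed
        return {"Cache-Control": "public, max-age=31536000, immutable"}
    # HTML must revalidate so users pick up the new asset URLs
    return {"Cache-Control": "no-cache"}

name = hashed_name("app.js", b"console.log('v1')")
assert name != hashed_name("app.js", b"console.log('v2')")
assert cache_headers(name)["Cache-Control"].endswith("immutable")
assert cache_headers("index.html") == {"Cache-Control": "no-cache"}
```

Most bundlers (Vite, webpack, esbuild) do the hashing step for you; the headers are yours to configure at the CDN or web server.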
Observability and Monitoring Stack
                      OBSERVABILITY STACK

┌──────────────┐   ┌──────────────┐   ┌──────────────┐
│   METRICS    │   │     LOGS     │   │    TRACES    │
│              │   │              │   │              │
│ - CPU/Memory │   │ - App logs   │   │ - Request    │
│ - Request    │   │ - Error logs │   │   flow       │
│   rate/      │   │ - Audit logs │   │ - Service    │
│   latency    │   │ - Access     │   │   deps       │
│ - Error rate │   │   logs       │   │ - Latency    │
│ - Queue      │   │              │   │   breakdown  │
│   depth      │   │              │   │              │
│ - DB conns   │   │              │   │              │
└──────┬───────┘   └──────┬───────┘   └──────┬───────┘
       │                  │                  │
       ▼                  ▼                  ▼
┌─────────────────────────────────────────────────────┐
│                  AGGREGATION LAYER                  │
│         Datadog / Grafana Cloud / New Relic         │
└──────────────────────────┬──────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────┐
│                ALERTING + DASHBOARDS                │
│             PagerDuty / OpsGenie / Slack            │
└─────────────────────────────────────────────────────┘

Monitoring Stack Components
| Component | Open Source Option | Managed Option | What It Tracks |
|---|---|---|---|
| Metrics | Prometheus + Grafana | Datadog, New Relic | CPU, memory, request rates, custom business metrics |
| Logging | ELK Stack (Elasticsearch, Logstash, Kibana) | Datadog Logs, Papertrail | Application events, errors, audit trails |
| Tracing | Jaeger, Zipkin | Datadog APM, Honeycomb | Request flow across services, latency breakdown |
| Error tracking | Sentry (self-hosted) | Sentry, Bugsnag | Exceptions, stack traces, user impact |
| Uptime | Blackbox Exporter | Pingdom, Better Uptime | Endpoint availability, SSL expiry, response time |
| Alerting | Alertmanager | PagerDuty, OpsGenie | On-call routing, escalation, incident management |
Key Metrics to Monitor
| Category | Metric | Warning Threshold | Critical Threshold |
|---|---|---|---|
| Availability | Uptime % | < 99.9% | < 99.5% |
| Latency | p50 response time | > 200ms | > 500ms |
| Latency | p99 response time | > 1s | > 3s |
| Errors | Error rate | > 0.5% | > 2% |
| Saturation | CPU utilization | > 70% sustained | > 90% |
| Saturation | Memory usage | > 80% | > 90% |
| Saturation | DB connections | > 70% pool | > 90% pool |
| Saturation | Disk usage | > 75% | > 90% |
| Queue | Queue depth | > 1000 | > 10000 |
| Queue | Processing latency | > 30s | > 5min |
| Business | Signup success rate | < 95% | < 85% |
| Business | Payment success rate | < 98% | < 95% |
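These thresholds translate directly into alert rules. A sketch of a threshold evaluator covering a few rows of the table (the metric names and `THRESHOLDS` structure are illustrative; note that for business metrics like payment success rate, lower is worse):

```python
# Evaluate a metric sample against warning/critical thresholds.
# "bad" records whether breaching means the value is too high or too low.
THRESHOLDS = {
    "error_rate_pct":      {"warn": 0.5,  "crit": 2.0,  "bad": "high"},
    "p99_latency_ms":      {"warn": 1000, "crit": 3000, "bad": "high"},
    "cpu_util_pct":        {"warn": 70,   "crit": 90,   "bad": "high"},
    "payment_success_pct": {"warn": 98,   "crit": 95,   "bad": "low"},
}

def severity(metric, value):
    t = THRESHOLDS[metric]
    if t["bad"] == "high":
        breached = lambda limit: value > limit
    else:                              # lower values are worse
        breached = lambda limit: value < limit
    if breached(t["crit"]):
        return "critical"
    if breached(t["warn"]):
        return "warning"
    return "ok"

assert severity("error_rate_pct", 0.1) == "ok"
assert severity("p99_latency_ms", 1500) == "warning"
assert severity("cpu_util_pct", 95) == "critical"
assert severity("payment_success_pct", 94) == "critical"
```

In practice you would also require the breach to be *sustained* (e.g. over a 5-minute window) before paging anyone, to avoid alerting on single spikes.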
In Practice
Scaling Strategy by Stage
| Stage | Users | Primary Strategy | Monthly Infra Budget |
|---|---|---|---|
| MVP | 0-100 | Single server, managed DB | $50-200 |
| Early traction | 100-1K | Vertical scaling, add caching | $200-800 |
| Growth | 1K-10K | Read replicas, CDN, queues | $800-3,000 |
| Scale | 10K-100K | Horizontal scaling, sharding prep | $3K-15K |
| At scale | 100K+ | Full distributed architecture | $15K-100K+ |
Common Infrastructure Scaling Mistakes
| Mistake | Why It Happens | What To Do Instead |
|---|---|---|
| Sharding at 1K users | "Netflix does it" | Use a bigger managed DB instance |
| No caching layer | "Data must be real-time" | Most data can tolerate 5-30s staleness |
| Kubernetes at seed stage | Resume-driven development | Use a PaaS (Railway, Render, Fly.io) |
| No monitoring until outage | "We'll add it later" | Set up basic monitoring on day one |
| Manual deployments | "Automation takes too long" | CI/CD pays for itself in 2 weeks |
| Single point of failure | "It's never gone down" | It will. Add redundancy for DB and critical services. |
Infrastructure Scaling Checklist
Phase 1: Foundation (Do This Now)
─────────────────────────────────
[ ] Managed database with automated backups
[ ] Basic monitoring (uptime, errors, latency)
[ ] CI/CD pipeline for automated deployments
[ ] HTTPS everywhere, security headers configured
[ ] Log aggregation (even basic CloudWatch/Papertrail)
[ ] Database connection pooling
[ ] Static assets on CDN
Phase 2: Growth-Ready (Before 1K Users)
───────────────────────────────────────
[ ] Redis cache layer for hot data
[ ] Background job processing (email, PDFs, etc.)
[ ] Read replica for reporting/analytics queries
[ ] Auto-scaling for application servers
[ ] Structured logging with correlation IDs
[ ] Error tracking (Sentry or equivalent)
[ ] Load testing completed at 5x current traffic
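For the structured-logging item above: emit one JSON object per log line and attach a correlation ID to every line produced for a request, so your aggregator can reconstruct the request end to end. A minimal stdlib sketch:

```python
# Structured logging with a correlation ID: every log line from one
# request carries the same ID, so lines can be joined in the aggregator.
import json
import uuid

def make_logger(correlation_id):
    def log(level, message, **fields):
        record = {"level": level, "msg": message,
                  "correlation_id": correlation_id, **fields}
        return json.dumps(record)          # one JSON object per line
    return log

def handle_request():
    cid = str(uuid.uuid4())                # or reuse an X-Request-ID header
    log = make_logger(cid)
    lines = [log("info", "request received", path="/api/orders"),
             log("info", "cache miss", key="orders:42"),
             log("error", "db timeout", query="orders_by_user")]
    return cid, lines

cid, lines = handle_request()
parsed = [json.loads(l) for l in lines]
assert all(p["correlation_id"] == cid for p in parsed)
assert parsed[2]["level"] == "error"
```

In a real service the logger writes to stdout and the ID is propagated to downstream calls (and into queued jobs) so the whole chain shares one correlation ID.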
Phase 3: Scale (Before 10K Users)
─────────────────────────────────
[ ] Multi-AZ deployment for high availability
[ ] Database query performance dashboard
[ ] Distributed tracing across services
[ ] Rate limiting and abuse prevention
[ ] Automated alerting with on-call rotation
[ ] Disaster recovery plan tested
[ ] Cost optimization review (right-sizing instances)
Phase 4: At Scale (10K+ Users)
──────────────────────────────
[ ] Service decomposition where bottlenecks exist
[ ] Database partitioning or sharding strategy
[ ] Edge computing for latency-sensitive paths
[ ] Capacity planning and forecasting model
[ ] Chaos engineering / game days
[ ] Multi-region strategy (if global users)
[ ] SLA commitments backed by architecture

Key Takeaways
- Optimize before you scale. Query optimization and caching solve 90% of performance problems at a fraction of the cost of new infrastructure.
- Scale vertically first. It's simple, fast, and often sufficient far longer than engineers expect.
- Cache aggressively. Most SaaS data can tolerate seconds of staleness. A well-designed caching layer can reduce DB load by 80-90%.
- Move work to background queues. If the user doesn't need to see the result immediately, don't make them wait for it.
- Monitor from day one. You can't improve what you can't measure, and you can't fix what you can't see.
- Avoid resume-driven development. Kubernetes, microservices, and sharding are rarely needed below 10K users. Simplicity scales better than complexity.
Action Items
💼 Owner:
- [ ] Ensure the team has a monitoring dashboard you can understand (uptime, errors, response time)
- [ ] Budget for infrastructure scaling based on the stage table above
- [ ] Ask engineering: "What breaks at 3x current load? At 10x?"
- [ ] Review when to scale before committing infrastructure spend
💻 Dev:
- [ ] Implement the Phase 1 checklist if you haven't already
- [ ] Run EXPLAIN ANALYZE on your top 10 slowest queries
- [ ] Set up Redis caching for your most frequent read operations
- [ ] Move email sending and PDF generation to background queues
- [ ] Configure auto-scaling policies for your application tier
- [ ] Set up alerts for the critical thresholds in the monitoring table above
📋 PM:
- [ ] Understand current infrastructure limitations and communicate them to stakeholders
- [ ] Factor infrastructure work into roadmap planning (it's not "overhead", it's "keeping the lights on")
- [ ] Track performance metrics as part of your SaaS metrics dashboard
- [ ] Prioritize features that reduce infrastructure load (batch operations, smart defaults)
🎨 Designer:
- [ ] Design loading states and skeleton screens for slower operations
- [ ] Optimize images before upload (WebP format, responsive sizes)
- [ ] Work with devs to understand which UI patterns are expensive (real-time updates, infinite scroll, complex filters)
- [ ] Design graceful degradation for when services are slow or unavailable
- [ ] Reference product design principles for performance-aware design patterns