┌─────────────┐
│ Flutter App │  (Mobile Client)
└──────┬──────┘
       │ HTTPS
       ▼
┌─────────────────────┐
│  Load Balancer/CDN  │  (AWS ALB / Cloudflare)
└──────────┬──────────┘
           │
     ┌─────┴──────┬────────────┬────────────┐
     │            │            │            │
     ▼            ▼            ▼            ▼
┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐
│Backend 1│  │Backend 2│  │Backend 3│  │Backend N│  (Rust Servers)
└────┬────┘  └────┬────┘  └────┬────┘  └────┬────┘
     │            │            │            │
     └────────────┴────────────┴────────────┘
                  │
                  ▼
          ┌───────────────┐
          │ Redis Cluster │  (Caching Layer)
          └───────┬───────┘
                  │
                  ▼
          ┌───────────────┐
          │  Open-Meteo   │  (External API)
          │      API      │
          └───────────────┘
1. User opens app
2. App gets GPS coordinates (40.7128, -74.0060)
3. App calls: GET https://api.weathernote.com/api/v1/weather?lat=40.7128&lon=-74.0060
4. Backend receives request:
- Rounds coordinates: (40.7, -74.0)
- Creates cache key: "weather:lat:40.7:lon:-74.0"
- Checks Redis cache
5a. Cache HIT (99% case):
- Returns cached data instantly (<10ms)
- Response includes "cached": true
5b. Cache MISS (1% case):
- Calls Open-Meteo API
- Stores result in Redis (TTL: 10 minutes)
- Returns fresh data (~300ms)
6. App receives response and displays weather
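The rounding and key-construction step in the flow above can be sketched in Rust. This is a minimal illustration, not the service's actual code; the function name is ours, but the key format matches the one shown in step 4:

```rust
// Builds the Redis cache key used in step 4. Formatting with {:.1} both
// rounds the coordinate to one decimal place (~11 km grid) and fixes the
// textual form, so nearby coordinates collapse onto the same key.
fn cache_key(prefix: &str, lat: f64, lon: f64) -> String {
    format!("{prefix}:lat:{lat:.1}:lon:{lon:.1}")
}

fn main() {
    // The New York example from the request flow above.
    println!("{}", cache_key("weather", 40.7128, -74.0060));
    // -> weather:lat:40.7:lon:-74.0
}
```

Formatting with a fixed precision (rather than rounding the float and printing it) also avoids key mismatches from floating-point display quirks, e.g. `-74.0` printing as `-74`.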
Background Worker (runs every 10 minutes):
1. Iterate through TOP_CITIES list (100 cities)
2. For each city:
- Round coordinates
- Check if cache exists
- If expired, fetch from Open-Meteo
- Update cache
- Wait 100ms (to avoid rate limiting)
Result: Popular cities always have fresh cached data
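One pass of the background worker can be sketched as follows. The stubs for the Redis TTL check and the Open-Meteo fetch are placeholders (the real worker talks to Redis and the upstream API), and the two-city list stands in for the 100-entry TOP_CITIES list:

```rust
use std::{thread, time::Duration};

// Hypothetical two-entry subset of TOP_CITIES (the real list holds 100 cities).
const TOP_CITIES: &[(f64, f64)] = &[(40.7128, -74.0060), (51.5074, -0.1278)];

// Stub standing in for "does a fresh entry exist in Redis?"
fn cache_is_fresh(_key: &str) -> bool {
    false
}

// Stub standing in for "fetch from Open-Meteo, then SET with a 600 s TTL".
fn fetch_and_cache(key: &str) {
    println!("refreshing {key}");
}

// One pass over the city list; the scheduler reruns this every 10 minutes.
fn prewarm_once() -> Vec<String> {
    let mut refreshed = Vec::new();
    for (lat, lon) in TOP_CITIES {
        let key = format!("weather:lat:{lat:.1}:lon:{lon:.1}");
        if !cache_is_fresh(&key) {
            fetch_and_cache(&key);
            refreshed.push(key);
        }
        thread::sleep(Duration::from_millis(100)); // spread out upstream calls
    }
    refreshed
}

fn main() {
    prewarm_once();
}
```

Because the worker reuses the same rounded cache keys as the request path, a pre-warmed entry is a guaranteed cache hit for any user near that city.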
Without Rounding:
- 10,000 users in New York City area
- Each at slightly different GPS coordinates
- Result: 10,000 unique API calls
With 0.1° Rounding (~11km grid):
- 10,000 users in New York City area
- All rounded to (40.7, -74.0)
- Result: 1 API call (cached for 10 minutes)
- Reduction: 99.99%
Scenario: 500,000 active users
Without Optimization:
- 500,000 users checking weather once/hour
- 8 hours average usage/day
- = 4,000,000 API calls/day
With Our Architecture:
- Users distributed across ~2,000 unique rounded locations
- Each location cached for 10 minutes
- 2,000 locations × 6 updates/hour × 24 hours
- = 288,000 API calls/day
- With pre-warming: ~5,000 API calls/day
- Reduction: 99.875%
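The back-of-the-envelope numbers above can be checked directly (the 99.875% reduction compares the pre-warmed figure against the unoptimized baseline):

```rust
fn main() {
    // Without optimization: 500k users, one check per hour, 8 hours/day.
    let baseline: u64 = 500_000 * 8;
    assert_eq!(baseline, 4_000_000);

    // With caching alone: ~2,000 rounded locations, refreshed every 10 minutes.
    let cached: u64 = 2_000 * 6 * 24;
    assert_eq!(cached, 288_000);

    // With pre-warming, upstream calls drop to roughly 5,000/day.
    let reduction = 100.0 * (1.0 - 5_000f64 / baseline as f64);
    println!("reduction: {reduction:.3}%"); // 99.875%
}
```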
Format: "{prefix}:lat:{latitude}:lon:{longitude}"
Examples:
- weather:lat:40.7:lon:-74.0
- forecast:lat:40.7:lon:-74.0:days:7
- rate_limit:hash:abc123def

Current Weather: 600 seconds (10 minutes)
├─ Weather changes slowly
├─ 10 minutes is acceptable staleness
└─ Balances freshness vs API calls

Forecast: 600 seconds (10 minutes)
├─ Daily forecast rarely changes
└─ Can be extended to 30-60 minutes

Rate Limit: 60 seconds (1 minute)
├─ Fixed counting window (resets each minute)
└─ Auto-expires via TTL
1. Extract client identifier:
- Custom header: x-client-id (device UUID)
- Fallback: IP address (x-forwarded-for)
2. Hash identifier (privacy):
- BLAKE3 hash of client ID
- First 8 chars used as key
3. Redis increment:
- Key: rate_limit:hash:abc12345
- Increment counter
- Set TTL: 60 seconds
- If counter > limit: return 429 Too Many Requests

Layer 1: Load Balancer (DDoS protection)
└─ 1000 req/sec per IP

Layer 2: Backend Rate Limiter (fair usage)
├─ 60 req/minute per client
└─ Enforced by Rust middleware

Layer 3: Redis Circuit Breaker
├─ If Redis fails, allow requests
└─ Graceful degradation
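The per-client counting logic from steps 1-3 can be sketched in-process. Two loud assumptions: a `HashMap` stands in for Redis (the real counter is a Redis INCR with a 60-second TTL, shared across backends), and `DefaultHasher` stands in for BLAKE3:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::{Duration, Instant};

// In-memory stand-in for the shared Redis counter.
struct RateLimiter {
    limit: u32,
    window: Duration,
    counters: HashMap<String, (u32, Instant)>,
}

impl RateLimiter {
    // Hash the client ID for privacy; DefaultHasher is a stand-in for BLAKE3.
    fn key_for(client_id: &str) -> String {
        let mut h = DefaultHasher::new();
        client_id.hash(&mut h);
        format!("rate_limit:hash:{:08x}", h.finish() as u32) // 8 hex chars
    }

    fn allow(&mut self, client_id: &str) -> bool {
        let key = Self::key_for(client_id);
        let now = Instant::now();
        let entry = self.counters.entry(key).or_insert((0, now));
        if now.duration_since(entry.1) > self.window {
            *entry = (0, now); // window expired; Redis does this via TTL
        }
        entry.0 += 1;
        entry.0 <= self.limit // over the limit -> caller returns 429
    }
}

fn main() {
    let mut rl = RateLimiter {
        limit: 60,
        window: Duration::from_secs(60),
        counters: HashMap::new(),
    };
    let ok = (0..61).map(|_| rl.allow("device-uuid-1")).filter(|&a| a).count();
    println!("{ok} of 61 allowed"); // 60 of 61 allowed
}
```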
# redis.conf
# Maximum memory: 512MB for 500k users
maxmemory 512mb
# Eviction: Remove least recently used keys
maxmemory-policy allkeys-lru
# Persistence: AOF for durability
appendonly yes
appendfsync everysec
# Performance
tcp-keepalive 300
timeout 300
Per Cache Entry:
- Key: "weather:lat:40.7:lon:-74.0" (~30 bytes)
- Value: JSON response (~500 bytes)
- Total: ~600 bytes
Cache Capacity:
- 512MB / 600 bytes = ~850,000 entries
- Active cached locations: ~2,000
- Actual usage: ~1.2MB data + overhead
- Remaining: 500MB for rate limiting, etc.
Each backend instance:
├─ No local state
├─ Shares Redis connection
├─ Independent workers
└─ Can be added/removed dynamically

Load Balancer:
├─ Round-robin distribution
├─ Health check: /health endpoint
├─ Automatic failover
└─ Session-independent (no sticky sessions)
Auto-Scaling Rules:
Scale Up When:
- CPU > 70% for 5 minutes
- Memory > 80%
- Request latency > 500ms
Scale Down When:
- CPU < 30% for 15 minutes
- Traffic drops
- Minimum: 2 instances (high availability)
Max Instances: 20 (cost control)

Scenario 1: Cache Hit (Hot Data)
├─ Redis lookup: ~1-5ms
├─ Network latency: ~20-50ms
└─ Total: 30-60ms

Scenario 2: Cache Miss (Cold Data)
├─ Open-Meteo API call: ~200-400ms
├─ Redis store: ~5ms
├─ Network latency: ~20-50ms
└─ Total: 250-500ms

Scenario 3: Pre-warmed Data (Most Common)
├─ Redis lookup: ~1-5ms
├─ Network latency: ~20-50ms
└─ Total: 30-60ms
Single Backend Instance:
- CPU: 2 vCPU
- RAM: 2GB
- Capacity: ~1,000 req/sec (cached)
- Capacity: ~50 req/sec (fresh API calls)
3 Backend Instances:
- Combined: ~3,000 req/sec (cached)
- With 99% cache hit rate
- Effective: ~2,970 cached + 30 fresh
- Daily capacity: ~250M requests
1. Transport Layer:
├─ TLS 1.3 encryption
└─ SSL certificate (Let's Encrypt)

2. Application Layer:
├─ Rate limiting
├─ Input validation (lat/lon ranges)
└─ Client ID hashing

3. Infrastructure Layer:
├─ Firewall rules (only ports 80, 443, 22)
├─ VPC/Private networking
└─ Redis password protection

4. Monitoring Layer:
├─ Request logging
├─ Anomaly detection
└─ Alert on unusual patterns
Performance:
├─ Native code (no VM overhead)
├─ Zero-cost abstractions
└─ Memory efficient (~20MB per instance)

Reliability:
├─ Memory safety (no segfaults)
├─ Type system prevents bugs
└─ Production-ready ecosystem

Concurrency:
├─ Async/await (tokio runtime)
├─ Handles 10k concurrent connections
└─ No GC pauses

Speed:
├─ In-memory data structure store
├─ Single-digit millisecond latency
└─ 100k+ ops/sec on commodity hardware

Simplicity:
├─ Built-in TTL support
├─ Atomic operations (INCR for rate limiting)
└─ Easy replication/clustering

Cost:
├─ Low memory footprint
├─ Can run on small instances
└─ Managed services available ($50/month)

Advantages:
├─ Free tier (10,000 calls/day)
├─ No API key required
├─ Reliable uptime
└─ Global coverage

Our Usage:
├─ ~5,000 calls/day (with caching)
├─ Well within free tier
└─ Can scale to paid tier if needed
- Cache-Aside Pattern: Check cache first, populate on miss
- Circuit Breaker: Graceful degradation if Redis fails
- Rate Limiting: Fixed-window counter (Redis INCR + TTL)
- Pre-warming: Cache warming with background workers
- Coordinate Rounding: Spatial quantization for cache efficiency
- Stateless Services: Horizontal scalability
- Health Checks: Automatic failover detection
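The cache-aside pattern from the list above, in miniature. A `HashMap` stands in for Redis and a formatted string for the Open-Meteo response; the real entries also carry a 600-second TTL:

```rust
use std::collections::HashMap;

// Cache-aside: check the cache first; on a miss, fetch upstream and
// populate the cache so the next caller hits. Returns (value, was_cached).
fn get_weather(cache: &mut HashMap<String, String>, key: &str) -> (String, bool) {
    if let Some(v) = cache.get(key) {
        return (v.clone(), true); // cache hit
    }
    let fresh = format!("fresh-data-for-{key}"); // stand-in for the API call
    cache.insert(key.to_string(), fresh.clone()); // populate on miss
    (fresh, false)
}

fn main() {
    let mut cache = HashMap::new();
    let (_, hit1) = get_weather(&mut cache, "weather:lat:40.7:lon:-74.0");
    let (_, hit2) = get_weather(&mut cache, "weather:lat:40.7:lon:-74.0");
    println!("first cached: {hit1}, second cached: {hit2}"); // false, then true
}
```

The same shape applies to the `"cached": true` field in the API response: it is simply the `was_cached` flag surfaced to the client.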
Application Metrics:
- requests_total
- request_duration_seconds
- cache_hit_rate
- cache_miss_rate
- rate_limit_exceeded_total
- open_meteo_calls_total
Infrastructure Metrics:
- cpu_usage_percent
- memory_usage_percent
- redis_connected_clients
- redis_keyspace_hits
- redis_keyspace_misses
Business Metrics:
- active_users
- requests_per_user
- popular_locations
- cost_per_million_requests

- CDN Integration: Cache API responses at edge
- GraphQL API: Reduce over-fetching
- WebSocket: Real-time weather updates
- ML Pre-warming: Predict user locations
- Multi-region: Deploy closer to users
This implementation can easily handle 1M+ users with proper infrastructure scaling.