Rate limiting strategy
6 differentiated limit tiers · 4 algorithms (token bucket primary + sliding window for auth + leaky bucket for outbound) · 8 response format rules · 6 graceful degradation patterns. Rate limiting that protects infrastructure and cost and blocks abuse · without friction for legitimate clients.
6 limit tiers · differentiated protection
| Tier | Limit | Purpose | Backend |
|---|---|---|---|
| Per-clinic (tenant) limit | 1000 requests/min sustained · 5000 burst (1min window) | Prevents single tenant noisy-neighbor impacting others · cost protection | Upstash Redis SETEX per-clinic counter · 60s TTL · atomic INCR |
| Per-endpoint global limit | 10000 requests/min global app · 50000 burst (1min) | Protects infra-wide overload · DDoS mitigation layer · cost-cap aggregate | Cloudflare Rate Limiting WAF rules · IP-aware + JA3 fingerprinting |
| Per-IP limit (unauthenticated) | 100 requests/min sustained · 500 burst · auth endpoints stricter 30/min | Brute force prevention · auth endpoint protection · scraping mitigation | Cloudflare native WAF + custom CF Worker logic for auth endpoints |
| Per-conversation limit (bot) | 50 messages/conversation/hour · auto-handoff at threshold | Prevents abuse · LLM cost protection · forces escalation of legitimate complex cases | Postgres counter per conversation_id · auto-reset hourly · trigger handoff |
| Per-clinic LLM token budget | 100k tokens/clinic/day · breaker at 80k | LLM provider cost protection · prevents runaway prompts · explicit budget | Postgres aggregate per-clinic-day · checked pre-LLM call · degrade gracefully |
| Webhook receiver limit (provider-specific) | Meta 80 req/sec · Stripe 100 req/sec · Cal.com 20 req/sec (per provider docs) | Stay within provider quotas · avoid 429 from upstream | Smart queue back-off · QStash delays · circuit breaker if provider rate-limits us |
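The per-clinic backend above (counter per clinic · atomic INCR · 60s TTL) can be sketched as a fixed-window counter. This is a minimal illustration, not the production implementation: an in-memory dict stands in for Redis, and the class name, key naming, and injectable `clock` parameter are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowCounter:
    """In-memory stand-in for a Redis counter that uses INCR plus a TTL."""

    def __init__(self, limit: int, window_s: int = 60,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the window can be tested without sleeping
        # key -> (count, expiry time); emulates a key that disappears after its TTL
        self._store: Dict[str, Tuple[int, float]] = {}

    def allow(self, clinic_id: str) -> bool:
        now = self.clock()
        key = f"rl:clinic:{clinic_id}"  # hypothetical key naming
        count, expires_at = self._store.get(key, (0, now + self.window_s))
        if now >= expires_at:
            # TTL elapsed: start a fresh window, as an expired Redis key would
            count, expires_at = 0, now + self.window_s
        count += 1  # the INCR step
        self._store[key] = (count, expires_at)
        return count <= self.limit


# Usage: the sustained per-clinic figure from the table, 1000 requests per 60s window
per_clinic = FixedWindowCounter(limit=1000, window_s=60)
print(per_clinic.allow("clinic-demo"))
```

In a real Redis deployment the increment and the TTL assignment would need to happen atomically (for example in a single script or transaction), which this single-process sketch does not have to worry about.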
4 algorithms · purpose-fit
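The summary names token bucket as the primary algorithm. A minimal sketch of that technique follows, assuming the sustained figure maps to the refill rate and the burst figure to the bucket capacity; that mapping, the class name, and the `clock` parameter are assumptions for illustration, not a description of the deployed code.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills at a steady rate, allows bursts up to capacity."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full so an idle client can burst immediately
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never exceeding the bucket capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate_per_s)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Usage: per-clinic tier read as 1000 requests/min sustained, 5000 burst
bucket = TokenBucket(rate_per_s=1000 / 60, capacity=5000)
print(bucket.try_acquire())
```

Compared with a fixed-window counter, the bucket smooths traffic at window boundaries: a client that drains its burst allowance is then held to the refill rate rather than getting a full new quota when the window rolls over.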
Response format · 8 rules
- HTTP status: 429 Too Many Requests · standard semantic
- Header `Retry-After: <seconds>` · client knows when retry safely
- Header `X-RateLimit-Limit: <max>` · current tier max
- Header `X-RateLimit-Remaining: <count>` · countdown to limit
- Header `X-RateLimit-Reset: <unix-timestamp>` · when window resets
- Body JSON: `{error, retry_after_seconds, limit_type, upgrade_url}` · structured machine-readable
- NEVER return 500 series for rate limits · always 429 · semantic clarity
- Auth endpoints rate limit response: NO username enumeration · same response valid+invalid users
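The header and body rules above can be assembled in one place. The sketch below is framework-agnostic and returns a plain `(status, headers, body)` tuple; the function name, the `"rate_limited"` error string, and the example URL are placeholders chosen for illustration.

```python
import json
import math
from typing import Dict, Tuple


def build_429(limit: int, reset_at: int, now: float, limit_type: str,
              upgrade_url: str) -> Tuple[int, Dict[str, str], str]:
    """Build a 429 response following the rules listed above.

    reset_at is the Unix timestamp at which the current window resets.
    """
    # Round up so the client never retries before the window has reset
    retry_after = max(1, math.ceil(reset_at - now))
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",  # the limit has been hit
        "X-RateLimit-Reset": str(reset_at),
    }
    body = json.dumps({
        "error": "rate_limited",  # placeholder error code
        "retry_after_seconds": retry_after,
        "limit_type": limit_type,
        "upgrade_url": upgrade_url,
    })
    return 429, headers, body


status, headers, body = build_429(
    limit=1000, reset_at=1700000060, now=1700000012.4,
    limit_type="per_clinic", upgrade_url="https://example.com/upgrade",
)
print(status, headers["Retry-After"], body)
```

For auth endpoints, the same function would be called with identical arguments whether or not the username exists, so the response gives no hint about account validity.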
Graceful degradation · 6 patterns
| Trigger | Action |
|---|---|
| Approaching limit (80% used) | Add `X-RateLimit-Warning: approaching limit` header · client can self-throttle preemptively |
| Hit limit first time | Return 429 + helpful message · log event for monitoring · NO punishment (legitimate client) |
| Repeat hits same client (10+ per hour) | Backoff multiplier · 2x Retry-After per repeat · prevents thrashing · educates client |
| Suspected abuse (1000+ 429s) | Temporary IP ban Cloudflare WAF · 1h cooldown · audit log · alert if pattern unusual |
| Whitelist trusted client (Enterprise) | Customer-specific elevated limits · documented per-contract · audit overrides · ADR if exceptional |
| Emergency override (incident) | Manual disable rate limiting feature flag · founder approval · audit log · time-bound (15min max default) |
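Two of the patterns above are simple enough to sketch: the 80% warning header and the doubling Retry-After for repeat offenders. The table does not say exactly how the multiplier compounds, so this example assumes the wait doubles on each hit from the 10th within the hour and is capped at one hour; the cap, the constants, and the function names are all assumptions.

```python
from typing import Dict

WARNING_PERCENT = 80       # warn once 80% of the limit is used
REPEAT_THRESHOLD = 10      # "10+ per hour" from the table
MAX_RETRY_AFTER_S = 3600   # assumed cap, not stated in the source
MAX_DOUBLINGS = 20         # keeps the exponent bounded


def warning_header(used: int, limit: int) -> Dict[str, str]:
    """Return the warning header while the client is close to, but under, the limit."""
    if used < limit and used * 100 >= limit * WARNING_PERCENT:
        return {"X-RateLimit-Warning": "approaching limit"}
    return {}


def retry_after_with_backoff(base_s: int, hits_last_hour: int) -> int:
    """Double Retry-After for every 429 at or beyond the repeat threshold."""
    doublings = max(0, hits_last_hour - REPEAT_THRESHOLD + 1)
    doublings = min(doublings, MAX_DOUBLINGS)
    return min(MAX_RETRY_AFTER_S, base_s * 2 ** doublings)


print(warning_header(850, 1000))
print(retry_after_with_backoff(base_s=30, hits_last_hour=11))
```

A first-time hit passes through with the base Retry-After unchanged, which matches the "no punishment" rule for legitimate clients.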
Limits are set conservatively based on infra plan capacity. They have NOT been stress-tested at production scale · the 2 low-volume demo clinics never approach the limits. When real traffic arrives, the limits will be re-baselined.
Commitment: limits are publicly visible · NO hidden surprise limits · if you need elevated Enterprise-tier limits, contact us before onboarding · custom per-contract.
Does your engineering team need rate limit details?
For Enterprise · custom limits per-contract · pre-onboarding capacity planning + load testing scenarios · code samples of retry logic with exponential backoff available under NDA.