Rate limiting · implementation deep-dive

Rate limiting strategy

6 differentiated limit tiers · 4 algorithms (token bucket as primary + sliding window for auth + leaky bucket for outbound) · 8 response format rules · 6 graceful degradation patterns. Rate limiting that protects infrastructure and cost and blocks abuse · without adding friction for legitimate clients.

6 limit tiers · differentiated protection

| Tier | Limit | Purpose | Backend |
| --- | --- | --- | --- |
| Per-clinic (tenant) limit | 1000 requests/min sustained · 5000 burst (1 min window) | Prevents a single noisy-neighbor tenant from impacting others · cost protection | Upstash Redis per-clinic counter (SETEX) · 60s TTL · atomic INCR |
| Per-endpoint global limit | 10000 requests/min app-wide · 50000 burst (1 min) | Protects against infra-wide overload · DDoS mitigation layer · aggregate cost cap | Cloudflare Rate Limiting WAF rules · IP-aware + JA3 fingerprinting |
| Per-IP limit (unauthenticated) | 100 requests/min sustained · 500 burst · auth endpoints stricter at 30/min | Brute-force prevention · auth endpoint protection · scraping mitigation | Cloudflare native WAF + custom CF Worker logic for auth endpoints |
| Per-conversation limit (bot) | 50 messages/conversation/hour · auto-handoff at threshold | Prevents abuse · LLM cost protection · escalates legitimate complex cases to a human | Postgres counter per conversation_id · auto-reset hourly · triggers handoff |
| Per-clinic LLM token budget | 100k tokens/clinic/day · breaker at 80k | LLM provider cost protection · prevents runaway prompts · explicit budget | Postgres aggregate per clinic per day · checked before each LLM call · degrades gracefully |
| Webhook receiver limit (provider-specific) | Meta 80 req/sec · Stripe 100 req/sec · Cal.com 20 req/sec (per provider docs) | Stay within provider quotas · avoid 429s from upstream | Smart queue back-off · QStash delays · circuit breaker if a provider rate-limits us |
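
The LLM token budget row can be read as a three-way decision made before each LLM call. The sketch below is one possible reading, not the production logic: the function name `budget_decision`, the return values, and the choice to degrade at the 80k breaker and block at 100k are assumptions. Only the two thresholds come from the table.

```python
DAILY_BUDGET = 100_000  # tokens per clinic per day (from the table)
BREAKER_AT = 80_000     # breaker threshold (from the table)


def budget_decision(used_today: int, estimated_tokens: int) -> str:
    """Classify a pending LLM call as 'allow', 'degrade', or 'block'.

    Assumption: the breaker switches to a degraded mode (for example a
    shorter prompt or a canned reply) and the hard budget blocks the call.
    """
    projected = used_today + estimated_tokens
    if projected > DAILY_BUDGET:
        return "block"
    if projected >= BREAKER_AT:
        return "degrade"
    return "allow"
```

In this reading `used_today` would be the per-clinic daily aggregate read from Postgres just before the call.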

4 algorithms · purpose-fit

Token bucket (primary)
Bucket holds up to N tokens · refilled at R tokens/second · each request consumes 1 token · empty bucket = 429 · allows bursts up to capacity
Most flexible · handles burst gracefully · industry standard · easy to reason about
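
A minimal in-memory sketch of the token bucket described above. The class name `TokenBucket` and its parameters are illustrative assumptions. A shared deployment would keep this state in a store such as Redis rather than in process memory.

```python
import time
from typing import Optional


class TokenBucket:
    """Single-process token bucket: capacity N, refill rate R tokens/second."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means respond with 429."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill lazily based on elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One possible mapping of the per-clinic tier onto this sketch is `TokenBucket(capacity=5000, refill_per_sec=1000 / 60)`, which treats the burst figure as bucket capacity and the sustained figure as refill rate. That mapping is an assumption, not something the tier table states.
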
Sliding window log (secondary)
Track the timestamps of recent requests · count those that fall inside the window · slide the window forward · precise rate enforcement
Used for auth endpoints · precision matters more than burst handling · prevents abuse
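
A small sketch of a sliding window log, assuming one in-memory instance per key (for example per IP). The class name `SlidingWindowLog` is hypothetical, and rejected requests are deliberately not recorded here, which is a design choice rather than something the source specifies.

```python
from collections import deque


class SlidingWindowLog:
    """Allow at most `limit` requests in any trailing `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.log = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window
        # Drop timestamps that have slid out of the window.
        while self.log and self.log[0] <= cutoff:
            self.log.popleft()
        if len(self.log) >= self.limit:
            return False
        self.log.append(now)
        return True
```

For the stricter auth limit in the tier table this would be `SlidingWindowLog(limit=30, window_seconds=60)`. Memory grows with the limit, which is acceptable for small auth limits but is one reason the log is not the primary algorithm.
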
Fixed window counter (legacy)
Counter resets at fixed intervals (minute boundary) · simple, but a burst straddling a window boundary can reach up to 2x the limit
NOT used (legacy reference) · replaced by sliding window for accuracy
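
The boundary problem is easy to reproduce. The sketch below uses a hypothetical `FixedWindowCounter` with a limit of 100 per 60 seconds and sends 100 requests just before the minute boundary and 100 just after it.

```python
class FixedWindowCounter:
    """Counter that resets whenever the clock enters a new fixed window."""

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # New window: the counter starts again from zero.
            self.current_window = window_id
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


counter = FixedWindowCounter(limit=100, window_seconds=60)
accepted = sum(counter.allow(59.9) for _ in range(100))   # end of window 0
accepted += sum(counter.allow(60.1) for _ in range(100))  # start of window 1
```

All 200 requests are accepted within 0.2 seconds even though the nominal limit is 100 per minute, which is the 2x burst a sliding window avoids.
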
Leaky bucket (alternative)
Requests enter a queue that drains at a constant rate · overflow = 429 · guarantees a smooth output rate
Used for outbound calls to providers (Meta · Stripe) · smooths our outbound traffic
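
A sketch of a leaky bucket used as an outbound scheduler. Instead of holding an explicit queue it tracks the next free send slot, which gives the same constant-rate behaviour. The class name `LeakyBucket`, the `schedule` method, and the meaning of `capacity` (the maximum number of send slots a request may wait) are assumptions made for illustration.

```python
from typing import Optional


class LeakyBucket:
    """Space outbound requests at a constant rate; reject on overflow."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.interval = 1.0 / rate_per_sec  # seconds between two sends
        self.capacity = capacity            # max send slots a request may wait
        self.next_free = 0.0                # earliest time the next send may start

    def schedule(self, now: float) -> Optional[float]:
        """Return the send time for a request arriving at `now`, or None on overflow."""
        start = max(now, self.next_free)
        slots_waiting = (start - now) / self.interval
        if slots_waiting >= self.capacity:
            return None  # queue is full: drop or retry later
        self.next_free = start + self.interval
        return start
```

For the Meta quota in the tier table, `rate_per_sec=80` would keep outbound traffic at or below 80 req/sec. The right `capacity` depends on how long a message may acceptably wait.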

Response format · 8 rules

  • HTTP status: 429 Too Many Requests · standard semantic
  • Header `Retry-After: <seconds>` · tells the client when it is safe to retry
  • Header `X-RateLimit-Limit: <max>` · current tier max
  • Header `X-RateLimit-Remaining: <count>` · countdown to limit
  • Header `X-RateLimit-Reset: <unix-timestamp>` · when window resets
  • Body JSON: `{error, retry_after_seconds, limit_type, upgrade_url}` · structured machine-readable
  • NEVER return a 5xx status for rate limits · always 429 · semantic clarity
  • Rate limit responses on auth endpoints: NO username enumeration · identical response for valid and invalid users
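
The rules above can be assembled into a single response builder. This is a framework-agnostic sketch: the function name `build_429`, the `"rate_limited"` error value, and computing the reset time as now plus `retry_after_seconds` are assumptions. The header names and body fields are the ones listed above.

```python
import json
import time
from typing import Dict, Optional, Tuple


def build_429(limit: int, retry_after_seconds: int, limit_type: str,
              upgrade_url: str,
              now: Optional[float] = None) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a rate-limited request."""
    now_ts = int(time.time()) if now is None else int(now)
    headers = {
        "Retry-After": str(retry_after_seconds),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",  # the limit has just been hit
        "X-RateLimit-Reset": str(now_ts + retry_after_seconds),  # unix timestamp
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",  # hypothetical error code
        "retry_after_seconds": retry_after_seconds,
        "limit_type": limit_type,
        "upgrade_url": upgrade_url,
    })
    return 429, headers, body
```

For auth endpoints the same builder would be called with identical arguments whether or not the username exists, so the response reveals nothing about account validity.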

Graceful degradation · 6 patterns

| Trigger | Action |
| --- | --- |
| Approaching limit (80% used) | Add `X-RateLimit-Warning: approaching limit` header · client can self-throttle preemptively |
| Hit limit for the first time | Return 429 + helpful message · log event for monitoring · NO punishment (legitimate client) |
| Repeat hits from same client (10+ per hour) | Backoff multiplier · 2x Retry-After per repeat · prevents thrashing · educates client |
| Suspected abuse (1000+ 429s) | Temporary IP ban via Cloudflare WAF · 1h cooldown · audit log · alert if pattern is unusual |
| Whitelist trusted client (Enterprise) | Customer-specific elevated limits · documented per contract · overrides audited · ADR if exceptional |
| Emergency override (incident) | Manually disable rate limiting via feature flag · founder approval · audit log · time-bound (15 min max by default) |
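
The first and third patterns involve small calculations that can be sketched directly. The function names, the 3600-second cap, and the reading of "2x per repeat" as doubling for every hit from the 10th onward are assumptions. The 80% threshold, the header text, and the 10-per-hour trigger come from the table.

```python
from typing import Dict


def warning_headers(used: int, limit: int) -> Dict[str, str]:
    """Add the warning header once usage reaches 80% of the limit."""
    if used * 100 >= limit * 80:
        return {"X-RateLimit-Warning": "approaching limit"}
    return {}


def retry_after_seconds(base_seconds: int, hits_last_hour: int,
                        repeat_threshold: int = 10,
                        cap_seconds: int = 3600) -> int:
    """Double Retry-After for each repeat hit at or beyond the threshold.

    Assumption: the 10th hit in an hour is the first repeat, and the
    result is capped so a client is never told to wait indefinitely.
    """
    repeats = max(0, hits_last_hour - repeat_threshold + 1)
    return min(cap_seconds, base_seconds * (2 ** repeats))
```

With a 30-second base, a client on its 9th hit is still told to wait 30 seconds, on its 10th hit 60 seconds, and on its 12th hit 240 seconds.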

Pre-revenue caveat · limits not stress-tested in production

Limits are set conservatively based on infra plan capacity. They have NOT been stress-tested at production scale · the 2 demo clinics are low volume and never approach the limits. Limits will be re-baselined once real traffic arrives.

Commitment: limits are publicly visible · NO hidden surprise limits · if you need elevated Enterprise-tier limits, contact us before onboarding · custom per contract.

Does your engineering team need rate limit details?

For Enterprise · custom per-contract limits · pre-onboarding capacity planning + load-testing scenarios · code samples for retry logic with exponential backoff are available under NDA.