Rate limiting · implementation deep-dive

Rate limiting strategy

6 differentiated limit tiers · 4 algorithms (token bucket primary + sliding window for auth + leaky bucket for outbound) · 8 response format rules · 6 graceful degradation patterns. Rate limiting that protects infrastructure and cost and blocks abuse · without adding friction for legitimate clients.


6 limit tiers · differentiated protection

Tier	Limit	Purpose	Backend
Per-clinic (tenant) limit	1000 requests/min sustained · 5000 burst (1min window)	Prevents single tenant noisy-neighbor impacting others · cost protection	Upstash Redis SETEX per-clinic counter · 60s TTL · atomic INCR
Per-endpoint global limit	10000 requests/min global app · 50000 burst (1min)	Protects infra-wide overload · DDoS mitigation layer · cost-cap aggregate	Cloudflare Rate Limiting WAF rules · IP-aware + JA3 fingerprinting
Per-IP limit (unauthenticated)	100 requests/min sustained · 500 burst · auth endpoints stricter 30/min	Brute force prevention · auth endpoint protection · scraping mitigation	Cloudflare native WAF + custom CF Worker logic for auth endpoints
Per-conversation limit (bot)	50 messages/conversation/hour · auto-handoff at threshold	Prevents abuse · LLM cost protection · escalates legitimately complex cases to a human	Postgres counter per conversation_id · auto-reset hourly · trigger handoff
Per-clinic LLM token budget	100k tokens/clinic/day · breaker at 80k	Protects LLM provider cost · prevents runaway prompts · explicit budget	Postgres aggregate per-clinic-day · checked pre-LLM call · degrade gracefully
Webhook receiver limit (provider-specific)	Meta 80 req/sec · Stripe 100 req/sec · Cal.com 20 req/sec (per provider docs)	Stay within provider quotas · avoid 429 from upstream	Smart queue back-off · QStash delays · circuit breaker if provider rate-limits us
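
As an illustration of the LLM token budget row, here is a minimal sketch of a pre-call check. The 100k and 80k figures come from the table; everything else is an assumption: the names `check_llm_budget` and `BudgetDecision` are hypothetical, and the table does not say what the breaker does at 80k, so the sketch assumes "degrade" past the breaker and "block" past the daily budget.

```python
from dataclasses import dataclass

DAILY_BUDGET = 100_000  # tokens per clinic per day (from the table)
BREAKER_AT = 80_000     # breaker threshold (from the table)


@dataclass
class BudgetDecision:
    action: str     # "allow", "degrade" or "block"
    remaining: int  # tokens left in today's budget before this call


def check_llm_budget(used_today: int, estimated_tokens: int) -> BudgetDecision:
    """Decide what to do before an LLM call.

    used_today would come from the per-clinic-day aggregate; it is passed
    in directly so the sketch stays storage-agnostic.
    """
    remaining = max(DAILY_BUDGET - used_today, 0)
    projected = used_today + estimated_tokens
    if projected > DAILY_BUDGET:
        # Assumption: a call that would exceed the daily budget is refused.
        return BudgetDecision("block", remaining)
    if projected > BREAKER_AT:
        # Assumption: past the breaker the caller switches to a cheaper path
        # (shorter prompt, canned reply, human handoff).
        return BudgetDecision("degrade", remaining)
    return BudgetDecision("allow", remaining)
```

Checking the projected total rather than the current total means a single large prompt cannot jump straight over the breaker.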

4 algorithms · purpose-fit

Token bucket (primary)

Bucket holds N tokens · refilled at rate R/second · request consumes 1 token · empty bucket = 429 · allows burst within capacity

Most flexible · handles burst gracefully · industry standard · easy to reason about
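
A minimal in-memory sketch of the token bucket described above. It is not the production implementation: the class name `TokenBucket` and the injectable `now` parameter are assumptions made so the example is self-contained and testable.

```python
import time
from typing import Optional


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_per_second`."""

    def __init__(self, capacity: float, refill_per_second: float,
                 now: Optional[float] = None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = capacity  # start full so an initial burst is allowed
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Consume one token if available; False means respond with 429."""
        now = time.monotonic() if now is None else now
        elapsed = max(now - self.updated_at, 0.0)
        # Refill lazily, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

One possible reading of the per-clinic tier is `TokenBucket(capacity=5000, refill_per_second=1000 / 60)`: capacity sets the burst size and the refill rate sets the sustained rate.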

Sliding window log (secondary)

Track the timestamps of recent requests · count those that fall inside the window · slide the window forward with time · precise rate enforcement

Used for auth endpoints · precision more important than burst handling · prevents abuse
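
A minimal sketch of a sliding window log, assuming one in-memory log per key (for example per IP). The class name `SlidingWindowLog` is hypothetical, and the choice not to log rejected requests is an assumption.

```python
from collections import deque


class SlidingWindowLog:
    """Allows at most `limit` requests in any `window_seconds` span."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        cutoff = now - self.window_seconds
        # Drop entries that have slid out of the window.
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.limit:
            return False  # rejected requests are not logged (assumption)
        self.timestamps.append(now)
        return True
```

For the stricter auth limit of 30/min this would be `SlidingWindowLog(limit=30, window_seconds=60)`. Memory grows with the limit, which is acceptable for low auth limits.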

Fixed window counter (legacy)

Counter resets at fixed intervals (minute boundary) · simple but allows 2x burst at boundary edges

NOT used (legacy reference) · replaced by sliding window for accuracy
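
The 2x boundary burst can be reproduced with a short sketch. The names `FixedWindowCounter` and `accepted_around_boundary` are hypothetical and exist only for this illustration.

```python
class FixedWindowCounter:
    """Counter that resets whenever a new fixed window starts."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_index = None
        self.count = 0

    def allow(self, now: float) -> bool:
        index = int(now // self.window_seconds)
        if index != self.window_index:
            # New window: the counter forgets everything before the boundary.
            self.window_index = index
            self.count = 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


def accepted_around_boundary(limit: int = 100, window: float = 60.0) -> int:
    """Send `limit` requests just before and just after a window boundary."""
    counter = FixedWindowCounter(limit, window)
    late = sum(counter.allow(59.9) for _ in range(limit))
    early = sum(counter.allow(60.1) for _ in range(limit))
    return late + early
```

With a limit of 100 per minute, all 200 requests sent within 0.2 seconds are accepted, which is the weakness the sliding window avoids.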

Leaky bucket (alternative)

Requests enter a queue that drains at a constant rate · overflow = 429 · guarantees a smooth output rate

Used for outbound to providers (Meta · Stripe) · smooths our outbound traffic
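
A minimal sketch of a leaky bucket used as an outbound queue. The class name `LeakyBucket`, the `offer`/`drain` split, and the polling model are assumptions, not the actual outbound pipeline.

```python
from collections import deque


class LeakyBucket:
    """Bounded queue that releases items at a constant rate."""

    def __init__(self, leak_per_second: float, capacity: int):
        self.leak_interval = 1.0 / leak_per_second
        self.capacity = capacity
        self.queue = deque()
        self.next_leak_at = 0.0  # time of the next free send slot

    def offer(self, item, now: float) -> bool:
        """Queue an item; False means the bucket overflowed."""
        if len(self.queue) >= self.capacity:
            return False
        if not self.queue:
            # An idle bucket must not bank send slots for a later burst.
            self.next_leak_at = max(self.next_leak_at, now)
        self.queue.append(item)
        return True

    def drain(self, now: float) -> list:
        """Return the items whose send slot has arrived, oldest first."""
        released = []
        while self.queue and self.next_leak_at <= now:
            released.append(self.queue.popleft())
            self.next_leak_at += self.leak_interval
        return released
```

The caller is assumed to poll `drain` at least once per leak interval. A late poll releases every item that is already due, so the long-run rate holds but short-term smoothness depends on polling frequency.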

Response format · 8 rules

HTTP status: 429 Too Many Requests · standard semantic
Header `Retry-After: <seconds>` · client knows when retry safely
Header `X-RateLimit-Limit: <max>` · current tier max
Header `X-RateLimit-Remaining: <count>` · countdown to limit
Header `X-RateLimit-Reset: <unix-timestamp>` · when window resets
Body JSON: `{error, retry_after_seconds, limit_type, upgrade_url}` · structured machine-readable
NEVER return a 5xx status for rate limits · always 429 · semantic clarity
Auth endpoint rate limit responses: NO username enumeration · identical response for valid and invalid users
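
The rules above can be sketched as a single response builder. The header names and body keys come from the rules; the function name `build_429`, the `"rate_limited"` error value, and the example URL are assumptions for illustration.

```python
import json
import math


def build_429(limit: int, reset_at: float, now: float,
              limit_type: str, upgrade_url: str):
    """Return (status, headers, body) for a rate-limited request."""
    # Round up so the client never retries before the window resets.
    retry_after = max(math.ceil(reset_at - now), 1)
    headers = {
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_at)),  # unix timestamp
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "error": "rate_limited",  # assumed error code
        "retry_after_seconds": retry_after,
        "limit_type": limit_type,
        "upgrade_url": upgrade_url,
    })
    return 429, headers, body


status, headers, body = build_429(
    limit=1000, reset_at=1700000060, now=1700000017.5,
    limit_type="per_clinic", upgrade_url="https://example.com/upgrade",
)
```

Because the body never depends on who the caller claims to be, the same builder can serve auth endpoints without leaking whether a username exists.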

Graceful degradation · 6 patterns

Trigger	Action
Approaching limit (80% used)	Add `X-RateLimit-Warning: approaching limit` header · client can self-throttle preemptively
Hit limit first time	Return 429 + helpful message · log event for monitoring · NO punishment (the client is presumed legitimate)
Repeat hits same client (10+ per hour)	Backoff multiplier · 2x Retry-After per repeat · prevents thrashing · educates client
Suspected abuse (1000+ 429s)	Temporary IP ban Cloudflare WAF · 1h cooldown · audit log · alert if pattern unusual
Whitelist trusted client (Enterprise)	Customer-specific elevated limits · documented per-contract · audit overrides · ADR if exceptional
Emergency override (incident)	Manually disable rate limiting via feature flag · founder approval · audit log · time-bound (15min max by default)
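
Two of the patterns above, the 80% warning header and the repeat-hit backoff, can be sketched as follows. The thresholds come from the table; the function names, the reading of "2x per repeat" as doubling from the 10th hit onward, and the 1-hour cap are assumptions.

```python
WARNING_PERCENT = 80         # warn at 80% of the limit (from the table)
REPEAT_THRESHOLD = 10        # 10+ hits per hour counts as repeat (from the table)
MAX_RETRY_AFTER = 3600       # assumed cap, so the doubling stays bounded


def warning_headers(used: int, limit: int) -> dict:
    """Extra headers for a request that is still under the limit."""
    if used < limit and used * 100 >= limit * WARNING_PERCENT:
        return {"X-RateLimit-Warning": "approaching limit"}
    return {}


def retry_after_seconds(base_seconds: int, hits_this_hour: int) -> int:
    """Double Retry-After for each hit at or beyond the repeat threshold."""
    repeats = max(hits_this_hour - REPEAT_THRESHOLD + 1, 0)
    return min(base_seconds * 2 ** repeats, MAX_RETRY_AFTER)
```

With a 60 second base, the first nine hits return 60, the 10th returns 120, the 11th returns 240, and the value stops growing once it reaches the cap.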

Pre-revenue caveat · limits not stress-tested in production

Limits are set conservatively, based on infrastructure plan capacity. They have NOT been stress-tested at production scale · the 2 demo clinics run at low volume and never approach the limits. Once real traffic arrives, the limits will be re-baselined.

Commitment: limits are publicly visible · NO hidden surprise limits · if you need elevated Enterprise tier limits, contact us before onboarding · custom per contract.

Does your engineering team need rate limit details?

For Enterprise · custom per-contract limits · pre-onboarding capacity planning + load testing scenarios · code samples for retry logic with exponential backoff are available under NDA.

Request Enterprise API rate limits SLA