Rate Limiting — Algorithms and Implementation
Rate limiting protects resources, controls cost, and reduces abuse. It looks simple, but in distributed systems decisions accumulate around accuracy, fairness, and UX. This article covers the fixed window, sliding window, token bucket, and leaky bucket algorithms, Redis-based implementations, options at other layers (CDN, proxy), and the shape of a 429 response.
1. Two axes
Two axes of rate limiting:
- Accuracy — "Does it really not exceed N per period?"
- Smoothness — "Even with bursts, do user-facing responses stay continuous?"
The two axes trade off against each other; the choice of algorithm sets where the balance lies.
2. Fixed Window
Maintain a counter for a fixed time bucket (for example, one minute):
```
key = "rl:user:42:" + (now / 60)
count = INCR key
EXPIRE key 60 NX
if count > limit: reject
```
Pros — simple. Cons — the boundary effect: with a limit of 50 per minute, up to 100 calls can pass in the few seconds straddling a window boundary (50 at second 59, another 50 at second 01 of the next window).
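The same logic can be sketched in-memory (Redis replaced by a dict, the clock injected for clarity; names are illustrative, not from any library):

```python
import time

class FixedWindowLimiter:
    """In-memory fixed window counter (single-process sketch)."""
    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # (key, window bucket) -> request count

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))  # same bucket for the whole minute
        self.counts[bucket] = self.counts.get(bucket, 0) + 1
        return self.counts[bucket] <= self.limit

limiter = FixedWindowLimiter(limit=2, window_seconds=60)
print(limiter.allow("user:42", now=0))   # True
print(limiter.allow("user:42", now=10))  # True
print(limiter.allow("user:42", now=20))  # False — 3rd request in the same window
print(limiter.allow("user:42", now=61))  # True — new window, counter resets
```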
3. Sliding Window Log
Store each request's timestamp in a sorted set, drop entries older than one minute, and count:
```
ZADD rl:user:42 <ts> <ts>
ZREMRANGEBYSCORE rl:user:42 0 (now-60)
count = ZCARD rl:user:42
```
Pros — accurate. Cons — memory grows with request volume.
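An in-memory analogue of the sorted-set approach, using a deque per key (single-process sketch; not from any library):

```python
from collections import deque
import time

class SlidingWindowLog:
    """Keep a timestamp per request; count only those inside the window."""
    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.log = {}  # key -> deque of request timestamps

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        ts = self.log.setdefault(key, deque())
        while ts and ts[0] <= now - self.window:  # drop entries older than the window
            ts.popleft()
        if len(ts) >= self.limit:
            return False
        ts.append(now)
        return True
```

The memory cost is visible here: one entry per allowed request, which is exactly the Cons noted above.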
4. Sliding Window Counter
Take a weighted average of the previous and current windows. It compromises between fixed window's memory efficiency and sliding window's smoothness:
```
weight = (now - current_window_start) / window_size
estimate = previous_count * (1 - weight) + current_count
```
This estimation approach was popularized by Cloudflare's public write-up.
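A worked instance of the formula above (function name is illustrative):

```python
def sliding_window_estimate(prev_count, curr_count, elapsed_in_window, window_size):
    """Weighted estimate: the previous window contributes proportionally
    to how much of it still overlaps the sliding window."""
    weight = elapsed_in_window / window_size
    return prev_count * (1 - weight) + curr_count

# 15 s into a 60 s window: 75% of the previous window still counts.
# 40 requests last window, 10 so far this window:
print(sliding_window_estimate(40, 10, 15, 60))  # 40 * 0.75 + 10 = 40.0
```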
5. Token Bucket
A bucket with capacity N and refill rate r; requests consume tokens. If empty, reject or wait:
```
on request:
    tokens = min(capacity, tokens + (now - last) * rate)
    last = now
    if tokens >= 1:
        tokens -= 1; allow
    else:
        reject
```
Pros — burst tolerance plus average control. Adopted in many libraries and frameworks (Guava RateLimiter, NGINX limit_req with burst).
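The pseudocode above translates directly to a small class (a sketch with an injectable clock; not thread-safe, and not any particular library's API):

```python
class TokenBucket:
    """Token bucket: bursts up to `capacity`, long-run average of `rate`/s."""
    def __init__(self, capacity, rate):
        self.capacity = capacity       # max burst size
        self.rate = rate               # tokens refilled per second
        self.tokens = float(capacity)  # start full
        self.last = 0.0

    def allow(self, now):
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=1.0)
print(bucket.allow(0.0))  # True — burst
print(bucket.allow(0.0))  # True — burst
print(bucket.allow(0.0))  # False — bucket empty
print(bucket.allow(1.0))  # True — one token refilled after 1 s
```

Passing `now` explicitly rather than calling the clock inside the method also makes the behavior easy to test, which matters later (see "Hard to test").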
6. Leaky Bucket
A close relative of the token bucket. Requests enter a queue and drain at a steady rate. The average is steady, but bursts are delayed in the queue or rejected. A familiar model from network traffic shaping.
7. Redis-based implementation
The simplest counter:
```
INCR rl:<key>:<window>
EXPIRE rl:<key>:<window> <ttl> NX
if count > limit: reject
```
NX ensures EXPIRE is set only once. Otherwise the TTL refreshes on every request, blurring the window boundary.
Use a Lua script for atomicity — prevents another command from slipping between INCR and EXPIRE:
```lua
local c = redis.call('INCR', KEYS[1])
if c == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return c
```
Sliding window with a Sorted Set:
```lua
-- KEYS[1] = key, ARGV[1] = now, ARGV[2] = window size, ARGV[3] = limit, ARGV[4] = unique suffix
redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, ARGV[1] - ARGV[2])
local count = redis.call('ZCARD', KEYS[1])
if count >= tonumber(ARGV[3]) then
    return -1
end
redis.call('ZADD', KEYS[1], ARGV[1], ARGV[1] .. ':' .. ARGV[4])
redis.call('EXPIRE', KEYS[1], ARGV[2])
return count + 1
```
8. Libraries
- Node — `rate-limiter-flexible`, `@upstash/ratelimit` (serverless friendly).
- Java / Spring — Bucket4j, Resilience4j, Spring Cloud Gateway.
- Python — `limits`, `slowapi` (FastAPI), `django-ratelimit`.
- Go — `golang.org/x/time/rate`.
A vetted library is safer than rolling your own.
9. Where to enforce
CDN / edge — block costs as early as possible:
- Cloudflare Rate Limiting — by IP, URL, or header.
- AWS WAF Rate-based Rules — counts per IP.
- Akamai, Fastly — similar features.
The strength of edge blocking is that traffic never reaches origin. The limitation is that fine-grained, per-authenticated-user control is hard.
Reverse proxies:
- NGINX `limit_req` / `limit_conn` — memory-based leaky bucket with a `burst` option.
- HAProxy `stick-table` — counting against varied keys.
- Caddy `rate_limit` — provided as a module.
Application level — fine-grained control by business logic (user, plan, API key). Defaults to Redis or in-memory.
A two-layer combination (edge + application) is common — the edge protects broadly, the application enforces business rules.
10. HTTP 429
Defined in RFC 6585 (2012), 429 Too Many Requests is the standard status code for telling the client they have made too many requests.
Headers to send along:
| Header | Meaning |
|---|---|
| `Retry-After` | Seconds to wait until the next request, or an absolute time |
| `X-RateLimit-Limit` | The window's limit |
| `X-RateLimit-Remaining` | Remaining requests |
| `X-RateLimit-Reset` | Reset time (epoch) |
The X-RateLimit-* headers are a de facto standard but not yet a formal RFC (the IETF httpapi working group's RateLimit header fields draft is in progress). Choose consistent names and document them.
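Putting the table together, a framework-agnostic 429 response might be built like this (the function and body shape are illustrative assumptions, not a standard):

```python
import time

def too_many_requests(limit, reset_epoch, now=None):
    """Build a 429 status, headers, and body for a rate-limited request."""
    now = time.time() if now is None else now
    retry_after = max(0, int(reset_epoch - now))
    headers = {
        "Retry-After": str(retry_after),          # seconds until retry is sensible
        "X-RateLimit-Limit": str(limit),          # the window's limit
        "X-RateLimit-Remaining": "0",             # nothing left, hence the 429
        "X-RateLimit-Reset": str(int(reset_epoch)),  # epoch seconds
    }
    body = {"error": "rate_limited", "retry_after": retry_after}
    return 429, headers, body
```

Keeping the same header names on successful (200) responses as well lets well-behaved clients pace themselves before ever hitting the limit.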
11. Key choice and tiering
Key choice:
- IP — default for anonymous users. Beware NAT and shared mobile carrier IPs.
- User ID — for authenticated users.
- API key — for machine callers.
- IP + route — for sensitive endpoints like login.
Multiple keys at once (per-user + per-IP) gives layered protection.
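Layered keys can be sketched by consulting two counters per request (assuming a minimal fixed-window counter; note that both counters are incremented even when the other rejects, which is usually acceptable since a rejected request still consumed capacity):

```python
class WindowCounter:
    """Minimal fixed-window counter, used only for this illustration."""
    def __init__(self, limit, window=60):
        self.limit, self.window, self.counts = limit, window, {}

    def allow(self, key, now):
        bucket = (key, int(now // self.window))
        n = self.counts.get(bucket, 0) + 1
        self.counts[bucket] = n
        return n <= self.limit

def allow_layered(user_limiter, ip_limiter, user_id, ip, now):
    """Both the per-user and the per-IP limit must pass."""
    user_ok = user_limiter.allow("user:" + user_id, now)
    ip_ok = ip_limiter.allow("ip:" + ip, now)
    return user_ok and ip_ok

users = WindowCounter(limit=100)
ips = WindowCounter(limit=2)
# Three different users behind one IP: the IP limit trips on the third.
print(allow_layered(users, ips, "a", "1.2.3.4", 0))  # True
print(allow_layered(users, ips, "b", "1.2.3.4", 1))  # True
print(allow_layered(users, ips, "c", "1.2.3.4", 2))  # False
```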
Policy tiering:
- Different limits for unauthenticated, authenticated, and paid plans.
- Different limits per route (stricter for writes than for reads).
- Progressive backoff (short violation → short block, repeated → longer block).
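Progressive backoff can be sketched as an escalating schedule of block durations (class name and durations are illustrative assumptions):

```python
class ProgressiveBlocker:
    """Escalating block durations for repeat offenders (sketch)."""
    def __init__(self, durations=(60, 600, 3600)):
        self.durations = durations   # 1 min, then 10 min, then 1 h
        self.strikes = {}            # key -> violation count
        self.blocked_until = {}      # key -> epoch seconds

    def is_blocked(self, key, now):
        return now < self.blocked_until.get(key, 0)

    def record_violation(self, key, now):
        n = self.strikes.get(key, 0)
        self.strikes[key] = n + 1
        # Later strikes pick longer durations; the last one repeats.
        duration = self.durations[min(n, len(self.durations) - 1)]
        self.blocked_until[key] = now + duration
        return duration
```

A production version would also expire old strikes so a single violation from last month does not escalate today's block.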
12. Common pitfalls
Counter accuracy in distributed systems — when several servers handle requests for the same user, counters must aggregate in one place (Redis). In-memory counters multiply the limit by the number of nodes.
Clock drift — out-of-sync node clocks blur window boundaries. Sync via NTP.
Cache miss for time-keyed entries — fixed window keys expire automatically over time, but if the first request misses the EXPIRE, the key sticks forever. Use the NX option or Lua.
Missing burst tolerance — even normal users hit short bursts via page reloads. Too tight a limit damages UX.
Collateral damage on shared IPs — multiple users behind NAT get blocked together. Switch to a user key after authentication.
Missing response shape — if clients cannot distinguish 429 from a generic error, retry logic breaks. Retry-After and a standard format help.
Redis as SPOF — decide how the rate limit behaves when Redis is down (fail-open or fail-closed).
Hard to test — rate-limit behavior depends on time. Tests should abstract time or use very short windows.
Closing thoughts
Rate limiting can start as a simple counter, but operations slowly reveal trade-offs in accuracy, fairness, and UX. Redis Sliding Window Counter + Lua atomicity + tiered keys (user / IP / route) + standard 429 headers make a stable foundation when they sit together. Use vetted libraries; tests should use short windows.
Next
- input-validation-zod
- password-hashing
We refer to Cloudflare blog — sliding window, Stripe Engineering — Rate Limiters, RFC 6585 (429), the IETF RateLimit header fields draft, NGINX limit_req, Bucket4j, rate-limiter-flexible, and @upstash/ratelimit.