API Rate Limiting: The Security Guide for Developers in 2026
API rate limiting is not just a scaling concern — it is one of the most direct security controls you can add to any web application. Without it, your authentication endpoints are open to brute force, your data APIs are available for bulk scraping, and your infrastructure is vulnerable to application-layer DDoS attacks that bypass network-level mitigations. This guide covers the theory, the algorithms, and practical implementations across the most common stacks.
Scan your site in 60 seconds — it's free: ZeriFlow →
Why Rate Limiting Is a Security Control, Not Just a Performance Feature
Most developers encounter rate limiting as a way to protect database performance or prevent bill shock on third-party API calls. Both are valid. But the security use cases are often more urgent:
Brute force attacks on authentication: Without rate limiting, an attacker can try millions of password combinations against your login endpoint. Even with bcrypt hashing slowing individual comparisons, a distributed botnet can run credential stuffing attacks at scale. Rate limiting is the first line of defense.
Account enumeration: Login endpoints that respond differently to "user not found" vs "wrong password" leak user existence. Rate limiting reduces the speed at which an attacker can enumerate valid accounts.
Data scraping: APIs that return paginated data without rate limits can be fully scraped in minutes. Even if the data is not sensitive, scraped data feeds competitor intelligence tools and LLM training datasets.
Application-layer DDoS: Traditional DDoS mitigation handles volumetric attacks at the network layer. Application-layer attacks — sending slow, expensive requests that hit your database or trigger heavy computations — require rate limiting at the application layer.
SMS/email flooding: OTP and password reset endpoints without rate limits allow attackers to trigger thousands of messages to a victim's phone or inbox. This costs you money and harasses your users.
Rate Limiting Algorithms Explained
Fixed Window
The simplest approach: count requests in a fixed time window (e.g., 100 requests per minute). When the counter hits the limit, reject until the next window starts.
Problem: Burst exploitation. An attacker can send 100 requests at 00:59 and another 100 at 01:01 — 200 requests in 2 seconds while technically staying within the limit.
Use when: Simple endpoints where burst exploitation is acceptable and you want minimal implementation complexity.
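The fixed window counter can be sketched in a few lines. This is an illustrative, in-memory, single-process version (function and variable names are ours, not from a library); a production deployment would keep the counters in a shared store such as Redis:

```javascript
// Fixed-window counter: one counter per key per window.
const counters = new Map(); // key -> { windowStart, count }

function fixedWindowAllow(key, limit, windowMs, now = Date.now()) {
  // All requests in the same window share the same windowStart.
  const windowStart = Math.floor(now / windowMs) * windowMs;
  const entry = counters.get(key);
  if (!entry || entry.windowStart !== windowStart) {
    // First request of a new window: reset the counter.
    counters.set(key, { windowStart, count: 1 });
    return true;
  }
  if (entry.count >= limit) return false; // limit hit: reject until next window
  entry.count += 1;
  return true;
}
```

The burst problem is visible in the code: the counter resets completely at every window boundary, so a client can spend a full limit at the end of one window and another full limit immediately after the boundary.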
Sliding Window Log
Track the timestamp of each request. When a new request arrives, count how many requests occurred in the last N seconds. More accurate than fixed window, but memory-intensive for high-traffic APIs.
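A minimal sketch of the log-based approach (illustrative names, in-memory only). The memory cost mentioned above is visible here: one stored timestamp per allowed request still inside the window:

```javascript
// Sliding window log: keep a timestamp for every allowed request.
const logs = new Map(); // key -> array of request timestamps

function slidingLogAllow(key, limit, windowMs, now = Date.now()) {
  // Drop timestamps that have fallen out of the sliding window.
  const log = (logs.get(key) || []).filter((t) => t > now - windowMs);
  if (log.length >= limit) {
    logs.set(key, log);
    return false;
  }
  log.push(now);
  logs.set(key, log);
  return true;
}
```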
Sliding Window Counter
A hybrid: split time into smaller buckets and use a weighted sum of the current and previous bucket counts. Much more memory-efficient than a full log while eliminating most of the fixed window burst problem.
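A sketch of the weighted-sum estimate (illustrative names, in-memory only). Only two counters per key are stored; the previous window's count is scaled by how much of that window still overlaps the sliding window:

```javascript
// Sliding window counter: estimate = prev * overlap + curr.
const buckets = new Map(); // key -> { windowStart, curr, prev }

function slidingCounterAllow(key, limit, windowMs, now = Date.now()) {
  const windowStart = Math.floor(now / windowMs) * windowMs;
  let b = buckets.get(key);
  if (!b || b.windowStart !== windowStart) {
    // Rotate: the old current count becomes "previous" if it is adjacent.
    const prev = b && b.windowStart === windowStart - windowMs ? b.curr : 0;
    b = { windowStart, curr: 0, prev };
    buckets.set(key, b);
  }
  // Fraction of the previous window still covered by the sliding window.
  const overlap = 1 - (now - windowStart) / windowMs;
  const estimated = b.prev * overlap + b.curr;
  if (estimated >= limit) return false;
  b.curr += 1;
  return true;
}
```

Right after a window boundary the previous count is weighted at almost 100%, which is what closes the fixed-window burst loophole.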
Token Bucket
Each user or IP has a "bucket" that holds up to N tokens. Tokens refill at a constant rate (e.g., 1 per second, up to a maximum of 60). Each request consumes a token. If the bucket is empty, the request is rejected.
Advantage: Allows controlled bursting — a user who has been idle can make several requests quickly, which is natural for human usage patterns. Good for user-facing APIs.
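A token bucket sketch (illustrative names, in-memory only). The refill is computed lazily from elapsed time on each request rather than with a timer, which is the usual trick in practice:

```javascript
// Token bucket: refill at a fixed rate, spend one token per request.
const tokenBuckets = new Map(); // key -> { tokens, last }

function tokenBucketAllow(key, capacity, refillPerSec, now = Date.now()) {
  const b = tokenBuckets.get(key) || { tokens: capacity, last: now };
  // Refill proportionally to elapsed time, capped at capacity.
  b.tokens = Math.min(capacity, b.tokens + ((now - b.last) / 1000) * refillPerSec);
  b.last = now;
  if (b.tokens < 1) {
    tokenBuckets.set(key, b);
    return false;
  }
  b.tokens -= 1;
  tokenBuckets.set(key, b);
  return true;
}
```

The burst behaviour falls out naturally: an idle key accumulates tokens up to `capacity`, so a returning user can fire several requests at once before settling down to the refill rate.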
Leaky Bucket
Requests enter a queue (the "bucket"). They are processed at a fixed rate. If the queue is full, new requests are dropped. This produces smooth, predictable output rates.
Advantage: Great for backend-to-backend calls where you need guaranteed throughput limits regardless of incoming burst patterns. Less intuitive for user-facing rate limiting.
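A sketch of the leaky bucket in its counter ("meter") form, which rejects when the bucket is full instead of queuing (a queue-based variant would delay the request rather than drop it). Names are illustrative and this is in-memory only:

```javascript
// Leaky bucket as a meter: the water level rises by one per request
// and drains at a constant rate; a full bucket means reject.
const leakyBuckets = new Map(); // key -> { level, last }

function leakyBucketAllow(key, capacity, leakPerSec, now = Date.now()) {
  const b = leakyBuckets.get(key) || { level: 0, last: now };
  // Drain the bucket for the time elapsed since the last request.
  b.level = Math.max(0, b.level - ((now - b.last) / 1000) * leakPerSec);
  b.last = now;
  if (b.level >= capacity) {
    leakyBuckets.set(key, b);
    return false;
  }
  b.level += 1;
  leakyBuckets.set(key, b);
  return true;
}
```

Compared with the token bucket, the drain rate (not the refill rate) is the hard throughput ceiling, which is why this shape suits backend-to-backend throughput guarantees.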
Implementation Patterns
nginx Rate Limiting
nginx's built-in limit_req module implements a leaky bucket algorithm and is the fastest way to add rate limiting to any site without touching application code.
```nginx
http {
    limit_req_zone $binary_remote_addr zone=login:10m rate=5r/m;
    limit_req_zone $binary_remote_addr zone=api:10m rate=100r/m;

    server {
        location /api/auth/login {
            limit_req zone=login burst=3 nodelay;
            limit_req_status 429;
            proxy_pass http://app;
        }

        location /api/ {
            limit_req zone=api burst=20 nodelay;
            limit_req_status 429;
            proxy_pass http://app;
        }
    }
}
```

Key parameters:
- rate=5r/m — 5 requests per minute
- burst=3 — allow a burst of 3 requests before limiting
- nodelay — reject immediately instead of queuing when burst is exceeded
- limit_req_status 429 — return standard HTTP 429 Too Many Requests
Express Rate Limiting (Node.js)
For Express applications, express-rate-limit with a Redis store is the production-grade solution.
```javascript
import rateLimit from 'express-rate-limit';
import RedisStore from 'rate-limit-redis';
import { createClient } from 'redis';

const redisClient = createClient({ url: process.env.REDIS_URL });
await redisClient.connect();

// Strict limit for authentication endpoints
export const authLimiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 10,
  standardHeaders: true, // Return RateLimit-* headers
  legacyHeaders: false,
  store: new RedisStore({ sendCommand: (...args) => redisClient.sendCommand(args) }),
  message: { error: 'Too many login attempts. Please try again in 15 minutes.' },
  keyGenerator: (req) => req.ip, // Rate limit by IP; consider user ID post-auth
});

// General API limit
export const apiLimiter = rateLimit({
  windowMs: 60 * 1000,
  max: 100,
  standardHeaders: true,
  legacyHeaders: false,
  store: new RedisStore({ sendCommand: (...args) => redisClient.sendCommand(args) }),
});

// Apply to routes
app.post('/api/auth/login', authLimiter, loginHandler);
app.use('/api/', apiLimiter);
```

Why Redis: In-memory rate limiting only works on a single process. As soon as you run multiple instances (any real production deployment), you need a shared store. Redis is the standard choice.
API Gateways
If you use AWS API Gateway, Kong, or Cloudflare, rate limiting is available as a native feature:
- AWS API Gateway: Usage Plans with API keys, or per-method throttling settings
- Cloudflare Rate Limiting: Rules-based rate limiting at the CDN layer, before requests reach your origin
- Kong: `rate-limiting` or `rate-limiting-advanced` plugins with Redis support
API gateway-level rate limiting is particularly effective for DDoS mitigation because it rejects requests before they touch your application servers.
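As a concrete illustration for the Kong case, a declarative config (kong.yml) entry might look like the sketch below. The service name, route path, and Redis host are placeholders, and the exact Redis-related fields vary across Kong versions, so treat this as a starting point rather than a drop-in config:

```yaml
# Sketch: Kong rate-limiting plugin in declarative config.
services:
  - name: app
    url: http://app:3000
    routes:
      - name: api
        paths: [/api]
        plugins:
          - name: rate-limiting
            config:
              minute: 100          # 100 requests per minute
              policy: redis        # share counters across Kong nodes
              redis_host: redis.internal
              limit_by: ip
```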
The Retry-After Header: Do It Correctly
When you return a 429 response, always include a Retry-After header. This tells clients when they can try again and enables well-behaved API clients to back off automatically rather than hammering your endpoint.
Two formats are accepted:
```http
Retry-After: 60                              # seconds to wait
Retry-After: Fri, 25 Apr 2026 12:00:00 GMT   # HTTP date
```

Also return `RateLimit-Limit`, `RateLimit-Remaining`, and `RateLimit-Reset` headers (specified in the IETF draft on rate limit headers for HTTP). These allow API clients to pace their own requests proactively, before they hit the limit.
A 429 without Retry-After is a common API design mistake — it forces clients to use arbitrary backoff instead of respecting your system's actual capacity.
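Putting the pieces together, a 429 response helper might look like this sketch. The function name and response shape are ours; `resetSeconds` stands in for whatever your limiter reports as the time until the window resets (libraries like express-rate-limit with `standardHeaders: true` set these headers for you):

```javascript
// Send a 429 with Retry-After plus the draft RateLimit-* headers.
// `res` is an Express-style response object (set / status / json).
function sendRateLimited(res, { limit, remaining, resetSeconds }) {
  res.set('Retry-After', String(resetSeconds));        // machine-readable backoff
  res.set('RateLimit-Limit', String(limit));           // total allowance
  res.set('RateLimit-Remaining', String(remaining));   // requests left in window
  res.set('RateLimit-Reset', String(resetSeconds));    // seconds until reset
  res.status(429).json({
    error: 'Too many requests',
    retryAfterSeconds: resetSeconds, // duplicate in body for convenience
  });
}
```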
Rate Limiting Strategy by Endpoint Type
Not all endpoints need the same limits. A tiered strategy is more effective than a blanket rule:
| Endpoint Type | Recommended Limit | Notes |
|---|---|---|
| Login / auth | 5–10 / 15 min per IP | Lockout after N failures, alert on anomalies |
| Password reset | 3–5 / hour per email | Prevents SMS/email flooding |
| OTP/2FA | 5 attempts / 10 min | Hard lockout with user notification |
| Public API (unauthed) | 20–50 / min per IP | Aggressive — unauthed means no accountability |
| Authenticated API | 100–500 / min per user | Based on plan/tier |
| File upload | 10–20 / hour per user | Prevents storage abuse |
| Search / heavy queries | 20–30 / min | Protect database-heavy endpoints |
FAQ
Q: Should I rate limit by IP or by user ID?
A: Both, at different layers. IP-based limiting is the only option for unauthenticated endpoints (login, registration, password reset). Once a user is authenticated, user ID-based limiting is more precise — it prevents a single abusive account from being masked by IP rotation, and it doesn't penalize legitimate users sharing an IP (NAT, corporate proxies, universities).
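In express-rate-limit terms, the "both, at different layers" answer can be expressed as a single hybrid key generator, sketched below. The `req.user` shape is an assumption about your auth middleware, not a library guarantee:

```javascript
// Hybrid rate-limit key: user ID once authenticated, IP otherwise.
// Assumes auth middleware populates req.user before the limiter runs.
function hybridKey(req) {
  return req.user && req.user.id ? `user:${req.user.id}` : `ip:${req.ip}`;
}
```

Passed as `keyGenerator: hybridKey`, this means an abusive account cannot hide behind IP rotation, while unauthenticated traffic still gets IP-based limits.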
Q: How do I avoid blocking legitimate users with rate limiting?
A: Use generous burst allowances for human-realistic usage patterns. A user submitting a form 10 times in 5 seconds is unusual; submitting it 3 times in 30 seconds is plausible if they had a network error. Calibrate limits against your actual traffic data before enforcing them in production. Also provide clear error messages with retry timing.
Q: Does rate limiting stop DDoS attacks?
A: Application-layer rate limiting stops application-layer DDoS (slow, expensive requests). For volumetric DDoS (raw packet floods), you need network-level mitigation (Cloudflare, AWS Shield). The two work at different layers and are complementary.
Q: What is the right rate limit for a login endpoint?
A: A common safe default is 5 attempts per 15 minutes per IP, with a hard lockout at 10 attempts that requires email verification or a CAPTCHA to unlock. For higher-security applications (banking, healthcare), consider 3 attempts per 10 minutes. Always implement account lockout notification so users know if someone is attempting to access their account.
Conclusion
Rate limiting is one of the highest-leverage security controls available to API developers: it is cheap to implement, requires no cryptography, and directly stops several of the most common attack patterns. A solid default stack is token bucket for user-facing APIs, nginx limit_req for infrastructure-level enforcement, Redis for distributed state, and Retry-After headers for well-behaved clients.
While rate limiting protects your API layer, your public-facing URLs also need correct TLS configuration, security headers, and cookie flags. Running a scan before launch catches the full picture.
Start your free ZeriFlow scan → — no credit card, instant results.