Rate Limiting
The AEGIS API gateway enforces per-identity rate limiting to protect backend services from excessive load. Rate limits are applied after CORS handling and before authentication, using a token bucket algorithm.
Limits
| Parameter | Value |
|---|---|
| Rate | 100 requests per minute (~1.67 requests/second) |
| Burst | 10 requests |
| Scope | Per identity (JWT user ID; client IP for unauthenticated requests) |
The token bucket allows short bursts of up to 10 requests, then replenishes at a rate of approximately 1.67 tokens per second (100 per minute).
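The values in the table lead to a few figures that are useful when planning a client. The sketch below is plain Python with illustrative names, not part of any AEGIS SDK. It computes the refill interval for one token, the time an empty bucket needs to refill completely, and an approximate upper bound on accepted requests in a window. The bound assumes the bucket starts full and the client sends as fast as the bucket allows.

```python
RATE_PER_MINUTE = 100  # sustained rate from the Limits table
BURST = 10             # bucket capacity from the Limits table


def refill_interval_s(rate_per_minute=RATE_PER_MINUTE):
    """Seconds needed to earn back a single token."""
    return 60 / rate_per_minute


def full_refill_s(rate_per_minute=RATE_PER_MINUTE, burst=BURST):
    """Seconds for an empty bucket to refill completely."""
    return burst * 60 / rate_per_minute


def max_requests(window_s, rate_per_minute=RATE_PER_MINUTE, burst=BURST):
    """Approximate most requests accepted in window_s seconds from a full bucket:
    the initial burst plus every whole token earned during the window."""
    return burst + int(rate_per_minute * window_s // 60)


print(refill_interval_s())  # 0.6
print(full_refill_s())      # 6.0
print(max_requests(60))     # 110
```

After a client drains the burst, it earns one more request roughly every 0.6 s. Even with ideal timing, it sees about 110 successes in its first minute and 100 per minute after that.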
Rate Limit Identity
The gateway determines your rate limit identity using this priority order:
- User ID from JWT: if the request is authenticated (via a Bearer token or the `aegis_token` cookie), the `user_id` claim is used.
- Remote IP address: the fallback for unauthenticated requests (e.g., health checks).
This means each user account has its own independent rate limit bucket.
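The priority order can be written as a small key-selection function. This is a hypothetical sketch. `rate_limit_key` and its arguments are illustrative names, and the sketch assumes the JWT has already been verified and decoded into a claims dictionary. When no valid token was presented, the claims argument is `None`.

```python
from typing import Optional


def rate_limit_key(claims: Optional[dict], remote_ip: str) -> str:
    """Pick the rate limit bucket key for a request (hypothetical helper).

    claims: decoded JWT claims from the Bearer token or aegis_token cookie,
            or None for unauthenticated requests.
    remote_ip: the client's address as seen by the gateway.
    """
    user_id = (claims or {}).get("user_id")
    if user_id:
        return str(user_id)  # authenticated: one bucket per user account
    return remote_ip         # fallback: one bucket per client IP


print(rate_limit_key({"user_id": "u-123"}, "203.0.113.7"))  # u-123
print(rate_limit_key(None, "203.0.113.7"))                  # 203.0.113.7
```

One consequence follows if the gateway sees a shared NAT or proxy address as the remote IP: all unauthenticated clients behind that address share a single bucket, while each logged-in user still gets their own.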
Exceeding the Limit
When you exceed the rate limit, the gateway returns a 429 Too Many Requests response:
```shell
curl -i http://localhost:8000/api/v1/conversations \
  -H "Authorization: Bearer $TOKEN"
```

```
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{"error":"rate limit exceeded — 100 requests/minute"}
```

The gateway does not currently return `Retry-After` or `X-RateLimit-*` headers. Plan your retry strategy based on the fixed 100 req/min limit.
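Python's standard library treats a 429 as an error. `urllib.request.urlopen` raises `urllib.error.HTTPError`, but the JSON body can still be read from the exception. The sketch below returns the status and parsed body in both cases. `get_json` is a hypothetical helper, and the URL and token mirror the curl example.

```python
import json
import urllib.error
import urllib.request


def get_json(url, token, timeout=10):
    """GET url with a Bearer token and return (status, parsed JSON body)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.load(resp)
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses, including 429, arrive here; the body is still readable.
        raw = err.read()
        return err.code, json.loads(raw) if raw else {}


# Example (requires a running gateway):
# status, body = get_json("http://localhost:8000/api/v1/conversations", TOKEN)
# if status == 429:
#     print(body["error"])
```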
Retry Strategy
When you receive a 429 response, implement exponential backoff:
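```python
import time

import requests


def call_with_retry(url, headers, max_retries=3):
    """GET url, retrying on 429 with exponential backoff (waits of 1s, 2s, 4s)."""
    for attempt in range(max_retries + 1):  # first try plus max_retries retries
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break  # out of retries; don't sleep before giving up
        wait = 2 ** attempt  # 1s, 2s, 4s
        print(f"Rate limited. Retrying in {wait}s...")
        time.sleep(wait)
    raise RuntimeError("Rate limit exceeded after retries")
```

Best Practices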
- Cache responses: Avoid redundant API calls by caching results locally, especially for rarely changing data such as entity types, rules, and compliance summaries.
- Use SSE for real-time data: Instead of polling endpoints repeatedly, use the SSE streaming endpoints for agent execution and workspace updates.
- Batch where possible: Some endpoints accept arrays (e.g., ingestion, flaring events). Send multiple items in a single request rather than one at a time.
- Stagger requests: When running batch operations, spread requests evenly across the minute rather than sending them all at once.
- Monitor for 429s: Log rate limit responses and alert if they become frequent. Frequent 429s are a signal to optimize your integration.
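To follow the stagger advice, a client can use a pacer that spaces calls evenly, so a batch job never drains the burst. This is a hypothetical client helper, not part of any AEGIS SDK. `Pacer` assigns time slots at least `60 / rate_per_minute` seconds apart and sleeps until the caller's slot arrives.

```python
import threading
import time


class Pacer:
    """Space calls at least 60 / rate_per_minute seconds apart."""

    def __init__(self, rate_per_minute=100):
        self.interval = 60 / rate_per_minute
        self._next_slot = time.monotonic()
        self._lock = threading.Lock()

    def wait(self):
        """Block until this caller's slot; safe to share across threads."""
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        # Sleep outside the lock so other threads can reserve later slots.
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


# Usage: leave some headroom below the gateway's 100 requests/minute.
# pacer = Pacer(rate_per_minute=90)
# for item in items:
#     pacer.wait()
#     send(item)
```

Pacing below the limit trades some throughput for predictability. The client rarely sees a 429, so it seldom needs the retry path.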
Rate Limit Implementation Details
The gateway uses Go’s golang.org/x/time/rate package, which implements a token bucket algorithm:
- Each identity gets an independent `rate.Limiter` instance.
- The limiter is created with `rate.Limit(100.0/60.0)` (about 1.667 tokens/second) and a burst size of 10.
- Limiters are stored in a synchronized map keyed by the identity string.
- If `Allow()` returns false, the request is immediately rejected with 429.
Rate limiters are stored in memory on the gateway process. If the gateway restarts, all rate limit buckets are reset.
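The Python sketch below has the same structure: one token bucket per identity, held in a lock-protected dictionary. It illustrates the mechanics and is not the gateway's Go code. `TokenBucket` and `KeyedLimiter` are hypothetical names. Each bucket follows the documented token bucket model of `rate.Limiter`: a bucket of size `burst`, initially full, refilled at a steady rate. Here the refill is computed lazily on each call. Like the gateway's in-memory map, all state is lost when the process exits, and this sketch never evicts idle buckets.

```python
import threading
import time


class TokenBucket:
    """A token bucket of size `burst`, initially full, refilled at `rate` tokens/second."""

    def __init__(self, rate, burst, now):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start full
        self.last = now

    def allow(self, now):
        """Refill lazily for the elapsed time, then consume one token if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should respond with 429


class KeyedLimiter:
    """One bucket per identity, kept in a dict guarded by a single lock."""

    def __init__(self, rate=100 / 60, burst=10, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock  # injectable so behavior can be tested deterministically
        self._buckets = {}
        self._lock = threading.Lock()

    def allow(self, identity):
        with self._lock:
            now = self.clock()
            bucket = self._buckets.get(identity)
            if bucket is None:
                bucket = TokenBucket(self.rate, self.burst, now)
                self._buckets[identity] = bucket
            return bucket.allow(now)
```

Injecting `clock` makes the behavior easy to check. Twelve calls for one identity at the same instant return ten `True` results followed by two `False`, while a different identity still starts with a full bucket. One second later, the first identity has earned about 1.67 tokens, which is enough for exactly one more request.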