Rate Limiting

The AEGIS API gateway enforces per-identity rate limiting to protect backend services from excessive load. Rate limits are applied after CORS handling and before authentication, using a token bucket algorithm.

Limits

Parameter   Value
Rate        100 requests per minute (~1.67 requests/second)
Burst       10 requests
Scope       Per identity (user ID, or remote IP for unauthenticated requests)

The token bucket allows short bursts of up to 10 requests, then replenishes at a rate of approximately 1.67 tokens per second (100 per minute).
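The numbers that follow from these two parameters can be worked out directly. The sketch below is illustrative only: the constant and function names are hypothetical, and the bound assumes a bucket that starts full.

```python
RATE_PER_MINUTE = 100
BURST = 10

REFILL_PER_SECOND = RATE_PER_MINUTE / 60              # ~1.67 tokens/second
SECONDS_PER_TOKEN = 60 / RATE_PER_MINUTE              # one new token every 0.6 s
SECONDS_TO_REFILL_BURST = BURST * SECONDS_PER_TOKEN   # empty -> full in 6 s


def max_requests_within(seconds: float) -> int:
    """Upper bound on accepted requests in a window that starts with a full bucket.

    A token bucket can hand out at most `burst + rate * t` tokens in t seconds.
    """
    return BURST + int(seconds * RATE_PER_MINUTE / 60)
```

Under these assumptions, a client that starts with a full bucket can have at most 110 requests accepted in its first minute (10 from the burst plus 100 from replenishment), and a drained bucket needs about 6 seconds of silence to regain its full burst.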

Rate Limit Identity

The gateway determines your rate limit identity using this priority order:

  1. User ID from JWT — if authenticated (via Bearer token or the aegis_token cookie), the user_id claim is used
  2. Remote IP address — fallback for unauthenticated requests (e.g., health checks)

This means each user account has its own independent rate limit bucket.
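The priority order above can be sketched as a small function. This is not the gateway's code: the function name `rate_limit_key` and the `user:` / `ip:` key prefixes are assumptions made for illustration, since the actual key format is not documented here.

```python
from typing import Mapping, Optional


def rate_limit_key(claims: Optional[Mapping[str, object]], remote_ip: str) -> str:
    """Pick the bucket key: the JWT user_id claim if present, else the remote IP."""
    if claims:
        user_id = claims.get("user_id")
        if user_id:
            return f"user:{user_id}"  # hypothetical key format
    # No usable user_id claim: fall back to the client address.
    return f"ip:{remote_ip}"          # hypothetical key format
```

One consequence of the fallback worth keeping in mind: unauthenticated clients that share an IP address (for example, behind the same NAT) would share a bucket.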

Exceeding the Limit

When you exceed the rate limit, the gateway returns a 429 Too Many Requests response:

curl -i http://localhost:8000/api/v1/conversations \
  -H "Authorization: Bearer $TOKEN"

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{"error":"rate limit exceeded — 100 requests/minute"}

The gateway does not currently return Retry-After or X-RateLimit-* headers. Plan your retry strategy based on the fixed 100 req/min limit.

Retry Strategy

When you receive a 429 response, implement exponential backoff:

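import time

import requests


def call_with_retry(url, headers, max_retries=3):
    """GET url, backing off exponentially while the gateway returns 429."""
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break  # out of retries; no point sleeping again
        wait = 2 ** attempt  # 1s, 2s, 4s
        print(f"Rate limited. Retrying in {wait}s...")
        time.sleep(wait)
    raise RuntimeError("Rate limit exceeded after retries")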

Best Practices

  1. Cache responses — Avoid redundant API calls by caching results locally, especially for rarely-changing data like entity types, rules, and compliance summaries.

  2. Use SSE for real-time data — Instead of polling endpoints repeatedly, use the SSE streaming endpoints for agent execution and workspace updates.

  3. Batch where possible — Some endpoints accept arrays (e.g., ingestion, flaring events). Send multiple items in a single request rather than one-by-one.

  4. Stagger requests — If running batch operations, spread requests evenly across the minute rather than sending them all at once.

  5. Monitor for 429s — Log rate limit responses and alert if they become frequent. This is a signal to optimize your integration.
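As one way to apply the "stagger requests" advice, the sketch below spaces calls evenly on the client side. The helper name `run_paced` and the default of 90 requests per minute (chosen to leave some headroom under the 100/minute limit) are assumptions, not part of the AEGIS API; `send` stands in for whatever function performs a single request.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_paced(
    items: Iterable[T],
    send: Callable[[T], R],
    per_minute: float = 90,                       # assumed headroom under 100/min
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Call send(item) for each item, keeping request starts evenly spaced."""
    interval = 60.0 / per_minute
    results: List[R] = []
    next_start = clock()
    for item in items:
        delay = next_start - clock()
        if delay > 0:
            sleep(delay)
        started = clock()
        results.append(send(item))
        # Space from the actual start so a slow call never causes a catch-up burst.
        next_start = started + interval
    return results
```

A call such as `run_paced(batch, post_item)` (with `post_item` being your own request function) would then send roughly one request every 0.67 seconds instead of all at once. The `sleep` and `clock` parameters exist only so the pacing can be tested without real delays.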

Rate Limit Implementation Details

The gateway uses Go’s golang.org/x/time/rate package, which implements a token bucket algorithm:

  • Each identity gets an independent rate.Limiter instance.
  • The limiter is created with rate.Limit(100.0/60.0) (1.667 tokens/second) and a burst size of 10.
  • Limiters are stored in a synchronized map keyed by the identity string.
  • If Allow() returns false, the request is immediately rejected with 429.

Rate limiters are stored in memory on the gateway process. If the gateway restarts, all rate limit buckets are reset.
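The behavior described in this section can be modeled in a few lines of Python. This is a simplified, single-threaded model for reasoning about the limits, not the gateway's Go implementation: the class names are hypothetical, there is no locking, and it assumes each bucket starts full.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Continuous-refill token bucket: `rate` tokens/second, capped at `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # assumption: a new bucket starts full
        self.last = clock()

    def allow(self) -> bool:
        """Consume one token if available; never blocks."""
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the gateway would answer 429 here


class Limiters:
    """In-memory map of identity -> bucket; a restart would lose all state."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._buckets: Dict[str, TokenBucket] = {}

    def allow(self, identity: str) -> bool:
        bucket = self._buckets.get(identity)
        if bucket is None:
            bucket = TokenBucket(100.0 / 60.0, 10, self._clock)
            self._buckets[identity] = bucket
        return bucket.allow()
```

In this model, an identity that sends 11 requests at the same instant sees the first 10 accepted and the 11th rejected, while a different identity is unaffected. After roughly 0.6 seconds one more token becomes available.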
