Rate Limiting

The AEGIS API gateway enforces per-identity rate limiting to protect backend services from excessive load. Rate limits are applied after CORS handling and before authentication, using a token bucket algorithm.

Limits

Parameter	Value
Rate	100 requests per minute (~1.67 requests/second)
Burst	10 requests
Scope	Per identity (user ID, or remote IP for unauthenticated requests)

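The token bucket allows short bursts of up to 10 requests, then replenishes at a rate of approximately 1.67 tokens per second (100 per minute). An empty bucket therefore gains one token roughly every 0.6 seconds and refills completely in about 6 seconds.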
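To make the burst and refill behavior concrete, here is a minimal simulation of a token bucket with the parameters from the table. It is a sketch, not the gateway's code: the function name `simulate` is hypothetical, and it assumes the bucket starts full.

```python
RATE_PER_SECOND = 100 / 60  # refill rate from the table (~1.67 tokens/s)
BURST = 10                  # bucket capacity


def simulate(arrival_times):
    """Return one bool per request: True if a token was available.

    arrival_times is a non-decreasing list of timestamps in seconds.
    """
    tokens = float(BURST)  # assumption: the bucket starts full
    last = None
    results = []
    for now in arrival_times:
        if last is not None:
            # Refill for the elapsed time, never beyond the burst capacity.
            tokens = min(BURST, tokens + (now - last) * RATE_PER_SECOND)
        last = now
        if tokens >= 1:
            tokens -= 1
            results.append(True)
        else:
            results.append(False)
    return results


# Twelve requests at the same instant: the first 10 pass, the last 2 are rejected.
burst_result = simulate([0.0] * 12)

# After a full burst, one second of refill yields ~1.67 tokens,
# so only one of two requests arriving at t=1.0 passes.
refill_result = simulate([0.0] * 10 + [1.0, 1.0])
```

In this model a rejected request does not consume a token, so a client that keeps hammering the endpoint does not delay its own recovery, but it does not speed it up either.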

Rate Limit Identity

The gateway determines your rate limit identity using this priority order:

User ID from JWT — if authenticated (via Bearer token or the aegis_token cookie), the user_id claim is used
Remote IP address — fallback for unauthenticated requests (e.g., health checks)

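This means each user account has its own independent rate limit bucket. Unauthenticated clients that reach the gateway from the same IP address (for example, behind a shared NAT or proxy) share a single bucket.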
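The priority order above can be sketched as a small function. This is an illustration only: the function name and the `user:` / `ip:` key prefixes are assumptions, and `claims` stands for an already-decoded JWT payload (or `None` when no valid token was presented).

```python
from typing import Optional


def rate_limit_identity(claims: Optional[dict], remote_ip: str) -> str:
    """Pick the bucket key: the JWT user_id if present, else the remote IP."""
    if claims and claims.get("user_id"):
        # Authenticated request: one bucket per user account.
        return f"user:{claims['user_id']}"
    # Unauthenticated request (e.g., a health check): fall back to the IP.
    return f"ip:{remote_ip}"
```

For example, `rate_limit_identity({"user_id": "u-42"}, "10.0.0.5")` yields a per-user key, while the same call with `None` as the claims yields a key based on `10.0.0.5`.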

Exceeding the Limit

When you exceed the rate limit, the gateway returns a 429 Too Many Requests response:


curl -i http://localhost:8000/api/v1/conversations \
  -H "Authorization: Bearer $TOKEN"


HTTP/1.1 429 Too Many Requests
Content-Type: application/json
 
{"error":"rate limit exceeded — 100 requests/minute"}

The gateway does not currently return Retry-After or X-RateLimit-* headers. Plan your retry strategy based on the fixed 100 req/min limit.
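Because no header tells the client how long to wait, the delay has to be derived from the published limit. The arithmetic is simple enough to keep in a helper; the name `seconds_to_refill` is hypothetical, and the result is a lower bound that assumes no other requests consume tokens from the same bucket in the meantime.

```python
RATE_PER_MINUTE = 100  # the fixed limit documented above


def seconds_to_refill(tokens_needed: int = 1) -> float:
    """Minimum time for the bucket to regain tokens_needed tokens."""
    return tokens_needed * 60 / RATE_PER_MINUTE


one_token = seconds_to_refill(1)    # 0.6 seconds
full_burst = seconds_to_refill(10)  # 6.0 seconds
```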

Retry Strategy

When you receive a 429 response, implement exponential backoff:

Python


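import time
import requests

def call_with_retry(url, headers, max_retries=3):
    """GET url, retrying with exponential backoff while the gateway returns 429."""
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        # The 10-second timeout is an arbitrary choice; tune it for your client.
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break  # out of retries; do not sleep before giving up

        wait = 2 ** attempt  # 1s, 2s, 4s
        print(f"Rate limited. Retrying in {wait}s...")
        time.sleep(wait)

    raise RuntimeError("Rate limit exceeded after retries")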

JavaScript


async function callWithRetry(url, headers, maxRetries = 3) {
  // One initial attempt plus up to maxRetries retries.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, { headers });
    if (response.status !== 429) return response;
    if (attempt === maxRetries) break; // out of retries; skip the final wait

    const wait = 2 ** attempt * 1000; // 1s, 2s, 4s
    console.log(`Rate limited. Retrying in ${wait}ms...`);
    await new Promise((resolve) => setTimeout(resolve, wait));
  }
  throw new Error('Rate limit exceeded after retries');
}

curl


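# Simple retry loop in bash: one initial attempt plus up to 3 retries,
# with exponential backoff between attempts
for attempt in 0 1 2 3; do
  RESPONSE=$(curl -s -w "%{http_code}" \
    http://localhost:8000/api/v1/conversations \
    -H "Authorization: Bearer $TOKEN")

  HTTP_CODE="${RESPONSE: -3}"            # last 3 characters are the status code
  if [ "$HTTP_CODE" != "429" ]; then
    echo "${RESPONSE:0:${#RESPONSE}-3}"  # body without the status code
    break
  fi

  if [ "$attempt" -eq 3 ]; then
    echo "Rate limit exceeded after retries" >&2
    break
  fi

  WAIT=$((2 ** attempt))                 # 1s, 2s, 4s
  echo "Rate limited. Waiting ${WAIT}s..."
  sleep "$WAIT"
done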

Best Practices

Cache responses — Avoid redundant API calls by caching results locally, especially for rarely-changing data like entity types, rules, and compliance summaries.
Use SSE for real-time data — Instead of polling endpoints repeatedly, use the SSE streaming endpoints for agent execution and workspace updates.
Batch where possible — Some endpoints accept arrays (e.g., ingestion, flaring events). Send multiple items in a single request rather than one-by-one.
Stagger requests — If running batch operations, spread requests evenly across the minute rather than sending them all at once.
Monitor for 429s — Log rate limit responses and alert if they become frequent. This is a signal to optimize your integration.
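As an illustration of staggering, the sketch below runs a batch of calls with a fixed pause between them so they never arrive faster than the limit. The function name `run_staggered` and its parameters are hypothetical; `sleep` is injectable only so the pacing can be tested without waiting.

```python
import time


def run_staggered(tasks, per_minute=100, sleep=time.sleep):
    """Call each zero-argument task in order, pausing between calls.

    The pause is 60 / per_minute seconds, so the batch never exceeds
    per_minute calls in any one-minute window.
    """
    interval = 60 / per_minute
    results = []
    for index, task in enumerate(tasks):
        if index > 0:
            sleep(interval)  # no pause before the first call
        results.append(task())
    return results
```

Passing a `per_minute` value somewhat below 100 leaves headroom for other requests made under the same identity while the batch is running.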

Rate Limit Implementation Details

The gateway uses Go’s golang.org/x/time/rate package, which implements a token bucket algorithm:

Each identity gets an independent rate.Limiter instance.
The limiter is created with rate.Limit(100.0/60.0) (1.667 tokens/second) and a burst size of 10.
Limiters are stored in a synchronized map keyed by the identity string.
If Allow() returns false, the request is immediately rejected with 429.

Rate limiters are stored in memory on the gateway process. If the gateway restarts, all rate limit buckets are reset.
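The structure described in this section (one limiter per identity, held in a synchronized in-memory map) can be modeled in Python as follows. This is an analogue written for illustration, not a port of the gateway's Go code: `TokenBucket` and `LimiterRegistry` are hypothetical names, and the bucket is a simplified stand-in for `rate.Limiter`.

```python
import threading
import time


class TokenBucket:
    """Simplified token bucket: refill continuously, spend one token per request."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)  # assumption: a new bucket starts full
        self.updated = clock()

    def allow(self):
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class LimiterRegistry:
    """In-memory map from identity string to its own bucket, guarded by a lock."""

    def __init__(self, rate=100 / 60, burst=10, clock=time.monotonic):
        self._rate, self._burst, self._clock = rate, burst, clock
        self._lock = threading.Lock()
        self._buckets = {}

    def allow(self, identity):
        with self._lock:
            bucket = self._buckets.get(identity)
            if bucket is None:
                # First request from this identity: create its bucket lazily.
                bucket = TokenBucket(self._rate, self._burst, self._clock)
                self._buckets[identity] = bucket
            return bucket.allow()
```

A caller would reject the request with 429 whenever `allow(identity)` returns `False`. Because the dictionary lives in process memory, creating a new `LimiterRegistry` mirrors the restart behavior: every identity starts again with a full bucket.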