Testing Strategy
AEGIS follows a layered testing approach across its microservices architecture. Each of the 8 Python services maintains its own test suite under services/{service}/tests/, with shared test patterns and conventions enforced across the codebase.
Philosophy
The testing strategy balances two goals:
- Fast, isolated unit tests that run without external dependencies (databases, Redis, Kafka) by using in-memory fakes and mocks.
- Integration tests that exercise the full service pipeline end-to-end, verifying cross-service communication and data flow.
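Most tests are unit tests that run against mock infrastructure. Full integration testing is handled by the end-to-end script (infrastructure/scripts/integration-test.sh), which requires all services to be running. The one exception among the per-service suites is flaring-monitor, where some tests use a real local PostgreSQL database.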
Async-First Testing
Every Python service uses FastAPI with async/await, so all tests are written using pytest-asyncio with automatic async mode:

    # In each service's pyproject.toml
    [tool.pytest.ini_options]
    asyncio_mode = "auto"

With asyncio_mode = "auto", any async def test_* function is automatically treated as an async test, so no @pytest.mark.asyncio decorator is needed.
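As a minimal sketch of what such a test looks like, the example below defines a bare async test function. The fetch_status coroutine and the test name are hypothetical and stand in for a real service call; nothing here is taken from the AEGIS codebase.

```python
import asyncio


async def fetch_status() -> str:
    """Hypothetical coroutine standing in for an async service call."""
    await asyncio.sleep(0)  # yield to the event loop once, as real I/O would
    return "ok"


# With asyncio_mode = "auto", pytest-asyncio collects this coroutine and
# awaits it on an event loop; no @pytest.mark.asyncio decorator is required.
async def test_fetch_status_returns_ok():
    assert await fetch_status() == "ok"


if __name__ == "__main__":
    # Outside pytest, the same coroutine can be driven by hand.
    asyncio.run(test_fetch_status_returns_ok())
```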
Test Coverage by Service
| Service | Test Files | Focus Areas |
|---|---|---|
| memory-service | test_working_memory.py, test_episodic_memory.py, test_ledger.py | Redis Hash operations, working memory CRUD, episodic vector search, injection ledger mark/check/evict |
| approval-service | test_api.py, test_audit.py | HITL approval lifecycle (create, decide, list, escalate), audit trail signing and write |
| auth-service | test_auth.py | Email/password verification (bcrypt), JWT token generation, token roundtrip validation |
| ingestion-service | test_ingestion.py | RRC scraper data extraction, CSV import parsing, entity extraction with edge generation |
| compliance-monitor | test_compliance.py | Deadline scanning, production compliance checks, rule change detection, risk scoring |
| flaring-monitor | test_flaring.py, test_burn_rate.py, test_routes.py, test_event_type_service.py, test_event_type_e2e.py, test_submission_guard_engine.py, test_guard_rule_versioning.py | Volume compliance assessment, R-32 validation, emissions tagging, forecast/prediction, burn rate intelligence, event type definitions, guard rules |
| knowledge-graph-service | test_api.py, test_crud.py, test_context.py, test_context_managed.py, test_event_detection.py, test_impact.py, test_relationship_types.py | Vertex/edge CRUD, Cypher query simulation, context assembly, graph traversal, impact analysis |
| orchestration-engine | test_routes.py, test_checklists.py, test_rules.py, test_schemas.py, test_engine.py, test_budget.py, test_queries.py, test_workspace.py, test_skill_base.py, test_skill_field_event.py, test_skill_rule32_formpr.py, test_skill_rule37.py | Compliance dashboard endpoints, checklist CRUD and workflow, rule versioning, workspace SSE, agent skill execution, budget enforcement |
Unit Tests vs Integration Tests
Unit Tests (per-service)
Unit tests form the bulk of the test suite. They isolate service logic from infrastructure by substituting real database pools and Redis clients with in-memory fakes.
Pattern: In-memory fakes over generic mocks. Rather than using plain MagicMock objects, AEGIS tests build purpose-built fake implementations that simulate real behavior:
- FakePostgresPool (approval-service, orchestration-engine): maintains an in-memory dict of rows, interprets basic SQL patterns (INSERT, SELECT, UPDATE, DELETE), and supports execute, fetch, fetchrow, fetchval.
- FakeAgeGraph (knowledge-graph-service): an in-memory graph store that simulates Apache AGE Cypher operations including vertex/edge CRUD, traversal, and pattern matching.
- Mock Redis client (memory-service): a mock that uses side_effect callbacks backed by an in-memory dict store, supporting Hash operations (hgetall, hget, hset, hdel, hexists) and pipelines.
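To make the idea of a purpose-built fake concrete, here is a much-reduced sketch of an in-memory pool. It is not the actual FakePostgresPool: it assumes an asyncpg-style method set, recognises only one INSERT shape and one SELECT shape, and the table and column names in the demo are invented for illustration.

```python
import asyncio
import re

# Only these two statement shapes are understood by this sketch.
_INSERT = re.compile(r"\s*INSERT\s+INTO\s+(\w+)\s*\(([^)]*)\)", re.IGNORECASE)
_SELECT = re.compile(
    r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)(?:\s+WHERE\s+(\w+)\s*=\s*\$1)?\s*$",
    re.IGNORECASE,
)


class FakePostgresPool:
    """Hypothetical, simplified in-memory stand-in for a database pool."""

    def __init__(self):
        self.tables = {}  # table name -> list of row dicts

    async def execute(self, sql, *args):
        match = _INSERT.match(sql)
        if match is None:
            raise NotImplementedError(f"unsupported statement: {sql!r}")
        table = match.group(1)
        columns = [c.strip() for c in match.group(2).split(",")]
        # Positional arguments map onto the listed columns in order.
        self.tables.setdefault(table, []).append(dict(zip(columns, args)))
        return "INSERT 0 1"

    async def fetch(self, sql, *args):
        match = _SELECT.match(sql)
        if match is None:
            raise NotImplementedError(f"unsupported statement: {sql!r}")
        selected, table, where_column = match.groups()
        rows = self.tables.get(table, [])
        if where_column is not None:
            rows = [r for r in rows if r.get(where_column) == args[0]]
        if selected.strip() == "*":
            return [dict(r) for r in rows]
        columns = [c.strip() for c in selected.split(",")]
        return [{c: r.get(c) for c in columns} for r in rows]

    async def fetchrow(self, sql, *args):
        rows = await self.fetch(sql, *args)
        return rows[0] if rows else None

    async def fetchval(self, sql, *args):
        row = await self.fetchrow(sql, *args)
        return next(iter(row.values())) if row else None


async def demo():
    pool = FakePostgresPool()
    await pool.execute(
        "INSERT INTO approvals (id, status) VALUES ($1, $2)", "a1", "pending"
    )
    return await pool.fetchval("SELECT status FROM approvals WHERE id = $1", "a1")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the fake parses the SQL text rather than ignoring it, a service that sends a malformed or unexpected statement fails in the test instead of passing silently.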
This approach means tests exercise real parsing logic, query routing, and business rules — not just “was the function called with the right arguments.”
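The dict-backed Redis mock can be sketched with the standard library alone. This is an illustrative approximation, not the memory-service fixture itself: the make_mock_redis helper name is an assumption, only the five Hash commands are covered, and pipeline support is left out.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_mock_redis():
    """Build a mock Redis client whose Hash commands operate on a plain dict."""
    store = {}  # key -> {field: value}

    def hset(key, field, value):
        bucket = store.setdefault(key, {})
        is_new = field not in bucket
        bucket[field] = value
        return int(is_new)  # number of fields newly added

    def hget(key, field):
        return store.get(key, {}).get(field)

    def hgetall(key):
        return dict(store.get(key, {}))

    def hdel(key, *fields):
        bucket = store.get(key, {})
        removed = [f for f in fields if f in bucket]
        for f in removed:
            del bucket[f]
        return len(removed)

    def hexists(key, field):
        return field in store.get(key, {})

    redis = MagicMock()
    # AsyncMock makes each command awaitable; side_effect supplies the behaviour.
    redis.hset = AsyncMock(side_effect=hset)
    redis.hget = AsyncMock(side_effect=hget)
    redis.hgetall = AsyncMock(side_effect=hgetall)
    redis.hdel = AsyncMock(side_effect=hdel)
    redis.hexists = AsyncMock(side_effect=hexists)
    return redis


async def demo():
    redis = make_mock_redis()
    await redis.hset("wm:session-1", "goal", "file-report")
    return await redis.hgetall("wm:session-1")


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The mock still records calls, so a test can combine behavioural assertions (what hgetall returns) with interaction assertions such as redis.hset.assert_awaited_once().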
Integration Tests (end-to-end)
The integration test script at infrastructure/scripts/integration-test.sh tests the full platform pipeline:
- Health checks across all services (ports 8001-8009 and gateway at 8000)
- Authentication — obtain a JWT token via the dev email/password login
- Data ingestion — trigger RRC scraper, verify entity extraction (20 wells)
- Knowledge graph verification — assemble context for a specific well
- Agent execution — run Rule 37 agent through the gateway, verify HITL pause
- HITL approval — approve a pending filing, verify status transition
- Emissions calculation — verify flaring monitor computes CO2e correctly
Integration tests require all services to be running. Use ./infrastructure/scripts/start-all.sh to boot the entire platform before running the integration test.
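The health-check step could be approximated in Python as shown below. The real check lives in a shell script whose contents are not reproduced here, so the /health path, the localhost host, and all function names are assumptions made for illustration; only the port numbers come from the list above.

```python
import urllib.error
import urllib.request

GATEWAY_PORT = 8000
SERVICE_PORTS = range(8001, 8010)  # 8001 through 8009 inclusive


def health_urls(host="localhost", path="/health"):
    """Return one health URL per port, gateway first (path is an assumption)."""
    ports = [GATEWAY_PORT, *SERVICE_PORTS]
    return [f"http://{host}:{port}{path}" for port in ports]


def is_healthy(url, timeout=2.0):
    """Treat an HTTP 200 response as healthy; any connection error as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def unhealthy_services():
    """List the URLs that did not answer with HTTP 200."""
    return [url for url in health_urls() if not is_healthy(url)]


if __name__ == "__main__":
    failing = unhealthy_services()
    print("all healthy" if not failing else f"unhealthy: {failing}")
```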
Test Isolation Strategy
FastAPI TestClient with Noop Lifespan
Every service that uses a FastAPI lifespan handler (which connects to databases and Redis on startup) replaces it with a no-op lifespan during tests:

    from contextlib import asynccontextmanager

    @asynccontextmanager
    async def noop_lifespan(app):
        # Skip startup and shutdown work so no database or Redis connection is attempted.
        yield

    # Swap in the no-op lifespan before constructing a TestClient.
    main_module.app.router.lifespan_context = noop_lifespan

This prevents tests from attempting real infrastructure connections while still allowing the full FastAPI router to be exercised via TestClient.
Module-Level Dependency Injection
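Test fixtures patch the service’s global variables directly before creating the TestClient:

    from contextlib import asynccontextmanager

    import pytest
    from fastapi.testclient import TestClient

    @pytest.fixture
    def app_client(fake_pg):
        import approval.main as main_module

        # Point the module-level pool at the in-memory fake.
        main_module.pg = fake_pg

        @asynccontextmanager
        async def noop_lifespan(app):
            yield

        main_module.app.router.lifespan_context = noop_lifespan
        return TestClient(main_module.app)

Because pytest fixtures are function-scoped by default, the module is re-patched for every test. Provided fake_pg is also created per test, no state carries over from one test to the next.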
Test Organization
Tests follow a class-based organization pattern, grouping related test cases:

    class TestVertexCrud:
        async def test_create_vertex(self, crud):
            ...

        async def test_get_vertex_not_found(self, crud):
            ...

    class TestEdgeCrud:
        async def test_create_edge(self, crud):
            ...

Both sync and async test methods are supported. API-level tests (testing HTTP endpoints via TestClient) are synchronous. Business logic tests (testing service classes directly) are async.
Flaring Monitor: Real Database Tests
The flaring-monitor service is unique in that some of its tests connect to the real local PostgreSQL database. Its conftest.py uses pytest_asyncio.fixture to create a real PostgresPool, runs migration SQL files, and performs cleanup before and after each test:

    import pytest_asyncio

    @pytest_asyncio.fixture
    async def pg():
        pool = PostgresPool(dsn=DATABASE_URL, min_size=1, max_size=5)
        await pool.connect()
        # Run migrations, clean up test data...
        yield pool
        # Clean up after test...
        await pool.disconnect()

The flaring-monitor integration tests require a running PostgreSQL instance. Make sure the Docker Compose infrastructure is up before running these tests:

    docker compose up -d
What is Not Tested
- LLM calls — The orchestration engine’s LLM interactions are not tested with real API calls. Tests focus on the surrounding pipeline (routing, state management, budget enforcement).
- Kafka event publishing — Event publishing is mocked in tests. The integration test verifies the full pipeline indirectly.
- Frontend — The Next.js frontend does not currently have a test suite. Testing is manual.
- Go API Gateway — The gateway does not have Go unit tests. It is validated through the integration test script.
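As an illustration of the mocked event publishing noted in the list above, a unit test can hand the code under test an AsyncMock in place of a Kafka producer and assert on what was published. The approve_filing function, the publish method, the topic name, and the event fields are all hypothetical; the actual AEGIS publisher interface is not shown in this document.

```python
import asyncio
from unittest.mock import AsyncMock


async def approve_filing(filing_id, publisher):
    """Hypothetical service function that emits an event after its business logic."""
    event = {"type": "filing.approved", "filing_id": filing_id}
    await publisher.publish("approvals", event)
    return event


async def test_approve_filing_publishes_event():
    publisher = AsyncMock()  # stands in for the real Kafka publisher
    event = await approve_filing("f-1", publisher)
    # The mock records the awaited call, so no broker is needed to verify it.
    publisher.publish.assert_awaited_once_with("approvals", event)


if __name__ == "__main__":
    asyncio.run(test_approve_filing_publishes_event())
```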