Testing Strategy

AEGIS follows a layered testing approach across its microservices architecture. Each of the 8 Python services maintains its own test suite under services/{service}/tests/, and all suites follow a shared set of test patterns and conventions.

Philosophy

The testing strategy balances two goals:

Fast, isolated unit tests that run without external dependencies (databases, Redis, Kafka) by using in-memory fakes and mocks.
Integration tests that exercise the full service pipeline end-to-end, verifying cross-service communication and data flow.

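Most tests are unit tests that use mock infrastructure. Full-platform integration testing is reserved for the end-to-end script (infrastructure/scripts/integration-test.sh), which requires all services to be running. The one partial exception is flaring-monitor, some of whose tests run against a real local PostgreSQL database.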

Async-First Testing

Every Python service uses FastAPI with async/await, so async tests run under pytest-asyncio in automatic mode:

```toml
# In each service's pyproject.toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
```

With asyncio_mode = "auto", pytest-asyncio treats every async def test_* function as an asyncio test and also manages async fixtures declared with plain @pytest.fixture, so no @pytest.mark.asyncio decorator is needed.
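
As a minimal sketch of what auto mode allows, the test below is a bare async def with no marker. The function under test, compute_budget_remaining, is hypothetical and only stands in for real service logic.

```python
import asyncio


# Hypothetical service function; not part of any AEGIS service.
async def compute_budget_remaining(limit: int, spent: int) -> int:
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return max(limit - spent, 0)


# With asyncio_mode = "auto", pytest-asyncio collects and runs this
# coroutine as a test; no @pytest.mark.asyncio decorator is required.
async def test_budget_remaining_is_never_negative():
    assert await compute_budget_remaining(100, 30) == 70
    assert await compute_budget_remaining(100, 150) == 0
```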

Test Coverage by Service

| Service | Test Files | Focus Areas |
| --- | --- | --- |
| memory-service | `test_working_memory.py`, `test_episodic_memory.py`, `test_ledger.py` | Redis Hash operations, working memory CRUD, episodic vector search, injection ledger mark/check/evict |
| approval-service | `test_api.py`, `test_audit.py` | HITL approval lifecycle (create, decide, list, escalate), audit trail signing and write |
| auth-service | `test_auth.py` | Email/password verification (bcrypt), JWT token generation, token roundtrip validation |
| ingestion-service | `test_ingestion.py` | RRC scraper data extraction, CSV import parsing, entity extraction with edge generation |
| compliance-monitor | `test_compliance.py` | Deadline scanning, production compliance checks, rule change detection, risk scoring |
| flaring-monitor | `test_flaring.py`, `test_burn_rate.py`, `test_routes.py`, `test_event_type_service.py`, `test_event_type_e2e.py`, `test_submission_guard_engine.py`, `test_guard_rule_versioning.py` | Volume compliance assessment, R-32 validation, emissions tagging, forecast/prediction, burn rate intelligence, event type definitions, guard rules |
| knowledge-graph-service | `test_api.py`, `test_crud.py`, `test_context.py`, `test_context_managed.py`, `test_event_detection.py`, `test_impact.py`, `test_relationship_types.py` | Vertex/edge CRUD, Cypher query simulation, context assembly, graph traversal, impact analysis |
| orchestration-engine | `test_routes.py`, `test_checklists.py`, `test_rules.py`, `test_schemas.py`, `test_engine.py`, `test_budget.py`, `test_queries.py`, `test_workspace.py`, `test_skill_base.py`, `test_skill_field_event.py`, `test_skill_rule32_formpr.py`, `test_skill_rule37.py` | Compliance dashboard endpoints, checklist CRUD and workflow, rule versioning, workspace SSE, agent skill execution, budget enforcement |

Unit Tests vs Integration Tests

Unit Tests (per-service)

Unit tests form the bulk of the test suite. They isolate service logic from infrastructure by replacing real database pools and Redis clients with in-memory fakes.

Pattern: In-memory fakes over generic mocks. Rather than using plain MagicMock objects, AEGIS tests build purpose-built fake implementations that simulate real behavior:

FakePostgresPool (approval-service, orchestration-engine): maintains an in-memory dict of rows, interprets basic SQL patterns (INSERT, SELECT, UPDATE, DELETE), and supports execute, fetch, fetchrow, and fetchval.
FakeAgeGraph (knowledge-graph-service): an in-memory graph store that simulates Apache AGE Cypher operations, including vertex/edge CRUD, traversal, and pattern matching.
Mock Redis client (memory-service): a mock whose side_effect callbacks are backed by an in-memory dict store, supporting Hash operations (hgetall, hget, hset, hdel, hexists) as well as pipelines.
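
A minimal sketch of the side_effect pattern described for memory-service, assuming a redis.asyncio-style client whose hash methods are awaited. The factory name make_fake_redis and the exact method subset are illustrative, and pipeline support is omitted for brevity.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_fake_redis() -> MagicMock:
    """Build a mock Redis client whose hash commands hit an in-memory dict.

    Illustrative only: covers hset/hget/hgetall/hdel/hexists, not pipelines.
    """
    store: dict[str, dict[str, str]] = {}

    def hset(name, key=None, value=None, mapping=None):
        fields = dict(mapping or {})
        if key is not None:
            fields[key] = value
        bucket = store.setdefault(name, {})
        added = sum(1 for k in fields if k not in bucket)  # Redis returns the new-field count
        bucket.update(fields)
        return added

    def hget(name, key):
        return store.get(name, {}).get(key)

    def hgetall(name):
        return dict(store.get(name, {}))  # copy so callers cannot mutate the store

    def hdel(name, *keys):
        bucket = store.get(name, {})
        removed = 0
        for k in keys:
            if k in bucket:
                del bucket[k]
                removed += 1
        return removed

    def hexists(name, key):
        return key in store.get(name, {})

    client = MagicMock()
    # Awaiting an AsyncMock returns whatever its side_effect returns, so service
    # code can `await client.hget(...)` exactly as it would with redis.asyncio.
    for fn in (hset, hget, hgetall, hdel, hexists):
        setattr(client, fn.__name__, AsyncMock(side_effect=fn))
    client.store = store  # exposed so tests can inspect raw state
    return client
```

Because each method is still a mock, a test can fall back to r.hset.assert_awaited() when the call itself matters, while most assertions check the stored state.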

This approach means tests exercise real parsing logic, query routing, and business rules — not just “was the function called with the right arguments.”
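
To make "interprets basic SQL patterns" concrete, here is a minimal sketch of a FakePostgresPool-style class. It is not the AEGIS implementation: it assumes asyncpg-style methods (execute returning a status string, plus fetch, fetchrow, and fetchval) and recognizes only four single-table statement shapes, raising NotImplementedError for anything else.

```python
import asyncio
import re
from typing import Any, Optional

# Statement shapes this fake understands; whitespace runs are collapsed first.
_INSERT = re.compile(r"INSERT INTO (\w+) ?\(([^)]*)\) ?VALUES ?\([^)]*\)", re.I)
_SELECT = re.compile(r"SELECT \* FROM (\w+)(?: WHERE (\w+) ?= ?\$1)?", re.I)
_UPDATE = re.compile(r"UPDATE (\w+) SET (\w+) ?= ?\$1 WHERE (\w+) ?= ?\$2", re.I)
_DELETE = re.compile(r"DELETE FROM (\w+) WHERE (\w+) ?= ?\$1", re.I)


class FakePostgresPool:
    """Hypothetical in-memory stand-in for an asyncpg-style pool."""

    def __init__(self) -> None:
        self.tables: dict[str, list[dict[str, Any]]] = {}

    @staticmethod
    def _normalise(sql: str) -> str:
        return " ".join(sql.split())

    async def execute(self, sql: str, *args: Any) -> str:
        q = self._normalise(sql)
        if m := _INSERT.fullmatch(q):
            cols = [c.strip() for c in m.group(2).split(",")]
            # Assumes $1..$n appear in the same order as the column list.
            self.tables.setdefault(m.group(1), []).append(dict(zip(cols, args)))
            return "INSERT 0 1"
        if m := _UPDATE.fullmatch(q):
            table, set_col, where_col = m.groups()
            hits = [r for r in self.tables.get(table, []) if r.get(where_col) == args[1]]
            for row in hits:
                row[set_col] = args[0]
            return f"UPDATE {len(hits)}"
        if m := _DELETE.fullmatch(q):
            table, where_col = m.groups()
            rows = self.tables.get(table, [])
            kept = [r for r in rows if r.get(where_col) != args[0]]
            self.tables[table] = kept
            return f"DELETE {len(rows) - len(kept)}"
        raise NotImplementedError(f"FakePostgresPool cannot execute: {q}")

    async def fetch(self, sql: str, *args: Any) -> list[dict[str, Any]]:
        q = self._normalise(sql)
        m = _SELECT.fullmatch(q)
        if m is None:
            raise NotImplementedError(f"FakePostgresPool cannot fetch: {q}")
        table, where_col = m.groups()
        rows = self.tables.get(table, [])
        if where_col is not None:
            rows = [r for r in rows if r.get(where_col) == args[0]]
        return [dict(r) for r in rows]  # copies, so callers cannot mutate stored rows

    async def fetchrow(self, sql: str, *args: Any) -> Optional[dict[str, Any]]:
        rows = await self.fetch(sql, *args)
        return rows[0] if rows else None

    async def fetchval(self, sql: str, *args: Any) -> Any:
        row = await self.fetchrow(sql, *args)
        return next(iter(row.values())) if row else None  # first column
```

Raising on unrecognized SQL is a deliberate choice in this sketch: a query the fake cannot interpret fails loudly instead of silently returning nothing.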

Integration Tests (end-to-end)

The integration test script at infrastructure/scripts/integration-test.sh tests the full platform pipeline:

Health checks across all services (ports 8001-8009 and gateway at 8000)
Authentication — obtain a JWT token via the dev email/password login
Data ingestion — trigger RRC scraper, verify entity extraction (20 wells)
Knowledge graph verification — assemble context for a specific well
Agent execution — run Rule 37 agent through the gateway, verify HITL pause
HITL approval — approve a pending filing, verify status transition
Emissions calculation — verify flaring monitor computes CO2e correctly

Integration tests require all services to be running. Use ./infrastructure/scripts/start-all.sh to boot the entire platform before running the integration test.
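
The health-check step can be approximated in Python as below. This is a sketch, not a port of integration-test.sh: the /health path and the helper names are assumptions, and the probe is injectable so the logic can be exercised without live services.

```python
import urllib.request
from typing import Callable, Iterable

GATEWAY_PORT = 8000
SERVICE_PORTS = range(8001, 8010)  # 8001-8009 inclusive


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if GET url answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:  # URLError, HTTPError, refused connections, timeouts
        return False


def unhealthy_ports(
    ports: Iterable[int], probe: Callable[[str], bool] = http_probe
) -> list[int]:
    """List the ports whose (assumed) /health endpoint does not respond OK."""
    return [p for p in ports if not probe(f"http://localhost:{p}/health")]
```

Calling unhealthy_ports([GATEWAY_PORT, *SERVICE_PORTS]) with the default probe checks a running platform; passing a fake probe keeps the logic testable offline.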

Test Isolation Strategy

FastAPI TestClient with Noop Lifespan

Every service that uses a FastAPI lifespan handler (which connects to databases and Redis on startup) replaces it with a no-op lifespan during tests:

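```python
from contextlib import asynccontextmanager

@asynccontextmanager
async def noop_lifespan(app):
    yield  # skip all startup/shutdown work (no database or Redis connections)

# main_module is the service's main module, e.g. approval.main
main_module.app.router.lifespan_context = noop_lifespan
```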

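This prevents tests from attempting real infrastructure connections while still allowing the full FastAPI router to be exercised via TestClient. Starlette's TestClient runs lifespan handlers only when it is used as a context manager (with TestClient(app) as client:), so the swap matters for tests written in that form; without it, entering the block would open real connections.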

Module-Level Dependency Injection

Test fixtures patch the service’s global variables directly before creating the TestClient:

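```python
import pytest
from contextlib import asynccontextmanager
from fastapi.testclient import TestClient

@pytest.fixture
def app_client(fake_pg):
    import approval.main as main_module
    main_module.pg = fake_pg  # swap the module-level pool for the in-memory fake

    @asynccontextmanager
    async def noop_lifespan(app):
        yield

    main_module.app.router.lifespan_context = noop_lifespan
    return TestClient(main_module.app)
```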

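As long as fake_pg keeps pytest's default function scope, each test gets a fresh fake store and no data is shared between tests. The assignment to main_module.pg is not reverted at teardown, though, so the module keeps the last test's fake until the next test replaces it.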
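
The fixture above overwrites main_module.pg and never restores it. One way to scope the swap, sketched here with the standard library's unittest.mock.patch.object in place of direct assignment, is to patch only for the duration of a block. The module below is a throwaway stand-in for approval.main, not the real one.

```python
import types
from unittest.mock import patch

# Throwaway stand-in for a service module such as approval.main (hypothetical).
main_module = types.ModuleType("service_main_standin")
main_module.pg = None  # the real module would populate this in its lifespan

fake_pg = object()  # placeholder for an in-memory fake such as FakePostgresPool


def current_pool():
    # Service code reads the module global at call time, so it sees the patch.
    return main_module.pg


with patch.object(main_module, "pg", fake_pg):
    assert current_pool() is fake_pg
# Leaving the block restores the original attribute value.
assert current_pool() is None
```

In a pytest fixture, the same with block can wrap a yield of the TestClient, so the original value is restored at teardown.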

Test Organization

Tests follow a class-based organization pattern, grouping related test cases:

```python
class TestVertexCrud:
    async def test_create_vertex(self, crud):
        ...

    async def test_get_vertex_not_found(self, crud):
        ...


class TestEdgeCrud:
    async def test_create_edge(self, crud):
        ...
```

Both sync and async test methods are supported. API-level tests (testing HTTP endpoints via TestClient) are synchronous. Business logic tests (testing service classes directly) are async.

Flaring Monitor: Real Database Tests

The flaring-monitor service is unique in that some of its tests connect to the real local PostgreSQL database. Its conftest.py uses pytest_asyncio.fixture to create a real PostgresPool, runs migration SQL files, and performs cleanup before and after each test:

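```python
import pytest_asyncio

# PostgresPool and DATABASE_URL are defined elsewhere in the service (not shown here)

@pytest_asyncio.fixture
async def pg():
    pool = PostgresPool(dsn=DATABASE_URL, min_size=1, max_size=5)
    await pool.connect()
    # Run migrations, clean up test data...
    yield pool
    # Clean up after test...
    await pool.disconnect()
```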

These flaring-monitor tests require a running PostgreSQL instance, so start the Docker Compose infrastructure (docker compose up -d) before running them.

What is Not Tested

LLM calls — The orchestration engine’s LLM interactions are not tested with real API calls. Tests focus on the surrounding pipeline (routing, state management, budget enforcement).
Kafka event publishing — Event publishing is mocked in tests. The integration test verifies the full pipeline indirectly.
Frontend — The Next.js frontend does not currently have a test suite. Testing is manual.
Go API Gateway — The gateway does not have Go unit tests. It is validated through the integration test script.
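
A minimal sketch of what mocking event publishing can look like, using unittest.mock.AsyncMock. The producer interface (an aiokafka-style send_and_wait(topic, value)), the topic name, and the publish_filing_approved function are all assumptions for illustration, not AEGIS code.

```python
import asyncio
import json
from unittest.mock import AsyncMock


# Hypothetical publisher; topic name and payload shape are illustrative.
async def publish_filing_approved(producer, filing_id: str) -> None:
    payload = json.dumps({"event": "filing.approved", "filing_id": filing_id}).encode()
    await producer.send_and_wait("approvals", payload)


def test_publish_sends_expected_event():
    # AsyncMock stands in for the producer; attributes of an AsyncMock are
    # themselves AsyncMocks, so send_and_wait can be awaited and inspected.
    producer = AsyncMock()
    asyncio.run(publish_filing_approved(producer, "F-1"))

    producer.send_and_wait.assert_awaited_once()
    topic, payload = producer.send_and_wait.await_args.args
    assert topic == "approvals"
    assert json.loads(payload) == {"event": "filing.approved", "filing_id": "F-1"}
```

The test checks what would have been sent without a broker, which is the trade-off described above: correctness of the payload is covered here, and actual delivery is left to the end-to-end script.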