Writing Tests

This guide covers the patterns and conventions used across AEGIS test suites, with concrete examples from the codebase.

File and Directory Structure

Tests live in services/{service}/tests/ alongside the service source code:


services/memory-service/
├── src/memory/          # Service source code
├── tests/
│   ├── __init__.py      # Required for pytest discovery
│   ├── conftest.py      # Shared fixtures
│   ├── test_working_memory.py
│   ├── test_episodic_memory.py
│   └── test_ledger.py
└── pyproject.toml

Naming Conventions

- Test files: test_{module}.py or test_{feature}.py
- Test classes: TestFeatureName (e.g., TestCreateApproval, TestVertexCrud)
- Test methods: test_{behavior} (e.g., test_create_named_individual, test_get_vertex_not_found)
- Every test directory must contain an __init__.py file

The conftest.py Pattern

Each service with complex test infrastructure defines shared fixtures in conftest.py. These fixtures provide mock infrastructure (databases, Redis, HTTP clients) and service instances that tests consume via dependency injection.

Pattern 1: Mock Redis Client (memory-service)

The memory-service conftest builds a mock Redis client backed by an in-memory dict, simulating Redis hash operations (HGETALL, HSET):


# services/memory-service/tests/conftest.py
from unittest.mock import AsyncMock

import pytest


@pytest.fixture
def mock_redis():
    # In-memory stand-in for Redis hashes: key -> {field: value}
    store: dict[str, dict[str, str]] = {}
    client = AsyncMock()

    async def hgetall(key):
        # Return a copy so callers cannot mutate the backing store
        return dict(store.get(key, {}))

    async def hset(key, field, value):
        store.setdefault(key, {})[field] = value

    client.hgetall = AsyncMock(side_effect=hgetall)
    client.hset = AsyncMock(side_effect=hset)
    # ... other operations

    return client, store

The fixture returns both the mock client and the backing store, so tests can inspect stored data directly.
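As an illustration of how a test might consume this pair, the sketch below rebuilds the same mock inline so that it runs without pytest, then checks both the client's return value and the backing store. The helper name `build_mock_redis` and the key `wm:conv-1` are hypothetical and are not taken from the codebase.

```python
import asyncio
from unittest.mock import AsyncMock


def build_mock_redis():
    """Hypothetical helper mirroring the body of the mock_redis fixture."""
    store: dict[str, dict[str, str]] = {}
    client = AsyncMock()

    async def hgetall(key):
        return dict(store.get(key, {}))

    async def hset(key, field, value):
        store.setdefault(key, {})[field] = value

    client.hgetall = AsyncMock(side_effect=hgetall)
    client.hset = AsyncMock(side_effect=hset)
    return client, store


async def exercise_mock():
    client, store = build_mock_redis()
    await client.hset("wm:conv-1", "key1", "value1")  # write through the mock
    fetched = await client.hgetall("wm:conv-1")       # read back through the mock
    return client, store, fetched


client, store, fetched = asyncio.run(exercise_mock())
```

Because `hgetall` returns a copy, a test can mutate `fetched` without corrupting `store`, and the `AsyncMock` wrappers still record every awaited call for assertions such as `assert_awaited_once_with`.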

Pattern 2: Fake PostgreSQL Pool (approval-service)

The approval-service builds a FakePostgresPool class that keeps approvals in an in-memory dict and audit_logs in an in-memory list, and interprets basic SQL patterns by substring matching:


from typing import Any


class FakePostgresPool:
    def __init__(self):
        # Approvals keyed by ID; audit log entries kept in insertion order
        self.approvals: dict[str, dict[str, Any]] = {}
        self.audit_logs: list[dict[str, Any]] = []

    async def execute(self, query: str, *args) -> str:
        # Normalize the query, then dispatch on the SQL keywords it contains
        q = query.strip().upper()
        if "INSERT INTO" in q and "APPROVAL_REQUESTS" in q:
            self._insert_approval(args)
        elif "UPDATE" in q and "APPROVAL_REQUESTS" in q:
            self._update_approval(query, args)
        return "OK"

    async def fetch(self, query: str, *args) -> list[dict[str, Any]]:
        ...
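The bodies of `_insert_approval`, `_update_approval`, and `fetch` are not shown in this guide. The following is a minimal sketch of the same substring-dispatch idea, assuming a hypothetical two-column `approval_requests` table (`id`, `status`). The class name `MiniFakePool`, the column set, and the positional argument order are illustrative assumptions, not the approval-service implementation.

```python
import asyncio
from typing import Any


class MiniFakePool:
    """Hypothetical, simplified fake: rows keyed by id, SQL matched by substring."""

    def __init__(self) -> None:
        self.approvals: dict[str, dict[str, Any]] = {}

    async def execute(self, query: str, *args: Any) -> str:
        q = query.strip().upper()
        if "INSERT INTO" in q and "APPROVAL_REQUESTS" in q:
            # Assumed positional order: $1 = id, $2 = status
            approval_id, status = args
            self.approvals[approval_id] = {"id": approval_id, "status": status}
        elif "UPDATE" in q and "APPROVAL_REQUESTS" in q:
            # Assumed positional order: $1 = new status, $2 = id
            status, approval_id = args
            if approval_id in self.approvals:
                self.approvals[approval_id]["status"] = status
        return "OK"

    async def fetch(self, query: str, *args: Any) -> list[dict[str, Any]]:
        q = query.strip().upper()
        rows = self.approvals.values()
        if "FROM APPROVAL_REQUESTS" in q and "WHERE STATUS" in q:
            # Assumed: $1 = status to filter on
            return [dict(r) for r in rows if r["status"] == args[0]]
        return [dict(r) for r in rows]


async def demo() -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    pool = MiniFakePool()
    insert = "INSERT INTO approval_requests (id, status) VALUES ($1, $2)"
    await pool.execute(insert, "a1", "pending")
    await pool.execute(insert, "a2", "pending")
    await pool.execute("UPDATE approval_requests SET status = $1 WHERE id = $2", "approved", "a1")
    pending = await pool.fetch("SELECT * FROM approval_requests WHERE status = $1", "pending")
    everything = await pool.fetch("SELECT * FROM approval_requests")
    return pending, everything


pending, everything = asyncio.run(demo())
```

Substring dispatch is deliberately crude: it only recognizes the queries the service actually issues, so a new query in the service usually needs a matching branch in the fake.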

Pattern 3: Fake Graph Store (knowledge-graph-service)

The knowledge-graph-service builds a FakeAgeGraph that simulates Apache AGE Cypher operations with vertex/edge CRUD and a regex-based Cypher interpreter:


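from typing import Any


class FakeAgeGraph:
    def __init__(self):
        # Vertices and edges keyed by their integer graph ID
        self._vertices: dict[int, dict[str, Any]] = {}
        self._edges: dict[int, dict[str, Any]] = {}

    def create_vertex(self, label: str, props: dict) -> dict:
        vid = self._alloc_id()  # _alloc_id() is defined elsewhere in the class (not shown)
        # Copy props so later changes by the caller do not alter the stored vertex
        vertex = {"id": vid, "label": label, "properties": dict(props)}
        self._vertices[vid] = vertex
        return vertex

    def match_vertices(self, label: str, match_props=None, limit=50):
        ...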

A companion _interpret_cypher() function uses regex to parse CREATE, MATCH, SET, and DETACH DELETE patterns and dispatch them to the fake graph.
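The regexes used by `_interpret_cypher()` are not shown in this guide. As a rough sketch of the regex-dispatch idea, the example below handles only two shapes of query: a `CREATE` with string-valued properties and a label-only `MATCH ... RETURN`. `TinyGraph`, `interpret_cypher`, and all three patterns are hypothetical stand-ins, not the knowledge-graph-service code.

```python
import re
from typing import Any

# Hypothetical patterns covering a very small subset of Cypher.
CREATE_RE = re.compile(r"CREATE\s+\(\w+:(\w+)\s*\{(.*?)\}\)", re.IGNORECASE | re.DOTALL)
MATCH_RE = re.compile(r"MATCH\s+\(\w+:(\w+)\)\s+RETURN", re.IGNORECASE)
PROP_RE = re.compile(r"(\w+)\s*:\s*'([^']*)'")


class TinyGraph:
    """Hypothetical stand-in for FakeAgeGraph with just enough surface for the sketch."""

    def __init__(self) -> None:
        self._vertices: dict[int, dict[str, Any]] = {}
        self._next_id = 1

    def create_vertex(self, label: str, props: dict[str, Any]) -> dict[str, Any]:
        vid = self._next_id
        self._next_id += 1
        vertex = {"id": vid, "label": label, "properties": dict(props)}
        self._vertices[vid] = vertex
        return vertex

    def match_vertices(self, label: str) -> list[dict[str, Any]]:
        return [v for v in self._vertices.values() if v["label"] == label]


def interpret_cypher(graph: TinyGraph, query: str) -> list[dict[str, Any]]:
    """Dispatch a small subset of Cypher to the fake graph; reject anything else."""
    if m := CREATE_RE.search(query):
        label, raw_props = m.group(1), m.group(2)
        props = dict(PROP_RE.findall(raw_props))  # string-valued properties only
        return [graph.create_vertex(label, props)]
    if m := MATCH_RE.search(query):
        return graph.match_vertices(m.group(1))
    raise NotImplementedError(f"Unsupported Cypher in fake: {query!r}")


graph = TinyGraph()
created = interpret_cypher(graph, "CREATE (w:Well {entity_id: 'w-1', name: 'Alpha'})")
matched = interpret_cypher(graph, "MATCH (w:Well) RETURN w")
```

Raising on unrecognized queries, rather than silently returning an empty result, makes gaps in the fake show up as test failures instead of as misleading passes.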

Pattern 4: Real Database with Cleanup (flaring-monitor)

The flaring-monitor uses real database connections with careful cleanup:


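import pytest_asyncio

# PostgresPool, DATABASE_URL, and TEST_TENANT come from the service and its test configuration


@pytest_asyncio.fixture
async def pg():
    pool = PostgresPool(dsn=DATABASE_URL, min_size=1, max_size=5)
    await pool.connect()

    # Cleanup before yielding: removes rows left behind by an earlier aborted run
    await pool.execute("DELETE FROM event_type_definitions WHERE tenant_id = $1", TEST_TENANT)

    yield pool

    # Cleanup after test: removes rows this test created
    await pool.execute("DELETE FROM event_type_definitions WHERE tenant_id = $1", TEST_TENANT)
    await pool.disconnect()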

When using real database fixtures, always use a fixed test tenant ID and clean up both before and after the test to ensure isolation regardless of previous test failures.
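To show why the cleanup runs twice, the sketch below swaps in SQLite's in-memory database from the standard library in place of PostgreSQL, purely so the example runs anywhere. The context manager `clean_tenant`, the helper `count_rows`, the two-column table layout, and the tenant value are assumptions made for illustration.

```python
import sqlite3
from contextlib import contextmanager

TEST_TENANT = "test-tenant"  # assumed fixed tenant ID reserved for tests


@contextmanager
def clean_tenant(conn: sqlite3.Connection, tenant_id: str):
    """Delete the tenant's rows before and after the wrapped block."""
    delete = "DELETE FROM event_type_definitions WHERE tenant_id = ?"
    conn.execute(delete, (tenant_id,))  # removes leftovers from an aborted run
    try:
        yield conn
    finally:
        conn.execute(delete, (tenant_id,))  # removes rows the block created


def count_rows(conn: sqlite3.Connection, tenant_id: str) -> int:
    row = conn.execute(
        "SELECT COUNT(*) FROM event_type_definitions WHERE tenant_id = ?", (tenant_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_type_definitions (tenant_id TEXT, name TEXT)")
# Simulate a stale row left behind by a run that was killed before teardown,
# plus a row owned by another tenant that must never be touched.
conn.execute("INSERT INTO event_type_definitions VALUES (?, ?)", (TEST_TENANT, "stale"))
conn.execute("INSERT INTO event_type_definitions VALUES (?, ?)", ("other-tenant", "keep"))

with clean_tenant(conn, TEST_TENANT) as db:
    rows_at_start = count_rows(db, TEST_TENANT)
    db.execute("INSERT INTO event_type_definitions VALUES (?, ?)", (TEST_TENANT, "fresh"))
    rows_during = count_rows(db, TEST_TENANT)

rows_after = count_rows(conn, TEST_TENANT)
other_rows = count_rows(conn, "other-tenant")
```

The block starts from zero rows even though a stale row existed, and other tenants' data is untouched. In a pytest yield fixture the code after `yield` runs even when the test fails, so the explicit `try`/`finally` is only needed in this standalone context manager.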

Writing Unit Tests

Async Unit Tests

With asyncio_mode = "auto", write async test methods directly in test classes:


class TestWorkingMemoryUnit:
    async def test_set_and_get(self, working_mem):
        await working_mem.set("conv-1", {"key1": "value1", "key2": 42})
        result = await working_mem.get("conv-1")
        assert result["key1"] == "value1"
        assert result["key2"] == 42
 
    async def test_get_empty(self, working_mem):
        result = await working_mem.get("nonexistent")
        assert result == {}

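No @pytest.mark.asyncio decorator is needed, because pytest-asyncio's auto mode collects async test functions automatically. The working_mem fixture is injected by pytest from conftest.py.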

Synchronous API Tests

Tests that exercise FastAPI endpoints via TestClient are synchronous, because TestClient handles the async event loop internally:


class TestWorkingMemoryAPI:
    def test_get_empty(self, app_client):
        r = app_client.get("/working-memory/conv-new")
        assert r.status_code == 200
        body = r.json()
        assert body["conversation_id"] == "conv-new"
        assert body["data"] == {}
 
    def test_put_and_get(self, app_client):
        r = app_client.put(
            "/working-memory/conv-1",
            json={"data": {"scratchpad": "notes", "count": 5}},
        )
        assert r.status_code == 200

Testing Error Cases

Always test error paths and edge cases alongside the happy path:


class TestCreateApproval:
    def test_create_invalid_strategy(self, app_client):
        r = app_client.post("/approvals", json={
            "execution_id": "exec-3",
            "agent_id": "agent-1",
            "checkpoint_type": "pre_filing",
            "state_snapshot": {},
            "reviewer_strategy": "invalid",
        })
        assert r.status_code == 400
 
    def test_create_named_individual_without_reviewer_id(self, app_client):
        r = app_client.post("/approvals", json={
            "execution_id": "exec-4",
            "agent_id": "agent-1",
            "checkpoint_type": "pre_filing",
            "state_snapshot": {},
            "reviewer_strategy": "named_individual",
            # Missing reviewer_id
        })
        assert r.status_code == 400

Testing with pytest.raises

For functions that raise exceptions, use pytest.raises:


async def test_create_invalid_label(self, crud):
    with pytest.raises(ValueError, match="Unknown vertex label"):
        await crud.create_vertex("InvalidLabel", {"entity_id": "x"})

Writing API Integration Tests

API tests follow a setup-act-assert pattern, often creating prerequisite data before testing the target endpoint:


class TestDecideApproval:
    def _create_pending(self, app_client):
        """Helper to create a pending approval for testing decisions."""
        r = app_client.post("/approvals", json={
            "execution_id": "e1",
            "agent_id": "a1",
            "checkpoint_type": "pre_filing",
            "state_snapshot": {"messages": [{"role": "assistant", "content": "draft"}]},
            "reviewer_strategy": "named_individual",
            "reviewer_id": "rev-1",
        })
        return r.json()["id"]
 
    def test_approve(self, app_client):
        aid = self._create_pending(app_client)
        r = app_client.post(f"/approvals/{aid}/decide", json={
            "decision": "approved",
            "reviewer_id": "rev-1",
            "reviewer_comments": "Looks good",
        })
        assert r.status_code == 200
        body = r.json()
        assert body["status"] == "approved"
        assert body["decided_at"] is not None
 
    def test_decide_already_decided(self, app_client):
        aid = self._create_pending(app_client)
        # First decision succeeds
        app_client.post(f"/approvals/{aid}/decide", json={
            "decision": "approved", "reviewer_id": "rev-1",
        })
        # Second decision should fail with 409 Conflict
        r = app_client.post(f"/approvals/{aid}/decide", json={
            "decision": "rejected", "reviewer_id": "rev-2",
        })
        assert r.status_code == 409

Using Seeded Test Data

The orchestration-engine conftest provides helper functions and seeded fixtures for complex test scenarios:


# conftest.py helpers
import uuid

import pytest


def make_compliance_row(entity_id="well-1", status="compliant", domain="rule_37", **extra):
    return {
        "id": str(uuid.uuid4()),
        "tenant_id": "default",
        "entity_id": entity_id,
        "entity_type": "Well",
        "compliance_domain": domain,
        "status": status,
        # ... remaining columns elided
    }


@pytest.fixture
def seeded_pg(fake_pg):
    """FakePostgresPool pre-seeded with templates and rule versions."""
    template = make_template_row()
    fake_pg.seed_table("checklist_templates", [template])
    fake_pg.seed_table("rule_versions", [
        make_rule_version_row(rule_domain="spacing", rule_identifier="SWR_37"),
    ])
    return fake_pg

Tests then use these helpers and fixtures to verify behavior against known data:


class TestComplianceSummary:
    def test_with_overdue(self, app_client, fake_pg):
        fake_pg.seed_compliance_status([
            make_compliance_row(entity_id="w-1", status="overdue"),
            make_compliance_row(entity_id="w-2", status="overdue"),
        ])
        resp = app_client.get("/compliance/summary")
        assert resp.json()["overdue_count"] >= 2
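The row factories and seeding helpers used above are only partly shown. As a sketch of the general pattern (defaults plus keyword overrides, and a fake that accepts pre-built rows), consider the following. `make_row`, `SeedableFake`, and `count_where` are hypothetical names, and treating `**extra` as overrides applied last is an assumption about how such a factory might work.

```python
import uuid
from typing import Any


def make_row(entity_id: str = "well-1", status: str = "compliant", **extra: Any) -> dict[str, Any]:
    """Hypothetical factory: sensible defaults, with keyword overrides applied last."""
    row = {
        "id": str(uuid.uuid4()),  # unique per call so seeded rows never collide
        "tenant_id": "default",
        "entity_id": entity_id,
        "status": status,
    }
    row.update(extra)
    return row


class SeedableFake:
    """Hypothetical fake store exposing a seed_table() helper."""

    def __init__(self) -> None:
        self.tables: dict[str, list[dict[str, Any]]] = {}

    def seed_table(self, name: str, rows: list[dict[str, Any]]) -> None:
        # Copy each row so later mutation of the caller's dicts cannot leak into the fake
        self.tables.setdefault(name, []).extend(dict(r) for r in rows)

    def count_where(self, name: str, **conditions: Any) -> int:
        return sum(
            all(row.get(k) == v for k, v in conditions.items())
            for row in self.tables.get(name, [])
        )


fake = SeedableFake()
fake.seed_table("compliance_status", [
    make_row(entity_id="w-1", status="overdue"),
    make_row(entity_id="w-2", status="overdue", tenant_id="acme"),
    make_row(entity_id="w-3"),
])
overdue = fake.count_where("compliance_status", status="overdue")
acme_overdue = fake.count_where("compliance_status", status="overdue", tenant_id="acme")
```

Keeping defaults in the factory means each test states only the fields it cares about, which makes the intent of the seeded data easier to read.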

Testing Pure Functions

Some services have pure business logic functions that need no mocking at all:


from datetime import date

from compliance.deadlines import scan_deadlines
from compliance.risk_scoring import score_well_risk


class TestDeadlines:
    def test_critical_deadline(self):
        # Pin the reference date so the result does not depend on when the test runs
        today = date(2025, 4, 10)
        permits = [{"permit_number": "P-1", "expiration_date": "2025-04-14"}]
        result = scan_deadlines(permits, [], reference_date=today)
        assert result["critical_count"] == 1


class TestRiskScoring:
    def test_high_risk(self):
        result = score_well_risk(
            well={"well_name": "W-1", "api_number": "42-1"},
            deadlines=[{"days_remaining": -5}],
            production_issues=["GOR exceeded", "Zero production"],
            violation_count=2,
            flaring_exposure={"pct_of_max": 105},
        )
        assert result["risk_level"] == "HIGH"
        assert result["risk_score"] >= 70

These tests are straightforward and fast: they need no fixtures or mocks, only the function under test and its inputs.
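What keeps such a function pure and easy to test is that the current date is passed in instead of being read from the system clock. The sketch below illustrates that design with a hypothetical `scan_expirations` function; the 7-day critical window and the shape of the returned dict are assumptions and do not describe the real `scan_deadlines`.

```python
from datetime import date


def scan_expirations(permits: list[dict], reference_date: date, critical_days: int = 7) -> dict:
    """Hypothetical pure function: count permits expiring within critical_days.

    Taking reference_date as a parameter (instead of calling date.today())
    keeps the result identical no matter when the test runs.
    """
    critical = []
    for permit in permits:
        expires = date.fromisoformat(permit["expiration_date"])
        days_remaining = (expires - reference_date).days
        if 0 <= days_remaining <= critical_days:
            critical.append({**permit, "days_remaining": days_remaining})
    return {"critical_count": len(critical), "critical": critical}


today = date(2025, 4, 10)
result = scan_expirations(
    [
        {"permit_number": "P-1", "expiration_date": "2025-04-14"},  # 4 days out
        {"permit_number": "P-2", "expiration_date": "2025-06-01"},  # 52 days out
    ],
    reference_date=today,
)
```

With the date injected, the test needs no clock patching and gives the same answer on every run.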

Adding a New Test

Follow this checklist when adding tests to an existing service:

- Create or extend a test file in services/{service}/tests/test_{feature}.py
- Add an __init__.py if the tests directory does not already have one
- Add fixtures to conftest.py if you need shared mock infrastructure
- Organize tests in classes grouped by feature or endpoint
- Use async for business logic tests and sync for TestClient API tests
- Test the happy path, error cases, and edge cases

Run the tests to verify they pass:


cd services/{service}
poetry run pytest tests/test_{feature}.py -v

Adding Tests for a New Service

When creating tests for a brand-new service:

Create the tests/ directory with __init__.py
Create conftest.py with the appropriate mock infrastructure:
- PostgreSQL-backed service: build a FakePostgresPool
- Redis-backed service: build a mock Redis client with AsyncMock and side_effect
- Graph-backed service: build a FakeAgeGraph

Create an app_client fixture that patches the service’s global dependencies and replaces the lifespan:


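from contextlib import asynccontextmanager

import pytest
from fastapi.testclient import TestClient


@pytest.fixture
def app_client(fake_pg):
    import my_service.main as main_module

    # Swap the module-level pool for the fake before any request is served
    main_module.pg = fake_pg

    @asynccontextmanager
    async def noop_lifespan(app):
        # Skip real startup/shutdown work such as opening database connections
        yield

    main_module.app.router.lifespan_context = noop_lifespan
    return TestClient(main_module.app)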

Add [tool.pytest.ini_options] with asyncio_mode = "auto" in pyproject.toml

Add dev dependencies for pytest, pytest-asyncio, and httpx:


[tool.poetry.group.dev.dependencies]
pytest = "^8.0"
pytest-asyncio = "^0.23"
httpx = "^0.28"