Testing Strategy

AEGIS follows a layered testing approach across its microservices architecture. Each of the 8 Python services maintains its own test suite under services/{service}/tests/, with shared test patterns and conventions enforced across the codebase.

Philosophy

The testing strategy balances two goals:

  1. Fast, isolated unit tests that run without external dependencies (databases, Redis, Kafka) by using in-memory fakes and mocks.
  2. Integration tests that exercise the full service pipeline end-to-end, verifying cross-service communication and data flow.

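Most tests are unit tests that run against mock infrastructure. Cross-service integration testing is confined to the end-to-end script (infrastructure/scripts/integration-test.sh), which requires all services to be running. The one exception inside a service suite is flaring-monitor, where some tests connect to a real PostgreSQL database.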

Async-First Testing

Every Python service uses FastAPI with async/await, so async tests run under pytest-asyncio with automatic async mode:

    # In each service's pyproject.toml
    [tool.pytest.ini_options]
    asyncio_mode = "auto"

With asyncio_mode = "auto", any async def test_* function is automatically treated as an async test — no @pytest.mark.asyncio decorator is needed.
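As a minimal sketch of what this looks like in a test file, the coroutine below carries no decorator. `fetch_status` is a hypothetical stand-in for service code, not an AEGIS function.

```python
import asyncio


async def fetch_status(delay: float = 0.0) -> str:
    """Hypothetical async service call, used only for illustration."""
    await asyncio.sleep(delay)
    return "ok"


# With asyncio_mode = "auto", pytest-asyncio collects this coroutine and
# runs it on an event loop; no @pytest.mark.asyncio decorator is required.
async def test_fetch_status_returns_ok():
    assert await fetch_status() == "ok"
```

Outside pytest, the same coroutine can be driven by hand with `asyncio.run(test_fetch_status_returns_ok())`, which is a quick way to confirm that a test body has no hidden dependency on fixtures.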

Test Coverage by Service

| Service | Test Files | Focus Areas |
|---|---|---|
| memory-service | test_working_memory.py, test_episodic_memory.py, test_ledger.py | Redis Hash operations, working memory CRUD, episodic vector search, injection ledger mark/check/evict |
| approval-service | test_api.py, test_audit.py | HITL approval lifecycle (create, decide, list, escalate), audit trail signing and write |
| auth-service | test_auth.py | Email/password verification (bcrypt), JWT token generation, token roundtrip validation |
| ingestion-service | test_ingestion.py | RRC scraper data extraction, CSV import parsing, entity extraction with edge generation |
| compliance-monitor | test_compliance.py | Deadline scanning, production compliance checks, rule change detection, risk scoring |
| flaring-monitor | test_flaring.py, test_burn_rate.py, test_routes.py, test_event_type_service.py, test_event_type_e2e.py, test_submission_guard_engine.py, test_guard_rule_versioning.py | Volume compliance assessment, R-32 validation, emissions tagging, forecast/prediction, burn rate intelligence, event type definitions, guard rules |
| knowledge-graph-service | test_api.py, test_crud.py, test_context.py, test_context_managed.py, test_event_detection.py, test_impact.py, test_relationship_types.py | Vertex/edge CRUD, Cypher query simulation, context assembly, graph traversal, impact analysis |
| orchestration-engine | test_routes.py, test_checklists.py, test_rules.py, test_schemas.py, test_engine.py, test_budget.py, test_queries.py, test_workspace.py, test_skill_base.py, test_skill_field_event.py, test_skill_rule32_formpr.py, test_skill_rule37.py | Compliance dashboard endpoints, checklist CRUD and workflow, rule versioning, workspace SSE, agent skill execution, budget enforcement |

Unit Tests vs Integration Tests

Unit Tests (per-service)

Unit tests form the bulk of the test suite. They isolate service logic from infrastructure by substituting real database pools and Redis clients with in-memory fakes.

Pattern: In-memory fakes over generic mocks. Rather than using plain MagicMock objects, AEGIS tests build purpose-built fake implementations that simulate real behavior:

  • FakePostgresPool (approval-service, orchestration-engine): maintains an in-memory dict of rows, interprets basic SQL patterns (INSERT, SELECT, UPDATE, DELETE), and supports execute, fetch, fetchrow, and fetchval.
  • FakeAgeGraph (knowledge-graph-service): an in-memory graph store that simulates Apache AGE Cypher operations, including vertex/edge CRUD, traversal, and pattern matching.
  • Mock Redis client (memory-service): a mock whose side_effect callbacks are backed by an in-memory dict store, covering Hash operations (hgetall, hget, hset, hdel, hexists) and pipelines.
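The Mock Redis client idea can be sketched roughly as follows. This is an illustration under assumptions rather than the memory-service fixture itself: `make_mock_redis`, the key name, and the simplified `hset(key, field, value)` signature are hypothetical, and pipeline support is left out.

```python
import asyncio
from unittest.mock import AsyncMock


def make_mock_redis() -> AsyncMock:
    """Build a mock whose Hash commands read and write a plain dict."""
    store: dict[str, dict[str, str]] = {}
    redis = AsyncMock()

    def hset(key, field, value):
        is_new = field not in store.setdefault(key, {})
        store[key][field] = value
        return int(is_new)  # Redis reports how many fields were added

    def hget(key, field):
        return store.get(key, {}).get(field)

    def hgetall(key):
        return dict(store.get(key, {}))

    def hdel(key, *fields):
        bucket = store.get(key, {})
        return sum(1 for f in fields if bucket.pop(f, None) is not None)

    def hexists(key, field):
        return field in store.get(key, {})

    # AsyncMock awaits normally and returns whatever the side_effect returns.
    redis.hset.side_effect = hset
    redis.hget.side_effect = hget
    redis.hgetall.side_effect = hgetall
    redis.hdel.side_effect = hdel
    redis.hexists.side_effect = hexists
    return redis


async def demo() -> dict:
    redis = make_mock_redis()
    await redis.hset("wm:well-1", "status", "active")  # hypothetical key
    return await redis.hgetall("wm:well-1")
```

Because the callbacks share one dict, a value written through `hset` is visible to a later `hgetall`, which a bare `AsyncMock` with fixed return values would not give you.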

This approach means tests exercise real parsing logic, query routing, and business rules — not just “was the function called with the right arguments.”
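To make the contrast with call-recording mocks concrete, here is a deliberately tiny fake pool and a business rule tested against it. Everything in it is hypothetical: `TinyFakePool` understands far less SQL than the FakePostgresPool described above, and `approve` and the `approvals` table are invented for the example.

```python
import asyncio
from typing import Any, Optional


class TinyFakePool:
    """Hypothetical, much smaller cousin of FakePostgresPool."""

    def __init__(self) -> None:
        self.rows: dict[str, dict[str, Any]] = {}

    async def execute(self, sql: str, *args: Any) -> str:
        verb = sql.strip().split()[0].upper()
        if verb == "INSERT":  # expects ($1 = id, $2 = status)
            row_id, status = args
            self.rows[row_id] = {"id": row_id, "status": status}
            return "INSERT 0 1"
        if verb == "UPDATE":  # expects ($1 = status, $2 = id)
            status, row_id = args
            if row_id not in self.rows:
                return "UPDATE 0"
            self.rows[row_id]["status"] = status
            return "UPDATE 1"
        raise NotImplementedError(f"unsupported SQL: {sql}")

    async def fetchrow(self, sql: str, *args: Any) -> Optional[dict[str, Any]]:
        (row_id,) = args  # expects a lookup by id
        row = self.rows.get(row_id)
        return dict(row) if row is not None else None


async def approve(pool: TinyFakePool, approval_id: str) -> bool:
    """Hypothetical rule: only pending approvals can be approved."""
    row = await pool.fetchrow("SELECT * FROM approvals WHERE id = $1", approval_id)
    if row is None or row["status"] != "pending":
        return False
    await pool.execute(
        "UPDATE approvals SET status = $1 WHERE id = $2", "approved", approval_id
    )
    return True
```

A test can insert a pending row, call `approve` twice, and assert that the second call returns False. That checks the state transition itself, which an assertion on call arguments alone would miss.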

Integration Tests (end-to-end)

The integration test script at infrastructure/scripts/integration-test.sh tests the full platform pipeline:

  1. Health checks across all services (ports 8001-8009 and gateway at 8000)
  2. Authentication — obtain a JWT token via the dev email/password login
  3. Data ingestion — trigger RRC scraper, verify entity extraction (20 wells)
  4. Knowledge graph verification — assemble context for a specific well
  5. Agent execution — run Rule 37 agent through the gateway, verify HITL pause
  6. HITL approval — approve a pending filing, verify status transition
  7. Emissions calculation — verify flaring monitor computes CO2e correctly

Integration tests require all services to be running. Use ./infrastructure/scripts/start-all.sh to boot the entire platform before running the integration test.
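Before launching the script it can be useful to confirm that every port answers. The sketch below is not taken from integration-test.sh; it is a Python approximation of step 1, and the `/health` path and the `localhost` host are assumptions.

```python
from http.client import HTTPException
from typing import Callable
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, HTTPException):  # URLError and HTTPError are OSErrors
        return False


def unhealthy_ports(
    ports: range = range(8000, 8010),  # gateway at 8000, services at 8001-8009
    path: str = "/health",  # assumed path, not confirmed by the docs
    probe: Callable[[str], bool] = http_probe,
) -> list[int]:
    """Return the ports whose health endpoint did not respond with 200."""
    return [p for p in ports if not probe(f"http://localhost:{p}{path}")]
```

Calling `unhealthy_ports()` returns an empty list when the platform is fully up. The `probe` parameter exists so the function can be exercised with a stub instead of real HTTP requests.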

Test Isolation Strategy

FastAPI TestClient with Noop Lifespan

Every service that uses a FastAPI lifespan handler (which connects to databases and Redis on startup) replaces it with a no-op lifespan during tests:

    from contextlib import asynccontextmanager

    @asynccontextmanager
    async def noop_lifespan(app):
        yield

    main_module.app.router.lifespan_context = noop_lifespan

This prevents tests from attempting real infrastructure connections while still allowing the full FastAPI router to be exercised via TestClient.

Module-Level Dependency Injection

Test fixtures patch the service’s global variables directly before creating the TestClient:

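    from contextlib import asynccontextmanager

    import pytest
    from fastapi.testclient import TestClient

    @pytest.fixture
    def app_client(fake_pg):
        import approval.main as main_module

        # Swap the module-level pool for the in-memory fake.
        main_module.pg = fake_pg

        @asynccontextmanager
        async def noop_lifespan(app):
            yield

        # Skip the real startup and shutdown connections.
        main_module.app.router.lifespan_context = noop_lifespan
        return TestClient(main_module.app)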

This gives each test a fresh, isolated environment with no shared state between test runs.
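The fixture above assigns to the module global and leaves it in place, which is harmless as long as every test injects its own fake. If leftover state ever becomes a concern, one option is to let the patch undo itself. The sketch below uses `unittest.mock.patch.object` on a stand-in namespace; `main_module` here is a hypothetical substitute for a real service module, and pytest's `monkeypatch.setattr` gives the same restore-on-teardown behavior inside a fixture.

```python
import types
from unittest.mock import patch

# Stand-in for a service module such as approval.main (hypothetical contents).
main_module = types.SimpleNamespace(pg="real-pool")


def current_pool():
    """Read the module-level pool the way service code would."""
    return main_module.pg


def run_with_fake(fake_pg):
    # patch.object swaps the attribute and restores the original on exit,
    # even if the body raises.
    with patch.object(main_module, "pg", fake_pg):
        return current_pool()
```

After `run_with_fake("fake-pool")` returns, `main_module.pg` is back to its original value, so a later test cannot accidentally reuse the fake.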

Test Organization

Tests follow a class-based organization pattern, grouping related test cases:

    class TestVertexCrud:
        async def test_create_vertex(self, crud): ...
        async def test_get_vertex_not_found(self, crud): ...

    class TestEdgeCrud:
        async def test_create_edge(self, crud): ...

Both sync and async test methods are supported. API-level tests (testing HTTP endpoints via TestClient) are synchronous. Business logic tests (testing service classes directly) are async.

Flaring Monitor: Real Database Tests

The flaring-monitor service is unique in that some of its tests connect to the real local PostgreSQL database. Its conftest.py uses pytest_asyncio.fixture to create a real PostgresPool, runs migration SQL files, and performs cleanup before and after each test:

    @pytest_asyncio.fixture
    async def pg():
        pool = PostgresPool(dsn=DATABASE_URL, min_size=1, max_size=5)
        await pool.connect()
        # Run migrations, clean up test data...
        yield pool
        # Clean up after test...
        await pool.disconnect()

The flaring-monitor tests that use this fixture require a running PostgreSQL instance. Make sure the Docker Compose infrastructure is up before running them: docker compose up -d.
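One way to make that requirement explicit is to skip the database-backed tests when nothing is listening. The helper below is a hypothetical addition, not part of the flaring-monitor conftest.py, and the default host and port are assumptions about the local setup.

```python
import socket


def postgres_reachable(
    host: str = "localhost", port: int = 5432, timeout: float = 1.0
) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Possible use at the top of a test module (requires pytest):
# pytestmark = pytest.mark.skipif(
#     not postgres_reachable(), reason="PostgreSQL is not running"
# )
```

This only checks that the port accepts connections. It does not verify credentials or that migrations have run, so the fixture's own setup is still needed.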

What is Not Tested

  • LLM calls — The orchestration engine’s LLM interactions are not tested with real API calls. Tests focus on the surrounding pipeline (routing, state management, budget enforcement).
  • Kafka event publishing — Event publishing is mocked in tests. The integration test verifies the full pipeline indirectly.
  • Frontend — The Next.js frontend does not currently have a test suite. Testing is manual.
  • Go API Gateway — The gateway does not have Go unit tests. It is validated through the integration test script.
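For the Kafka item, mocked publishing might look like the sketch below. The `publish(topic, event)` signature, the topic name, and `record_decision` are all assumptions made for illustration; the real publisher interface is not shown on this page.

```python
import asyncio
from unittest.mock import AsyncMock


async def record_decision(publisher, approval_id: str, decision: str) -> dict:
    """Hypothetical service function that emits an event after deciding."""
    event = {
        "type": "approval.decided",
        "approval_id": approval_id,
        "decision": decision,
    }
    await publisher.publish("approvals", event)
    return event


async def test_record_decision_publishes_event():
    publisher = AsyncMock()  # stands in for the real Kafka producer wrapper
    event = await record_decision(publisher, "a1", "approved")
    # Fails unless publish was awaited exactly once with this topic and event.
    publisher.publish.assert_awaited_once_with("approvals", event)
```

A test like this confirms that the service builds and sends the expected event, while delivery through a real broker stays the responsibility of the integration script.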