Developer Glossary

Technical terms and concepts used throughout the AEGIS codebase and documentation.

A

Alembic

A database migration tool for SQLAlchemy and raw SQL. AEGIS uses per-service Alembic configurations to manage schema changes. Each service that needs migrations maintains its own alembic.ini and alembic/ directory. See Migrations.

Apache AGE

A PostgreSQL extension that adds graph database capabilities. AGE stores vertices and edges in a property graph and supports openCypher queries. AEGIS uses AGE to model oil and gas entities (wells, leases, operators) and their relationships. Each database session must run LOAD 'age' and SET search_path = ag_catalog, "$user", public before issuing any Cypher query.

API Gateway

A Go reverse proxy (services/api-gateway/) that sits in front of all backend services. It handles JWT authentication (Bearer header or aegis_token cookie), rate limiting (100 requests/minute), CORS, and request routing. All frontend traffic flows through port 8000.

AsyncPG

A high-performance async PostgreSQL client library for Python. Used by all Python services for database access. The shared PostgresPool helper in shared/src/aegis_shared/db/postgres.py wraps asyncpg connection pools.

agtype

The return type used by Apache AGE for all Cypher query results. Values are returned as strings with type suffixes like ::vertex, ::edge, ::integer. The AgePool.cypher() method automatically parses agtype values into Python dicts and primitives.
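To make the suffix handling concrete, here is a minimal sketch of parsing a single agtype string. It is not the code behind AgePool.cypher(); the function name parse_agtype is hypothetical, and the sketch assumes the text before the suffix is valid JSON. Nested values that carry their own suffixes (for example the elements of a path) are not handled.

```python
import json
import re

# Matches one trailing type annotation such as ::vertex or ::edge.
_TRAILING_SUFFIX = re.compile(r"::[a-z]+\s*$")


def parse_agtype(raw: str):
    """Strip a trailing ::type suffix, then decode the remainder as JSON."""
    text = _TRAILING_SUFFIX.sub("", raw.strip())
    return json.loads(text)


vertex = parse_agtype(
    '{"id": 1, "label": "Well", "properties": {"api_number": "42-329-40001"}}::vertex'
)
print(vertex["properties"]["api_number"])
```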

B

Budget (Token/Cost)

Per-agent limits on token usage and dollar cost per execution and per day. Enforced by orchestration/budget.py. When a budget is exceeded, a BudgetExceededError is raised and the execution stops. Each agent YAML config defines max_tokens_per_execution, max_cost_usd_per_execution, max_tokens_per_day, and max_cost_usd_per_day.
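The four limits can be pictured as a simple check that runs against current usage. The sketch below is not the code in orchestration/budget.py: the AgentBudget dataclass, the check_budget function, and its parameters are hypothetical, and it assumes a limit is exceeded only when usage is strictly greater than the configured value.

```python
from dataclasses import dataclass


class BudgetExceededError(Exception):
    """Raised when an agent goes over one of its configured limits."""


@dataclass(frozen=True)
class AgentBudget:
    # Field names mirror the keys listed in the agent YAML config.
    max_tokens_per_execution: int
    max_cost_usd_per_execution: float
    max_tokens_per_day: int
    max_cost_usd_per_day: float


def check_budget(
    budget: AgentBudget,
    *,
    exec_tokens: int,
    exec_cost_usd: float,
    day_tokens: int,
    day_cost_usd: float,
) -> None:
    """Raise BudgetExceededError for the first limit that usage exceeds."""
    checks = [
        ("max_tokens_per_execution", exec_tokens, budget.max_tokens_per_execution),
        ("max_cost_usd_per_execution", exec_cost_usd, budget.max_cost_usd_per_execution),
        ("max_tokens_per_day", day_tokens, budget.max_tokens_per_day),
        ("max_cost_usd_per_day", day_cost_usd, budget.max_cost_usd_per_day),
    ]
    for name, used, limit in checks:
        if used > limit:  # assumption: hitting the limit exactly is still allowed
            raise BudgetExceededError(f"{name} exceeded: {used} > {limit}")
```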

C

Confluent Kafka

The Kafka distribution used by AEGIS (Confluent Platform 7.6.0). The confluent-kafka Python package provides the producer and consumer clients wrapped in KafkaProducerManager and KafkaConsumerManager.

CORS (Cross-Origin Resource Sharing)

HTTP headers that control which origins can access an API. The API gateway and all backend services set permissive CORS headers (allow_origins=["*"]) for local development. In production, this should be restricted to the frontend domain.

D

Domain Tags

Labels on skills that indicate which compliance domain they belong to (e.g., spacing, rule_37, flaring, rule_32). Used by the context assembler to determine which graph sections to include when building context for an agent execution.

E

EventSourceResponse

A Starlette response class from the sse-starlette package that implements the Server-Sent Events protocol. Used by the orchestration engine to stream LangGraph execution events and workspace assessment events to the frontend.

F

FastAPI

The async Python web framework used by all 8 Python services. FastAPI uses Pydantic models for request/response validation and supports async/await natively.

G

GraphState

A TypedDict that defines the state schema for the LangGraph execution pipeline. Contains fields like messages, agent_id, agent_type, execution_id, tokens_used, cost_usd, pending_tool_calls, injected_skill_ids, and more. Defined in orchestration/state.py.
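For readers unfamiliar with TypedDict, the shape looks roughly like the sketch below. Only the field names come from the entry above; the field types, total=False, and the record_usage helper are assumptions, and the real definition in orchestration/state.py contains more fields.

```python
from typing import Any, TypedDict


class GraphStateSketch(TypedDict, total=False):
    # Field names from the glossary entry; the types are guesses.
    messages: list[dict[str, Any]]
    agent_id: str
    agent_type: str
    execution_id: str
    tokens_used: int
    cost_usd: float
    pending_tool_calls: list[dict[str, Any]]
    injected_skill_ids: list[str]


def record_usage(state: GraphStateSketch, tokens: int, cost_usd: float) -> GraphStateSketch:
    """Return a copy of the state with usage counters advanced (hypothetical helper)."""
    updated: GraphStateSketch = {**state}
    updated["tokens_used"] = state.get("tokens_used", 0) + tokens
    updated["cost_usd"] = state.get("cost_usd", 0.0) + cost_usd
    return updated
```

At runtime a TypedDict is an ordinary dict, so the schema is enforced by type checkers rather than by the interpreter.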

H

HITL (Human-in-the-Loop)

Mandatory checkpoints in the agent execution pipeline where a human reviewer must approve, reject, or modify the agent’s output before execution proceeds. HITL checkpoints are defined per agent in the YAML configs under hitl_policies. Common checkpoint types include pre_filing and good_cause_review. HITL is never optional: all regulatory filings require human review.

HMAC (Hash-based Message Authentication Code)

Used to sign audit log entries. Each audit_logs row has a signature column containing an HMAC computed from the event data using HMAC_SIGNING_KEY. This makes tampering with the append-only audit trail detectable: a modified row no longer matches its signature. In local dev, the key is a static value from .env; in production, it comes from HashiCorp Vault.
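A minimal sketch of signing and verifying an audit event with Python's standard hmac module follows. The hash function (SHA-256), the canonical JSON serialization, and the hex encoding are assumptions; the entry above does not specify how AEGIS serializes event data before signing, and sign_event / verify_event are hypothetical names.

```python
import hashlib
import hmac
import json


def sign_event(event: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of the event."""
    # sort_keys and fixed separators keep the byte representation stable,
    # so the same event always produces the same signature.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_event(event: dict, signature: str, key: bytes) -> bool:
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign_event(event, key), signature)
```

Any change to the stored event data, or a signature produced with a different key, causes verify_event to return False.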

I

Injection Ledger

A Redis Hash stored at skill:ledger:{conversation_id} that tracks which skill content has been injected into an agent conversation. This prevents duplicate injection: if a Tier 2 skill specification was already injected in a prior turn, the ledger check skips it. Managed by the memory service and checked via ledger_check() / ledger_mark() in orchestration/services.py.
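The check-then-mark pattern can be sketched with an in-memory stand-in for the Redis Hash. Nothing here reflects the real signatures of ledger_check() or ledger_mark(): the InMemoryLedger class, the inject_once helper, and the stored value "1" are hypothetical, and only the key format comes from the entry above.

```python
class InMemoryLedger:
    """Dict-backed stand-in for the Redis Hash at skill:ledger:{conversation_id}."""

    def __init__(self) -> None:
        self._hashes: dict[str, dict[str, str]] = {}

    @staticmethod
    def _key(conversation_id: str) -> str:
        return f"skill:ledger:{conversation_id}"

    def check(self, conversation_id: str, content_key: str) -> bool:
        # Plays the role of HEXISTS on the conversation's hash.
        return content_key in self._hashes.get(self._key(conversation_id), {})

    def mark(self, conversation_id: str, content_key: str) -> None:
        # Plays the role of HSET on the conversation's hash.
        self._hashes.setdefault(self._key(conversation_id), {})[content_key] = "1"


def inject_once(
    ledger: InMemoryLedger,
    conversation_id: str,
    content_key: str,
    content: str,
    prompt_parts: list[str],
) -> bool:
    """Append content to the prompt unless the ledger says it was already injected."""
    if ledger.check(conversation_id, content_key):
        return False
    prompt_parts.append(content)
    ledger.mark(conversation_id, content_key)
    return True
```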

J

JWT (JSON Web Token)

The authentication token format used by AEGIS. Users log in with an email and password, the auth service returns a JWT in the aegis_token httpOnly cookie, and all subsequent API requests include the token in the Authorization: Bearer {token} header (the gateway also accepts the cookie directly for SSE). The API gateway validates tokens locally using the shared JWT_SECRET.

K

Kafka Topic

A named stream of events in Apache Kafka. AEGIS currently uses the entity-extraction-worker topic for publishing entity extraction events from the ingestion service. Topics are auto-created with 3 partitions and replication factor 1.

KRaft

Kafka’s built-in consensus protocol that replaces ZooKeeper. AEGIS runs Kafka in KRaft mode (KAFKA_PROCESS_ROLES: broker,controller) which combines the broker and controller roles in a single process, simplifying the local dev setup.

L

LangGraph

A library from LangChain for building stateful, multi-step AI agent pipelines as directed graphs. AEGIS uses LangGraph’s StateGraph to define the execution pipeline: START -> system_prompt -> memory -> llm_call -> tool_loop -> skill_select -> approval -> synthesis -> output_format -> END. Defined in orchestration/engine.py.

LiteLLM

A Python library for routing LLM calls across providers (OpenAI, Anthropic, etc.) using a unified API. AEGIS uses LiteLLM for model routing, allowing agents to specify primary and fallback models (e.g., gpt-4o primary, gpt-4o-mini fallback).

M

Memory Service

The service (services/memory-service/, port 8002) that manages three types of memory:

Working memory: Short-term conversation state stored in Redis
Episodic memory: Long-term conversation summaries with vector embeddings stored in PostgreSQL (pgvector)
Injection ledger: Redis Hashes tracking which skills have been injected per conversation

O

openCypher

The query language used by Apache AGE for graph operations. Similar to Neo4j’s Cypher. Example: MATCH (w:Well)-[:LOCATED_IN]->(l:Lease) WHERE w.api_number = '42-329-40001' RETURN w, l. All graph queries in AEGIS use openCypher syntax wrapped in the AGE SQL envelope: SELECT * FROM cypher('oilgas', $$ ... $$) AS (v agtype).
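The envelope can be produced by a small helper, sketched below. The wrap_cypher function and AGE_SESSION_SETUP constant are hypothetical and are not part of AgePool; the graph name 'oilgas' and the setup statements come from this glossary. AGE expects the AS (...) column list to contain one agtype column per item in the RETURN clause, so a query returning w, l needs two columns.

```python
# Statements to run once per database session before any Cypher query.
AGE_SESSION_SETUP = (
    "LOAD 'age';",
    'SET search_path = ag_catalog, "$user", public;',
)


def wrap_cypher(
    cypher: str,
    graph: str = "oilgas",
    columns: tuple[str, ...] = ("v",),
) -> str:
    """Wrap an openCypher query in the AGE SQL envelope."""
    if "$$" in cypher:
        # The query is embedded in a $$ dollar-quoted string, so it must not close it.
        raise ValueError("Cypher text must not contain the $$ delimiter")
    column_list = ", ".join(f"{name} agtype" for name in columns)
    return f"SELECT * FROM cypher('{graph}', $$ {cypher.strip()} $$) AS ({column_list});"


sql = wrap_cypher(
    "MATCH (w:Well)-[:LOCATED_IN]->(l:Lease) "
    "WHERE w.api_number = '42-329-40001' RETURN w, l",
    columns=("w", "l"),
)
print(sql)
```

This helper only builds the SQL text; it does not escape or parameterize user input, which would need separate handling.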

P

pgvector

A PostgreSQL extension for vector similarity search. AEGIS uses it to store 1536-dimensional embeddings (from OpenAI’s text-embedding-3-small model) on episodic memories. The episodic_memories table has an IVFFlat index for cosine similarity search.
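For intuition about what the index accelerates, the sketch below computes cosine distance (1 minus cosine similarity, the quantity pgvector's <=> operator returns) in plain Python and ranks a handful of vectors by it. The function names and the toy 2-dimensional vectors are illustrative only. This brute-force ranking is exact, whereas an IVFFlat index trades some recall for speed.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query: list[float], memories: dict[str, list[float]], k: int = 1) -> list[str]:
    """Return the ids of the k stored vectors closest to the query."""
    ranked = sorted(memories, key=lambda mid: cosine_distance(query, memories[mid]))
    return ranked[:k]
```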

Poetry

The Python dependency manager used by all AEGIS services. Each service has its own pyproject.toml managed by Poetry. Poetry handles virtual environments, dependency resolution, and package installation. The project uses virtualenvs.prefer-active-python = true to respect the pyenv-managed Python version.

Pydantic v2

The data validation library used for all request/response schemas and domain models in AEGIS. Pydantic v2 provides type-safe models with automatic JSON serialization. All shared models inherit from AegisBase (defined in shared/src/aegis_shared/models/common.py).

pyenv

A Python version manager. AEGIS pins Python 3.12 via a .python-version file at the repo root. pyenv ensures all developers and CI environments use the same Python version.

R

Redis Hash

A Redis data structure used for the injection ledger. Each conversation has a hash at skill:ledger:{conversation_id} where field names are skill/content keys and values indicate injection status. Redis Hashes support atomic field-level reads and writes, making them ideal for the ledger pattern.

Reverse Proxy

The API gateway pattern used by AEGIS. The Go gateway accepts all incoming HTTP requests on port 8000 and forwards them to the appropriate backend service based on the URL path. It also handles authentication, rate limiting, and CORS before forwarding.

S

Skill Tiers

The three-level injection architecture for agent skills:

Tier 1: Compact manifest (~50 tokens) containing name, description, and trigger keywords. Always injected into the system prompt so the LLM knows what skills exist.
Tier 2: Full skill specification (~200-800 tokens) with detailed steps, requirements, and output format. Injected on-demand when the skill is selected by the skill_select_node.
Tier 3: Artifact content (reference tables, form field guides, regulatory text). Large content stored separately and injected only when the specific skill requires it.
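The three tiers above can be sketched as a function that decides what goes into a prompt. The Skill dataclass, the build_injection function, and the two sets that drive selection are hypothetical; in AEGIS the selection is made by the skill_select_node, whose logic is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    skill_id: str
    manifest: str                    # Tier 1: compact, always injected
    specification: str               # Tier 2: injected when the skill is selected
    artifacts: tuple[str, ...] = ()  # Tier 3: injected only when required


def build_injection(
    skills: list[Skill],
    selected_ids: set[str],
    needs_artifacts: set[str],
) -> list[str]:
    """Return prompt fragments in tier order for the given selection."""
    # Tier 1 manifests for every skill, so the LLM knows what exists.
    parts = [skill.manifest for skill in skills]
    for skill in skills:
        if skill.skill_id in selected_ids:
            parts.append(skill.specification)
            if skill.skill_id in needs_artifacts:
                parts.extend(skill.artifacts)
    return parts
```

An unselected skill contributes only its manifest, which is what keeps the baseline prompt small.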

SSE (Server-Sent Events)

A web standard for one-directional streaming from server to client over HTTP. The server sends events as text/event-stream responses, and the browser receives them via the EventSource API. AEGIS uses SSE for real-time agent execution streaming and workspace assessment updates. See Event System.

StateGraph

The LangGraph class used to define the agent execution pipeline. A StateGraph is parameterized by a TypedDict (the state schema) and contains nodes (functions that transform state) connected by edges (routing logic). The compiled graph is invoked via agent_graph.astream(state) to stream intermediate events, or via agent_graph.ainvoke(state) to run to completion and return the final state in a single awaited call. Both methods are asynchronous.

T

Tenant

A logical isolation boundary in AEGIS. Most database tables include a tenant_id column for multi-tenant data separation. The default tenant for local development is "default".

W

WorkspaceEventType

A StrEnum defining the structured event types emitted during workspace assessment: checklist_item_update, artifact_generated, data_table_update, form_field_update, validation_result, spatial_update, agent_status, assessment_complete. Each type maps to a Pydantic model in orchestration/compliance/workspace/events.py. See Event System.