Memory System
AEGIS uses a three-layer memory architecture to provide agents with relevant context during execution. The memory service (port 8002) manages all three layers.
Architecture
+------------------------------------------------------------------+
| Memory Service (:8002) |
+------------------------------------------------------------------+
| |
| +------------------+ +--------------------+ +----------------+ |
| | Working Memory | | Episodic Memory | | Injection | |
| | (Redis Hash) | | (pgvector) | | Ledger | |
| | | | | | (Redis Hash) | |
| | Key-value per | | Conversation | | | |
| | conversation | | summaries with | | Tracks what | |
| | 24h TTL | | vector embeddings | | has been | |
| | 64KB max | | Semantic search | | injected | |
| +--------+---------+ +---------+----------+ +--------+-------+ |
| | | | |
| v v v |
| Redis 7 PostgreSQL 15 Redis 7 |
+------------------------------------------------------------------+Layer 1: Working Memory
Working memory stores ephemeral, fast-access data for the current conversation session. It is implemented as a Redis Hash at the key working_memory:{conversation_id}.
Properties
| Property | Value |
|---|---|
| Storage | Redis Hash |
| Key pattern | working_memory:{conversation_id} |
| TTL | 24 hours (refreshed on every read or write) |
| Max size | 64KB per conversation |
| Serialization | JSON for complex values, raw strings for simple values |
Data Stored
| Field | Type | Description |
|---|---|---|
scratchpad | string | Free-form notes and intermediate results |
entities | list[dict] | Extracted entities (type, id, name) from the conversation |
well_api | string | Active well API number for context assembly |
entity_id | string | Active managed entity ID |
entity_type_key | string | Entity type key for managed entity context |
API Endpoints
| Method | Path | Description |
|---|---|---|
GET /working-memory/{conversation_id} | Get all working memory fields | Returns the full Hash as a dict |
PUT /working-memory/{conversation_id} | Set fields (merge, not replace) | Body: {data: {key: value}} |
DELETE /working-memory/{conversation_id} | Delete fields or entire memory | Body: {fields: ["key1"]} or no body for full delete |
Implementation
The WorkingMemory class (memory/working.py) wraps Redis Hash operations:
class WorkingMemory:
async def get(self, conversation_id: str) -> dict[str, Any]:
"""Get all fields. Refreshes TTL on access."""
async def set(self, conversation_id: str, data: dict[str, Any]) -> None:
"""Set one or more fields. Checks 64KB size limit."""
async def delete(self, conversation_id: str, fields: list[str] | None) -> None:
"""Delete specific fields or entire working memory."""
async def set_scratchpad(self, conversation_id: str, content: str) -> None:
async def get_scratchpad(self, conversation_id: str) -> str | None:
async def set_entities(self, conversation_id: str, entities: list[dict]) -> None:
async def get_entities(self, conversation_id: str) -> list[dict]:The 64KB size limit is enforced after every write operation by checking redis.memory_usage() on the Hash key. If the limit is exceeded, the write succeeds but raises a ValueError that returns HTTP 413 to the caller.
Layer 2: Episodic Memory
Episodic memory stores long-term conversation summaries with vector embeddings for semantic retrieval. It persists across sessions and enables agents to recall relevant past interactions.
Properties
| Property | Value |
|---|---|
| Storage | PostgreSQL table episodic_memories |
| Embedding model | OpenAI text-embedding-3-small (1536 dimensions) |
| Index | IVFFlat with cosine distance, 100 lists |
| Search | Cosine similarity via pgvector <=> operator |
| Filtered by | agent_id (required), user_id (optional) |
Database Schema
CREATE TABLE episodic_memories (
id UUID PRIMARY KEY,
agent_id VARCHAR(100) NOT NULL,
user_id VARCHAR(100),
conversation_id VARCHAR(100) NOT NULL,
summary TEXT NOT NULL,
key_decisions JSONB,
entities_mentioned JSONB,
tools_called JSONB,
embedding vector(1536),
created_at TIMESTAMPTZ DEFAULT NOW()
);
CREATE INDEX idx_episodic_embedding ON episodic_memories
USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);API Endpoints
| Method | Path | Description |
|---|---|---|
POST /episodic/store | Store a conversation summary | Generates embedding, inserts into PostgreSQL |
POST /episodic/search | Semantic search | Returns top-k results by cosine similarity |
GET /episodic/{conversation_id} | Get memories for a conversation | Returns all stored summaries |
Search Flow
- The query text is embedded using
text-embedding-3-small - pgvector performs approximate nearest neighbor search using the IVFFlat index
- Results are filtered by
agent_id(and optionallyuser_id) - Similarity score is computed as
1 - cosine_distance - Top-k results are returned ordered by similarity
# Semantic search query
SELECT id, summary, key_decisions, entities_mentioned,
1 - (embedding <=> $1::vector) AS similarity
FROM episodic_memories
WHERE agent_id = $2
ORDER BY embedding <=> $1::vector
LIMIT $3How Episodic Memory is Used
During agent execution, the memory_node calls search_episodic_memory() with the latest user message. The top 3 most similar past conversations are injected as context:
[Episodic Memory -- Relevant Past Conversations]
- (similarity: 0.87) Previously filed Rule 37 exception for Mitchell Ranch 2H...
- (similarity: 0.72) Discussed spacing requirements for Spraberry Trend wells...Layer 3: Injection Ledger
The injection ledger prevents duplicate context injection within a conversation. It tracks which skills, entities, and artifacts have already been injected so the same content is not added to the message history twice.
Properties
| Property | Value |
|---|---|
| Storage | Redis Hash |
| Key pattern | skill:ledger:{conversation_id} |
| TTL | None (persists until conversation ends or manual eviction) |
| Values | String markers (typically “injected” or “1”) |
How It Works
When the skill injection node processes a skill:
- Check:
HEXISTS skill:ledger:{conversation_id} skill:{skill_id}— if the key exists, skip injection - Inject: Add the skill’s Tier 2/3/3.5 content as system messages
- Mark:
HSET skill:ledger:{conversation_id} skill:{skill_id} "injected"— record that this skill was injected
This ensures that even if the LLM emits SKILL_SELECT:spacing-calculation multiple times across turns, the skill content is only injected once.
API Endpoints
| Method | Path | Description |
|---|---|---|
GET /ledger/{conversation_id} | Get all injected items | Returns the full ledger Hash |
POST /ledger/{conversation_id}/check | Check if an item was injected | Body: {item_key: "skill:spacing-calculation"} |
POST /ledger/{conversation_id}/mark | Mark an item as injected | Body: {item_key: "skill:spacing-calculation", value: "injected"} |
POST /ledger/{conversation_id}/evict | Remove an item from the ledger | Body: {item_key: "skill:spacing-calculation"} — allows re-injection |
Shared Redis Manager
Both the working memory and injection ledger use the shared RedisManager class from aegis_shared/db/redis.py:
class RedisManager:
# Key-value helpers
async def get(self, key: str) -> str | None
async def set(self, key: str, value: str, ex: int | None = None) -> None
async def delete(self, key: str) -> None
# Injection ledger helpers
def _ledger_key(self, conversation_id: str) -> str:
return f"skill:ledger:{conversation_id}"
async def ledger_has(self, conversation_id: str, item_key: str) -> bool
async def ledger_mark(self, conversation_id: str, item_key: str, value: str = "1") -> None
async def ledger_get_all(self, conversation_id: str) -> dict[str, str]Memory in the Pipeline
The memory_node in the LangGraph pipeline integrates all three layers:
memory_node
|
+-- GET /working-memory/{conversation_id} --> scratchpad, entities
|
+-- POST /episodic/search --> similar past conversations
|
+-- (injection ledger is used later by skill_inject_node)The retrieved context is formatted as a system message and inserted after the main system prompt, before the user’s first message. This gives the LLM awareness of both the current conversation state (working memory) and relevant past interactions (episodic memory) without the user needing to repeat context.