Skip to Content

Memory System

AEGIS uses a three-layer memory architecture to provide agents with relevant context during execution. The memory service (port 8002) manages all three layers.

Architecture

+------------------------------------------------------------------+ | Memory Service (:8002) | +------------------------------------------------------------------+ | | | +------------------+ +--------------------+ +----------------+ | | | Working Memory | | Episodic Memory | | Injection | | | | (Redis Hash) | | (pgvector) | | Ledger | | | | | | | | (Redis Hash) | | | | Key-value per | | Conversation | | | | | | conversation | | summaries with | | Tracks what | | | | 24h TTL | | vector embeddings | | has been | | | | 64KB max | | Semantic search | | injected | | | +--------+---------+ +---------+----------+ +--------+-------+ | | | | | | | v v v | | Redis 7 PostgreSQL 15 Redis 7 | +------------------------------------------------------------------+

Layer 1: Working Memory

Working memory stores ephemeral, fast-access data for the current conversation session. It is implemented as a Redis Hash at the key working_memory:{conversation_id}.

Properties

PropertyValue
StorageRedis Hash
Key patternworking_memory:{conversation_id}
TTL24 hours (refreshed on every read or write)
Max size64KB per conversation
SerializationJSON for complex values, raw strings for simple values

Data Stored

FieldTypeDescription
scratchpadstringFree-form notes and intermediate results
entitieslist[dict]Extracted entities (type, id, name) from the conversation
well_apistringActive well API number for context assembly
entity_idstringActive managed entity ID
entity_type_keystringEntity type key for managed entity context

API Endpoints

MethodPathDescription
GET /working-memory/{conversation_id}Get all working memory fieldsReturns the full Hash as a dict
PUT /working-memory/{conversation_id}Set fields (merge, not replace)Body: {data: {key: value}}
DELETE /working-memory/{conversation_id}Delete fields or entire memoryBody: {fields: ["key1"]} or no body for full delete

Implementation

The WorkingMemory class (memory/working.py) wraps Redis Hash operations:

class WorkingMemory: async def get(self, conversation_id: str) -> dict[str, Any]: """Get all fields. Refreshes TTL on access.""" async def set(self, conversation_id: str, data: dict[str, Any]) -> None: """Set one or more fields. Checks 64KB size limit.""" async def delete(self, conversation_id: str, fields: list[str] | None) -> None: """Delete specific fields or entire working memory.""" async def set_scratchpad(self, conversation_id: str, content: str) -> None: async def get_scratchpad(self, conversation_id: str) -> str | None: async def set_entities(self, conversation_id: str, entities: list[dict]) -> None: async def get_entities(self, conversation_id: str) -> list[dict]:

The 64KB size limit is enforced after every write operation by checking redis.memory_usage() on the Hash key. If the limit is exceeded, the write succeeds but raises a ValueError that returns HTTP 413 to the caller.

Layer 2: Episodic Memory

Episodic memory stores long-term conversation summaries with vector embeddings for semantic retrieval. It persists across sessions and enables agents to recall relevant past interactions.

Properties

PropertyValue
StoragePostgreSQL table episodic_memories
Embedding modelOpenAI text-embedding-3-small (1536 dimensions)
IndexIVFFlat with cosine distance, 100 lists
SearchCosine similarity via pgvector <=> operator
Filtered byagent_id (required), user_id (optional)

Database Schema

CREATE TABLE episodic_memories ( id UUID PRIMARY KEY, agent_id VARCHAR(100) NOT NULL, user_id VARCHAR(100), conversation_id VARCHAR(100) NOT NULL, summary TEXT NOT NULL, key_decisions JSONB, entities_mentioned JSONB, tools_called JSONB, embedding vector(1536), created_at TIMESTAMPTZ DEFAULT NOW() ); CREATE INDEX idx_episodic_embedding ON episodic_memories USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

API Endpoints

MethodPathDescription
POST /episodic/storeStore a conversation summaryGenerates embedding, inserts into PostgreSQL
POST /episodic/searchSemantic searchReturns top-k results by cosine similarity
GET /episodic/{conversation_id}Get memories for a conversationReturns all stored summaries

Search Flow

  1. The query text is embedded using text-embedding-3-small
  2. pgvector performs approximate nearest neighbor search using the IVFFlat index
  3. Results are filtered by agent_id (and optionally user_id)
  4. Similarity score is computed as 1 - cosine_distance
  5. Top-k results are returned ordered by similarity
# Semantic search query SELECT id, summary, key_decisions, entities_mentioned, 1 - (embedding <=> $1::vector) AS similarity FROM episodic_memories WHERE agent_id = $2 ORDER BY embedding <=> $1::vector LIMIT $3

How Episodic Memory is Used

During agent execution, the memory_node calls search_episodic_memory() with the latest user message. The top 3 most similar past conversations are injected as context:

[Episodic Memory -- Relevant Past Conversations] - (similarity: 0.87) Previously filed Rule 37 exception for Mitchell Ranch 2H... - (similarity: 0.72) Discussed spacing requirements for Spraberry Trend wells...

Layer 3: Injection Ledger

The injection ledger prevents duplicate context injection within a conversation. It tracks which skills, entities, and artifacts have already been injected so the same content is not added to the message history twice.

Properties

PropertyValue
StorageRedis Hash
Key patternskill:ledger:{conversation_id}
TTLNone (persists until conversation ends or manual eviction)
ValuesString markers (typically “injected” or “1”)

How It Works

When the skill injection node processes a skill:

  1. Check: HEXISTS skill:ledger:{conversation_id} skill:{skill_id} — if the key exists, skip injection
  2. Inject: Add the skill’s Tier 2/3/3.5 content as system messages
  3. Mark: HSET skill:ledger:{conversation_id} skill:{skill_id} "injected" — record that this skill was injected

This ensures that even if the LLM emits SKILL_SELECT:spacing-calculation multiple times across turns, the skill content is only injected once.

API Endpoints

MethodPathDescription
GET /ledger/{conversation_id}Get all injected itemsReturns the full ledger Hash
POST /ledger/{conversation_id}/checkCheck if an item was injectedBody: {item_key: "skill:spacing-calculation"}
POST /ledger/{conversation_id}/markMark an item as injectedBody: {item_key: "skill:spacing-calculation", value: "injected"}
POST /ledger/{conversation_id}/evictRemove an item from the ledgerBody: {item_key: "skill:spacing-calculation"} — allows re-injection

Shared Redis Manager

Both the working memory and injection ledger use the shared RedisManager class from aegis_shared/db/redis.py:

class RedisManager: # Key-value helpers async def get(self, key: str) -> str | None async def set(self, key: str, value: str, ex: int | None = None) -> None async def delete(self, key: str) -> None # Injection ledger helpers def _ledger_key(self, conversation_id: str) -> str: return f"skill:ledger:{conversation_id}" async def ledger_has(self, conversation_id: str, item_key: str) -> bool async def ledger_mark(self, conversation_id: str, item_key: str, value: str = "1") -> None async def ledger_get_all(self, conversation_id: str) -> dict[str, str]

Memory in the Pipeline

The memory_node in the LangGraph pipeline integrates all three layers:

memory_node | +-- GET /working-memory/{conversation_id} --> scratchpad, entities | +-- POST /episodic/search --> similar past conversations | +-- (injection ledger is used later by skill_inject_node)

The retrieved context is formatted as a system message and inserted after the main system prompt, before the user’s first message. This gives the LLM awareness of both the current conversation state (working memory) and relevant past interactions (episodic memory) without the user needing to repeat context.

Last updated on