Memory System

AEGIS uses a three-layer memory architecture to provide agents with relevant context during execution. The memory service (port 8002) manages all three layers.

Architecture


+------------------------------------------------------------------+
|                       Memory Service (:8002)                      |
+------------------------------------------------------------------+
|                                                                    |
|  +------------------+  +--------------------+  +----------------+ |
|  | Working Memory   |  | Episodic Memory    |  | Injection      | |
|  | (Redis Hash)     |  | (pgvector)         |  | Ledger         | |
|  |                  |  |                    |  | (Redis Hash)   | |
|  | Key-value per    |  | Conversation       |  |                | |
|  | conversation     |  | summaries with     |  | Tracks what    | |
|  | 24h TTL          |  | vector embeddings  |  | has been       | |
|  | 64KB max         |  | Semantic search    |  | injected       | |
|  +--------+---------+  +---------+----------+  +--------+-------+ |
|           |                      |                       |         |
|           v                      v                       v         |
|        Redis 7              PostgreSQL 15             Redis 7      |
+------------------------------------------------------------------+

Layer 1: Working Memory

Working memory stores ephemeral, fast-access data for the current conversation session. It is implemented as a Redis Hash at the key working_memory:{conversation_id}.

Properties

Property	Value
Storage	Redis Hash
Key pattern	`working_memory:{conversation_id}`
TTL	24 hours (refreshed on every read or write)
Max size	64KB per conversation
Serialization	JSON for complex values, raw strings for simple values

Data Stored

Field	Type	Description
`scratchpad`	string	Free-form notes and intermediate results
`entities`	list[dict]	Extracted entities (type, id, name) from the conversation
`well_api`	string	Active well API number for context assembly
`entity_id`	string	Active managed entity ID
`entity_type_key`	string	Entity type key for managed entity context

API Endpoints

Method	Path	Description
`GET /working-memory/{conversation_id}`	Get all working memory fields	Returns the full Hash as a dict
`PUT /working-memory/{conversation_id}`	Set fields (merge, not replace)	Body: `{data: {key: value}}`
`DELETE /working-memory/{conversation_id}`	Delete fields or entire memory	Body: `{fields: ["key1"]}` or no body for full delete

Implementation

The WorkingMemory class (memory/working.py) wraps Redis Hash operations:


class WorkingMemory:
    async def get(self, conversation_id: str) -> dict[str, Any]:
        """Get all fields. Refreshes TTL on access."""
 
    async def set(self, conversation_id: str, data: dict[str, Any]) -> None:
        """Set one or more fields. Checks 64KB size limit."""
 
    async def delete(self, conversation_id: str, fields: list[str] | None) -> None:
        """Delete specific fields or entire working memory."""
 
    async def set_scratchpad(self, conversation_id: str, content: str) -> None:
    async def get_scratchpad(self, conversation_id: str) -> str | None:
    async def set_entities(self, conversation_id: str, entities: list[dict]) -> None:
    async def get_entities(self, conversation_id: str) -> list[dict]:

The 64KB size limit is enforced after every write operation by checking redis.memory_usage() on the Hash key. If the limit is exceeded, the write succeeds but raises a ValueError that returns HTTP 413 to the caller.

Layer 2: Episodic Memory

Episodic memory stores long-term conversation summaries with vector embeddings for semantic retrieval. It persists across sessions and enables agents to recall relevant past interactions.

Properties

Property	Value
Storage	PostgreSQL table `episodic_memories`
Embedding model	OpenAI `text-embedding-3-small` (1536 dimensions)
Index	IVFFlat with cosine distance, 100 lists
Search	Cosine similarity via pgvector `<=>` operator
Filtered by	`agent_id` (required), `user_id` (optional)

Database Schema


CREATE TABLE episodic_memories (
    id UUID PRIMARY KEY,
    agent_id VARCHAR(100) NOT NULL,
    user_id VARCHAR(100),
    conversation_id VARCHAR(100) NOT NULL,
    summary TEXT NOT NULL,
    key_decisions JSONB,
    entities_mentioned JSONB,
    tools_called JSONB,
    embedding vector(1536),
    created_at TIMESTAMPTZ DEFAULT NOW()
);
 
CREATE INDEX idx_episodic_embedding ON episodic_memories
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

API Endpoints

Method	Path	Description
`POST /episodic/store`	Store a conversation summary	Generates embedding, inserts into PostgreSQL
`POST /episodic/search`	Semantic search	Returns top-k results by cosine similarity
`GET /episodic/{conversation_id}`	Get memories for a conversation	Returns all stored summaries

Search Flow

The query text is embedded using text-embedding-3-small
pgvector performs approximate nearest neighbor search using the IVFFlat index
Results are filtered by agent_id (and optionally user_id)
Similarity score is computed as 1 - cosine_distance
Top-k results are returned ordered by similarity


# Semantic search query
SELECT id, summary, key_decisions, entities_mentioned,
       1 - (embedding <=> $1::vector) AS similarity
FROM episodic_memories
WHERE agent_id = $2
ORDER BY embedding <=> $1::vector
LIMIT $3

How Episodic Memory is Used

During agent execution, the memory_node calls search_episodic_memory() with the latest user message. The top 3 most similar past conversations are injected as context:


[Episodic Memory -- Relevant Past Conversations]
  - (similarity: 0.87) Previously filed Rule 37 exception for Mitchell Ranch 2H...
  - (similarity: 0.72) Discussed spacing requirements for Spraberry Trend wells...

Layer 3: Injection Ledger

The injection ledger prevents duplicate context injection within a conversation. It tracks which skills, entities, and artifacts have already been injected so the same content is not added to the message history twice.

Properties

Property	Value
Storage	Redis Hash
Key pattern	`skill:ledger:{conversation_id}`
TTL	None (persists until conversation ends or manual eviction)
Values	String markers (typically “injected” or “1”)

How It Works

When the skill injection node processes a skill:

Check: HEXISTS skill:ledger:{conversation_id} skill:{skill_id} — if the key exists, skip injection
Inject: Add the skill’s Tier 2/3/3.5 content as system messages
Mark: HSET skill:ledger:{conversation_id} skill:{skill_id} "injected" — record that this skill was injected

This ensures that even if the LLM emits SKILL_SELECT:spacing-calculation multiple times across turns, the skill content is only injected once.

API Endpoints

Method	Path	Description
`GET /ledger/{conversation_id}`	Get all injected items	Returns the full ledger Hash
`POST /ledger/{conversation_id}/check`	Check if an item was injected	Body: `{item_key: "skill:spacing-calculation"}`
`POST /ledger/{conversation_id}/mark`	Mark an item as injected	Body: `{item_key: "skill:spacing-calculation", value: "injected"}`
`POST /ledger/{conversation_id}/evict`	Remove an item from the ledger	Body: `{item_key: "skill:spacing-calculation"}` — allows re-injection

Shared Redis Manager

Both the working memory and injection ledger use the shared RedisManager class from aegis_shared/db/redis.py:


class RedisManager:
    # Key-value helpers
    async def get(self, key: str) -> str | None
    async def set(self, key: str, value: str, ex: int | None = None) -> None
    async def delete(self, key: str) -> None
 
    # Injection ledger helpers
    def _ledger_key(self, conversation_id: str) -> str:
        return f"skill:ledger:{conversation_id}"
 
    async def ledger_has(self, conversation_id: str, item_key: str) -> bool
    async def ledger_mark(self, conversation_id: str, item_key: str, value: str = "1") -> None
    async def ledger_get_all(self, conversation_id: str) -> dict[str, str]

Memory in the Pipeline

The memory_node in the LangGraph pipeline integrates all three layers:


memory_node
  |
  +-- GET /working-memory/{conversation_id}     --> scratchpad, entities
  |
  +-- POST /episodic/search                     --> similar past conversations
  |
  +-- (injection ledger is used later by skill_inject_node)

The retrieved context is formatted as a system message and inserted after the main system prompt, before the user’s first message. This gives the LLM awareness of both the current conversation state (working memory) and relevant past interactions (episodic memory) without the user needing to repeat context.