Docker Problems

AEGIS uses Docker Compose to run three infrastructure services: PostgreSQL (with Apache AGE and pgvector), Redis, and Kafka. This page covers common Docker-related issues and how to resolve them.

Docker Not Running

Symptoms

  • Cannot connect to the Docker daemon
  • docker compose up fails immediately
  • Error response from daemon: dial unix /var/run/docker.sock: connect: connection refused

Solution

Start Docker Desktop (macOS/Windows) or the Docker daemon (Linux):

    # macOS: open Docker Desktop
    open -a Docker

    # Linux
    sudo systemctl start docker

Verify Docker is running:

    docker info
    docker compose version
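The two verification commands can be wrapped in a small preflight function for use at the top of a startup script. This is a minimal sketch in POSIX shell; `check_docker` is a hypothetical helper name and is not part of AEGIS.

```shell
# check_docker: succeeds only if the Docker daemon answers and the
# Compose plugin is installed. Hypothetical helper, not part of AEGIS.
check_docker() {
    # "docker info" fails when the daemon is not running or not reachable
    if ! docker info >/dev/null 2>&1; then
        echo "Docker daemon is not reachable"
        return 1
    fi
    # "docker compose version" fails when the Compose plugin is missing
    if ! docker compose version >/dev/null 2>&1; then
        echo "Docker Compose plugin is not available"
        return 2
    fi
    echo "Docker is ready"
}
```

Calling `check_docker || exit 1` before `docker compose up -d` stops the script early with a readable message instead of a daemon connection error.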

Containers Won’t Start

Symptoms

  • docker compose up -d completes but containers are not running
  • Container status shows Restarting or Exit 1

Diagnosis

    # Check container status
    docker compose ps

    # Check logs for the failing container
    docker compose logs postgres
    docker compose logs redis
    docker compose logs kafka

Common Causes

Ports already in use:

    # Check if something else is using the port
    lsof -i :5432
    lsof -i :6379
    lsof -i :9092

See the Port Conflicts page for solutions.
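To check all three ports in one pass, the `lsof` calls can be folded into a loop. The sketch below assumes `lsof` is installed; `ports_in_use` is an illustrative name, not an AEGIS script.

```shell
# ports_in_use: print each AEGIS infrastructure port that already has
# a listening process. Illustrative helper; assumes lsof is installed.
ports_in_use() {
    for port in 5432 6379 9092; do
        # -t prints only PIDs; -sTCP:LISTEN limits matches to listening sockets.
        # lsof exits non-zero when nothing matches, so "|| true" keeps the loop going.
        pids=$(lsof -t -iTCP:"$port" -sTCP:LISTEN 2>/dev/null || true)
        if [ -n "$pids" ]; then
            # Unquoted $pids joins multiple PIDs onto one line
            echo "port $port in use by PID(s): $(echo $pids)"
        fi
    done
}
```

The function prints nothing when all three ports are free, so an empty result means port conflicts are not the cause.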

Previous container state:

If a container was stopped uncleanly, it may be left in a bad state. Removing and recreating the containers clears that state without deleting the data under docker-volumes/:

    # Remove containers and recreate
    docker compose down
    docker compose up -d

Insufficient resources:

Docker Desktop enforces memory and CPU limits. AEGIS needs at least 4 GB of memory allocated to Docker. Check the current allocation in Docker Desktop under Settings > Resources.

PostgreSQL Init Failures

Symptoms

  • aegis-postgres container exits during startup
  • Logs show FATAL: database "aegis" does not exist or init script errors
  • Tables or extensions are missing after startup

Diagnosis

docker compose logs postgres 2>&1 | tail -50

How Init Scripts Work

The PostgreSQL container mounts several SQL files into /docker-entrypoint-initdb.d/:

    volumes:
      - ./infrastructure/docker/postgres/init.sql:/docker-entrypoint-initdb.d/001_init.sql
      - ./infrastructure/docker/postgres/002_checklist_compliance_tables.sql:/docker-entrypoint-initdb.d/002_checklist_compliance_tables.sql
      - ./infrastructure/docker/postgres/007_entity_type_definitions.sql:/docker-entrypoint-initdb.d/007_entity_type_definitions.sql
      - ./infrastructure/docker/postgres/00-create-extension-age.sql:/docker-entrypoint-initdb.d/00-create-extension-age.sql

Init scripts only run on first container creation (when the data volume is empty). If you change an init script, you must delete the volume to re-run it.
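Before editing an init script, it can help to check whether the data volume already holds files, since that decides whether the scripts will run at all. The sketch below is a heuristic only: it assumes the PostgreSQL data directory is bind-mounted at docker-volumes/postgres (the path used by the reset commands on this page), and `init_scripts_will_run` is a hypothetical helper name.

```shell
# init_scripts_will_run [DIR]
# Returns 0 if DIR is missing or empty (init scripts should run on the
# next start), 1 if it already contains files (init scripts are skipped).
# Heuristic only; the default path is an assumption about the volume layout.
init_scripts_will_run() {
    dir=${1:-docker-volumes/postgres}
    if [ ! -d "$dir" ]; then
        return 0
    fi
    # ls -A lists hidden files too but omits "." and ".."
    [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}
```

For example, `init_scripts_will_run || echo "volume not empty: init scripts will be skipped"` gives an early warning before restarting the container.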

Solutions

Init scripts not running (volume already exists):

    docker compose down
    rm -rf docker-volumes/postgres
    docker compose up -d postgres

SQL syntax error in init script:

Check the postgres logs for the specific error:

docker compose logs postgres 2>&1 | grep -i error

Fix the SQL file and recreate the volume:

    docker compose down
    rm -rf docker-volumes/postgres
    docker compose up -d postgres

Apache AGE extension fails to load:

The AEGIS PostgreSQL image is based on apache/age:release_PG15_1.6.0 with pgvector compiled on top. The 00-create-extension-age.sql script creates the extension with IF NOT EXISTS to prevent crashes on fresh init.

If AGE is not loading, check that the custom Dockerfile built successfully:

    docker compose build postgres
    docker compose up -d postgres

Verifying the database is ready:

Connect to PostgreSQL from the host:

    psql -h localhost -U aegis -d aegis

Then run the following at the psql prompt:

    -- Check AGE extension
    LOAD 'age';
    SET search_path = ag_catalog, "$user", public;
    SELECT * FROM ag_catalog.ag_graph;

    -- Check pgvector extension
    SELECT * FROM pg_extension WHERE extname = 'vector';

    -- Check tables exist
    \dt

Redis Connectivity Issues

Symptoms

  • aegis-redis container is running but services cannot connect
  • redis-cli ping returns an error or times out

Diagnosis

    # Check container status and health
    docker compose ps redis

    # Container health check
    docker compose exec redis redis-cli ping
    # Expected: PONG

    # Check from host
    redis-cli -h localhost -p 6379 ping

Solutions

Container health check failing:

The Redis container has a health check configured:

    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5

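If the health check is failing, the container may still be starting. With a 5-second interval and 5 retries, Docker reports the container as unhealthy only after roughly 25 seconds of consecutive failures, so wait a few seconds and check again.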
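Rather than re-checking by hand, a script can poll the health status that Docker records for the container. This is a sketch, not an AEGIS script: `wait_healthy` is a hypothetical helper, and it relies on `docker inspect` with the `.State.Health.Status` field, which is only populated for containers that define a health check.

```shell
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Polls the Docker health status once per second until it reads "healthy"
# or the timeout (default 30s) expires. Hypothetical helper.
wait_healthy() {
    name=$1
    timeout=${2:-30}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Status is one of: starting, healthy, unhealthy
        status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null || true)
        if [ "$status" = "healthy" ]; then
            echo "$name is healthy after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$name not healthy after ${timeout}s (last status: ${status:-unknown})"
    return 1
}
```

For example, `wait_healthy aegis-redis 30` blocks until Redis passes its health check or 30 seconds elapse.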

Redis data corruption:

    docker compose stop redis
    rm -rf docker-volumes/redis
    docker compose up -d redis

Memory issues:

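With its default configuration, Redis can exhaust available memory if large amounts of working-memory or ledger data accumulate. This is rarely a problem in local development, but you can flush all data if needed. Note that FLUSHALL deletes every key in every Redis database: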

redis-cli FLUSHALL

Kafka Connectivity Issues

Symptoms

  • aegis-kafka container is Restarting in a loop
  • Services report KafkaError{code=_TRANSPORT} or NoBrokersAvailable
  • Topics cannot be created

Diagnosis

    # Check container status
    docker compose ps kafka

    # Check logs
    docker compose logs kafka 2>&1 | tail -30

Solutions

Cluster ID mismatch:

Kafka in AEGIS uses KRaft mode (no ZooKeeper) with a hardcoded cluster ID. If the data volume has stale metadata from a previous cluster, Kafka fails to start. Remove the volume and restart the container:

    docker compose stop kafka
    rm -rf docker-volumes/kafka
    docker compose up -d kafka

Broker not reachable from services:

The Kafka advertised listener is configured as localhost:9092:

KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092

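This works for services running on the host machine. If you run AEGIS services inside Docker containers, localhost resolves to each service's own container rather than to Kafka, so you need to change the advertised listener to an address that is reachable on the container network.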

Kafka takes too long to start:

Kafka KRaft mode takes a few seconds to elect a controller and become ready. Services that start before Kafka is ready may fail to connect. The start-all.sh script adds a sleep after infrastructure startup to handle this.
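A fixed sleep can be too short on a slow machine and wasteful on a fast one. One alternative is to poll the broker port until it accepts connections. The sketch below is not what start-all.sh does; `wait_for_port` is a hypothetical helper, and it assumes a netcat build that supports `nc -z`. An open port shows only that the listener is up, not that controller election has finished, so clients may still need to retry briefly.

```shell
# wait_for_port HOST PORT [TIMEOUT_SECONDS]
# Retries a TCP connection once per second until it succeeds or the
# timeout (default 60s) expires. Hypothetical helper; assumes "nc -z".
wait_for_port() {
    host=$1
    port=$2
    timeout=${3:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # -z: connect and close immediately without sending data
        if nc -z "$host" "$port" >/dev/null 2>&1; then
            echo "$host:$port reachable after ${elapsed}s"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$host:$port still unreachable after ${timeout}s"
    return 1
}
```

For example, `wait_for_port localhost 9092 60` can run between `docker compose up -d` and starting the AEGIS services.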

Rebuilding Everything from Scratch

When all else fails, a clean rebuild resolves most Docker issues:

    # Stop all services
    ./infrastructure/scripts/start-all.sh stop

    # Stop and remove all containers
    docker compose down

    # Remove all data volumes
    rm -rf docker-volumes/postgres docker-volumes/redis docker-volumes/kafka

    # Rebuild the custom PostgreSQL image
    docker compose build

    # Start fresh
    docker compose up -d

Verify everything is healthy:

docker compose ps

Expected output:

    NAME            IMAGE                        STATUS                  PORTS
    aegis-kafka     confluentinc/cp-kafka:7.6.0  Up X seconds            0.0.0.0:9092->9092/tcp
    aegis-postgres  aegis-postgres               Up X seconds (healthy)  0.0.0.0:5432->5432/tcp
    aegis-redis     redis:7-alpine               Up X seconds (healthy)  0.0.0.0:6379->6379/tcp

Warning: rebuilding from scratch deletes all local data, including knowledge graph entities, agent definitions, episodic memories, and any seeded data. You will need to re-seed after rebuilding.

Checking Container Logs

Quick reference for viewing logs:

    # All containers
    docker compose logs

    # Specific container
    docker compose logs postgres

    # Follow logs (stream in real time)
    docker compose logs -f postgres

    # Last 50 lines
    docker compose logs --tail 50 postgres

    # Logs with timestamps
    docker compose logs -t postgres

Docker Compose Reference

The full docker-compose.yml defines three services:

    Service   Image                           Data Volume              Health Check
    postgres  Custom (apache/age + pgvector)  docker-volumes/postgres  pg_isready -U aegis
    redis     redis:7-alpine                  docker-volumes/redis     redis-cli ping
    kafka     confluentinc/cp-kafka:7.6.0     docker-volumes/kafka     None (no built-in health check)

All services are attached to the aegis-network Docker network. The PostgreSQL container builds from a custom Dockerfile at infrastructure/docker/postgres/Dockerfile, which extends the Apache AGE image with pgvector support.

Docker Resource Requirements

Minimum recommended Docker Desktop settings for AEGIS:

    Resource  Minimum  Recommended
    Memory    4 GB     6 GB
    CPU       2 cores  4 cores
    Disk      5 GB     10 GB

PostgreSQL with AGE and pgvector is the most resource-intensive container. If you experience slow query performance or out-of-memory errors, increase Docker’s memory allocation.
