Seed Data
AEGIS provides multiple seed scripts that populate the database and knowledge graph with sample data for development and demos. This page documents each seed script, what it creates, and how to run it.
Seed Scripts Overview
| Script | Location | Populates | Run Method |
|---|---|---|---|
| Knowledge Graph Seed | services/knowledge-graph-service/src/knowledge_graph/seed.py | Apache AGE graph (wells, leases, operators, etc.) | curl -X POST http://localhost:8003/seed |
| Skill Registry Seed | services/orchestration-engine/src/orchestration/seed_skills.py | skills and skill_artifacts tables | poetry run python -m orchestration.seed_skills |
| Checklist Template Seed | services/orchestration-engine/src/orchestration/seed_checklists.py | checklist_templates table | poetry run python -m orchestration.seed_checklists |
| Rule Version Seed | services/orchestration-engine/src/orchestration/seed_rules.py | rule_versions table | poetry run python -m orchestration.seed_rules |
| Demo Data Seed | services/orchestration-engine/src/orchestration/seed_demo_data.py | compliance_status, filing_checklists, chart data | poetry run python -m orchestration.seed_demo_data |
| Conversation Seed | services/orchestration-engine/src/orchestration/seed_conversations.py | conversations and conversation_messages tables | poetry run python -m orchestration.seed_conversations |
Running All Seeds
After starting infrastructure with docker compose up -d and ensuring all services are running:
# 1. Seed the knowledge graph (requires KG service running on port 8003)
curl -X POST http://localhost:8003/seed
# 2. Seed skills, checklists, rules, and demo data
cd services/orchestration-engine
poetry run python -m orchestration.seed_skills
poetry run python -m orchestration.seed_checklists
poetry run python -m orchestration.seed_rules
poetry run python -m orchestration.seed_demo_data
All seed scripts are idempotent: they check for existing records before inserting and update the record if it already exists, so they are safe to re-run.
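The check-then-insert-or-update behavior can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module with an in-memory database rather than the project's own database, and the table layout (`skill_id`, `name`) and the `seed_skill` helper are hypothetical names chosen for the example.

```python
import sqlite3


def seed_skill(conn: sqlite3.Connection, skill_id: str, name: str) -> str:
    """Insert the record if it is missing, otherwise update it in place."""
    row = conn.execute(
        "SELECT 1 FROM skills WHERE skill_id = ?", (skill_id,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO skills (skill_id, name) VALUES (?, ?)", (skill_id, name)
        )
        action = "inserted"
    else:
        conn.execute(
            "UPDATE skills SET name = ? WHERE skill_id = ?", (name, skill_id)
        )
        action = "updated"
    conn.commit()
    return action


# Hypothetical, simplified table used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skills (skill_id TEXT PRIMARY KEY, name TEXT)")

# Running the same seed twice leaves exactly one row behind.
first = seed_skill(conn, "spacing-calculation", "Spacing Calculation")
second = seed_skill(conn, "spacing-calculation", "Spacing Calculation")
row_count = conn.execute("SELECT COUNT(*) FROM skills").fetchone()[0]
```

The second call takes the update path instead of inserting a duplicate, which is the property that makes a re-run safe.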
Knowledge Graph Seed
Endpoint: POST http://localhost:8003/seed
File: services/knowledge-graph-service/src/knowledge_graph/seed.py
Creates a realistic Permian Basin scenario in the Apache AGE oilgas graph. This is the most comprehensive seed script, building the full entity topology.
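For scripted setups, the same POST can be issued from Python instead of curl. The sketch below uses only the standard library; the helper names and the 60-second timeout are assumptions, and because the response body format is not documented here, the function returns only the HTTP status code.

```python
import urllib.request

KG_SEED_URL = "http://localhost:8003/seed"  # endpoint documented above


def build_seed_request(url: str = KG_SEED_URL) -> urllib.request.Request:
    """Build a body-less POST request for the seed endpoint."""
    return urllib.request.Request(url, method="POST")


def trigger_seed(url: str = KG_SEED_URL, timeout: float = 60.0) -> int:
    """Send the request and return the HTTP status code.

    Raises urllib.error.URLError if the service is not reachable.
    """
    with urllib.request.urlopen(build_seed_request(url), timeout=timeout) as resp:
        return resp.status


# Building the request does not contact the service; trigger_seed() does.
request = build_seed_request()
```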
Entities Created
| Entity Type | Count | Examples |
|---|---|---|
| Operators | 2 | Permian Basin Energy LLC, Basin Midstream Partners |
| Fields | 3 | Spraberry (Trend Area), Delaware Basin, Goldsmith |
| Formations | 2 | Wolfcamp A, Bone Spring |
| Leases | 5 | Mitchell Ranch, Jones Unit, Davis Ranch, Howard Unit, Delaware Basin Unit |
| Regulations | 2 | Rule 37 (Spacing), Rule 32 (Flaring) |
| Permits | 1 | W-1 permit for Mitchell Ranch 1H |
| Flaring Authorizations | 2 | Mitchell Ranch (expiring), Howard Unit (expiring ~30d) |
| Wells | 12 | Across 4 wellpads (Pad A-D) with full Form PR data |
| Wellpads | 4 | Pad A (Spraberry, 4 wells), Pad B (Spraberry, 3 wells), Pad C (Delaware, 3 wells), Pad D (Goldsmith, 2 wells) |
| Facilities | 4 | Tank battery, compressor, separator, CPF |
| Pipeline Routes | 3 | Including cross-field connector |
| Infrastructure Projects | 2 | Gathering system, processing plant |
Relationships Created
The seed data connects entities with edges including:
- OPERATED_BY: wells, leases, facilities to operators
- LOCATED_IN: wells to leases and fields
- LOCATED_ON: wells to wellpads
- COMPLETED_IN: wells to formations
- GOVERNED_BY: wells to regulations
- OFFSET_TO: bidirectional between nearby wells
- FLARES_AT: authorizations to wells/leases
- PRODUCES_TO: wells to facilities
- FEEDS_INTO: facilities to pipeline routes
- CONNECTS_TO: infrastructure projects to leases
Demo Scenarios
The seed data supports three demo scenarios:
- Compliance blast radius: Mitchell Ranch 1H -> Pad A -> Tank Battery -> Gathering Line
- Infrastructure failure: Delaware Basin Compressor -> 3 wells, third-party operator
- Planning surface: Howard R-32 expiring -> Howard Connector -> Spraberry system
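The first scenario is essentially a downstream traversal from a single well. As a rough sketch of that idea, the snippet below walks a hand-written adjacency map with a breadth-first search. The map is a stand-in for the seeded graph, not a query against Apache AGE; it covers only the chain named in the first scenario, and edge types are omitted because the scenario description does not state them.

```python
from collections import deque

# Hand-written stand-in for the seeded graph, covering only the first scenario.
DOWNSTREAM = {
    "Mitchell Ranch 1H": ["Pad A"],
    "Pad A": ["Tank Battery"],
    "Tank Battery": ["Gathering Line"],
    "Gathering Line": [],
}


def blast_radius(start: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append(neighbor)
    return order


affected = blast_radius("Mitchell Ranch 1H", DOWNSTREAM)
```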
Well Production Data
Every well includes full Form PR production data with 12 months of history:
{
  "production_oil_bbls": 5100.0,
  "production_gas_mcf": 10800.0,
  "production_casinghead_mcf": 420.0,
  "production_condensate_bbls": 85.0,
  "production_water_bbls": 2400.0,
  "gas_sold_mcf": 9200.0,
  "gas_used_on_lease_mcf": 380.0,
  "gas_flared_mcf": 1100.0,
  "gas_vented_mcf": 120.0,
  "producing_days": 30,
  "allocation_method": "test",
  "form_pr_period": "2026-02",
  "production_history_12m": [
    {"period": "2026-02", "oil_bbls": 5100, "gas_mcf": 10800, ...},
    {"period": "2026-01", "oil_bbls": 5300, "gas_mcf": 11200, ...},
    ...
  ]
}
Skill Registry Seed
Command: poetry run python -m orchestration.seed_skills (from services/orchestration-engine/)
File: services/orchestration-engine/src/orchestration/seed_skills.py
Populates the skills table with the three-tier skill definitions for Rule 37 and Rule 32 agents, plus checklist-item-level skills.
Skills Seeded
Rule 37 Skills:
| Skill ID | Name | Domain Tags |
|---|---|---|
| spacing-calculation | Spacing Calculation | spacing, rule_37, drilling |
| offset-well-analysis | Offset Well Analysis | spacing, rule_37 |
| rule37-filing-assembly | Rule 37 Filing Assembly | spacing, rule_37, drilling |
| good-cause-narrative | Good Cause Narrative | spacing, rule_37 |
Rule 32 Skills:
| Skill ID | Name | Domain Tags |
|---|---|---|
| flaring-volume-calc | Flaring Volume Calculation | flaring, rule_32, compliance |
| gas-analysis | Gas Composition Analysis | flaring, rule_32 |
| rule32-filing-assembly | Rule 32 Filing Assembly | flaring, rule_32 |
| emissions-estimate | Emissions Estimate | flaring, rule_32, compliance |
Checklist-Item Skills (mapped to specific checklist items):
The seed also creates skills that correspond 1:1 to checklist items:
- Rule 37: 11 skills (r37-exception-type, r37-field-rules, r37-offset-identification, etc.)
- Rule 32: 10 skills
- Form PR: 8 skills
- Flaring Monitor: 6 skills
Each checklist-item skill specifies which SSE events it emits (e.g., DATA_TABLE_UPDATE, FORM_FIELD_UPDATE, ARTIFACT_GENERATED, CHECKLIST_ITEM_UPDATE).
Three-Tier Architecture
Each skill has three injection tiers:
- Tier 1 manifest (~50 tokens): name, description, triggers, domain_tags
- Tier 2 definition (~200-800 tokens): full spec with steps, requirements, output format
- Tier 3 artifact refs: references to large content artifacts in skill_artifacts
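One way to picture the three tiers is as a record whose fields are exposed progressively. The sketch below is an assumption-laden illustration, not the actual schema: the class names, the `context_for` helper, and the idea that each tier includes the tiers below it are all hypothetical, as are the example description and trigger. Only the manifest field names and the skill ID and tags come from this page.

```python
from dataclasses import dataclass, field


@dataclass
class SkillManifest:
    """Tier 1: the small manifest (name, description, triggers, domain_tags)."""
    name: str
    description: str
    triggers: list[str]
    domain_tags: list[str]


@dataclass
class Skill:
    skill_id: str
    manifest: SkillManifest
    definition: str = ""  # Tier 2: full spec text
    artifact_refs: list[str] = field(default_factory=list)  # Tier 3: artifact references

    def context_for(self, tier: int) -> dict:
        """Return the content for the requested tier and every tier below it."""
        if tier not in (1, 2, 3):
            raise ValueError("tier must be 1, 2, or 3")
        context = {"manifest": self.manifest}
        if tier >= 2:
            context["definition"] = self.definition
        if tier >= 3:
            context["artifact_refs"] = self.artifact_refs
        return context


skill = Skill(
    skill_id="spacing-calculation",
    manifest=SkillManifest(
        name="Spacing Calculation",
        description="Hypothetical description for illustration",
        triggers=["hypothetical trigger"],
        domain_tags=["spacing", "rule_37", "drilling"],
    ),
)
```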
The script also seeds corresponding records in the agents table for each of the four agents (rule37, rule32, compliance-monitor, flaring-monitor), along with their configurations.
Checklist Template Seed
Command: poetry run python -m orchestration.seed_checklists (from services/orchestration-engine/)
File: services/orchestration-engine/src/orchestration/seed_checklists.py
Creates checklist templates for all four compliance domains.
Templates Seeded
| Domain | Items | Description |
|---|---|---|
| rule_37 | 11 items | Rule 37 spacing exception filing checklist |
| rule_32 | 10 items | Rule 32 flaring exception filing checklist |
| form_pr | 8 items | Monthly Form PR production report checklist |
| flaring_monitor | 6 items | Flaring compliance monitoring checklist |
Checklist Item Structure
Each item defines:
- index: Order within the checklist
- name: Human-readable item name
- item_type: data, document, form, artifacts, or validation
- completion_method: auto (agent-only) or hybrid (agent + human)
- agent_can: What the agent is capable of doing for this item
- user_must: What the human user is responsible for
- required_for_submission: Whether this item blocks filing submission
Example Rule 37 checklist items:
| # | Name | Type | Method | Required |
|---|---|---|---|---|
| 0 | Exception Type Determination | data | hybrid | Yes |
| 1 | Field Rule Lookup | data | hybrid | Yes |
| 2 | Offset Well Identification | data | hybrid | Yes |
| 3 | Affected Party Service List | data | hybrid | Yes |
| 4 | Waiver Collection Status | artifacts | hybrid | No |
| 5 | Form W-1 Population | form | hybrid | Yes |
| 6 | Certified Plat | artifacts | hybrid | No |
| 7 | Good-Cause Statement | document | hybrid | Yes |
| 8 | Supporting Technical Exhibits | document | hybrid | No |
| 9 | Fee Calculation | data | auto | Yes |
| 10 | Filing Readiness Check | validation | hybrid | Yes |
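To show how required_for_submission might gate a filing, the sketch below models the Rule 37 items from the table above and lists the required items that are still open. The `ChecklistItem` class, the `blocking_items` helper, and tracking completion as a set of indexes are assumptions made for this example; the agent_can and user_must fields are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    index: int
    name: str
    item_type: str          # data, document, form, artifacts, or validation
    completion_method: str  # auto or hybrid
    required_for_submission: bool


# Values copied from the example Rule 37 table above.
RULE_37_ITEMS = [
    ChecklistItem(0, "Exception Type Determination", "data", "hybrid", True),
    ChecklistItem(1, "Field Rule Lookup", "data", "hybrid", True),
    ChecklistItem(2, "Offset Well Identification", "data", "hybrid", True),
    ChecklistItem(3, "Affected Party Service List", "data", "hybrid", True),
    ChecklistItem(4, "Waiver Collection Status", "artifacts", "hybrid", False),
    ChecklistItem(5, "Form W-1 Population", "form", "hybrid", True),
    ChecklistItem(6, "Certified Plat", "artifacts", "hybrid", False),
    ChecklistItem(7, "Good-Cause Statement", "document", "hybrid", True),
    ChecklistItem(8, "Supporting Technical Exhibits", "document", "hybrid", False),
    ChecklistItem(9, "Fee Calculation", "data", "auto", True),
    ChecklistItem(10, "Filing Readiness Check", "validation", "hybrid", True),
]


def blocking_items(items: list[ChecklistItem], completed: set[int]) -> list[str]:
    """Names of required items that are not yet complete."""
    return [
        item.name
        for item in items
        if item.required_for_submission and item.index not in completed
    ]


# Hypothetical progress: the first four items are done.
still_blocking = blocking_items(RULE_37_ITEMS, completed={0, 1, 2, 3})
```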
Rule Version Seed
Command: poetry run python -m orchestration.seed_rules (from services/orchestration-engine/)
File: services/orchestration-engine/src/orchestration/seed_rules.py
Populates the rule_versions table with statewide and field-specific rules.
Rules Seeded
| Identifier | Type | Domain | Description |
|---|---|---|---|
| SWR_37 | statewide | spacing | Statewide Rule 37: Spacing (467 ft lease line, 1,200 ft well-to-well, 40-acre default) |
| SWR_32 | statewide | flaring | Statewide Rule 32: Gas Flaring and Venting (180-day max, R-32 authorization) |
| SWR_38 | statewide | density | Statewide Rule 38: Well Density (40-acre oil, 640-acre gas) |
| form_pr_deadline | statewide | reporting | Form PR Monthly Production Reporting (due 15th of each month) |
| Spraberry_Trend_Area_spacing | field_specific | spacing | Spraberry field rules: 1,320 ft between wells (vs 1,200 statewide) |
| Spraberry_Trend_Area_density | field_specific | density | Spraberry field rules: 40-acre proration, 50/50 allocation |
Each rule version includes structured rule_data as JSONB with the actual regulatory parameters (distances, thresholds, deadlines, etc.).
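A consumer of these records would typically prefer a field-specific rule and fall back to the statewide one. The sketch below illustrates that lookup for between-well spacing. It is a guess at the shape, not the real schema: the `rule_type` and `between_wells_ft` keys and the `between_wells_spacing_ft` helper are hypothetical, and the precedence of field rules over statewide rules is assumed. Only the identifiers and the 1,320 ft and 1,200 ft distances come from the table above.

```python
from typing import Optional

# Hypothetical rule_data payloads; only the identifiers and distances are from this page.
RULE_VERSIONS = {
    "SWR_37": {
        "rule_type": "statewide",
        "rule_data": {"between_wells_ft": 1200},
    },
    "Spraberry_Trend_Area_spacing": {
        "rule_type": "field_specific",
        "rule_data": {"between_wells_ft": 1320},
    },
}


def between_wells_spacing_ft(field_key: Optional[str] = None) -> int:
    """Prefer the field-specific spacing rule, falling back to SWR_37."""
    field_rule = RULE_VERSIONS.get(f"{field_key}_spacing") if field_key else None
    rule = field_rule or RULE_VERSIONS["SWR_37"]
    return rule["rule_data"]["between_wells_ft"]


spraberry_ft = between_wells_spacing_ft("Spraberry_Trend_Area")
default_ft = between_wells_spacing_ft("Goldsmith")  # no field rule seeded for this key
```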
Demo Data Seed
Command: poetry run python -m orchestration.seed_demo_data (from services/orchestration-engine/)
File: services/orchestration-engine/src/orchestration/seed_demo_data.py
Seeds realistic compliance data for a 15-minute demo walkthrough.
What Gets Seeded
- 25 wells with Permian Basin API numbers and compliance status distribution: 60% green (compliant), 25% amber (action needed), 15% red (overdue).
- Compliance status records in the compliance_status table for 4 domains per well (Rule 37, Rule 32, Form PR, Flaring).
- Pre-built Rule 37 checklist at ~40% completion for Mitchell Ranch 1H.
- Flare events with burn rate data for the flaring dashboard.
- Chart-ready historical data for compliance trends and deadline distributions.
Compliance Distribution
The demo data creates a realistic distribution of compliance statuses:
| Status | Target % | Description |
|---|---|---|
| Compliant | 60% | No action needed |
| Action Needed | 25% | Upcoming deadlines or warnings |
| Overdue | 15% | Missed deadlines or violations |
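With 25 wells, the target percentages do not divide evenly (25% is 6.25 wells and 15% is 3.75), so the script has to round somehow. How it does so is not stated here; the sketch below shows one plausible approach, the largest-remainder method, using integer arithmetic. The `allocate` helper is hypothetical.

```python
TARGETS = {"green": 60, "amber": 25, "red": 15}  # target percentages from the table above


def allocate(total: int, targets: dict[str, int]) -> dict[str, int]:
    """Split total into whole counts using the largest-remainder method."""
    counts = {status: total * pct // 100 for status, pct in targets.items()}
    remainders = {status: total * pct % 100 for status, pct in targets.items()}
    leftover = total - sum(counts.values())
    # Hand the leftover wells to the statuses with the largest remainders.
    for status in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[status] += 1
    return counts


counts = allocate(25, TARGETS)
```

Under this assumption the 25 wells split into 15 green, 6 amber, and 4 red, which is as close to the targets as whole numbers allow.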
The demo data seed should be run after the knowledge graph seed and the checklist template seed, as it references entities and templates created by those scripts.
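The ordering constraint can be made explicit with a small dependency map. The sketch below uses the standard library's graphlib (Python 3.9+) to derive a valid run order. The key names are labels chosen for this example, and only the two dependencies stated above are encoded; any ordering requirements among the other seeds are not documented here and are left out.

```python
from graphlib import TopologicalSorter

# Each seed maps to the seeds that must run before it.
DEPENDS_ON = {
    "knowledge_graph": set(),
    "seed_checklists": set(),
    "seed_demo_data": {"knowledge_graph", "seed_checklists"},
}

# static_order() yields each seed only after all of its prerequisites.
run_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```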