
Seed Data

AEGIS provides multiple seed scripts that populate the database and knowledge graph with sample data for development and demos. This page documents each seed script, what it creates, and how to run it.

Seed Scripts Overview

| Script | Location | Populates | Run Method |
|---|---|---|---|
| Knowledge Graph Seed | services/knowledge-graph-service/src/knowledge_graph/seed.py | Apache AGE graph (wells, leases, operators, etc.) | curl -X POST http://localhost:8003/seed |
| Skill Registry Seed | services/orchestration-engine/src/orchestration/seed_skills.py | skills and skill_artifacts tables | poetry run python -m orchestration.seed_skills |
| Checklist Template Seed | services/orchestration-engine/src/orchestration/seed_checklists.py | checklist_templates table | poetry run python -m orchestration.seed_checklists |
| Rule Version Seed | services/orchestration-engine/src/orchestration/seed_rules.py | rule_versions table | poetry run python -m orchestration.seed_rules |
| Demo Data Seed | services/orchestration-engine/src/orchestration/seed_demo_data.py | compliance_status, filing_checklists, chart data | poetry run python -m orchestration.seed_demo_data |
| Conversation Seed | services/orchestration-engine/src/orchestration/seed_conversations.py | conversations and conversation_messages tables | poetry run python -m orchestration.seed_conversations |

Running All Seeds

After starting infrastructure with docker compose up -d and ensuring all services are running:

    # 1. Seed the knowledge graph (requires KG service running on port 8003)
    curl -X POST http://localhost:8003/seed

    # 2. Seed skills, checklists, rules, and demo data
    cd services/orchestration-engine
    poetry run python -m orchestration.seed_skills
    poetry run python -m orchestration.seed_checklists
    poetry run python -m orchestration.seed_rules
    poetry run python -m orchestration.seed_demo_data
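If you prefer to drive step 2 from Python, the sequence can be wrapped in a small runner. This is only a sketch: the module names and their order come from the commands above, while `ENGINE_DIR`, `seed_commands`, and `run_all` are illustrative names, and the script is assumed to start from the repository root. The knowledge graph seed is not covered here because it is triggered over HTTP.

```python
import subprocess
from pathlib import Path

# Module order taken from the shell sequence above.
SEED_MODULES = [
    "orchestration.seed_skills",
    "orchestration.seed_checklists",
    "orchestration.seed_rules",
    "orchestration.seed_demo_data",
]

# Assumption: the runner is launched from the repository root.
ENGINE_DIR = Path("services/orchestration-engine")


def seed_commands(modules=SEED_MODULES):
    """Return one argv list per seed module, in run order."""
    return [["poetry", "run", "python", "-m", module] for module in modules]


def run_all(run=subprocess.run):
    """Run each seed in order and stop at the first failure."""
    for argv in seed_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        run(argv, cwd=ENGINE_DIR, check=True)
```

Calling `run_all()` executes the four seeds in order. The `run` parameter exists so the runner can be exercised in tests without spawning processes.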

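The sequence above does not include the conversation seed. To load sample conversations, also run poetry run python -m orchestration.seed_conversations from the same directory.

All seed scripts are idempotent: each one checks for an existing record before inserting and updates the record if it already exists, so re-running a seed is safe.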
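The check-then-write behaviour can be illustrated with a minimal sketch. It is not the scripts' actual code: an in-memory SQLite database stands in for the real database, and the `skills` column names and the `upsert_skill` helper are assumptions made for the example.

```python
import sqlite3


def upsert_skill(conn, skill_id, name):
    """Insert the row if it is missing, otherwise update it in place."""
    row = conn.execute(
        "SELECT 1 FROM skills WHERE skill_id = ?", (skill_id,)
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO skills (skill_id, name) VALUES (?, ?)", (skill_id, name)
        )
    else:
        conn.execute(
            "UPDATE skills SET name = ? WHERE skill_id = ?", (name, skill_id)
        )
    conn.commit()


# In-memory SQLite stands in for the real database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skills (skill_id TEXT PRIMARY KEY, name TEXT)")

# Running the same seed step twice leaves exactly one row.
for _ in range(2):
    upsert_skill(conn, "spacing-calculation", "Spacing Calculation")

count = conn.execute("SELECT COUNT(*) FROM skills").fetchone()[0]
```

After both passes `count` is 1, which is the property that makes a re-run harmless.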


Knowledge Graph Seed

Endpoint: POST http://localhost:8003/seed

File: services/knowledge-graph-service/src/knowledge_graph/seed.py

Creates a realistic Permian Basin scenario in the Apache AGE oilgas graph. This is the most comprehensive seed script, building the full entity topology.
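The endpoint can also be called from Python using only the standard library. The sketch below mirrors the curl command; `build_seed_request`, `trigger_seed`, and the 60-second timeout are illustrative choices, and because the response format is not documented here, only the status code is returned.

```python
import urllib.request

# URL taken from the endpoint listed above.
SEED_URL = "http://localhost:8003/seed"


def build_seed_request(url=SEED_URL):
    """Build a body-less POST request, equivalent to `curl -X POST <url>`."""
    return urllib.request.Request(url, method="POST")


def trigger_seed(url=SEED_URL, timeout=60):
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    urllib.error.URLError when the service is unreachable.
    """
    with urllib.request.urlopen(build_seed_request(url), timeout=timeout) as resp:
        return resp.status
```

Call `trigger_seed()` once the knowledge graph service is listening on port 8003.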

Entities Created

| Entity Type | Count | Examples |
|---|---|---|
| Operators | 2 | Permian Basin Energy LLC, Basin Midstream Partners |
| Fields | 3 | Spraberry (Trend Area), Delaware Basin, Goldsmith |
| Formations | 2 | Wolfcamp A, Bone Spring |
| Leases | 5 | Mitchell Ranch, Jones Unit, Davis Ranch, Howard Unit, Delaware Basin Unit |
| Regulations | 2 | Rule 37 (Spacing), Rule 32 (Flaring) |
| Permits | 1 | W-1 permit for Mitchell Ranch 1H |
| Flaring Authorizations | 2 | Mitchell Ranch (expiring), Howard Unit (expiring ~30d) |
| Wells | 12 | Across 4 wellpads (Pad A-D) with full Form PR data |
| Wellpads | 4 | Pad A (Spraberry, 4 wells), Pad B (Spraberry, 3 wells), Pad C (Delaware, 3 wells), Pad D (Goldsmith, 2 wells) |
| Facilities | 4 | Tank battery, compressor, separator, CPF |
| Pipeline Routes | 3 | Including cross-field connector |
| Infrastructure Projects | 2 | Gathering system, processing plant |

Relationships Created

The seed data connects entities with edges including:

  • OPERATED_BY — wells, leases, facilities to operators
  • LOCATED_IN — wells to leases and fields
  • LOCATED_ON — wells to wellpads
  • COMPLETED_IN — wells to formations
  • GOVERNED_BY — wells to regulations
  • OFFSET_TO — bidirectional between nearby wells
  • FLARES_AT — authorizations to wells/leases
  • PRODUCES_TO — wells to facilities
  • FEEDS_INTO — facilities to pipeline routes
  • CONNECTS_TO — infrastructure projects to leases
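Once seeded, these edges can be queried through Apache AGE, which exposes Cypher via its `cypher()` SQL function. The helper below is a sketch that builds such a statement. The graph name `oilgas` and the `OPERATED_BY` edge type come from this page, whereas the `Well` and `Operator` labels, the `name` property, and the helper itself are assumptions.

```python
def age_cypher_sql(graph, cypher, columns):
    """Wrap a Cypher query in the SQL call that Apache AGE expects.

    AGE exposes Cypher through the cypher() set-returning function, and
    every returned column must be declared with the agtype type.
    """
    column_defs = ", ".join(f"{name} agtype" for name in columns)
    return f"SELECT * FROM cypher('{graph}', $$ {cypher} $$) AS ({column_defs});"


# Hypothetical labels and property names; only the edge type and the
# graph name come from this page.
sql = age_cypher_sql(
    graph="oilgas",
    cypher="MATCH (w:Well)-[:OPERATED_BY]->(o:Operator) RETURN w.name, o.name",
    columns=["well_name", "operator_name"],
)
```

The session needs AGE loaded first (`LOAD 'age';` and `ag_catalog` on the search path). The helper interpolates strings directly, so it is suitable only for trusted, hand-written queries.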

Demo Scenarios

The seed data supports three demo scenarios:

  1. Compliance blast radius: Mitchell Ranch 1H -> Pad A -> Tank Battery -> Gathering Line
  2. Infrastructure failure: Delaware Basin Compressor -> 3 wells, third-party operator
  3. Planning surface: Howard R-32 expiring -> Howard Connector -> Spraberry system
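The blast-radius idea in scenario 1 amounts to a downstream traversal from a starting entity. The sketch below shows it on a plain Python adjacency list built from that scenario's chain. It is illustrative only: the real data lives in Apache AGE with typed edges, and `DOWNSTREAM` and `blast_radius` are hypothetical names.

```python
from collections import deque

# Illustrative adjacency list built from scenario 1 only; the real graph
# lives in Apache AGE and carries typed edges.
DOWNSTREAM = {
    "Mitchell Ranch 1H": ["Pad A"],
    "Pad A": ["Tank Battery"],
    "Tank Battery": ["Gathering Line"],
    "Gathering Line": [],
}


def blast_radius(start, graph=DOWNSTREAM):
    """Return every node reachable from start, in breadth-first order."""
    seen = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.append(neighbour)
                queue.append(neighbour)
    return seen[1:]  # exclude the starting node itself


affected = blast_radius("Mitchell Ranch 1H")
```

Here `affected` lists Pad A, Tank Battery, and Gathering Line in that order.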

Well Production Data

Every well includes full Form PR production data with 12 months of history:

    {
      "production_oil_bbls": 5100.0,
      "production_gas_mcf": 10800.0,
      "production_casinghead_mcf": 420.0,
      "production_condensate_bbls": 85.0,
      "production_water_bbls": 2400.0,
      "gas_sold_mcf": 9200.0,
      "gas_used_on_lease_mcf": 380.0,
      "gas_flared_mcf": 1100.0,
      "gas_vented_mcf": 120.0,
      "producing_days": 30,
      "allocation_method": "test",
      "form_pr_period": "2026-02",
      "production_history_12m": [
        {"period": "2026-02", "oil_bbls": 5100, "gas_mcf": 10800, ...},
        {"period": "2026-01", "oil_bbls": 5300, "gas_mcf": 11200, ...},
        ...
      ]
    }
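In this sample record the four gas disposition fields (sold, used on lease, flared, vented) add up to production_gas_mcf. The sketch below checks that balance and derives the flared share. Whether every seeded well is meant to balance this way is an assumption drawn from the single sample, and the helper names are illustrative.

```python
# Values copied from the sample record above.
record = {
    "production_gas_mcf": 10800.0,
    "gas_sold_mcf": 9200.0,
    "gas_used_on_lease_mcf": 380.0,
    "gas_flared_mcf": 1100.0,
    "gas_vented_mcf": 120.0,
}

DISPOSITION_KEYS = (
    "gas_sold_mcf",
    "gas_used_on_lease_mcf",
    "gas_flared_mcf",
    "gas_vented_mcf",
)


def gas_imbalance(rec):
    """Produced gas minus the sum of its dispositions, in MCF."""
    return rec["production_gas_mcf"] - sum(rec[key] for key in DISPOSITION_KEYS)


def flared_fraction(rec):
    """Share of produced gas that was flared."""
    return rec["gas_flared_mcf"] / rec["production_gas_mcf"]


imbalance = gas_imbalance(record)
flared_pct = round(100 * flared_fraction(record), 1)
```

For the sample, the imbalance is zero and roughly 10.2% of produced gas is flared.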

Skill Registry Seed

Command: poetry run python -m orchestration.seed_skills (from services/orchestration-engine/)

File: services/orchestration-engine/src/orchestration/seed_skills.py

Populates the skills table with the three-tier skill definitions for Rule 37 and Rule 32 agents, plus checklist-item-level skills.

Skills Seeded

Rule 37 Skills:

| Skill ID | Name | Domain Tags |
|---|---|---|
| spacing-calculation | Spacing Calculation | spacing, rule_37, drilling |
| offset-well-analysis | Offset Well Analysis | spacing, rule_37 |
| rule37-filing-assembly | Rule 37 Filing Assembly | spacing, rule_37, drilling |
| good-cause-narrative | Good Cause Narrative | spacing, rule_37 |

Rule 32 Skills:

| Skill ID | Name | Domain Tags |
|---|---|---|
| flaring-volume-calc | Flaring Volume Calculation | flaring, rule_32, compliance |
| gas-analysis | Gas Composition Analysis | flaring, rule_32 |
| rule32-filing-assembly | Rule 32 Filing Assembly | flaring, rule_32 |
| emissions-estimate | Emissions Estimate | flaring, rule_32, compliance |

Checklist-Item Skills:

The seed also creates skills that correspond 1:1 to checklist items:

  • Rule 37: 11 skills (r37-exception-type, r37-field-rules, r37-offset-identification, etc.)
  • Rule 32: 10 skills
  • Form PR: 8 skills
  • Flaring Monitor: 6 skills

Each checklist-item skill specifies which SSE events it emits (e.g., DATA_TABLE_UPDATE, FORM_FIELD_UPDATE, ARTIFACT_GENERATED, CHECKLIST_ITEM_UPDATE).

Three-Tier Architecture

Each skill has three injection tiers:

  • Tier 1 manifest (~50 tokens): name, description, triggers, domain_tags
  • Tier 2 definition (~200-800 tokens): full spec with steps, requirements, output format
  • Tier 3 artifact refs: references to large content artifacts in skill_artifacts
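The three tiers can be pictured as one record with progressively heavier fields. The dataclass below is a sketch and not the schema of the skills table. The Tier 1 field names and the domain tags for spacing-calculation come from this page, while the class, the `manifest` method, and the description, trigger, and artifact values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    skill_id: str
    # Tier 1 manifest fields.
    name: str
    description: str
    triggers: list
    domain_tags: list
    # Tier 2: full spec text (steps, requirements, output format).
    definition: str = ""
    # Tier 3: references to rows in skill_artifacts, not the content itself.
    artifact_refs: list = field(default_factory=list)

    def manifest(self):
        """Return only the Tier 1 fields."""
        return {
            "name": self.name,
            "description": self.description,
            "triggers": self.triggers,
            "domain_tags": self.domain_tags,
        }


skill = Skill(
    skill_id="spacing-calculation",
    name="Spacing Calculation",
    description="Hypothetical description text.",
    triggers=["hypothetical trigger"],
    domain_tags=["spacing", "rule_37", "drilling"],
    artifact_refs=["hypothetical-artifact-id"],
)
tier1 = skill.manifest()
```

Keeping Tier 3 as references rather than inline content means a record stays small even when its artifacts are large.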

The script also seeds a record in the agents table for each of the four agents (rule37, rule32, compliance-monitor, flaring-monitor), along with their configurations.


Checklist Template Seed

Command: poetry run python -m orchestration.seed_checklists (from services/orchestration-engine/)

File: services/orchestration-engine/src/orchestration/seed_checklists.py

Creates checklist templates for all four compliance domains.

Templates Seeded

| Domain | Items | Description |
|---|---|---|
| rule_37 | 11 items | Rule 37 spacing exception filing checklist |
| rule_32 | 10 items | Rule 32 flaring exception filing checklist |
| form_pr | 8 items | Monthly Form PR production report checklist |
| flaring_monitor | 6 items | Flaring compliance monitoring checklist |

Checklist Item Structure

Each item defines:

  • index: Order within the checklist
  • name: Human-readable item name
  • item_type: data, document, form, artifacts, or validation
  • completion_method: auto (agent-only) or hybrid (agent + human)
  • agent_can: What the agent is capable of doing for this item
  • user_must: What the human user is responsible for
  • required_for_submission: Whether this item blocks filing submission

Example Rule 37 checklist items:

| # | Name | Type | Method | Required |
|---|---|---|---|---|
| 0 | Exception Type Determination | data | hybrid | Yes |
| 1 | Field Rule Lookup | data | hybrid | Yes |
| 2 | Offset Well Identification | data | hybrid | Yes |
| 3 | Affected Party Service List | data | hybrid | Yes |
| 4 | Waiver Collection Status | artifacts | hybrid | No |
| 5 | Form W-1 Population | form | hybrid | Yes |
| 6 | Certified Plat | artifacts | hybrid | No |
| 7 | Good-Cause Statement | document | hybrid | Yes |
| 8 | Supporting Technical Exhibits | document | hybrid | No |
| 9 | Fee Calculation | data | auto | Yes |
| 10 | Filing Readiness Check | validation | hybrid | Yes |
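The required_for_submission flag is what decides whether a filing is blocked. The sketch below models the Rule 37 items from the table and lists the required items that are still open. The `ChecklistItem` class, the `blocking_items` helper, and tracking completion as a set of indexes are assumptions for illustration. The agent_can and user_must fields are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    index: int
    name: str
    item_type: str
    completion_method: str
    required_for_submission: bool


# Rows copied from the Rule 37 example table above.
RULE_37_ITEMS = [
    ChecklistItem(0, "Exception Type Determination", "data", "hybrid", True),
    ChecklistItem(1, "Field Rule Lookup", "data", "hybrid", True),
    ChecklistItem(2, "Offset Well Identification", "data", "hybrid", True),
    ChecklistItem(3, "Affected Party Service List", "data", "hybrid", True),
    ChecklistItem(4, "Waiver Collection Status", "artifacts", "hybrid", False),
    ChecklistItem(5, "Form W-1 Population", "form", "hybrid", True),
    ChecklistItem(6, "Certified Plat", "artifacts", "hybrid", False),
    ChecklistItem(7, "Good-Cause Statement", "document", "hybrid", True),
    ChecklistItem(8, "Supporting Technical Exhibits", "document", "hybrid", False),
    ChecklistItem(9, "Fee Calculation", "data", "auto", True),
    ChecklistItem(10, "Filing Readiness Check", "validation", "hybrid", True),
]


def blocking_items(items, completed_indexes):
    """Required items that are not yet complete."""
    return [
        item
        for item in items
        if item.required_for_submission and item.index not in completed_indexes
    ]


required = [item for item in RULE_37_ITEMS if item.required_for_submission]
# Hypothetical progress: the first four items are done.
remaining = blocking_items(RULE_37_ITEMS, completed_indexes={0, 1, 2, 3})
```

Eight of the eleven items are required. With items 0 to 3 complete, items 5, 7, 9, and 10 still block submission.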

Rule Version Seed

Command: poetry run python -m orchestration.seed_rules (from services/orchestration-engine/)

File: services/orchestration-engine/src/orchestration/seed_rules.py

Populates the rule_versions table with statewide and field-specific rules.

Rules Seeded

| Identifier | Type | Domain | Description |
|---|---|---|---|
| SWR_37 | statewide | spacing | Statewide Rule 37: Spacing (467 ft lease line, 1,200 ft well-to-well, 40-acre default) |
| SWR_32 | statewide | flaring | Statewide Rule 32: Gas Flaring and Venting (180-day max, R-32 authorization) |
| SWR_38 | statewide | density | Statewide Rule 38: Well Density (40-acre oil, 640-acre gas) |
| form_pr_deadline | statewide | reporting | Form PR Monthly Production Reporting (due 15th of each month) |
| Spraberry_Trend_Area_spacing | field_specific | spacing | Spraberry field rules: 1,320 ft between wells (vs 1,200 ft statewide) |
| Spraberry_Trend_Area_density | field_specific | density | Spraberry field rules: 40-acre proration, 50/50 allocation |

Each rule version includes structured rule_data as JSONB with the actual regulatory parameters (distances, thresholds, deadlines, etc.).
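To show how structured rule_data might be consumed, the sketch below compares a proposed well-to-well distance against the statewide and Spraberry spacing rules. Only the two distances (1,200 ft and 1,320 ft) and the identifiers come from the table. The `min_between_wells_ft` key, both helpers, and the choice to let a field-specific rule override the statewide one are assumptions, not the documented schema or lookup logic.

```python
# Hypothetical rule_data payloads; only the distances and identifiers
# come from the table above.
RULE_VERSIONS = {
    "SWR_37": {
        "type": "statewide",
        "domain": "spacing",
        "rule_data": {"min_between_wells_ft": 1200},
    },
    "Spraberry_Trend_Area_spacing": {
        "type": "field_specific",
        "domain": "spacing",
        "rule_data": {"min_between_wells_ft": 1320},
    },
}


def effective_spacing_rule(field_rule_id=None, rules=RULE_VERSIONS):
    """Prefer a field-specific rule when one exists, else fall back to SWR_37."""
    if field_rule_id in rules:
        return rules[field_rule_id]
    return rules["SWR_37"]


def needs_spacing_exception(distance_ft, rule):
    """True when the proposed distance is below the rule's minimum."""
    return distance_ft < rule["rule_data"]["min_between_wells_ft"]


statewide = effective_spacing_rule()
spraberry = effective_spacing_rule("Spraberry_Trend_Area_spacing")
```

A well 1,250 ft from its nearest neighbour clears the statewide minimum but would need an exception under the Spraberry rule.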


Demo Data Seed

Command: poetry run python -m orchestration.seed_demo_data (from services/orchestration-engine/)

File: services/orchestration-engine/src/orchestration/seed_demo_data.py

Seeds realistic compliance data for a 15-minute demo walkthrough.

What Gets Seeded

  • 25 wells with Permian Basin API numbers and compliance status distribution: 60% green (compliant), 25% amber (action needed), 15% red (overdue).
  • Compliance status records in the compliance_status table for 4 domains per well (Rule 37, Rule 32, Form PR, Flaring).
  • Pre-built Rule 37 checklist at ~40% completion for Mitchell Ranch 1H.
  • Flare events with burn rate data for the flaring dashboard.
  • Chart-ready historical data for compliance trends and deadline distributions.

Compliance Distribution

The demo data creates a realistic distribution of compliance statuses:

| Status | Target % | Description |
|---|---|---|
| Compliant | 60% | No action needed |
| Action Needed | 25% | Upcoming deadlines or warnings |
| Overdue | 15% | Missed deadlines or violations |
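Because 25 wells do not divide evenly into 60/25/15, the percentages are targets rather than exact counts. One way to turn them into whole numbers is the largest-remainder method, sketched below. How seed_demo_data.py actually rounds is not documented here, so treat the resulting split, the `allocate` helper, and the status keys as illustrative.

```python
def allocate(total, targets):
    """Split total into whole counts using the largest-remainder method.

    targets maps each status to an integer percentage; the percentages
    must sum to 100.
    """
    if sum(targets.values()) != 100:
        raise ValueError("target percentages must sum to 100")
    counts, remainders = {}, {}
    for status, pct in targets.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        counts[status], remainders[status] = divmod(total * pct, 100)
    leftover = total - sum(counts.values())
    # Hand the leftover wells to the statuses with the largest remainders.
    for status in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        counts[status] += 1
    return counts


counts = allocate(25, {"compliant": 60, "action_needed": 25, "overdue": 15})
```

Under this method the 25 wells split into 15 compliant, 6 action needed, and 4 overdue.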

The demo data seed should be run after the knowledge graph seed and the checklist template seed, as it references entities and templates created by those scripts.
