Ingestion Endpoints

These endpoints handle data import and ingestion. The ingestion-service listens on port 8005 and is reached through the API gateway at http://localhost:8000, which is the address used in the examples on this page.

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `POST` | `/ingest/rrc-scrape` | Trigger RRC data scrape |
| `POST` | `/ingest/operator/csv` | Import operator data from CSV |
| `POST` | `/ingest/scada` | Ingest SCADA field data |
| `POST` | `/extract` | Extract entities from text |
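The JSON endpoints in this list share the same request shape: a `POST` through the gateway with a bearer token and a JSON body. A minimal Python sketch of that shape, using only the standard library, might look like the following. The helper name `build_json_request` and the token value are hypothetical; the request is prepared but not sent.

```python
import json
import urllib.request

GATEWAY_URL = "http://localhost:8000"  # gateway address used throughout this page


def build_json_request(path: str, payload: dict, token: str) -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST to an ingestion endpoint."""
    return urllib.request.Request(
        url=GATEWAY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_json_request(
        "/extract",
        {"text": "The Smith Ranch #1 well"},
        token="example-token",  # placeholder, not a real credential
    )
    print(request.get_method(), request.full_url)
    # Sending requires a running gateway and a valid token:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
```

Separating request construction from sending keeps the payload logic easy to check without a running service.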

POST /ingest/rrc-scrape

Trigger a scrape of RRC (Railroad Commission of Texas) data. A single request can cover several data types.


curl -X POST http://localhost:8000/ingest/rrc-scrape \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "data_types": ["wells", "permits", "production"],
    "district": "08",
    "date_range": {
      "from": "2026-01-01",
      "to": "2026-03-31"
    }
  }'

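| Field | Type | Description |
| --- | --- | --- |
| `data_types` | string[] | Types to scrape: `wells`, `permits`, `production`, `flaring`, `operators`, `leases` |
| `district` | string | RRC district number, for example `"08"` (optional) |
| `date_range` | object | Date range filter with `from` and `to` dates, for example `2026-01-01` (optional) |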

In the current development environment, the RRC scraper uses mock data. Production will connect to actual RRC data sources.
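To make the field rules above concrete, here is a small client-side sketch that assembles a request body and rejects data types not in the documented list. The helper `build_rrc_scrape_payload` is hypothetical, and whether the service itself rejects unknown types or accepts a one-sided date range is not documented here, so both behaviors are assumptions of the sketch.

```python
import json
from typing import Optional

# Data types listed in the field table for this endpoint.
ALLOWED_DATA_TYPES = {"wells", "permits", "production", "flaring", "operators", "leases"}


def build_rrc_scrape_payload(
    data_types: list[str],
    district: Optional[str] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> dict:
    """Build a request body for POST /ingest/rrc-scrape (hypothetical helper)."""
    unknown = sorted(set(data_types) - ALLOWED_DATA_TYPES)
    if unknown:
        raise ValueError(f"unsupported data_types: {unknown}")

    payload: dict = {"data_types": list(data_types)}
    if district is not None:
        payload["district"] = district
    # Assumption: both ends of the range are sent together, as in the curl example.
    if date_from is not None and date_to is not None:
        payload["date_range"] = {"from": date_from, "to": date_to}
    return payload


if __name__ == "__main__":
    body = build_rrc_scrape_payload(
        ["wells", "permits", "production"],
        district="08",
        date_from="2026-01-01",
        date_to="2026-03-31",
    )
    print(json.dumps(body, indent=2))
```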

POST /ingest/operator/csv

Import operator data from a CSV file. The service normalizes column names and maps data to the knowledge graph schema.


curl -X POST http://localhost:8000/ingest/operator/csv \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@wells_data.csv"

The CSV should include columns for well identification (API number, well name), location (latitude, longitude), and operational data (operator, status, field, lease).
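Before uploading, it can help to check that a file carries the columns listed above. The sketch below is a client-side pre-check only: the canonical names in `EXPECTED_COLUMNS` and the snake_case rule in `normalize_column` are assumptions made for illustration, since the service's actual normalization rules are not documented here.

```python
import csv
import io
import re

# Assumed canonical column names, derived from the column list above.
# The names the service actually expects are not documented here.
EXPECTED_COLUMNS = {
    "api_number", "well_name", "latitude", "longitude",
    "operator", "status", "field", "lease",
}


def normalize_column(name: str) -> str:
    """Lowercase a header and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def missing_columns(csv_text: str) -> set[str]:
    """Return the expected columns absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {normalize_column(column) for column in header}
    return EXPECTED_COLUMNS - present


if __name__ == "__main__":
    # Illustrative row; the coordinate and status values are made up.
    sample = (
        "API Number,Well Name,Latitude,Longitude,Operator,Status,Field\n"
        "42-383-12345,Smith Ranch #1,31.5,-101.9,Permian Energy Inc.,active,Spraberry\n"
    )
    print(missing_columns(sample))  # {'lease'}
```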

POST /ingest/scada

Ingest real-time SCADA (Supervisory Control and Data Acquisition) field data.


curl -X POST http://localhost:8000/ingest/scada \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "readings": [
      {
        "well_api": "42-383-12345",
        "timestamp": "2026-04-10T10:00:00Z",
        "metrics": {
          "pressure_psi": 2400,
          "flow_rate_mcfd": 850,
          "temperature_f": 165
        }
      }
    ]
  }'
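The request body above can also be assembled programmatically. The following sketch builds one reading with a UTC timestamp in the same `...Z` form as the example; `make_reading` and `build_scada_payload` are hypothetical helpers, and the metric names are simply those shown in the curl example rather than a documented schema.

```python
import json
from datetime import datetime, timezone


def make_reading(well_api: str, when: datetime, **metrics: float) -> dict:
    """Build one entry for the "readings" array (hypothetical helper)."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Convert to UTC and format with a trailing "Z", matching the example above.
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"well_api": well_api, "timestamp": stamp, "metrics": dict(metrics)}


def build_scada_payload(readings: list[dict]) -> dict:
    """Wrap readings in the top-level object the endpoint example uses."""
    return {"readings": list(readings)}


if __name__ == "__main__":
    reading = make_reading(
        "42-383-12345",
        datetime(2026, 4, 10, 10, 0, tzinfo=timezone.utc),
        pressure_psi=2400,
        flow_rate_mcfd=850,
        temperature_f=165,
    )
    print(json.dumps(build_scada_payload([reading]), indent=2))
```

Rejecting naive timestamps is a design choice of the sketch: it avoids silently interpreting local time as UTC.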

POST /extract

Extract entities from unstructured text. The service identifies wells, operators, leases, fields, and other entity types mentioned in text documents, and returns them together with the relationships (edges) it finds between them.


curl -X POST http://localhost:8000/extract \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "The Smith Ranch #1 well (API 42-383-12345) operated by Permian Energy Inc. in the Spraberry field..."
  }'

Response (200):


{
  "entities": [
    {"type": "Well", "name": "Smith Ranch #1", "api_number": "42-383-12345"},
    {"type": "Operator", "name": "Permian Energy Inc."},
    {"type": "Field", "name": "Spraberry"}
  ],
  "edges": [
    {"from": "Smith Ranch #1", "to": "Permian Energy Inc.", "type": "OPERATED_BY"},
    {"from": "Smith Ranch #1", "to": "Spraberry", "type": "LOCATED_IN"}
  ]
}
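A client will usually want to walk the `edges` array and resolve each end against `entities`. The sketch below does this for the response shown above. It assumes that edges refer to entities by their `name` value, which matches this example but is not stated as a guarantee; `index_relationships` is a hypothetical helper.

```python
import json

# The 200 response body from the example above.
RESPONSE_BODY = """
{
  "entities": [
    {"type": "Well", "name": "Smith Ranch #1", "api_number": "42-383-12345"},
    {"type": "Operator", "name": "Permian Energy Inc."},
    {"type": "Field", "name": "Spraberry"}
  ],
  "edges": [
    {"from": "Smith Ranch #1", "to": "Permian Energy Inc.", "type": "OPERATED_BY"},
    {"from": "Smith Ranch #1", "to": "Spraberry", "type": "LOCATED_IN"}
  ]
}
"""


def index_relationships(response: dict) -> dict[str, list[tuple[str, str]]]:
    """Map each source entity name to its (edge type, target name) pairs."""
    known = {entity["name"] for entity in response["entities"]}
    relationships: dict[str, list[tuple[str, str]]] = {}
    for edge in response["edges"]:
        # Assumption: both ends of an edge name an entity in the same response.
        for end in (edge["from"], edge["to"]):
            if end not in known:
                raise ValueError(f"edge refers to unknown entity: {end!r}")
        relationships.setdefault(edge["from"], []).append((edge["type"], edge["to"]))
    return relationships


if __name__ == "__main__":
    parsed = json.loads(RESPONSE_BODY)
    for source, links in index_relationships(parsed).items():
        for edge_type, target in links:
            print(f"{source} -[{edge_type}]-> {target}")
```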

After ingesting data, the service publishes events to Kafka for downstream processing by other services.