Guide

Data Quality & Anomalies

Understanding how we handle revisions, quality filters, and historical anomalies across all commodity data sources.

Core Principle: No Back-Adjustment

We store each value exactly as the government agency reported it and never back-adjust historical values. When an agency revises a figure, the latest revision overwrites the previous estimate via upsert_all, so the API always returns the most current official estimate for each observation date.

Commodity fundamentals differ from continuous futures prices, which require back-adjustment when contracts roll. Our data consists of reported quantities (barrels, bushels, contract positions), so there is no roll to adjust for and values are served as reported.
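
The overwrite-on-revision behavior can be sketched with SQLite's upsert clause. This is an illustration only: the table, its columns, the series name, and the values are hypothetical, and the Python function merely borrows the upsert_all name; the real upsert_all implementation is not shown in this guide. It assumes SQLite 3.24 or newer.

```python
import sqlite3


def make_store():
    """Create an in-memory table keyed on (series, obs_date). Hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE observations (
               series   TEXT NOT NULL,
               obs_date TEXT NOT NULL,
               value    REAL NOT NULL,
               PRIMARY KEY (series, obs_date)
           )"""
    )
    return conn


def upsert_all(conn, rows):
    """Insert rows; on a key collision the incoming value replaces the stored one."""
    conn.executemany(
        """INSERT INTO observations (series, obs_date, value)
           VALUES (?, ?, ?)
           ON CONFLICT (series, obs_date) DO UPDATE SET value = excluded.value""",
        rows,
    )
    conn.commit()


conn = make_store()
# Initial estimate, then a revision for the same date (illustrative values).
upsert_all(conn, [("example_series", "2024-01-05", 100.0)])
upsert_all(conn, [("example_series", "2024-01-05", 101.5)])
stored = conn.execute("SELECT obs_date, value FROM observations").fetchall()
```

After both calls the table holds one row for the date, carrying the revised value. The earlier estimate is gone, which is the trade-off of overwriting rather than versioning.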

Source-Specific Quality Handling

EIA (Energy)

Data: WTI/Brent prices, natural gas prices, crude stocks, storage, production, refinery inputs

  • EIA revises weekly petroleum data in the following week's release
  • Monthly production data lags 2 months (January report = November data)
  • API occasionally returns duplicate rows — handled silently via upsert_all
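
The 2-month lag on monthly production data is simple month arithmetic. A minimal sketch, assuming the lag is always exactly two calendar months; the function name is hypothetical:

```python
from datetime import date


def monthly_data_period(report_date: date, lag_months: int = 2) -> date:
    """Return the first day of the month that a monthly report covers."""
    # Count months from year 0 so the subtraction can cross a year boundary.
    index = report_date.year * 12 + (report_date.month - 1) - lag_months
    return date(index // 12, index % 12 + 1, 1)


# A January 2024 report covers November 2023.
covered = monthly_data_period(date(2024, 1, 15))
```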

USDA WASDE (Agriculture Supply/Demand)

Data: Ending stocks for corn, wheat, soybeans — nine monthly revisions per marketing year

  • Each marketing year receives 9 monthly revisions (May–January); each one overwrites the prior estimate
  • Released "on or around the 12th" — exact date varies month to month
  • API always returns the latest USDA estimate for that date
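
To make the May–January window concrete, the sketch below numbers the nine revisions by release month. The function and constant names are hypothetical, and it assumes one release in every month of the window, which the shutdown cancellations listed later in this guide show is not always the case.

```python
from typing import Optional

WASDE_FIRST_MONTH = 5  # May opens the revision window
WASDE_REVISIONS = 9    # May through January


def wasde_revision_number(release_month: int) -> Optional[int]:
    """Return 1..9 for a release month inside the window, else None."""
    if not 1 <= release_month <= 12:
        raise ValueError("release_month must be in 1..12")
    number = (release_month - WASDE_FIRST_MONTH) % 12 + 1
    return number if number <= WASDE_REVISIONS else None
```

With this numbering, May is revision 1 and the following January is revision 9; February through April fall outside the window and return None.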

USDA NASS (Agriculture Production/Acreage)

Data: Prices, production, yield, planted/harvested acres for corn, wheat, soybeans

  • NASS returns multiple rows when estimates are revised — we keep the latest
  • Annual values are dated to January by convention; the date is a placeholder, not a literal observation date
  • National-level data only; geographic aggregation changes are filtered out
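
The three NASS rules above can be combined in one pass. This is a sketch under assumptions: the record keys (agg_level, year, value, load_time) are hypothetical stand-ins for whatever the ingest layer receives, load_time is assumed to be an ISO-style timestamp string that sorts chronologically, and January 1 is assumed as the specific January date.

```python
from datetime import date


def latest_national_annual(rows):
    """Keep the newest national-level row per year, keyed by a January date."""
    latest = {}
    for row in rows:
        if row["agg_level"] != "NATIONAL":
            continue  # drop state or regional rows
        year = row["year"]
        # ISO-style timestamps compare correctly as plain strings.
        if year not in latest or row["load_time"] > latest[year]["load_time"]:
            latest[year] = row
    return {date(year, 1, 1): row["value"] for year, row in sorted(latest.items())}


rows = [
    {"agg_level": "NATIONAL", "year": 2022, "value": 1.0, "load_time": "2023-01-12 15:00:00"},
    {"agg_level": "NATIONAL", "year": 2022, "value": 2.0, "load_time": "2023-09-29 15:00:00"},
    {"agg_level": "STATE", "year": 2022, "value": 9.0, "load_time": "2023-10-01 15:00:00"},
]
result = latest_national_annual(rows)
```

Here the revised national row wins, and the state-level row is ignored even though it loaded last.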

CFTC COT (Positioning)

Data: Legacy (1986–present) and disaggregated (2006–present) trader positions

  • Occasional reclassifications of traders between categories cause apparent jumps that do not reflect real position changes
  • Legacy and disaggregated formats stored separately — never mix them in analysis
  • Friday release (3:30 p.m. Eastern) captures positions as of the preceding Tuesday
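
Because the release date and the position date differ, analysis should be aligned on the Tuesday, not the Friday. A minimal sketch, assuming the as-of day is always a Tuesday even when the release itself is delayed; the function name is hypothetical:

```python
from datetime import date, timedelta


def positions_as_of(release_date: date) -> date:
    """Return the most recent Tuesday strictly before the release date."""
    # weekday(): Monday=0, Tuesday=1, ..., Sunday=6
    days_back = (release_date.weekday() - 1) % 7 or 7
    return release_date - timedelta(days=days_back)


# Friday 2024-01-05 reports positions held on Tuesday 2024-01-02.
as_of = positions_as_of(date(2024, 1, 5))
```

A release pushed to the following Monday still resolves to the same Tuesday, which keeps delayed reports aligned with on-time ones.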

Historical Anomaly Catalog

Events that caused missing, delayed, or anomalous data. Reference these when explaining unexpected gaps.

2013 Government Shutdown (Oct 1–16)

Duration: 16 days

  • EIA weekly reports delayed ~3 weeks after reopening
  • USDA October WASDE cancelled (first missed since 1973)
  • CFTC COT reports delayed; catch-up data published with consolidated figures

2018 Government Shutdown (Jan 20–22)

Duration: 3 days (Saturday–Monday)

Minimal impact — reports published on normal schedule.

2018–2019 Government Shutdown (Dec 22 – Jan 25)

Duration: 35 days (longest on record)

  • EIA reports delayed 5 weeks after reopening
  • January 2019 WASDE cancelled
  • CFTC and USDA crop reports delayed
  • Catch-up data published February 2019 with multi-week consolidated figures
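
The shutdown windows listed above can be encoded as a small lookup for explaining gaps. The dates come from this catalog; the structure and function name are hypothetical. Delays continue after each window closes, so a gap shortly after an end date may still be shutdown-related.

```python
from datetime import date

# (name, first day, last day), inclusive, as listed in the catalog above.
SHUTDOWNS = [
    ("2013 Government Shutdown", date(2013, 10, 1), date(2013, 10, 16)),
    ("2018 Government Shutdown", date(2018, 1, 20), date(2018, 1, 22)),
    ("2018-2019 Government Shutdown", date(2018, 12, 22), date(2019, 1, 25)),
]


def shutdown_for(day: date):
    """Return the name of the shutdown covering this day, or None."""
    for name, start, end in SHUTDOWNS:
        if start <= day <= end:
            return name
    return None


# Inclusive day counts for each window.
durations = [(end - start).days + 1 for _, start, end in SHUTDOWNS]
```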

Hurricane Season Disruptions (June–November)

Gulf Coast energy infrastructure disruptions cause anomalous crude stock patterns. Storage data may show artificial swings during refinery shutdowns.

Federal Holidays

Federal holidays shift weekly report dates by 1 day: EIA and CFTC publish on the following day, and USDA crop data may skip a week.

Check the Data Release Calendar for expected disruptions.