The Situation

Carrier tracking data is messy by nature. Events arrive out of order, some never arrive at all, and critical milestones are often missing. A container might suddenly show “arrived at destination” with no record of departure. Operations teams waste hours piecing together what actually happened.

Rather than leave those gaps for operators to research manually, I designed and shipped deterministic backfill logic that reconstructs missing milestones while staying transparent about what was inferred versus observed.

This work sat right next to the multi-provider fallback system — one finds alternative sources, the other repairs gaps inside the data we already have.

What I Changed

I replaced ad-hoc, heuristic fixes with a principled, reproducible backfill pipeline.

Key pieces I built:

  • Event normalization layer that canonicalized timestamps, deduplicated events, and mapped every carrier’s vocabulary to our internal milestone model.
  • Expected timeline templates per shipment type (ocean FCL, air, truck, etc.) that defined the logical sequence of milestones.
  • Gap detection + inference engine that applied explicit, documented rules (e.g., “if we see arrival at destination port + vessel voyage, infer origin departure using schedule data”).
  • Metadata-rich backfilled events — every inferred milestone carried:
    • inferred: true
    • the specific rule that produced it
    • confidence level (High/Medium/Low)
    • reference to the triggering observed events

The system was designed to be fully deterministic: same input events always produced the same output. When new events arrived, we recomputed only the affected backfills instead of blindly overwriting history.

I deliberately chose conservative inference — better a visible gap than a confidently wrong event.
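The determinism guarantee amounts to treating backfill as a pure function of the observed events. A minimal sketch, with a single illustrative rule standing in for the real rule set:

```python
# Backfill as a pure function: same canonical event list in, same output out.
# No clocks, randomness, or external state — so reruns are reproducible and
# any diff between runs points at a real input change.

def backfill(observed: list[dict]) -> list[dict]:
    """Apply inference rules to a canonicalized event list."""
    # Sort on a stable key so input ordering never affects the result.
    events = sorted(observed, key=lambda e: (e["timestamp"], e["code"]))
    codes = {e["code"] for e in events}
    inferred = []
    # Illustrative rule: an arrival with no recorded departure yields a
    # clearly labeled inferred departure rather than a silent gap.
    if "DEST_ARRIVAL" in codes and "ORIGIN_DEPARTURE" not in codes:
        inferred.append({
            "code": "ORIGIN_DEPARTURE",
            "inferred": True,
            "rule_id": "ARRIVAL_IMPLIES_DEPARTURE",
        })
    return events + inferred

a = backfill([{"code": "DEST_ARRIVAL", "timestamp": "2024-03-10T12:00:00Z"}])
b = backfill([{"code": "DEST_ARRIVAL", "timestamp": "2024-03-10T12:00:00Z"}])
assert a == b  # deterministic: identical input, identical output
```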

Validation

I validated the logic in three ways:

  • Ran it against historical shipments that later became complete, measuring how accurately we had inferred the missing pieces.
  • Had senior logistics operators review backfilled timelines for operational usefulness.
  • Tracked production metrics: reduction in “no visible updates” states and manual timeline research tickets.
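The first check — scoring inferences against shipments that later became complete — can be backtested with a simple tolerance-based accuracy metric. A hypothetical sketch (the 24-hour tolerance and the sample timestamps are invented for illustration):

```python
from datetime import datetime, timedelta

def backtest_rule(inferred_ts: list[str], observed_ts: list[str],
                  tolerance_hours: float = 24.0) -> float:
    """Fraction of inferred milestone timestamps that landed within
    `tolerance_hours` of the value the carrier eventually reported."""
    hits = 0
    for inf, obs in zip(inferred_ts, observed_ts):
        delta = abs(datetime.fromisoformat(inf) - datetime.fromisoformat(obs))
        if delta <= timedelta(hours=tolerance_hours):
            hits += 1
    return hits / len(inferred_ts) if inferred_ts else 0.0

accuracy = backtest_rule(
    ["2024-03-01T08:00:00+00:00", "2024-03-02T09:00:00+00:00"],  # what we inferred
    ["2024-03-01T10:30:00+00:00", "2024-03-04T09:00:00+00:00"],  # what later arrived
)
# first inference is within 24h, second is off by 48h -> accuracy of 0.5
```

Run per rule, this kind of score shows which rules deserve High confidence and which should stay Low.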

The backfills meaningfully improved timeline coherence without introducing false positives that would have eroded operator trust.

Outcomes

  • Operations teams got far more usable, continuous timelines instead of scattered events with big holes.
  • Fewer escalations to carrier customer service just to understand basic progress.
  • The tracking UI became more trustworthy even when raw carrier feeds were incomplete.

Architecturally, it made our milestone store much more resilient to real-world data quality issues.

Tradeoffs & Lessons Learned

Tradeoffs

  • More transparent (and slightly more complex) data model vs. hiding inference from users.
  • Conservative rules left some gaps unfilled — but protected us from misleading operators.
  • Recomputing backfills on updates added some processing cost (mitigated with targeted invalidation).
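The targeted invalidation in the last tradeoff falls out of the metadata: because every inferred milestone records its source events, a new event only invalidates the inferences derived from what changed. A minimal sketch with hypothetical field names:

```python
# Targeted invalidation: recompute only the inferences whose source
# events changed, instead of rebuilding the whole shipment history.

def affected_backfills(backfills: list[dict],
                       updated_event_ids: set[str]) -> list[dict]:
    """Return only the inferred milestones whose source events changed."""
    return [b for b in backfills
            if set(b["source_event_ids"]) & updated_event_ids]

store = [
    {"code": "ORIGIN_DEPARTURE", "source_event_ids": ["evt-1", "evt-2"]},
    {"code": "TRANSSHIPMENT", "source_event_ids": ["evt-3"]},
]
stale = affected_backfills(store, {"evt-2"})
# only the departure inference is invalidated and recomputed;
# the transshipment inference is untouched
```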

Lessons that stuck

  1. Transparency beats perfection. Users and downstream systems must know what was observed versus inferred.
  2. Determinism is non-negotiable for backfill logic — otherwise debugging becomes impossible.
  3. Logistics domain knowledge is the bottleneck. Generic algorithms fail; explicit rules grounded in operations reality win.
  4. Backfill and multi-provider fallback are complementary: one finds better data, the other makes the best use of whatever data exists.

What I’d Improve Next

  • Add historical accuracy tracking per inference rule to automatically tune confidence scores.
  • Build an internal “Explain Timeline” view that shows exactly which events drove each backfill.
  • Introduce lightweight ML to suggest new inference rules from patterns in completed shipments.
  • Give operations a feedback loop to flag bad backfills and improve the rule set.

If you’re building visibility platforms and tired of messy carrier data creating broken timelines, I’d be happy to talk about how we made it usable. Let’s connect.