The Situation

In our ocean and truck visibility platform, shipment tracking pulled from a mix of third-party providers — Project44, Ocean Insights, Shipsgo, plus direct carrier APIs. Each had strengths and glaring weaknesses: different coverage, uptime, data freshness, and wildly inconsistent event schemas.

When the primary provider went down, returned partial data, or simply had no coverage for a lane, the entire tracking view went blank. Customer service ended up manually hunting for updates across multiple portals. Not exactly the automated visibility customers were paying for.

The Core Problem

We needed tracking that degraded gracefully instead of failing hard.

Specific failure modes I targeted:

  • Complete provider outages
  • Coverage gaps (e.g. Project44 strong on US truck, weak on certain ocean carriers)
  • Stale but “successful” responses
  • Incompatible event models that made merging dangerous

The previous single-provider-per-shipment approach created a brittle dependency that hurt both reliability and customer experience.

What I Built

I replaced the single-provider calls with a layered, intelligent fallback system.

I started by introducing provider adapters — a clean abstraction that normalized every external payload into our internal Milestone model. This isolated all provider-specific quirks and made the rest of the system provider-agnostic.
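The adapter idea can be sketched roughly as follows. This is a minimal illustration, not the production code: the `Milestone` fields, the `Project44Adapter` payload shape, and the `EVENT_MAP` vocabulary are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Protocol

# Hypothetical internal event model; field names are illustrative.
@dataclass(frozen=True)
class Milestone:
    shipment_id: str
    event_type: str      # e.g. "DEPARTED", "ARRIVED"
    location: str
    occurred_at: datetime
    source: str          # provenance: which provider contributed this event

class ProviderAdapter(Protocol):
    name: str
    def normalize(self, raw: dict) -> list[Milestone]: ...

class Project44Adapter:
    """Maps one provider's payload shape into Milestone objects."""
    name = "project44"

    # Assumed mapping from a provider-specific vocabulary to ours.
    EVENT_MAP = {"GATE_OUT": "DEPARTED", "GATE_IN": "ARRIVED"}

    def normalize(self, raw: dict) -> list[Milestone]:
        milestones = []
        for ev in raw.get("events", []):
            event_type = self.EVENT_MAP.get(ev["code"])
            if event_type is None:
                continue  # drop events we cannot express internally
            milestones.append(Milestone(
                shipment_id=raw["shipmentId"],
                event_type=event_type,
                location=ev["locationName"],
                occurred_at=datetime.fromisoformat(ev["timestamp"]),
                source=self.name,
            ))
        return milestones
```

Everything downstream of `normalize` only ever sees `Milestone` objects, which is what keeps provider quirks from leaking.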

On top of that, I built a ProviderChain service that:

  • Maintained dynamic priority + health state for each provider
  • Evaluated response quality using a scoring function (HTTP status, milestone completeness, data freshness, coverage relevance)
  • Cascaded to the next provider only when the current response fell below a quality threshold
  • Tracked provider health to avoid hammering known failing services (and burning rate limits)
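The cascade logic above can be sketched like this. The scoring weights, the 0.6 threshold, and the boolean health flag are illustrative assumptions; the real system used richer scoring and decaying health state.

```python
from dataclasses import dataclass
from typing import Callable, Optional

QUALITY_THRESHOLD = 0.6  # assumed value, for illustration only

@dataclass
class ProviderState:
    name: str
    fetch: Callable[[str], Optional[dict]]  # returns raw payload, or None
    healthy: bool = True

def score_response(raw: Optional[dict]) -> float:
    """Toy quality score: presence, event count, and a freshness flag."""
    if not raw:
        return 0.0
    score = 0.4
    score += min(len(raw.get("events", [])), 3) * 0.1
    if raw.get("fresh", False):
        score += 0.3
    return score

def resolve(chain: list[ProviderState], shipment_id: str):
    """Walk providers in priority order; stop at the first good-enough answer."""
    for provider in chain:
        if not provider.healthy:
            continue  # skip known-bad providers to save rate-limit budget
        try:
            raw = provider.fetch(shipment_id)
        except Exception:
            provider.healthy = False  # real system: decaying health score
            continue
        if score_response(raw) >= QUALITY_THRESHOLD:
            return provider.name, raw
    return None
```

The key design point is that fallback is quality-driven, not just error-driven: an HTTP 200 with stale or empty data still cascades.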

I also added careful multi-source merging logic with event deduplication based on type + location + timestamp window, plus explicit provenance tagging so we always knew which provider contributed each milestone.
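A stripped-down version of the dedup rule looks like this. The 30-minute tolerance window and the event field names are assumptions for the sketch; the production logic also handled location aliasing.

```python
from datetime import datetime, timedelta

# Two events count as duplicates when type and location match and their
# timestamps fall within a tolerance window (assumed 30 minutes here).
DEDUP_WINDOW = timedelta(minutes=30)

def merge_events(events: list[dict]) -> list[dict]:
    """Merge multi-provider events, keeping the first-seen copy of each duplicate.

    Each event dict carries 'type', 'location', 'ts' (datetime), and
    'source' (provenance tag) keys -- hypothetical field names.
    """
    merged: list[dict] = []
    for ev in sorted(events, key=lambda e: e["ts"]):
        is_dup = any(
            kept["type"] == ev["type"]
            and kept["location"] == ev["location"]
            and abs(kept["ts"] - ev["ts"]) <= DEDUP_WINDOW
            for kept in merged
        )
        if not is_dup:
            merged.append(ev)
    return merged
```

Because every surviving event keeps its `source` tag, downstream consumers can always tell which provider contributed which milestone.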

The result: the system would try Project44 → fall back to Ocean Insights → Shipsgo → direct carrier if needed, all while staying under rate limits and never creating duplicate or contradictory events.

How I Validated It

  • Unit + contract tests for every adapter
  • Chaos-style integration tests that simulated provider failures, slow responses, and partial data
  • Real-world validation in staging against live tracking IDs
  • Production metrics: % of tracking requests with at least one usable response, source distribution per response, and rate of customer-service tracking lookups
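The chaos-style tests followed a simple pattern: simulate a misbehaving provider with a stub and assert the fallback path still produces data. A minimal sketch, with hypothetical names (`FlakyProvider`, `lookup_with_fallback`):

```python
# Stub provider that fails a configurable number of times before
# answering, to simulate outages and flapping behavior.
class FlakyProvider:
    def __init__(self, fail_times: int, payload: dict):
        self.remaining_failures = fail_times
        self.payload = payload

    def fetch(self, shipment_id: str) -> dict:
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise TimeoutError("simulated provider outage")
        return self.payload

def lookup_with_fallback(providers, shipment_id):
    """Return the first successful payload, or None if every provider fails."""
    for provider in providers:
        try:
            return provider.fetch(shipment_id)
        except Exception:
            continue
    return None

def test_fallback_survives_primary_outage():
    primary = FlakyProvider(fail_times=99, payload={})
    backup = FlakyProvider(fail_times=0, payload={"events": ["ARRIVED"]})
    result = lookup_with_fallback([primary, backup], "S1")
    assert result == {"events": ["ARRIVED"]}
```

The same stub, with slow responses and truncated payloads instead of exceptions, covered the partial-data and staleness scenarios.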

We saw a clear directional improvement in tracking availability during known provider incidents.

Outcomes

  • Tracking continuity improved noticeably during primary provider degradation
  • Reduced manual customer-service intervention in cases where the system previously showed “no updates”
  • Shifted internal conversations from “is the provider down?” to “what’s the best coverage for this lane right now?”

More importantly, tracking stopped being a collection of brittle integrations and became a true platform capability. Adding a new provider now means writing one adapter and updating the chain config — no downstream changes required.

Tradeoffs & Lessons Learned

Tradeoffs:

  • Serial fallback adds latency (mitigated with aggressive caching and background refresh)
  • Merged data can occasionally feel “messy” — I chose transparency (source labels + confidence indicators) over forced reconciliation
  • Ongoing maintenance cost grows with each new provider

Key lessons:

  1. Abstract early and aggressively — provider-specific code is toxic if it leaks.
  2. Health-aware routing > static failover lists. Rate limits are precious.
  3. Never hide ambiguity from users or downstream systems. Provenance and confidence metadata are first-class citizens.

What’s Next

I’d like to evolve this further with:

  • Predictive provider selection based on historical lane/carrier performance
  • Formal SLI/SLOs for visibility availability and freshness
  • Automated schema drift detection to catch breaking changes faster
  • Customer-facing confidence scores so users understand data reliability at a glance

If you’re wrestling with flaky third-party data sources in logistics (or any domain), I’d be happy to talk about how we made tracking resilient at scale. Let’s connect.