Carrier API Reliability Checklist
Carrier API integration failure modes
Carrier APIs fail in boring, expensive ways: late events, duplicate milestones, timeouts, rate limits, payload drift, and an unclear source of truth.
Payload shape drift
A provider can change optional fields, timestamp formats, enum values, or nested structures without ever announcing a versioned API change.
- Store raw payloads for replay and inspection.
- Validate provider responses before normalization.
- Treat unknown status values as reviewable, not invisible.
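As a minimal sketch of validating before normalizing, the Python below checks a raw payload and flags unknown status values for review instead of dropping them. The field names, the status set, and the `ValidationResult` type are assumptions made for illustration, not any real carrier's schema.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical canonical statuses and required fields; a real integration defines its own.
KNOWN_STATUSES = {"PICKED_UP", "IN_TRANSIT", "DELIVERED"}
REQUIRED_FIELDS = ("shipment_id", "status", "event_time")


@dataclass
class ValidationResult:
    ok: bool            # safe to pass on to normalization
    needs_review: bool  # should be surfaced to an operator
    problems: list[str]


def validate_payload(payload: dict[str, Any]) -> ValidationResult:
    """Check a raw provider payload before any normalization runs."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in payload]
    if problems:
        return ValidationResult(ok=False, needs_review=True, problems=problems)

    status = payload["status"]
    if status not in KNOWN_STATUSES:
        # Unknown values are flagged for review rather than silently dropped or guessed.
        return ValidationResult(ok=False, needs_review=True, problems=[f"unknown status: {status!r}"])

    return ValidationResult(ok=True, needs_review=False, problems=[])
```

The raw payload would be stored before this check runs, so a rejected message can still be replayed once the mapping is updated.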
Out-of-order or duplicate events
Tracking and milestone feeds rarely arrive in the clean order users expect. The system needs a stable identity for each event and explicit rules for how much confidence to place in it.
- Use idempotency keys for event ingestion.
- Preserve provider timestamps separately from ingestion timestamps.
- Make confidence and provenance visible in downstream timelines.
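One way to sketch idempotent ingestion is shown below in Python. The key derivation, the `TrackingEvent` fields, and the in-memory `EventStore` are all hypothetical. A production system would more likely rely on a unique constraint in its database, and would prefer a provider-supplied event ID where one exists.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TrackingEvent:
    provider: str
    shipment_id: str
    status: str
    provider_time: datetime  # when the provider says the event happened
    ingested_at: datetime    # when this system received it


def idempotency_key(provider: str, shipment_id: str, status: str, provider_time: datetime) -> str:
    """Derive a stable key from the fields that identify the event itself."""
    raw = "|".join([provider, shipment_id, status, provider_time.isoformat()])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class EventStore:
    """In-memory stand-in for a table with a unique constraint on the key."""

    def __init__(self) -> None:
        self._events: dict[str, TrackingEvent] = {}

    def ingest(self, provider: str, shipment_id: str, status: str, provider_time: datetime) -> bool:
        """Store the event once; return False when it is a duplicate delivery."""
        key = idempotency_key(provider, shipment_id, status, provider_time)
        if key in self._events:
            return False
        self._events[key] = TrackingEvent(
            provider, shipment_id, status, provider_time,
            ingested_at=datetime.now(timezone.utc),
        )
        return True

    def timeline(self, shipment_id: str) -> list[TrackingEvent]:
        """Order by provider time, not by arrival order."""
        events = [e for e in self._events.values() if e.shipment_id == shipment_id]
        return sorted(events, key=lambda e: e.provider_time)
```

Keeping `provider_time` and `ingested_at` as separate fields lets a timeline show both when something happened and how late the system learned about it.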
Timeouts, rate limits, and partial outages
The happy path is not the integration. Recovery behavior is the integration. Backoff, fallback, dead-letter queues, and health checks decide whether operators trust the system.
- Use bounded retries with jitter and provider-specific limits.
- Separate transient provider issues from internal processing errors.
- Alert on business impact, not only HTTP status codes.
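The retry guidance above can be sketched as follows in Python. `TransientProviderError` and `call_with_retries` are hypothetical names, and the default limits are placeholders that would be tuned to each provider's documented rate limits. The sketch uses capped exponential backoff with full jitter, which is one common choice among several.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientProviderError(Exception):
    """Hypothetical marker for timeouts, rate-limit responses, and provider 5xx errors."""


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry only transient provider errors, with capped exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientProviderError:
            if attempt == max_attempts:
                # Bounded: give up and let the caller dead-letter or alert.
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

Any other exception propagates on the first attempt, which keeps internal processing errors from being retried as if they were provider outages.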
What should be true before this is production-ready
- Raw payloads are retained for investigation.
- Provider-specific mappers convert into one canonical business model.
- Retries cannot duplicate downstream actions.
- Events are deduplicated before user-facing updates.
- Provider health appears in dashboards or incident views.
- Fallback behavior is explicit and auditable.
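To illustrate the mapper item in the checklist above, here is a small Python sketch in which two provider payload shapes are converted into one canonical event. Both providers, their field names, and the status codes are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable


@dataclass(frozen=True)
class CanonicalEvent:
    provider: str
    shipment_id: str
    status: str
    occurred_at: datetime


def map_provider_a(payload: dict[str, Any]) -> CanonicalEvent:
    # Hypothetical shape: ISO 8601 timestamp with offset, canonical status names.
    return CanonicalEvent(
        provider="provider_a",
        shipment_id=payload["shipmentId"],
        status=payload["status"],
        occurred_at=datetime.fromisoformat(payload["eventTime"]),
    )


# Hypothetical short codes used by the second provider.
STATUS_CODES_B = {"DL": "DELIVERED", "IT": "IN_TRANSIT"}


def map_provider_b(payload: dict[str, Any]) -> CanonicalEvent:
    # Hypothetical shape: epoch seconds and two-letter status codes.
    return CanonicalEvent(
        provider="provider_b",
        shipment_id=payload["ref"],
        # Unmapped codes become an explicit UNKNOWN so they stay visible for review.
        status=STATUS_CODES_B.get(payload["code"], "UNKNOWN"),
        occurred_at=datetime.fromtimestamp(payload["ts"], tz=timezone.utc),
    )


MAPPERS: dict[str, Callable[[dict[str, Any]], CanonicalEvent]] = {
    "provider_a": map_provider_a,
    "provider_b": map_provider_b,
}


def to_canonical(provider: str, payload: dict[str, Any]) -> CanonicalEvent:
    """Dispatch to the provider-specific mapper; unregistered providers raise KeyError."""
    return MAPPERS[provider](payload)
```

Downstream code then depends only on `CanonicalEvent`, so adding a provider means adding one mapper rather than touching every consumer.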