Context
Raw upstream payloads always look flexible right up until multiple teams need to use them.
In this system, one shipment payload from the upstream TMS had to support tracking, routing, customer context, items, events, finance linkage, and derived operational state. Leaving that as one semi-structured blob would have kept ingestion simple and pushed complexity everywhere else: queries would be painful, joins would be improvised, and downstream tools would keep re-deriving the same answers in slightly different ways.
The real problem was not “how do we store this JSON?” It was “how do we turn one messy source into a model the rest of the business can actually use?”
Problem
The upstream data made that harder than it sounds:
- some collections arrived as arrays, others as singleton objects
- master, house, and booking relationships were not represented consistently
- important business signals were hidden in events and custom fields
- partial updates could arrive out of order
- finance and operational concerns needed to land in related but different structures
If we got the normalization layer wrong, every downstream system would inherit the confusion.
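The singleton-versus-array quirk alone is worth pinning down, because it is the kind of inconsistency that otherwise leaks into every consumer. A minimal sketch of the coercion, with illustrative field values rather than the real upstream contract:

```python
from typing import Any

def as_list(value: Any) -> list:
    """Coerce an upstream field that may arrive as a list, a lone
    object, or be absent entirely into a uniform list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]

# The same upstream field, three shapes, one normalized result:
assert as_list([{"code": "ETA"}]) == [{"code": "ETA"}]
assert as_list({"code": "ETA"}) == [{"code": "ETA"}]
assert as_list(None) == []
```

Doing this once at the ingestion boundary means no downstream query ever has to ask "is this field a list today?"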
Constraints
This had to work as operational infrastructure, not a one-time import job.
- Upstream data contracts could not be cleaned up at the source.
- Replays and repeated syncs had to be safe.
- Downstream users needed relational queryability.
- Derived state had to be explicit enough to trust, not recomputed ad hoc everywhere.
- Performance mattered because this was a live synchronization path, not an overnight reporting batch.
That meant the design had to be robust, boring, and explicit.
What I Built
I built a fanout-style normalization layer that turned one upstream shipment payload into a structured set of write models.
First, I split the payload into dedicated responsibility areas. Instead of one giant persistence function, the system handled shipment core fields, general attributes, entities, routing, events, items, and calculated state through separate handlers. That made the write model easier to reason about and much easier to evolve as new downstream needs appeared.
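The fanout shape can be sketched roughly like this, with hypothetical handler names standing in for the production ones:

```python
# Each responsibility area gets its own handler; adding a new
# downstream need means adding a handler, not growing one giant
# persistence function. Names and fields here are illustrative.

def handle_core(payload: dict) -> dict:
    return {"shipment_id": payload.get("id")}

def handle_routing(payload: dict) -> list:
    return payload.get("routing") or []

def handle_events(payload: dict) -> list:
    return payload.get("events") or []

HANDLERS = {
    "core": handle_core,
    "routing": handle_routing,
    "events": handle_events,
    # ...entities, items, calculated state, and so on
}

def normalize(payload: dict) -> dict:
    """Fan one upstream payload out into per-area write models."""
    return {area: handler(payload) for area, handler in HANDLERS.items()}

models = normalize({"id": "SHP-1", "events": [{"code": "DEP"}]})
```

The dictionary of handlers is the point: the system boundary is explicit, so a future engineer can see exactly where a new field belongs.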
Second, I used explicit upsert-oriented sync behavior. Repeated ingest runs are normal in integration work. I designed the write paths to tolerate replays and partial updates through controlled insert/update behavior instead of assuming each payload was unique and perfectly ordered.
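One common way to get this behavior, sketched here with SQLite's `ON CONFLICT` upsert and an assumed timestamp guard (the real schema and ordering rule will differ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shipments (
        shipment_id TEXT PRIMARY KEY,
        milestone   TEXT,
        updated_at  TEXT
    )
""")

def upsert_shipment(row: dict) -> None:
    # ON CONFLICT makes replays safe: a repeated payload overwrites
    # rather than duplicating, and the updated_at guard ignores
    # stale updates that arrive out of order.
    conn.execute(
        """
        INSERT INTO shipments (shipment_id, milestone, updated_at)
        VALUES (:shipment_id, :milestone, :updated_at)
        ON CONFLICT (shipment_id) DO UPDATE SET
            milestone  = excluded.milestone,
            updated_at = excluded.updated_at
        WHERE excluded.updated_at >= shipments.updated_at
        """,
        row,
    )

upsert_shipment({"shipment_id": "SHP-1", "milestone": "BOOKED", "updated_at": "2024-01-01"})
upsert_shipment({"shipment_id": "SHP-1", "milestone": "DELIVERED", "updated_at": "2024-01-05"})
# A stale replay arriving late does not regress the row:
upsert_shipment({"shipment_id": "SHP-1", "milestone": "IN_TRANSIT", "updated_at": "2024-01-03"})
```

The guard clause is what turns "insert or update" into "insert or update, but never move backwards," which is the property replays actually need.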
Third, I materialized calculated state into its own model. Operationally useful answers like current milestone, delivered date, booking confirmation, or tracking URL should not require every consumer to replay raw event history. Materializing those derived answers created one trustworthy place for downstream systems to read from.
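A toy version of that materialization, assuming invented event codes and field names, shows the shape of the idea:

```python
# Derive operational answers once from the event history instead of
# making every consumer replay it. Event codes are illustrative.

DELIVERED_CODES = {"DLV", "POD"}

def materialize_state(events: list[dict]) -> dict:
    """Fold an event history into a small derived-state record."""
    state = {"current_milestone": None, "delivered_at": None}
    for event in sorted(events, key=lambda e: e["occurred_at"]):
        state["current_milestone"] = event["code"]
        if event["code"] in DELIVERED_CODES:
            state["delivered_at"] = event["occurred_at"]
    return state

events = [
    {"code": "DEP", "occurred_at": "2024-01-02"},
    {"code": "DLV", "occurred_at": "2024-01-07"},
]
```

Writing the folded result to its own table gives every downstream consumer the same answer, computed in one place.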
Fourth, I normalized reference data and relationship handling. Shipment relationships, charge references, event lookups, and similar supporting structures were treated as part of the model, not incidental glue. That mattered because the upstream source was not consistent enough to leave those decisions to each consumer.
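The master/house case gives a feel for why this belongs in the model. A sketch, with assumed field names (the upstream payload was not this tidy):

```python
# Upstream expresses master/house links inconsistently; the
# normalization layer resolves them into one explicit relationship
# row per pair, so no consumer re-derives the hierarchy itself.

def extract_relationships(payload: dict) -> list[dict]:
    # Tolerate the two key spellings seen upstream (an assumption
    # here, standing in for the real inconsistency).
    master = payload.get("master_bill") or payload.get("masterBill")
    houses = payload.get("house_bills") or []
    if isinstance(houses, dict):  # singleton object, not a list
        houses = [houses]
    return [
        {"parent": master, "child": h["number"], "kind": "master-house"}
        for h in houses
        if master
    ]

rows = extract_relationships(
    {"masterBill": "MBL-1", "house_bills": {"number": "HBL-9"}}
)
```

Once relationships are rows, "find every house bill under this master" is a join, not a parsing exercise.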
Finally, I preserved the ability to grow. Once the payload was normalized into stable tables and derived state, additional reports, portals, and sync paths could build on the model without each one becoming its own parsing project.
Validation
Validation meant more than proving rows were written.
I reviewed:
- payloads with singleton-versus-array variations
- shipments with tricky parent-child relationships
- updates that arrived more than once
- event histories that needed calculated-state materialization
- downstream queries that had previously been awkward or fragile
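The duplicate-update case reduces to an idempotency check, sketched here against a stand-in sync function and in-memory store rather than the real write path:

```python
# Replay safety check: syncing the same payload twice must leave
# the store in the same state as syncing it once.

store: dict[str, dict] = {}

def sync(payload: dict) -> None:
    # Stand-in for the real upsert path.
    store[payload["shipment_id"]] = {"milestone": payload["milestone"]}

payload = {"shipment_id": "SHP-1", "milestone": "DELIVERED"}
sync(payload)
once = dict(store)
sync(payload)  # replay
assert store == once  # the replay changed nothing
```

The same assertion style extends to out-of-order replays: apply updates in a scrambled order and assert the final state matches the in-order result.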
I also cared about maintenance validation: could a future engineer understand where a new field belonged, or would they be tempted to stuff it into a generic blob because the system boundary was unclear?
Outcome
The result was a much more usable integration foundation.
- downstream systems got relational, queryable data instead of brittle payload spelunking
- replay-safe ingestion became normal behavior instead of special handling
- calculated state became easier to trust and reuse
- debugging improved because there was a clearer line between source data, normalized state, and derived state
This is one of the strongest staff-level systems stories in the set because it shows judgment about source-of-truth boundaries. Good integration work is not just moving data. It is deciding what shape that data needs to take so the rest of the product can move faster.
Lessons
Normalization is product work, full stop.
It is tempting to treat data modeling as backend plumbing, but the model determines what the rest of the company can ask, automate, and trust. A well-shaped operational model reduces repeated logic, improves debugging, and gives future teams leverage they do not have to rediscover.
That is why I like this kind of work so much. It quietly changes what becomes possible downstream.
If you have one ugly upstream system feeding five cleaner downstream expectations, helping shape that boundary is exactly the kind of systems work I enjoy. Let’s talk.