Context

Logistics ops teams don’t lack solutions — they lack instant access to the solutions already buried in old tickets, Slack threads, and tribal memory.

That was the motivation for this project. The goal was not to build a chatbot for its own sake. It was to reduce repeated lookup work and make first-pass issue triage more consistent when recurring problems showed up: missing milestones, inconsistent provider events, charge mismatches, and timing exceptions.

The audience was not an ML team. It was operations staff who needed grounded recommendations they could act on quickly. I built the system with tools the team could actually maintain: n8n for orchestration, Qdrant for retrieval, and model routing designed around practical cost and latency.

Problem

Before this project, issue triage had two recurring costs.

  1. Retrieval cost — Finding prior similar cases took too long and relied too heavily on individual memory.
  2. Consistency cost — The same issue pattern produced different recommendations depending on who handled it.

In a high-throughput ops environment, those costs compound quickly into escalations, context switching, and avoidable delays.

A prompt-only assistant would not have solved that problem well. It might have produced confident-sounding answers, but they would have been based on generic model priors rather than internal case history. We needed retrieval grounded in actual operational records.

Constraints

I had to design for production constraints, not demo conditions.

  • Historical issue data varied in quality and structure.
  • Users needed trustworthy suggestions, not polished hallucinations.
  • The workflow had to fit existing channels and habits.
  • Cost and latency had to stay practical for daily usage.

There was also a change-management constraint. If the first experience felt unreliable, adoption would collapse. That pushed me to prioritize grounding, clear answer structure, and explicit uncertainty over flashy conversational behavior.

What I changed

I built the system as a retrieval-first workflow rather than a prompt-only assistant.

Historical issues were cleaned, segmented, embedded, and stored in Qdrant with metadata that made retrieval useful in practice. At query time, the workflow retrieves similar prior cases, assembles concise context, and prompts the model to generate recommendation-style output tied to retrieved evidence.
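As a rough sketch of that ingest-and-retrieve shape: the real system stores embeddings in Qdrant and uses a proper embedding model, but the core idea, normalized vectors plus metadata-filtered top-k similarity, can be shown with a self-contained stand-in. The hashed bag-of-words "embedding" below is a toy placeholder, and all names are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class CaseRecord:
    """One normalized historical issue, ready for indexing."""
    case_id: str
    text: str
    metadata: dict
    vector: list = field(default_factory=list)

def toy_embed(text: str, dims: int = 64) -> list:
    """Stand-in for a real embedding model: hashed bag-of-words, unit-normalized."""
    vec = [0.0] * dims
    for token in text.lower().split():
        vec[hash(token) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def index_cases(cases):
    """Embed each cleaned case; in production this would be a Qdrant upsert."""
    for case in cases:
        case.vector = toy_embed(case.text)
    return cases

def retrieve(index, query: str, top_k: int = 3, metadata_filter=None):
    """Cosine similarity over unit vectors reduces to a dot product."""
    qv = toy_embed(query)
    scored = []
    for case in index:
        if metadata_filter and any(
            case.metadata.get(k) != v for k, v in metadata_filter.items()
        ):
            continue
        score = sum(a * b for a, b in zip(qv, case.vector))
        scored.append((score, case))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case for _, case in scored[:top_k]]
```

The metadata filter is what made retrieval useful in practice: restricting the search to the right issue category keeps a milestone query from surfacing billing precedents.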

n8n handled orchestration across the whole path, including:

  • intake normalization
  • retrieval query construction
  • top-k result filtering
  • prompt assembly
  • model routing
  • response formatting for ops usability
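The steps above can be mirrored as a chain of small functions. This is an illustrative sketch only: the real flow runs as n8n nodes rather than Python, and the field names and score threshold here are hypothetical.

```python
def normalize_intake(raw: dict) -> dict:
    """Strip whitespace noise and standardize fields before querying."""
    return {
        "issue_text": " ".join(raw.get("issue_text", "").split()),
        "category": raw.get("category", "unknown").lower(),
    }

def build_query(intake: dict) -> str:
    """Fold the category into the retrieval query text."""
    return f"[{intake['category']}] {intake['issue_text']}"

def filter_results(hits: list, min_score: float = 0.3) -> list:
    """Drop weak matches so the prompt only carries real precedent."""
    return [h for h in hits if h["score"] >= min_score]

def assemble_prompt(intake: dict, hits: list) -> str:
    """Concise evidence block plus the issue, ready for the model call."""
    context = "\n".join(f"- {h['case_id']}: {h['summary']}" for h in hits)
    return (
        "Recommend a next step for the issue below, citing only the "
        "retrieved cases.\n\n"
        f"Issue: {intake['issue_text']}\n\n"
        f"Retrieved cases:\n{context or '(none)'}"
    )

def triage(raw: dict, search_fn) -> str:
    """End-to-end path: intake -> query -> retrieve -> filter -> prompt."""
    intake = normalize_intake(raw)
    hits = filter_results(search_fn(build_query(intake)))
    return assemble_prompt(intake, hits)
```

Keeping each stage as its own step is what makes the n8n version inspectable: when an answer goes wrong, you can see exactly which stage, intake, retrieval, filtering, or assembly, produced the bad input.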

I also used a mixed-model approach. Routine cases could use cheaper models, while more ambiguous prompts could be routed to stronger models when the extra reasoning justified the added cost and latency.
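One way to make that routing decision concrete is to score ambiguity from the retrieval results themselves: a weak best match, or several matches with similar scores, both suggest the case is harder than usual. The heuristic and thresholds below are illustrative, not the production values.

```python
def estimate_ambiguity(query: str, top_scores: list) -> float:
    """Heuristic: low or flat similarity scores suggest a harder case."""
    if not top_scores:
        return 1.0  # nothing retrieved: maximally ambiguous
    best = max(top_scores)
    spread = best - (sum(top_scores) / len(top_scores))
    # A weak best match and little separation between hits both raise ambiguity.
    return max(0.0, min(1.0, (1.0 - best) + (0.2 - spread)))

def pick_model(query: str, top_scores: list, threshold: float = 0.6) -> str:
    """Route routine cases to the cheap model, ambiguous ones upward."""
    if estimate_ambiguity(query, top_scores) > threshold:
        return "strong-model"
    return "cheap-model"
```

The useful property is that routing cost scales with actual difficulty: clear precedent stays cheap and fast, and the stronger model is only paid for when the evidence is thin.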

Just as important, I added prompt guardrails that pushed the assistant to stay grounded: reference retrieved context, acknowledge uncertainty when evidence is thin, and avoid projecting confidence it cannot support.
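In practice those guardrails live in the system prompt. The wording below is a representative sketch, not the production prompt, but it shows the pattern: cite evidence inline, surface low confidence explicitly, and force a fixed answer structure ops staff can scan.

```python
GUARDRAIL_SYSTEM_PROMPT = """\
You are a triage assistant for logistics operations.

Rules:
1. Base every recommendation on the retrieved cases provided below, and
   cite their case IDs inline.
2. If the retrieved cases do not cover the issue, say so explicitly and
   list the closest precedents instead of guessing.
3. Never state a root cause or fix with confidence unless at least one
   retrieved case supports it.
4. Answer in this structure: Likely cause, Suggested next step,
   Supporting cases, Confidence.
"""

def render_prompt(issue: str, cases: list) -> str:
    """Attach the evidence block; make an empty retrieval visible, not silent."""
    evidence = "\n".join(f"[{c['id']}] {c['summary']}" for c in cases)
    evidence = evidence or "(no cases retrieved)"
    return f"{GUARDRAIL_SYSTEM_PROMPT}\nRetrieved cases:\n{evidence}\n\nIssue: {issue}"
```

Rendering "(no cases retrieved)" rather than omitting the section matters: the model is told evidence is absent, instead of being left free to improvise around a missing block.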

The aim was not to sound smart. It was to give ops staff a useful next step faster than searching manually through tickets and message history.

Validation

Validation happened on two levels: technical quality and real usage.

On the technical side, I spot-checked retrieval relevance, tested edge-case queries, and reviewed outputs against known issue outcomes. The main question was simple: was the assistant actually using retrieved internal precedent, or drifting into generic advice?
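Part of that spot-checking can be mechanized. A minimal sketch, assuming answers cite case IDs in a `CASE-123` format (a convention I am inventing here for illustration): compare what the answer cites against what was actually retrieved, and flag both uncited answers and references to cases that were never in the context.

```python
import re

def grounding_check(answer: str, retrieved_ids: set) -> dict:
    """Flag answers that cite nothing, or cite cases that were not retrieved."""
    cited = set(re.findall(r"CASE-\d+", answer))
    return {
        "cited_any": bool(cited),
        "hallucinated_refs": sorted(cited - retrieved_ids),
        "used_refs": sorted(cited & retrieved_ids),
    }
```

A nonempty `hallucinated_refs` list is exactly the drift the spot-checks were hunting for: an answer that looks grounded but leans on precedent the retrieval layer never supplied.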

On the operational side, I looked at whether people kept using it after rollout and whether the answers were understandable without ML expertise. That mattered more than novelty.

The strongest validation signal was continued production usage by about 30 operations staff. The system was also seeded with more than 500 historical issues, which gave the retrieval layer enough real precedent to be useful across recurring problem categories.

I am still keeping performance claims directional here. The system clearly improved first-pass triage speed and consistency in practice, but I have not attached a formal before-and-after time study to this page yet.

Outcome

The system is in production as an AI-augmented support layer, not an autonomous replacement.

Directional outcomes observed through usage and team feedback:

  • faster first-pass triage for recurring issue types
  • more consistent recommendations across staff
  • lower dependency on individual memory for common problem patterns

It also shifted internal perception of AI work. Instead of reading like a demo, the system proved itself as a workflow tool because it solved a real retrieval bottleneck with constrained, operations-aware output.

Tradeoffs and lessons

Retrieval quality drives answer quality. If chunking, metadata, and indexing are weak, no prompt can rescue the results consistently.

The second lesson was that trust depends on explicit uncertainty handling. In operations, “I am not sure, here are the closest precedents” is better than a polished wrong answer.

The third lesson was that adoption depends on workflow fit. Keeping orchestration in n8n and focusing on response usability mattered as much as model selection.

The main tradeoff is ongoing maintenance. New issues have to be ingested and normalized regularly or retrieval quality will decay over time.

What I’d improve next

If I took this to the next iteration, I would focus on four things:

  1. formal triage-time benchmarking before and after by issue category
  2. confidence scoring tied to retrieval density and similarity
  3. a stronger feedback loop so user corrections improve ranking and prompt behavior
  4. lightweight citation snippets in every answer to make verification faster
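For the second item, confidence scoring, one plausible shape is to blend retrieval density (how many solid hits came back) with the best similarity score. The weights and thresholds below are placeholder assumptions, not tuned values.

```python
def confidence_score(similarities: list, min_hits: int = 3) -> float:
    """Blend retrieval density (count of solid hits) with best similarity.

    similarities: cosine scores of the retrieved cases, in [0, 1].
    Returns a score in [0, 1]; higher means more supporting precedent.
    """
    if not similarities:
        return 0.0
    strong = [s for s in similarities if s >= 0.5]
    density = min(len(strong) / min_hits, 1.0)
    return round(0.5 * max(similarities) + 0.5 * density, 3)
```

The point of the density term is that one excellent match and three good matches should not score the same: an answer backed by several independent precedents deserves more confidence than one backed by a single lookalike case.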

That would move the system from useful in practice to measurable and governable over time.

If you want AI systems that operations teams actually use in production, I can build that.

FAQ

Questions I usually get about this work.

What makes a production RAG system different from a chatbot demo?

Production RAG requires controlled retrieval, fallback behavior, versioned prompts, cost-aware model routing, and operational guardrails that make output inspectable and improvable over time.

How do you keep AI recommendations trustworthy for operations teams?

By grounding responses in retrieved internal case history, handling low-confidence cases honestly, and making every stage of the workflow observable so weak spots can be fixed instead of hidden.

What tools did you use to build this system?

n8n for orchestration, Qdrant for vector retrieval, and mixed-model routing for cost and latency control. The stack was chosen for inspectability and maintainability by non-ML engineers.