
A PE-owned e-commerce company was entirely dependent on a third-party BPO to perform a core back-office operation: matching inbound shopping lists — arriving as images, PDFs, spreadsheets, and other unstructured formats — to its internal standard product taxonomy. The offshore team had accumulated all the institutional knowledge for this process, creating deep vendor lock-in and fragility. Turnaround time exceeded 24 hours per batch, degrading the customer experience. Without change, the business would remain hostage to an expensive, slow, and opaque external dependency with no path to operational ownership or cost reduction.
Fractional AI designed and deployed a custom generative AI pipeline that automated the end-to-end shopping list processing workflow, from ingesting unstructured inputs to outputting items mapped against the company's standard product taxonomy. The system used OpenAI models for the majority of processing, Gemini for text extraction, and Claude for select intermediate steps. Critically, evaluation infrastructure was established early to measure accuracy objectively and guide iteration, and it showed that the AI system was outperforming the BPO before go-live. A confidence-scoring layer flagged low-certainty outputs for human review, and a feedback loop saved corrections back to the database, allowing the system to improve automatically over time. The unexpected outcome: the company discovered for the first time how inaccurate the BPO's manual work had actually been.
Fractional AI began by establishing evaluation infrastructure before building the solution — a deliberate choice that allowed the team to benchmark accuracy objectively and demonstrate the AI's performance against the BPO baseline before go-live.
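To make the evaluation-first approach concrete, here is a minimal Python sketch of the kind of benchmark it implies: each system (the incumbent BPO and the AI pipeline) is scored against the same human-verified gold set. The `LabeledItem` schema, the taxonomy codes, and the choice of exact-match accuracy are assumptions for illustration; the case study does not describe the actual metric, and the sample data are invented, not results from the engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledItem:
    """One shopping-list line with its human-verified taxonomy code (hypothetical schema)."""
    item_id: str
    gold_code: str


def accuracy(gold: list[LabeledItem], predictions: dict[str, str]) -> float:
    """Exact-match accuracy of `predictions` (item_id -> taxonomy code) on the gold set.

    Items with no prediction count as wrong, so a system cannot raise its
    score by silently skipping hard items.
    """
    if not gold:
        raise ValueError("gold set is empty")
    correct = sum(1 for item in gold if predictions.get(item.item_id) == item.gold_code)
    return correct / len(gold)


def compare(gold: list[LabeledItem], systems: dict[str, dict[str, str]]) -> dict[str, float]:
    """Score several systems (e.g. the incumbent BPO and the AI pipeline) on one gold set."""
    return {name: accuracy(gold, preds) for name, preds in systems.items()}


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    gold = [
        LabeledItem("a1", "DAIRY/MILK/WHOLE"),
        LabeledItem("a2", "PRODUCE/FRUIT/APPLE"),
        LabeledItem("a3", "BAKERY/BREAD/SOURDOUGH"),
        LabeledItem("a4", "PANTRY/PASTA/PENNE"),
    ]
    bpo = {"a1": "DAIRY/MILK/WHOLE", "a2": "PRODUCE/FRUIT/PEAR", "a4": "PANTRY/PASTA/PENNE"}
    ai = {
        "a1": "DAIRY/MILK/WHOLE",
        "a2": "PRODUCE/FRUIT/APPLE",
        "a3": "BAKERY/BREAD/SOURDOUGH",
        "a4": "PANTRY/PASTA/FUSILLI",
    }
    print(compare(gold, {"bpo": bpo, "ai": ai}))  # → {'bpo': 0.5, 'ai': 0.75}
```

Scoring both systems against one fixed gold set is what lets a gap like the BPO's hidden error rate surface at all; without a shared yardstick, neither number means much.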
The pipeline was built on AWS to ingest the full range of input formats the client received: shopping lists arriving as images, PDFs, and spreadsheets with no standardized structure. Google Gemini handled text extraction across these diverse formats; OpenAI models performed the core taxonomy classification work; Claude handled select intermediate processing steps. Each model was chosen for the specific subtask it performs most reliably.
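One way to keep "each model on the subtask it does best" swappable is to treat every stage as an injected function. The sketch below assumes three stages (extraction, an intermediate normalization step, and classification); the stage names, signatures, and the normalization role assigned to the intermediate step are hypothetical, since the case study only says Claude handled "select" intermediate steps. The stubs stand in for real model calls so the example runs offline; no vendor SDK calls are shown.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures. In production each callable would wrap one
# vendor SDK call; here they are plain functions so the sketch runs offline.
Extractor = Callable[[bytes, str], list[str]]    # (raw file bytes, MIME type) -> text lines
Normalizer = Callable[[str], str]                # raw line -> cleaned item description
Classifier = Callable[[str], tuple[str, float]]  # description -> (taxonomy code, confidence)


@dataclass
class MappedItem:
    raw_line: str
    description: str
    taxonomy_code: str
    confidence: float


@dataclass
class ShoppingListPipeline:
    extract: Extractor      # text extraction stage (Gemini, per the case study)
    normalize: Normalizer   # intermediate step (role assumed for this sketch)
    classify: Classifier    # taxonomy classification stage (OpenAI, per the case study)

    def run(self, payload: bytes, mime_type: str) -> list[MappedItem]:
        items: list[MappedItem] = []
        for line in self.extract(payload, mime_type):
            if not line.strip():
                continue  # drop blank lines left over from extraction
            description = self.normalize(line)
            code, confidence = self.classify(description)
            items.append(MappedItem(line, description, code, confidence))
        return items


# Offline stubs standing in for the model calls.
def stub_extract(payload: bytes, mime_type: str) -> list[str]:
    return payload.decode("utf-8").splitlines()  # pretend every input is plain text


def stub_normalize(line: str) -> str:
    return " ".join(line.lower().split())  # lowercase and collapse whitespace


def stub_classify(description: str) -> tuple[str, float]:
    known = {"whole milk": ("DAIRY/MILK/WHOLE", 0.97)}
    return known.get(description, ("UNMAPPED", 0.0))


if __name__ == "__main__":
    pipeline = ShoppingListPipeline(stub_extract, stub_normalize, stub_classify)
    for item in pipeline.run(b"Whole  Milk\n\nsourdough loaf\n", "text/plain"):
        print(item.raw_line, "->", item.taxonomy_code, item.confidence)
```

Because the pipeline only depends on the stage signatures, replacing one model with another for a given subtask is a one-argument change, and the evaluation harness can score each combination the same way.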
A confidence-scoring layer on top of the pipeline flagged outputs below a certainty threshold for human review, so the system could scale without unacceptable error rates. A feedback loop saved human corrections back to the database, allowing the system to improve its accuracy automatically over time.
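The threshold routing can be sketched in a few lines. The case study states neither how confidence is computed nor where the threshold sits, so the `Prediction` type, the threshold values, and the `tradeoff` helper are hypothetical. The helper shows why the threshold matters for scaling: on a labeled set, it reports how much work goes to humans versus how many errors slip through automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    item_id: str
    taxonomy_code: str
    confidence: float  # assumed to be calibrated to [0, 1]


def route(predictions: list[Prediction], threshold: float) -> tuple[list[Prediction], list[Prediction]]:
    """Split predictions into (auto_accepted, needs_review) by a confidence threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    accepted: list[Prediction] = []
    review: list[Prediction] = []
    for p in predictions:
        (accepted if p.confidence >= threshold else review).append(p)
    return accepted, review


def tradeoff(predictions: list[Prediction], gold: dict[str, str], threshold: float) -> tuple[float, float]:
    """Return (review_rate, error_rate_among_auto_accepted) on a labeled set.

    Raising the threshold sends more items to humans and, if confidence is
    informative, lowers the error rate of what is accepted automatically.
    """
    if not predictions:
        return 0.0, 0.0
    accepted, review = route(predictions, threshold)
    review_rate = len(review) / len(predictions)
    errors = sum(1 for p in accepted if gold.get(p.item_id) != p.taxonomy_code)
    error_rate = errors / len(accepted) if accepted else 0.0
    return review_rate, error_rate


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    preds = [
        Prediction("a", "DAIRY/MILK/WHOLE", 0.95),
        Prediction("b", "PRODUCE/FRUIT/APPLE", 0.60),
        Prediction("c", "PANTRY/PASTA/FUSILLI", 0.90),
    ]
    gold = {"a": "DAIRY/MILK/WHOLE", "b": "PRODUCE/FRUIT/APPLE", "c": "PANTRY/PASTA/PENNE"}
    for t in (0.5, 0.85, 0.92):
        print(t, tradeoff(preds, gold, t))
```

Sweeping the threshold on the evaluation set, rather than picking it by intuition, is what turns "acceptable error rate" into a number the client can sign off on.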
The engagement took 2–4 months. A parallel QA layer was retained at the client's discretion. The unexpected finding from the accuracy benchmarking was that the BPO had been performing worse than assumed, a fact that had been invisible without a measurement system in place.
Infrastructure
- AWS (hosting and orchestration environment)
- Client's proprietary internal product taxonomy database
Integration Points
- OpenAI API — primary taxonomy classification and product mapping
- Google Gemini API — text extraction from unstructured inputs (images, PDFs, spreadsheets)
- Claude API — select intermediate processing steps
- Confidence-scoring layer routing flagged outputs to human review queue
- Feedback loop writing human corrections back to the taxonomy database for continuous improvement
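The case study says human corrections are written back to the database but not how they feed into later predictions. One simple mechanism, assumed here for illustration, is an exact-match override table consulted before the model is called. The sketch uses Python's built-in `sqlite3` as a stand-in for the client's database; the table name, columns, and the `classify_with_feedback` helper are hypothetical. Other plausible designs (using stored corrections as few-shot examples or as fine-tuning data) are not shown.

```python
import sqlite3
from typing import Callable, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS corrections (
    description   TEXT PRIMARY KEY,  -- normalized shopping-list text
    taxonomy_code TEXT NOT NULL,     -- code confirmed by a human reviewer
    reviewer      TEXT NOT NULL,
    corrected_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_correction(conn: sqlite3.Connection, description: str, taxonomy_code: str, reviewer: str) -> None:
    """Record a reviewer's fix; a later correction for the same description replaces the earlier one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO corrections (description, taxonomy_code, reviewer) VALUES (?, ?, ?)",
            (description, taxonomy_code, reviewer),
        )


def lookup_correction(conn: sqlite3.Connection, description: str) -> Optional[str]:
    row = conn.execute(
        "SELECT taxonomy_code FROM corrections WHERE description = ?", (description,)
    ).fetchone()
    return row[0] if row else None


def classify_with_feedback(
    conn: sqlite3.Connection,
    description: str,
    model_classify: Callable[[str], tuple[str, float]],
) -> tuple[str, float, str]:
    """Prefer a human-confirmed code; otherwise fall back to the model.

    Returns (taxonomy_code, confidence, source). Treating a human correction
    as confidence 1.0 is an assumption of this sketch.
    """
    code = lookup_correction(conn, description)
    if code is not None:
        return code, 1.0, "human"
    model_code, confidence = model_classify(description)
    return model_code, confidence, "model"


if __name__ == "__main__":
    store = open_store()
    weak_model = lambda d: ("UNMAPPED", 0.2)  # stub for a low-confidence model answer
    print(classify_with_feedback(store, "oat milk", weak_model))
    save_correction(store, "oat milk", "DAIRY_ALT/MILK/OAT", "reviewer-1")
    print(classify_with_feedback(store, "oat milk", weak_model))
```

Once a reviewer fixes an item, the same description no longer reaches the review queue, which is one concrete sense in which the loop "improves automatically."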




