BazBiff


How to Audit and Fix Your FMCG Master Data Without a Full Migration

SHS Group cut master data fields from 227 to 70, a 69% effort reduction. Here's how to audit and fix your FMCG product data without a PIM migration.

18 May 2026 | 13 min read | By BazBiff Team

You don't need a six-figure PIM project to get your product data ready for AI. SHS Group reduced their master data from 227 fields to just 70 by implementing governance workflows, cutting effort by 69% (Bluestonex, 2025). That's the kind of result you can replicate with a focused audit of your most important SKUs.

This guide walks through a five-step process for brands that need clean data now but aren't ready for a full system migration. It's built for ops directors and founders at UK food and drink businesses who need their data AI-ready without blowing the budget on enterprise tooling.

Related: data readiness framework

The Bottom Line:

  • Audit your top 500 SKUs first; don't boil the ocean
  • SHS Group cut 227 master data fields to 70, reducing effort by 69% (Bluestonex, 2025)
  • 40-60% of your operational data is likely already AI-ready but siloed (iFactory, 2026)
  • Governance rules matter more than the tool you use
  • You can complete this in 6-8 weeks without a PIM

Why don't you need a full migration?

Most FMCG brands assume fixing master data means buying a PIM and migrating everything. For most mid-market food and drink businesses, that assumption is wrong. According to Veeva's industry research, 82% of CPG companies are consolidating legacy systems (Veeva, 2026), but consolidation doesn't have to mean a full migration.

Citation capsule: SHS Group, a drinks distributor, reduced their master data management effort by 69% by trimming 227 fields to 70 essential ones through governance workflows rather than a full platform migration (Bluestonex, September 2025).

In most food brands we work with, the master data lives in SAP, a shared drive, and at least one person's head. The instinct is to unify everything into one tool. But if your goal is getting AI working on demand forecasting or automated reporting, you only need a subset of that data to be clean and connected.

The real question isn't "how do I migrate?" It's "what's the minimum viable data layer I need to get value from automation?"

Here's what that looks like in practice: a focused audit, a canonical format, and lightweight governance rules. No six-month IT project. No new platform purchase.

Related: why spreadsheets become a bottleneck


Step 1: How do you define your critical data scope?

Start with your top 500 SKUs by revenue. ColdAI's research recommends a schema audit of your top 500 SKUs as the first step, costing $60K-$120K at enterprise scale (ColdAI, 2026). For mid-market brands, you can do this internally for a fraction of that.

Why 500? Because in a typical UK food and drink brand with 800-2,000 active SKUs, the top 500 represent 75-90% of revenue. Fix those and your AI tools have the data they need.

Identify your working SKU set

Pull your top 500 by annual revenue from your ERP or order management system. If you don't have that easily, ask your sales team. They'll know the list.
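If your ERP or order system can export revenue per SKU, the selection is just a sort and a slice. The sketch below assumes a hypothetical export of (SKU code, annual revenue) pairs; the function name and sample codes are illustrative. It also reports the revenue share the selected set covers, so you can check the 75-90% rule of thumb against your own range.

```python
from typing import List, Tuple


def top_skus_by_revenue(
    sales: List[Tuple[str, float]], n: int = 500
) -> Tuple[List[str], float]:
    """Return the top-n SKU codes by revenue and their share of total revenue.

    `sales` is a hypothetical export of (sku_code, annual_revenue) pairs,
    one row per SKU, e.g. pulled from an ERP report.
    """
    ranked = sorted(sales, key=lambda row: row[1], reverse=True)
    selected = ranked[:n]
    total = sum(revenue for _, revenue in sales)
    covered = sum(revenue for _, revenue in selected)
    share = covered / total if total else 0.0
    return [sku for sku, _ in selected], share


# Usage with a tiny made-up range: take the top 2 of 4 SKUs.
example = [("FIN-001", 120_000.0), ("FIN-002", 30_000.0),
           ("FIN-003", 40_000.0), ("FIN-004", 10_000.0)]
skus, share = top_skus_by_revenue(example, n=2)
print(skus, f"{share:.0%}")  # ['FIN-001', 'FIN-003'] 80%
```

In practice you would pass your full export with the default n=500.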

Then identify which of these SKUs appear in your priority workflows:

  • Demand forecasting: needs product name, case size, shelf life, lead time, category
  • Trade promotion analysis: needs price, margin, retailer codes, promo history
  • Automated reporting: needs canonical product name, hierarchy, supplier

We typically see brands complete the top-500 audit in 2-3 weeks when one person owns it. The blockers are almost never technical. They're about finding the person who knows which spreadsheet holds the "real" supplier codes versus the outdated ones in the ERP.

Map fields to use cases

Don't audit all possible fields. Map backwards from your use case.

SHS Group's experience is instructive here. They had 227 fields across their systems. After mapping fields to actual operational needs, only 70 were required. That's a 69% reduction in maintenance burden.
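The arithmetic behind that figure is straightforward. Going from 227 fields to 70 removes 157 fields, which is about 69% of the original field count, the same proportion quoted above:

```latex
\frac{227 - 70}{227} = \frac{157}{227} \approx 0.692 \approx 69\%
```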

Ask yourself: which 15-20 fields does each priority workflow actually consume? Write those down. That's your audit scope.
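Mapping backwards can be as simple as listing each workflow's fields and taking the union: anything outside that union is out of audit scope for now. The sketch below uses hypothetical snake_case field names derived from the Step 1 list; they are placeholders, not a standard schema.

```python
from typing import Dict, List, Set

# Hypothetical mapping of priority workflows to the fields they consume,
# based on the Step 1 list. Rename to match your own sheet or ERP columns.
WORKFLOW_FIELDS: Dict[str, List[str]] = {
    "demand_forecasting": ["product_name", "case_size", "shelf_life",
                           "lead_time", "category"],
    "trade_promotion_analysis": ["price", "margin", "retailer_codes",
                                 "promo_history"],
    "automated_reporting": ["canonical_product_name", "hierarchy", "supplier"],
}


def audit_scope(workflows: Dict[str, List[str]]) -> List[str]:
    """Return the sorted union of fields used by any priority workflow."""
    fields: Set[str] = set()
    for used in workflows.values():
        fields.update(used)
    return sorted(fields)


def unused_fields(all_fields: List[str], workflows: Dict[str, List[str]]) -> List[str]:
    """Fields that exist in your systems but no priority workflow reads."""
    in_scope = set(audit_scope(workflows))
    return [f for f in all_fields if f not in in_scope]


print(len(audit_scope(WORKFLOW_FIELDS)))  # 12
```

Running unused_fields against a full column list from your ERP gives you the candidates for trimming, in the same spirit as SHS Group's 227-to-70 exercise.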


Step 2: How do you audit what you have across systems?

iFactory's analysis found that 40-60% of operational data in FMCG businesses is already AI-ready but sits in disconnected silos (iFactory, 2026). Your data is probably better than you think. It's just scattered.

Catalogue your data sources

For each of your top 500 SKUs, document where master data currently lives. Common sources in UK food and drink businesses:

  1. ERP (SAP, NetSuite, Sage): product codes, pricing, lead times
  2. Shared drives: spec sheets, certifications, artwork briefs
  3. Retailer portals: GTINs, listing data, packaging info
  4. People's heads: "ring Dave, he knows the recipe weights"

Create a simple matrix: rows are your priority fields, columns are your data sources. Mark where each field currently lives and which version is authoritative.

Figure: Authoritative source matrix for auditing FMCG master data fields across systems
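Once the matrix exists, the two problems worth flagging are fields with no agreed authoritative source and fields where two systems both claim to be authoritative. A minimal sketch, assuming you record each cell as either "auth" or "copy"; the field names, sources, and labels are illustrative.

```python
from typing import Dict, List

# Hypothetical matrix: field -> {source name: role}. "auth" marks the source
# you treat as authoritative; "copy" marks a secondary or outdated copy.
MATRIX: Dict[str, Dict[str, str]] = {
    "supplier_code": {"ERP": "copy", "Shared drive": "auth"},
    "gtin": {"ERP": "auth", "Retailer portal": "auth"},
    "recipe_weight": {"Dave": "copy"},
}


def authority_problems(matrix: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Flag fields with no authoritative source or with more than one."""
    problems: Dict[str, str] = {}
    for field, sources in matrix.items():
        auth: List[str] = [s for s, role in sources.items() if role == "auth"]
        if not auth:
            problems[field] = "no authoritative source"
        elif len(auth) > 1:
            problems[field] = "conflicting sources: " + ", ".join(sorted(auth))
    return problems


print(authority_problems(MATRIX))
# {'gtin': 'conflicting sources: ERP, Retailer portal', 'recipe_weight': 'no authoritative source'}
```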

Score each field for completeness

For each priority field across your top 500 SKUs, score completeness:

  • Green: 90%+ populated, consistent format
  • Amber: 50-89% populated or inconsistent naming
  • Red: Below 50% or exists only in someone's memory
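These thresholds translate directly into a small scoring function. This is a sketch, assuming each field's values are exported as a list with one entry per SKU; whether the format is consistent is passed in as a flag rather than detected.

```python
from typing import List, Optional


def completeness_score(values: List[Optional[str]], consistent_format: bool) -> str:
    """Score one field across the audited SKUs as Green, Amber or Red.

    `values` holds the field's value for each SKU (None or blank = missing).
    `consistent_format` is your own judgement or the result of a separate
    check; this function does not detect format problems itself.
    """
    if not values:
        return "Red"
    populated = sum(1 for v in values if v is not None and v.strip() != "")
    pct = populated / len(values) * 100
    if pct < 50:
        return "Red"
    if pct >= 90 and consistent_format:
        return "Green"
    # 50-89% populated, or 90%+ but with inconsistent naming
    return "Amber"
```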

But what does "consistent format" actually mean here? It means every entry for that field follows the same structure. "500ml" versus "500 ML" versus "0.5L" might all be correct, but they'll break any AI model that tries to parse them.
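One way to test format consistency is to parse every variant into a single amount and base unit, then see how many distinct raw spellings collapse to the same value. A minimal sketch, assuming pack sizes are stored as free text; the unit table is illustrative and deliberately small.

```python
import re
from typing import Optional, Tuple

# Illustrative unit table: each accepted spelling maps to a base unit and factor.
_UNITS = {
    "ml": ("ml", 1.0), "l": ("ml", 1000.0), "ltr": ("ml", 1000.0),
    "g": ("g", 1.0), "kg": ("g", 1000.0),
}
_SIZE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*")


def normalise_size(raw: str) -> Optional[Tuple[float, str]]:
    """Parse strings like '500ml', '500 ML' or '0.5L' into (amount, base unit).

    Returns None when the text cannot be parsed, so the row can go for review.
    """
    match = _SIZE.fullmatch(raw)
    if not match:
        return None
    amount, unit = match.groups()
    key = unit.lower()
    if key not in _UNITS:
        return None
    base, factor = _UNITS[key]
    return float(amount) * factor, base


# Three spellings, one underlying value.
print({normalise_size(s) for s in ["500ml", "500 ML", "0.5L"]})  # {(500.0, 'ml')}
```

If a field has several raw spellings but one normalised value, it is a formatting problem rather than a data problem, and usually the cheapest kind to fix.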


Step 3: How do you fix naming conventions and establish a canonical format?

GS1 UK's GDSN compliance requires a GLN, a GTIN per item, standardised attributes, and a packaging hierarchy as minimum standards (GS1 UK, 2026). Use this as your baseline, then add the fields your specific workflows need.

Define your canonical naming format

Pick one format and enforce it everywhere. For product names in food and drink, a proven structure is:

Brand + Variant + Size + Pack Format

Example: "Acme Organic Tomato Soup 400g Can" not "Tomato soup - organic (Acme) 400g" or "ACME TOM SOUP 400G CN."
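Generating the name from separate columns, rather than typing it by hand, is one way to keep the format consistent. A sketch, assuming brand, variant, size and pack format are held as their own fields; the function name is hypothetical.

```python
def canonical_name(brand: str, variant: str, size: str, pack_format: str) -> str:
    """Build a product name as Brand + Variant + Size + Pack Format.

    Collapses stray whitespace inside and between parts and skips empty
    parts. Capitalisation is left as entered, because brand names have
    their own rules.
    """
    parts = [brand, variant, size, pack_format]
    return " ".join(" ".join(p.split()) for p in parts if p.strip())


print(canonical_name("Acme", "Organic  Tomato Soup", "400g", "Can"))
# Acme Organic Tomato Soup 400g Can
```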

Standardise units and codes

Agree on conventions and document them in a one-page reference:

  • Weights: always grams (g) or kilograms (kg), no "ml" for solids
  • Dates: YYYY-MM-DD format
  • Codes: define prefix logic (e.g., FIN- for finished goods, RAW- for ingredients)
  • Categories: use a maximum three-level hierarchy

Most data cleanup projects fail not because the rules are wrong, but because they're stored in a document nobody reads. We've found the most effective approach is embedding validation rules directly into whatever tool captures the data, even if that tool is just a Google Sheet with data validation dropdowns.
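If your capture tool supports scripts, or you run a check before each sync, the conventions can be encoded as simple rules. A minimal sketch, assuming each row arrives as a dictionary of strings; the field names, the FIN-/RAW- prefixes and the " > " category separator are illustrative choices, not a standard.

```python
import re
from datetime import date
from typing import Dict, List

# Example conventions from the one-page reference; adjust to your own rules.
CODE_PREFIXES = ("FIN-", "RAW-")
CATEGORY_SEPARATOR = " > "  # assumed format, e.g. "Soup > Ambient > Canned"
WEIGHT_PATTERN = re.compile(r"\d+(?:\.\d+)?(?:g|kg)")  # grams or kilograms only
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_iso_date(value: str) -> bool:
    """True for a real calendar date written exactly as YYYY-MM-DD."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def check_conventions(record: Dict[str, str]) -> List[str]:
    """Return the rule breaches for one row; an empty list means it passes."""
    errors: List[str] = []
    if not WEIGHT_PATTERN.fullmatch(record.get("unit_weight", "")):
        errors.append("unit_weight must be a number in g or kg, e.g. '400g'")
    if not is_iso_date(record.get("launch_date", "")):
        errors.append("launch_date must be YYYY-MM-DD")
    if not record.get("product_code", "").startswith(CODE_PREFIXES):
        errors.append("product_code must start with FIN- or RAW-")
    levels = record.get("category", "").split(CATEGORY_SEPARATOR)
    if len(levels) > 3 or not all(level.strip() for level in levels):
        errors.append("category must have one to three non-empty levels")
    return errors
```

The regex check on dates is there because newer Python versions accept other ISO layouts in date.fromisoformat; the regex pins the format, and fromisoformat rejects impossible dates such as 2026-02-30.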

Build your canonical source

You don't need a PIM for this. For brands with fewer than 2,000 SKUs, a well-structured Google Sheet or Airtable base is enough, provided it has:

  • Column-level data validation
  • Dropdown lists for categorical fields
  • A "last verified" date per row
  • Named owner for each section

This becomes your single source of truth. Everything else syncs from here, or references it.

Figure: Six-to-eight-week FMCG master data cleanup timeline

Step 4: How do you create lightweight governance for new entries?

M&S Food initiated a 2.5-year data quality programme in March 2021 and now runs automated dashboards that "take seconds instead of weeks" (M&S Food, 2021-2026). You don't need 2.5 years. But you do need rules for what happens when someone adds a new SKU.

Define entry and change workflows

Governance doesn't mean bureaucracy. It means answering three questions:

  1. Who can create a new SKU record? (Usually one or two people)
  2. What's required before a record goes live? (Minimum field checklist)
  3. Who reviews changes to existing records? (Owner per data section)

Write these rules in fewer than 500 words. Pin them wherever your team creates new entries.

Set a minimum field checklist for new SKUs

Before any new product goes into your canonical source, require at minimum:

  • GTIN (or placeholder pending GS1 allocation)
  • Canonical product name (in agreed format)
  • Case size and unit weight
  • Category assignment (from your agreed hierarchy)
  • Primary supplier code
  • Shelf life in days
  • Launch date

Anything else can be added later. But everything on this checklist must be complete before the record exists. No exceptions.
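The checklist and the "who can create" rule can be enforced together before a row is accepted. A sketch under assumptions: the column names, the approved-creator set and the PENDING-GS1 placeholder are hypothetical, case size and unit weight are held as two separate columns, and the GTIN check tests length only, not the GS1 check digit.

```python
from typing import Dict, List, Set

# Hypothetical column names for the minimum checklist; case size and unit
# weight are held as two separate columns here.
REQUIRED_FIELDS = [
    "gtin", "canonical_name", "case_size", "unit_weight", "category",
    "primary_supplier_code", "shelf_life_days", "launch_date",
]
GTIN_PLACEHOLDER = "PENDING-GS1"  # assumed marker while awaiting GS1 allocation


def can_go_live(record: Dict[str, str], created_by: str,
                approved_creators: Set[str]) -> List[str]:
    """Return the reasons a new SKU record is blocked; empty means it can go live."""
    reasons: List[str] = []
    if created_by not in approved_creators:
        reasons.append(f"{created_by} is not an approved SKU creator")
    for field in REQUIRED_FIELDS:
        if not record.get(field, "").strip():
            reasons.append(f"missing {field}")
    gtin = record.get("gtin", "").strip()
    # Length check only: GTINs are 8, 12, 13 or 14 digits. The GS1 check digit
    # is not verified in this sketch.
    if gtin and gtin != GTIN_PLACEHOLDER and not (
        gtin.isdigit() and len(gtin) in (8, 12, 13, 14)
    ):
        reasons.append("gtin must be 8, 12, 13 or 14 digits, or the placeholder")
    return reasons
```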

Schedule quarterly reviews

Block 2 hours per quarter to review:

  • New SKUs added since last review (do they follow format?)
  • Fields that have degraded (completeness scores dropping)
  • Retired SKUs that should be archived
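Two of the review checks lend themselves to a quick script: rows whose "last verified" date is older than a quarter, and fields whose completeness score has fallen since the last review. A sketch, assuming you keep each quarter's completeness percentages; the 91-day window approximates a quarter and is not a rule from this guide.

```python
from datetime import date, timedelta
from typing import Dict, List


def stale_rows(last_verified: Dict[str, date], today: date,
               max_age_days: int = 91) -> List[str]:
    """SKU codes whose 'last verified' date is older than roughly a quarter."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(sku for sku, checked in last_verified.items() if checked < cutoff)


def degraded_fields(previous: Dict[str, float], current: Dict[str, float],
                    tolerance: float = 0.0) -> List[str]:
    """Fields whose completeness (0-100) dropped since the last review."""
    return sorted(f for f, pct in current.items()
                  if f in previous and pct < previous[f] - tolerance)
```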

The brands that maintain clean data long-term are the ones that make it someone's named responsibility. Not a team. One person who feels ownership. Even 2 hours per week of dedicated governance prevents the slow rot that makes data projects necessary in the first place.

Related: AI readiness assessment


Step 5: How do you connect the minimum viable data layer to AI?

Once your top 500 SKUs are clean, you have enough to power most operational AI use cases. McKinsey's manufacturing research confirms that data quality, not data volume, is the primary blocker for AI in manufacturing and supply chain operations (McKinsey, 2024).

Expose your canonical source via API or export

Your AI tools need to read from your canonical source. Options from simplest to most robust:

  1. CSV export on schedule: a weekly automated export to a shared location
  2. Google Sheets API: direct read access for AI workflows
  3. Lightweight REST API: a simple middleware layer if you need real-time access
  4. ERP integration: sync back to your ERP if that's where downstream systems pull from

For most brands starting out, option 1 or 2 works fine. Don't over-engineer this.
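For option 1, the export itself needs only the standard library. A minimal sketch, assuming the canonical rows are already loaded as dictionaries; the column names are hypothetical, and scheduling and the destination folder are left to the tools you already use.

```python
import csv
import io
from typing import Dict, List


def export_canonical_csv(rows: List[Dict[str, str]], fieldnames: List[str]) -> str:
    """Serialise canonical rows to CSV text for a scheduled export.

    Write the returned string to your shared location from whatever scheduler
    you already use; that part is not shown. Columns outside `fieldnames` are
    dropped so downstream tools always see the same schema.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Fixing the column list in one place is the design choice that matters here: AI workflows reading the file break far less often when the schema doesn't drift between exports.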

Start with one workflow

Don't try to connect everything at once. Pick your highest-value use case and prove it works with clean data. Common first choices:

  • Automated weekly sales reporting: needs product names, categories, case sizes
  • Demand signal analysis: needs product names, lead times, shelf life, history
  • Supplier performance tracking: needs supplier codes, lead times, order data

Get one workflow running on clean data. Then expand.
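Automated weekly sales reporting is a good first test because the join between sales lines and the canonical source either works or visibly fails. A sketch, assuming weekly sales arrive as (SKU code, cases sold) pairs and the canonical source is keyed by SKU code; both shapes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weekly_sales_by_category(
    sales_lines: List[Tuple[str, int]],
    canonical: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, int], List[str]]:
    """Total cases sold per category, using the canonical source as the lookup.

    SKUs missing from the canonical source are returned separately instead of
    being silently dropped, so gaps in the clean data stay visible.
    """
    totals: Dict[str, int] = defaultdict(int)
    unmatched: List[str] = []
    for sku, cases in sales_lines:
        record = canonical.get(sku)
        if record is None:
            unmatched.append(sku)
            continue
        totals[record["category"]] += cases
    return dict(totals), sorted(set(unmatched))
```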

Related: practical AI guide


What does this cost and how long does it take?

At enterprise scale, ColdAI estimates a schema audit costs $60K-$120K (ColdAI, 2026). For mid-market UK food and drink brands, the cost is mostly internal time. Budget 6-8 weeks and roughly two days per week of one person's capacity, or about 12-16 person-days in total.

Realistic timeline

Phase | Duration | Effort
Define scope (top 500) | Week 1 | 2 days
Audit existing data | Weeks 2-3 | 3-4 days
Fix naming and build canonical source | Weeks 3-5 | 4-5 days
Create governance rules | Weeks 5-6 | 1-2 days
Connect to first AI workflow | Weeks 6-8 | 2-3 days

Total internal effort: approximately 12-16 person-days spread over 6-8 weeks.
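The total is simply the sum of the low and high ends of each phase, which is easy to re-run if you adjust the table for your own team. The phase names and ranges below are copied from the timeline table.

```python
# Effort ranges per phase in person-days, copied from the timeline table.
PHASE_EFFORT = {
    "Define scope (top 500)": (2, 2),
    "Audit existing data": (3, 4),
    "Fix naming and build canonical source": (4, 5),
    "Create governance rules": (1, 2),
    "Connect to first AI workflow": (2, 3),
}

low = sum(lo for lo, _ in PHASE_EFFORT.values())
high = sum(hi for _, hi in PHASE_EFFORT.values())
print(f"{low}-{high} person-days")  # 12-16 person-days
```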

External costs (if any)

  • Google Sheets/Airtable: free to low cost
  • Consultant support for schema design: GBP 3,000-8,000
  • Simple API middleware: GBP 2,000-5,000 if needed
  • Full PIM: GBP 30,000-150,000+ (which is why you're avoiding this)

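The point is clear: if your SKU count and channel complexity allow it, you can get most of the value of a PIM project at a fraction of the cost. Even with both optional external costs above, you'd spend roughly GBP 5,000-13,000, against GBP 30,000-150,000+ for a full PIM.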


When do you need a PIM, and when don't you?

With 82% of CPG companies consolidating legacy systems (Veeva, 2026), it's reasonable to ask whether you're just delaying the inevitable. Sometimes you are. But often you're not.

You probably DON'T need a PIM if:

  • Fewer than 2,000 active SKUs
  • Selling through fewer than 3 channels with different data requirements
  • Team of fewer than 5 people touching product data
  • Your primary goal is AI readiness, not multi-channel syndication

You probably DO need a PIM if:

  • 5,000+ active SKUs across multiple geographies
  • Selling through 5+ retailers each with unique data specs
  • More than 10 people creating or editing product records
  • You need automated syndication to retailer portals
  • Compliance requirements demand full audit trails
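Expressed as a rough rule of thumb, the two lists look like the function below. It is a sketch, not a scoring model: any "DO" signal wins, all the "DON'T" thresholds must hold for the opposite answer, and everything in between falls into the middle ground discussed next. It simplifies the lists slightly (for example, it ignores geography), and the field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    active_skus: int
    channels_with_distinct_specs: int  # retailers/channels with their own data requirements
    people_editing: int
    needs_portal_syndication: bool
    needs_full_audit_trail: bool


def pim_recommendation(p: Profile) -> str:
    """Rough reading of the two checklists above; not a procurement rule."""
    needs_pim = (
        p.active_skus >= 5000
        or p.channels_with_distinct_specs >= 5
        or p.people_editing > 10
        or p.needs_portal_syndication
        or p.needs_full_audit_trail
    )
    if needs_pim:
        return "probably needs a PIM"
    if (p.active_skus < 2000 and p.channels_with_distinct_specs < 3
            and p.people_editing < 5):
        return "probably doesn't need a PIM"
    return "middle ground: audit and govern first, then decide"
```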

The honest middle ground

For brands in the 1,000-3,000 SKU range, we've found the best approach is this: do the audit and governance work first regardless. If you later decide you need a PIM, you'll migrate cleaner data into it faster. If you don't, you've still solved most of your problem.

Think of this guide as the prerequisite to any future system purchase, not a permanent workaround.

Related: data readiness before PIM


Frequently asked questions

How long does an FMCG master data audit take?

A focused audit of your top 500 SKUs typically takes 2-3 weeks with one dedicated owner. The full cleanup including governance wraps in 6-8 weeks. M&S Food's larger programme took 2.5 years, but they were working with tens of thousands of SKUs and legacy integrations most mid-market brands won't face.

Do I need a PIM system to fix my FMCG master data?

Not necessarily. Brands with fewer than 2,000 active SKUs and fewer than three channels can use a well-structured spreadsheet with governance rules as their canonical source. SHS Group achieved a 69% effort reduction through governance workflows, not a new platform purchase (Bluestonex, 2025).

Related: full data readiness guide

What fields does a food and drink brand actually need in master data?

GS1 UK's GDSN compliance requires a GLN, a GTIN per item, standardised attributes, and a packaging hierarchy at minimum. Beyond compliance, most AI use cases need: canonical product name, case size, unit weight, shelf life, category hierarchy, supplier code, and lead time. SHS Group found 70 fields covered all operational needs after previously maintaining 227.


Conclusion

Clean master data doesn't require a migration project or an expensive platform. It requires focus, format rules, and governance.

Start with your top 500 SKUs. Audit what you have. Fix naming conventions. Set rules for new entries. Then connect the clean data to one AI workflow and prove the value.

SHS Group proved you can cut effort by 69% through governance alone. iFactory's research confirms 40-60% of your data is probably already usable. The gap between "messy" and "AI-ready" is smaller than most vendors want you to believe.

The hardest part isn't the cleanup itself. It's deciding to start small instead of waiting for the perfect system.

Related: start your AI readiness assessment


Sources