What "Clean Enough" Data Actually Means Before You Deploy AI in FMCG

You've heard it before: "Get your data in order first." It's become a convenient reason to delay AI projects indefinitely. But here's what nobody's telling you: according to iFactory (April 2026), 40-60% of existing operational data in FMCG is already AI-ready. It's just siloed across disconnected systems.

The real question isn't whether your data is clean enough. It's whether it meets the specific threshold for your specific use case. Those thresholds vary enormously. Demand forecasting has different requirements from predictive maintenance, which differs again from reporting automation.

This post defines the exact "clean enough" bar for four common FMCG AI use cases. No vague "improve your data maturity" advice. Concrete thresholds you can check against your current state today.

Related: practical AI guide

The Bottom Line:
- 40-60% of your operational data is likely already AI-ready, just siloed (iFactory, April 2026)
- "Clean enough" thresholds differ per use case: 12-24 months of sales history for forecasting, 6-8 weeks of sensor data for maintenance
- Data plumbing (connecting systems, not cleaning data) accounts for 40-60% of real AI project budgets
- You can close most gaps in 4-8 weeks of focused work

The "perfect data" myth is killing AI projects

McKinsey (January 2023) identifies data quality as the leading limiter on AI value capture, not algorithm choice. But "data quality" doesn't mean perfection. It means fitness for purpose. The distinction matters because too many brands interpret this as "we need to fix everything before we start anything."

That interpretation creates an infinite loop. You delay AI until data is perfect. Without AI projects creating demand for better data, nobody prioritises the fixes. We've seen food brands sit in this loop for two years.

In our experience working with food brands, the ones that actually deploy AI don't have better data than those still waiting. They just defined a lower bar, focused on one use case, and accepted that 85% completeness beats 100% in theory.

Here's the thing most consultancies won't say: Heizen (2026) reports that data plumbing accounts for 40-60% of the real cost of an AI project. That plumbing work, connecting existing sources and mapping fields, isn't something you do before the project. It IS the project. Treating it as a separate prerequisite doubles the timeline.

Related: readiness assessment

Citation capsule: According to McKinsey (January 2023), data quality is the leading limiter on AI value capture in manufacturing, not algorithm choice. Yet iFactory (April 2026) finds 40-60% of FMCG operational data is already AI-ready but siloed across disconnected systems.

What does "clean enough" mean for demand forecasting?

Demand forecasting requires the most historical depth of any common FMCG AI use case. You need 12-24 months of SKU-level sales data with fewer than 10% of weeks missing. That's the threshold. Not three years of spotless daily data, not a perfectly clean master data hierarchy.

The minimum viable dataset

Here's what actually needs to be in place:

12-24 months of weekly sales by SKU. Monthly won't cut it for seasonal pattern detection. Daily is nice but not necessary for most brands under 500 SKUs.
Fewer than 10% missing weeks per SKU. If you've got 104 weeks of data and 8-9 weeks are blank or clearly wrong, that's workable. A model can interpolate gaps. Beyond 10%, accuracy drops sharply.
One consistent product identifier. Whether that's your internal SKU code, an EAN, or a retailer product code, you need one ID that maps the same product across the full history.
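The thresholds above can be checked mechanically. The following is a minimal sketch, assuming your history is already consolidated into rows of (SKU, week index, units); the function name, row layout, and the choice of 52 weeks as the lower bound for "12 months" are illustrative assumptions, not a standard.

```python
from collections import defaultdict

def forecasting_readiness(rows, expected_weeks, min_weeks=52, max_missing_share=0.10):
    """Check each SKU against the 'clean enough' forecasting thresholds.

    rows: iterable of (sku, week_index, units); units is None when a week
          is blank or has been judged clearly wrong by the caller.
    expected_weeks: number of weeks in the history window (e.g. 104).
    Returns {sku: (missing_share, ready)}. SKUs with no usable rows at all
    do not appear in the result.
    """
    observed = defaultdict(set)
    for sku, week, units in rows:
        if units is not None:
            observed[sku].add(week)

    report = {}
    for sku, weeks in observed.items():
        missing_share = 1 - len(weeks) / expected_weeks
        ready = expected_weeks >= min_weeks and missing_share < max_missing_share
        report[sku] = (round(missing_share, 3), ready)
    return report

# Hypothetical data: 104 weeks of history, SKU "A" missing 9 weeks, SKU "B" missing 15.
rows = [("A", w, 10) for w in range(9, 104)] + [("B", w, 10) for w in range(15, 104)]
report = forecasting_readiness(rows, expected_weeks=104)
print(report)
```

In this made-up example SKU "A" passes (9 of 104 weeks missing is under 10%) and SKU "B" does not. A check like this only works once every row carries the same product identifier, which is why the third requirement matters.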

What you can safely ignore (for now)

You don't need perfect promotional flags on every historical week. You don't need weather data integrated. You don't need competitor pricing. Those improve accuracy later. They aren't prerequisites.

We typically see UK food brands with 200-800 SKUs get to "clean enough" for a forecasting pilot in 3-5 weeks. The work is almost always consolidation (pulling data from Tesco TRS, Sainsbury's S4S, Asda InfoBay into one format), not cleaning individual records.

Related: demand forecasting detail

Figure: Demand forecasting data requirements checklist for FMCG AI readiness

Citation capsule: Demand forecasting in FMCG requires 12-24 months of SKU-level sales data with fewer than 10% of weeks missing. This threshold allows models to detect seasonal patterns and promotional effects without requiring the "perfect data" that stalls most AI projects indefinitely.

What does "clean enough" mean for predictive maintenance?

Predictive maintenance has the lowest data history requirement of the four use cases covered here. Schneider Electric (April 2026) identifies lack of contextualised operational data as the third biggest blocker to AI adoption (36.3% of CPG executives). But "contextualised" doesn't mean years of history. It means 6-8 weeks of sensor baseline data.

The minimum viable dataset

6-8 weeks of continuous sensor readings from the equipment you want to monitor. Temperature, vibration, pressure, current draw. Whatever your PLCs or IoT sensors already capture.
Normal operating range documented. Someone needs to define what "healthy" looks like for each sensor. This is often tribal knowledge held by maintenance engineers.
Failure or downtime events tagged. Even a simple log, "Line 3 stopped at 14:22 on Tuesday, bearing failure," gives a model something to learn against.
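As a rough illustration of the first two requirements, the sketch below checks whether a sensor export spans at least six weeks without large gaps and proposes a starting range for engineers to review. The one-hour gap limit and the mean ± 3 standard deviations range are assumptions made for this example; the documented "healthy" range should still come from the people who know the equipment.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

def baseline_summary(readings, max_gap=timedelta(hours=1), min_span=timedelta(weeks=6)):
    """Summarise a sensor baseline.

    readings: list of (timestamp, value) tuples sorted by timestamp.
    Returns whether the history is long enough, any gaps longer than
    max_gap, and a suggested (low, high) range for engineers to confirm.
    """
    times = [t for t, _ in readings]
    values = [v for _, v in readings]
    gaps = [(a, b) for a, b in zip(times, times[1:]) if b - a > max_gap]
    mu, sigma = mean(values), stdev(values)
    return {
        "span_ok": times[-1] - times[0] >= min_span,
        "gaps": gaps,
        "suggested_range": (mu - 3 * sigma, mu + 3 * sigma),  # assumption, not a rule
    }

# Hypothetical hourly temperature readings covering exactly six weeks.
start = datetime(2026, 1, 5)
readings = [(start + timedelta(hours=h), 60.0 + (h % 5)) for h in range(6 * 7 * 24 + 1)]
summary = baseline_summary(readings)
print(summary["span_ok"], len(summary["gaps"]), summary["suggested_range"])
```

Tagged downtime events are not modelled here; they would typically sit in a separate log keyed by line and timestamp.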

The gap most brands actually face

The sensors are usually already there. Modern filling lines and packaging equipment generate data continuously. The problem is that nobody's pulling it off the PLC into a system where it's accessible. It stays locked inside the machine controller.

Connecting that output to a cloud store or historian takes a few days of integration work per line. Not months. Not a new capital project.

Citation capsule: Predictive maintenance for FMCG production lines requires only 6-8 weeks of sensor baseline data. Schneider Electric (April 2026) reports that lack of contextualised operational data blocks 36.3% of CPG companies, yet the data typically already exists inside PLCs and machine controllers.

What does "clean enough" mean for reporting and dashboards?

Reporting automation has the simplest data requirement but the most frustrating failure mode. You don't need deep history. You need consistent field naming across sources. That's the threshold. One inconsistent column header across four spreadsheets can break an automated pipeline.

The minimum viable dataset

Consistent field names. If your ERP calls it "product_code" and your retailer data calls it "SKU_ID" and your promo tracker calls it "Item", you need a mapping table. Fifteen minutes of work, but it has to be done.
Agreed definitions. Does "revenue" mean gross or net? Does "volume" mean cases or units? These semantic mismatches cause more dashboard failures than any technical issue.
Regular refresh cadence. Automated reporting needs data that updates predictably. If your sales data arrives from retailers on different days with different delays, the pipeline needs to handle that.
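A mapping table of the kind described above can be as small as a dictionary. This is a sketch only: the source names and the "net_sales_gbp" and "Sales Value" fields are hypothetical, while "product_code", "SKU_ID", and "Item" are the examples used above.

```python
# One canonical name per field, per source. Encoding the agreed definition
# in the canonical name ("revenue_net") keeps the gross/net decision visible.
FIELD_MAP = {
    "erp": {"product_code": "sku", "net_sales_gbp": "revenue_net"},
    "retailer": {"SKU_ID": "sku", "Sales Value": "revenue_net"},
    "promo_tracker": {"Item": "sku"},
}

def normalise(record, source):
    """Rename a record's fields to canonical names; fail loudly on unknown fields."""
    mapping = FIELD_MAP[source]
    unmapped = set(record) - set(mapping)
    if unmapped:
        raise KeyError(f"unmapped fields from {source}: {sorted(unmapped)}")
    return {mapping[name]: value for name, value in record.items()}

print(normalise({"SKU_ID": "5000", "Sales Value": 12.5}, "retailer"))
```

Raising an error on an unmapped field is a deliberate choice in this sketch: a renamed column header then stops the pipeline visibly instead of producing a silently wrong dashboard.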

Why this one stalls projects

Here's the irony: reporting automation should be the easiest AI use case to deploy. But it fails most often because nobody owns the data definitions. There's no single source of truth. Three people maintain three versions of the weekly sales report, and they don't quite agree. Fixing this isn't a data quality project. It's a governance decision that takes one meeting.

Related: spreadsheet bottleneck

Citation capsule: Reporting automation in FMCG requires consistent field naming across data sources rather than deep historical data. According to Veeva (January 2026), 82% of CPG companies are consolidating legacy systems, a process that directly addresses the field-naming inconsistencies blocking automated reporting pipelines.

What does "clean enough" mean for trade promotion analysis?

Trade promo analysis sits between forecasting and reporting in complexity. Veeva (January 2026) found that 82% of CPG companies are consolidating legacy systems, and trade promotion data is typically the most fragmented source. You need two things: weekly POS data by store and a promotional calendar with mechanic details.

The minimum viable dataset

Weekly POS data by store (or at minimum by retailer). Not aggregated monthly, not just total brand sales. Store-level granularity lets you isolate promotional lift from baseline.
Promotional calendar with mechanic details. "We ran a promotion in Tesco in March" is not enough. You need: which stores, what dates, what mechanic (BOGOF, percentage off, multi-buy), what feature/display support.
12 months minimum. Promotions are seasonal. A summer price cut performs differently from an autumn one. Without a year of data, you can't disentangle seasonality from promotional effect.
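One way to make "mechanic details" concrete is to give each promotion a fixed record shape and list what is missing. The sketch below is illustrative: the class name, field names, and the mechanic labels are assumptions based on the examples above, not an industry schema.

```python
from dataclasses import dataclass, field
from datetime import date

# Mechanic labels taken from the examples above; extend to match your own calendar.
MECHANICS = {"BOGOF", "percentage_off", "multi_buy"}

@dataclass
class PromoEntry:
    retailer: str
    start: date
    end: date
    mechanic: str
    stores: list = field(default_factory=list)
    display_support: str = ""

def problems(entry):
    """Return the reasons an entry is too vague to use for promo analysis."""
    issues = []
    if not entry.stores:
        issues.append("no stores listed")
    if entry.end < entry.start:
        issues.append("end date before start date")
    if entry.mechanic not in MECHANICS:
        issues.append(f"unknown mechanic: {entry.mechanic}")
    if not entry.display_support:
        issues.append("feature/display support not recorded")
    return issues

# "We ran a promotion in Tesco in March" expressed as a record:
vague = PromoEntry("Tesco", date(2025, 3, 1), date(2025, 3, 31), mechanic="promotion")
print(problems(vague))
```

Running the check over a reconstructed calendar gives a to-do list for the detective work, entry by entry.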

Where most brands fall short

The POS data usually exists. Retailers provide it. The promotional calendar is the gap. It lives in someone's head, or scattered across emails, presentations to buyers, and joint business plans. Reconstructing 12 months of promo activity typically takes 2-3 weeks of detective work.

But once you've got it documented, you don't lose it again. The second year of analysis is vastly easier than the first.

Related: master data audit

Citation capsule: Trade promotion analysis requires weekly POS data by store plus a promotional calendar with mechanic details covering at least 12 months. Veeva (January 2026) reports 82% of CPG companies are consolidating legacy systems, and trade promo data is typically the most fragmented source requiring reconstruction.

What gaps actually matter (versus the ones that don't)?

Not all data gaps kill AI projects. The gaps that matter are structural: missing time periods, inconsistent identifiers, undefined business logic. The gaps that don't matter are cosmetic: messy formatting, minor typos in descriptions, incomplete metadata fields that no model will use.

Gaps that will block you

Missing identifier consistency. If you can't reliably match the same product across two datasets, you're stuck. This is the number one gap we see.
Undocumented time gaps. Two months of zero sales because you were out of stock looks identical to two months of missing data unless someone's flagged it.
Ambiguous metrics. A "sales" column that sometimes means units and sometimes means revenue will poison any model silently.
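The time-gap problem is easier to see with labels attached. Here is a minimal sketch, assuming you keep (or can reconstruct) a list of known out-of-stock weeks per SKU; the function name and label strings are hypothetical.

```python
def classify_weeks(sales_by_week, stockout_weeks, all_weeks):
    """Label every week in the window for one SKU.

    sales_by_week: {week: units} for weeks that have a record.
    stockout_weeks: set of weeks someone has flagged as out of stock.
    all_weeks: every week the history is supposed to cover.
    """
    labels = {}
    for week in all_weeks:
        units = sales_by_week.get(week)
        if week in stockout_weeks:
            labels[week] = "stockout"          # explained: demand existed, supply did not
        elif units is None:
            labels[week] = "missing"           # no record at all
        elif units == 0:
            labels[week] = "unexplained_zero"  # needs a human decision
        else:
            labels[week] = "ok"
    return labels

print(classify_weeks({1: 40, 2: 0, 3: 0}, stockout_weeks={2}, all_weeks=range(1, 5)))
```

Weeks labelled "unexplained_zero" are the ones to chase: until someone decides whether they are stock-outs or data errors, a model cannot tell them apart either.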

Gaps you can safely ignore

Incomplete product descriptions or attributes. Nice for reporting, irrelevant for forecasting.
Historical pricing that's 90% complete. A model can handle 10% gaps in pricing data. It can't handle 10% gaps in the target variable (sales).
Inconsistent date formats across sources. This is a five-minute fix in any data pipeline. It feels like a data quality issue but it's really a formatting task.
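To show why date formats belong in the "ignore" pile, here is a small sketch that normalises a few formats to ISO 8601. The list of formats is an assumption for illustration, and slash-separated dates are read day-first, which suits UK sources but would be wrong for US-style exports.

```python
from datetime import datetime

# Formats tried in order. Day-first for slashes is an assumption (UK convention).
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")

def to_iso(text):
    """Convert a date string in any known format to YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")

print([to_iso(d) for d in ("2025-03-07", "07/03/2025", " 07.03.2025 ")])
```

Anything that fails to parse raises an error rather than being guessed at, so a new source with an unexpected format gets noticed.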

Why does this distinction matter? Because brands spend months fixing cosmetic issues while structural gaps sit unaddressed. Focus your limited time on the three blockers above.

Figure: Comparison of FMCG data gaps that block AI versus gaps that can be ignored

Citation capsule: Structural data gaps, specifically inconsistent identifiers, undocumented time gaps, and ambiguous metrics, block FMCG AI projects. Cosmetic gaps like formatting inconsistencies or incomplete descriptions are safely ignorable. iFactory (April 2026) confirms 40-60% of operational data is already AI-ready once structural barriers are addressed.

How do you close the data gap in 4-8 weeks?

Data plumbing accounts for 40-60% of AI project costs (Heizen, 2026). But "plumbing" sounds worse than it is. For most UK food brands with 200-1,000 SKUs, closing the readiness gap for a single use case takes 4-8 weeks. Not months. Not a multi-year "data transformation programme."

Week 1-2: Audit and map

Pick your first use case. Identify which datasets you need (use the thresholds above). Map where they currently live. Document the gaps against the "clean enough" threshold. This is a spreadsheet exercise, not a technology project.

Related: data audit process

Week 3-5: Connect and consolidate

Pull data from its various homes into one structured location. For most brands this means: exporting from retailer portals, extracting from ERP, and consolidating spreadsheets. You're building a single, version-controlled dataset for your chosen use case. Nothing more.

Week 6-8: Validate and baseline

Run basic checks. Are there unexpected zeros? Do totals reconcile against known figures? Can you reproduce last quarter's management report from this consolidated data? If yes, you're ready for a pilot.
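The first two checks can be scripted in a few lines. This is a sketch under stated assumptions: the consolidated data is a dictionary keyed by (SKU, week), the reference figure comes from a report you already trust, and the 1% tolerance is an arbitrary example value.

```python
def validate(weekly_units, reported_total, tolerance=0.01):
    """Run two basic checks on a consolidated dataset.

    weekly_units: {(sku, week): units} from the consolidated data.
    reported_total: the same total as stated in a trusted management report
                    (must be non-zero).
    tolerance: acceptable relative difference; 1% here is an assumption.
    """
    zero_rows = sorted(key for key, units in weekly_units.items() if units == 0)
    total = sum(weekly_units.values())
    drift = abs(total - reported_total) / reported_total
    return {"zero_rows": zero_rows, "total": total, "reconciles": drift <= tolerance}

# Hypothetical consolidated data compared against a reported total of 151 units.
data = {("A", 1): 100, ("A", 2): 0, ("B", 1): 50}
print(validate(data, reported_total=151))
```

Zero rows are listed rather than rejected, because some zeros are genuine; the point is to have someone look at each one before the pilot starts.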

We've run this process with brands ranging from 150 to 900 SKUs. The pattern is consistent: week one feels overwhelming because you're documenting everything that's wrong. By week four, you realise most of it was already fine; you just needed it in one place.

One person should own this

Don't committee it. Assign one person. Give them protected time. The brands that take 4 months instead of 4 weeks are always the ones that split ownership across three people who each have other priorities.

Citation capsule: Closing FMCG data readiness gaps for AI deployment typically takes 4-8 weeks of focused work. According to Heizen (2026), data plumbing consumes 40-60% of AI project costs, but this plumbing work, connecting and mapping existing sources, is the project itself rather than a prerequisite to delay.

FAQ

How clean does FMCG data need to be for AI deployment?

It depends entirely on the use case. Demand forecasting needs 12-24 months of SKU-level sales data with fewer than 10% missing weeks. Predictive maintenance needs just 6-8 weeks of sensor baseline data. Reporting automation needs consistent field naming. Trade promotion analysis needs weekly POS data plus a promotional calendar with mechanic details covering at least 12 months. Research from iFactory (April 2026) confirms 40-60% of operational data is already AI-ready; the barrier is access, not quality.

What's the biggest data blocker for AI in food and drink manufacturing?

Data siloing, not data quality. Your sales data sits in retailer portals, production data lives in ERP or PLCs, and promo data exists in emails and spreadsheets. Schneider Electric (April 2026) found lack of contextualised operational data blocks 36.3% of CPG companies from scaling AI.

How long does data preparation take before an AI pilot?

For a single use case, 4-8 weeks. Heizen (2026) reports data plumbing costs 40-60% of AI project budgets, but that work isn't a prerequisite you do before the project starts. It IS the early phase of the project. Brands that take 4+ months typically split ownership across too many people or try to fix all use cases simultaneously.

Related: full readiness assessment

Sources

iFactory (April 2026). "AI Readiness in Manufacturing." ifactory.com.au
McKinsey & Company (January 2023). "Clearing data quality roadblocks: Unlocking AI in manufacturing." mckinsey.com
Heizen (2026). "Data plumbing costs in AI projects." heizen.com
Veeva Systems (January 2026). "Industry Research Finds Establishing a Foundation for AI is Top Priority for CPG Enterprises." prnewswire.com
Schneider Electric (April 2026). "CPG manufacturers brace for mounting production losses and see industrial AI as a critical competitiveness lever by 2030." globenewswire.com