Why AI Demand Forecasting Pilots Fail in FMCG (and How to Fix Them)

Related: complete guide to AI for FMCG

Everyone in FMCG knows AI should help with demand forecasting. McKinsey reports that well-deployed AI can reduce forecast errors by up to 50%. So why do most pilots go quiet after three months? Gartner data shows fewer than 30% of AI pilots ever reach production. The technology isn't the issue. The failure patterns are structural, predictable, and almost entirely avoidable.

The Bottom Line:

- Most FMCG demand forecasting pilots fail for structural reasons, not technical ones.
- An override rate above 40% means planners have stopped trusting the model - the pilot is effectively over.
- Data quality, success metrics, and pilot ownership must be settled before a vendor is approached.
- BCG (June 2026) found only 18% of CPG companies have successfully scaled AI - the gap is almost always in implementation, not capability.

What Is Actually Going Wrong With AI Demand Forecasting Pilots?

Research cited in Harvard Business Review puts the figure starkly: 95% of AI projects deliver zero measurable return. That's not because the underlying forecasting models don't work. It's because the conditions for success were never put in place. The failure modes are specific. Each one has a name and a fix.

Related: picking your first AI use case

We work with food and drink brands ranging from around £20m to £150m in revenue. The pattern we see most often isn't a vendor problem or a model problem. It's a project design problem. The same five structural failures appear, in roughly the same order, in brands that struggle to get past the pilot stage.

Figure: Two-week pre-pilot data audit timeline for AI demand forecasting in FMCG

Failure 1: The Data Quality Trap

In our experience, teams routinely spend 70-80% of their pilot budget cleaning data they assumed was clean before the project started. This isn't an edge case. It's the norm. Stock-out records are missing, promotional uplifts aren't flagged, and SKU hierarchies have drifted across system updates. The vendor arrives, requests the data, and then the real work begins.

The fix isn't complicated. Run a two-week data audit before you sign any vendor contract. Pull two years of sales history at SKU level. Check for gaps, identify where promotional events aren't recorded in your demand signal, and document your stock-out records. If the audit reveals significant problems, you'll know exactly what preparation work the project needs - before anyone has committed budget. That's worth far more than a polished vendor demo.
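
As a rough illustration of the audit checks described above, here is a minimal Python sketch for a single SKU. The function names, the weekly granularity, and the four input sets are assumptions made for this example; your ERP export will look different, and the dates are made up.

```python
from datetime import date, timedelta


def expected_weeks(start, end):
    """All week-start dates from start to end inclusive, stepping 7 days."""
    weeks = []
    current = start
    while current <= end:
        weeks.append(current)
        current += timedelta(weeks=1)
    return weeks


def audit_sku(sales_weeks, flagged_promo_weeks, calendar_promo_weeks,
              stockout_weeks, start, end):
    """Summarise data gaps for one SKU over the audit window (hypothetical helper)."""
    gaps = [w for w in expected_weeks(start, end) if w not in sales_weeks]
    return {
        # Weeks with no sales record at all.
        "missing_weeks": gaps,
        # Missing weeks that no recorded stock-out explains.
        "unexplained_gaps": [w for w in gaps if w not in stockout_weeks],
        # Promotions in the commercial calendar that the sales data does not flag.
        "unflagged_promos": sorted(calendar_promo_weeks - flagged_promo_weeks),
    }


# Made-up example: a four-week window for one SKU.
report = audit_sku(
    sales_weeks={date(2024, 1, 1), date(2024, 1, 15)},
    flagged_promo_weeks=set(),
    calendar_promo_weeks={date(2024, 1, 15)},
    stockout_weeks={date(2024, 1, 8)},
    start=date(2024, 1, 1),
    end=date(2024, 1, 22),
)
for check, weeks in report.items():
    print(check, [w.isoformat() for w in weeks])
```

In a real audit the same three questions would be asked across two years of history and every SKU; the point of the sketch is only that each check is a simple comparison between lists you already hold.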

Related: data readiness

This audit doesn't require a data scientist. An experienced ops or supply chain manager can complete most of it with access to your ERP and a few hours with a spreadsheet. What you're looking for isn't perfection - it's a clear picture of what you have and what's missing.

Figure: AI demand forecasting pilot contract with highlighted exit criteria checklist

Failure 2: The Override Rate Problem

This is the failure mode almost nobody talks about, which is why it keeps happening. When planners systematically override AI outputs, the tool is still running and the vendor is still reporting "accuracy" figures. But the pilot is functionally dead.

The override rate is the single most important signal during a demand forecasting pilot. It tells you whether planners trust the model enough to act on it. We treat 40% as the critical threshold. If planners are overriding more than 40% of the model's outputs, they've decided the model doesn't reflect reality well enough to rely on. The model isn't influencing decisions anymore. You're paying for a tool that generates numbers your team then ignores.
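
For reference, the override rate can be calculated with very little code. The sketch below assumes each record is a pair of (model forecast, planner's final number) and counts any difference as an override; that definition, the function name, and the sample figures are assumptions for illustration, and some teams may prefer to ignore small adjustments.

```python
CRITICAL_THRESHOLD = 0.40  # the 40% threshold discussed above


def override_rate(records):
    """Share of forecasts where the planner's final number differs from the model's."""
    if not records:
        raise ValueError("no forecast records supplied")
    overridden = sum(1 for model, final in records if final != model)
    return overridden / len(records)


# Made-up records: (model forecast, planner's final number) per SKU-week.
records = [(120, 120), (80, 95), (200, 180), (50, 70), (310, 310)]
rate = override_rate(records)
status = "above" if rate > CRITICAL_THRESHOLD else "at or below"
print(f"Override rate: {rate:.0%} ({status} the critical threshold)")
```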

Why does override rate climb so high? Usually, the model doesn't capture events the planner knows about: a promotional burst, a new product launch, a range review with a major retailer, a competitor going out of stock. These aren't obscure edge cases in FMCG - they're a normal part of every planning cycle.

The fix is to involve planners in model design from week one, not after go-live. They know where the model will struggle before it's been trained. That knowledge is free. Not using it is expensive.

Failure 3: Measuring the Wrong Thing

We've seen a brand run a six-month AI demand forecasting pilot, hit impressive statistical accuracy numbers, and still shelve the tool. The accuracy was real. But it wasn't answering the question the business actually needed answered.

Forecast accuracy in isolation is a poor success metric for a commercial business. A model can be statistically accurate at an aggregate level while performing poorly on the specific SKUs that drive your waste or your service level failures. Accuracy at the wrong granularity, or measured against the wrong baseline, produces numbers that look good and mean nothing.
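
A small worked example makes the granularity point concrete. The figures below are invented, and absolute percentage error is used here only as one convenient measure; it is not necessarily the metric a vendor would report.

```python
def abs_pct_error(forecast, actual):
    """Absolute percentage error; assumes actual demand is greater than zero."""
    return abs(forecast - actual) / actual


# Made-up weekly volumes for three SKUs.
forecasts = {"SKU-A": 1000, "SKU-B": 150, "SKU-C": 50}
actuals = {"SKU-A": 1040, "SKU-B": 100, "SKU-C": 50}

# Error on the total across all SKUs: over- and under-forecasts cancel out.
aggregate_error = abs_pct_error(sum(forecasts.values()), sum(actuals.values()))

# Error per SKU: the cancellation disappears.
sku_errors = {sku: abs_pct_error(forecasts[sku], actuals[sku]) for sku in forecasts}

print(f"Aggregate error: {aggregate_error:.1%}")
for sku, err in sku_errors.items():
    print(f"{sku}: {err:.1%}")
```

Here the total is off by under 1%, while SKU-B is over-forecast by 50%. If SKU-B is a short-life line, that is where the waste sits, and the aggregate number hides it.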

The fix is to agree on your business metric before the pilot starts. Not "forecast accuracy" as an abstract number. Something operational: service level improvement against your top-20 SKUs, percentage reduction in waste across your short-life range, or overstock cost reduction in a specific product category. If you can't connect the model's output to one of those metrics in a sentence, you don't yet have a success criterion. Define it first. Then design the pilot around it.

Failure 4: Vendor Lock-In Before Proof

As noted above, research cited in Harvard Business Review suggests 95% of AI projects deliver zero measurable return. Despite that, we regularly see companies sign a 12-month contract after a two-week demo. The demo is impressive. The vendor's reference customer is a much larger business with better data infrastructure. The contract doesn't include a meaningful exit clause.

This is avoidable with one rule: insist on a 90-day pilot clause with defined exit criteria before signing anything. The exit criteria should be agreed in writing before go-live, not after results come in. What service level improvement do you need to see? What override rate is acceptable? What does "successful" look like in specific, measurable terms?

Vendors confident in their product will accept this structure. Those who push back on a 90-day pilot clause are telling you something important about their confidence in the outcome.

Failure 5: No Pilot Owner

The quietest failure mode is also the most common. The AI pilot is "everyone's responsibility" - which means it belongs to no one. The vendor's onboarding manager handles the first six weeks. Then they move on to their next client. The internal champion gets pulled onto a more urgent project. Meetings get postponed. Data updates stop flowing. Nobody officially cancels the pilot. It just stops producing anything useful.

We've observed this pattern in roughly half the pilots we've been called in to review after the fact. The tool is still active. Nobody has formally decided to stop. But nothing meaningful has happened in two months.

The fix is genuinely simple: name one person internally who owns the pilot. Not a steering committee, not "the ops team." One named individual, with a clear mandate and enough protected time to stay on top of it. That person attends every vendor meeting, tracks the success metrics weekly, and has the authority to call problems early.

This doesn't need to be a senior person. It needs to be someone with enough context to notice when the pilot is drifting and enough access to do something about it.

How Do You Know Which Failure Mode You're In?

Most pilots fail for more than one reason at once. But there's usually a primary failure mode, and it tends to show up in the data early. If your override rate climbed past 40% in the first six weeks, that's failure mode 2. If your data preparation ran more than two months over, that's failure mode 1. If you're three months in and still debating what "success" means, that's failure mode 3.
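
The three rules of thumb above can be written down as a simple checklist. This is a sketch only: the `PilotStatus` fields and the function name are hypothetical, and the thresholds are taken directly from the paragraph above rather than from any formal standard.

```python
from dataclasses import dataclass


@dataclass
class PilotStatus:
    override_rate_first_six_weeks: float  # share of forecasts overridden, 0 to 1
    data_prep_overrun_months: float       # months beyond the planned data prep time
    months_elapsed: float                 # time since the pilot started
    success_metric_agreed: bool           # has "success" been defined in writing?


def likely_failure_modes(status):
    """Return the failure modes whose warning signs are present."""
    modes = []
    if status.data_prep_overrun_months > 2:
        modes.append("Failure 1: data quality")
    if status.override_rate_first_six_weeks > 0.40:
        modes.append("Failure 2: override rate")
    if status.months_elapsed >= 3 and not status.success_metric_agreed:
        modes.append("Failure 3: measuring the wrong thing")
    return modes


# Made-up pilot: planners override heavily and success is still undefined.
status = PilotStatus(
    override_rate_first_six_weeks=0.55,
    data_prep_overrun_months=0.5,
    months_elapsed=3,
    success_metric_agreed=False,
)
print(likely_failure_modes(status))
```

A pilot can trigger more than one rule, which is why the function returns a list rather than a single answer.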

The honest question to ask at week four of any pilot is: "If we stopped today, could we clearly say whether this worked or not?" If the answer is no, you don't yet have the conditions for success. Fix that first. Everything else follows.

FAQ

What is a healthy override rate for an AI demand forecasting model?

A planner override rate below 20% suggests the model is trusted and adding value. Between 20% and 40% is a warning sign worth investigating - usually it means the model is missing promotional events, new product launches, or seasonal patterns. Above 40%, the pilot is functionally dead even if the tool is still running.
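
If you track the override rate in a weekly report, the bands above translate into a few lines of Python. The function name and labels are illustrative assumptions, and treating exactly 40% as "warning" rather than "critical" is a choice made for this sketch.

```python
def override_band(rate):
    """Map an override rate (0 to 1) to the bands described above."""
    if not 0 <= rate <= 1:
        raise ValueError("rate must be between 0 and 1")
    if rate < 0.20:
        return "healthy"
    if rate <= 0.40:
        return "warning"
    return "critical"


for weekly_rate in (0.12, 0.31, 0.47):
    print(f"{weekly_rate:.0%}: {override_band(weekly_rate)}")
```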

How long should an AI demand forecasting pilot run?

Ninety days is the minimum meaningful duration for demand forecasting AI in FMCG. Shorter pilots don't cover enough demand cycles, promotional periods, or NPD events to test the model properly. The pilot should include a defined exit clause and agreed success criteria before it starts. BCG (June 2026) found only 18% of CPG companies have successfully scaled AI - premature commitment before proof is a significant contributing factor.

What data do you actually need before starting a pilot?

At minimum: 24 months of clean sales history at SKU level, promotional event calendars, and stock-out records. Missing stock-out data makes historical demand look lower than actual demand, skewing the model from day one. Run a two-week data audit before signing any vendor contract. For a detailed breakdown, see what "clean enough" data looks like.
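
To see why missing stock-out records skew the picture, consider this minimal sketch with invented weekly figures. Excluding stock-out weeks from the average is just one simple treatment, chosen here for illustration; it is not a claim about how any particular forecasting tool handles them.

```python
from statistics import mean

# Made-up weekly unit sales for one SKU, with stock-out flags per week.
weekly_sales = [100, 110, 0, 105, 20, 95]
stocked_out = [False, False, True, False, True, False]

# Without stock-out records, every week looks like genuine demand.
naive_average = mean(weekly_sales)

# With stock-out records, the constrained weeks can be set aside.
in_stock_average = mean(
    sales for sales, out in zip(weekly_sales, stocked_out) if not out
)

print(f"Average with stock-out weeks included: {naive_average:.1f}")
print(f"Average over in-stock weeks only: {in_stock_average:.1f}")
```

In this example the average over in-stock weeks is 102.5 units, while the average that treats stock-out weeks as real demand is roughly 72. A model trained on the second number starts out under-forecasting.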

Why do planners override AI forecasts?

Planners override AI outputs when the model doesn't reflect what they know about upcoming demand. Promotional bursts, new product launches, range reviews, and competitor activity are all things a planner has visibility on that a standard model won't capture unless it's been specifically built to include them. Involving planners in model design from week one dramatically reduces override rates in our experience.

Is AI demand forecasting worth it for a £20m-£50m FMCG brand?

Yes - but the conditions matter more than the technology. McKinsey reports that well-deployed AI reduces forecast errors by up to 50%. The "well-deployed" part is the work. Brands at this revenue level often have the data they need - they just haven't done the pre-pilot preparation to make use of it.

What to Do Next

The five failure modes above aren't theoretical. They appear in pilots across the UK food and drink sector with enough regularity that we can predict which ones are likely before a project starts - based on how a brand describes its planning process and data setup.

If you're considering a demand forecasting pilot, start with the pre-pilot data audit. Two weeks, one person, your ERP and your promotional calendar. Before any vendor conversation. That single step eliminates failure mode 1 entirely and gives you the data quality picture you need to have an honest conversation with a vendor about what's actually possible.

If you're already in a pilot that feels like it's drifting, check the override rate first. That number tells you more about pilot health than any accuracy metric the vendor is likely to show you.

For a structured approach to scoping the project before you start, the briefing an AI demand forecasting project guide covers the five-section brief format we use with clients. If you want to step back further and check whether your data and operations are ready for AI more broadly, the AI readiness self-assessment for FMCG is a good starting point.


BazBiff is a UK FMCG AI and automation consultancy. We help food and drink brands implement AI systems that actually reach production. [About us](https://bazbiff.com/about-us/).