Originally printed in the December 2023 issue of Produce Business.

“Data driven” isn’t just jargon; it’s a widely used decision-making process. But how good are the insights and decisions if the process starts with flawed data? Is bad data driving us in the wrong direction?

My inbox is populated with at least two dozen RSS feeds every morning, from produce, grocery, retail, specific business areas, production agriculture and global trade. Some cover thoughtful topics. Many are infomercials and self-serving press releases. Of 200 titles, I may read a paragraph of 10% and the entire article for 5%. From time to time, a topic comes up that sends me down rabbit holes of further searches, conversations with colleagues, contacts to authors and note-taking.

Recently, I clicked on an infographic from a market research company that does a lot of work in the grocery area. The infographic was a tease of a white paper, which will probably funnel me into contacting the firm to talk about contracting their services. More rabbit holes. The premise is that much of the data used in decision-making is flawed, and that this firm qualifies its sources more rigorously, so its data is more reliable.

Let’s set aside the advertising claim that this shop’s data is better than brand X down the street. Is the data on which we base our business decisions fundamentally flawed? All these algorithms of the A.I. Everything run on data. If our data quality is poor, the outcomes will be poor. It takes a brave decision-maker to disregard the data-driven insight, to somehow know the result isn’t right, and to trust the executive function of the human brain rather than following the science.

Self-reported survey responses have always been a problem area. The market intelligence company claimed its respondent pool is better screened than competitors’, with the implication that the resultant data will be better. In this case, the claim is that surveying populations that actually purchased a specific category will give a better result than surveying a population that self-reported buying the category.

OK, so what are the problems with self-reporting samples? Consumers often report aspirational rather than actual behaviors or motivations. Consumers’ memories are faulty. I would propose that survey respondents are inherently a biased subset of all consumers — that is, those who are willing to take the time to state their opinion on a survey hold their opinion in high regard. The attribute enthusiasts, those who are motivated to tell you about their purchase because of vanity or activism, are overcounted. The “indifferent majority” doesn’t respond to the survey because, well, they don’t care.

You know about indifferents? This is the population that, in matters of church and state, religiously identifies as “nones” and politically registers as non-partisan — yet, according to surveys, supposedly cares deeply about how their broccoli was farmed, where and by whom.

If consumers are much more indifferent than self-reporting would indicate, then private label, the branding equivalent of non-partisan voter registration, is something they would buy, even if they say otherwise in surveys.

The difference between “the say” and “the do” is nothing new. Think of all those attributes that keep coming up, like local, that just don’t seem to pan out in our business. Are we being juked by dirty data?

On the B2B side, there are issues with data quality as well. Production agriculture is really good at data forecasting. Forecasts and estimates are put together for credit lines with banks, requests for labor, purchases of packaging, storage and transportation needs, and informing our retail partners when to expect what.

Yet the same dilemma with the quality of self-reported surveys exists. The U.S. Department of Agriculture, state departments of agriculture and commodity groups all collect data from growers: estimates, real-time harvest, shipping and price data, and historical season data. Some are required and some are optional.

An optional grower survey may not be answered because the respondent is busy or indifferent. A required survey may be answered incorrectly or incompletely, by accident or on purpose. I find it amusing to watch shipping point folks who over-analyze these reports, yet don’t seem to derive any useful insight to drive their business decisions.

Even the masters of data-driven commerce, companies like Amazon, don’t seem to have cracked the code to the grocery business. Maybe there are too many blind spots in the data? Maybe we are being blinded by too much data?

If produce is going to be a data-driven business, its data must be better vetted for reliability and relevance.

John Pandol is director at Pandol Brothers Inc. in Delano, CA.