Investigator calls for more standardized data on real-world nutrition composition
Researchers recently unveiled GroceryDB, a database aiming to inform consumers about the processing levels of 50,000 foods sold at three large US grocery retailers. The open-access database allows consumers to compare foods’ processing scores, nutrition facts, and ingredient composition.
We continue our conversation on this consumer-oriented innovation with Giulia Menichetti, Ph.D., an investigator in the Channing Division of Network Medicine at Brigham and Women’s Hospital, part of the Mass General Brigham system in the US.
Last week, she explained that most nutrition research relies on manually curated databases, and that GroceryDB serves as a proof of concept showing what accessible, algorithm-ready data can achieve.
Since environmental and dietary factors play a crucial role in disease risk, Menichetti tells Nutrition Insight there is an urgent need to systematically collect high-resolution (detailed) data on real-world food composition beyond what’s listed on the packaging and to map grocery store offerings over time and space.
“Grocery stores, as a primary touchpoint between consumers and food, are the ideal starting place for this type of ‘temperature check.’ By monitoring and analyzing these environments, we can better predict future risks for chronic diseases and design interventions to improve public health at scale.”
She says GroceryDB aims to catalyze a global effort toward open-access, internationally comparable data that “advances nutrition security and ensures equitable access to healthier food options for all.”
NOVA classification gaps
The NOVA system is widely used to determine a food’s processing level. The system comprises four categories — unprocessed/minimally processed foods (NOVA 1), processed culinary ingredients (NOVA 2), processed foods (NOVA 3), and ultra-processed foods (NOVA 4).
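As a rough illustration, the four categories amount to a simple lookup table. The Python sketch below is purely illustrative, with descriptions paraphrased from the classification above and example foods that are typical textbook cases rather than GroceryDB data.

```python
# Minimal sketch: the four NOVA categories as a lookup table.
# Example foods in the comments are common illustrations, not GroceryDB data.
NOVA = {
    1: "unprocessed or minimally processed foods",  # e.g., fruit, plain milk
    2: "processed culinary ingredients",            # e.g., oils, sugar, salt
    3: "processed foods",                           # e.g., canned vegetables
    4: "ultra-processed foods",                     # e.g., soft drinks, packaged snacks
}

print(f"NOVA 4 covers {NOVA[4]}")
```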
Despite its success and paradigm-shifting impact on nutrition research, Menichetti says the NOVA classification is qualitative and descriptive.
“Furthermore, it tackles the labor-intensive and often incomplete task of categorizing foods based on processing levels, relying on varying levels of information available across studies, which has led to inconsistencies and ambiguities in the literature. These issues have limited the scope and precision of research into the impact of processed foods.”
However, she says using NOVA is necessary, as there is limited available data on compound concentrations linked to food matrix changes, such as cell wall breakdown or industrial processing techniques.
Food classification systems like NOVA are often labor-intensive and rely on expertise-based, descriptive approaches.
Limited standardized nutrition data
At the same time, Menichetti notes that these challenges are not unique to NOVA. “Many classification systems suffer from poor inter-rater reliability and lack of reproducibility, stemming from the reliance on expertise-based, descriptive approaches, which inherently allow for subjectivity and differences in interpretation, especially when the data supporting the decision-making significantly changes from study to study.”
She adds that the growing recognition of subjective classification systems’ limitations prompts scientists to call for a more objective framework to define food processing. Such a system should be rooted in measurable biological mechanisms instead of variable interpretations.
“Among the potential foundations for such a framework, the nutritional profile of foods stands out as the only aspect consistently regulated and reported worldwide, making it a logical starting point for standardization.”
However, Menichetti says a broader issue lies in a lack of classification systems grounded in clear, measurable data science or clinical outcomes, emphasizing the broader problem of limited open-access data in nutrition and food science.
“Without standardized, high-resolution data, ambiguities persist, leading to disagreements among experts and inconsistencies in outcomes,” she argues. “These issues undermine trust in the field and hinder the integration of important concepts like food processing into public dietary guidelines.”
Framework to evaluate processing
To develop GroceryDB, the researchers used FPro, a quantitative algorithm that takes standardized inputs and produces continuous scores. Trained on nutritional composition data, the algorithm evaluates the degree of food processing in a reproducible, portable, and scalable way.
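FPro’s published description pairs a machine-learning classifier trained on nutrient panels with a formula that collapses the predicted NOVA class probabilities into a single continuous score. The sketch below is a loose illustration of that idea rather than the actual implementation: the training rows are invented, and the combination (1 - p1 + p4) / 2 is one way such a score has been formed in the literature.

```python
# Loose FPro-style sketch: train a classifier on nutrient panels labeled
# with NOVA classes, then collapse its class probabilities into a single
# continuous processing score. All training rows here are invented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy nutrient panel per 100 g: [protein_g, fat_g, sugar_g, fiber_g, sodium_mg]
X_train = np.array([
    [3.4, 1.0, 5.0, 0.0, 44],     # milk         -> NOVA 1
    [0.0, 100.0, 0.0, 0.0, 2],    # olive oil    -> NOVA 2
    [9.0, 4.0, 3.0, 2.5, 450],    # fresh bread  -> NOVA 3
    [5.0, 22.0, 28.0, 1.0, 600],  # snack cake   -> NOVA 4
] * 10)  # replicated so the toy model has enough rows to fit
y_train = np.array([1, 2, 3, 4] * 10)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def fpro_score(nutrients):
    """Continuous score in [0, 1]: ~0 minimally processed, ~1 ultra-processed."""
    p1, _, _, p4 = clf.predict_proba([nutrients])[0]  # probabilities for NOVA 1-4
    return (1 - p1 + p4) / 2

print(round(fpro_score([3.4, 1.0, 5.0, 0.0, 44]), 2))     # near 0
print(round(fpro_score([5.0, 22.0, 28.0, 1.0, 600]), 2))  # near 1
```

Because the score is continuous, it also lends itself to the sensitivity analyses and uncertainty estimates discussed below: perturbing the input nutrients and re-scoring yields an error band for each product.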
The researchers prioritized products’ nutrition facts, as there is a lack of comprehensive, well-regulated global ingredient data. Relying on nutrition facts allowed the team to take a “lightweight approach.”
Menichetti explains that this allows FPro to bridge the gap between the model food databases used in epidemiology, which carry manually curated NOVA labels and nutrition tables but no ingredient lists, and real-world grocery store products, which come with both nutrition facts and ingredient lists.
Menichetti says there is an urgent need to systematically collect detailed data on food composition beyond ingredient lists.
She adds that the approach enables sensitivity analyses and uncertainty estimations, critical features often missing from traditional classification systems.
“Reporting estimation errors is an integral aspect of FPro, enhancing its reliability, transparency, and interpretability. By reducing the subjective biases and inconsistencies associated with manual, descriptive classifications — where nutrition specialists often demonstrate low inter-rater reliability — FPro provides a more robust framework for evaluating food processing.”
Using real-world data
Menichetti details that FPro also allows the researchers to identify trends, such as the relationship between price per calorie and processing score, which varies across food categories.
“Overall, the data from GroceryDB reveals a positive correlation between food processing and producing more affordable calories. This trend suggests that lower-income populations are more likely to consume ultra-processed foods habitually, which may exacerbate socioeconomic disparities in nutrition security.”
“However, it is important to note that the strength and direction of this correlation differ by food category. For instance, milk and milk substitutes show the opposite trend compared to soups,” she says.
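To make that category dependence concrete, a per-category rank correlation between price per calorie and FPro score could be computed as in the sketch below; all product values are made up for illustration.

```python
# Illustrative only: hypothetical products showing how the price-per-calorie
# vs. processing-score relationship can flip sign between categories.
import pandas as pd

df = pd.DataFrame({
    "category": ["soup"] * 3 + ["milk"] * 3,
    "price_per_calorie": [0.020, 0.012, 0.008, 0.004, 0.006, 0.009],
    "fpro": [0.45, 0.70, 0.85, 0.30, 0.55, 0.80],
})

# Spearman rank correlation within each category
by_category = {
    cat: grp["price_per_calorie"].corr(grp["fpro"], method="spearman")
    for cat, grp in df.groupby("category")
}
print(by_category)  # negative for soups, positive for milk in this toy data
```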
Consumer information access
Menichetti explains that GroceryDB organizes ingredients in a way that tells a product’s story, as how products are made, processed, and marketed influences their health effects.
“Consumers can visually explore this complexity through ingredient trees, which provide a structured and intuitive representation of a product’s composition.”
Consumers can use GroceryDB to explore foods’ processing levels through in-depth ingredient trees.
She adds that the width and depth of these ingredient trees offer valuable insight into an item’s degree of processing. “Consumers should be particularly cautious of products with highly complex ingredient trees that are marketed as a ‘good source of’ specific nutrients, as this can be misleading — our diet is influenced by the consumption of a specific food as a whole, not just by individual nutrients.”
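As a rough sketch of how such a tree can be built, ingredient labels nest sub-ingredients in parentheses, so the label text can be parsed recursively. The parser and the width/depth metrics below are illustrative and are not GroceryDB’s actual implementation.

```python
# Illustrative parser: turn a nested ingredient label into a tree and
# measure its width (top-level ingredient count) and depth (nesting).
def parse_ingredients(text):
    """Parse 'a, b (c, d (e)), f' into [('a', []), ('b', [...]), ('f', [])]."""
    items, name, child, depth = [], "", "", 0
    for ch in text + ",":          # trailing comma flushes the last item
        if ch == "(":
            depth += 1
            if depth == 1:
                continue           # skip the outermost opening paren
        elif ch == ")":
            depth -= 1
            if depth == 0:
                continue           # skip the outermost closing paren
        if depth > 0:
            child += ch            # inside parens: collect sub-ingredient text
        elif ch == ",":
            if name.strip():
                subtree = parse_ingredients(child) if child.strip() else []
                items.append((name.strip(), subtree))
            name, child = "", ""
        else:
            name += ch
    return items

def tree_depth(tree):
    """Longest chain of nested sub-ingredients."""
    return 1 + max((tree_depth(sub) for _, sub in tree), default=0) if tree else 0

label = "sugar, enriched flour (wheat flour, niacin, reduced iron), palm oil"
tree = parse_ingredients(label)
print("width:", len(tree), "depth:", tree_depth(tree))  # width: 3 depth: 2
```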
In addition, the open-access database on the TrueFood website allows consumers to unpack information further. For example, Menichetti says consumers can examine US FDA-mandated additives, compare products to identify commonalities and differences, and understand how changes in nutrition facts contribute to variations in FPro scores.
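A simple set comparison conveys the kind of side-by-side view described here; the two hypothetical products below are invented for illustration.

```python
# Illustrative comparison of two hypothetical breads' ingredient sets.
plain = {"water", "wheat flour", "yeast", "salt"}
packaged = {"water", "wheat flour", "yeast", "salt",
            "soybean oil", "datem", "calcium propionate"}

print("shared:", sorted(plain & packaged))
print("only in packaged:", sorted(packaged - plain))  # additives adding complexity
```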
Future developments
The researchers are currently developing more data-intensive models that incorporate curated information from ingredient lists, leveraging language models and other tools. These models will build on FPro and available data.
Menichetti says the goal is to “transform FPro into an unsupervised system, independent of manual classifications, by expanding the database to include the ‘dark matter of nutrition’ — chemical signatures of additives and processing by-products.”
“I am also pushing my lab to take this work a step further: creating models capable of predicting not only the level of processing or ultra-processing but also the specific processes each product has undergone,” she continues.
“This detailed level of granularity is essential for identifying the exact processes and mechanisms in specific foods that are detrimental to health. Such insights could then inform epidemiological studies, experimental research, and clinical trials, ultimately unveiling the mechanisms behind the impact of ultra-processed foods on our health.”