AI-powered study reveals genetic secrets of pathogen behind US infant formula crisis
New research reveals key genetic adaptations that allow Cronobacter sakazakii — a foodborne pathogen linked to serious infections in infants and the immunocompromised — to persist in powdered foods like infant formula. The study presents the most comprehensive pangenomic analysis of C. sakazakii to date and may signal a new role for AI in food safety surveillance.
The research team highlights that these findings could have significant implications for the infant formula sector.
“We’re seeing how certain accessory genes — those not essential to survival but beneficial under specific environmental conditions — could confer advantages that help Cronobacter sakazakii persist in food systems and possibly even resist sanitation protocols,” says the study’s senior author, Ryan Blaustein.
AI enters the chat
C. sakazakii has drawn global scrutiny following multiple high-profile recalls of powdered infant formula. The research spotlights that, though rare, infections can result in severe outcomes such as meningitis, sepsis, long-term developmental impairment, and, in some cases, death, with an increased risk for newborns and other vulnerable populations.
To better understand the bacterium’s resilience and transmission routes, the research team analyzed 748 whole genome sequences of C. sakazakii isolates from food, clinical, and environmental sources across North America, Europe, and Asia.
The team highlights that AI was central to the study: a large language model was used to standardize inconsistent metadata about each strain’s origin, an essential step for comparing genomes across sources.
Traditionally, large genomic datasets are hindered by poor metadata quality. By using generative AI tools similar to ChatGPT, the researchers say they were able to harmonize the data and extract meaningful biological insights at scale.
“There’s so much data available, but that data is not always standardized,” Blaustein explains. “It’s not just the assembled DNA sequences, but the descriptor metadata.”
“Everyone enters things differently, from the date and time to things like ‘powdered infant formula’ using a capital ‘P’ or lower case ‘p’ or just powdered formula or even PFI. We used the language model to recategorize everything that was already in the public database and assign it with a very high accuracy. That hadn’t been done in this setting before.”
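To make the standardization target concrete, here is a minimal sketch of what "recategorizing" free-text source labels means. It is not the researchers' method: they used a language model, while this stand-in uses a hand-written lookup table. The category names, the variant lists, and the function name `normalize_source` are all hypothetical.

```python
import re

# Hypothetical canonical categories and the free-text variants mapped to each.
# A language model would infer these mappings; here they are listed by hand.
CANONICAL_SOURCES = {
    "powdered infant formula": {
        "powdered infant formula", "powdered formula", "infant formula", "pif", "pfi",
    },
    "clinical": {"clinical", "human", "patient"},
    "environmental": {"environmental", "environment"},
}


def normalize_source(raw: str) -> str:
    """Map a free-text isolation source to a canonical category, or 'unknown'."""
    # Lowercase, then collapse punctuation and whitespace runs into single spaces.
    cleaned = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    for canonical, variants in CANONICAL_SOURCES.items():
        if cleaned in variants:
            return canonical
    return "unknown"


if __name__ == "__main__":
    for label in ["Powdered Infant Formula", "powdered formula", " PFI ", "soil sample"]:
        print(repr(label), "->", normalize_source(label))
```

A lookup table like this only covers variants someone has already seen, which is one reason a language model is attractive for large public databases with unpredictable entries.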
Processing the information
Once the metadata were standardized, the team employed machine learning models, including random forest classifiers (an ensemble method that combines many decision trees to improve prediction accuracy and reliability), to distinguish between core and accessory genes and to correlate gene presence with source and geography.
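The core versus accessory distinction can be illustrated with a simple prevalence rule. The sketch below is not the study's random forest pipeline. It assumes a gene counts as "core" when it appears in at least 95% of genomes, a common convention but an assumption here. The strain names, gene names, and both function names are hypothetical toy values.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(
    presence: Dict[str, Set[str]], core_threshold: float = 0.95
) -> Tuple[Set[str], Set[str]]:
    """Split genes into core and accessory sets by the fraction of strains carrying them."""
    n_strains = len(presence)
    if n_strains == 0:
        return set(), set()
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c / n_strains >= core_threshold}
    accessory = set(counts) - core
    return core, accessory


def prevalence_by_source(
    presence: Dict[str, Set[str]], sources: Dict[str, str], gene: str
) -> Dict[str, float]:
    """Return the fraction of strains from each source that carry the given gene."""
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for strain, genes in presence.items():
        src = sources[strain]
        totals[src] = totals.get(src, 0) + 1
        hits[src] = hits.get(src, 0) + (1 if gene in genes else 0)
    return {src: hits[src] / totals[src] for src in totals}


# Toy presence/absence data (hypothetical strains and gene names).
PRESENCE = {
    "strain_A": {"gene_core_1", "gene_dry_1"},
    "strain_B": {"gene_core_1", "gene_dry_1"},
    "strain_C": {"gene_core_1"},
    "strain_D": {"gene_core_1"},
}
SOURCES = {
    "strain_A": "powdered food",
    "strain_B": "powdered food",
    "strain_C": "clinical",
    "strain_D": "clinical",
}

if __name__ == "__main__":
    core, accessory = split_core_accessory(PRESENCE)
    print("core:", core, "accessory:", accessory)
    print(prevalence_by_source(PRESENCE, SOURCES, "gene_dry_1"))
```

Per-source prevalence is the kind of signal a classifier such as a random forest can pick up when predicting a strain's source from its accessory genes.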
Key findings include:
- Strains isolated from powdered foods had larger genomes and genes linked to enhanced survival mechanisms in dry environments.
- The strains harbored more virulence-associated genes, suggesting a higher pathogenic potential among variants that persist in food production and distribution systems.
- Geographic variation was evident, with biofilm-forming genes and heavy metal resistance traits showing up more frequently in specific regional isolates, likely reflecting differences in local agricultural or manufacturing practices.
The study was conducted at the University of Maryland, US, and published in the International Journal of Food Microbiology. The researchers underscore that the discovery of such diverse and environment-specific accessory genes points to the pathogen’s ability to adapt across a wide range of ecological niches, from hospital settings to dry food manufacturing facilities.
The researchers further emphasize that the study lays the groundwork for using AI to support real-time molecular surveillance of emerging foodborne pathogens.
The authors point out that, for food manufacturers producing powdered milk and infant formula, the findings highlight the need for more targeted sanitation protocols and processing technologies informed by microbial genomics. They also reinforce the importance of international data sharing to track and mitigate transboundary food safety risks.