The National Academies of Sciences, Engineering, and Medicine is calling for stronger safeguards and testing frameworks to improve the safety of machine learning systems used in high-stakes environments. While these technologies promise new efficiencies in automation, diagnostics, and robotics, their reliability remains below what’s required for safety-critical operations.
Although the report focuses on safety-critical applications such as autonomous systems and robotics, its findings offer valuable lessons for laboratories that are increasingly integrating AI and machine learning tools.
“There is a critical gap between the performance of machine learning [and] what we would expect in a safety-critical system,” said George Pappas, associate dean of research at the University of Pennsylvania. “In the machine learning community … people may be happy with a performance of 95 or 97 percent. In safety-critical systems, we’d like errors of 10⁻⁹,” the equivalent of near-zero failure tolerance.
The new report, presented during a National Academies webinar, outlines strategies to bridge this reliability gap and ensure that AI-enabled systems perform safely and predictably in real-world environments.
Developing AI guardrails for safety-critical systems
Machine learning systems make predictions based on patterns in training data, but incomplete datasets or unfamiliar conditions can lead to errors with serious consequences. “Machine learning systems tend to fail when they encounter novelty,” said Thomas Dietterich, distinguished professor emeritus at Oregon State University. “We need processes in place, both automated and human, to detect and address those novelties.”
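The report frames novelty detection as a process question rather than a specific algorithm, but the automated half of that process can be as simple as an out-of-distribution check that flags inputs unlike anything seen in training. The sketch below is one illustrative approach using scikit-learn's IsolationForest; the synthetic data, feature count, and human-review step are assumptions for demonstration, not recommendations from the report.

```python
# Illustrative novelty check: flag inputs that look unlike the training data
# before trusting the model's prediction. Uses scikit-learn's IsolationForest.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 4))   # stand-in for historical data

detector = IsolationForest(contamination=0.01, random_state=0).fit(X_train)

def is_novel(x: np.ndarray) -> bool:
    """Return True if the input falls outside the training distribution."""
    return detector.predict(x.reshape(1, -1))[0] == -1     # -1 marks an outlier

x_routine = np.zeros(4)          # resembles the training data
x_unusual = np.full(4, 8.0)      # far from anything seen during training

for name, x in [("routine", x_routine), ("unusual", x_unusual)]:
    action = "route to human review" if is_novel(x) else "accept model prediction"
    print(f"{name}: {action}")
```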
Experts recommend new architectures that incorporate redundancy, real-time monitoring, and fallback mechanisms when uncertainty is high. “Every time we use machine learning in safety-critical settings, we need to develop safety filters, guardrails,” said Pappas. “If a car or robot misclassifies someone, there should be safeguards that can prevent an accident.”
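As a rough illustration of what such a guardrail can look like in software, the sketch below wraps a model's prediction in a runtime filter that falls back to a conservative action whenever confidence drops below a preset threshold. The model, threshold value, and fallback action are hypothetical placeholders; real safety filters in vehicles or robots are considerably more elaborate.

```python
# Minimal guardrail sketch: accept an ML prediction only when its confidence
# clears a preset threshold; otherwise fall back to a conservative default.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SafetyFilter:
    model: Callable[[object], tuple[str, float]]  # returns (label, confidence)
    confidence_threshold: float = 0.99            # illustrative; set per risk assessment
    fallback_action: str = "stop_and_alert_operator"

    def decide(self, observation: object) -> str:
        label, confidence = self.model(observation)
        if confidence < self.confidence_threshold:
            # Uncertainty too high: take the safe action instead of the prediction.
            return self.fallback_action
        return label

# Toy stand-in for a trained classifier
def toy_classifier(obs):
    return ("proceed", 0.92)  # confident-sounding, but below the 0.99 bar

guardrail = SafetyFilter(model=toy_classifier)
print(guardrail.decide({"sensor": "frame_001"}))  # -> "stop_and_alert_operator"
```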
The report also emphasizes the need for updated testing standards, transparency in incident reporting, and post-market data collection to maintain public trust and system integrity.
Improving AI reliability through education and cross-disciplinary collaboration
Bridging the gap between traditional safety engineering and emerging AI technologies will require collaboration and education. “A focused effort is needed to educate the next generation of researchers and engineers on how to build these machine learning-enabled safety-critical systems,” said Jonathan How, Ford Professor of Engineering at the Massachusetts Institute of Technology.
Industry leaders are encouraged to provide ongoing training to engineers on safety regulations and to integrate these considerations early in system design. The report further calls for developing cross-disciplinary frameworks that unite the philosophies of both the machine learning and safety-critical communities.
What machine learning safety means for lab managers
For lab managers, the findings highlight the growing need for AI risk assessment, validation, and governance in automated workflows. As machine learning increasingly supports tasks like image analysis, predictive maintenance, and process optimization, leaders should treat algorithmic reliability as a safety consideration—not just a performance metric. Establishing clear protocols for data quality, system validation, and user training will help ensure that laboratory AI systems are accurate, transparent, and safe to deploy.
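One concrete way to fold system validation into such a protocol is a release gate: a check that a model meets a documented error-rate threshold on lab-curated data, with the decision recorded for audit. The policy values, log file, and function below are illustrative assumptions, not a published standard.

```python
# Illustrative acceptance check: validate a model on a held-out, lab-curated
# dataset and record whether it meets a documented error-rate threshold
# before it is allowed into routine use.
import json
from datetime import datetime, timezone

ACCEPTANCE = {"max_error_rate": 0.01, "min_validation_samples": 200}  # example policy values

def validation_gate(y_true, y_pred, policy=ACCEPTANCE):
    if len(y_true) < policy["min_validation_samples"]:
        raise ValueError("Validation set too small to support a release decision.")
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    error_rate = errors / len(y_true)
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "error_rate": error_rate,
        "approved": error_rate <= policy["max_error_rate"],
        "policy": policy,
    }
    # Persist the decision so the release history stays auditable.
    with open("model_validation_log.jsonl", "a") as log:
        log.write(json.dumps(record) + "\n")
    return record["approved"]
```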
This article was created with the assistance of Generative AI and has undergone editorial review before publication.