Machine Learning Enhances Non-Targeted Analysis Using High-Resolution Mass Spectrometry

A new review published in Artificial Intelligence & Environment examines how machine learning is transforming non-targeted analysis workflows for detecting environmental pollutants, helping laboratories address persistent analytical limitations.

Environmental pollutants are highly diverse and include pharmaceuticals, pesticides, industrial additives, and their transformation products. Many lack commercially available reference standards, complicating identification and quantification using traditional analytical methods.

Non-targeted analysis based on liquid chromatography coupled with high-resolution mass spectrometry can detect thousands of chemical features in a single environmental sample. However, only a small fraction of these signals can typically be identified with confidence using existing spectral libraries.

“Less than a few percent of environmentally relevant compounds can currently be confidently identified using traditional workflows,” the authors explain. This data-interpretation bottleneck has kept high-resolution mass spectrometry from reaching its full potential in environmental science.

Machine learning offers a way forward. By applying predictive models to spectral data, researchers can expand identification capabilities beyond the constraints of conventional rule-based approaches.

Expanding high-resolution mass spectrometry with predictive modeling

Machine learning models can predict tandem mass spectra from known molecular structures, effectively expanding spectral libraries in silico and strengthening non-targeted analysis capabilities.
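
As a rough illustration of how an in silico library might be used, the sketch below scores an experimental spectrum against predicted spectra with a simple binned cosine similarity. It is not the review's method: the function names, the 0.01 m/z bin width, and the example peaks are all hypothetical, and production tools use more refined scoring.

```python
import math

def bin_spectrum(peaks, bin_width=0.01):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)  # integer bin index
        binned[key] = binned.get(key, 0.0) + intensity
    return binned

def cosine_similarity(spec_a, spec_b, bin_width=0.01):
    """Cosine similarity between two spectra after binning (0.0 to 1.0)."""
    a = bin_spectrum(spec_a, bin_width)
    b = bin_spectrum(spec_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_candidates(experimental, predicted_library):
    """Return (name, score) pairs sorted from best to worst match."""
    scores = {
        name: cosine_similarity(experimental, spectrum)
        for name, spectrum in predicted_library.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical experimental spectrum and model-predicted library entries.
experimental = [(91.05, 100.0), (119.08, 40.0)]
predicted_library = {
    "candidate_A": [(91.05, 90.0), (119.08, 45.0)],
    "candidate_B": [(77.04, 100.0), (105.07, 30.0)],
}
ranked = rank_candidates(experimental, predicted_library)
```

Here `candidate_A` ranks first because it shares both fragment peaks with the experimental spectrum, while `candidate_B` shares none and scores zero.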

These tools can infer molecular formulas, structural fragments, and molecular fingerprints directly from experimental spectra, narrowing candidate structures and improving identification confidence.
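
One way a predicted fingerprint could narrow candidates is by comparing it with the fingerprints of database structures. The sketch below assumes a model has already produced the set of fingerprint bits it believes are present and ranks candidates by Tanimoto similarity. The bit indices, candidate names, and function names are hypothetical and are not taken from the review.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def rank_by_fingerprint(predicted_bits, candidate_fingerprints):
    """Sort candidate structures by similarity to the predicted fingerprint."""
    scored = [
        (name, tanimoto(predicted_bits, bits))
        for name, bits in candidate_fingerprints.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical fingerprint predicted from an experimental spectrum.
predicted_bits = {1, 4, 7, 9}

# Hypothetical fingerprints for structures sharing the inferred formula.
candidate_fingerprints = {
    "structure_A": {1, 4, 7, 9, 12},
    "structure_B": {2, 4, 8},
}
fingerprint_ranking = rank_by_fingerprint(predicted_bits, candidate_fingerprints)
```

In this toy case `structure_A` shares four of five bits with the prediction and ranks well ahead of `structure_B`.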

The review also highlights generative modeling approaches that propose plausible chemical structures even when compounds are absent from existing databases. This capability is particularly important for emerging environmental pollutants and transformation products that have not been formally cataloged.

Orthogonal parameters, such as retention time and collision cross-section, further enhance structural confirmation. Neural network models can predict these properties across chromatographic and ion mobility platforms, reducing false positives and improving reliability in high-resolution mass spectrometry workflows.
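
To make the false-positive reduction concrete, the sketch below keeps only candidates whose predicted retention time and collision cross-section fall close to the observed values. The tolerances (0.5 min and 3 percent), field names, and example values are assumptions chosen for illustration; appropriate windows depend on the method and the prediction model.

```python
def matches_orthogonal_data(candidate, observed_rt_min, observed_ccs,
                            rt_tol_min=0.5, ccs_tol_pct=3.0):
    """Check a candidate's predicted RT and CCS against observed values."""
    rt_ok = abs(candidate["predicted_rt_min"] - observed_rt_min) <= rt_tol_min
    ccs_error_pct = (
        abs(candidate["predicted_ccs"] - observed_ccs) / observed_ccs * 100.0
    )
    return rt_ok and ccs_error_pct <= ccs_tol_pct

def filter_candidates(candidates, observed_rt_min, observed_ccs):
    """Keep only candidates consistent with both orthogonal measurements."""
    return [
        c for c in candidates
        if matches_orthogonal_data(c, observed_rt_min, observed_ccs)
    ]

# Hypothetical candidates with model-predicted properties.
candidates = [
    {"name": "A", "predicted_rt_min": 6.0, "predicted_ccs": 152.0},
    {"name": "B", "predicted_rt_min": 9.1, "predicted_ccs": 151.0},  # RT too far
    {"name": "C", "predicted_rt_min": 6.4, "predicted_ccs": 160.0},  # CCS too far
]
kept = filter_candidates(candidates, observed_rt_min=6.2, observed_ccs=150.0)
```

Only candidate A survives both checks, which shows how two independent measurements can rule out structures that spectral matching alone might retain.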

Addressing quantification challenges in non-targeted analysis

Quantification presents an additional challenge in non-targeted analysis, particularly when authentic standards are unavailable. The review describes machine learning approaches that predict ionization efficiency and response factors from molecular structure and experimental conditions, enabling semi-quantitative analysis of environmental pollutants without requiring standards for every detected compound.
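
A minimal sketch of one possible semi-quantification scheme follows. It assumes a model supplies a predicted log ionization efficiency for each compound, that a few calibrants with known concentrations link predicted values to measured log response factors through a straight line, and that concentration is then estimated as peak area divided by the predicted response factor. The review does not prescribe this exact procedure, and all names and numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def estimate_concentration(peak_area, predicted_log_ie, slope, intercept):
    """Estimate concentration as peak area over the predicted response factor."""
    log_response_factor = slope * predicted_log_ie + intercept
    return peak_area / (10 ** log_response_factor)

# Hypothetical calibrants: predicted log IE vs. measured log response factor
# (response factor = peak area per unit concentration).
calibrant_log_ie = [1.0, 2.0, 3.0]
calibrant_log_rf = [4.0, 5.0, 6.0]
slope, intercept = fit_line(calibrant_log_ie, calibrant_log_rf)

# Hypothetical unknown with no authentic standard available.
unknown_area = 2.0 * 10 ** 5.5
estimated = estimate_concentration(unknown_area, 2.5, slope, intercept)
```

With these toy numbers the fit has slope 1 and intercept 3, so the unknown is estimated at about 2 concentration units. Real estimates would carry substantial uncertainty, which is why the approach is described as semi-quantitative.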

Reliable quantification remains essential for exposure assessment and environmental risk evaluation. The authors note that machine-learning–based prediction of ionization behavior offers a pathway to more scalable, standard-free quantification in large-scale screening programs.

Implications for environmental laboratories

Despite rapid progress, challenges remain, including model transferability across instruments, limited representation of environmental pollutants in training datasets, and the need for improved interpretability. The authors call for multimodal learning strategies that integrate molecular features with experimental parameters, as well as for expanded databases that more accurately reflect environmental chemical space.

Looking ahead, researchers envision integrated machine-learning–driven screening platforms that combine compound identification, property prediction, and quantification within unified non-targeted analysis workflows.

For laboratories conducting environmental monitoring, regulatory screening, or exposure assessment, advances in non-targeted analysis supported by high-resolution mass spectrometry and machine learning may improve scalability, reduce manual data interpretation, and enhance confidence in pollutant detection.

This article was created with the assistance of Generative AI and has undergone editorial review before publishing.

Machine Learning Advances Non-Targeted Analysis of Environmental Pollutants

Review highlights how AI improves the identification and quantification of environmental pollutants

About the Author

Michelle Gaulin
