Instrument data forms the base layer of science: experimental results and quantitative measurements are the foundation of scientific meaning. However, when data from disparate instruments and systems remain fragmented and inaccessible, scientists risk missing connections and overlooking crucial insights. If you want to take advantage of artificial intelligence and machine learning (AI and ML), you must do the work of making your data more accessible and searchable.
To realize the potential of instrument data, organizations must prioritize the development of comprehensive plans for standardization, specifically aimed at enabling the use of AI and ML tools. Central to this effort is the adoption of the FAIR principles—making data findable, accessible, interoperable, and reusable. The initial and most foundational step toward achieving FAIR data is the standardization of data formats.1
However, much has already been written about the process of making data FAIR, so that will not be the focus here. Instead, this article examines ongoing efforts to pool FAIR instrument data for effective AI and ML applications that advance science.
Understanding instrument data terminology
Data formats are paramount to scientific research, and it is increasingly obvious that data should be the first consideration in digital transformations. Let’s begin by defining some of the terms we’ll use in what follows: instrument data, metadata, ontology, and standardization.
Instrument data refers to any data generated by or with a device. In most cases, this means the result reports that your analysts require from the instruments, but it also includes any images collected and the instrument’s own metadata.
Metadata is the key to searchability and the automated transformations you are aiming for. It’s the data about the data – when it was generated, by whom, and for what purpose. It could include information about when the instrument was last calibrated or validated. The metadata is often structured by, and enriched with, an ontology.
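As a minimal illustration, the kind of metadata that makes results searchable can be captured in a simple structure. The field names below are hypothetical, not drawn from any particular standard:

```python
from dataclasses import dataclass, asdict
from datetime import date, datetime

# A minimal, hypothetical sketch of instrument metadata -- the field names
# below are illustrative, not taken from any published standard.
@dataclass
class InstrumentMetadata:
    instrument_id: str       # which device produced the data
    operator: str            # who ran the analysis
    acquired_at: datetime    # when the data was generated
    purpose: str             # why the measurement was taken
    last_calibration: date   # supports audit and quality checks

meta = InstrumentMetadata(
    instrument_id="SEM-02",
    operator="J. Rivera",
    acquired_at=datetime(2024, 10, 14, 9, 32),
    purpose="particle size distribution",
    last_calibration=date(2024, 9, 1),
)
print(asdict(meta))  # metadata travels with the result, making it searchable
```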
An ontology is a system for organizing data by defining the characteristics of entities and their relationships. It is used to structure data across multiple instruments and systems. There are several frameworks for developing an ontology, all with the common goal of standardization.
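A rough way to picture an ontology is as a set of statements relating entities. The terms and relationships in this sketch are invented for illustration, not taken from a real ontology:

```python
# A toy illustration of ontology-style statements (subject, relationship, object).
# The entities and relationships below are invented for illustration only.
triples = [
    ("ethanol", "is_a", "alcohol"),
    ("ethanol", "has_synonym", "EtOH"),
    ("alcohol", "is_a", "chemical_entity"),
    ("SEM-02", "is_a", "scanning_electron_microscope"),
]

def related(entity, relationship):
    """Return everything linked to an entity by a given relationship."""
    return [obj for subj, rel, obj in triples if subj == entity and rel == relationship]

print(related("ethanol", "has_synonym"))  # ['EtOH']
```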
Standardization of data is a key step in making that data FAIR. Standardization transforms data into a consistent, uniform format, making it easier for multiple systems to interpret.
The need for standardization
At the Materials Science and Technology conference in Pittsburgh in October 2024, one talk addressed ontologies in physics. Researchers who use particle beam accelerators explained that because beam time is scarce, it is often necessary to compile results from several different accelerators. Unfortunately, those outputs arrive in several different formats, and translating the results has been a time-consuming process. The Materials Science and Engineering Ontology now offers a simpler path: it provides preferred names for data fields, allowing data from different accelerators to be compiled easily.
This is not a situation reserved for materials scientists. Anyone who’s used a scanning electron microscope or an X-ray diffractometer knows that the structure of the analysis reports will vary by manufacturer. Think about aluminum, aluminium, and Al, for example—these are all acceptable ways to refer to element 13. You might find ethyl alcohol, ethanol, or EtOH in reports, too.
These barriers to effective communication and data interoperability can be addressed with standardization, an effort that many organizations across scientific disciplines and industries are actively pursuing.
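A small sketch shows what that looks like in practice: a preferred-name lookup collapses variants like those above before reports are pooled. The synonym table here is made up for illustration; real mappings would come from a published ontology:

```python
# A made-up, minimal synonym table mapping variant names to preferred names.
# A real implementation would draw these mappings from a published ontology.
PREFERRED_NAMES = {
    "aluminum": "Al",
    "aluminium": "Al",
    "al": "Al",
    "ethyl alcohol": "ethanol",
    "etoh": "ethanol",
}

def standardize(term: str) -> str:
    """Map a reported term to its preferred name, if one is known."""
    return PREFERRED_NAMES.get(term.strip().lower(), term)

reports = ["Aluminium", "Al", "ethyl alcohol", "EtOH", "silicon"]
print([standardize(t) for t in reports])
# ['Al', 'Al', 'ethanol', 'ethanol', 'silicon']
```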
Industry initiatives and organizations working toward standardization
Industry initiatives like the Allotrope Foundation’s Allotrope Simple Model (ASM) and the Analytical Information Markup Language (AnIML) are crucial for achieving data standardization and interoperability in scientific research. Such efforts are supported by platforms like Zenodo, OpenLDR, and the European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI). A structured approach supported by ontologies allows researchers to effectively categorize and describe their data, which improves data interoperability, enhances usability, and adheres to FAIR data principles. In turn, FAIR data supports integrated data management.
Relevant ontologies in laboratory settings include the Chemical Information Ontology, available through the EMBL-EBI Ontology Lookup Service (OLS), and specialized resources such as the Pharma General Ontology from the Pistoia Alliance, the Open Biological and Biomedical Ontology Foundry, and the Chemical Analysis Metadata Platform. The development of knowledge graphs like MatKG further illustrates the power of ontologies in making standardized data more accessible and contextually rich.
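As one hedged example of putting such resources to work, a script could query the EMBL-EBI OLS to resolve a reported term to an ontology label and identifier. The endpoint and response fields below reflect the OLS search API as commonly documented (here searching ChEBI, one of the ontologies OLS hosts); check the current OLS documentation before relying on them:

```python
import requests

# A sketch of looking up a term via the EMBL-EBI Ontology Lookup Service (OLS).
# The endpoint and response shape below are assumptions based on the OLS search
# API as commonly documented; verify against the current OLS documentation.
OLS_SEARCH = "https://www.ebi.ac.uk/ols4/api/search"

def lookup(term: str, ontology: str = "chebi"):
    """Search an ontology for a term and return (label, identifier) pairs."""
    resp = requests.get(OLS_SEARCH, params={"q": term, "ontology": ontology}, timeout=30)
    resp.raise_for_status()
    docs = resp.json().get("response", {}).get("docs", [])
    return [(d.get("label"), d.get("obo_id")) for d in docs[:5]]

if __name__ == "__main__":
    for label, obo_id in lookup("ethanol"):
        print(label, obo_id)
```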
Integrating instruments with systems and platforms using standardized data
Standardizing data outputs and terminologies across laboratory instruments and systems allows the big picture to emerge from their data. Successful integration enhances the applicability of AI and ML tools, allowing for improved data analysis and enhanced decision-making capabilities throughout the organization.
To make instrument integration even easier going forward, instrument and system manufacturers could collaborate with ontology developers to further standardize data structures. Of course, manufacturers may be disincentivized to standardize data across similar instruments because proprietary data formats help lock customers in with one supplier. Including preferred names for data fields in an ontology offers a possible workaround, and researchers could be trained to develop and apply those preferred names, ensuring stability in naming conventions.
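Applied to data fields, the preferred-name idea amounts to a thin translation layer that aligns reports from different instruments. The vendor field names and the mapping in this sketch are hypothetical:

```python
# Hypothetical vendor field names mapped to preferred field names -- in practice
# the preferred names would come from a community ontology, not a hand-written dict.
FIELD_MAP = {
    "SampleName": "sample_id",
    "Sample ID": "sample_id",
    "AcqDateTime": "acquired_at",
    "Acquisition Time": "acquired_at",
    "Result": "measured_value",
    "Reading": "measured_value",
}

def align(report: dict) -> dict:
    """Rename vendor-specific fields to their preferred names where known."""
    return {FIELD_MAP.get(key, key): value for key, value in report.items()}

vendor_a = {"SampleName": "A-17", "AcqDateTime": "2024-10-14T09:32", "Result": 4.2}
vendor_b = {"Sample ID": "A-17", "Acquisition Time": "2024-10-14T11:05", "Reading": 4.3}
print(align(vendor_a))  # both reports now share the same field names
print(align(vendor_b))
```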
AI and ML will continue to drive innovations in data analysis and integration, requiring ever greater data interoperability and reusability. The future is likely to involve:
- standardized instrument data, built on preferred-name conventions, that integrates seamlessly into laboratory systems,
- collaboration and innovation that enhance the application and spread of scientific data, and
- research breakthroughs for society’s most urgent problems.
Standardization improves the interoperability and accessibility of instrument data. By adopting the FAIR principles and leveraging standardized data formats, researchers can foster collaboration and data sharing that ultimately drive scientific discovery. Stakeholders in the laboratory community, as well as those across the broader organizations in which they sit, are encouraged to embrace these practices and build integrated, efficient research environments, clearing the way for the effective implementation of AI and ML strategies and tools.
References
- Mugahid, D., et al. (2025). "A practical guide to FAIR data management in the age of multi-OMICs and AI." Front. Immunol., 15. https://doi.org/10.3389/fimmu.2024.1439434










