Modern scientific laboratories are powerhouses of data generation, yet the insights these data could yield are severely undermined by fundamental incompatibilities among laboratory instrument makes and models.
Imagine trying to automate a workflow in which a mass spectrometer from Vendor A labels a critical parameter retention time (RT), while the liquid chromatograph from Vendor B labels the identical value T(Ret). Both instruments output the value in minutes and seconds, yet the laboratory information management system (LIMS) accepts input only in seconds. This lack of standardization, in which every instrument speaks a slightly different, proprietary dialect and saves data in a unique format, creates crippling data silos: data cannot flow from one system to another, preventing large-scale integration, automated analysis, and collaboration.
Fortunately, a solution exists: ontologies.
Ontologies: The semantic backbone of scientific data
An ontology is a structured, vocabulary-based model that defines the fundamental entities (things, concepts) and the precise relationships between them within a specific domain.
For scientific data, an ontology works like a semantic roadmap, ensuring that when two different systems reference the same concept—whether it's an instrument, a unit of measure, or a specific chemical—they are both referring to the exact same thing and its properties. This makes the data machine-readable and semantically consistent across all devices and systems.
Ontologies are used in many disciplines beyond the laboratory, including biodiversity collections, geology, and engineering. This article explains how ontologies can transform your laboratory data, maximizing the potential of your information to finally achieve the long-promised goal of findable, accessible, interoperable, and reusable (FAIR) scientific data.
Achieving standardization with data ontologies
A primary goal of system and instrument data standardization is interoperability, the ability of different systems to exchange and use data. For example, an ontology can specify that a cell culture in one system is the same as a tissue culture in another, allowing data to flow across instruments, labs, and even entire research institutions.
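To make this concrete, the sketch below shows how such an equivalence can be expressed in OWL, assuming Python and the open-source rdflib library; the lab: namespace and class names are hypothetical illustrations, not terms from a published ontology.

```python
# A minimal sketch, assuming Python and the open-source rdflib library.
# The lab: namespace and class names are hypothetical illustrations,
# not terms from any published ontology.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

LAB = Namespace("http://example.org/lab-ontology#")

g = Graph()
g.bind("lab", LAB)

# Two systems use different names for the same concept.
g.add((LAB.CellCulture, RDF.type, OWL.Class))
g.add((LAB.CellCulture, RDFS.label, Literal("cell culture")))
g.add((LAB.TissueCulture, RDF.type, OWL.Class))
g.add((LAB.TissueCulture, RDFS.label, Literal("tissue culture")))

# The ontology asserts the equivalence, so downstream tools can treat
# records from either system interchangeably.
g.add((LAB.CellCulture, OWL.equivalentClass, LAB.TissueCulture))

print(g.serialize(format="turtle"))
```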
Ontologies also structure metadata, which is data about the data. For instrument outputs, this could include who performed the experiment, when it was done, the specific instrument used, and its last calibration date. An ontology defines the terminology for every metadata field, ensuring consistency.
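The same approach can structure run metadata. In the sketch below, every field name comes from the same hypothetical lab: namespace rather than from a vendor’s export format, so any system that understands the ontology can interpret the record without manual mapping; the property names and run identifier are likewise invented for illustration.

```python
# Ontology-structured metadata for one instrument run; all lab: terms
# and the run identifier are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

LAB = Namespace("http://example.org/lab-ontology#")
g = Graph()

run = LAB["run-2024-0042"]
g.add((run, RDF.type, LAB.MeasurementRun))
g.add((run, LAB.performedBy, Literal("A. Analyst")))
g.add((run, LAB.performedOn, Literal("2024-05-14", datatype=XSD.date)))
g.add((run, LAB.instrumentUsed, LAB["LC-MS-01"]))
g.add((run, LAB.lastCalibrated, Literal("2024-05-01", datatype=XSD.date)))
```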
Ontologies in laboratory settings
An ontology strictly defines instrument data terms like sample, reagent, measurement, calibration, and assay, along with their properties and relationships. Thus, ontologies eliminate confusion and ensure that data from different instruments, even from different manufacturers, can be understood (is findable and accessible) by various systems and researchers (see Table 1). This is a critical step in making data machine-ready for automated analysis and machine learning (reusable).
Table 1. Common Terminology Discrepancies
| Area of Difference | Example Discrepancy | Why It's a Problem |
| --- | --- | --- |
| Chemical Names | One lab uses acetylsalicylic acid, while another uses the brand name or the more formal 2-acetoxybenzoic acid. | Different names for the same substance make it difficult to search databases, share data, or ensure researchers are using the correct materials. |
| Units of Measurement | Volume measurements are recorded in mL, milliliters, or cubic centimeters (cc). | Inconsistent units require manual conversion, leading to potential errors and slowing down data analysis. Automated systems can't easily compare the values. |
| Instrument Data Fields | A chromatograph from one vendor labels a data field Retention Time, but one from another vendor calls it RT. | Automated data integration is impossible when field names aren't standardized. Human effort is needed to map and align the data before analysis. |
| Sample Identifiers | One lab's system uses a format like ProjectA-Sample123, while a collaborating lab uses S-00123-A. | Data from different sources can't be automatically linked. This hinders collaborative research and makes it hard to trace a sample's history across multiple labs. |
| Experimental Procedures | A procedure might be described as a PCR reaction in one document but Polymerase Chain Reaction in another, with differing details on temperature cycles. | Lack of a standardized vocabulary for procedures makes it difficult to replicate experiments, a key tenet of the scientific method. |
| Gene/Protein Names | A gene is referred to as tumor protein p53 in one paper and TP53 (its official HUGO Gene Nomenclature Committee symbol) in another. | Inconsistent gene and protein names create ambiguity in biological databases and make it challenging to correlate findings across genomics, proteomics, and cell biology studies. |
Standardized terminology makes integrating data sets much easier. Standardized data is also crucial for regulatory compliance and effective collaboration between sites (such as a quality lab and a contract research organization, or labs in different geographic regions of a global organization). In turn, standardization enables cross-disciplinary collaboration, data interoperability, and experimental reproducibility. Reproducibility is a foundational principle of the scientific method and one that has historically been the most difficult to achieve.
Ontologies and the four pillars of FAIR data
Ontologies are not just a tool for one part of the data lifecycle; they are the fundamental enabler for all four aspects of the FAIR principles.
| FAIR Pillar | Requirement | Ontology Application and Impact |
| --- | --- | --- |
| Findable | Data and rich metadata are easy to find for both humans and computers. | Semantic Clarity and Discovery: Ontologies mandate the use of globally unique, persistent identifiers (PIDs) for concepts (e.g., a specific ID for 'Mass Spectrometer'). This ensures automated indexing and allows a computer to find all relevant datasets (e.g., searching "Chromatography" finds "LC"), regardless of the source's terminology. |
| Accessible | Data is retrievable via a standardized, open protocol. | Standardizing Access Terms: Ontologies define the terms of access and the required metadata fields (e.g., access permissions, roles like 'Principal Investigator'). This ensures that a data access request (e.g., "give me the data for sample X") is understood consistently by the data repository, enabling machine-based retrieval. |
| Interoperable | Data and metadata can be integrated with other data and analyzed by algorithms. | Shared Language and Mapping Rules: This is the core strength. Ontologies provide the shared vocabulary and explicit mapping rules (the relationships). They allow systems to automatically exchange and use each other's data (e.g., mapping RT from one instrument to T(Ret) from another and converting units to seconds), making complex data integration automatic. |
| Reusable | Data is well-described, licensed, and has clear provenance. | Consistent Provenance and Context: Reusability relies on rich, accurate metadata. Ontologies define the standard terms for documenting every aspect of data provenance: the Instrument used, the Calibration Status, the Experimental Procedure, and the Analyst. This structured, machine-readable context ensures data quality and maximizes its long-term value. |
The benefits of ontologies extend beyond access to FAIR data. An ontology enables automatic transformation (e.g., mapping different units of measurement or chemical identifiers). With an ontology, your lab will gain better data searchability, accelerated data ingestion and transformation, faster adoption of artificial intelligence (AI) and machine learning (ML) methods, and reduced scalability challenges.
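As a minimal illustration, the plain-Python sketch below normalizes the RT versus T(Ret) example from the introduction; the canonical field name retention_time_s and the mapping tables are hypothetical stand-ins for what an ontology would supply.

```python
# A minimal sketch of ontology-driven normalization. The canonical field
# name and mapping tables are hypothetical stand-ins for ontology content.
FIELD_MAP = {"RT": "retention_time_s", "T(Ret)": "retention_time_s"}
UNIT_TO_SECONDS = {"s": 1.0, "min": 60.0}

def normalize(record: dict) -> dict:
    """Rename vendor-specific fields and convert values to seconds."""
    out = {}
    for field, (value, unit) in record.items():
        out[FIELD_MAP.get(field, field)] = value * UNIT_TO_SECONDS[unit]
    return out

# The same measurement, expressed in two vendor dialects:
vendor_a = {"RT": (6.5, "min")}        # mass spectrometer, minutes
vendor_b = {"T(Ret)": (390.0, "s")}    # liquid chromatograph, seconds
assert normalize(vendor_a) == normalize(vendor_b)  # both 390.0 s
```

An ingestion pipeline can apply a step like this automatically, so every downstream system, including the LIMS, sees one canonical field in one canonical unit.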
Where to start with ontology development
Developing an ontology can seem like a daunting task, but you don’t have to reinvent the wheel. Laboratory managers, directors, and researchers can use this section as a roadmap for overcoming the challenges of developing an ontology for their labs.
Leverage existing ontologies
Compilations of existing ontologies are available, such as:
- the Stanford University Knowledge System AI Laboratory (KSL)’s Ontology Server,
- the National Center for Biomedical Ontology’s BioPortal,
- the HUGO Gene Nomenclature Committee’s gene name database,
- the Allotrope Foundation Ontologies, or
- the DAML Ontology Library (a Defense Advanced Research Projects Agency initiative).
Choose a relevant ontology to begin, but be prepared for multiple challenges in adapting it to your laboratory.
Challenges to ontology development and use
You may struggle to make your chosen ontology robust enough to suit your laboratory’s needs. Existing ontologies may not cover all the specific entities and relationships relevant to your research, requiring you to either extend the ontology (which demands some expertise in semantic relationships) or find workarounds. Extending an ontology requires building consensus on the definitions and relationships, which can be a slow and challenging social process. Often, valuable laboratory data is stored in legacy systems using semantically unstructured terminology. Mapping this data to an ontology can be time-consuming as well, but it is much easier with modern data transformation tools.
The challenges don’t end with development. As with any change in an organization’s processes, you may run into resistance to adoption. Users may not see a clear return on investment (ROI), such as improved data integration and knowledge discovery, until the ontology is in place. The complexity and learning curve can be another deterrent. The ontology will also need continual updates as scientific knowledge evolves. Finally, because ontologies are not mandated for data sharing and reuse, widespread adoption has been slow.
Another set of challenges lies in the technical aspects of ontology implementation. Fortunately, many LIMS, electronic laboratory notebooks (ELNs), and other data management platforms offer tools that make it easier to integrate with existing ontologies (a brief sketch follows the list below). These could include:
- Commercial or open-source software tools for ontology management and integration
- Extract–Transform–Load (ETL) tools
- Data modeling and mapping strategies
- Semantic web technologies (e.g., RDF, OWL)
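As a brief sketch of the last item, the snippet below uses rdflib to load an ontology and run a SPARQL query for equivalent terms; the file name lab-ontology.ttl is a hypothetical placeholder.

```python
# A sketch of querying an ontology with SPARQL via rdflib; the ontology
# file name is a hypothetical placeholder.
from rdflib import Graph

g = Graph()
g.parse("lab-ontology.ttl", format="turtle")

# Find every pair of terms the ontology declares equivalent, so an ETL
# step can merge the corresponding fields automatically.
results = g.query("""
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?a ?b WHERE { ?a owl:equivalentClass ?b }
""")
for a, b in results:
    print(f"{a} == {b}")
```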
Despite these challenges, ontology adoption is slowly increasing. Efforts such as the Industrial Ontologies Foundry and the Open Biological and Biomedical Ontology (OBO) Foundry, along with more user-friendly tools like the Allotrope Simple Model and the U.S. Pharmacopeia’s work on standardized nomenclature, are removing some of these barriers. As the volume and complexity of scientific data increase, the need for effective knowledge representation and integration solutions like ontologies will likely drive further adoption.
Future directions in data standardization
Ontologies are a logical next step in the continuous evolution of laboratory informatics and data management. As AI and ML adoption drives innovation in unexpected ways, having a flexible ontology to categorize serendipitous discoveries will be invaluable.
The ideal future state of laboratory data standardization enabled by ontologies encompasses several aspects, including:
- Seamless data exchange and integration across labs and institutions
- Widespread adoption of FAIR data principles
- Fully automated data analysis and knowledge discovery
- Accelerated scientific breakthroughs
Ontologies move laboratory data beyond silos into a rich, interconnected, and semantically consistent ecosystem. This transformation is vital for modern laboratory informatics. It enables the automated, large-scale data analysis required for meaningful scientific discoveries in the face of ever-expanding data sets.