How to Ensure Data Integrity in the Lab

Digitally handling data in a way that ensures its integrity is the basis for how labs enable productivity

Data integrity is key in a strategic approach to reduce risks to processes and products. Digitally handling data in a way that ensures its integrity is also the basis for how labs enable productivity. A digital twin is a collection of data capturing the reason for the experiments, the plans, and their execution results. Visualization and review of digital twins help scientists make decisions with risk mitigation in mind. Automation can be applied to creating and analyzing digital twins. This simplifies the application of self-learning and other advanced AI technologies.

According to the FDA, “data integrity refers to the completeness, consistency, and accuracy of data. Complete, consistent, and accurate data should be attributable, legible, contemporaneously recorded, original or a true copy, and accurate (ALCOA).” With data integrity being part of the FDA’s risk-based strategy, there is also a need for that data to be available and accessible. 

Data integrity is particularly challenging in a lab setting because a variety of techniques, data types, and formats are necessary. Since this data is at the heart of the strategic and tactical decisions organizations make on a daily basis, it’s more important than ever to ensure data integrity in the lab. So, if you are interested in maximizing productivity and accelerating decision making, here’s an overview of how to make that happen.

Overview of digitalization in labs

The most basic need for digitalization in the lab is to capture data from lab equipment and convert that data into results about analyzed samples. During any given experiment, a variety of equipment, methods, and materials may be used to prepare and characterize synthesized compounds or formulated products. The applied analysis techniques and resulting data may therefore be complex.

A lot of equipment in the lab takes advantage of digitalization, though the biggest challenge is often data heterogeneity or variability. The more variation among lab equipment, the more adaptations needed to handle the resulting data, which increases risk. By digitalizing labs, data can be more easily manipulated, processed, interpreted, distributed, and re-used. Importantly, the data needs to be structured to ensure that automated processing of large datasets is convenient, fast, and reproducible, and that the interpretations of results are stored along with the experimental data. Scientists looking at the data are thus more quickly able to understand what the data mean, and to decide which method or instruments to use next, or which test to perform—whether it’s a simple test on a complex material or a complex test on a complex material.
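To make the idea of structured, processable experimental data concrete, here is a minimal sketch of what one record in a digital twin might look like. The field names and values are illustrative assumptions, not a real lab schema; the point is that a structured record keeps the measurement, its context, and its interpretation together.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AnalysisRecord:
    """One structured result within an experiment's digital twin (hypothetical fields)."""
    sample_id: str      # which sample was analyzed
    instrument: str     # which instrument produced the data
    method: str         # analysis method applied
    recorded_at: str    # contemporaneous timestamp (the "C" in ALCOA)
    values: dict        # raw measurements, keyed by quantity name
    interpretation: str # the scientist's reading, stored alongside the data

record = AnalysisRecord(
    sample_id="S-0042",
    instrument="HPLC-1",
    method="assay-v3",
    recorded_at=datetime.now(timezone.utc).isoformat(),
    values={"purity_pct": 99.2, "retention_min": 4.7},
    interpretation="Main peak within specification; no new impurities.",
)

# A structured, serializable record is straightforward to process,
# query, distribute, and re-use in automated pipelines.
print(json.dumps(asdict(record), indent=2))
```

Because each record is self-describing, the next scientist (or an automated pipeline) can read both the data and its interpretation without consulting the instrument that produced it.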

Digitalization is also important in understanding and capturing the sequence of operations a scientist follows when completing an experiment. A scientist may work on multiple parts of an experiment simultaneously rather than sequentially. Digitalization helps a scientist ensure plans and data are related appropriately and that data are organized effectively in the digital twin for each experiment.

Digitalized laboratories ideally allow for on-demand access to analytical data. Furthermore, storage of digital twins in standardized formats that meet "original or true copy" criteria can facilitate meeting regulatory requirements for long-term data preservation. Including sample genealogy information with datasets and using software systems that consume and handle a variety of formats can allow for more complete data visualization. 

Preventing discrepancies in data

To prevent discrepancies in data, it's important to understand the types of errors that can arise in a lab setting: errors of omission and errors of commission, each of which may be accidental or deliberate.

Errors of omission arise when data that should have been gathered in an experiment was not. Errors of commission are the creation of false values or incorrect changes to data, even through simple transcription or typing mistakes. An accidental error is a change that was not made on purpose; a deliberate error is the purposeful alteration or deletion of individual values or entire sets of data.

The better an organization understands the types of discrepancies that can arise in scientific data (whether incomplete or inaccurate), the better equipped it will be to create a system that minimizes risk and prevents those discrepancies from happening.

Many organizations use data standardization to prevent such discrepancies because when paired with digitalization, systems can be put in place to check for inaccuracies and inconsistencies. Such systems help ensure digital twin data quality for an organization.
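Once records follow a standard structure, automated checks can flag both kinds of error described above. The sketch below is illustrative only; the required fields and plausibility rule are hypothetical assumptions, not a real validation standard.

```python
# Hypothetical integrity checks over standardized records:
# omissions = missing required fields; likely commissions = implausible values.
REQUIRED_FIELDS = {"sample_id", "instrument", "recorded_at", "values"}

def find_discrepancies(record: dict) -> list[str]:
    """Return human-readable descriptions of integrity issues in a record."""
    issues = []
    for field in sorted(REQUIRED_FIELDS - record.keys()):
        issues.append(f"omission: missing field '{field}'")
    purity = record.get("values", {}).get("purity_pct")
    if purity is not None and not (0.0 <= purity <= 100.0):
        issues.append(f"commission: purity_pct {purity} outside 0-100")
    return issues

# Example: a record missing its timestamp and carrying an impossible purity.
bad = {"sample_id": "S-0042", "instrument": "HPLC-1",
       "values": {"purity_pct": 104.6}}
print(find_discrepancies(bad))
```

Checks like these catch accidental errors at the point of entry; deliberate alterations additionally require audit trails and access controls, which standardized systems also make easier to enforce.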

Bringing together accurate and complete data in the digital twin allows scientists to effectively gain insights that lead to scientific intelligence and enables organizations to make strategic decisions, limit risks, and share data in and out of the organization.

How to properly store, archive, and retrieve data

It is important for organizations to have solutions in place to properly store, archive, and retrieve their data accurately, especially if results are to be trusted.

The FDA requires the long-term preservation of data, especially in drug development, because the data needs to be available and accessible should a product need to be reviewed or investigated. Properly archiving data can help with this process because it enables organizations to preserve data in its original form or true copy for the long term, ultimately maintaining data integrity.
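One common way to demonstrate that retrieved data is a true copy of what was archived is to record a cryptographic hash at archiving time and re-verify it at retrieval time. This is an illustrative sketch of the technique, not an FDA-mandated method, and the dataset bytes are hypothetical.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Fingerprint of the archived bytes; any change alters the digest."""
    return hashlib.sha256(data).hexdigest()

raw = b"time,signal\n0.0,1.2\n0.1,1.9\n"  # hypothetical raw dataset
stored_digest = sha256_of(raw)             # recorded alongside the archive

# On retrieval, recompute and compare; a mismatch means the retrieved
# copy is no longer bit-identical to the archived original.
retrieved = raw
assert sha256_of(retrieved) == stored_digest
print("true copy verified")
```

Storing the digest separately from the data itself means that accidental corruption and deliberate tampering are both detectable on retrieval.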

Scientists within an organization are often interested in previously used processes or the compounds and materials made thereby. For this reason, access to the digital twin is ideal. By storing the original data systematically and having a simple way to query for it, a user can easily retrieve the requested data whenever needed. An ideal interface may allow a user to retrieve and work with the most interesting parts of the digital twin rather than the complete dataset when needed and allow appropriate editing and updating without compromising data integrity. For example, querying data is common in looking for impurities that may occur in substances and products and for understanding how they may have arisen in certain processes. 
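Retrieving only the relevant slice of the digital twin can be sketched as a simple query over stored records. The record layout, impurity names, and threshold below are illustrative assumptions.

```python
# Hypothetical stored digital-twin records, each linking a sample,
# the process that made it, and the impurities observed in it.
records = [
    {"sample_id": "S-0041", "process": "batch-A", "impurities": {"X1": 0.02}},
    {"sample_id": "S-0042", "process": "batch-B", "impurities": {"X1": 0.31, "X2": 0.08}},
    {"sample_id": "S-0043", "process": "batch-A", "impurities": {}},
]

def samples_with_impurity(name: str, threshold: float) -> list[tuple[str, str]]:
    """Return (sample_id, process) pairs where a named impurity exceeds
    a threshold, so its origin in the process can be investigated."""
    return [(r["sample_id"], r["process"])
            for r in records
            if r["impurities"].get(name, 0.0) > threshold]

print(samples_with_impurity("X1", 0.1))  # -> [('S-0042', 'batch-B')]
```

Returning the process alongside the sample is what lets a scientist trace how an impurity may have arisen, without pulling the complete dataset.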

How digital tools can help organize data

Informatics tools make organizing data easy, so R&D organizations and scientists can make faster, smarter, and more independent decisions about the results of their reactions.

Informatics systems and effective digital connections allow an organization to get the most out of their data by reducing the time spent processing and analyzing data, and by streamlining the assembly of related project data into comprehensive reports. A complete knowledge management strategy includes additional decision support tools enabling effective collaboration within or between organizations.

Organizations that enable their scientists to minimize risks and make the best, most informed decisions have a digital strategy and technologies that include data integrity in fundamental ways, not as an afterthought. So, ask yourself whether your systems and tools provide the data integrity you need now and for the foreseeable digital future.