With advances in automation and new drug modalities, modern labs generate more data than ever, but turning those data into intelligence is another story. Lab leaders want to make their data actionable and achieve true data intelligence from their growing data pools. They know their laboratory data could help their businesses perform better; they just need to harness it.
AI doesn’t automatically give you the right intelligence
Laboratory data is fertile ground for artificial intelligence (AI). Data-driven quality control can alert labs to instrument trends and deviations. Analyzing data can improve resource allocation and budgeting and surface emerging patterns that signal degrading data or process integrity. Data can be actionable; it can generate powerful insights, shape decisions, and improve business outcomes, and with AI you can extract greater insights from the data than ever before. For example, AI pattern recognition can assist in process monitoring and optimization. But there is also a risk: if AI models are trained on data that are biased or incorrectly contextualized, they can generate biased results.
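To make the quality-control idea concrete, here is a minimal sketch of data-driven instrument monitoring: a rolling z-score that flags readings drifting away from recent baseline behavior. The synthetic data, column name, and three-sigma threshold are illustrative assumptions, not a prescribed method.

```python
# Minimal sketch: flagging instrument drift with a rolling z-score.
# The synthetic readings and the 3-sigma rule are illustrative assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
readings = pd.Series(
    np.concatenate([
        rng.normal(100.0, 1.0, 200),   # in-control baseline
        rng.normal(103.0, 1.0, 20),    # simulated upward drift
    ]),
    name="balance_check_mg",           # hypothetical instrument check value
)

window = 50
baseline_mean = readings.rolling(window).mean()
baseline_std = readings.rolling(window).std()

# Compare each reading against the preceding window's behavior
z = (readings - baseline_mean.shift(1)) / baseline_std.shift(1)

deviations = readings[z.abs() > 3]     # simple 3-sigma alert rule
print(f"{len(deviations)} readings flagged for review")
```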
To reach digital maturity, labs need the right intelligence in to get the right intelligence out. You may have heard the phrase “garbage in, garbage out”: bad data will lead to bad outcomes. “Intelligence in” is about using high-quality data, but it is also about harnessing human intellect. To succeed, AI and machine learning (ML) require both the right data and the right people asking the right questions.
Start with the right data
"Garbage in” includes data with transcription errors or stripped of context; for example, just inputting the method and results of an experiment without the context of what the experiment is. But in the context of AI/ML, "garbage in" can also mean insufficient data. Typically, if a lab runs an experiment and does not achieve the desired outcomes, that data is archived but rarely retrieved for analytical review. However, in ML models, data from failed experiments can actually yield useful information about how parameters interact. Models get more accurate with lots of data on what does and does not achieve desired outcomes. Therefore, "intelligence in" should include data from both successful experiment runs and assays as well as failures.
In addition to being accurate, high-quality data should be complete, comprehensive, current, and unique. Complete data have no missing entries and include metadata and associated data. Next, data should be comprehensive for the questions the lab intends to ask. For example, attempting to identify golden batches from a dataset containing only laboratory information management system (LIMS) data may generate an inaccurate and biased response: a LIMS may hold only partial data, so data would need to be pulled from other sources across the lab for a complete picture. Data should also be current; training an algorithm with out-of-date data could produce an out-of-date answer. Finally, data should be unique: accidentally duplicated values, for instance, can further bias a dataset.
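A hedged example of what such checks might look like in practice: the sketch below screens a small, hypothetical results table for missing entries (completeness), duplicate rows (uniqueness), and stale records (currency). The schema and the two-year staleness cutoff are assumptions for illustration only.

```python
# Minimal sketch: basic quality checks before data reach a model.
# The schema (sample_id, result, recorded_at) and cutoffs are assumptions.
import pandas as pd

df = pd.DataFrame({
    "sample_id": ["S1", "S2", "S2", "S4"],
    "result": [0.91, 0.87, 0.87, None],
    "recorded_at": pd.to_datetime(
        ["2024-01-05", "2024-01-06", "2024-01-06", "2019-03-01"]
    ),
})

issues = {
    "missing_entries": int(df["result"].isna().sum()),   # completeness
    "duplicate_rows": int(df.duplicated().sum()),        # uniqueness
    "stale_records": int(
        (df["recorded_at"] < pd.Timestamp.now() - pd.DateOffset(years=2)).sum()
    ),                                                   # currency
}
print(issues)
```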
Get the right data to the right people
Next, for good data to be useful, it must be available and intelligible to both humans and machines. Often, data are stored in different silos and formats; even high-quality data can be hard to retrieve.
Many companies have begun to funnel data from all systems into a single data lake. This collection of structured and unstructured data can provide a single source for data-consuming algorithms. However, this approach is resource-intensive and no longer necessary. Newer tools are designed to provide access to data regardless of the data location, essentially “de-siloing” the systems architecture with no IT involvement.
Wherever data are stored, a well-architected data backbone adds layers on top of the data to maintain integrity and context for data from various sources. These architectures are often built around the FAIR data principles: ensuring data are Findable, Accessible, Interoperable, and Reusable. In the past, it took a trained IT professional working side-by-side with the subject matter expert to construct the complex queries needed to produce the desired answers. New tools are reaching the point where anyone can learn to construct meaningful queries without knowing how to program. Putting low- and no-code tools in the hands of lab workers can speed process development and experimentation.
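To illustrate the kind of contextualized, cross-source view such a backbone enables, the sketch below joins hypothetical LIMS results with instrument metadata on a shared sample identifier, so each result keeps the context of how it was produced. The source names and fields are assumptions, not a reference schema.

```python
# Minimal sketch: one contextualized view across two data "silos".
# Source names, fields, and identifiers are illustrative assumptions.
import pandas as pd

lims = pd.DataFrame({
    "sample_id": ["S1", "S2"],
    "assay": ["purity", "purity"],
    "result_pct": [98.2, 96.7],
})

instrument = pd.DataFrame({
    "sample_id": ["S1", "S2"],
    "instrument_id": ["HPLC-03", "HPLC-03"],
    "method_version": ["v2.1", "v2.1"],
    "run_started": pd.to_datetime(["2024-05-01 09:12", "2024-05-01 10:40"]),
})

# Join on the shared identifier so results retain their instrument context
contextualized = lims.merge(instrument, on="sample_id", how="left")
print(contextualized)
```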
Now, AI/ML have become integral to enhancing low- and no-code platforms, making it easier for non-technical users to perform sophisticated data analysis. The synergy between AI/ML and low/no code tools ensures that high-quality data is accessible and actionable, enabling users with varying levels of expertise to contribute to data-driven decisions.
“Intelligence in, intelligence out” means that outcomes are affected by the people seeking answers as much as by the quality of the data analyzed. This is true once a data backbone is established and optimized, but it is also true leading up to that point. When designing a data backbone, human intelligence is key to ensuring that data are optimally captured, contextualized, stored, and accessed.
Have the right people ask the right questions
Having the right people in the room for a big data project often means having all roles represented. Diverse perspectives help ensure that the right questions are asked internally: What are the lab's goals? Which data matter? Given those goals, how should data be organized? The answers may vary from lab to lab.
Bench scientists and technicians should be involved from day one of a new data strategy; they are often best placed to understand the problem space and to verify that the right questions are being asked in the first place.
Business leaders and data experts are also crucial to ensure that the architecture captures data in ways that can be queried to answer business questions and achieve the desired business outcomes.
The most successful labs often partner with industry experts who understand scientific and process development business needs and have data science skills and capabilities. Often, these external partners can also serve as helpful training resources.
As the industry matures digitally, moving from wet experiments toward in silico techniques, knowledge gaps and communication breakdowns can be obstacles. All team members need a shared foundation of digital literacy around how AI and ML models work; that foundation should include a shared commitment to stewarding high-quality data. A shared vocabulary can help stakeholders communicate well with each other and with technical partners about data architecture and feasibility.
While AI tools are indeed democratizing access to insight, true data intelligence requires an intelligent approach from beginning to end, with high-quality, well-organized data supported by knowledgeable, thoughtful humans in every part of the business.