Lab Manager | Run Your Lab Like a Business

Why a Digital Data Backbone is Necessary and How Companies Can Get There

The integration conundrum can be resolved with a robust data infrastructure

Craig Williamson

Recent FDA warning letters about data security and transcription errors have refocused attention on the challenges of data integrity within biopharma development. Managing instrument data for compliance alone can be a big lift. 

But as smart instruments become the norm, the industry also faces a more tantalizing possibility: smart, interconnected labs that leverage data to drive processes and accelerate discoveries. Today, data should do more than check a regulatory box—it should bring products to market faster. 

Labs need a holistic, accessible view of their data and a digital data backbone that can put it to use. This vision is closer than many may think. 

A new vision for data integration

For a long time, data management in labs was synonymous with showing your work. The ability to completely trace the material specifications behind a scientific discovery or decision has long been essential for regulatory submissions, peer review, and establishing priority in IP filings and litigation. 

To improve traceability, many labs already automate data capture from a variety of instruments and processes. Replacing manual transcription improves data integrity while also saving time. But now, integration is about more than a simple transfer of data between two systems. Labs have bigger goals. 

Data should not just describe what happened; it should also be predictive and prescriptive. The vision is to consolidate data from hundreds of instruments, analyze it in real time, and automatically feed insights back into lab processes and operations. Ultimately, data should help speed time to market. 

Figure 1: Digital maturity model.
IDBS

Consider one small case: an instrument reading is identified as an outlier because it falls outside the expected range of values. Storing this data point for record-keeping purposes is one thing. But an outlier could have big implications. It could, for example, indicate an instrument failure that, if unnoticed, makes a dataset unusable. Flagging the outlier early could avoid costly do-overs.

But catching an outlier and acting on it requires more than simply uploading data. In a smart, connected Pharma 4.0 lab, the informatics platform would not only read, store, semantically enrich, publish, and back up the new data; in parallel, it could also compare each value with the expected limits, catch the outlier, and send an alert to the appropriate team for further investigation.
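
As a rough illustration of that last step, the sketch below compares a reading with its expected limits and raises an alert when it falls outside them. It is a minimal example, not a description of any particular platform: the names Reading, Limits, and check_reading, the instrument identifier, and the pH limits are all hypothetical, and the alert is simply appended to a list where a real system would notify a team.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Reading:
    instrument_id: str  # hypothetical identifier, for illustration only
    parameter: str
    value: float


@dataclass(frozen=True)
class Limits:
    low: float
    high: float


def check_reading(reading: Reading, limits: Limits,
                  alert: Callable[[str], None]) -> bool:
    """Return True if the value is within limits; otherwise alert and return False."""
    if limits.low <= reading.value <= limits.high:
        return True
    alert(
        f"{reading.instrument_id}: {reading.parameter}={reading.value} "
        f"outside expected range [{limits.low}, {limits.high}]"
    )
    return False


# Assumed limits for the example; real limits would come from the process definition.
alerts: list[str] = []
ph_limits = Limits(low=6.8, high=7.4)

ok = check_reading(Reading("bioreactor-01", "pH", 7.1), ph_limits, alerts.append)
flagged = check_reading(Reading("bioreactor-01", "pH", 8.3), ph_limits, alerts.append)
```

The reading is still stored either way; the check runs alongside storage, so the record is complete and the alert arrives early.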

This expanded vision is technically possible today. Repeated for hundreds of instruments and processes, it has the potential to transform labs. But it also means a fundamental change in the requirements that integration designers face.

New requirements for integration platforms

Historically, integrating data from lab instruments was a task for IT staff and software developers. Bringing a new instrument online might require a unique pipeline and custom code. Even deciding to collect a new data field could be costly. Implementations could easily become lengthy and unpredictable.

When platforms had only a handful of integrations, it was viable to manage changes through IT, working with configuration files on a server. Now, though, labs are connecting hundreds of instruments to core systems of record and asking more of each. This complexity is only poised to grow. 

Laboratory informatics leaders, who must be proactive in the face of change, now face difficult tradeoffs. Often, leaders must choose between discovering business-critical insights and saving on costly development overhead. To give key decision makers the insights they need—and to build those insights into the product development cycle with real-time data and analysis—a better solution is needed.

The ideal solution would give end users the power and characteristics of a scripting language, but without the need to learn syntax or pull in a developer. That way, each incoming piece of data can intelligently inform future process steps as appropriate, without any user assistance.  

Instead of direct, custom integrations, labs require a digital data backbone that can easily integrate data from diverse sources without losing meaning or flexibility. For any integration platform, the ability to adapt to inevitable change is key.

Current approaches and new directions

Some integration service providers have attempted to compete by offering as many connectors as possible out of the box. But delivering a “pipe” between two systems does not mean that data transfer between those systems will be solved thereafter. Connection requirements constantly evolve. So do the connectors themselves, as instrument vendors release new versions of software or mitigate new security threats. 

In many cases, these needs may be satisfied with a simple generic connector, such as a structured text file (e.g., CSV) interface or a simple Internet of Things (IoT) device. In a recent experience with a large biopharmaceutical company, IDBS found that more than 40 percent of the company's integration needs could be fulfilled in this way, greatly reducing the need for hundreds of complicated, bespoke connectors.
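
To make the idea of a generic CSV connector concrete, here is a minimal sketch: a single function reads any instrument's CSV export, and a per-instrument column map (configuration, not code) renames vendor columns to common field names. The function name, column headers, and sample values are invented for illustration and do not reflect any specific product.

```python
import csv
import io


def read_instrument_csv(text: str, column_map: dict[str, str]) -> list[dict[str, str]]:
    """Parse CSV text and rename vendor columns to shared field names."""
    records = []
    for raw in csv.DictReader(io.StringIO(text)):
        # Keep only mapped columns; values stay as strings at this stage.
        records.append({field: raw[column].strip()
                        for column, field in column_map.items()})
    return records


# Hypothetical export from a cell counter.
sample_export = (
    "Sample ID,Viability (%),Timestamp\n"
    "S-001,94.2,2024-01-15T09:30:00\n"
    "S-002,91.7,2024-01-15T09:45:00\n"
)

# Supporting a new instrument means adding a map like this, not writing new code.
cell_counter_map = {
    "Sample ID": "sample_id",
    "Viability (%)": "viability_pct",
    "Timestamp": "measured_at",
}

records = read_instrument_csv(sample_export, cell_counter_map)
```

The design choice is that the parsing logic never changes; only the mapping does, which is what keeps such a connector generic.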

Much of the general discussion about “integration” also tends to focus only on the actual pipes between platforms through which data flows. True data integration, however, requires a complete view of everything going on around that specific datum.

To drive real insights, data needs context. A viability measurement from a cell counter, for example, is not just a number—it represents the health of living cells at a particular point in a production process. A sudden change can have big implications. 

Figure 2: Instrument data integration in context.
IDBS

Each new piece of data that enters a digital data backbone should always be combined with appropriate context so that the precise meaning is unambiguous. All incoming data must be aligned with accepted standards for interoperability and reuse.
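
One way to picture this requirement is a record type that cannot exist without its context. The sketch below is an assumption-laden example: the field names and the list of required context are hypothetical and are not drawn from any published standard.

```python
from dataclasses import asdict, dataclass

# Hypothetical minimum context; a real list would follow the lab's chosen standard.
REQUIRED_CONTEXT = ("unit", "instrument_id", "sample_id", "process_step", "measured_at")


@dataclass(frozen=True)
class Measurement:
    parameter: str
    value: float
    unit: str
    instrument_id: str
    sample_id: str
    process_step: str
    measured_at: str  # ISO 8601 timestamp


def enrich(parameter: str, value: float, context: dict[str, str]) -> Measurement:
    """Combine a raw value with its context, rejecting incomplete context."""
    missing = [key for key in REQUIRED_CONTEXT if not context.get(key)]
    if missing:
        raise ValueError(f"missing context: {', '.join(missing)}")
    return Measurement(parameter=parameter, value=value,
                       **{key: context[key] for key in REQUIRED_CONTEXT})


viability = enrich("viability", 94.2, {
    "unit": "%",
    "instrument_id": "cell-counter-02",
    "sample_id": "S-001",
    "process_step": "seed-train-day-3",
    "measured_at": "2024-01-15T09:30:00",
})
record = asdict(viability)  # plain dict, ready to publish downstream
```

Here a bare number such as 94.2 never enters the backbone on its own; a call with incomplete context fails loudly instead of storing an ambiguous value.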

Today, instruments store and share semantic metadata in a variety of formats, but the industry is working to unify standards. These initiatives are based on the FAIR principles: the idea that data should be findable, accessible, interoperable, and reusable.

The Allotrope™ Data Format and Analytical Information Markup Language (AnIML) projects, for example, offer standardized formats for storing data and metadata from diverse instruments in consistent ways; AnIML is XML-based. The Standardization in Lab Automation (SiLA) project provides a consistent communication protocol for interacting with data, and BioPhorum is connecting organizations to drive a variety of standardization initiatives that make a robust digital backbone approach possible.

The new way: predictive and prescriptive data

Scalable connectors and clear metadata are the foundation for helping data add value. Next, when a piece of data is acquired, scientists need to be able to choose what to do with it. Integration designers should be prepared to move beyond simple two-way connectors to support more powerful integrations and data flows. 

Scientists should be able to choose the next step based on configurable parameters. Workflows might involve in-line capture of supporting data and context, the ability to verify and validate data values and business rules, or calculations and normalizations that can run in the background.
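
A small sketch may help show what "configurable parameters" could mean in practice: the rules live in data that a scientist can edit, while the code that applies them stays fixed. The rule format, field names, and step names below are hypothetical.

```python
# Rules are plain data; changing a limit does not require changing code.
RULES = [
    {"field": "viability_pct", "min": 0.0, "max": 100.0},
    {"field": "cell_density", "min": 0.0, "max": None},  # None means no upper limit
]


def validate(record: dict, rules: list[dict]) -> list[str]:
    """Return a list of rule violations; an empty list means the record passed."""
    problems = []
    for rule in rules:
        name = rule["field"]
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if rule["min"] is not None and value < rule["min"]:
            problems.append(f"{name}: {value} below minimum {rule['min']}")
        if rule["max"] is not None and value > rule["max"]:
            problems.append(f"{name}: {value} above maximum {rule['max']}")
    return problems


def choose_next_step(record: dict, rules: list[dict]) -> str:
    """Route a record to review if any rule fails; otherwise let the workflow continue."""
    return "review" if validate(record, rules) else "continue"


good = {"viability_pct": 94.2, "cell_density": 1.8e6}
bad = {"viability_pct": 104.0, "cell_density": 1.8e6}

good_step = choose_next_step(good, RULES)
bad_step = choose_next_step(bad, RULES)
```

In a real platform the rules would more likely be edited through a user interface than a Python list, but the separation between rules and logic is the same.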

Integrations should also offer branching workflows, capable of acquiring and processing discrete data sets in parallel before consolidation. As process needs and data sources evolve, workflows should be able to evolve and adapt. 
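
The branching pattern described here can be sketched with Python's standard concurrent.futures module: each branch processes its own data set in parallel, and the results are consolidated at the end. The branch names, summary functions, and values are invented for the example; a production workflow engine would add error handling, retries, and audit trails.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def summarize_viability(values: list[float]) -> dict:
    return {"mean_viability_pct": round(mean(values), 1)}


def summarize_ph(values: list[float]) -> dict:
    return {"min_ph": min(values), "max_ph": max(values)}


def run_branches(branches: dict) -> dict:
    """Run each (function, data) branch in parallel, then consolidate by branch name."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(func, data)
                   for name, (func, data) in branches.items()}
        # result() blocks until that branch has finished.
        return {name: future.result() for name, future in futures.items()}


consolidated = run_branches({
    "viability": (summarize_viability, [94.2, 91.7, 93.1]),
    "ph": (summarize_ph, [7.1, 7.0, 7.2]),
})
```

Adding a new data source is then a matter of adding one more entry to the branch dictionary, which is one way a workflow can evolve without being rewritten.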

The technologies already exist to run labs in this way. But to capture their full potential, lab leaders need to reimagine good data management: success is not simply about data integrity or the mechanics of data transfer. Instead, the best solutions require a holistic, creative view of how a lab’s universe of data points can drive results.