Laboratories exploring artificial intelligence (AI) and machine learning (ML) are often encouraged to adopt new tools in pursuit of efficiency, automation, or deeper insight. Yet many initiatives struggle to deliver value—not because the technology falls short, but because organizations begin without defining a clear objective or the data needed to support it.
Organizations frequently underestimate both the amount of usable data required and the effort needed to prepare it, says James Smagala, bioinformatics practice manager at Yahara Software.
“These are data-driven technologies, and they require you to have gathered all of the data or have clear sources that you’ll be able to integrate,” Smagala explains. “If you don’t have that data, your initiative is likely to fail.”
AI readiness, in practice, depends less on adopting new tools and more on aligning data, workflows, and organizational priorities around a specific goal. Standardizing laboratory data, then, is not simply a technical exercise but a prerequisite for any AI or ML application a laboratory hopes to build.
Building AI readiness starts with defining the problem
Attempting to standardize data broadly without a clear objective often leads to wasted effort. A more effective approach begins by defining a specific problem or desired outcome, then working backward to identify the data needed to support it.
“If you generically tell people to go work on their data, they’ll work on the wrong part of the data,” Smagala says. “Run the problem almost entirely backward. Start at the end state and figure out what you’re trying to accomplish.”
Examples might include uncovering patterns hidden in historical experimental data, reducing reliance on a single employee for complex decision-making processes, or automating repetitive administrative tasks. Processes that rely heavily on one experienced individual’s judgment can be particularly useful starting points because they highlight where institutional knowledge exists but is not formally captured in data systems.
For organizations aiming to implement AI capabilities within the next few years, defining a clear endpoint early helps guide which data streams require attention first and prevents unnecessary large-scale restructuring.
Once the objective is defined, the next question becomes whether the laboratory’s data can support it.
From big data to high-dimensional data
Discussions about AI frequently emphasize the need for “big data,” but that framing can be misleading in laboratory environments. Many organizations do not retain massive historical datasets because detailed information may be discarded once final results are reported. What laboratories often possess instead is complex, multidimensional information embedded across workflows.
“There’s a lot of data already there,” Smagala says. “You use it every day to produce high-quality results. It’s just organized around human processes rather than something an AI system can consume.”
This multidimensional context includes not only analytical results but also metadata, quality assurance and quality control information, instrument parameters, contextual variables, and downstream analytics. Preparing for AI involves reorganizing and connecting these elements so they can be accessed and interpreted consistently across systems.
Rather than creating entirely new datasets, many laboratories must integrate and standardize information that already exists but remains fragmented.
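To make the idea of consolidation concrete, the sketch below shows one way a laboratory might keep a final result together with the metadata, QC outcome, and instrument parameters that often live in separate systems. This is an illustrative assumption, not a standard schema; the field names and values are invented for the example.

```python
# Illustrative sketch only: consolidating a result with its context in one
# machine-readable record. Field names here are assumptions, not a standard.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class LabResult:
    sample_id: str
    analyte: str
    value: float
    units: str
    instrument: str                      # which analyzer produced the value
    qc_passed: bool                      # QA/QC outcome stored with the result
    run_parameters: dict = field(default_factory=dict)

result = LabResult(
    sample_id="S-0042",
    analyte="sodium",
    value=140.0,
    units="mmol/L",
    instrument="analyzer-A",
    qc_passed=True,
    run_parameters={"temperature_c": 37.0},
)

# Serializing the whole record keeps result and context together, so
# downstream systems need no institutional knowledge to reassemble them.
print(json.dumps(asdict(result), indent=2))
```

Once results carry their own context in this way, the same record can feed reporting, QC review, and model training without manual reconciliation.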
Why data standardization gaps persist
The need for data standardization varies depending on laboratory maturity. Early-stage organizations may rely on disconnected systems and informal knowledge to reconcile naming inconsistencies or interpret results. In some cases, understanding whether two datasets refer to the same parameter depends on institutional knowledge rather than standardized definitions. More mature laboratories may have partial integration but still lack sufficient historical data or machine-readable documentation. Highly mature organizations—often large enterprises with integrated systems and dedicated data science teams—can iterate quickly because their data foundations are already established.
Inconsistent data structures pose particular challenges for ML applications that depend heavily on standardized inputs. While some AI approaches can interpret semi-structured data and infer intent, ML models remain far less tolerant of variation, increasing the importance of consistent naming, formatting, and metadata.
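A minimal sketch of what that reconciliation can look like in practice: mapping the variant names found in historical exports onto one canonical identifier before the data reaches a model. The alias table and field names below are invented for illustration.

```python
# Hypothetical example: reconciling inconsistent analyte names from two
# historical exports. The alias table is an assumption for illustration.

# Variant spellings observed in past exports, mapped to one canonical,
# machine-readable identifier that encodes the unit as well.
ALIASES = {
    "na": "sodium_mmol_l",
    "sodium": "sodium_mmol_l",
    "sodium (mmol/l)": "sodium_mmol_l",
    "gluc": "glucose_mg_dl",
    "glucose": "glucose_mg_dl",
}

def canonicalize(record: dict) -> dict:
    """Rename keys to canonical identifiers; flag anything unmapped."""
    clean = {}
    for key, value in record.items():
        canonical = ALIASES.get(key.strip().lower())
        if canonical is None:
            # Surface unmapped fields for review rather than dropping them.
            clean[f"UNMAPPED:{key}"] = value
        else:
            clean[canonical] = value
    return clean

# Two exports describing the same measurements differently now share a schema:
print(canonicalize({"Na": 140, "Gluc": 92}))
print(canonicalize({"Sodium (mmol/L)": 141, "glucose": 88}))
```

Flagging unmapped fields, rather than silently discarding them, is one way to turn the institutional knowledge described above into an explicit, reviewable artifact.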
“The Achilles’ heel of every machine learning project is the amount of time spent cleaning and preparing data,” Smagala says. “The structure of the data dictates the quality of the model you can build.”
Without deliberate data standardization, organizations often find that AI initiatives spend far more time preparing data than generating insights.
AI readiness requires organizational change
Because laboratory data spans multiple departments—including operations, quality, informatics, and IT—standardization cannot be treated solely as a technical task. Coordinating expertise across these domains requires leadership engagement and organizational alignment.
“Anything that pulls together multiple kinds of expertise across the organization becomes a leadership responsibility,” Smagala says.
Laboratory managers often sit at the center of these efforts because they have influence across teams while maintaining communication with senior leadership. In many organizations, they become the practical owners of AI readiness initiatives, coordinating operational priorities with technical requirements.
“You need somebody leading it, but it’s not going to be successful with one person saying we’re going to do this,” Smagala explains. “You have to build consensus and drive change in a way your organization understands.”
Viewed this way, data standardization becomes part of a broader effort to align processes, documentation, and systems so information can be reused reliably across workflows.
Iteration drives successful adoption
Incremental progress plays a significant role in successful AI initiatives. Rather than large-scale transformations, many organizations benefit from smaller experiments that refine data and processes over time.
“Data science is science,” Smagala says. “Remain curious and iterate relatively quickly. Take a few small steps, see if you can find a clearer path forward, and iterate again.”
Large, monolithic initiatives rarely succeed because AI systems evolve as organizations learn more about their data and workflows. “Too big a bite, and you’re setting yourself up to fail,” he says.
Organizations that experiment, refine approaches, and invest incrementally are more likely to see measurable improvements, including automation of manual tasks, improved workflow efficiency, and more consistent decision-making processes. Early successes often appear as incremental operational gains rather than dramatic technological transformation.
The changing landscape for laboratory operations
Laboratories can continue operating successfully using existing manual processes for years, but the broader environment is shifting as automation and advanced analytics become more accessible. Organizations that begin exploring data standardization and AI readiness now gain experience that compounds over time, while those that delay may face steeper transitions later.
“There is risk in starting too early,” Smagala acknowledges. “But there is also risk in not starting at all. At some point, someone else will figure out how to do the work more efficiently.”
Defining meaningful problems, organizing data to support them, and taking deliberate steps toward standardization all influence how effectively laboratories adopt AI and ML in the coming years.