
AI Won’t Fix Broken Data

Standardization and clear objectives drive successful AI adoption

Written by Michelle Gaulin
Interviewing James Smagala, PhD
| 4 min read

Laboratories exploring artificial intelligence (AI) and machine learning (ML) are often encouraged to adopt new tools in pursuit of efficiency, automation, or deeper insight. Yet many initiatives struggle to deliver value—not because the technology falls short, but because organizations begin without defining a clear objective or the data needed to support it.

Organizations frequently underestimate both the amount of usable data required and the effort needed to prepare it, says James Smagala, bioinformatics practice manager at Yahara Software.

“These are data-driven technologies, and they require you to have gathered all of the data or have clear sources that you’ll be able to integrate,” Smagala explains. “If you don’t have that data, your initiative is likely to fail.”

AI readiness, in practice, depends less on adopting new tools and more on aligning data, workflows, and organizational priorities around a specific goal. Standardizing laboratory data is not simply a technical exercise but a necessary step toward enabling AI and ML applications.

Building AI readiness starts with defining the problem

Attempting to standardize data broadly without a clear objective often leads to wasted effort. A more effective approach begins by defining a specific problem or desired outcome, then working backward to identify the data needed to support it.

“If you generically tell people to go work on their data, they’ll work on the wrong part of the data,” Smagala says. “Run the problem almost entirely backward. Start at the end state and figure out what you’re trying to accomplish.”

Examples might include uncovering patterns hidden in historical experimental data, reducing reliance on a single employee for complex decision-making processes, or automating repetitive administrative tasks. Processes that rely heavily on one experienced individual’s judgment can be particularly useful starting points because they highlight where institutional knowledge exists but is not formally captured in data systems.

For organizations aiming to implement AI capabilities within the next few years, defining a clear endpoint early helps guide which data streams require attention first and prevents unnecessary large-scale restructuring.


Once the objective is defined, the next question becomes whether the laboratory’s data can support it.

From big data to high-dimensional data

Discussions about AI frequently emphasize the need for “big data,” but that framing can be misleading in laboratory environments. Many organizations do not retain massive historical datasets because detailed information may be discarded once final results are reported. What laboratories often possess instead is complex, multidimensional information embedded across workflows.

“There’s a lot of data already there,” Smagala says. “You use it every day to produce high-quality results. It’s just organized around human processes rather than something an AI system can consume.”

This multidimensional context includes not only analytical results but also metadata, quality assurance and quality control information, instrument parameters, contextual variables, and downstream analytics. Preparing for AI involves reorganizing and connecting these elements so they can be accessed and interpreted consistently across systems.

Rather than creating entirely new datasets, many laboratories must integrate and standardize information that already exists but remains fragmented.
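The integration step described above can be sketched in a few lines. This is a minimal illustration, not a real LIMS workflow: the sources, field names (`sample_id`, `value`, `settings`, `flag`), and the idea of joining on a shared sample identifier are all assumptions made for the example.

```python
# Minimal sketch: joining fragmented lab records into one
# machine-readable view, keyed on a shared sample identifier.
# All field names are illustrative, not a real LIMS schema.

def integrate(results, instrument_params, qc_records):
    """Merge three fragmented sources into one record per sample."""
    merged = {}
    for row in results:
        merged[row["sample_id"]] = {"result": row["value"]}
    for row in instrument_params:
        merged.setdefault(row["sample_id"], {})["instrument"] = row["settings"]
    for row in qc_records:
        merged.setdefault(row["sample_id"], {})["qc_flag"] = row["flag"]
    return merged

results = [{"sample_id": "S-001", "value": 4.2}]
params = [{"sample_id": "S-001", "settings": {"column_temp_c": 35}}]
qc = [{"sample_id": "S-001", "flag": "pass"}]

print(integrate(results, params, qc))
```

The point of the sketch is that no new data is created: the result, the instrument settings, and the QC flag all existed already, but only after the join can a downstream model see them together.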


Why data standardization gaps persist

The need for data standardization varies depending on laboratory maturity. Early-stage organizations may rely on disconnected systems and informal knowledge to reconcile naming inconsistencies or interpret results. In some cases, understanding whether two datasets refer to the same parameter depends on institutional knowledge rather than standardized definitions. More mature laboratories may have partial integration but still lack sufficient historical data or machine-readable documentation. Highly mature organizations—often large enterprises with integrated systems and dedicated data science teams—can iterate quickly because their data foundations are already established.

Inconsistent data structures pose particular challenges for ML applications that depend heavily on standardized inputs. While some AI approaches can interpret semi-structured data and infer intent, ML models remain far less tolerant of variation, increasing the importance of consistent naming, formatting, and metadata.

“The Achilles’ heel of every machine learning project is the amount of time spent cleaning and preparing data,” Smagala says. “The structure of the data dictates the quality of the model you can build.”

Without deliberate data standardization, organizations often find that AI initiatives spend far more time preparing data than generating insights.
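The kind of cleanup Smagala describes often amounts to mapping many informal names for the same parameter onto one standard. A minimal sketch, with an entirely hypothetical synonym table (the field names and units are assumptions for illustration):

```python
# Minimal sketch of the cleanup ML pipelines depend on: mapping
# inconsistent field names onto one standardized vocabulary.
# The synonym table below is illustrative, not a real standard.

FIELD_SYNONYMS = {
    "conc": "concentration_mg_l",
    "concentration (mg/L)": "concentration_mg_l",
    "Conc_mgL": "concentration_mg_l",
}

def standardize(record):
    """Rename known synonyms; leave unknown fields untouched."""
    return {FIELD_SYNONYMS.get(k, k): v for k, v in record.items()}

rows = [{"conc": 1.2}, {"Conc_mgL": 0.9}, {"operator": "JS"}]
print([standardize(r) for r in rows])
```

In practice the synonym table is where the institutional knowledge mentioned earlier gets formally captured: someone who knows that two labels mean the same parameter has to write that equivalence down.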

AI readiness requires organizational change

Because laboratory data spans multiple departments—including operations, quality, informatics, and IT—standardization cannot be treated solely as a technical task. Coordinating expertise across these domains requires leadership engagement and organizational alignment.

“Anything that pulls together multiple kinds of expertise across the organization becomes a leadership responsibility,” Smagala says.

Laboratory managers often sit at the center of these efforts because they have influence across teams while maintaining communication with senior leadership. In many organizations, they become the practical owners of AI readiness initiatives, coordinating operational priorities with technical requirements.

“You need somebody leading it, but it’s not going to be successful with one person saying we’re going to do this,” Smagala explains. “You have to build consensus and drive change in a way your organization understands.”

Viewed this way, data standardization becomes part of a broader effort to align processes, documentation, and systems so information can be reused reliably across workflows.

Iteration drives successful adoption

Incremental progress plays a significant role in successful AI initiatives. Rather than large-scale transformations, many organizations benefit from smaller experiments that refine data and processes over time.

“Data science is science,” Smagala says. “Remain curious and iterate relatively quickly. Take a few small steps, see if you can find a clearer path forward, and iterate again.”

Large, monolithic initiatives rarely succeed because AI systems evolve as organizations learn more about their data and workflows. “Too big a bite, and you’re setting yourself up to fail,” he says.

Organizations that experiment, refine approaches, and invest incrementally are more likely to see measurable improvements, including automation of manual tasks, improved workflow efficiency, and more consistent decision-making processes. Early successes often appear as incremental operational gains rather than dramatic technological transformation.

The changing landscape for laboratory operations

Laboratories can continue operating successfully using existing manual processes for years, but the broader environment is shifting as automation and advanced analytics become more accessible. Organizations that begin exploring data standardization and AI readiness now gain experience that compounds over time, while those that delay may face steeper transitions later.

“There is risk in starting too early,” Smagala acknowledges. “But there is also risk in not starting at all. At some point, someone else will figure out how to do the work more efficiently.”

Defining meaningful problems, organizing data to support them, and taking deliberate steps toward standardization all influence how effectively laboratories adopt AI and ML in the coming years.


Frequently Asked Questions (FAQs)

  • What is AI readiness?

    AI readiness refers to an organization's preparedness to implement artificial intelligence systems effectively, which involves aligning data, workflows, and organizational priorities around specific objectives.

  • Why is data standardization important for AI initiatives?

    Data standardization is crucial because it ensures consistent naming, formatting, and metadata, which are essential for machine learning applications that rely on standardized data inputs.

  • How can laboratories prepare their data for AI applications?

    Laboratories can prepare their data by reorganizing and connecting multidimensional information, integrating fragmented data sources, and ensuring that all necessary data is accessible and interpretable for AI systems.

  • What common challenges do organizations face when adopting AI?

    Organizations often face challenges such as undefined objectives, underestimating data preparation efforts, and dealing with inconsistent data structures that hinder machine learning applications.

  • How should labs begin their journey toward AI readiness?

    Labs should start by defining a specific problem or desired outcome, then identify the data needed to support that goal, ensuring they undertake standardization efforts with a clear objective in mind.

About the Author

  • Michelle Gaulin

    Michelle Gaulin is an associate editor for Lab Manager. She holds a bachelor of journalism degree from Toronto Metropolitan University in Toronto, Ontario, Canada, and has two decades of experience in editorial writing, content creation, and brand storytelling. In her role, she contributes to the production of the magazine’s print and online content, collaborates with industry experts, and works closely with freelance writers to deliver high-quality, engaging material.

    Her professional background spans multiple industries, including automotive, travel, finance, publishing, and technology. She specializes in simplifying complex topics and crafting compelling narratives that connect with both B2B and B2C audiences.

    In her spare time, Michelle enjoys outdoor activities and cherishes time with her daughter. She can be reached at mgaulin@labmanager.com.


Interviewing

  • James Smagala, PhD
    James Smagala is a bioinformatician and scientific software leader with deep expertise in analytical chemistry, laboratory automation, and data-driven solution design. He holds a PhD in analytical chemistry from the University of Colorado Boulder and brings a strong wet-lab background in biochemistry and analytical chemistry to his work.

    James specializes in translating the needs of experimental scientists into practical, scalable software applications. His experience spans scientific application development, sequence analysis, database architecture, next-generation sequencing workflows, laboratory process automation, and automated data curation. He has a particular interest in biological data modeling, information visualization, and the application of AI-driven technologies to infectious disease research.

    As the bioinformatics practice manager at Yahara Software, James helps life sciences organizations modernize their data infrastructure and leverage advanced analytics to improve laboratory performance and scientific insight.

