Managing Big Data

Will Tashman

Will Tashman, co-founder and chief revenue officer at Uncountable, explains the benefits of taking a structured approach to data management, and offers managers tips on how to determine what tools or data management systems will work best for specific needs.

Q: Can you explain the difference between, or provide examples of, unstructured and structured data in labs?

A: The primary difference between structured and unstructured data is that structured data is unified across an organization. For example, a structured data system can enforce that viscosity is tied to a test temperature, with the test temperature always recorded in the same way. An unstructured approach might have one scientist writing “Viscosity,” another “visc RT,” and a third “Brookfield Viscosity at 23°C.” Cross-referencing the data is much easier in a structured approach. Good structured approaches will also ensure that input details are tied to output details for future analysis, rather than existing in two separate locations.
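As a rough illustration of the viscosity example, the sketch below maps the three free-text labels onto one canonical property name and refuses any record that lacks a test temperature. The alias table, the `Measurement` record, and the `normalize` function are hypothetical names chosen for this sketch, not part of any particular vendor's system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical alias table: free-text labels mapped to one canonical name.
ALIASES = {
    "viscosity": "viscosity",
    "visc rt": "viscosity",
    "brookfield viscosity at 23°c": "viscosity",
}


@dataclass(frozen=True)
class Measurement:
    prop: str           # canonical property name
    value: float
    unit: str
    test_temp_c: float  # always recorded, always in degrees Celsius


def normalize(label: str, value: float, unit: str,
              test_temp_c: Optional[float]) -> Measurement:
    """Turn a free-text entry into a structured record, or reject it."""
    key = label.strip().lower()
    if key not in ALIASES:
        raise ValueError(f"unknown property label: {label!r}")
    if test_temp_c is None:
        raise ValueError("a test temperature is required for this property")
    return Measurement(ALIASES[key], value, unit, float(test_temp_c))


# Three scientists, three labels, one comparable property.
records = [
    normalize("Viscosity", 1200.0, "cP", 23.0),
    normalize("visc RT", 1180.0, "cP", 23.0),
    normalize("Brookfield Viscosity at 23°C", 1215.0, "cP", 23.0),
]
```

Because every record ends up with the same property name and an explicit temperature, the three entries can be compared directly instead of being reconciled by hand.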

Q: What is the best way for labs to make the switch to a more structured approach?

A: Labs should first decide how they want to collaborate and view their data moving forward. Often, this is patterned on how a senior scientist works, or involves getting multiple teams to jointly determine the best way of working together. Once that is done, they should put together a representative dataset and evaluate whether their existing tools can store everything associated with this dataset: inputs, outputs, test conditions, and so on. If not, they should ask potential vendors to demonstrate how their solution could effectively map this data. Ensuring that scientists can work as easily in the vendor’s system as in Excel or similar software should be the number one goal of the evaluation process.

Q: In what ways can effective collection, storage, and analysis of large datasets help a lab manager to be more effective?

A: The biggest immediate return on investment is increased collaboration. The best experiment is the one that has already been run. If someone else in your organization has run the experiment, you are a step ahead. Similarly, if a senior scientist who left your organization stored their data, you can take advantage of their experience, even after they have left.

Q: What current limitations do you see among tools and technologies that aim to help labs manage their data? How can these limitations be overcome?

A: We see two big limitations: 1) Inputs and outputs are not linked in the systems in use. Some teams use a LIMS to collect outputs and ELNs to collect inputs. The problem is that this hinders understanding of how the two connect, which is the end goal of most labs. 2) Many tools are not sophisticated enough to capture the full nuance of a scientist’s work. This results in a “shadow data ecosystem,” usually Excel workbooks stored on a local computer, which limits the original benefit of the tool. I recommend ensuring that, no matter which tool you choose, it is easy to analyze the connection between inputs and outputs, and that the tool is used in place of Excel, not as an afterthought.
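To make the first limitation concrete, here is a minimal sketch of linking inputs and outputs through a shared experiment ID. It assumes both systems can export records keyed by that ID. The IDs, field names, and values are invented for illustration.

```python
# Hypothetical formulation inputs, as an ELN export might hold them.
inputs = {
    "EXP-001": {"resin_pct": 60.0, "filler_pct": 40.0},
    "EXP-002": {"resin_pct": 70.0, "filler_pct": 30.0},
}

# Hypothetical test results, as a LIMS export might hold them.
outputs = {
    "EXP-001": {"viscosity_cp": 1200.0},
    "EXP-003": {"viscosity_cp": 900.0},
}


def link(inputs, outputs):
    """Join inputs and outputs on experiment ID.

    Returns the merged records plus the sorted IDs that appear in
    only one of the two systems and therefore cannot be analyzed.
    """
    shared = inputs.keys() & outputs.keys()
    linked = {eid: {**inputs[eid], **outputs[eid]} for eid in shared}
    unlinked = sorted(inputs.keys() ^ outputs.keys())
    return linked, unlinked


linked, unlinked = link(inputs, outputs)
```

Only the experiment present in both exports yields a usable input-to-output record. The other two are exactly the gaps that appear when the systems are not connected.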

Q: Do you have tips for how lab managers can determine what type of tools or solution they need based on the amount or type of data they collect?

A: I recommend looking for quick wins, rather than trying to solve every single problem immediately. There are many exciting technologies on the horizon, from more automated lab equipment to artificial intelligence promising novel insights from existing data. Trying to adopt all of them at once is a recipe for failure. Instead, identify the most pressing needs in your organization whose solutions will also benefit other stakeholders, and work to solve them first. For example, the structured data approach will pay off in the short term with improved collaboration, and it is also a necessary prerequisite for more advanced artificial intelligence work.

Q: For those implementing a new data management system, what security challenges or risks do lab managers need to be aware of?

A: There are two main security risks to be aware of. The first is the security of your vendor’s system. Your vendor should have gone through a security audit—either a SOC 2 or ISO 27001 is standard. The vendor should also be willing to talk through their security best practices, encryption of data, limited internal access to the data, and much more.

The second, and actually more important, is how the tool will handle security on your side. The benefit of these tools is bringing data together in one place, but this also means employees could access all of the data in one place. Any good system should have role-based access rights so you can limit who sees the data. There should be controls over how much data can be exported, so a scientist leaving for a competitor cannot take everything on their last day.
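The two controls described here, role-based visibility and a cap on exports, can be sketched in a few lines. The role names, project names, and the row limit below are assumptions made for this example. A real system would load them from its own permission store.

```python
# Hypothetical mapping of roles to the projects they may see.
ROLE_PROJECTS = {
    "formulator": {"adhesives"},
    "manager": {"adhesives", "coatings"},
}

# Hypothetical cap on rows per export request.
EXPORT_LIMIT = 2


def visible_rows(role, rows):
    """Return only the rows whose project the role is allowed to see."""
    allowed = ROLE_PROJECTS.get(role, set())
    return [row for row in rows if row["project"] in allowed]


def export(role, rows, limit=EXPORT_LIMIT):
    """Export the visible rows, refusing requests above the limit."""
    selected = visible_rows(role, rows)
    if len(selected) > limit:
        raise PermissionError(
            f"export of {len(selected)} rows exceeds limit of {limit}"
        )
    return selected


rows = [
    {"id": 1, "project": "adhesives"},
    {"id": 2, "project": "coatings"},
    {"id": 3, "project": "adhesives"},
]
```

In this sketch a formulator sees and can export only the two adhesives rows, while a manager who can see all three rows is still blocked from exporting them in a single request.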

Q: From your perspective, how will big data and data management tools continue to shape the future of laboratories?

A: Data management and big data are just getting started in the lab. The two future trends I think will make an impact are the automation of data collection and the increased volume of data. Automation will free lab managers and other scientists to do more meaningful work with their time, and the increased volume of data will require tools built for R&D to help managers and scientists effectively manage the data they have.

Will Tashman is co-founder and chief revenue officer at Uncountable. In his role, he works closely with Uncountable’s customers to implement their larger vision for material informatics across vastly different fields. Will was previously a product design engineer at Apple, where he developed key design features in ground-breaking laptops. While at Apple, he worked in large-scale assembly plants to implement design features and processes that were optimized for mass-production environments. Will holds a degree in materials science and engineering from MIT.