Ask the Expert

Trends in Informatics

In this Q&A, CEO Yali Friedman discusses his extensive interest in information and how to collect and analyze data

Mike May, PhD

Yali Friedman, CEO of, discusses his extensive interest in information, how to collect and analyze data, and even what information is needed for an academic scientist to commercialize a compound.

Q: When did you first become interested in gathering and analyzing information?

A: For as long as I can remember, I’ve been interested in gathering and analyzing information. When I was in college, I researched DNA-based computers for a class project. This was in the mid-1990s, so the web was in its earliest stages. I remember thinking that it took me so long to understand the subject matter, and I had a great leg-up from being able to directly access the lab pages of leading researchers on the web. So, I decided that I should give back and make a web page summarizing what I’d learned. The page quickly gained popularity, and I soon became aware of the power of using the web to democratize publishing.

Q: What did you learn from collecting and analyzing the data for the Scientific American Worldview Scorecard that might benefit the work of other scientists?

A: Worldview was a fun project. We had a good measure of people admiring the project and also those questioning the validity of the findings. Ultimately, it was a useful exercise in dealing with dirty and incomplete data. In presenting the project before numerous audiences, I had the opportunity to stress-test it and I was, thankfully, always able to come up with agreeable policy recommendations backed by the data.

Q: What do you see as today’s most complex informatics task for a scientist?

A: Extracting useful information from data requires several steps to be integrated. First, you must find appropriate data sources. Most of the time, they don’t exist. So, can you assemble the data from disparate sources, or can you find a surrogate? Next, you have to clean the data. Few data sets come ready to use, so you need to assess the quality of the data and remove any inconsistent or incorrect entries which may have been introduced in collection and assembly. And, if you’re combining data sets, you need to ensure that you’re not introducing new errors. Improper data hygiene can yield uninterpretable information, and this is where many people operating outside their expertise stumble—if you don’t know the context of the data, you’re unlikely to appreciate how to clean it or manipulate it. Finally, you must have a way to analyze the data that makes sense given the source of the data, the manipulations made during cleaning, and the greater scope of the project. Without all three elements working properly and in a coordinated fashion, you’re likely to get poor outcomes.
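As a rough illustration of the cleaning and combining steps described above, here is a minimal Python sketch. It is not Friedman's method; the field names (`id`, `value`, `name`), the sample records, and the helper functions `clean` and `merge` are all hypothetical, chosen only to show the idea of recording why each entry was dropped and which keys failed to match when two data sets are joined.

```python
def clean(records):
    """Keep rows with a non-empty, unique id and a numeric value.

    Returns (kept, dropped); each dropped item is (row, reason) so the
    cleaning decisions can be reviewed by someone who knows the data.
    """
    seen = set()
    kept, dropped = [], []
    for row in records:
        rid = row.get("id")
        try:
            value = float(row.get("value"))
        except (TypeError, ValueError):
            dropped.append((row, "value is missing or not numeric"))
            continue
        if rid in (None, ""):
            dropped.append((row, "missing id"))
            continue
        if rid in seen:
            dropped.append((row, "duplicate id"))
            continue
        seen.add(rid)
        kept.append({"id": rid, "value": value})
    return kept, dropped


def merge(left, right, key="id"):
    """Inner-join two lists of dicts on `key` and report unmatched keys."""
    right_by_key = {row[key]: row for row in right}
    merged, unmatched = [], []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            unmatched.append(row[key])  # surfaced, not silently discarded
        else:
            merged.append({**match, **row})
    return merged, unmatched


# Hypothetical, deliberately messy inputs.
measurements = [
    {"id": "A1", "value": "12.5"},
    {"id": "A2", "value": "n/a"},   # not numeric
    {"id": "A1", "value": "12.5"},  # duplicate
    {"id": "", "value": "3"},       # missing id
    {"id": "A3", "value": "7"},
]
labels = [{"id": "A1", "name": "alpha"}, {"id": "A4", "name": "delta"}]

kept, dropped = clean(measurements)
merged, unmatched = merge(kept, labels)

print("kept:", kept)
for row, reason in dropped:
    print("dropped:", row, "because", reason)
print("merged:", merged)
print("unmatched ids:", unmatched)
```

Returning the dropped rows and unmatched keys, rather than discarding them, is one way to keep the context of the data visible; whether a given row should really be dropped still depends on knowing where the data came from.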

Q: Following research is a tricky task in today’s science; how do you approach that?

A: I have a complex and proprietary system of keyword and citation-based alerts to help me track new literature that is most relevant to me.

Q: If an academic scientist has a compound or molecule that seems to have clinical potential, how can they find data on the potential market opportunity?

A: A common mistake scientists make is in limiting the patentability of their discoveries through public disclosures at conferences, in research articles, or in other forums. There’s a lot of very good guidance published that encourages scientists to talk to patent counsel as early as possible, but I’d like to go a step further. Many technology transfer offices, investors, corporate partners, etc., have a hard time deciding which discoveries they should back. Beyond ensuring the patentability of their discoveries, it is important that academic scientists understand the business drivers and processes of turning a new molecule into a drug and nurturing it over its commercial lifespan. By understanding the complex and often conflicting objectives of legal, marketing, sales, and scientific teams, it is possible to position a new compound for commercial launch, where it can have a positive impact on the problem it addresses.

Q: For information gathering and analysis in general, what tools do you use?

A: I collect data in every way imaginable. Sometimes, there are existing web application programming interfaces that do the trick; sometimes, websites need custom-built scrapers; and sometimes, I need to do optical character recognition from PDF files. The amazing thing is that once the data are collected and cleaned, they integrate very well to allow actionable information to be extracted.
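To give a sense of what a small custom-built scraper can look like, here is a minimal sketch using only Python's standard-library `html.parser`. It is an assumption-laden example, not a description of Friedman's tools: the `TableScraper` class, the `scrape_table` helper, and the sample HTML are hypothetical, and a real page would usually be fetched over the network and need more defensive handling.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return table rows as dicts keyed by the first (header) row."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical page fragment standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Compound</th><th>Year</th></tr>
  <tr><td>Example-A</td><td> 2019 </td></tr>
  <tr><td>Example-B</td><td>2021</td></tr>
</table>
"""

records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

The output of a scraper like this is still raw text, so it would typically go through a cleaning step before being combined with data from other sources.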

Q: If you could wave a magic wand and get a new tool, what would it be?

A: I’m not interested in platforms that automate analysis. Ultimately, I think that analytics have to be developed with an understanding of the context and nature of the data, or else you risk drawing irrelevant or illogical conclusions. So, for me the most useful tool would be a better way to keep track of all the data sets I have and record how to interpret them.

Yali Friedman is publisher of the Journal of Commercial Biotechnology and CEO of, which is a pharmaceutical-industry, competitive-intelligence platform based on a business plan that was awarded second place in the Panasci Entrepreneurial Awards Competition. He received his PhD from the University at Buffalo, and his MBA-level textbook Building Biotechnology is used as a course text in dozens of biotechnology programs. His other books include Best Practices in Biotechnology Education and Best Practices in Biotechnology Business Development. Scientific American also named him as one of the 100 most influential people in biotechnology. He served as the head of data analytics for Scientific American Custom Media and architect of the Scientific American Worldview Scorecard, a global perspective profiling biotechnology industries and innovation capacity in more than 50 countries for over a decade. He also developed the “Student Guide to DNA Based Computers,” which was sponsored by FUJI Television.