Michelle Arkin, PhD, professor and chair of the Department of Pharmaceutical Chemistry and co-director of the Small Molecule Discovery Center at the University of California, San Francisco, talks to contributing editor Tanuja Koppal, PhD, about the growing applications of artificial intelligence (AI) and machine learning (ML) for automating chemistry, performing multi-parameter optimization against drug targets, modeling at the systems level, and, eventually, predicting whether a drug will work in a patient. She discusses the vision of ATOM (Accelerating Therapeutics for Opportunities in Medicine), a public-private endeavor she is working with to transform drug discovery using computational tools.
Q: Can you share with us the goals of the ATOM consortium?
A: The vision of the ATOM research initiative is to use ML and AI to bring together data from public databases and from pharmaceutical partners to perform multi-parameter optimization on a drug target. Another aspect of the ATOM pipeline is to do automated experimentation. Nearly five years ago, the pharmaceutical company GlaxoSmithKline (GSK) and the national laboratories (Lawrence Livermore, Oak Ridge, Argonne, and Brookhaven) started re-envisioning drug discovery as a computationally driven approach. They realized that if we are going to do personalized medicine for a patient, we need to do it much faster, with fewer resources and a higher success rate. That’s where the idea of ATOM and using computational tools along with rapid experimental drug discovery came from.
Our goal is to start with a drug target and a set of molecules that impinge on that target, along with a set of design criteria for the drug. The AI/ML models use that information to design new molecules in silico and virtually assess whether they meet those design criteria. This is done iteratively until you get a set of compounds that fits the criteria well. Laboratory automation then enables automated synthesis and purification of those compounds and testing in biological assays of interest. The goal was to go from an identified target to a drug worth testing in animals in about a year. People used to say that’s crazy, but now they are asking, “What is it that you are doing differently from what everyone else is trying to do?” That shift shows how fast the field is moving.
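To make the iterative design cycle concrete, here is a minimal Python sketch of a design-score-select loop. Random numeric vectors stand in for molecules, and a toy two-term objective stands in for real design criteria; every function and parameter name here is an illustrative assumption, not part of the ATOM pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_candidates(parents, n_children=50, noise=0.1):
    # Placeholder for a generative chemistry model: perturb existing designs.
    idx = rng.integers(0, len(parents), n_children)
    return parents[idx] + rng.normal(0.0, noise, (n_children, parents.shape[1]))

def score(candidates):
    # Toy multi-parameter objective: reward a potency-like term, penalize a liability-like term.
    potency = candidates[:, 0]
    liability = np.abs(candidates[:, 1])
    return potency - 0.5 * liability

population = rng.normal(size=(10, 5))          # small seed set of "molecules" (descriptor vectors)
for generation in range(20):                   # design -> score -> select, repeated
    pool = np.vstack([population, generate_candidates(population)])
    population = pool[np.argsort(score(pool))[-10:]]   # keep the 10 best-scoring designs

print("best multi-parameter score after optimization:", round(float(score(population).max()), 3))
```

In the real pipeline, the selected designs would then move to automated synthesis, purification, and assay testing rather than a print statement.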
Q: How do the experimental and computational components work together?
A: There are two kinds of computational models. Parameter-level models predict measured experimental endpoints such as hERG channel activity, MDCK permeability, and others. There is a lot of data around those parameters that can be used to develop AI/ML models. The long-term goal, however, is to use systems-level computation, where models can predict a “therapeutic index,” i.e., how safe and effective a drug is based on its on-target activity and toxicity at predicted in vivo concentrations of the drug. What we can do right now is parameter-level modeling and some amount of systems-level modeling for pharmacokinetics. In the future, however, we are looking to do mostly systems-level modeling. We are also using transfer learning or matrix learning approaches to see how little data you need to understand a target based on what you already know about a related target.
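As an illustration of a parameter-level model, the sketch below fits a regressor to predict a single assay endpoint from molecular descriptors. The descriptor matrix and endpoint values are synthetic placeholders; in practice, measured data for endpoints such as hERG activity or MDCK permeability would take their place.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 32))                 # placeholder molecular descriptors
y = 2.0 * X[:, 0] + rng.normal(0.0, 0.3, 500)  # placeholder assay endpoint (e.g., a permeability-like value)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
print("held-out R^2:", round(model.score(X_test, y_test), 3))
```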
“Human biology is very complex and drug discovery is a hard problem to tackle.”
There are two reasons why we do experiments alongside computation. One is to make and test compounds to validate predictions and then use the compounds in “real” biology. The other is to make and test compounds in regions of chemical space where none yet exist. Data obtained from the new molecules that are designed, made, and tested are fed back into the model, which continually updates itself and is self-correcting. We can do this intense computational work because we are working in collaboration with the national laboratories, which have the biggest computers in the country. Human biology is very complex and drug discovery is a hard problem to tackle. If we crack biological problems using computational approaches, we can push our computational capabilities forward.
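The feedback described above can be sketched as a loop in which each round’s “experimental” results are appended to the training data and the model is refit. The lab_measurement function below is a toy stand-in for real assay data, and the selection rule is deliberately simple; it is a sketch of the idea, not the consortium’s actual workflow.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)

def lab_measurement(x):
    # Placeholder for an assay run on newly synthesized compounds.
    return np.sin(x[:, 0]) + 0.1 * rng.normal(size=len(x))

X = rng.uniform(-3, 3, size=(20, 4))           # small initial experimental dataset
y = lab_measurement(X)

for design_round in range(5):
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
    candidates = rng.uniform(-3, 3, size=(200, 4))                   # newly designed compounds
    picks = candidates[np.argsort(model.predict(candidates))[-5:]]   # top-ranked predictions
    X = np.vstack([X, picks])                                        # "make and test" the picks...
    y = np.concatenate([y, lab_measurement(picks)])                  # ...and feed the results back

print("measurements available after 5 rounds:", len(y))
```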
Q: What has been your biggest challenge so far?
A: We ran into two main challenges with our first project. First, when we started with data that had been collected a long time ago, or over a long period of time, we found that the new molecules we designed were consistent with the old dataset, but the data itself could not always be reproduced. We therefore need ways to demonstrate that datasets are robust and experimentally reproducible. Second, it can take several months to source and synthesize compounds for testing. With computational design, you can end up with several scaffolds that are not related to each other, and making those compounds takes time. Hence, we needed flexible, robust, automated chemistry to support the computational chemistry efforts. Both are active areas of research.
Q: How is ATOM different from other public-private partnerships?
A: There are a few things that make ATOM different. One is the integration of computational and experimental data, and the other is the systems-based modeling. Most companies are working on only parts of the puzzle, such as finding hits against a particular target or improving the pharmacokinetics or therapeutic index of a molecule. Big companies do most of the work internally, and small companies take on focused aspects with a vision of doing more. What it’s going to take is people sharing data and models, and groups becoming comfortable with finding ways to do that. One basic approach is data sharing with an honest broker who maintains the data and creates a model using all of it. Alternatively, each organization can build models on its own data, and the models themselves can be shared and “federated.” Another differentiator is that ATOM products are all open science. The goal is to put all the models and data in the public domain so people can use the models and continuously improve them. We intend to publish all the datasets, be open about what we are learning and what works and what doesn’t, and develop best practices. We have more of an educational and sharing approach.
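A minimal illustration of sharing models rather than data: each “organization” below fits a model on its own synthetic dataset, and only the fitted coefficients leave the site and are pooled by averaging. Real federated-learning schemes iterate and are considerably more sophisticated; this sketch only conveys the idea that the data never have to be exchanged.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
true_w = rng.normal(size=10)                   # "ground truth" relationship, for this toy example only

shared_coefs = []
for site in range(3):                          # three organizations, each with private data
    X_site = rng.normal(size=(200, 10))
    y_site = X_site @ true_w + rng.normal(0.0, 0.1, 200)
    shared_coefs.append(Ridge(alpha=1.0).fit(X_site, y_site).coef_)   # only the model is shared

federated_w = np.mean(shared_coefs, axis=0)    # pool the shared models by averaging coefficients
print("coefficient error of the pooled model:", round(float(np.linalg.norm(federated_w - true_w)), 3))
```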
Q: What are some of the trends that will likely help improve AI-driven drug discovery?
A: People are developing automated ways to design chemical synthetic routes and to optimize chemical reactions. Then there is parallel, automated chemistry; the slow step in automated chemistry is always the purification. We are also interested in how we select the initial inputs to chemical optimization. DNA-encoded libraries could be an amazing way to seed our initial design loop. These libraries contain billions of molecules, and compounds are screened for binding to the target of interest. Machine learning can use a lot of the screening data that was previously thrown out because of its size and noisiness, and we can use those data to design and predict better molecules that can then be tested. DNA-encoded library technology is changing rapidly because of open-source collaboration with companies. Crowdsourcing the information helps advance the field. So, in a way, you are democratizing DNA-encoded library screening and drug discovery using computational approaches.
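To show how noisy screening output might seed a design loop, the sketch below trains a classifier on synthetic binder/non-binder calls and uses it to rank an unscreened virtual library. The descriptors, labels, and library are placeholders rather than real DNA-encoded library data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(2000, 64))                           # placeholder descriptors for screened library members
hits = (X[:, 0] + 0.5 * rng.normal(size=2000)) > 1.0      # noisy binder / non-binder calls from the screen

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, hits)

X_virtual = rng.normal(size=(500, 64))                    # unscreened virtual compounds to prioritize
ranked = np.argsort(clf.predict_proba(X_virtual)[:, 1])[::-1]
print("indices of the 10 compounds to make and test first:", ranked[:10].tolist())
```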
I am excited about AI for academic drug discovery and chemical biology (that is, using compounds as tools to explore biology). Drug discovery usually requires lengthy and costly cycles of making and testing compounds. If computational models in the ATOM pipeline can give us compounds with much better properties using less chemistry, we can learn much more biology and get closer to discovering new drugs.
Michelle Arkin is professor and chair of pharmaceutical chemistry at the University of California, San Francisco, and member of the Joint Research Committee for the ATOM Research Initiative. Her lab develops chemical probes and drug leads for novel targets, with a particular interest in protein-protein interactions and protein-degradation networks. Michelle is co-director of the UCSF Small Molecule Discovery Center, president of the board of directors (BOD) of the Academic Drug Discovery Consortium, member of the BOD of the Society for Laboratory Automation and Screening (SLAS), and co-founder of Ambagon Therapeutics and Elgia Therapeutics. Prior to UCSF, Michelle was the associate director of Cell Biology at Sunesis Pharmaceuticals, where she helped discover protein-protein interaction inhibitors for IL-2 and LFA-1 (lifitegrast, marketed by Novartis).