Bringing a new drug to market can require 10 years of labor and as much as $3 billion in R&D funding, especially for a first-in-class biologic used to treat a previously intractable disorder. The timeline is built on a foundation of basic academic research and animal model studies; target and biomarker identification; and high-throughput screening, validation, and target engagement (TE) assays. These are followed by preclinical safety and toxicity studies, multiple rounds of clinical trials, and submission and review of a New Drug Application, with FDA approval the hoped-for outcome.
Realistically, a potential new drug can fail at any of these steps, and most do. The later they fail, the keener the sense of lost time and resources. Between the mid-1990s and early 2010s, a spate of high-profile failures coincided with a long-term plateau in the number of druggable targets being identified, and with a gap between the recognized potential of high-throughput screening (HTS) and the resources and technology available to build and mine comprehensive chemical libraries. As a result, annual FDA new drug approvals dipped dramatically, and many large pharmaceutical companies shifted their efforts toward mergers and acquisitions rather than creating and streamlining development pipelines.
Revolutionary approaches to drug discovery aided by informatics and guided toward precision medicine have helped bring new drug approvals back to pre-1996 levels, with more than 40 almost every year since 2014. Informatics can intervene at any step in the pipeline to cut out unnecessary or redundant experimentation, creating an exhaustive in silico workflow that can theoretically reduce R&D outlays by up to 50-fold. The biggest conceptual bottleneck in this workflow is the siloed nature of drug discovery data. Innovative basic research that creates disease models and identifies and validates targets through analysis of next-generation sequencing (NGS) and other data is often initiated by small teams with long-term exclusive access to the raw data. The same principle applies to HTS, with chemical libraries customized and calibrated to individual projects. Clinical trial data are sensitive and therefore subject to regulatory compliance restrictions, in addition to corporate imperatives to maintain competitive advantage through proprietary claims.
Informatics approaches have responded by: 1) instituting systems to break down silos; and 2) building better mousetraps in exploratory stages to transcend the silo itself. To mitigate losses associated with failed trials, or to extend the lifecycles of approved drugs, informatics can be used to probe NGS data and identify patterns of gene expression and modification that can be leveraged as molecular signatures to repurpose drugs. The open-access Connectivity Map (CMap) established a connectivity library linking millions of gene expression profiles to thousands of small-molecule and genetic perturbations. Collaborative Drug Discovery's CDD Vault is a chemoinformatic and bioinformatic repository in which collaborators can access and analyze HTS data to quantify structure-activity relationships, cherry-pick hit compounds, and validate them.
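The signature-matching idea behind connectivity analysis can be sketched in a few lines. The snippet below is a simplified illustration, not CMap's actual algorithm (which uses a weighted Kolmogorov-Smirnov statistic over ranked gene lists); the gene-expression values are invented for demonstration. The intuition is that a drug whose expression signature anti-correlates with a disease signature is a candidate for reversing that disease state, and hence for repurposing.

```python
# Hypothetical sketch of CMap-style signature matching. A drug whose
# expression signature anti-correlates with a disease signature is a
# repurposing candidate. Values below are illustrative, not real data.
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length expression vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

# Toy log-fold-change signatures over the same ordered gene set.
disease_signature = [2.1, -1.8, 0.9, -2.3, 1.5]
drug_signatures = {
    "drug_A": [-2.0, 1.7, -1.1, 2.2, -1.4],  # reverses the disease signature
    "drug_B": [1.9, -1.5, 0.8, -2.0, 1.6],   # mimics it instead
}

# Rank drugs by connectivity: most negative score = best reversal candidate.
ranked = sorted(drug_signatures.items(),
                key=lambda kv: cosine(disease_signature, kv[1]))
print(ranked[0][0])  # drug_A
```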
To streamline discovery projects that bridge institutional or international boundaries, or that necessitate academic-industry partnerships, informatics systems require universal languages. For instance, CDD Vault uses SMILES (simplified molecular-input line-entry system), a line-text encoding of chemical structures that can be easily shared and fed into other informatics algorithms. ELIXIR and the Global Alliance for Genomics and Health (GA4GH) co-developed a universal framework for accessing and protecting genomic data across international borders. Operational principles can further promote informatics-based streamlining. Workflows can be coordinated remotely through management-system software and cloud computing, enabling dramatic increases in the number of off-site CPUs devoted to processing NGS data and decreasing analysis times accordingly. Code sharing through GitHub broadens access, so that more investigative minds can work from identical, unaltered data and analysis code.
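The portability of SMILES comes from each structure being a single line of plain ASCII text. The sketch below makes that concrete with two well-known structures (aspirin and caffeine); the sanity check is deliberately minimal, verifying only balanced branches and paired ring-closure digits, whereas real toolkits such as RDKit perform full chemical parsing.

```python
# Minimal sketch of why SMILES travels well between informatics tools:
# each structure is one line of ASCII. This checker validates only branch
# balance and paired ring-closure digits; it is NOT a chemical parser.
def looks_like_valid_smiles(smiles: str) -> bool:
    depth = 0
    ring_closures = {}
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # closing branch with no open branch
        elif ch.isdigit():
            # Ring-closure digits must appear in matching pairs.
            ring_closures[ch] = ring_closures.get(ch, 0) + 1
    return depth == 0 and all(n % 2 == 0 for n in ring_closures.values())

library = {
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
    "caffeine": "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",
}
for name, smi in library.items():
    print(name, looks_like_valid_smiles(smi))  # both print True
```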
Silos persist between classes of data as well as between institutions and platforms. The overall market for precision medicine is approximately $88 billion, with informatics contributing $6.5 billion, at a compound annual growth rate of 12 percent. Much of this has been potentiated by the genomics revolution, with NGS technologies becoming simultaneously more powerful and much cheaper. Individual $99 genome sequencing is just around the corner and will likely soon be incorporated into normal standards of patient care. Retrospective analyses of clinical trials, even failed ones, can incorporate NGS data and re-stratify genetic cohorts to identify carriers of biomarkers responsive to approved or developmental drugs, obviating the need to re-boot trials and validation processes. One major success story is the use of ivacaftor (Kalydeco®) in a subset of cystic fibrosis patients with a specific point mutation (G551D) in the CFTR gene. This breakthrough marked a clear dividing line between legacy drug development, in which broad disease categories were targeted by therapeutics with known efficacy for ameliorating symptoms, and informatics-driven precision medicine founded on underlying causation and patient-specific responses.
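At its core, retrospective re-stratification is a filtering operation over genotyped participants. The sketch below shows the idea using the CFTR G551D gating mutation that ivacaftor originally targeted; the record layout, field names, and participant data are all invented for illustration and do not reflect any real trial dataset.

```python
# Hedged sketch of retrospective cohort re-stratification: keep only
# carriers of a responsive biomarker (here CFTR G551D). All records and
# field names are hypothetical.
participants = [
    {"id": "P001", "cftr_variants": ["F508del", "F508del"]},
    {"id": "P002", "cftr_variants": ["G551D", "F508del"]},
    {"id": "P003", "cftr_variants": ["G551D", "R117H"]},
]

def carriers(cohort, variant):
    """Re-stratify a cohort down to carriers of the given variant."""
    return [p for p in cohort if variant in p["cftr_variants"]]

g551d_cohort = carriers(participants, "G551D")
print([p["id"] for p in g551d_cohort])  # ['P002', 'P003']
```

In practice the genotype calls would come from NGS variant-calling pipelines rather than hand-entered lists, but the stratification logic is the same.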
What about building a better mousetrap? Data generated by HTS often characterizes upwards of 100,000 molecules, but subsequent experimentation typically invalidates the majority of hit candidates. Informatics based on predictive artificial intelligence/machine learning (AI/ML) algorithms for molecular docking and computational mutagenesis can streamline drug discovery by focusing it only on candidates likely to meet threshold criteria for TE, efficacy, and toxicity. Because errors can accumulate catastrophically across an in silico workflow, these predictions must rest on accurate data and on biophysical principles of dynamic docking and electronic behavior robust enough to support statistical prediction. SwissDock is an informatics tool that predicts protein-ligand interactions from among curated databases, using CHARMM simulations to rank potential energies. Algorithms such as those used by Ten63 Therapeutics take a giant step further and can iterate trillions of virtual structure-activity relationships daily, winnowing them down to only the best possible candidates, especially those that human analysis may fail to intuit. In doing so, AI/ML can learn to identify modulators of previously undruggable targets, which make up at least 80 percent of the proteome, or to accommodate target mutation rates in their optimization of potential ligands. The latter is especially important for overcoming tumor drug resistance as genes mutate and cancers metastasize; the former is also crucial for cancer drug design, because many prominent biomarkers are considered undruggable. With these approaches, in silico drug discovery using both classical and AI/ML-based informatics is poised to facilitate a therapeutic bonanza in precision medicine.
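The winnowing step these platforms perform can be illustrated with a toy filter. The sketch below is not any vendor's actual pipeline: the compound names, score values, and thresholds are invented, and real systems would derive the scores from docking simulations or trained ML models rather than hard-coded numbers. It shows only the structural idea of applying threshold criteria before ranking survivors.

```python
# Illustrative sketch of the winnowing step in virtual screening: keep
# only candidates clearing thresholds for predicted binding energy and
# toxicity, then rank the survivors. All values are hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    binding_energy: float  # predicted, kcal/mol (more negative = tighter)
    tox_risk: float        # predicted toxicity probability in [0, 1]

def winnow(candidates, max_energy=-8.0, max_tox=0.2):
    """Apply threshold criteria, then rank survivors by binding energy."""
    survivors = [c for c in candidates
                 if c.binding_energy <= max_energy and c.tox_risk <= max_tox]
    return sorted(survivors, key=lambda c: c.binding_energy)

screen = [
    Candidate("cmpd_001", -9.4, 0.05),
    Candidate("cmpd_002", -7.1, 0.02),   # fails the energy threshold
    Candidate("cmpd_003", -10.2, 0.45),  # fails the toxicity threshold
    Candidate("cmpd_004", -8.6, 0.10),
]
print([c.name for c in winnow(screen)])  # ['cmpd_001', 'cmpd_004']
```

The design point worth noting is that filtering happens before any wet-lab work: every candidate rejected in silico is an assay that never has to be run.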