The Human Proteoform Project, an international collaborative effort led by the non-profit Consortium for Top-Down Proteomics, is working to generate a definitive reference set of the proteoforms produced from the genome. Accomplishing this complex feat could have major implications across a variety of scientific fields, such as synthetic biology, agriculture, and medicine.
Managing editor Lauren Everett speaks with Lloyd M. Smith, the W. L. Hubbell Professor of Chemistry at the University of Wisconsin-Madison. Smith is an officer of the Consortium and a leader of the project's efforts.
Q: Why are proteins and proteoform-level knowledge important for us to understand? What do they tell us?
A: Biological systems are constructed based on instructions encoded in their genomes. The Human Genome Project revealed those instructions for humans and many other organisms. However, that code alone is not enough to understand normal or disease biology; how and when the DNA is transcribed into RNA molecules, and how and when those RNAs are translated into proteins, is crucial information that is non-obvious from genome sequence alone. Those proteins are the effector agents that transform genotype (the DNA code in the genome) to phenotype (the characteristics of the organism). They are not homogeneous strings of amino acids; rather, they are heterogeneous in nature and dynamically modified with as many as hundreds of different possible chemical modifications, called post-translational modifications, that often modulate or control their functions. Without full knowledge of the nature of these complex molecules that drive biological systems (the proteoforms), we have little hope of being able to understand the functioning of those systems.
Q: Can you explain the goals of the Human Proteoform Project, who is involved, and your role with the group?
A: Most proteomics work in the world today is not able to elucidate the full chemical structure of proteoforms, yet as mentioned above, this information is crucial to the understanding of biological systems. The Human Proteoform Project seeks to reveal this critical knowledge by investing in the development and use of new and improved technology for proteoform analysis, and applying it at scale to elucidate the proteoforms that are present in biological systems. The executive board of the Consortium for Top-Down Proteomics has authored a paper providing the rationale and general structure of the Project. I was lead author on that paper. The project has two main aspects: further development of the technology for proteoform analysis, and application of the technology to develop atlases of the proteoforms extant in humans and important model organisms.
Q: How is this project similar to and/or different from the Human Genome Project?
A: The Human Genome Project provides an inspiring model for a well-managed and effective publicly-funded program of great significance. It invested heavily over many years in supporting the development of new technologies for DNA sequencing, both incremental advances to the technology at hand at the beginning of the project, and radical new approaches such as array-based DNA sequencing by synthesis and nanopore sequencing, both of which became very important new platforms for so-called next-gen sequencing. We seek a similar investment in proteoform analysis technology, and imagine a similar model for execution—both incremental and radical advances in proteoform analysis technology, and systematic large-scale deployment of the technologies to reveal the proteoform-level structure of biology. One consequential difference is the relatively closed and finite nature of the DNA genome, in comparison to the more open-ended and less clearly defined nature of the proteoform-ome. Defining the depth to which the proteoform-ome will need to be sampled to capture the critical biological information will be an important aspect of the endeavor.
Q: What tools and technologies will be needed for the Proteoform Project?
A: Proteoform-level analysis is done today by an approach referred to as "top-down proteomics". It is a challenging technology that requires very sophisticated and complex instrumentation and skilled and experienced practitioners. Much work is needed on all aspects of the proteoform analysis pipeline, which includes proteoform separation technologies, mass spectrometric analysis, and complex data processing. In addition, new platforms for proteomics are under development in both academic labs and the private sector, which offer promise for accelerated capabilities in the future. Investment is needed in all of these efforts, as well as in establishing the informatic capabilities and infrastructure for the research and medical communities to rapidly and effectively access proteoform-level information. Also needed are new approaches to decipher the functional roles played by proteoforms in biological systems; obtaining this knowledge requires new capabilities to knock-in and knock-out specific proteoforms, a challenging new frontier in synthetic biology.
Q: What potential impact could the results of this project have in biology, genomics, and other fields of science?
A: As with the Genome Project, the Human Proteoform Project will revolutionize biology and medicine. Proteoforms bridge the chasm between genotype and phenotype. At present, we have the base code for biology in genome sequences, but we lack the connection between that code and function. Powerful and effective proteoform-level knowledge will open new vistas in the understanding and control of biological systems, driving new opportunities in a vast array of fields, such as synthetic biology, agriculture, and medicine.