Deciphering the Microbiome

Georg K. Gerber, MD, PhD, MPH, is an assistant professor of pathology at Harvard Medical School (HMS), co-director of the Center for Clinical and Translational Metagenomics at Brigham and Women’s Hospital (BWH), and an associate pathologist at the BWH Center for Advanced Molecular Diagnostics. His research interests involve building novel computational models and high-throughput experimental systems to understand the role of the microbiota in human diseases, and applying these findings to develop new diagnostic tests and therapeutic interventions to improve patient care. Dr. Gerber’s training includes a fellowship in Infectious Disease Pathology and Molecular Microbiology at BWH, a residency in Clinical Pathology at BWH, an MD from HMS, a Master’s and PhD in Computer Science from MIT, and a Master’s in Infectious Diseases and a BA in Pure Mathematics from UC Berkeley. Prior to returning to graduate school, he founded several companies focused on developing and applying 3D graphics technologies to create feature and IMAX® films.

Q: How has microbiome research evolved in recent years?

A: I have been doing microbiome work for only the past five years. My primary background is in computational biology, although I have training in experimental methods as well. A lot of the initial work in the microbiome field was made possible by short-read next-generation sequencing technologies. Many of the initial studies were focused on figuring out which microbial taxa were present in samples from people or different environments. From the computational perspective, a lot of simple algorithms were used that involved matching sequences to existing databases. The field has since evolved, and people are looking at many more types of data including shotgun metagenomics, which tells you the gene composition of the microbiome, and also non-sequence-based modalities like mass spectrometry for metabolomics and proteomics. So, there are more types of data and larger volumes of data being generated, and the questions now focus more on how these microbes are functioning and how they are interacting with the environment and the host.
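To make the idea of "matching sequences to existing databases" concrete, here is a minimal sketch of one simple approach: assign each read to the reference sequence with which it shares the most short subsequences (k-mers). The reference sequences, taxon names, and the `classify_read` helper are all hypothetical and invented for illustration; real pipelines use curated databases and far more careful scoring.

```python
from collections import Counter


def kmers(seq, k=4):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_read(read, reference_db, k=4):
    """Assign a read to the reference taxon sharing the most k-mers.

    Returns None when the read shares no k-mers with any reference.
    """
    read_kmers = kmers(read, k)
    best_taxon, best_score = None, 0
    for taxon, ref_seq in reference_db.items():
        score = len(read_kmers & kmers(ref_seq, k))
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon


# Toy reference "database": made-up sequences, not real marker genes.
REFERENCE_DB = {
    "taxon_A": "ACGTACGTTAGCCGATAGCTAGGCTA",
    "taxon_B": "TTGACCGGTATTCCGGAATGCCATGA",
}

# Toy reads: one drawn from each reference, plus one that matches nothing.
reads = ["CGTTAGCCGATAG", "GGTATTCCGGAAT", "AAAAAAAAAAAA"]

# Tally how many reads were assigned to each taxon (None = unassigned).
counts = Counter(classify_read(r, REFERENCE_DB) for r in reads)
print(counts)
```

The tally of assignments is the kind of "which taxa are present" summary the early studies produced. It says nothing about what the microbes are doing, which is the gap the newer data types aim to fill.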

Q: How has this changed what you do in your lab?

A: Our approach has evolved more into using animal model systems, so we can control some of the complexities but also have reasonably realistic models. In particular, we use gnotobiotic models a lot. These are animals (mice) that are raised in a completely germ-free environment. We then put in defined groups of microbes so we know exactly what’s in the mice and then we can do controlled experiments to change factors such as the diet to understand the function of the microbes. So, we have invested a lot in the gnotobiotic technologies and also in developing computational methods to help us understand what’s going on in the data. We have also expanded our capabilities to culture microbes, particularly so we can work with anaerobes that are difficult to grow. The newest area that we have gotten into, and that I think is extremely promising, is synthetic biology approaches to change microbial genomes and get the microbes to perform new functions. We haven’t invested as much in sequencing and other high-throughput technologies, because there are a lot of commercial labs and academic collaborators we can work with to get that data.

Q: Are the people in your lab trained in computational biology or microbiology or both?

A: It’s a mix. I collaborate very closely with other faculty here at Harvard, some of whom are microbiologists. In my lab I have more purely computational people whom I directly supervise and I often jointly supervise the experimental folks. The ecosystem at Harvard fosters this type of collaborative research, and students are often trained in multiple labs to get multi-disciplinary expertise.

Q: Are the challenges in microbiome research the same as in traditional microbiology?

A: From the experimental perspective, sample preparation and contamination remain the dominant issues. Sample prep for next-generation sequencing is interesting, because the cost of DNA extraction and cleanup now exceeds the cost of sequencing. So, that remains a challenge, and we are looking into some automated platforms to help with that. For a core facility, automation makes sense, but for a smaller lab with fewer economies of scale, the sample prep can get very expensive. Contamination is always a problem with microbes, because you can get microbial DNA in anything. Labs that are not used to this type of work can have a lot of trouble with contamination. From the computational perspective, the complexity of the data is often a bigger challenge than the volume. There are a lot of complex dependencies inherent in microbiome data, and modeling them can be quite difficult. We deal a lot with longitudinal data, looking at how host-microbial ecosystems evolve over time. This is much harder than analyzing static or cross-sectional data, where you may, for instance, just want to compare microbiomes of people with or without a disease. In longitudinal data, you are concerned with detecting when the microbiome changes, and when particular patterns of change become relevant to the outcome of interest, such as onset of a disease. Then you layer on top of that the evolutionary relationships among microbes, noise in the data, and variation within a human or animal population, and it starts to get very complex.
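As a rough illustration of "detecting when the microbiome changes," the sketch below computes the Bray-Curtis dissimilarity between consecutive time points for a single host and reports the interval with the largest shift. This is a deliberately naive baseline, not the modeling approach used in the lab: the abundance values are hypothetical, and the sketch ignores exactly the complications described above (noise, evolutionary relationships among microbes, and variation across hosts).

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors.

    0 means identical composition; 1 means no taxa in common.
    """
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


def consecutive_shifts(series):
    """Dissimilarity between each time point and the one that follows it."""
    return [bray_curtis(series[i], series[i + 1]) for i in range(len(series) - 1)]


# Hypothetical relative abundances of three taxa at five time points.
series = [
    [0.50, 0.30, 0.20],
    [0.50, 0.30, 0.20],
    [0.45, 0.35, 0.20],
    [0.10, 0.20, 0.70],
    [0.10, 0.25, 0.65],
]

shifts = consecutive_shifts(series)

# Index i refers to the interval between time point i and time point i + 1.
largest = max(range(len(shifts)), key=shifts.__getitem__)
print(f"largest shift is between time points {largest} and {largest + 1}")
```

In this toy series the composition is stable until the jump between the third and fourth samples. With real data, deciding whether such a jump is a true change or just measurement noise is where the statistical modeling comes in.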

Q: How would you advise people who are looking to get into this field of research?

A: We run a core facility and I spend a lot of time meeting with investigators before they start their projects. Experimental design is particularly important in this area, because there are a lot of variables. Investigators moving into this area from other fields may bring a lot of assumptions that are false. For instance, I work with a lot of immunologists, and many of them focus on mouse models. But they don’t necessarily think about the microbes living in the mouse. They might buy a particular mouse strain from two vendors. The mice have identical genetic backgrounds, but may have very different microbes, and this can cause a lot of variability in the phenotype. Cage effects are also really important. Mice in the same cage share the same microbes, which may be different from those in the mice in another cage. You can get fairly strong differences in phenotypes with these effects. Lab managers looking to move into this area should definitely partner with people who have experience with microbiome studies and think carefully about the experimental designs. In terms of molecular biology and working with samples, a lot of protocols are really still in the process of being established for microbiome work. There’s a lot of variability, depending on which protocol you follow and how much experience you have with the techniques.
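One common way to respect cage effects at analysis time, sketched here under stated assumptions, is to treat the cage rather than the individual mouse as the unit of replication: collapse co-housed mice to a single cage-level value before comparing groups. The records, group names, and helper functions below are hypothetical, and this is only one option; mixed-effects models with cage as a random effect are another.

```python
from collections import defaultdict
from statistics import mean


def cage_means(records):
    """Collapse per-mouse measurements to one mean per (group, cage)."""
    by_cage = defaultdict(list)
    for group, cage, value in records:
        by_cage[(group, cage)].append(value)
    return {key: mean(values) for key, values in by_cage.items()}


def group_summary(records):
    """Per group: (number of cages, mean of the cage means)."""
    per_group = defaultdict(list)
    for (group, _cage), cage_mean in cage_means(records).items():
        per_group[group].append(cage_mean)
    return {group: (len(ms), mean(ms)) for group, ms in per_group.items()}


# Hypothetical phenotype scores: (treatment group, cage ID, measurement).
records = [
    ("control", "cage1", 1.0), ("control", "cage1", 1.2),
    ("control", "cage2", 2.0), ("control", "cage2", 2.2),
    ("treated", "cage3", 3.0), ("treated", "cage3", 3.2),
    ("treated", "cage4", 1.0), ("treated", "cage4", 1.2),
]

summary = group_summary(records)
for group, (n_cages, avg) in summary.items():
    print(f"{group}: {n_cages} cages, mean of cage means = {avg:.2f}")
```

In this made-up data, mice within a cage look alike while cages within a group differ substantially, so each group effectively has two replicates rather than four. A design with several cages per group makes that spread visible instead of confounding it with the treatment.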

Q: Is there anything you can do to reduce the variability?

A: People are working hard to reduce the variability in some of these methods, but quite a bit remains. Companies are starting to sell kits and reagents to make this work more routine. We have found that the biggest source of variability is in the DNA extraction protocol. People who are used to working with eukaryotic cells or model bacteria such as E. coli are used to easy and consistent cell lysis. However, in microbiome samples, you are looking at a wide range of organisms, and some of them are quite difficult to lyse. But if you go overboard with your lysis step, you could damage the DNA. Another big source of variability for some protocols, such as 16S rRNA sequencing, is the primers you use. Different protocols amplify different regions of the 16S rRNA gene. So, data generated using different primers can be hard to compare. However, the sequencing aspects following DNA extraction and amplification tend to be fairly consistent.

Q: Where are some of the big changes going to come from that will affect this field in the next couple of years?

A: The sequencing aspect is interesting because on one hand, the technologies are quite advanced, but on the other, the fundamental problem is that the reads are still too short. This is particularly an issue with metagenomic data: when you are dealing with multiple genomes, working with these very short reads is computationally very challenging, whether you are assembling them or using them for identification. If we had longer reads, we could get a better answer to some of the questions being asked. So, some of the emerging next-generation sequencing technologies that give longer reads will certainly make a big difference. Microfluidic technologies that are being developed for looking at single cells are also very interesting. These technologies will be important for understanding the variations in microbes at the strain level and also for getting complete sequences of organisms that are very difficult to grow. Scientifically, there is going to be a lot more interest in microbial function. To get at those questions, I think large-scale phenotypic screens and some of the automated high-throughput microbiology technologies will really start to be of interest to people.

Q: Who are the people you collaborate with for your work?

A: Our facility is open to anyone, including other academic institutions and industry. It’s a fee-for-service core facility. We also have a lot of collaborations and frequently write grants with people. There are four main units to the facility. One is the microbiology unit, which offers culture-based techniques, with microbiologists who are very skilled at growing anaerobes. We can also do microbial phenotyping and some metabolic profiling. The second unit is focused on molecular biology, where we primarily do 16S rRNA sequencing and qPCR. We can process a lot of samples and have pipelines set up for doing extractions, and we have a Laboratory Information Management System (LIMS) for tracking samples. The third unit is the gnotobiotic facility, where we offer a variety of germ-free mouse strains. We don’t currently derive new strains, but do maintain strains derived elsewhere. We don’t just maintain germ-free mice, but can also inoculate mice with whatever microbes investigators want, including complex mixtures or pathogens. The fourth unit is the computational one, and its purpose is to help investigators who do not have experience analyzing microbiome data. We can process data that is generated in our facility or elsewhere. We do basic analyses, such as comparing the microbiomes of people with and without a disease, as well as more complicated analyses such as looking at longitudinal data or experiments with many different treatment groups.