Dr. Anne Carpenter leads the Imaging Platform at the Broad Institute of Harvard and MIT—a team of biologists and computer scientists who develop image analysis and data mining methods and software that are freely available to the public through the open-source CellProfiler project. She collaborates with dozens of biomedical research groups around the world to help identify disease states, potential therapeutics, and gene function from microscopy images. Carpenter received her PhD in cell biology from the University of Illinois, Urbana-Champaign, and completed her postdoctoral fellowship at the Whitehead Institute for Biomedical Research at MIT.
Dr. Arvind Rao has been an assistant professor in the Department of Bioinformatics and Computational Biology at the University of Texas MD Anderson Cancer Center since 2011. Prior to joining MD Anderson, he was a Lane Postdoctoral Fellow at Carnegie Mellon University, specializing in bioimage informatics. Rao received his PhD in electrical engineering and bioinformatics from the University of Michigan, specializing in transcriptional genomics. At MD Anderson, he is working on using image analysis and machine learning methods to link image-derived phenotypes with genetic data across biological scales (i.e., single-cell, tissue, and radiology data).
Q: What kinds of changes have you seen in image analysis tools in recent years?
A: Carpenter: Microscopy has been prevalent in a lot of different fields and extremely widespread across different types of labs for decades. Twenty years ago, when I was a student, microscopy images were used in a very qualitative way. You would choose a single, representative image from your experiments to publish in your paper, and that was the end of the story. In the past decade, it’s become common to quantify the images from microscopy, especially when publishing your data. Image analysis software has also matured in this timespan, making it feasible for any biologist, no matter their computational expertise, to quantify various types of images.
Q: What changes do you expect to see going forward?
A: Carpenter: The scale of experiments has [increased] and will continue to increase. While previously microscopy experiments were done on glass slides, now they are often done in multi-well plates in order to test multiple replicates, time points, and perturbations. With the large number of experiments being done, image analysis software can help save the time spent looking at the images, and it also lends a higher degree of accuracy and objectivity to the analysis of the data.
Another change is that, with the software tools becoming very user-friendly, a biologist can try and get started with image analysis on his or her own. Particularly in microscopy, one can always tell whether the software is doing a good job in identifying cells. So it’s easy to get started on your own. There are also online Q&A forums and e-mail lists for various software packages where you can get advice from people. You don’t necessarily need to work with an expert, unless it’s a particularly challenging problem.
Q: Do you think software tools have now become user-friendly enough to be used by biologists without much computational training?
A: Rao: I think there is such a thing as being too user-friendly. In an effort to make software user-friendly, vendors have now started providing default options that may or may not apply to your dataset. There is also no clear information provided that tells people how to change those settings to suit their data. So people often end up using these tools, especially algorithms, which are not necessarily tailored to address their biological question. It could end up making a really good experiment look bad or [making] an unimportant result look artificially strong. So you then end up either overly disappointed or too optimistic with the results obtained. With user-friendly tools you have to strike the right balance, and it is not easy to do. Hence, collaboration is very important. There has to be a clear and strong partnership between the technical person analyzing the data and the biologist or clinician asking the questions.
Q: Why is there a need to develop and use open source software?
A: Carpenter: When I was doing my postdoctoral work, I went around looking for either open source or commercial software that could help me count Drosophila nuclei in a high-throughput mode. None of the existing software worked. So I started looking at computational papers that had better algorithms for the biology I was looking at. I taught myself computer programming and collaborated with a graduate student at MIT to launch the CellProfiler project. Our goal was to take the advanced algorithms available to the computer science community and make [them] usable by biologists. The trend that we are seeing across all open source software today is that [it is] becoming more powerful and easier to use.
With most imaging software, you start with setting up a pipeline that resembles what you are looking to accomplish. Then you do need to tweak and configure it based on your image set, the cell type, phenotype, and other characteristics that you want to measure. Through trial and error, you can see what the results look like and make further adjustments if needed.
Q: Do you see any obvious gaps or areas that need improvement?
A: Carpenter: There are certainly a few areas that are still challenging, even for experts in the image analysis community. These include some types of bright field images that are very tough to quantify. Some cell types, like neurons, are particularly challenging to work with. Tissue samples can be difficult to process, and working with whole organisms, like zebrafish and mice, can be difficult as well. Working with co-cultures can be challenging in two ways. On the experimental side, it’s difficult to find the right conditions where both cell types grow well together. Computationally, it can also be challenging because most software algorithms are designed to identify one particular type of cell and are not very good at identifying mixtures of cells that are different from each other. So we decided to use a machine-learning approach where we trained a computer to recognize the different cell types. In one experiment where we had primary human hepatocytes together with mouse fibroblasts, we used CellProfiler Analyst to train the computer to help recognize the two cell types.
We often develop our own image processing algorithms for a project that requires them. We also scour the computer science literature for algorithms that may be useful for certain domains. Often computational scientists publish work showing the usefulness of a certain algorithm in a particular biological domain, but they don’t produce software that a biologist can actually use. One of our goals with the CellProfiler project is to make such useful algorithms available to the biologists. The user interface should make sense to biologists and how they want to work. Hence, we are constantly refining our software based on the feedback we receive.
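The co-culture classification Carpenter describes, in which a computer is trained on example cells to tell two cell types apart, can be illustrated with a minimal sketch. This is not CellProfiler Analyst's method; it swaps in a plain nearest-centroid classifier, and the feature names and numbers are hypothetical values chosen for illustration. In practice, features on very different scales would usually be rescaled before computing distances.

```python
from statistics import mean

def train_centroids(labeled_cells):
    """labeled_cells maps a class label to a list of per-cell feature vectors."""
    centroids = {}
    for label, vectors in labeled_cells.items():
        # Average each feature column over the hand-labeled examples.
        centroids[label] = [mean(column) for column in zip(*vectors)]
    return centroids

def classify(cell, centroids):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def distance(label):
        return sum((a - b) ** 2 for a, b in zip(cell, centroids[label]))
    return min(centroids, key=distance)

# Hypothetical features per cell: (nuclear area in pixels, mean nuclear intensity).
training = {
    "hepatocyte": [(420.0, 0.35), (450.0, 0.40), (430.0, 0.38)],
    "fibroblast": [(210.0, 0.80), (190.0, 0.75), (200.0, 0.85)],
}
centroids = train_centroids(training)

# Classify two unlabeled cells using the trained centroids.
predictions = [classify(cell, centroids) for cell in [(440.0, 0.37), (205.0, 0.78)]]
print(predictions)  # ['hepatocyte', 'fibroblast']
```

The useful habit this sketch shares with real tools is the loop it implies: label a few cells, inspect the predictions, and add more examples where the classifier is wrong.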
Q: What areas do you think need to be improved upon?
A: Rao: Biological variations, based on cell type and morphology, are induced by experimental conditions. Similarly, on the technical side, variability comes from using different types of instruments, such as in microscopy. The same sample can look different under different microscopes, and it goes back to the notion of being user-friendly. Some of these microscopes come with preprogrammed settings. So you need to normalize the data for different factors like signal-to-noise ratio, gain, or filter settings, or you can end up with skewed results. These preprocessing settings can also introduce systematic software-induced artifacts that are specific to that particular instrument. Normalizing across staining conditions and data sets, along with appropriate preprocessing, is another important consideration. These are all areas that need to be looked at more closely.
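One simple way to picture the normalization Rao mentions is to rescale each instrument's measurements separately before comparing them. The sketch below uses per-batch z-scoring, which is only one possible choice and not a method named in the interview; the batch names and intensity values are hypothetical. It also assumes each batch contains a comparable mix of samples, which is why many labs normalize against shared control wells instead.

```python
from statistics import mean, stdev

def normalize_per_batch(measurements):
    """measurements maps a batch name to a list of raw intensity values.

    Returns the same structure with each batch rescaled to zero mean
    and unit sample standard deviation.
    """
    normalized = {}
    for batch, values in measurements.items():
        center = mean(values)
        spread = stdev(values)  # sample standard deviation; needs >= 2 values
        if spread == 0:
            raise ValueError(f"batch {batch!r} has no variation to scale by")
        normalized[batch] = [(v - center) / spread for v in values]
    return normalized

# Hypothetical readings of the same samples on two differently configured microscopes.
raw = {
    "microscope_A": [100.0, 120.0, 140.0],
    "microscope_B": [1000.0, 1200.0, 1400.0],
}
scaled = normalize_per_batch(raw)
print(scaled)
```

After rescaling, the two instruments report the same relative values for the same samples even though their raw intensities differ tenfold.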
Q: Where do you see the biggest changes happening in image analysis tools?
A: Rao: In my opinion, the 3D image analysis area is set to explode. Super-resolution microscopy has shown us things that we could not have appreciated in the past. Being able to mine the 3D images obtained from such sophisticated instruments is going to be very informative. The information that you can mine from these images on a single-cell basis is huge. Extracting the data and using the statistical methods to quantify the heterogeneity of these cells is going to change the way we look at this data. Visualizing this multi-parametric data and correlating it with the biological conditions and integrating it with all the phenotypic information generated from a single cell is going to be a big step forward in the next few years.
Q: What advice do you have for lab managers?
A: Carpenter: I would ask people to be enthusiastic and just dive in and learn new things. Image analysis tools are easy to work with, and it is a useful skill that can be used to answer all types of biological questions. On a more cautionary note, treat microscopy experiments like you would a molecular or biochemical experiment, keeping every condition constant and being consistent and rigorous across samples, if you want to ultimately compare the quantitative results from the various images.
Q: What advice do you have for lab managers evaluating and investing in new software tools?
A: Rao: My advice would be to be as rigorous with the software tools as you are with the biological experiments. Just like you run duplicates and triplicates to improve the reproducibility in an experiment, the same thing should apply to software. In most cases vendors are open to giving you a free trial for at least a month. So you should look into how the different algorithms perform when analyzing your data. First and foremost, you need a good technical team in place to evaluate the nitty-gritty of all these tools. Every lab should have some gold-standard data set in place to evaluate various software options before they pick the appropriate tool. My personal bias is to stick with open source tools. On the other hand, commercial software is often better validated than the open source tools.
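Rao's suggestion of scoring candidate tools against a gold-standard data set can be sketched as follows. The metric (mean absolute percent error on per-image cell counts), the tool names, and all counts are assumptions made for illustration; a real evaluation would use the lab's own manually annotated images and whichever measurements matter for its biological question.

```python
def mean_absolute_percent_error(gold_counts, tool_counts):
    """Average per-image error of a tool's counts, as a percentage of the gold count."""
    if len(gold_counts) != len(tool_counts):
        raise ValueError("gold and tool counts must cover the same images")
    # Gold counts must be nonzero, since each error is relative to the gold value.
    errors = [abs(tool - gold) / gold for gold, tool in zip(gold_counts, tool_counts)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical manual cell counts for four gold-standard images.
gold = [100, 200, 50, 400]

# Hypothetical counts reported by two candidate tools on the same images.
candidates = {
    "tool_a": [98, 210, 50, 380],
    "tool_b": [120, 150, 60, 400],
}

scores = {
    name: mean_absolute_percent_error(gold, counts)
    for name, counts in candidates.items()
}
best = min(scores, key=scores.get)
print(scores, best)
```

Running the same comparison on replicate image sets, in the spirit of running duplicates and triplicates at the bench, shows whether a tool's advantage holds up or depends on one particular set of images.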