Lab Manager | Run Your Lab Like a Business

Scientists Identify Major Flaw in Standard Approach to Global Gene Expression Analysis

by Other Author

CAMBRIDGE, Mass. – Whitehead Institute researchers report that common assumptions employed in the generation and interpretation of data from global gene expression analyses can lead to seriously flawed conclusions about gene activity and cell behavior in a wide range of current biological research.

“Expression analysis is one of the most commonly used methods in modern biology,” says Whitehead Member Richard Young. “So we are concerned that flawed assumptions may affect the interpretation of many biological studies.”

Much of today’s interpretation of gene expression data relies on the assumption that all cells being analyzed have similar total amounts of messenger RNA (mRNA), the roughly 10% of a cell’s RNA that acts as a blueprint for protein synthesis. However, some cells, including aggressive cancer cells, produce several times more mRNA than other cells. Traditional global gene expression analyses have typically ignored such differences.

The assumption in traditional gene expression analysis that mRNA content is similar between cells (represented by orange and black dots) does not affect the final results when the cells do in fact have equivalent mRNA content, as in Figure A. In Figure B, the cell represented by the orange dots has a significantly higher mRNA content, but when the data are normalized on the assumption that the two contents are equal, the perceived response is skewed and inaccurately indicates that some of the genes are repressed (green bars). Using a standardized control, as in Figure C, eliminates the assumptions about mRNA content and yields accurate results. Whitehead Institute
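The Figure B scenario can be illustrated with a small numerical sketch. The gene names and copy numbers below are hypothetical and chosen only for illustration, and scaling each sample to its own total is one simple stand-in for the equal-content assumption, not the specific procedure used by any platform or by the Whitehead team.

```python
# Hypothetical per-cell mRNA copy numbers for five genes (illustrative values only).
low_cell = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 100, "geneE": 100}
# In the second cell, three genes are amplified threefold and two are unchanged.
high_cell = {"geneA": 300, "geneB": 300, "geneC": 300, "geneD": 100, "geneE": 100}


def normalize_to_total(counts):
    """Express each gene as a fraction of the sample's total signal.

    This encodes the assumption that both cells contain the same total mRNA.
    """
    total = sum(counts.values())
    return {gene: copies / total for gene, copies in counts.items()}


def fold_changes(reference, sample):
    """Per-gene ratio of sample to reference."""
    return {gene: sample[gene] / reference[gene] for gene in reference}


# Fold changes computed from the actual copy numbers.
true_fc = fold_changes(low_cell, high_cell)

# Fold changes computed after forcing both samples to the same total.
apparent_fc = fold_changes(normalize_to_total(low_cell), normalize_to_total(high_cell))

for gene in low_cell:
    print(f"{gene}: true {true_fc[gene]:.2f}x, apparent {apparent_fc[gene]:.2f}x")
```

In this toy case the two unchanged genes appear to drop to roughly 0.45-fold after normalization, and the threefold increases shrink to about 1.36-fold, which mirrors the spurious repression described in the caption.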

“We’ve highlighted this common assumption in gene expression analysis that potentially affects many researchers,” says Tony Lee, a scientist in Young’s lab and a corresponding author of the article published in this week’s issue of Cell. “We provided a concrete example of the problem and a solution that can be implemented by investigators.”

Members of the Young lab recently uncovered the flaw while investigating gene expression in cancer cells with high levels of c-Myc, a gene regulator known to be highly expressed in aggressive cancer cells. When comparing cells with high and low c-Myc levels, they were surprised to find very different results using different approaches to gene expression analysis. Further investigation revealed striking differences in the total amounts of RNA in the cells with high and low c-Myc levels, yet these differences were masked by commonly used experimental and analytical methods.

“The different results we saw from different methods of gene expression analysis were shocking, and led us to reinvestigate the whole process on several platforms,” says Jakob Lovén, a postdoctoral researcher in Young’s lab and co-author of the Cell paper. “We then realized that the common assumption that cells contain similar levels of mRNA is badly flawed and can lead to serious misinterpretations, particularly with cancer cells that can have very different amounts of RNA.”

In addition to delineating this problem, the Whitehead scientists also describe a remedy. By using synthetically produced mRNAs, called RNA spike-ins, as standardized controls, researchers can compare experimental data and eliminate assumptions about total cell RNA amounts. The remedy applies to all three gene expression analysis platforms they studied.
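The idea behind spike-in normalization can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the researchers' actual pipeline: it assumes a known, identical amount of spike-in RNA is added per cell, and it models the platform as returning the same total signal for every sample. The names `measure`, `spike_in_normalize`, and `SPIKE_IN_COPIES_PER_CELL`, along with all numbers, are hypothetical.

```python
# Hypothetical amount of synthetic spike-in RNA added per cell (an assumption).
SPIKE_IN_COPIES_PER_CELL = 50

# Hypothetical per-cell mRNA copy numbers (illustrative values only).
low_cell = {"geneA": 100, "geneB": 100, "geneC": 100, "geneD": 100, "geneE": 100}
high_cell = {"geneA": 300, "geneB": 300, "geneC": 300, "geneD": 100, "geneE": 100}


def measure(true_counts, spike_copies, total_signal=1000.0):
    """Simulate a platform that returns the same total signal for every sample.

    The spike-in is mixed with the cellular mRNA before measurement, so it is
    rescaled by the same unknown factor as the genes.
    """
    pool = dict(true_counts)
    pool["spike"] = spike_copies
    scale = total_signal / sum(pool.values())
    return {name: copies * scale for name, copies in pool.items()}


def spike_in_normalize(measured, spike_copies):
    """Rescale signals so the spike-in reads as its known per-cell amount."""
    factor = spike_copies / measured["spike"]
    return {name: signal * factor for name, signal in measured.items() if name != "spike"}


low_norm = spike_in_normalize(measure(low_cell, SPIKE_IN_COPIES_PER_CELL), SPIKE_IN_COPIES_PER_CELL)
high_norm = spike_in_normalize(measure(high_cell, SPIKE_IN_COPIES_PER_CELL), SPIKE_IN_COPIES_PER_CELL)

for gene in low_cell:
    print(f"{gene}: {high_norm[gene] / low_norm[gene]:.2f}x")
```

Because the spike-in is rescaled by the same factor as the cellular mRNA, dividing by its signal cancels that factor without any assumption about total RNA content. In this sketch the amplified genes come out at threefold and the unchanged genes at onefold.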

Although the researchers believe the use of RNA spike-ins should become the new standard for global gene expression analyses, questions are likely to persist about the interpretations of much prior research.

“There are over 750,000 expression datasets in public databases, and because they generally lack information about the cell numbers used in the analysis, it is unclear whether they can be re-examined in order to validate the original interpretation,” says David Orlando, a scientist in the Young lab. “It may be necessary to reinvestigate some important concepts.”

This work was supported by National Institutes of Health (NIH) grants HG002668 and CA146445, the American Cancer Society, and the Swedish Research Council.