How Big Data Helped Discover Biomarkers That Could Give Cancer Patients Better Survival Estimates

A breast cancer isoform analysis produced with SURVIV, the method developed at UCLA. Blue lines are associated with longer survival times and magenta lines with shorter survival times. Image courtesy of Yi Xing.

Doctors often tell people with cancer approximately how long they have to live and how well they are likely to respond to treatment. But what if there were a way to make those predictions more accurate?

A new method developed by UCLA scientists could eventually make that possible. It uses data about patients’ genetic sequences to produce more reliable projections of survival time and of how patients might respond to possible treatments. The technique is an innovative use of biomedical big data, in which patterns and trends are gleaned from massive amounts of patient information, to achieve precision medicine: giving doctors the ability to better tailor care to each individual patient.

The approach is likely to enable doctors to give more accurate predictions for people with many types of cancer. In this research, the UCLA scientists studied cancers of the breast, brain (glioblastoma multiforme, a highly malignant and aggressive form, and lower-grade glioma, a less aggressive form), lung, ovary, and kidney.


In addition, it may allow scientists to analyze people’s genetic sequences and determine which sequence variations are lethal and which are harmless.


The new method analyzes gene isoforms, the alternative combinations of a gene’s sequence segments that allow a single gene to produce an enormous variety of RNAs and proteins. It does so using data from RNA molecules in cancer specimens. That process, called RNA sequencing, or RNA-seq, reveals the presence and quantity of RNA molecules in a biological sample. In the method developed at UCLA, the scientists analyzed the ratios of a gene’s slightly different isoforms to one another, which let them detect subtle but important differences between samples. The conventional analysis, in contrast, aggregates all of a gene’s isoforms into a single measure of gene abundance, so it misses differences between the isoforms.
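To make the contrast between the two views concrete, the sketch below uses invented read counts for one hypothetical gene with two isoforms in two hypothetical patients. The dictionary layout and the helper names `gene_level_abundance` and `isoform_ratio` are illustrative assumptions, not part of SURVIV. The sketch also assumes every read can be assigned unambiguously to one isoform, which real RNA-seq data does not guarantee.

```python
# Hypothetical read counts for one gene with two isoforms (A and B) in two
# patients. The numbers are invented purely for illustration.
samples = {
    "patient_1": {"isoform_A": 90, "isoform_B": 10},
    "patient_2": {"isoform_A": 10, "isoform_B": 90},
}


def gene_level_abundance(isoform_counts: dict[str, int]) -> int:
    """Conventional gene-level view: sum all of the gene's isoforms together."""
    return sum(isoform_counts.values())


def isoform_ratio(isoform_counts: dict[str, int], isoform: str) -> float:
    """Isoform-level view: fraction of the gene's reads coming from one isoform."""
    total = gene_level_abundance(isoform_counts)
    return isoform_counts[isoform] / total if total else float("nan")


if __name__ == "__main__":
    for name, counts in samples.items():
        # Gene-level totals are identical; the isoform ratios are opposite.
        print(name, gene_level_abundance(counts),
              round(isoform_ratio(counts, "isoform_A"), 2))
```

Both patients show the same gene-level total of 100 reads, so an aggregate analysis treats them as identical. The isoform ratio (0.9 versus 0.1) separates them, and this is the kind of signal the article says gene-level analysis misses.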

SURVIV (for “survival analysis of mRNA isoform variation”) is the first statistical method for conducting survival analysis on isoforms using RNA-seq data, said senior author Yi Xing, a UCLA associate professor of microbiology, immunology, and molecular genetics. The research was published June 9 in the journal Nature Communications.
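The article does not describe SURVIV’s statistical model. To illustrate in general terms what “survival analysis” on an isoform measurement can mean, here is a minimal sketch of a textbook Cox proportional hazards fit with one covariate, such as a patient’s isoform ratio. This is a swapped-in standard technique, not SURVIV itself. It assumes no tied event times and treats each ratio as exactly known. The function names and example data are hypothetical.

```python
import math


def cox_score_information(beta, times, events, x):
    """Score (first derivative) and observed information (negative second
    derivative) of the Cox partial log-likelihood for a single covariate x.
    Assumes no tied event times, so no Breslow/Efron tie correction is used."""
    grad, info = 0.0, 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at times[i].
        risk = [j for j in range(n) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean = s1 / s0  # risk-weighted mean covariate in the risk set
        grad += x[i] - mean
        info += s2 / s0 - mean ** 2  # risk-weighted variance
    return grad, info


def fit_cox_beta(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of beta in h(t | x) = h0(t) * exp(beta * x)."""
    beta = 0.0
    for _ in range(max_iter):
        grad, info = cox_score_information(beta, times, events, x)
        if info <= 0:
            break  # no curvature left; cannot take a Newton step
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Hypothetical data: follow-up times, 1 = death observed, 0 = censored,
    # and one invented isoform ratio per patient.
    times = [5, 8, 12, 20, 25, 30]
    events = [1, 1, 0, 1, 1, 0]
    ratios = [0.9, 0.2, 0.7, 0.8, 0.1, 0.3]
    beta = fit_cox_beta(times, events, ratios)
    # beta > 0 means a higher ratio goes with higher hazard (shorter survival).
    print(f"estimated log hazard ratio per unit isoform ratio: {beta:.3f}")
```

The sign of the fitted coefficient corresponds loosely to the blue/magenta split in the figure caption. A negative coefficient marks an isoform whose higher ratio goes with longer survival, and a positive one marks a higher ratio that goes with shorter survival.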

The researchers report having identified some 200 isoforms that are associated with survival time for people with breast cancer; some predict longer survival times, others are linked to shorter times. Armed with that knowledge, scientists might eventually be able to target the isoforms associated with shorter survival times in order to suppress them and fight disease, Xing said.

The researchers evaluated their survival predictors using a metric called the concordance index (C-index), which measures how often a predictor correctly ranks pairs of patients by survival time. Across the six types of cancer they analyzed, their isoform-based predictors consistently outperformed conventional gene-based predictors.
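The article does not say which variant of the C-index the study used. The sketch below implements Harrell’s concordance index, its most common form, as an illustration. A pair of patients counts as comparable when the one with the shorter follow-up actually died (was not censored). The pair is concordant if that patient also had the higher predicted risk. A score of 0.5 corresponds to chance ranking and 1.0 to perfect ranking. The function name, the tie handling, and the example data are assumptions.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when times[i] < times[j] and patient i had an
    observed event. It is concordant when risk_scores[i] > risk_scores[j].
    Tied risk scores count as half-concordant; pairs with tied times are
    skipped (one common convention among several)."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


if __name__ == "__main__":
    # Hypothetical follow-up times, event flags, and predicted risks.
    times = [5, 8, 12, 20]
    events = [1, 1, 0, 1]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(harrell_c_index(times, events, scores))
```

In this toy example, 4 of the 5 comparable pairs are ranked correctly. Comparing an isoform-based and a gene-based predictor on the same patients then comes down to comparing their two C-index values.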

The result was surprising because it runs contrary to conventional wisdom, which favors overall gene abundance over isoform ratios, said Xing, director of UCLA’s bioinformatics doctoral program and a member of the UCLA Institute for Quantitative and Computational Biosciences.

“Our finding suggests that isoform ratios provide a more robust molecular signature of cancer patients in large-scale RNA-seq datasets,” he said.

The researchers studied tissues from 2,684 people with cancer whose samples were part of the National Institutes of Health’s Cancer Genome Atlas, and they spent more than two years developing the algorithm for SURVIV.


According to Xing, a human gene typically produces seven to ten isoforms.

“In cancer, sometimes a single gene produces two isoforms, one of which promotes metastasis and one of which represses metastasis,” he said, adding that understanding the differences between the two is extremely important in combating cancer.

“We have just scratched the surface,” Xing said. “We will apply the method to much larger data sets, and we expect to learn a lot more.”

Co-authors of the research are lead author Shihao Shen, a senior research scientist in Xing’s laboratory; Ying Nian Wu, a UCLA professor of statistics; and Yuanyuan Wang and Chengyang Wang, UCLA doctoral students.

The research was funded by the National Institutes of Health (grants R01GM088342 and R01GM105431) and the National Science Foundation (grant DMS1310391). Xing’s research is also supported by an Alfred Sloan Research Fellowship.