In the early 1990s, an international effort was launched by the U.S. Department of Energy and the National Institutes of Health to sequence the human genome. The project took 13 years, involved many scientists in several countries, and cost $2.7 billion (in FY 1991 dollars).
Since then, technological advances and the advent of next generation sequencing have greatly increased the speed at which the genome or the transcriptome of a model organism such as human or mouse can be sequenced. Nowadays, a typical sequencing platform can generate several billion bases of DNA or RNA in the course of a few days and can do so at a far lower cost.
However, such an embarrassment of riches poses difficulties for typical research groups that would like to use this technology but are neither accustomed to nor equipped for handling the large amounts of data that can now be generated. To help address this problem, Thomas Jefferson University is making such an analytical capability available on the web to researchers and clinicians.
The resource, referred to as HandsFree, is a system that was designed and implemented by the Computational Medicine Center at Thomas Jefferson University. The goal of HandsFree is to provide researchers and clinicians at Thomas Jefferson University and Hospitals with the ability to analyze the large datasets that next generation sequencing platforms generate. And since HandsFree is web-based, scientists at other universities, in the Delaware Valley and elsewhere, could also take advantage of it.
“It is a unique resource to academic research and medicine,” says Isidore Rigoutsos, Ph.D., Director of the Center. “I don’t know of any other research institution or medical center that currently makes a similar system available to their researchers and clinicians.”
How does it work? Dr. Rigoutsos offers as an example a researcher who wants to understand a particular aspect of the biology of Alzheimer’s disease, and who has brain samples taken from a deceased patient, as well as samples from a normal brain. The investigator would give the samples to the sequencing facility at the Kimmel Cancer Center at Jefferson and, several days later, would get back data files typically containing 200 million sequences for each sequenced sample, he says.
“This is where HandsFree comes in,” Dr. Rigoutsos says. “The investigator can access HandsFree through her computer browser, securely transfer the sequencing dataset to the HandsFree web-server, and answer a few questions about the type of data and desired analyses. At this point, the data will be placed in a queue with other datasets for analysis by the Computational Medicine Center’s computers. When the dataset reaches the front of the queue, it will be quality-trimmed and preprocessed, then mapped on the corresponding genome followed by a series of analyses that are typical for such data.”
“The system will also generate genomic maps for the investigator to also enable subsequent off-line visual exploration. The generated results and maps are then placed back on the HandsFree web-server and the investigator is notified through email that the output is ready for collection,” he says.
“The whole process is as hands-free as it can get for these kinds of datasets,” Dr. Rigoutsos says. “The investigator still has some work ahead of them but the system does all the ‘heavy lifting’ for them taking the guess-work out and making this kind of analysis easy to harness.”
It took his team one and a half years to put the HandsFree system together. The underlying pipeline uses both publicly available standard tools and tools that the team developed specifically to automate the whole process. “HandsFree enables others to access the very same pipeline that we use ourselves for our own basic research. In this regard, the pipeline’s components have already been ‘vetted’ by us,” says Dr. Rigoutsos.
Very importantly, the system handles the data in a secure fashion, he adds. “Any data that the investigator exchanges with HandsFree is encrypted in both directions. Moreover, the processing and analyses of the data are carried out by the Center’s machines in a separate and secure high performance computing facility.”
Currently, the HandsFree system can accommodate DNA and RNA datasets generated by several popular sequencing platforms, from both human and mouse. “For these datasets, the user can carry out a number of standard analyses at the click of a button,” Dr. Rigoutsos says. “A whole host of additional capabilities is in the process of being implemented and will be enabled in HandsFree in the months ahead.”
The system is now available to researchers, and the cost for analyzing these datasets “is very reasonable,” he says. “HandsFree will help advance medical science, and we are very pleased to have it online and available to our researchers and to others.”
The system can be accessed at http://cm.jefferson.edu/HandsFree