The largest study of CRISPR action to date has produced a method to predict the exact mutations that CRISPR-Cas9 gene editing introduces into a cell. Researchers at the Wellcome Sanger Institute edited 40,000 different pieces of DNA and analyzed a thousand million resulting DNA sequences. They used this data to reveal the effects of the gene editing and to build a machine-learning tool that predicts its outcomes. The tool will help researchers who use CRISPR-Cas9 to study disease mechanisms and drug targets.

Reported this week in Nature Biotechnology, the new resource will enable scientists to choose the best sequences to target, making CRISPR-Cas9 gene editing more reliable, and therefore cheaper and more efficient.

CRISPR-Cas9 is a gene editing technology that enables researchers to cut DNA at almost any chosen position in the genome, in order to create mutations and switch off specific genes. Scientists worldwide use this vital technology to study which genes are important in conditions ranging from cancer to rare diseases. It is also now being trialled therapeutically to correct harmful mutations in people's genes.

A guide RNA binds to a specific target DNA sequence, guiding the Cas9 'scissors' to cut the DNA at the right place. However, the final mutations are difficult to predict, because further changes often occur when the cell repairs the break by rejoining the two cut ends of the DNA.

To study this, the researchers created more than 40,000 pairs of guide RNAs and matching target DNA sequences, and carried out CRISPR-Cas9 gene editing on them. By deep sequencing each pair in different cells, they were able to analyze in detail how the DNA was cut and rejoined. They found that the repair outcome depended on the exact sequence of the target DNA and guide, and that it was reproducible for the same sequence.

The researchers then used this large body of sequence data to train a machine-learning tool that learned general rules governing the outcome of the repair. The program, called FORECasT, predicts the repaired sequence from the targeted DNA sequence alone.

Dr Luca Crepaldi, joint first author on the study from the Wellcome Sanger Institute, said: "We have carried out the largest, most comprehensive study on CRISPR-Cas9 action to date, and analyzed more than a thousand million DNA sequences to allow us to study the process. We demonstrated that specific target sequences were repaired by the cell in the same way, proving that the action of the cell mechanisms is reproducible."

Dr Felicity Allen, joint first author from the Wellcome Sanger Institute, said: "The discovery of reproducible DNA repair after CRISPR-Cas9 editing, combined with the vast amount of sequence data we generated, meant that we could create a predictive tool using machine learning methods. Our resource can predict the exact mutations resulting from CRISPR-Cas9 gene editing, just from the sequence of the target DNA. It will save time and resources for future CRISPR-Cas9 applications, and is openly available for use by all researchers using gene editing to study health and disease."

Dr Leopold Parts, senior author on the paper from the Wellcome Sanger Institute, said: "CRISPR-Cas9 is an extremely important system for introducing mutations into DNA for research, and prospective therapeutic purposes. Our research allows scientists to understand its workings much better, and our transformational method enables people to predict the effects of each CRISPR-Cas9 edit in a cell. This allows better design of editing experiments, and may lead to future therapeutic applications."