1
Elkayam S, Tziony I, Orenstein Y. DeepCRISTL: deep transfer learning to predict CRISPR/Cas9 on-target editing efficiency in specific cellular contexts. Bioinformatics 2024; 40:btae481. [PMID: 39073893 PMCID: PMC11319645 DOI: 10.1093/bioinformatics/btae481] [Received: 11/14/2023] [Revised: 05/28/2024] [Accepted: 07/27/2024] [Indexed: 07/31/2024]
Abstract
MOTIVATION: CRISPR/Cas9 technology has been revolutionizing the field of gene editing. Guide RNAs (gRNAs) enable Cas9 proteins to target specific genomic loci for editing. However, editing efficiency varies between gRNAs and so computational methods were developed to predict editing efficiency for any gRNA of interest. High-throughput datasets of Cas9 editing efficiencies were produced to train machine-learning models to predict editing efficiency. However, these high-throughput datasets have a low correlation with functional and endogenous datasets, which are too small to train accurate machine-learning models on.
RESULTS: We developed DeepCRISTL, a deep-learning model to predict the editing efficiency in a specific cellular context. DeepCRISTL takes advantage of high-throughput datasets to learn general patterns of gRNA editing efficiency and then fine-tunes the model on functional or endogenous data to fit a specific cellular context. We tested two state-of-the-art models trained on high-throughput datasets for editing efficiency prediction, our newly improved DeepHF and CRISPRon, combined with various transfer-learning approaches. The combination of CRISPRon and fine-tuning all model weights was the overall best performer. DeepCRISTL outperformed state-of-the-art methods in predicting editing efficiency in a specific cellular context on functional and endogenous datasets. Using saliency maps, we identified and compared the important features learned by DeepCRISTL across cellular contexts. We believe DeepCRISTL will improve prediction performance in many other CRISPR/Cas9 editing contexts by leveraging transfer learning to utilize both high-throughput datasets and smaller and more biologically relevant datasets.
AVAILABILITY AND IMPLEMENTATION: DeepCRISTL is available via https://github.com/OrensteinLab/DeepCRISTL.
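The recipe described above, pretraining on a large high-throughput dataset and then fine-tuning all weights on a small context-specific dataset, can be illustrated with a deliberately tiny stand-in. This is a minimal sketch, assuming a two-feature linear regressor trained by full-batch gradient descent on invented synthetic data; it is not DeepCRISTL's architecture (which builds on deep models such as DeepHF and CRISPRon), and `train`, `mse`, and `features` are hypothetical helper names.

```python
# Toy sketch of pretrain-then-fine-tune transfer learning.
# Assumptions: a two-feature linear regressor stands in for a deep network,
# and the "source"/"target" datasets are synthetic, invented for illustration.
from typing import List, Tuple

Example = Tuple[List[float], float]  # (features, editing-efficiency label)


def predict(w: List[float], b: float, x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(w: List[float], b: float, data: List[Example]) -> float:
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def train(w: List[float], b: float, data: List[Example],
          lr: float, epochs: int) -> Tuple[List[float], float]:
    """Full-batch gradient descent on MSE; every weight and the bias is updated."""
    w = list(w)  # copy so the caller's weights are not mutated
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict(w, b, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def features(i: int) -> List[float]:
    return [(i % 10) / 10, (i % 7) / 7]


# Large hypothetical "high-throughput" source set: y = 1.0*x0 + 0.5*x1
source = [(features(i), features(i)[0] + 0.5 * features(i)[1]) for i in range(200)]
# Small hypothetical "endogenous" target set with a shifted relationship
target = [(features(i), features(i)[0] + features(i)[1] + 0.2) for i in range(20)]

# 1) Pretrain from scratch on the large source set.
w_pre, b_pre = train([0.0, 0.0], 0.0, source, lr=0.3, epochs=1000)
# 2) Fine-tune ALL weights on the small target set, starting from the
#    pretrained values rather than from zeros.
w_ft, b_ft = train(w_pre, b_pre, target, lr=0.1, epochs=200)

print(f"target MSE before fine-tuning: {mse(w_pre, b_pre, target):.4f}")
print(f"target MSE after fine-tuning:  {mse(w_ft, b_ft, target):.4f}")
```

Other transfer-learning variants (for example, freezing early layers and updating only the last ones) would change which parameters `train` updates in step 2; the abstract reports that updating all weights worked best for DeepCRISTL.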
Affiliation(s)
- Shai Elkayam
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel
- Ido Tziony
  - Department of Computer Science, Bar-Ilan University, Ramat Gan 5290002, Israel
- Yaron Orenstein
  - Department of Computer Science, Bar-Ilan University, Ramat Gan 5290002, Israel
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 5290002, Israel
2
Feng Q, Li Q, Zhou H, Wang Z, Lin C, Jiang Z, Liu T, Wang D. CRISPR technology in human diseases. MedComm (Beijing) 2024; 5:e672. [PMID: 39081515 PMCID: PMC11286548 DOI: 10.1002/mco2.672] [Received: 07/09/2023] [Revised: 07/01/2024] [Accepted: 07/01/2024] [Indexed: 08/02/2024]
Abstract
Gene editing is a growing gene engineering technique that allows accurate editing of a broad spectrum of gene-regulated diseases to achieve curative treatment and also has the potential to be used as an adjunct to the conventional treatment of diseases. Gene editing technology, mainly based on clustered regularly interspaced short palindromic repeats (CRISPR)-CRISPR-associated protein systems and capable of generating genetic modifications in somatic cells, provides a promising new strategy for gene therapy for a wide range of human diseases. Currently, gene editing technology shows great application prospects in a variety of human diseases, not only in therapeutic potential but also in the construction of animal models of human diseases. This paper describes the application of gene editing technology in hematological diseases, solid tumors, immune disorders, ophthalmological diseases, and metabolic diseases; focuses on the therapeutic strategies of gene editing technology in sickle cell disease; provides an overview of the role of gene editing technology in the construction of animal models of human diseases; and discusses the limitations of gene editing technology in the treatment of diseases, with the aim of providing an important reference for the applications of gene editing technology in human disease.
Affiliation(s)
- Qiang Feng
  - Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
  - Research and Development Centre, Baicheng Medical College, Baicheng, China
- Qirong Li
  - Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
- Hengzong Zhou
  - Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
- Zhan Wang
  - Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
- Chao Lin
  - School of Grain Science and Technology, Jilin Business and Technology College, Changchun, China
- Ziping Jiang
  - Department of Hand and Foot Surgery, The First Hospital of Jilin University, Changchun, China
- Tianjia Liu
  - Research and Development Centre, Baicheng Medical College, Baicheng, China
- Dongxu Wang
  - Laboratory Animal Center, College of Animal Science, Jilin University, Changchun, China
  - Department of Hand and Foot Surgery, The First Hospital of Jilin University, Changchun, China
3
Guan Z, Jiang Z. A systematic method for solving data imbalance in CRISPR off-target prediction tasks. Comput Biol Med 2024; 178:108781. [PMID: 38936075 DOI: 10.1016/j.compbiomed.2024.108781] [Received: 01/13/2024] [Revised: 06/05/2024] [Accepted: 06/15/2024] [Indexed: 06/29/2024]
Abstract
Accurately identifying potential off-target sites in the CRISPR/Cas9 system is crucial for improving the efficiency and safety of editing. However, the imbalance of available off-target datasets has posed a major obstacle to enhancing prediction performance. Although several prediction models have been developed to address this issue, systematic research on handling data imbalance in off-target prediction is still lacking. This article systematically investigates the data imbalance issue in off-target datasets and explores numerous methods to process data imbalance from a novel perspective. First, we highlight the impact of the imbalance problem on off-target prediction tasks by determining the imbalance ratios present in these datasets. Then, we provide a comprehensive review of various sampling techniques and cost-sensitive methods to mitigate class imbalance in off-target datasets. Finally, systematic experiments are conducted on several state-of-the-art prediction models to illustrate the impact of applying data imbalance solutions. The results show that class imbalance processing methods significantly improve the off-target prediction capabilities of the models across multiple testing datasets. The code and datasets used in this study are available at https://github.com/gzrgzx/CRISPR_Data_Imbalance.
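The three steps named in this abstract (measuring the imbalance ratio, resampling, and cost-sensitive weighting) can be sketched in a few lines. This is a minimal illustration, assuming binary labels (1 = off-target site, 0 = not); random oversampling and inverse-frequency class weights are generic textbook choices, not necessarily the exact methods evaluated in the paper, and the helper names are hypothetical rather than taken from the linked repository.

```python
# Sketch: quantify class imbalance, then mitigate it either by random
# oversampling or by inverse-frequency class weights (a simple
# cost-sensitive scheme). Function names are hypothetical.
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(minority_idx) for _ in range(deficit)]
    new_samples = list(samples) + [samples[i] for i in extra]
    new_labels = list(labels) + [labels[i] for i in extra]
    return new_samples, new_labels


def inverse_frequency_weights(labels):
    """Per-class loss weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


# Toy data: 2 positive off-target sites among 20 candidate sites.
labels = [1, 1] + [0] * 18
samples = [f"site{i}" for i in range(len(labels))]

print(imbalance_ratio(labels))  # 9.0
_, balanced = random_oversample(samples, labels)
print(Counter(balanced))  # 18 of each class after oversampling
print(inverse_frequency_weights(labels))  # positives weighted 5.0, negatives about 0.56
```

Oversampling changes the training set itself, whereas the weights leave the data untouched and instead scale each example's contribution to the loss; real off-target datasets often have far more extreme ratios than this toy example.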
Affiliation(s)
- Zengrui Guan
  - School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
- Zhenran Jiang
  - School of Computer Science and Technology, East China Normal University, Shanghai, 200062, China
4
Yaish O, Orenstein Y. Generating, modeling and evaluating a large-scale set of CRISPR/Cas9 off-target sites with bulges. Nucleic Acids Res 2024; 52:6777-6790. [PMID: 38813823 PMCID: PMC11229338 DOI: 10.1093/nar/gkae428] [Received: 11/01/2023] [Revised: 04/12/2024] [Accepted: 05/08/2024] [Indexed: 05/31/2024]
Abstract
The CRISPR/Cas9 system is a highly accurate gene-editing technique, but it can also lead to unintended off-target sites (OTS). Consequently, many high-throughput assays have been developed to measure OTS in a genome-wide manner, and their data was used to train machine-learning models to predict OTS. However, these models are inaccurate when considering OTS with bulges due to limited data compared to OTS without bulges. Recently, CHANGE-seq, a new in vitro technique to detect OTS, was used to produce a dataset of unprecedented scale and quality. In addition, the same study produced in cellula GUIDE-seq experiments, but none of these GUIDE-seq experiments included bulges. Here, we generated the most comprehensive GUIDE-seq dataset with bulges, and trained and evaluated state-of-the-art machine-learning models that consider OTS with bulges. We first reprocessed the publicly available experimental raw data of the CHANGE-seq study to generate 20 new GUIDE-seq experiments, and hundreds of OTS with bulges among the original and new GUIDE-seq experiments. We then trained multiple machine-learning models, and demonstrated their state-of-the-art performance both in vitro and in cellula over all OTS and when focusing on OTS with bulges. Last, we visualized the key features learned by our models on OTS with bulges in a unique representation.
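Separating OTS with bulges from mismatch-only OTS starts with labelling each column of an sgRNA/DNA alignment. The helper below is a hypothetical sketch, not code from the study: it assumes a pre-computed gapped alignment of equal-length strings in the DNA alphabet, with `-` marking a gap, and uses the common naming convention in which a gap in the sgRNA row is an extra DNA base (a DNA bulge) and a gap in the DNA row is an extra RNA base (an RNA bulge). PAM handling is omitted.

```python
# Sketch: count matches, mismatches and bulges in an aligned sgRNA/DNA pair.
# Assumptions: equal-length aligned strings, '-' marks a gap, sgRNA written
# with T (not U); each gapped position counts once, so a 2-nt bulge counts 2.
from dataclasses import dataclass


@dataclass
class AlignmentSummary:
    matches: int = 0
    mismatches: int = 0
    dna_bulges: int = 0  # gap in the sgRNA row: extra base in the DNA
    rna_bulges: int = 0  # gap in the DNA row: extra base in the sgRNA


def summarize(sgrna: str, dna: str) -> AlignmentSummary:
    if len(sgrna) != len(dna):
        raise ValueError("aligned sequences must have equal length")
    s = AlignmentSummary()
    for r, d in zip(sgrna.upper(), dna.upper()):
        if r == "-" and d == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if r == "-":
            s.dna_bulges += 1
        elif d == "-":
            s.rna_bulges += 1
        elif r == d:
            s.matches += 1
        else:
            s.mismatches += 1
    return s


print(summarize("ACGT-ACGTA", "ACGTTACCT-"))
# AlignmentSummary(matches=7, mismatches=1, dna_bulges=1, rna_bulges=1)
```

Producing the alignment in the first place (searching the genome while allowing a bounded number of mismatches and bulges) is the harder step and is outside the scope of this sketch.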
Affiliation(s)
- Ofir Yaish
  - School of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer Sheva 8410501, Israel
- Yaron Orenstein
  - Department of Computer Science, Bar-Ilan University, Ramat Gan 5290002, Israel
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan 5290002, Israel
5
Luo Y, Chen Y, Xie H, Zhu W, Zhang G. Interpretable CRISPR/Cas9 off-target activities with mismatches and indels prediction using BERT. Comput Biol Med 2024; 169:107932. [PMID: 38199209 DOI: 10.1016/j.compbiomed.2024.107932] [Received: 08/11/2023] [Revised: 12/25/2023] [Accepted: 01/01/2024] [Indexed: 01/12/2024]
Abstract
Off-target effects of CRISPR/Cas9 can lead to suboptimal genome editing outcomes. Numerous deep learning-based approaches have achieved excellent performance for off-target prediction; however, few can predict off-target activities with both mismatches and indels between single guide RNA (sgRNA) and target DNA sequences. In addition, data imbalance is a common pitfall for off-target prediction. Moreover, due to the complexity of genomic contexts, generating an interpretable model also remains challenging. To address these issues, first, we developed a BERT-based model called CRISPR-BERT for enhancing the prediction of off-target activities with both mismatches and indels. Second, we proposed an adaptive batch-wise class balancing strategy to combat the noise that exists in imbalanced off-target data. Finally, we applied a visualization approach for investigating the generalizable nucleotide position-dependent patterns of sgRNA-DNA pairs for off-target activity. In our comprehensive comparison to existing methods on five mismatches-only datasets and two mismatches-and-indels datasets, CRISPR-BERT achieved the best performance in terms of AUROC and PRAUC. In addition, the visualization analysis demonstrated how implicit knowledge learned by CRISPR-BERT facilitates off-target prediction, which shows potential for model interpretability. Collectively, CRISPR-BERT provides an accurate and interpretable framework for off-target prediction and further contributes to sgRNA optimization in practical use for improved target specificity in CRISPR/Cas9 genome editing. The source code is available at https://github.com/BrokenStringx/CRISPR-BERT.
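CRISPR-BERT is compared with other methods by AUROC and PRAUC. AUROC does not depend on the positive/negative ratio, whereas precision-recall measures do, which is why PR-based scores are usually reported alongside AUROC on imbalanced off-target data. As a hedged illustration (not the authors' evaluation code), both can be computed from labels and scores in plain Python; average precision is used here as one common PR-AUC estimator, and library implementations may interpolate or break ties differently.

```python
# Sketch: rank-based AUROC and average precision for 0/1 off-target labels.
# Assumptions: at least one positive and one negative; AUROC gives ties half
# credit, while average precision breaks score ties by sort order.
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (O(P*N))."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / n_pos


y = [1, 0, 0, 1, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.1]
print(round(auroc(y, s), 4))     # 0.6667
print(average_precision(y, s))   # 0.75
```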
Affiliation(s)
- Ye Luo
  - College of Engineering, Shantou University, Shantou, 515063, China
- Yaowen Chen
  - College of Engineering, Shantou University, Shantou, 515063, China
- HuanZeng Xie
  - College of Engineering, Shantou University, Shantou, 515063, China
- Wentao Zhu
  - College of Engineering, Shantou University, Shantou, 515063, China
- Guishan Zhang
  - College of Engineering, Shantou University, Shantou, 515063, China
6
Zhang G, Luo Y, Dai X, Dai Z. Benchmarking deep learning methods for predicting CRISPR/Cas9 sgRNA on- and off-target activities. Brief Bioinform 2023; 24:bbad333. [PMID: 37775147 DOI: 10.1093/bib/bbad333] [Received: 01/27/2023] [Revised: 08/31/2023] [Accepted: 09/04/2023] [Indexed: 10/01/2023]
Abstract
In silico design of single guide RNA (sgRNA) plays a critical role in the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 (CRISPR/Cas9) system. Continuous efforts are aimed at improving sgRNA design with efficient on-target activity and reduced off-target mutations. In the last 5 years, an increasing number of deep learning-based methods have achieved breakthrough performance in predicting sgRNA on- and off-target activities. Nevertheless, it is worthwhile to systematically evaluate these methods for their predictive abilities. In this review, we conducted a systematic survey on the progress in prediction of on- and off-target editing. We investigated the performances of 10 mainstream deep learning-based on-target predictors using nine public datasets with different sample sizes. We found that in most scenarios, these methods showed greater predictive power on large- and medium-scale datasets than on small-scale datasets. In addition, we performed unbiased experiments to provide an in-depth comparison of eight representative approaches for off-target prediction on 12 publicly available datasets with various imbalance ratios of positive/negative samples. Most methods showed excellent performance on balanced datasets but had much room for improvement on moderately and severely imbalanced datasets. This study provides comprehensive perspectives on CRISPR/Cas9 sgRNA on- and off-target activity prediction and improvement for method development.
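On-target predictors are typically scored by how well their predicted efficiencies rank sgRNAs relative to measured ones, and Spearman rank correlation is a common choice for that. The abstract does not name the benchmark's metric, so treat this as an assumption: a minimal plain-Python sketch of Spearman correlation with average ranks for ties, using hypothetical helper names and assuming neither input is constant.

```python
# Sketch: Spearman rank correlation between measured and predicted on-target
# efficiencies. Assumption: Spearman is used as a typical on-target metric;
# the benchmark's own metric is not stated in the abstract.
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # order positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(measured: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation of the rank-transformed values."""
    return pearson(average_ranks(measured), average_ranks(predicted))


measured = [0.10, 0.40, 0.35, 0.80]   # hypothetical efficiencies
predicted = [0.2, 0.5, 0.6, 0.9]      # hypothetical model outputs
print(spearman(measured, predicted))  # 0.8
```

Because only ranks matter, a model can score well even if its outputs are on a different scale from the measured efficiencies, which suits sgRNA selection where the goal is to pick the best guides.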
Affiliation(s)
- Guishan Zhang
  - College of Engineering, Shantou University, Shantou 515063, China
- Ye Luo
  - College of Engineering, Shantou University, Shantou 515063, China
- Xianhua Dai
  - School of Cyber Science and Technology, Sun Yat-sen University, Shenzhen 518107, China
  - Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai 519000, China
- Zhiming Dai
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China
  - Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou 510006, China
7
Sherkatghanad Z, Abdar M, Charlier J, Makarenkov V. Using traditional machine learning and deep learning methods for on- and off-target prediction in CRISPR/Cas9: a review. Brief Bioinform 2023; 24:bbad131. [PMID: 37080758 PMCID: PMC10199778 DOI: 10.1093/bib/bbad131] [Received: 11/02/2022] [Revised: 03/07/2023] [Accepted: 03/13/2023] [Indexed: 04/22/2023]
Abstract
CRISPR/Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated protein 9) is a popular and effective two-component technology used for targeted genetic manipulation. It is currently the most versatile and accurate method of gene and genome editing and has a large variety of practical applications. For example, in biomedicine, it has been used in research related to cancer, virus infections, pathogen detection, and genetic diseases. Current CRISPR/Cas9 research relies on data-driven models for on- and off-target prediction because cleavage may occur at non-target sequence locations. Nowadays, conventional machine learning and deep learning methods are applied on a regular basis to accurately predict the on-target knockout efficacy and off-target profile of given single-guide RNAs (sgRNAs). In this paper, we present an overview and a comparative analysis of traditional machine learning and deep learning models used in CRISPR/Cas9. We highlight the key research challenges and directions associated with target activity prediction. We discuss recent advances in the sgRNA-DNA sequence encoding used in state-of-the-art on- and off-target prediction models. Furthermore, we present the most popular deep learning neural network architectures used in CRISPR/Cas9 prediction models. Finally, we summarize the existing challenges and discuss possible future investigations in the field of on- and off-target prediction. Our paper provides valuable support for academic and industrial researchers interested in the application of machine learning methods in the field of CRISPR/Cas9 genome editing.
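The sgRNA-DNA sequence encodings discussed in this review turn a guide/target pair into a numeric matrix a model can consume. One widely used scheme is a 4-channel one-hot "OR" encoding, sketched below under stated assumptions: equal-length, gap-free sequences over A, C, G, T (sgRNA written in the DNA alphabet), with hypothetical function names. It is one illustrative option, not the encoding of any particular model in the review.

```python
# Sketch: 4-channel "OR" one-hot encoding of an aligned sgRNA/DNA pair.
# Assumptions: equal-length, gap-free sequences over A, C, G, T. Each position
# becomes a 4-vector with a 1 for every base present in either sequence, so a
# match has one active channel and a mismatch has two.
from typing import List

BASES = "ACGT"


def one_hot(base: str) -> List[int]:
    if base not in BASES:
        raise ValueError(f"unsupported base: {base!r}")
    return [1 if b == base else 0 for b in BASES]


def or_encode(sgrna: str, dna: str) -> List[List[int]]:
    """Return an L x 4 matrix, one row per position (channels in ACGT order)."""
    if len(sgrna) != len(dna):
        raise ValueError("sequences must have equal length")
    rows = []
    for r, d in zip(sgrna.upper(), dna.upper()):
        rows.append([a | b for a, b in zip(one_hot(r), one_hot(d))])
    return rows


for row in or_encode("GACT", "GATT"):
    print(row)
# [0, 0, 1, 0]
# [1, 0, 0, 0]
# [0, 1, 0, 1]
# [0, 0, 0, 1]
```

A known trade-off of the OR scheme is that it records which bases are paired but not which sequence contributed each one (an A-in-sgRNA/G-in-DNA mismatch encodes the same as G/A), and it has no channel for indels; richer encodings add channels for mismatch direction or bulges.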
Affiliation(s)
- Zeinab Sherkatghanad
  - Département d’informatique, Université du Québec à Montréal, H2X 3Y7, Montréal, QC, Canada
- Moloud Abdar
  - Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, 3216, Geelong, VIC, Australia
- Jeremy Charlier
  - Département d’informatique, Université du Québec à Montréal, H2X 3Y7, Montréal, QC, Canada
- Vladimir Makarenkov
  - Département d’informatique, Université du Québec à Montréal, H2X 3Y7, Montréal, QC, Canada
8
Guo C, Ma X, Gao F, Guo Y. Off-target effects in CRISPR/Cas9 gene editing. Front Bioeng Biotechnol 2023; 11:1143157. [PMID: 36970624 PMCID: PMC10034092 DOI: 10.3389/fbioe.2023.1143157] [Received: 01/12/2023] [Accepted: 02/28/2023] [Indexed: 03/11/2023]
Abstract
Gene editing refers to methods that make precise changes to a specific nucleic acid sequence. With the recent development of the clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system, gene editing has become efficient, convenient and programmable, leading to promising translational studies and clinical trials for both genetic and non-genetic diseases. A major concern in the applications of the CRISPR/Cas9 system is its off-target effects, namely the deposition of unexpected, unwanted, or even adverse alterations to the genome. To date, many methods have been developed to nominate or detect the off-target sites of CRISPR/Cas9, which have laid the basis for the successful upgrades of CRISPR/Cas9 derivatives with enhanced precision. In this review, we summarize these technological advancements and discuss the current challenges in the management of off-target effects for future gene therapy.
Affiliation(s)
- Congting Guo
  - School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
  - Peking University Institute of Cardiovascular Sciences, Beijing, China
- Xiaoteng Ma
  - Department of Cardiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Fei Gao
  - Department of Cardiology, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
  - Correspondence: Fei Gao; Yuxuan Guo
- Yuxuan Guo
  - School of Basic Medical Sciences, Peking University Health Science Center, Beijing, China
  - Peking University Institute of Cardiovascular Sciences, Beijing, China
  - Ministry of Education Key Laboratory of Molecular Cardiovascular Science, Beijing, China
  - Beijing Key Laboratory of Cardiovascular Receptors Research, Beijing, China