1. Hauns S, Alkhnbashi OS, Backofen R. Deepdefense: annotation of immune systems in prokaryotes using deep learning. Gigascience 2024; 13:giae062. [PMID: 39388605 DOI: 10.1093/gigascience/giae062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2023] [Revised: 02/19/2024] [Accepted: 08/01/2024] [Indexed: 10/12/2024] Open
Abstract
BACKGROUND Due to a constant evolutionary arms race, archaea and bacteria have evolved an abundance and diversity of immune responses to protect themselves against phages. Since the discovery and application of CRISPR-Cas adaptive immune systems, numerous novel candidates for immune systems have been identified. Previous approaches to identifying these new immune systems rely on hidden Markov model (HMM)-based homolog searches or use labor-intensive and costly wet-lab experiments. To aid in finding and classifying immune systems in genomes, we use machine learning to classify already known immune system proteins and discover potential candidates in the genome. Neural networks have shown promising results in classifying and predicting protein functionality in recent years. However, these methods often operate under the closed-world assumption, where it is presumed that all potential outcomes or classes are already known and included in the training dataset. This assumption does not always hold true in real-world scenarios, such as in genomics, where new samples can emerge that were not previously accounted for in the training phase.
RESULTS In this work, we explore neural networks for immune protein classification, deal with different methods for rejecting unrelated proteins in a genome-wide search, and establish a benchmark. Then, we optimize our approach for accuracy. Based on this, we develop an algorithm called Deepdefense to predict immune cassette classes based on a genome. This design facilitates the differentiation between immune system-related and unrelated proteins by analyzing variations in model-predicted confidence values, aiding in the identification of both known and potentially novel immune system proteins. Finally, we test our approach for detecting immune systems in the genome against an HMM-based method.
CONCLUSIONS Deepdefense can automatically detect genes and define cassette annotations and classifications using 2 model classifications. This is achieved by creating an optimized deep learning model to annotate immune systems, in combination with calibration methods, and a second model to enable the scanning of an entire genome.
Affiliation(s)
- Sven Hauns
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
- Omer S Alkhnbashi
  - Center for Applied and Translational Genomics (CATG), Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai Healthcare City, Al Razi St. P.O 505055, Dubai, United Arab Emirates
  - College of Medicine, Mohammed Bin Rashid University of Medicine and Health Sciences, Dubai Healthcare City, Al Razi St. 505055, Dubai, United Arab Emirates
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg 79110, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Freiburg 79104, Germany
2. Liu Z, Liu J, Yang Z, Zhu L, Zhu Z, Huang H, Jiang L. Endogenous CRISPR-Cas mediated in situ genome editing: State-of-the-art and the road ahead for engineering prokaryotes. Biotechnol Adv 2023; 68:108241. [PMID: 37633620 DOI: 10.1016/j.biotechadv.2023.108241] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 08/23/2023] [Accepted: 08/23/2023] [Indexed: 08/28/2023]
Abstract
The CRISPR-Cas systems have shown tremendous promise as heterologous tools for genome editing in various prokaryotes. However, the perturbation of DNA homeostasis and the inherent toxicity of Cas9/12a proteins could easily lead to cell death, which led to the development of endogenous CRISPR-Cas systems. Programming the widespread endogenous CRISPR-Cas systems for in situ genome editing represents a promising tool in prokaryotes, especially in genetically intractable species. Here, this review briefly summarizes the advances of endogenous CRISPR-Cas-mediated genome editing, covering aspects of establishing and optimizing the genetic tools. In particular, this review presents the application of different types of endogenous CRISPR-Cas tools for strain engineering, including genome editing and genetic regulation. Notably, this review also provides a detailed discussion of the transposon-associated CRISPR-Cas systems, and the programmable RNA-guided transposition using endogenous CRISPR-Cas systems to enable editing of microbial communities for understanding and control. Therefore, they will be a powerful tool for targeted genetic manipulation. Overall, this review will not only facilitate the development of standard genetic manipulation tools for non-model prokaryotes but will also enable more non-model prokaryotes to be genetically tractable.
Affiliation(s)
- Zhenlei Liu
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Jiayu Liu
  - College of Food Science and Light Industry, Nanjing Tech University, Nanjing 211816, China
- Zhihan Yang
  - College of Biotechnology and Pharmaceutical Engineering, Nanjing Tech University, Nanjing 211816, China
- Liying Zhu
  - College of Chemical and Molecular Engineering, Nanjing Tech University, Nanjing 211816, China
- Zhengming Zhu
  - College of Food Science and Light Industry, Nanjing Tech University, Nanjing 211816, China
- He Huang
  - School of Food Science and Pharmaceutical Engineering, Nanjing Normal University, Nanjing 210046, China
- Ling Jiang
  - College of Food Science and Light Industry, Nanjing Tech University, Nanjing 211816, China; State Key Laboratory of Materials-Oriented Chemical Engineering, Nanjing Tech University, Nanjing 211816, China
3. Zhang T, Jia Y, Li H, Xu D, Zhou J, Wang G. CRISPRCasStack: a stacking strategy-based ensemble learning framework for accurate identification of Cas proteins. Brief Bioinform 2022; 23:6674167. [PMID: 35998924 DOI: 10.1093/bib/bbac335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 07/13/2022] [Accepted: 07/23/2022] [Indexed: 11/12/2022] Open
Abstract
The CRISPR-Cas system is an adaptive immune system, widely found in bacteria and archaea, that defends against invasion by exogenous genes. One of the most critical steps in exploring and classifying novel CRISPR-Cas systems and their functional diversity is the identification of their Cas proteins. The discovery of novel Cas proteins has also laid the foundation for technologies such as CRISPR-Cas-based gene editing and gene therapy. Currently, accurate and efficient screening of Cas proteins from metagenomic and proteomic sequences remains a challenge. For Cas proteins with low sequence conservation, existing homology-based tools for Cas protein identification cannot guarantee identification accuracy and efficiency. In this paper, we have developed a novel stacking-based ensemble learning framework for Cas protein identification, called CRISPRCasStack. In particular, we applied the SHAP (SHapley Additive exPlanations) method to analyze the features used in CRISPRCasStack. Sufficient experimental validation and independent testing have demonstrated that CRISPRCasStack can address the accuracy deficiencies and inefficiencies of the existing state-of-the-art tools. We also provide a toolkit to accurately identify and analyze potential Cas proteins, Cas operons, CRISPR arrays and CRISPR-Cas loci in prokaryotic sequences. The CRISPRCasStack toolkit is available at https://github.com/yrjia1015/CRISPRCasStack.
Affiliation(s)
- Tianjiao Zhang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Yuran Jia
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Hongfei Li
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Dali Xu
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Jie Zhou
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin, 150040, China
4. Wandera KG, Alkhnbashi OS, Bassett HVI, Mitrofanov A, Hauns S, Migur A, Backofen R, Beisel CL. Anti-CRISPR prediction using deep learning reveals an inhibitor of Cas13b nucleases. Mol Cell 2022; 82:2714-2726.e4. [PMID: 35649413 DOI: 10.1016/j.molcel.2022.05.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/22/2021] [Revised: 03/25/2022] [Accepted: 05/03/2022] [Indexed: 11/28/2022]
Abstract
As part of the ongoing bacterial-phage arms race, CRISPR-Cas systems in bacteria clear invading phages whereas anti-CRISPR proteins (Acrs) in phages inhibit CRISPR defenses. Known Acrs have proven extremely diverse, complicating their identification. Here, we report a deep learning algorithm for Acr identification that revealed an Acr against type VI-B CRISPR-Cas systems. The algorithm predicted numerous putative Acrs spanning almost all CRISPR-Cas types and subtypes, including over 7,000 putative type IV and VI Acrs not predicted by other algorithms. By performing a cell-free screen for Acr hits against type VI-B systems, we identified a potent inhibitor of Cas13b nucleases we named AcrVIB1. AcrVIB1 blocks Cas13b-mediated defense against a targeted plasmid and lytic phage, and its inhibitory function principally occurs upstream of ribonucleoprotein complex formation. Overall, our work helps expand the known Acr universe, aiding our understanding of the bacteria-phage arms race and the use of Acrs to control CRISPR technologies.
Affiliation(s)
- Katharina G Wandera
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Omer S Alkhnbashi
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- Harris V I Bassett
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Sven Hauns
  - Universität Freiburg, 79098 Freiburg, Germany
- Anzhela Migur
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Rolf Backofen
  - Universität Freiburg, 79098 Freiburg, Germany; Signalling Research Centres BIOSS and CIBSS, University of Freiburg, 79098 Freiburg, Germany
- Chase L Beisel
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany; Medical Faculty, University of Würzburg, 97080 Würzburg, Germany
5. Spacer prioritization in CRISPR-Cas9 immunity is enabled by the leader RNA. Nat Microbiol 2022; 7:530-541. [PMID: 35314780 PMCID: PMC7612570 DOI: 10.1038/s41564-022-01074-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/07/2021] [Accepted: 02/01/2022] [Indexed: 11/08/2022]
Abstract
CRISPR-Cas systems store fragments of foreign DNA called spacers as immunological recordings used to combat future infections. Of the many spacers stored in a CRISPR array, the newest spacers are known to be prioritized for immune defense. However, the underlying mechanism remains unclear. Here we show that the leader region upstream of CRISPR arrays in CRISPR-Cas9 systems enhances CRISPR RNA (crRNA) processing from the newest spacer, prioritizing defense against the matching invader. Using the CRISPR-Cas9 system from Streptococcus pyogenes as a model, we found that the transcribed leader interacts with the conserved repeats bordering the newest spacer. The resulting interaction promotes tracrRNA hybridization with the second repeat, accelerating crRNA processing. Accordingly, disrupting this structure reduces the abundance of the associated crRNA and immune defense against targeted plasmids and bacteriophages. Beyond the S. pyogenes system, bioinformatics analyses revealed that leader-repeat structures appear across CRISPR-Cas9 systems. CRISPR-Cas systems thus possess an RNA-based mechanism to prioritize defense against the most recently encountered invaders.
6. Alkhnbashi OS, Mitrofanov A, Bonidia R, Raden M, Tran V, Eggenhofer F, Shah S, Öztürk E, Padilha V, Sanches D, de Carvalho A, Backofen R. CRISPRloci: comprehensive and accurate annotation of CRISPR-Cas systems. Nucleic Acids Res 2021; 49:W125-W130. [PMID: 34133710 PMCID: PMC8265192 DOI: 10.1093/nar/gkab456] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/15/2021] [Revised: 04/28/2021] [Accepted: 05/17/2021] [Indexed: 11/17/2022] Open
Abstract
CRISPR–Cas systems are adaptive immune systems in prokaryotes, providing resistance against invading viruses and plasmids. The identification of CRISPR loci is currently a non-standardized, ambiguous process, requiring the manual combination of multiple tools, where existing tools detect only parts of the CRISPR-systems, and lack quality control, annotation and assessment capabilities of the detected CRISPR loci. Our CRISPRloci server provides the first resource for the prediction and assessment of all possible CRISPR loci. The server integrates a series of advanced Machine Learning tools within a seamless web interface featuring: (i) prediction of all CRISPR arrays in the correct orientation; (ii) definition of CRISPR leaders for each locus; and (iii) annotation of cas genes and their unambiguous classification. As a result, CRISPRloci is able to accurately determine the CRISPR array and associated information, such as: the Cas subtypes; cassette boundaries; accuracy of the repeat structure, orientation and leader sequence; virus-host interactions; self-targeting; as well as the annotation of cas genes, all of which have been missing from existing tools. This annotation is presented in an interactive interface, making it easy for scientists to gain an overview of the CRISPR system in their organism of interest. Predictions are also rendered in GFF format, enabling in-depth genome browser inspection. In summary, CRISPRloci constitutes a full suite for CRISPR–Cas system characterization that offers annotation quality previously available only after manual inspection.
Affiliation(s)
- Omer S Alkhnbashi
  - To whom correspondence should be addressed. Tel: +49 761 2037460; Fax: +49 761 2037462
- Martin Raden
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Van Dinh Tran
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Shiraz A Shah
  - Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Denmark
- Ekrem Öztürk
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Victor A Padilha
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
- Danilo S Sanches
  - Universidade Tecnológica Federal do Paraná, Campus Cornélio Procópio, 86300000 Cornélio Procópio, PR, Brazil
- Rolf Backofen
  - Correspondence may also be addressed to Rolf Backofen.
7. Cui Y, Wang Z, Köster J, Liao X, Peng S, Tang T, Huang C, Yang C. VISPR-online: a web-based interactive tool to visualize CRISPR screening experiments. BMC Bioinformatics 2021; 22:344. [PMID: 34167459 PMCID: PMC8223366 DOI: 10.1186/s12859-021-04275-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/09/2021] [Accepted: 06/15/2021] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND VISPR is an interactive visualization and analysis framework for CRISPR screening experiments. However, it only supports the output of MAGeCK, and requires installation and manual configuration. Furthermore, VISPR is designed to run on a single computer, and data sharing between collaborators is challenging.
RESULTS To make the tool easily accessible to the community, we present VISPR-online, a web-based general application allowing users to visualize, explore, and share CRISPR screening data online with a few simple steps. VISPR-online provides an exploration of screening results and visualization of read count changes. Apart from MAGeCK, VISPR-online supports two more popular CRISPR screening analysis tools: BAGEL and JACKS. It provides an interactive environment for exploring gene essentiality, viewing guide RNA (gRNA) locations, and allowing users to resume and share screening results.
CONCLUSIONS VISPR-online allows users to visualize, explore and share CRISPR screening data online. It is freely available at http://vispr-online.weililab.org, while the source code is available at https://github.com/lemoncyb/VISPR-online.
Affiliation(s)
- Yingbo Cui
  - School of Computer, National University of Defense Technology, Changsha, 410073, China
- Zihang Wang
  - College of Information Science and Engineering, Hunan University, Changsha, 410006, China
- Johannes Köster
  - Algorithms for Reproducible Bioinformatics, Institute of Human Genetics, University of Duisburg-Essen, 45147, Essen, Germany
- Xiangke Liao
  - School of Computer, National University of Defense Technology, Changsha, 410073, China
- Shaoliang Peng
  - College of Information Science and Engineering, Hunan University, Changsha, 410006, China
  - National Supercomputing Center in Changsha, Changsha, 410082, China
- Tao Tang
  - School of Computer, National University of Defense Technology, Changsha, 410073, China
- Chun Huang
  - School of Computer, National University of Defense Technology, Changsha, 410073, China
- Canqun Yang
  - School of Computer, National University of Defense Technology, Changsha, 410073, China