1
|
Motoche-Monar C, Andrade D, Pijal WD, Hidrobo F, Armas R, Sánchez-Real E, Rocha-Chauca G, Castillo JA. CRISPRals: A Web Database for Assessing the CRISPR Defense System in the Ralstonia solanacearum Species Complex to Avoid Phage Resistance. PHYTOPATHOLOGY 2024; 114:1462-1465. [PMID: 38427684 DOI: 10.1094/phyto-01-24-0010-sc] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/03/2024]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR) has been widely characterized as a defense system against phages and other invading elements in bacteria and archaea. A low percentage of Ralstonia solanacearum species complex (RSSC) strains possess the CRISPR array and the CRISPR-associated proteins (Cas) that would confer immunity against various phages. To provide a wide-range screen of the CRISPR presence in the RSSC, we analyzed 378 genomes of RSSC strains to find the CRISPR locus. We found that 20.1, 14.3, and 54.5% of the R. solanacearum, R. pseudosolanacearum, and R. syzygii strains, respectively, possess the CRISPR locus. In addition, we performed further analysis to identify the respective phages that are restricted by the CRISPR arrays. We found 252 different phages infecting different strains of the RSSC, by means of the identification of similarities between the protospacers in phages and spacers in bacteria. We compiled this information in a database with web access called CRISPRals (https://crisprals.yachaytech.edu.ec/). Additionally, we made available a number of tools to detect and identify CRISPR array and Cas genes in genomic sequences that could be uploaded by users. Finally, a matching tool to relate bacteria spacer with phage protospacer sequences is available. CRISPRals is a valuable resource for the scientific community that contributes to the study of bacteria-phage interaction and a starting point that will help to design efficient phage therapy strategies.
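The spacer-to-protospacer matching described in the abstract can be illustrated with a minimal sketch. This is not the CRISPRals implementation: the function names, the Hamming-distance criterion, and the `max_mismatches` threshold are all assumptions chosen for illustration. The sketch scans both strands of a phage sequence for windows that differ from a bacterial spacer by at most a few bases.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_protospacers(spacer: str, phage_genome: str, max_mismatches: int = 2):
    """Hypothetical helper: list candidate protospacers for one spacer.

    Returns (start, strand, mismatches) tuples, where start is the 0-based
    position of the matching window in phage_genome as given.
    The mismatch threshold is an illustrative assumption, not a value
    taken from the CRISPRals paper.
    """
    spacer = spacer.upper()
    genome = phage_genome.upper()
    hits = []
    # Search the spacer itself ("+") and its reverse complement ("-").
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        for start in range(len(genome) - len(query) + 1):
            window = genome[start:start + len(query)]
            # Hamming distance between the query and this genome window.
            mismatches = sum(a != b for a, b in zip(query, window))
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return hits
```

For real genomes, a dedicated aligner such as BLAST would normally replace this quadratic scan; the sketch only shows the matching idea.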
Affiliation(s)
- Cristofer Motoche-Monar
- Phage Therapy Group, School of Biological Sciences and Engineering, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| | - Diego Andrade
- Phage Therapy Group, School of Mathematical and Computational Sciences, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| | - Washington D Pijal
- Phage Therapy Group, School of Mathematical and Computational Sciences, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| | - Francisco Hidrobo
- Phage Therapy Group, School of Mathematical and Computational Sciences, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| | - Rolando Armas
- Phage Therapy Group, School of Mathematical and Computational Sciences, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| | - Emily Sánchez-Real
- Phage Therapy Group, School of Biological Sciences and Engineering, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| | - Gabriela Rocha-Chauca
- Phage Therapy Group, School of Biological Sciences and Engineering, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| | - José A Castillo
- Phage Therapy Group, School of Biological Sciences and Engineering, Yachay Tech University, Hcda San José y Proyecto Yachay, 100115, Imbabura, Ecuador
| |
|
2
|
Madugula SS, Pujar P, Nammi B, Wang S, Jayasinghe-Arachchige VM, Pham T, Mashburn D, Artiles M, Liu J. Identification of Family-Specific Features in Cas9 and Cas12 Proteins: A Machine Learning Approach Using Complete Protein Feature Spectrum. J Chem Inf Model 2024; 64:4897-4911. [PMID: 38838358 DOI: 10.1021/acs.jcim.4c00625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2024]
Abstract
The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit the gene of interest assisted by guide RNA. However, these Cas proteins suffer from inherent limitations such as large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In this study, we aim to elucidate the unique protein features associated with Cas9 and Cas12 families and identify the features distinguishing each family from non-Cas proteins. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,494 features) encoding various physiochemical, topological, constitutional, and coevolutionary information on Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and non-Cas proteins. All the models were evaluated rigorously on the test and independent data sets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 92% and 95% on their respective independent data sets, while the multiclass classifier achieved an F1 score of close to 0.98. We observed that Quasi-Sequence-Order (QSO) descriptors like Schneider.lag and Composition descriptors like charge, volume, and polarizability are predominant in the Cas12 family. Conversely Amino Acid Composition descriptors, especially Tripeptide Composition (TPC), predominate the Cas9 family. 
Four of the top 10 descriptors identified in Cas9 classification are tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all Cas9 proteins and located within different catalytically important domains of the Streptococcus pyogenes Cas9 (SpCas9) structure. Among these, DHI and HHA are well-known to be involved in the DNA cleavage activity of the SpCas9 protein. Mutation studies have highlighted the significance of the PWN tripeptide in PAM recognition and DNA cleavage activity of SpCas9, while Y450 from the PYY tripeptide plays a crucial role in reducing off-target effects and improving the specificity in SpCas9. Leveraging our machine learning (ML) pipeline, we identified numerous Cas9 and Cas12 family-specific features. These features offer valuable insights for future experimental and computational studies aiming at designing Cas systems with enhanced gene-editing properties. These features suggest plausible structural modifications that can effectively guide the development of Cas proteins with improved editing capabilities.
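The Tripeptide Composition (TPC) descriptors named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of TPC (the count of each of the 8,000 possible tripeptides divided by the number of overlapping three-residue windows). The function name and the handling of non-standard residues are assumptions for illustration, not the authors' feature pipeline.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tripeptide_composition(sequence: str) -> dict:
    """Hypothetical helper: return the 8,000-dimensional TPC vector as a dict.

    Each value is (occurrences of the tripeptide) / (sequence length - 2).
    Windows containing non-standard residues still count toward the
    denominator but have no key of their own (an illustrative choice).
    """
    seq = sequence.upper()
    n_windows = len(seq) - 2
    if n_windows <= 0:
        raise ValueError("sequence must contain at least three residues")
    # Count every overlapping window of three residues.
    counts = Counter(seq[i:i + 3] for i in range(n_windows))
    return {
        "".join(tri): counts.get("".join(tri), 0) / n_windows
        for tri in product(AMINO_ACIDS, repeat=3)
    }
```

A vector like this could serve as one block of input features for a classifier such as the Random Forest models the abstract describes, for example by reading off `tripeptide_composition(seq)["PWN"]` for each protein.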
Affiliation(s)
- Sita Sirisha Madugula
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, Texas 76107, United States
| | - Pranav Pujar
- Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, 701 South Nedderman Drive, Arlington, Texas 76019, United States
| | - Bharani Nammi
- Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, 701 South Nedderman Drive, Arlington, Texas 76019, United States
| | - Shouyi Wang
- Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, 701 South Nedderman Drive, Arlington, Texas 76019, United States
| | - Vindi M Jayasinghe-Arachchige
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, Texas 76107, United States
| | - Tyler Pham
- School of Biomedical Sciences, University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, Texas 76107, United States
| | - Dominic Mashburn
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, Texas 76107, United States
| | - Maria Artiles
- School of Biomedical Sciences, University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, Texas 76107, United States
| | - Jin Liu
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, Texas 76107, United States
- School of Biomedical Sciences, University of North Texas Health Science Center, 3500 Camp Bowie Blvd, Fort Worth, Texas 76107, United States
| |
|
3
|
Wang JH, Huang PT, Huang YT, Mao YC, Lai CH, Yeh TK, Tseng CH, Kao CC. Characterization of CRISPR-Cas Systems in Shewanella algae and Shewanella haliotis: Insights into the Adaptation and Survival of Marine Pathogens. Pathogens 2024; 13:439. [PMID: 38921737 PMCID: PMC11207072 DOI: 10.3390/pathogens13060439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 04/25/2024] [Accepted: 05/15/2024] [Indexed: 06/27/2024] Open
Abstract
CRISPR-Cas systems are adaptive immune mechanisms present in most prokaryotes that play an important role in the adaptation of bacteria and archaea to new environments. Shewanella algae is a marine zoonotic pathogen with worldwide distribution, which accounts for the majority of clinical cases of Shewanella infections. However, the characterization of Shewanella algae CRISPR-Cas systems has not been well investigated yet. Through whole genome sequence analysis, we characterized the CRISPR-Cas systems in S. algae. Our results indicate that CRISPR-Cas systems are prevalent in S. algae, with the majority of strains containing the Type I-F system. This study provides new insights into the diversity and function of CRISPR-Cas systems in S. algae and highlights their potential role in the adaptation and survival of these marine pathogens.
Affiliation(s)
- Jui-Hsing Wang
- Division of Infectious Disease, Department of Internal Medicine, Taichung Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Taichung 427213, Taiwan;
- Department of Internal Medicine, School of Medicine, Tzu Chi University, Hualien 970374, Taiwan
| | - Po-Tsang Huang
- Division of Pharmacy, Kaohsiung Armed Forces General Hospital, Kaohsiung 802301, Taiwan;
| | - Yao-Ting Huang
- Department of Computer Science and Information Engineering, National Chung Cheng University, Chia-Yi 621301, Taiwan;
| | - Yan-Chiao Mao
- Division of Clinical Toxicology, Department of Emergency Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan;
| | - Chung-Hsu Lai
- Division of Infectious Diseases, Department of Internal Medicine, E-Da Hospital, Kaohsiung 824005, Taiwan;
- School of Medicine, College of Medicine, I-Shou University, Kaohsiung 840301, Taiwan
| | - Ting-Kuang Yeh
- Division of Infectious Diseases, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan;
- Genomic Center for Infectious Diseases, Taichung Veterans General Hospital, Taichung 407219, Taiwan
| | - Chien-Hao Tseng
- Division of Infectious Diseases, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung 407219, Taiwan;
- Genomic Center for Infectious Diseases, Taichung Veterans General Hospital, Taichung 407219, Taiwan
| | - Chih-Chuan Kao
- Division of Infectious Disease, Department of Internal Medicine, Tungs’ Taichung Metroharbor Hospital, Taichung 435403, Taiwan
| |
|
4
|
Madugula SS, Pujar P, Bharani N, Wang S, Jayasinghe-Arachchige VM, Pham T, Mashburn D, Artilis M, Liu J. Identification of Family-Specific Features in Cas9 and Cas12 Proteins: A Machine Learning Approach Using Complete Protein Feature Spectrum. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.22.576286. [PMID: 38328240 PMCID: PMC10849529 DOI: 10.1101/2024.01.22.576286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2024]
Abstract
The recent development of CRISPR-Cas technology holds promise to correct gene-level defects for genetic diseases. The key element of the CRISPR-Cas system is the Cas protein, a nuclease that can edit the gene of interest assisted by guide RNA. However, these Cas proteins suffer from inherent limitations like large size, low cleavage efficiency, and off-target effects, hindering their widespread application as a gene editing tool. Therefore, there is a need to identify novel Cas proteins with improved editing properties, for which it is necessary to understand the underlying features governing the Cas families. In the current study, we aim to elucidate the unique protein attributes associated with Cas9 and Cas12 families and identify the features that distinguish each family from the other. Here, we built Random Forest (RF) binary classifiers to distinguish Cas12 and Cas9 proteins from non-Cas proteins, respectively, using the complete protein feature spectrum (13,495 features) encoding various physiochemical, topological, constitutional, and coevolutionary information of Cas proteins. Furthermore, we built multiclass RF classifiers differentiating Cas9, Cas12, and Non-Cas proteins. All the models were evaluated rigorously on the test and independent datasets. The Cas12 and Cas9 binary models achieved a high overall accuracy of 95% and 97% on their respective independent datasets, while the multiclass classifier achieved a high F1 score of 0.97. We observed that Quasi-sequence-order descriptors like Schneider-lag descriptors and Composition descriptors like charge, volume, and polarizability are essential for the Cas12 family. More interestingly, we discovered that Amino Acid Composition descriptors, especially the Tripeptide Composition (TPC) descriptors, are important for the Cas9 family. 
Four of the identified important descriptors of Cas9 classification are tripeptides PWN, PYY, HHA, and DHI, which are seen to be conserved across all the Cas9 proteins and were located within different catalytically important domains of the Cas9 protein structure. Among these four tripeptides, DHI and HHA are well known to be involved in the DNA cleavage activity of the Cas9 protein. We therefore propose that the other two tripeptides, PWN and PYY, may also be essential for the Cas9 family. Our identified important descriptors enhanced the understanding of the catalytic mechanisms of Cas9 and Cas12 proteins and provide valuable insights into the design of novel Cas systems to achieve enhanced gene-editing properties.
Affiliation(s)
- Sita Sirisha Madugula
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
| | - Pranav Pujar
- Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
| | - Nammi Bharani
- Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
| | - Shouyi Wang
- Department of Industrial, Manufacturing and Systems Engineering, University of Texas at Arlington, Arlington, Texas, United States
| | - Vindi M. Jayasinghe-Arachchige
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
| | - Tyler Pham
- Graduate School of Biomedical Sciences, University of North Texas Health Science Center, Fort Worth, Texas
| | - Dominic Mashburn
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
| | - Maria Artilis
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
| | - Jin Liu
- Department of Pharmaceutical Sciences, University of North Texas System College of Pharmacy, University of North Texas Health Science Center, Fort Worth, Texas, United States
- Graduate School of Biomedical Sciences, University of North Texas Health Science Center, Fort Worth, Texas
| |
|
5
|
Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024; 2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
Affiliation(s)
- Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
| | - Jan Gorodkin
- Center for Non-coding RNA in Technology and Health, Department of Veterinary and Animal Sciences, University of Copenhagen, Frederiksberg, Denmark
| | - Ivo L Hofacker
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria
- Bioinformatics and Computational Biology research group, University of Vienna, Vienna, Austria
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University of Leipzig, Leipzig, Germany.
- Interdisciplinary Center for Bioinformatics, University of Leipzig, Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.
- Universidad Nacional de Colombia, Bogotá, Colombia.
- Institute for Theoretical Chemistry, University of Vienna, Wien, Austria.
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.
- Santa Fe Institute, Santa Fe, NM, USA.
| |
|
6
|
Muhammad N, Avila F, Nedashkovskaya OI, Kim SG. Three novel marine species of the genus Reichenbachiella exhibiting degradation of complex polysaccharides. Front Microbiol 2023; 14:1265676. [PMID: 38156005 PMCID: PMC10752948 DOI: 10.3389/fmicb.2023.1265676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2023] [Accepted: 11/23/2023] [Indexed: 12/30/2023] Open
Abstract
Three novel strains designated ABR2-5T, BKB1-1T, and WSW4-B4T belonging to the genus Reichenbachiella of the phylum Bacteroidota were isolated from algae and mud samples collected in the West Sea, Korea. All three strains were enriched for genes encoding up to 216 carbohydrate-active enzymes (CAZymes), which participate in the degradation of agar, alginate, carrageenan, laminarin, and starch. The 16S rRNA sequence similarities among the three novel isolates were 94.0%-94.7%, and against all three existing species in the genus Reichenbachiella they were 93.6%-97.2%. The genome sizes of the strains ABR2-5T, BKB1-1T, and WSW4-B4T were 5.5, 4.4, and 5.0 Mb, respectively, and the GC content ranged from 41.1%-42.0%. The average nucleotide identity and the digital DNA-DNA hybridization values of each novel strain within the isolates and all existing species in the genus Reichenbachiella were in a range of 69.2%-75.5% and 17.7-18.9%, respectively, supporting the creation of three new species. The three novel strains exhibited a distinctive fatty acid profile characterized by elevated levels of iso-C15:0 (37.7%-47.4%) and C16:1 ω5c (14.4%-22.9%). Specifically, strain ABR2-5T displayed an additional higher proportion of C16:0 (13.0%). The polar lipids were phosphatidylethanolamine, unidentified lipids, aminolipids, and glycolipids. Menaquinone-7 was identified as the respiratory quinone of the isolates. A comparative genome analysis was performed using the KEGG, RAST, antiSMASH, CRISPRCasFinder, dbCAN, and dbCAN-PUL servers and CRISPRcasIdentifier software. The results revealed that the isolates harbored many key genes involved in central metabolism for the synthesis of essential amino acids and vitamins, hydrolytic enzymes, carotenoid pigments, and antimicrobial compounds. 
The KEGG analysis showed that the three isolates possessed a complete pathway of dissimilatory nitrate reduction to ammonium (DNRA), which is involved in the conservation of bioavailable nitrogen within the ecosystem. Moreover, all the strains possessed genes that participated in the metabolism of heavy metals, including arsenic, copper, cobalt, ferrous, and manganese. All three isolated strains contain the class 2 type II subtype C1 CRISPR-Cas system in their genomes. The distinguished phenotypic, chemotaxonomic, and genomic characteristics led us to propose that the three strains represent three novel species in the genus Reichenbachiella: R. ulvae sp. nov. (ABR2-5T = KCTC 82990T = JCM 35839T), R. agarivorans sp. nov. (BKB1-1T = KCTC 82964T = JCM 35840T), and R. carrageenanivorans sp. nov. (WSW4-B4T = KCTC 82706T = JCM 35841T).
Affiliation(s)
- Neak Muhammad
- Biological Resource Center/Korean Collection for Type Cultures (KCTC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Department of Environmental Biotechnology, KRIBB School of Biotechnology, University of Science and Technology (UST), Daejeon, Republic of Korea
| | - Forbes Avila
- Biological Resource Center/Korean Collection for Type Cultures (KCTC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Department of Environmental Biotechnology, KRIBB School of Biotechnology, University of Science and Technology (UST), Daejeon, Republic of Korea
| | - Olga I. Nedashkovskaya
- G.B. Elyakov Pacific Institute of Bioorganic Chemistry of the Far-Eastern Branch of the Russian Academy of Sciences, Vladivostok, Russia
| | - Song-Gun Kim
- Biological Resource Center/Korean Collection for Type Cultures (KCTC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, Republic of Korea
- Department of Environmental Biotechnology, KRIBB School of Biotechnology, University of Science and Technology (UST), Daejeon, Republic of Korea
| |
|
7
|
Booker AE, D'Angelo T, Adams-Beyea A, Brown JM, Nigro O, Rappé MS, Stepanauskas R, Orcutt BN. Life strategies for Aminicenantia in subseafloor oceanic crust. THE ISME JOURNAL 2023; 17:1406-1415. [PMID: 37328571 PMCID: PMC10432499 DOI: 10.1038/s41396-023-01454-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 04/11/2023] [Accepted: 04/17/2023] [Indexed: 06/18/2023]
Abstract
After decades of studying the microbial "deep biosphere" in subseafloor oceanic crust, the growth and life strategies in this anoxic, low energy habitat remain poorly described. Using both single cell genomics and metagenomics, we reveal the life strategies of two distinct lineages of uncultivated Aminicenantia bacteria from the basaltic subseafloor oceanic crust of the eastern flank of the Juan de Fuca Ridge. Both lineages appear adapted to scavenge organic carbon, as each has the genetic potential to catabolize amino acids and fatty acids, aligning with previous Aminicenantia reports. Given the organic carbon limitation in this habitat, seawater recharge and necromass may be important carbon sources for heterotrophic microorganisms inhabiting the ocean crust. Both lineages generate ATP via several mechanisms including substrate-level phosphorylation, anaerobic respiration, and electron bifurcation driving an Rnf ion translocation membrane complex. Genomic comparisons suggest these Aminicenantia transfer electrons extracellularly, perhaps to iron or sulfur oxides consistent with mineralogy of this site. One lineage, called JdFR-78, has small genomes that are basal to the Aminicenantia class and potentially uses "primordial" siroheme biosynthetic intermediates for heme synthesis, suggesting this lineage retains characteristics of early evolved life. Lineage JdFR-78 contains CRISPR-Cas defenses to evade viruses, while other lineages contain prophage that may help prevent super-infection or no detectable viral defenses. Overall, genomic evidence points to Aminicenantia being well adapted to oceanic crust environments by taking advantage of simple organic molecules and extracellular electron transport.
Affiliation(s)
- Anne E Booker
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
| | | | - Annabelle Adams-Beyea
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
- Eugene Lang College of Liberal Arts at The New School, New York City, NY, USA
| | - Julia M Brown
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA
| | - Olivia Nigro
- Department of Natural Science, Hawai'i Pacific University, Honolulu, HI, USA
| | - Michael S Rappé
- Hawai'i Institute of Marine Biology, SOEST, University of Hawai'i at Mānoa, Kāne'ohe, HI, USA
| | | | - Beth N Orcutt
- Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA.
| |
|
8
|
Patra P, B R D, Kundu P, Das M, Ghosh A. Recent advances in machine learning applications in metabolic engineering. Biotechnol Adv 2023; 62:108069. [PMID: 36442697 DOI: 10.1016/j.biotechadv.2022.108069] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 07/17/2022] [Revised: 10/18/2022] [Accepted: 11/22/2022] [Indexed: 11/27/2022]
Abstract
Metabolic engineering encompasses several widely-used strategies, which currently hold a high seat in the field of biotechnology when its potential is manifesting through a plethora of research and commercial products with a strong societal impact. The genomic revolution that occurred almost three decades ago has initiated the generation of large omics-datasets which have helped in gaining a better understanding of cellular behavior. The itinerary of metabolic engineering that has occurred based on these large datasets has allowed researchers to gain detailed insights and a reasonable understanding of the intricacies of biosystems. However, the existing trial-and-error approaches for metabolic engineering are laborious and time-intensive when it comes to the production of target compounds with high yields through genetic manipulations in host organisms. Machine learning (ML) coupled with the available metabolic engineering test instances and omics data brings a comprehensive and multidisciplinary approach that enables scientists to evaluate various parameters for effective strain design. This vast amount of biological data should be standardized through knowledge engineering to train different ML models for providing accurate predictions in gene circuit design, modification of proteins, optimization of bioprocess parameters for scaling up, and screening of hyper-producing robust cell factories. This review briefs on the premise of ML, followed by mentioning various ML methods and algorithms alongside the numerous omics datasets available to train ML models for predicting metabolic outcomes with high accuracy. The combinative interplay between the ML algorithms and biological datasets through knowledge engineering has guided the recent advancements in applications such as CRISPR/Cas systems, gene circuits, protein engineering, metabolic pathway reconstruction, and bioprocess engineering. 
Finally, this review addresses the probable challenges of applying ML in metabolic engineering which will guide the researchers toward novel techniques to overcome the limitations.
Affiliation(s)
- Pradipta Patra
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Disha B R
- B.M.S College of Engineering, Basavanagudi, Bengaluru, Karnataka 560019, India
| | - Pritam Kundu
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Manali Das
- School of Bioscience, Indian Institute of Technology Kharagpur, West Bengal 721302, India
| | - Amit Ghosh
- School of Energy Science and Engineering, Indian Institute of Technology Kharagpur, West Bengal 721302, India; P.K. Sinha Centre for Bioenergy and Renewables, Indian Institute of Technology Kharagpur, West Bengal 721302, India.
| |
|
9
|
Mitrofanov A, Ziemann M, Alkhnbashi OS, Hess WR, Backofen R. CRISPRtracrRNA: robust approach for CRISPR tracrRNA detection. Bioinformatics 2022; 38:ii42-ii48. [PMID: 36124799 PMCID: PMC9486595 DOI: 10.1093/bioinformatics/btac466] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/25/2022] Open
Abstract
MOTIVATION The CRISPR-Cas9 system is a Type II CRISPR system that has rapidly become the most versatile and widespread tool for genome engineering. It consists of two components, the Cas9 effector protein, and a single guide RNA that combines the spacer (for identifying the target) with the tracrRNA, a trans-activating small RNA required for both crRNA maturation and interference. While there are well-established methods for screening Cas effector proteins and CRISPR arrays, the detection of tracrRNA remains the bottleneck in detecting Class 2 CRISPR systems. RESULTS We introduce a new pipeline CRISPRtracrRNA for screening and evaluation of tracrRNA candidates in genomes. This pipeline combines evidence from different components of the Cas9-sgRNA complex. The core is a newly developed structural model via covariance models from a sequence-structure alignment of experimentally validated tracrRNAs. As additional evidence, we determine the terminator signal (required for the tracrRNA transcription) and the RNA-RNA interaction between the CRISPR array repeat and the 5'-part of the tracrRNA. Repeats are detected via an ML-based approach (CRISPRidentify). Providing further evidence, we detect the cassette containing the Cas9 (Type II CRISPR systems) and Cas12 (Type V CRISPR systems) effector proteins. Our tool is the first for detecting tracrRNA for Type V systems. AVAILABILITY AND IMPLEMENTATION The implementation of CRISPRtracrRNA is available on GitHub upon request for access permission (https://github.com/BackofenLab/CRISPRtracrRNA). Data generated in this study can be obtained upon request to the corresponding author: Rolf Backofen (backofen@informatik.uni-freiburg.de). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wolfgang R Hess
  - Faculty of Biology, Genetics and Experimental Bioinformatics, University of Freiburg, Freiburg, Germany
|
10
|
Unraveling the Genomic Potential of the Thermophilic Bacterium Anoxybacillus flavithermus from an Antarctic Geothermal Environment. Microorganisms 2022; 10:1673. [PMID: 36014090 PMCID: PMC9413872 DOI: 10.3390/microorganisms10081673] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/21/2022] [Revised: 08/12/2022] [Accepted: 08/16/2022] [Indexed: 11/25/2022]
Abstract
Antarctica is a mosaic of extremes. It harbors active polar volcanoes, such as Deception Island, a marine stratovolcano having notable temperature gradients over very short distances, with the temperature reaching up to 100 °C near the fumaroles and subzero temperatures being noted in the glaciers. From the sediments of Deception Island, we isolated representatives of the genus Anoxybacillus, a widely spread genus that is mainly encountered in thermophilic environments. However, the phylogeny of this genus and its adaptive mechanisms in the geothermal sites of cold environments remain unknown. To the best of our knowledge, this is the first study to unravel the genomic features and provide insights into the phylogenomics and metabolic potential of members of the genus Anoxybacillus inhabiting the Antarctic thermophilic ecosystem. Here, we report the genome sequencing data of seven A. flavithermus strains isolated from two geothermal sites on Deception Island, Antarctic Peninsula. Their genomes were approximately 3.0 Mb in size, had a G + C ratio of 42%, and were predicted to encode 3500 proteins on average. We observed that the strains were phylogenomically closest to each other (Average Nucleotide Identity (ANI) > 98%) and to A. flavithermus (ANI 95%). In silico genomic analysis revealed 15 resistance and metabolic islands, as well as genes related to genome stabilization, DNA repair systems against UV radiation threats, temperature adaptation, heat- and cold-shock proteins (Csps), and resistance to alkaline conditions. Remarkably, glycosyl hydrolase enzyme-encoding genes, secondary metabolites, and prophage sequences were predicted, revealing metabolic and cellular capabilities for potential biotechnological applications.
|
11
|
Genomes of six viruses that infect Asgard archaea from deep-sea sediments. Nat Microbiol 2022; 7:953-961. [PMID: 35760837 DOI: 10.1038/s41564-022-01150-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 10/08/2021] [Accepted: 05/16/2022] [Indexed: 12/25/2022]
Abstract
Asgard archaea are globally distributed prokaryotic microorganisms related to eukaryotes; however, viruses that infect these organisms have not been described. Here, using metagenome sequences recovered from deep-sea hydrothermal sediments, we characterize six relatively large (up to 117 kb) double-stranded DNA (dsDNA) viral genomes that infected two Asgard archaeal phyla, Lokiarchaeota and Helarchaeota. These viruses encode Caudovirales-like structural proteins, as well as proteins distinct from those described in known archaeal viruses. Their genomes contain around 1-5% of genes associated with eukaryotic nucleocytoplasmic large DNA viruses (NCLDVs) and appear to be capable of semi-autonomous genome replication, repair, epigenetic modifications and transcriptional regulation. Moreover, Helarchaeota viruses may hijack host ubiquitin systems similar to eukaryotic viruses. Genomic analysis of these Asgard viruses reveals that they contain features of both prokaryotic and eukaryotic viruses, and provides insights into their potential infection and host interaction mechanisms.
|
12
|
A closed Candidatus Odinarchaeum chromosome exposes Asgard archaeal viruses. Nat Microbiol 2022; 7:948-952. [PMID: 35760836 PMCID: PMC9246712 DOI: 10.1038/s41564-022-01122-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 09/03/2021] [Accepted: 04/06/2022] [Indexed: 12/11/2022]
Abstract
Asgard archaea have recently been identified as the closest archaeal relatives of eukaryotes. Their ecology, and particularly their virome, remain enigmatic. We reassembled and closed the chromosome of Candidatus Odinarchaeum yellowstonii LCB_4, through long-range PCR, revealing CRISPR spacers targeting viral contigs. We found related viruses in the genomes of diverse prokaryotes from geothermal environments, including other Asgard archaea. These viruses open research avenues into the ecology and evolution of Asgard archaea.
|
13
|
Mattiello L, Rütgers M, Sua-Rojas MF, Tavares R, Soares JS, Begcy K, Menossi M. Molecular and Computational Strategies to Increase the Efficiency of CRISPR-Based Techniques. Front Plant Sci 2022; 13:868027. [PMID: 35712599 PMCID: PMC9194676 DOI: 10.3389/fpls.2022.868027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2022] [Accepted: 04/27/2022] [Indexed: 06/15/2023]
Abstract
The prokaryote-derived Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas-mediated gene editing tools have revolutionized our ability to precisely manipulate specific genome sequences in plants and animals. The simplicity, precision, affordability, and robustness of this technology have allowed a myriad of genomes from a diverse group of plant species to be successfully edited. Even though CRISPR/Cas, base editing, and prime editing technologies have been rapidly adopted and implemented in plants, their editing efficiency and specificity vary greatly. In this review, we provide a critical overview of the recent advances in CRISPR/Cas9-derived technologies and their implications for enhancing editing efficiency. We highlight the major efforts to engineer Cas9, Cas12a, Cas12b, and Cas12f proteins with the aim of improving their efficiencies. We also provide a perspective on the global future of agriculturally based products using DNA-free CRISPR/Cas techniques. Improving the efficiency of CRISPR-based technologies will enable the implementation of genome editing tools in a variety of crop plants, as well as accelerate progress in basic research and molecular breeding.
Affiliation(s)
- Lucia Mattiello
  - Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Mark Rütgers
  - Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Maria Fernanda Sua-Rojas
  - Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Rafael Tavares
  - Cell and Developmental Biology, John Innes Centre, Norwich, United Kingdom
- José Sérgio Soares
  - Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
- Kevin Begcy
  - Environmental Horticulture Department, University of Florida, Gainesville, FL, United States
- Marcelo Menossi
  - Department of Genetics, Evolution, Microbiology and Immunology, Institute of Biology, State University of Campinas (UNICAMP), Campinas, Brazil
|
14
|
Wandera KG, Alkhnbashi OS, Bassett HVI, Mitrofanov A, Hauns S, Migur A, Backofen R, Beisel CL. Anti-CRISPR prediction using deep learning reveals an inhibitor of Cas13b nucleases. Mol Cell 2022; 82:2714-2726.e4. [PMID: 35649413 DOI: 10.1016/j.molcel.2022.05.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/22/2021] [Revised: 03/25/2022] [Accepted: 05/03/2022] [Indexed: 11/28/2022]
Abstract
As part of the ongoing bacterial-phage arms race, CRISPR-Cas systems in bacteria clear invading phages whereas anti-CRISPR proteins (Acrs) in phages inhibit CRISPR defenses. Known Acrs have proven extremely diverse, complicating their identification. Here, we report a deep learning algorithm for Acr identification that revealed an Acr against type VI-B CRISPR-Cas systems. The algorithm predicted numerous putative Acrs spanning almost all CRISPR-Cas types and subtypes, including over 7,000 putative type IV and VI Acrs not predicted by other algorithms. By performing a cell-free screen for Acr hits against type VI-B systems, we identified a potent inhibitor of Cas13b nucleases we named AcrVIB1. AcrVIB1 blocks Cas13b-mediated defense against a targeted plasmid and lytic phage, and its inhibitory function principally occurs upstream of ribonucleoprotein complex formation. Overall, our work helps expand the known Acr universe, aiding our understanding of the bacteria-phage arms race and the use of Acrs to control CRISPR technologies.
Affiliation(s)
- Katharina G Wandera
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Omer S Alkhnbashi
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia
- Harris V I Bassett
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Sven Hauns
  - Universität Freiburg, 79098 Freiburg, Germany
- Anzhela Migur
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
- Rolf Backofen
  - Universität Freiburg, 79098 Freiburg, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, 79098 Freiburg, Germany
- Chase L Beisel
  - Helmholtz Institute for RNA-Based Infection Research (HIRI), Helmholtz Centre for Infection Research (HZI), 97080 Würzburg, Germany
  - Medical Faculty, University of Würzburg, 97080 Würzburg, Germany
|
15
|
Tesson F, Hervé A, Mordret E, Touchon M, d'Humières C, Cury J, Bernheim A. Systematic and quantitative view of the antiviral arsenal of prokaryotes. Nat Commun 2022; 13:2561. [PMID: 35538097 PMCID: PMC9090908 DOI: 10.1038/s41467-022-30269-9] [Citation(s) in RCA: 178] [Impact Index Per Article: 89.0] [Received: 02/18/2022] [Accepted: 04/22/2022] [Indexed: 12/16/2022]
Abstract
Bacteria and archaea have developed multiple antiviral mechanisms, and genomic evidence indicates that several of these antiviral systems co-occur in the same strain. Here, we introduce DefenseFinder, a tool that automatically detects known antiviral systems in prokaryotic genomes. We use DefenseFinder to analyse 21000 fully sequenced prokaryotic genomes, and find that antiviral strategies vary drastically between phyla, species and strains. Variations in composition of antiviral systems correlate with genome size, viral threat, and lifestyle traits. DefenseFinder will facilitate large-scale genomic analysis of antiviral defense systems and the study of host-virus interactions in prokaryotes.
Affiliation(s)
- Florian Tesson
  - Université de Paris, IAME, UMR 1137, INSERM, Paris, France
  - SEED, U1284, INSERM, Université de Paris, Paris, France
- Marie Touchon
  - Institut Pasteur, Université de Paris, CNRS, UMR3525, Microbial Evolutionary Genomics, Paris, 75015, France
- Jean Cury
  - SEED, U1284, INSERM, Université de Paris, Paris, France
  - Université Paris-Saclay, CNRS, INRIA, Laboratoire Interdisciplinaire des Sciences du Numérique, UMR 9015, Orsay, France
- Aude Bernheim
  - Université de Paris, IAME, UMR 1137, INSERM, Paris, France
  - SEED, U1284, INSERM, Université de Paris, Paris, France
|
16
|
Spacer prioritization in CRISPR-Cas9 immunity is enabled by the leader RNA. Nat Microbiol 2022; 7:530-541. [PMID: 35314780 PMCID: PMC7612570 DOI: 10.1038/s41564-022-01074-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/07/2021] [Accepted: 02/01/2022] [Indexed: 11/08/2022]
Abstract
CRISPR-Cas systems store fragments of foreign DNA called spacers as immunological recordings used to combat future infections. Of the many spacers stored in a CRISPR array, the newest spacers are known to be prioritized for immune defense. However, the underlying mechanism remains unclear. Here we show that the leader region upstream of CRISPR arrays in CRISPR-Cas9 systems enhances CRISPR RNA (crRNA) processing from the newest spacer, prioritizing defense against the matching invader. Using the CRISPR-Cas9 system from Streptococcus pyogenes as a model, we found that the transcribed leader interacts with the conserved repeats bordering the newest spacer. The resulting interaction promotes tracrRNA hybridization with the second repeat, accelerating crRNA processing. Accordingly, disrupting this structure reduces the abundance of the associated crRNA and immune defense against targeted plasmids and bacteriophages. Beyond the S. pyogenes system, bioinformatics analyses revealed that leader-repeat structures appear across CRISPR-Cas9 systems. CRISPR-Cas systems thus possess an RNA-based mechanism to prioritize defense against the most recently encountered invaders.
|
17
|
Santana de Carvalho D, Trovatti Uetanabaro AP, Kato RB, Aburjaile FF, Jaiswal AK, Profeta R, De Oliveira Carvalho RD, Tiwar S, Cybelle Pinto Gomide A, Almeida Costa E, Kukharenko O, Orlovska I, Podolich O, Reva O, Ramos PIP, De Carvalho Azevedo VA, Brenig B, Andrade BS, de Vera JPP, Kozyrovska NO, Barh D, Góes-Neto A. The Space-Exposed Kombucha Microbial Community Member Komagataeibacter oboediens Showed Only Minor Changes in Its Genome After Reactivation on Earth. Front Microbiol 2022; 13:782175. [PMID: 35369445 PMCID: PMC8970348 DOI: 10.3389/fmicb.2022.782175] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2021] [Accepted: 02/01/2022] [Indexed: 11/23/2022]
Abstract
Komagataeibacter is the dominant taxon and the cellulose-producing bacterium in the Kombucha Microbial Community (KMC). This is the first study to isolate the K. oboediens genome from a reactivated space-exposed KMC sample and comprehensively characterize it. The space-exposed genome was compared with the Earth-based reference genome to understand the genome stability of K. oboediens under extraterrestrial conditions over a long period. Our results suggest that the genomes of K. oboediens IMBG180 (ground sample) and K. oboediens IMBG185 (space-exposed) are remarkably similar in topology, genomic islands, transposases, prion-like proteins, and number of plasmids and CRISPR-Cas cassettes. Nonetheless, there was a difference in the length of plasmids and the location of cas genes. A small difference was observed in the number of protein-coding genes. These differences do not affect the genetic profile of the cellulose synthesis, nitrogen fixation, hopanoid lipid biosynthesis, and stress-related pathways. Minor changes are observed only in the gene numbers or sequence completeness of central carbohydrate and energy metabolism pathways. Altogether, these findings suggest that K. oboediens maintains its genome stability and functionality in KMC exposed to the space environment, most probably due to the protective role of the KMC biofilm. Furthermore, due to its unaffected metabolic pathways, this bacterial species may also retain some promising potential for space applications.
Affiliation(s)
- Daniel Santana de Carvalho
  - Laboratory of Molecular and Computational Biology of Fungi, Department of Microbiology, Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Ana Paula Trovatti Uetanabaro
  - Laboratory of Molecular and Computational Biology of Fungi, Department of Microbiology, Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
  - Postgraduate Program in Biology and Biotechnology of Microorganisms, Department of Biological Sciences, State University of Santa Cruz, Ilhéus, Brazil
- Rodrigo Bentes Kato
  - Laboratory of Molecular and Computational Biology of Fungi, Department of Microbiology, Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Flávia Figueira Aburjaile
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Arun Kumar Jaiswal
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Rodrigo Profeta
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Rodrigo Dias De Oliveira Carvalho
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Sandeep Tiwar
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Anne Cybelle Pinto Gomide
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Eduardo Almeida Costa
  - Computational Biology and Biotechnological Information Management Center (NBCGIB), State University of Santa Cruz, Ilhéus, Brazil
- Olga Kukharenko
  - Institute of Molecular Biology and Genetics of NASU, Kyiv, Ukraine
- Iryna Orlovska
  - Institute of Molecular Biology and Genetics of NASU, Kyiv, Ukraine
- Olga Podolich
  - Institute of Molecular Biology and Genetics of NASU, Kyiv, Ukraine
- Oleg Reva
  - Department of Biochemistry, Genetics and Microbiology, Centre for Bioinformatics and Computational Biology, University of Pretoria, Pretoria, South Africa
- Pablo Ivan P. Ramos
  - Center for Data and Knowledge Integration for Health (CIDACS), Institute Gonçalo Moniz, Oswaldo Cruz Foundation (FIOCRUZ-Bahia), Salvador, Brazil
- Vasco Ariston De Carvalho Azevedo
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Bertram Brenig
  - Institute of Veterinary Medicine, Burckhardtweg, University of Göttingen, Göttingen, Germany
- Bruno Silva Andrade
  - Laboratory of Bioinformatics and Computational Chemistry, Department of Biological Sciences, State University of Southwest Bahia (UESB), Jequié, Brazil
- Jean-Pierre P. de Vera
  - German Aerospace Center (DLR) Berlin, Institute of Planetary Research, Planetary Laboratories, Astrobiological Laboratories, Berlin, Germany
- Debmalya Barh
  - Laboratory of Cellular and Molecular Genetics, Department of Genetics, Ecology and Evolution, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
  - Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology, Purba Medinipur, India
- Aristóteles Góes-Neto
  - Laboratory of Molecular and Computational Biology of Fungi, Department of Microbiology, Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
|
18
|
Payne LJ, Todeschini TC, Wu Y, Perry BJ, Ronson CW, Fineran PC, Nobrega FL, Jackson SA. Identification and classification of antiviral defence systems in bacteria and archaea with PADLOC reveals new system types. Nucleic Acids Res 2021; 49:10868-10878. [PMID: 34606606 PMCID: PMC8565338 DOI: 10.1093/nar/gkab883] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 08/05/2021] [Revised: 09/13/2021] [Accepted: 09/17/2021] [Indexed: 11/14/2022]
Abstract
To provide protection against viral infection and limit the uptake of mobile genetic elements, bacteria and archaea have evolved many diverse defence systems. The discovery and application of CRISPR-Cas adaptive immune systems has spurred recent interest in the identification and classification of new types of defence systems. Many new defence systems have recently been reported but there is a lack of accessible tools available to identify homologs of these systems in different genomes. Here, we report the Prokaryotic Antiviral Defence LOCator (PADLOC), a flexible and scalable open-source tool for defence system identification. With PADLOC, defence system genes are identified using HMM-based homologue searches, followed by validation of system completeness using gene presence/absence and synteny criteria specified by customisable system classifications. We show that PADLOC identifies defence systems with high accuracy and sensitivity. Our modular approach to organising the HMMs and system classifications allows additional defence systems to be easily integrated into the PADLOC database. To demonstrate application of PADLOC to biological questions, we used PADLOC to identify six new subtypes of known defence systems and a putative novel defence system comprised of a helicase, methylase and ATPase. PADLOC is available as a standalone package (https://github.com/padlocbio/padloc) and as a webserver (https://padloc.otago.ac.nz).
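The validation step described in this abstract (gene presence/absence plus synteny criteria) can be pictured with a small sketch. Everything here is an assumption made for illustration: the `SystemRule` class, its field names, and the `max_gap` criterion are hypothetical and are not PADLOC's actual classification format. The input is assumed to be a list of genes that already have HMM hits, given as (gene index on the contig, family name).

```python
from dataclasses import dataclass, field


@dataclass
class SystemRule:
    """Hypothetical rule format, not PADLOC's own system classification."""
    name: str
    required: set                      # families that must all be present
    forbidden: set = field(default_factory=set)  # families that must be absent
    max_gap: int = 3                   # max intervening genes between hits


def find_systems(gene_hits, rule):
    """Return clusters of neighbouring hits that satisfy the rule.

    gene_hits: iterable of (gene_index, family) for genes with an HMM hit.
    """
    clusters, current = [], []
    for pos, family in sorted(gene_hits):
        # Synteny criterion: start a new cluster when too many genes intervene.
        if current and pos - current[-1][0] - 1 > rule.max_gap:
            clusters.append(current)
            current = []
        current.append((pos, family))
    if current:
        clusters.append(current)

    complete = []
    for cluster in clusters:
        families = {family for _, family in cluster}
        # Presence/absence criterion.
        if rule.required <= families and not (rule.forbidden & families):
            complete.append(cluster)
    return complete
```

For example, a toy rule requiring a helicase, a methylase and an ATPase within two intervening genes would accept hits at gene indices 10, 11 and 13 but reject an isolated helicase/ATPase pair further along the contig.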
Affiliation(s)
- Leighton J Payne
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
- Thomas C Todeschini
  - School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
- Yi Wu
  - School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
- Benjamin J Perry
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
- Clive W Ronson
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
  - Genetics Otago, University of Otago, Dunedin, New Zealand
- Peter C Fineran
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
  - Genetics Otago, University of Otago, Dunedin, New Zealand
  - Bioprotection Aotearoa, University of Otago, Dunedin, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Otago, Dunedin, New Zealand
- Franklin L Nobrega
  - School of Biological Sciences, Faculty of Environmental and Life Sciences, University of Southampton, Southampton, UK
- Simon A Jackson
  - Department of Microbiology and Immunology, University of Otago, Dunedin, New Zealand
  - Genetics Otago, University of Otago, Dunedin, New Zealand
  - Bioprotection Aotearoa, University of Otago, Dunedin, New Zealand
  - Maurice Wilkins Centre for Molecular Biodiscovery, University of Otago, Dunedin, New Zealand
|
19
|
Yang S, Huang J, He B. CASPredict: a web service for identifying Cas proteins. PeerJ 2021; 9:e11887. [PMID: 34395100 PMCID: PMC8327967 DOI: 10.7717/peerj.11887] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/26/2020] [Accepted: 07/09/2021] [Indexed: 12/16/2022]
Abstract
Clustered regularly interspaced short palindromic repeats (CRISPR) and their associated (Cas) proteins constitute the CRISPR-Cas systems, which play a key role in the prokaryotic adaptive immune system against invasive foreign elements. In recent years, CRISPR-Cas systems have also been adapted to facilitate targeted gene editing in eukaryotic genomes. As one of the important components of the CRISPR-Cas system, the Cas protein plays an irreplaceable role. The effector module composed of Cas proteins is used to distinguish the type of CRISPR-Cas system. Effective prediction and identification of Cas proteins can help biologists further infer the type of CRISPR-Cas system. Moreover, the class 2 CRISPR-Cas systems are increasingly applied in the field of genome editing, and the discovery of new Cas proteins will help provide more candidates for genome editing. In this paper, we describe a web service named CASPredict (http://i.uestc.edu.cn/caspredict/cgi-bin/CASPredict.pl) for identifying Cas proteins. CASPredict first predicts Cas proteins with a support vector machine (SVM) using the optimal dipeptide composition and then annotates the function of Cas proteins using the hmmscan search algorithm. The ten-fold cross-validation results showed that 84.84% of Cas proteins were correctly classified. CASPredict will be a useful tool for the identification of Cas proteins, or at least can play a complementary role to the existing methods in this area.
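The dipeptide composition named in this abstract is a standard protein feature: the relative frequency of each of the 400 ordered amino-acid pairs. The sketch below computes the full 400-dimensional vector. It is an illustration under stated assumptions: the function name is hypothetical, pairs containing non-standard residues are skipped, and the feature selection that yields the "optimal" subset as well as the SVM training itself are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(protein: str) -> list:
    """Return the 400-dimensional dipeptide frequency vector of a protein.

    Overlapping pairs are counted; pairs with non-standard residues
    (e.g. X, B, Z) are ignored, which is an assumption of this sketch.
    """
    protein = protein.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(protein) - 1):
        pair = protein[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]
```

Because the vector has a fixed length regardless of protein size, it can be passed directly to a classifier such as an SVM.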
Affiliation(s)
- Shanshan Yang
  - Medical College, Guizhou University, Guiyang, Guizhou Province, China
- Jian Huang
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China
- Bifang He
  - Medical College, Guizhou University, Guiyang, Guizhou Province, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan Province, China
|
20
|
Alkhnbashi OS, Mitrofanov A, Bonidia R, Raden M, Tran VD, Eggenhofer F, Shah SA, Öztürk E, Padilha VA, Sanches DS, de Carvalho A, Backofen R. CRISPRloci: comprehensive and accurate annotation of CRISPR-Cas systems. Nucleic Acids Res 2021; 49:W125-W130. [PMID: 34133710 PMCID: PMC8265192 DOI: 10.1093/nar/gkab456] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/15/2021] [Revised: 04/28/2021] [Accepted: 05/17/2021] [Indexed: 11/17/2022]
Abstract
CRISPR–Cas systems are adaptive immune systems in prokaryotes, providing resistance against invading viruses and plasmids. The identification of CRISPR loci is currently a non-standardized, ambiguous process, requiring the manual combination of multiple tools, where existing tools detect only parts of the CRISPR-systems, and lack quality control, annotation and assessment capabilities of the detected CRISPR loci. Our CRISPRloci server provides the first resource for the prediction and assessment of all possible CRISPR loci. The server integrates a series of advanced Machine Learning tools within a seamless web interface featuring: (i) prediction of all CRISPR arrays in the correct orientation; (ii) definition of CRISPR leaders for each locus; and (iii) annotation of cas genes and their unambiguous classification. As a result, CRISPRloci is able to accurately determine the CRISPR array and associated information, such as: the Cas subtypes; cassette boundaries; accuracy of the repeat structure, orientation and leader sequence; virus-host interactions; self-targeting; as well as the annotation of cas genes, all of which have been missing from existing tools. This annotation is presented in an interactive interface, making it easy for scientists to gain an overview of the CRISPR system in their organism of interest. Predictions are also rendered in GFF format, enabling in-depth genome browser inspection. In summary, CRISPRloci constitutes a full suite for CRISPR–Cas system characterization that offers annotation quality previously available only after manual inspection.
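The abstract notes that predictions are rendered in GFF format for genome browsers. As a minimal sketch of what one such record could look like, the function below writes a CRISPR array as a single GFF3 line with the nine tab-separated columns the format defines. The function name, the `example_predictor` source label, and the `repeat` and `n_spacers` attributes are assumptions for illustration; the fields that CRISPRloci actually emits are not stated here.

```python
def crispr_array_to_gff3(seqid, start, end, strand, repeat, n_spacers,
                         source="example_predictor"):
    """Format one CRISPR array as a GFF3 feature line (hypothetical fields).

    GFF3 columns: seqid, source, type, start, end, score, strand, phase,
    attributes. Coordinates are 1-based and inclusive.
    """
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    if strand not in {"+", "-", ".", "?"}:
        raise ValueError("strand must be one of '+', '-', '.', '?'")
    attributes = ";".join([
        f"ID=crispr_array_{seqid}_{start}",
        f"repeat={repeat}",          # custom attribute (lowercase key)
        f"n_spacers={n_spacers}",    # custom attribute (lowercase key)
    ])
    columns = [seqid, source, "repeat_region", str(start), str(end),
               ".", strand, ".", attributes]
    return "\t".join(columns)
```

The strand column is where a predicted array orientation would be recorded, which is one reason orientation prediction matters for downstream browser inspection.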
Affiliation(s)
- Omer S Alkhnbashi
  - To whom correspondence should be addressed. Tel: +49 761 2037460; Fax: +49 761 2037462
- Martin Raden
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Van Dinh Tran
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Florian Eggenhofer
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Shiraz A Shah
  - Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Denmark
- Ekrem Öztürk
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Georges-Koehler-Allee 106, 79110 Freiburg, Germany
- Victor A Padilha
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, SP, Brazil
- Danilo S Sanches
  - Universidade Tecnológica Federal do Paraná, Campus Cornélio Procópio, 86300000 Cornélio Procópio, PR, Brazil
- Rolf Backofen
  - Correspondence may also be addressed to Rolf Backofen.
|
21
|
Padilha VA, Alkhnbashi OS, Tran VD, Shah SA, Carvalho ACPLF, Backofen R. Casboundary: automated definition of integral Cas cassettes. Bioinformatics 2021; 37:1352-1359. [PMID: 33226067 PMCID: PMC8208735 DOI: 10.1093/bioinformatics/btaa984] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/19/2020] [Revised: 10/28/2020] [Accepted: 11/11/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: CRISPR-Cas are important systems found in most archaeal and many bacterial genomes, providing adaptive immunity against mobile genetic elements in prokaryotes. The CRISPR-Cas systems are encoded by a set of consecutive cas genes, here termed a cassette. The identification of cassette boundaries is key for finding cassettes in the CRISPR research field. This is often carried out by using Hidden Markov Models and manual annotation. In this article, we propose the first method able to automatically define the cassette boundaries. In addition, we present a Cas-type predictive model used by the method to assign each gene located in the region defined by a cassette's boundaries a Cas label from a set of pre-defined Cas types. Furthermore, the proposed method can detect potentially new cas genes and decompose a cassette into its modules. RESULTS: We evaluate the predictive performance of our proposed method on data collected from the two most recent CRISPR classification studies. In our experiments, we obtain an average similarity of 0.86 between the predicted and expected cassettes. In addition, we achieve F-scores above 0.9 for the classification of cas genes of known types and 0.73 for the unknown ones. Finally, we conduct two additional case studies, in which we investigate the occurrence of potentially new cas genes and the occurrence of module exchange between different genomes. AVAILABILITY AND IMPLEMENTATION: https://github.com/BackofenLab/Casboundary. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Victor A Padilha
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, SP 13566-590, Brazil
- Omer S Alkhnbashi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
- Van Dinh Tran
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
- Shiraz A Shah
  - COPSAC, Copenhagen University Hospitals Herlev and Gentofte, DK-2820 Gentofte, Denmark
- André C P L F Carvalho
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos, SP 13566-590, Brazil
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, 79110 Freiburg, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, 79104 Freiburg, Germany
|
22
|
Mitrofanov A, Alkhnbashi OS, Shmakov SA, Makarova K, Koonin E, Backofen R. CRISPRidentify: identification of CRISPR arrays using machine learning approach. Nucleic Acids Res 2021; 49:e20. [PMID: 33290505 PMCID: PMC7913763 DOI: 10.1093/nar/gkaa1158] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/30/2020] [Revised: 11/09/2020] [Accepted: 11/11/2020] [Indexed: 02/02/2023] Open
Abstract
CRISPR–Cas are adaptive immune systems that degrade foreign genetic elements in archaea and bacteria. In carrying out their immune functions, CRISPR–Cas systems heavily rely on RNA components. These CRISPR (cr) RNAs are repeat-spacer units that are produced by processing of pre-crRNA, the transcript of CRISPR arrays, and guide Cas protein(s) to the cognate invading nucleic acids, enabling their destruction. Several bioinformatics tools have been developed to detect CRISPR arrays based solely on DNA sequences, but all these tools employ the same strategy of looking for repetitive patterns, which might correspond to CRISPR array repeats. The identified patterns are evaluated using a fixed, built-in scoring function, and arrays exceeding a cut-off value are reported. Here, we instead introduce a data-driven approach that uses machine learning to detect and differentiate true CRISPR arrays from false ones based on several features. Our CRISPR detection tool, CRISPRidentify, performs three steps: detection, feature extraction and classification based on manually curated sets of positive and negative examples of CRISPR arrays. The identified CRISPR arrays are then reported to the user accompanied by detailed annotation. We demonstrate that our approach identifies not only previously detected CRISPR arrays, but also CRISPR array candidates not detected by other tools. Compared to other methods, our tool has a drastically reduced false positive rate. In contrast to the existing tools, our approach not only provides the user with the basic statistics on the identified CRISPR arrays but also produces a certainty score as a practical measure of the likelihood that a given genomic region is a CRISPR array.
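The repeat-pattern strategy that the abstract attributes to earlier tools can be illustrated with a minimal sketch. This is not CRISPRidentify's algorithm or that of any named tool: the function name `find_repeat_candidates`, its parameters, the thresholds, and the toy sequence are all hypothetical, and the fixed `min_copies` cut-off merely stands in for a built-in scoring function.

```python
from collections import defaultdict


def find_repeat_candidates(seq, k=8, min_copies=3, min_gap=20, max_gap=60):
    """Naive repeat scan (illustrative only, all thresholds are assumptions).

    Reports every k-mer that recurs at least `min_copies` times in a row
    with a start-to-start distance between `min_gap` and `max_gap`.
    """
    # Index the start positions of every k-mer in the sequence.
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    candidates = []
    for kmer, starts in positions.items():
        # Track the longest run of occurrences with array-like spacing.
        run = [starts[0]]
        best = run[:]
        for s in starts[1:]:
            if min_gap <= s - run[-1] <= max_gap:
                run.append(s)
            else:
                run = [s]
            if len(run) > len(best):
                best = run[:]
        # Fixed cut-off: keep only k-mers with enough regularly spaced copies.
        if len(best) >= min_copies:
            candidates.append((kmer, best))
    return candidates


# Toy input: three copies of a made-up 8-nt repeat separated by 30-nt spacers.
repeat = "GTTTCCGA"
spacers = ["ACGTTGCAAGCTTAGGCTAACCGGTTAGCA", "TTGACCATGGCATCGATCGGATTACGCTAG"]
seq = "CCATG" + repeat + spacers[0] + repeat + spacers[1] + repeat + "GGTAC"

hits = dict(find_repeat_candidates(seq))
print(hits.get(repeat))  # start positions of the repeat: [5, 43, 81]
```

A scan of this kind accepts any regularly spaced repeat, which is why a hard-coded cut-off can yield false positives; the abstract's data-driven classifier is presented as the alternative to that design.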
Affiliation(s)
- Sergey A Shmakov
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Kira S Makarova
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Eugene V Koonin
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
- Rolf Backofen
  - To whom correspondence should be addressed. Tel: +49 761/203 7461; Fax: +49 761/203 7462;
|
23
|
Tan X, Letendre JH, Collins JJ, Wong WW. Synthetic biology in the clinic: engineering vaccines, diagnostics, and therapeutics. Cell 2021; 184:881-898. [PMID: 33571426 PMCID: PMC7897318 DOI: 10.1016/j.cell.2021.01.017] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 11/17/2020] [Revised: 01/12/2021] [Accepted: 01/13/2021] [Indexed: 12/17/2022]
Abstract
Synthetic biology is a design-driven discipline centered on engineering novel biological functions through the discovery, characterization, and repurposing of molecular parts. Several synthetic biological solutions to critical biomedical problems are on the verge of widespread adoption and demonstrate the burgeoning maturation of the field. Here, we highlight applications of synthetic biology in vaccine development, molecular diagnostics, and cell-based therapeutics, emphasizing technologies approved for clinical use or in active clinical trials. We conclude by drawing attention to recent innovations in synthetic biology that are likely to have a significant impact on future applications in biomedicine.
Affiliation(s)
- Xiao Tan
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Division of Gastroenterology, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA; Harvard Medical School, 25 Shattuck St., Boston, MA 02115, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA
- Justin H Letendre
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA; Biological Design Center, Boston University, Boston, MA 02215, USA
- James J Collins
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA; Institute for Medical Engineering and Science, MIT, Cambridge, MA 02139, USA; Department of Biological Engineering, MIT, Cambridge, MA 02139, USA; Synthetic Biology Center, MIT, 77 Massachusetts Ave., Cambridge, MA 02139, USA; Harvard-MIT Program in Health Sciences and Technology, Cambridge, MA 02139, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Wilson W Wong
  - Department of Biomedical Engineering, Boston University, Boston, MA 02215, USA; Biological Design Center, Boston University, Boston, MA 02215, USA
|
24
|
Padilha VA, Alkhnbashi OS, Shah SA, de Carvalho ACPLF, Backofen R. CRISPRcasIdentifier: Machine learning for accurate identification and classification of CRISPR-Cas systems. Gigascience 2020; 9:giaa062. [PMID: 32556168 PMCID: PMC7298778 DOI: 10.1093/gigascience/giaa062] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 03/17/2020] [Revised: 04/27/2020] [Accepted: 05/15/2020] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND: CRISPR-Cas genes are extraordinarily diverse and evolve rapidly when compared to other prokaryotic genes. With the rapid increase in newly sequenced archaeal and bacterial genomes, manual identification of CRISPR-Cas systems is no longer viable. Thus, an automated approach is required for advancing our understanding of the evolution and diversity of these systems and for finding new candidates for genome engineering in eukaryotic models. RESULTS: We introduce CRISPRcasIdentifier, a new machine learning-based tool that combines regression and classification models for the prediction of potentially missing proteins in instances of CRISPR-Cas systems and the prediction of their respective subtypes. In contrast to other available tools, CRISPRcasIdentifier can both detect cas genes and extract potential association rules that reveal functional modules for CRISPR-Cas systems. In our experimental benchmark on the most recently published and comprehensive CRISPR-Cas system dataset, CRISPRcasIdentifier was compared with recent and state-of-the-art tools. According to the experimental results, CRISPRcasIdentifier presented the best Cas protein identification and subtype classification performance. CONCLUSIONS: Overall, our tool greatly extends the classification of CRISPR cassettes and, for the first time, predicts missing Cas proteins and association rules between Cas proteins. Additionally, we investigated the properties of CRISPR subtypes. The proposed tool relies not only on the knowledge of manual CRISPR annotation but also on models trained using machine learning.
Affiliation(s)
- Victor A Padilha
  - Institute of Mathematics and Computer Sciences, University of São Paulo, Av. Trabalhador São Carlense 400, São Carlos, SP, 13566-590, Brazil
- Omer S Alkhnbashi
  - Bioinformatics Group, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
- Shiraz A Shah
  - COPSAC, Copenhagen University Hospitals Herlev and Gentofte, Ledreborg Alle 34, DK-2820 Gentofte, Denmark
- André C P L F de Carvalho
  - Institute of Mathematics and Computer Sciences, University of São Paulo, Av. Trabalhador São Carlense 400, São Carlos, SP, 13566-590, Brazil
- Rolf Backofen
  - Bioinformatics Group, University of Freiburg, Georges-Köhler-Allee 106, 79110 Freiburg, Germany
  - Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Schaenzlestr. 18, 79104 Freiburg, Germany
|