1
Hu J, Crickard JB. All who wander are not lost: the search for homology during homologous recombination. Biochem Soc Trans 2024; 52:367-377. [PMID: 38323621 PMCID: PMC10903458 DOI: 10.1042/bst20230705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 01/10/2024] [Accepted: 01/12/2024] [Indexed: 02/08/2024]
Abstract
Homologous recombination (HR) is a template-based DNA double-strand break repair pathway that functions to maintain genomic integrity. A vital component of the HR reaction is the identification of template DNA to be used during repair. This occurs through a mechanism known as the homology search. The homology search occurs in two steps: a collision step in which two pieces of DNA are forced to collide and a selection step that results in homologous pairing between matching DNA sequences. Selection of a homologous template is facilitated by recombinases of the RecA/Rad51 family of proteins in cooperation with helicases, translocases, and topoisomerases that determine the overall fidelity of the match. This menagerie of molecular machines acts to regulate critical intermediates during the homology search. These intermediates include recombinase filaments that probe for short stretches of homology and early strand invasion intermediates in the form of displacement loops (D-loops) that stabilize paired DNA. Here, we will discuss recent advances in understanding how these specific intermediates are regulated on the molecular level during the HR reaction. We will also discuss how the stability of these intermediates influences the ultimate outcomes of the HR reaction. Finally, we will discuss recent physiological models developed to explain how the homology search protects the genome.
Affiliation(s)
- Jingyi Hu
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, U.S.A
- J Brooks Crickard
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14853, U.S.A
2
Schütze K, Heinzinger M, Steinegger M, Rost B. Nearest neighbor search on embeddings rapidly identifies distant protein relations. Front Bioinform 2022; 2:1033775. [PMID: 36466147 PMCID: PMC9714024 DOI: 10.3389/fbinf.2022.1033775] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/31/2022] [Accepted: 10/31/2022] [Indexed: 11/29/2023] Open
Abstract
Since 1992, all state-of-the-art methods for fast and sensitive identification of evolutionary, structural, and functional relations between proteins (also referred to as "homology detection") use sequences and sequence-profiles (PSSMs). Protein Language Models (pLMs) generalize sequences, possibly capturing the same constraints as PSSMs, e.g., through embeddings. Here, we explored how to use such embeddings for nearest neighbor searches to identify relations between protein pairs with diverged sequences (remote homology detection for levels of <20% pairwise sequence identity, PIDE). While this approach excelled for proteins with single domains, we demonstrated the current challenges in applying it to multi-domain proteins and presented some ideas on how to overcome existing limitations, in principle. We observed that sufficiently challenging data set separations were crucial to provide deeply relevant insights into the behavior of nearest neighbor search when applied to the protein embedding space, and made all our methods readily available for others.
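The nearest neighbor idea described in this abstract can be illustrated with a minimal sketch. The abstract does not state which distance measure, index structure, or embedding model the authors used, so the cosine similarity, the brute-force scan, and the three-dimensional toy vectors below are all assumptions made purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, database, k=1):
    """Return the k database entries most similar to the query embedding.

    `database` maps a protein name to its embedding vector. This is a
    brute-force scan; a real search over millions of proteins would need
    an approximate index, which is outside the scope of this sketch.
    """
    scored = [(cosine_similarity(query, emb), name) for name, emb in database.items()]
    # Highest similarity first; ties broken by name for a deterministic order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:k]]


# Hypothetical toy embeddings (real pLM embeddings have hundreds of dimensions).
toy_db = {
    "protA": [1.0, 0.0, 0.0],
    "protB": [0.9, 0.1, 0.0],
    "protC": [0.0, 1.0, 0.0],
}
hits = nearest_neighbors([1.0, 0.05, 0.0], toy_db, k=2)
```

In this toy setting the two vectors pointing in nearly the same direction as the query are returned first, regardless of any sequence-level comparison.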
Affiliation(s)
- Konstantin Schütze
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology—i12, Munich, Germany
- Michael Heinzinger
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology—i12, Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
- Burkhard Rost
- TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology—i12, Munich, Germany
- Institute for Advanced Study (TUM-IAS), Germany & TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany
3
Abrahim M, Machado E, Alvarez-Valín F, de Miranda AB, Catanho M. Uncovering Pseudogenes and Intergenic Protein-coding Sequences in TriTryps' Genomes. Genome Biol Evol 2022; 14:6754225. [PMID: 36208292 PMCID: PMC9576210 DOI: 10.1093/gbe/evac142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2022] [Revised: 09/14/2022] [Accepted: 09/20/2022] [Indexed: 01/24/2023] Open
Abstract
Trypanosomatids belong to a remarkable group of unicellular, parasitic organisms of the order Kinetoplastida, an early diverging branch of the phylogenetic tree of eukaryotes, exhibiting intriguing biological characteristics affecting gene expression (intronless polycistronic transcription, trans-splicing, and RNA editing), metabolism, surface molecules, and organelles (compartmentalization of glycolysis, variation of the surface molecules, and unique mitochondrial DNA), cell biology and life cycle (phagocytic vacuoles evasion and intricate patterns of cell morphogenesis). With numerous genomic-scale data of several trypanosomatids becoming available since 2005 (genomes, transcriptomes, and proteomes), the scientific community can further investigate the mechanisms underlying these unusual features and address other unexplored phenomena possibly revealing biological aspects of the early evolution of eukaryotes. One fundamental aspect comprises the processes and mechanisms involved in the acquisition and loss of genes throughout the evolutionary history of these primitive microorganisms. Here, we present a comprehensive in silico analysis of pseudogenes in three major representatives of this group: Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi. Pseudogenes, DNA segments originating from altered genes that lost their original function, are genomic relics that can offer an essential record of the evolutionary history of functional genes, as well as clues about the dynamics and evolution of hosting genomes. 
Scanning these genomes with functional proteins as proxies to reveal intergenic regions with protein-coding features, relying on a customized threshold to distinguish statistically and biologically significant sequence similarities, and reassembling remnant sequences from their debris, we found thousands of pseudogenes and hundreds of open reading frames, with particular characteristics in each trypanosomatid: mutation profile, number, content, density, codon bias, average size, single- or multi-copy gene origin, number and type of mutations, putative primitive function, and transcriptional activity. These features suggest a common process of pseudogene formation, different patterns of pseudogene evolution and extant biological functions, and/or distinct genome organization undertaken by those parasites during evolution, as well as different evolutionary and/or selective pressures acting on distinct lineages.
Affiliation(s)
- Mayla Abrahim
- Laboratório de Tecnologia Imunológica, Instituto de Tecnologia em Imunobiológicos, Vice-Diretoria de Desenvolvimento Tecnológico, Bio-Manguinhos, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
- Edson Machado
- Laboratório de Biologia Molecular Aplicada a Micobactérias, Instituto Oswaldo Cruz, Fiocruz, Brazil
- Fernando Alvarez-Valín
- Unidad de Genómica Evolutiva, Sección Biomatemática, Universidad de la República del Uruguay, Montevideo, Uruguay
4
Takabatake K, Izawa K, Akikawa M, Yanagisawa K, Ohue M, Akiyama Y. Improved Large-Scale Homology Search by Two-Step Seed Search Using Multiple Reduced Amino Acid Alphabets. Genes (Basel) 2021; 12:1455. [PMID: 34573438 DOI: 10.3390/genes12091455] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/19/2021] [Revised: 09/17/2021] [Accepted: 09/18/2021] [Indexed: 12/02/2022] Open
Abstract
Metagenomic analysis, a technique used to comprehensively analyze microorganisms present in the environment, requires performing high-precision homology searches on large amounts of sequencing data, the size of which has increased dramatically with the development of next-generation sequencing. NCBI BLAST is the most widely used software for performing homology searches, but its speed is insufficient for the throughput of current DNA sequencers. In this paper, we propose a new, high-performance homology search algorithm that employs a two-step seed search strategy using multiple reduced amino acid alphabets to identify highly similar subsequences. Additionally, we evaluated the validity of the proposed method against several existing tools. Our method was faster than any other existing program for ≤120,000 queries, while DIAMOND, an existing tool, was the fastest method for >120,000 queries.
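To make the notion of a seed search over a reduced amino acid alphabet concrete, here is a minimal sketch. The abstract does not give the alphabets, seed lengths, or the details of the two-step strategy, so the residue grouping `GROUPS`, the seed length `k`, and the single-step lookup below are hypothetical and do not reproduce the authors' algorithm.

```python
# Hypothetical grouping of the 20 amino acids into 10 classes of similar residues.
GROUPS = ["AST", "C", "DE", "FWY", "G", "H", "ILMV", "KR", "NQ", "P"]
# Map every residue to the first letter of its group.
REDUCE = {aa: group[0] for group in GROUPS for aa in group}


def reduce_seq(seq):
    """Rewrite a protein sequence in the reduced alphabet ('X' for unknown residues)."""
    return "".join(REDUCE.get(aa, "X") for aa in seq)


def seed_hits(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs whose k-mers match in the reduced alphabet."""
    rq, rs = reduce_seq(query), reduce_seq(subject)
    # Index every k-mer of the subject by its start position.
    index = {}
    for j in range(len(rs) - k + 1):
        index.setdefault(rs[j:j + k], []).append(j)
    # Look up every k-mer of the query in that index.
    hits = []
    for i in range(len(rq) - k + 1):
        for j in index.get(rq[i:i + k], []):
            hits.append((i, j))
    return hits


# The two toy peptides share no exact 4-mer, but match after reduction.
example_hits = seed_hits("MKVLD", "LRIME", k=4)
```

Merging similar residues lets seeds survive conservative substitutions, at the cost of more chance matches that a later, stricter step would have to filter out.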
5
Abstract
Telomerase RNA (TR) is a noncoding RNA essential for the function of telomerase ribonucleoprotein. TRs from vertebrates, fungi, ciliates, and plants exhibit extreme diversity in size, sequence, secondary structure, and biogenesis pathway. However, the evolutionary pathways leading to such unusual diversity among eukaryotic kingdoms remain elusive. Within the metazoan kingdom, the study of TR has been limited to vertebrates and echinoderms. To understand the origin and evolution of TR across the animal kingdom, we employed a phylogeny-guided, structure-based bioinformatics approach to identify 82 novel TRs from eight previously unexplored metazoan phyla, including the basal-branching sponges. Synthetic TRs from two representative species, a hemichordate and a mollusk, reconstitute active telomerase in vitro with their corresponding telomerase reverse transcriptase components, confirming that they are authentic TRs. Comparative analysis shows that three functional domains, template-pseudoknot (T-PK), CR4/5, and box H/ACA, are conserved between vertebrate and the basal metazoan lineages, indicating a monophyletic origin of the animal TRs with a snoRNA-related biogenesis mechanism. Nonetheless, TRs along separate animal lineages evolved with divergent structural elements in the T-PK and CR4/5 domains. For example, TRs from echinoderms and protostomes lack the canonical CR4/5 and have independently evolved functionally equivalent domains with different secondary structures. In the T-PK domain, a P1.1 stem common in most metazoan clades defines the template boundary, which is replaced by a P1-defined boundary in vertebrates. This study provides unprecedented insight into the divergent evolution of detailed TR secondary structures across broad metazoan lineages, revealing ancestral and later-diversified elements.
Affiliation(s)
- Yang Li
- School of Molecular Sciences, Arizona State University, Tempe, AZ
- Julian J-L Chen
- School of Molecular Sciences, Arizona State University, Tempe, AZ
6
Liu B, Thippabhotla S, Zhang J, Zhong C. DRAGoM: Classification and Quantification of Noncoding RNA in Metagenomic Data. Front Genet 2021; 12:669495. [PMID: 34025724 PMCID: PMC8131839 DOI: 10.3389/fgene.2021.669495] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/18/2021] [Accepted: 03/23/2021] [Indexed: 12/21/2022] Open
Abstract
Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, such as regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central tasks toward the understanding of the function of the microbial community. However, the majority of the current metagenomic sequencing technologies generate short reads, which may contain only a partial secondary structure that complicates ncRNA homology detection. Meanwhile, de novo assembly of the metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. Then, it classifies paths in the hybrid graph and their constituent reads into different ncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoM can improve the performance and robustness over traditional approaches on the classification and quantification of a wide class of ncRNA families.
Affiliation(s)
- Ben Liu
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Sirisha Thippabhotla
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Jun Zhang
- Division of Medical Oncology, Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS, United States
- Department of Cancer Biology, University of Kansas Medical Center, Kansas City, KS, United States
- Cuncong Zhong
- Department of Electrical Engineering and Computer Science, The University of Kansas, Lawrence, KS, United States
- Bioengineering Program, The University of Kansas, Lawrence, KS, United States
- Center for Computational Biology, The University of Kansas, Lawrence, KS, United States
7
Velandia-Huerto CA, Fallmann J, Stadler PF. miRNAture-Computational Detection of microRNA Candidates. Genes (Basel) 2021; 12:348. [PMID: 33673400 PMCID: PMC7996739 DOI: 10.3390/genes12030348] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2021] [Revised: 02/19/2021] [Accepted: 02/20/2021] [Indexed: 12/16/2022] Open
Abstract
Homology-based annotation of short RNAs, including microRNAs, is a difficult problem because their inherently small size limits the available information. Highly sensitive methods, including parameter-optimized blast, nhmmer, or cmsearch runs designed to increase sensitivity, inevitably lead to large numbers of false positives, which can be detected only by detailed analysis of specific features typical of an RNA family and/or the analysis of conservation patterns in structure-annotated multiple sequence alignments. The miRNAture pipeline implements a workflow specific to animal microRNAs that automates homology search and validation steps. The miRNAture pipeline yields very good results for a large number of "typical" miRBase families. However, it also highlights difficulties with atypical cases, in particular microRNAs deriving from repetitive elements and microRNAs with unusual, branched precursor structures and atypical locations of the mature product, which require specific curation by domain experts.
Affiliation(s)
- Cristian A. Velandia-Huerto
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Peter F. Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Leipzig University, D-04107 Leipzig, Germany
- Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany
- Institute for Theoretical Chemistry, University of Vienna, A-1090 Wien, Austria
- Facultad de Ciencias, Universidad Nacional de Colombia, CO-111321 Bogotá, Colombia
- Santa Fe Institute, Santa Fe, NM 87501, USA
8
Crickard JB, Moevus CJ, Kwon Y, Sung P, Greene EC. Rad54 Drives ATP Hydrolysis-Dependent DNA Sequence Alignment during Homologous Recombination. Cell 2020; 181:1380-1394.e18. [PMID: 32502392 DOI: 10.1016/j.cell.2020.04.056] [Citation(s) in RCA: 59] [Impact Index Per Article: 14.8] [Received: 12/11/2019] [Revised: 03/07/2020] [Accepted: 04/29/2020] [Indexed: 12/30/2022]
Abstract
Homologous recombination (HR) helps maintain genome integrity, and HR defects give rise to disease, especially cancer. During HR, damaged DNA must be aligned with an undamaged template through a process referred to as the homology search. Despite decades of study, key aspects of this search remain undefined. Here, we use single-molecule imaging to demonstrate that Rad54, a conserved Snf2-like protein found in all eukaryotes, switches the search from the diffusion-based pathways characteristic of the basal HR machinery to an active process in which DNA sequences are aligned via an ATP-dependent molecular motor-driven mechanism. We further demonstrate that Rad54 disrupts the donor template strands, enabling the search to take place within a migrating DNA bubble-like structure that is bound by replication protein A (RPA). Our results reveal that Rad54, working together with RPA, fundamentally alters how DNA sequences are aligned during HR.
Affiliation(s)
- J Brooks Crickard
- Department of Biochemistry & Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Corentin J Moevus
- Department of Biochemistry & Molecular Biophysics, Columbia University, New York, NY 10032, USA
- Youngho Kwon
- Department of Biochemistry and Structural Biology, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Patrick Sung
- Department of Biochemistry and Structural Biology, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Eric C Greene
- Department of Biochemistry & Molecular Biophysics, Columbia University, New York, NY 10032, USA.
9
Abstract
BACKGROUND: Gene homology type classification is required for many types of genome analyses, including comparative genomics, phylogenetics, and protein function annotation. Consequently, a large variety of tools have been developed to perform homology classification across genomes of different species. However, when applied to large genomic data sets, these tools require high memory and CPU usage, typically available only in computational clusters. FINDINGS: Here we present a new graph-based orthology analysis tool, SwiftOrtho, which is optimized for speed and memory usage when applied to large-scale data. SwiftOrtho uses long k-mers to speed up homology search, while using a reduced amino acid alphabet and spaced seeds to compensate for the loss of sensitivity due to long k-mers. In addition, it uses an affinity propagation algorithm to reduce the memory usage when clustering large-scale orthology relationships into orthologous groups. In our tests, SwiftOrtho was the only tool that completed orthology analysis of proteins from 1,760 bacterial genomes on a computer with only 4 GB RAM. Using various standard orthology data sets, we also show that SwiftOrtho has a high accuracy. CONCLUSIONS: SwiftOrtho enables the accurate comparative genomic analyses of thousands of genomes using low-memory computers. SwiftOrtho is available at https://github.com/Rinoahu/SwiftOrtho.
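The spaced-seed idea mentioned in this abstract can be sketched briefly. A spaced seed compares only selected positions of a window, so a single mismatch at an ignored position does not destroy the seed. The pattern `"1101"`, the function names, and the toy sequences below are assumptions for illustration and are not taken from SwiftOrtho itself.

```python
def spaced_kmers(seq, pattern="1101"):
    """Yield (position, key) pairs, where key keeps only the residues under '1' in the pattern."""
    span = len(pattern)
    keep = [i for i, flag in enumerate(pattern) if flag == "1"]
    return [
        (pos, "".join(seq[pos + i] for i in keep))
        for pos in range(len(seq) - span + 1)
    ]


def shared_seeds(a, b, pattern="1101"):
    """Return (pos_in_a, pos_in_b) pairs whose spaced k-mers are identical."""
    # Index the spaced k-mers of b by key.
    index = {}
    for pos, key in spaced_kmers(b, pattern):
        index.setdefault(key, []).append(pos)
    # Look up every spaced k-mer of a.
    return [(i, j) for i, key in spaced_kmers(a, pattern) for j in index.get(key, [])]


# The toy sequences differ only at the third residue (D vs W).
spaced = shared_seeds("ACDEFG", "ACWEFG", pattern="1101")
contiguous = shared_seeds("ACDEFG", "ACWEFG", pattern="1111")
```

With the contiguous pattern every 4-residue window overlaps the substitution and no seed is found, whereas the spaced pattern still recovers the window starting at position 0.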
Affiliation(s)
- Xiao Hu
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
- Iddo Friedberg
- Department of Veterinary Microbiology and Preventive Medicine, 2118 Veterinary Medicine, College of Veterinary Medicine, Iowa State University, Ames, IA, 50011, USA
10
Smith MJ, Bryant EE, Rothstein R. Increased chromosomal mobility after DNA damage is controlled by interactions between the recombination machinery and the checkpoint. Genes Dev 2018; 32:1242-1251. [PMID: 30181361 PMCID: PMC6120718 DOI: 10.1101/gad.317966.118] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 06/19/2018] [Accepted: 07/02/2018] [Indexed: 12/20/2022]
Abstract
In this study, Smith et al. investigated how cells modulate chromosome mobility in response to DNA damage. They show that global chromosome mobility is regulated by the Rad51 recombinase and its mediator, Rad52, and their findings indicate that interplay between recombination factors and the checkpoint restricts increased mobility until recombination proteins are assembled at damaged sites. During homologous recombination, cells must coordinate repair, DNA damage checkpoint signaling, and movement of chromosomal loci to facilitate homology search. In Saccharomyces cerevisiae, increased movement of damaged loci (local mobility) and undamaged loci (global mobility) precedes homolog pairing in mitotic cells. How cells modulate chromosome mobility in response to DNA damage remains unclear. Here, we demonstrate that global chromosome mobility is regulated by the Rad51 recombinase and its mediator, Rad52. Surprisingly, rad51Δ rad52Δ cells display checkpoint-dependent constitutively increased mobility, indicating that a regulatory circuit exists between recombination and checkpoint machineries to govern chromosomal mobility. We found that the requirement for Rad51 in this circuit is distinct from its role in recombination and that interaction with Rad52 is necessary to alleviate inhibition imposed by mediator recruitment to ssDNA. Thus, interplay between recombination factors and the checkpoint restricts increased mobility until recombination proteins are assembled at damaged sites.
Affiliation(s)
- Michael J Smith
- Department of Genetics and Development, Columbia University Medical Center, New York, New York 10032, USA
- Eric E Bryant
- Department of Biological Sciences, Columbia University, New York, New York 10027, USA
- Rodney Rothstein
- Department of Genetics and Development, Columbia University Medical Center, New York, New York 10032, USA
11
Waldl M, Thiel BC, Ochsenreiter R, Holzenleiter A, de Araujo Oliveira JV, Walter MEMT, Wolfinger MT, Stadler PF. TERribly Difficult: Searching for Telomerase RNAs in Saccharomycetes. Genes (Basel) 2018; 9:372. [PMID: 30049970 PMCID: PMC6115765 DOI: 10.3390/genes9080372] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 05/15/2018] [Revised: 07/17/2018] [Accepted: 07/18/2018] [Indexed: 11/20/2022] Open
Abstract
The telomerase RNA in yeasts is large, usually >1000 nt, and contains functional elements that have been extensively studied experimentally in several disparate species. Nevertheless, they are very difficult to detect by homology-based methods and so far have escaped annotation in the majority of the genomes of Saccharomycotina. This is a consequence of sequences that evolve rapidly at the nucleotide level, are subject to large variations in size, and are highly plastic with respect to their secondary structures. Here, we report on a survey that was aimed at closing this gap in RNA annotation. Despite considerable efforts and the combination of a variety of different methods, it was only partially successful. While 27 new telomerase RNAs were identified, we had to restrict our efforts to the subgroup Saccharomycetaceae because even this narrow subgroup was diverse enough to require different search models for different phylogenetic subgroups. More distant branches of the Saccharomycotina remain without annotated telomerase RNA.
Affiliation(s)
- Maria Waldl
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Bernhard C Thiel
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Roman Ochsenreiter
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Alexander Holzenleiter
- BioInformatics Group, Fakultät CB Hochschule Mittweida, Technikumplatz 17, D-09648 Mittweida, Germany.
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany.
- João Victor de Araujo Oliveira
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Campus Universitário - Asa Norte, Brasília, DF CEP: 70910-900, Brazil.
- Maria Emília M T Walter
- Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Campus Universitário - Asa Norte, Brasília, DF CEP: 70910-900, Brazil.
- Michael T Wolfinger
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- Center for Anatomy and Cell Biology, Medical University of Vienna, Währingerstraße 13, 1090 Vienna, Austria.
- Peter F Stadler
- Institute for Theoretical Chemistry, University of Vienna, Währingerstraße 17, A-1090 Wien, Austria.
- German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, Universität Leipzig, D-04107 Leipzig, Germany.
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany.
- Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA.
12
Lott SC, Schäfer RA, Mann M, Backofen R, Hess WR, Voß B, Georg J. GLASSgo - Automated and Reliable Detection of sRNA Homologs From a Single Input Sequence. Front Genet 2018; 9:124. [PMID: 29719549 PMCID: PMC5913331 DOI: 10.3389/fgene.2018.00124] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 01/16/2018] [Accepted: 03/26/2018] [Indexed: 11/24/2022] Open
Abstract
Bacterial small RNAs (sRNAs) are important post-transcriptional regulators of gene expression. The functional and evolutionary characterization of sRNAs requires the identification of homologs, which is frequently challenging due to their heterogeneity, short length, and, in part, low sequence conservation. We developed the GLobal Automatic Small RNA Search go (GLASSgo) algorithm to identify sRNA homologs in complex genomic databases starting from a single sequence. GLASSgo combines an iterative BLAST strategy with pairwise identity filtering and a graph-based clustering method that utilizes RNA secondary structure information. We tested the specificity, sensitivity and runtime of GLASSgo, BLAST and the combination RNAlien/cmsearch in a typical use case scenario on 40 bacterial sRNA families. The sensitivity of the tested methods was similar, while the specificity of GLASSgo and RNAlien/cmsearch was significantly higher than that of BLAST. GLASSgo was on average ∼87 times faster than RNAlien/cmsearch, and only ∼7.5 times slower than BLAST, which shows that GLASSgo optimizes the trade-off between speed and accuracy in the task of finding sRNA homologs. GLASSgo is fully automated, whereas BLAST often recovers only parts of homologs and RNAlien/cmsearch requires extensive additional bioinformatic work to get a comprehensive set of homologs. GLASSgo is available as an easy-to-use web server to find homologous sRNAs in large databases.
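The pairwise identity filtering step named in this abstract can be illustrated with a small sketch. The abstract gives neither the identity definition nor the cutoff, so the gap handling, the 60% threshold, and the function names below are hypothetical and only show the general shape of such a filter applied to already aligned sequence pairs.

```python
def pairwise_identity(a, b):
    """Percent identity of two aligned sequences of equal length ('-' marks a gap).

    Columns where both sequences have a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_hits(aligned_hits, min_identity=60.0):
    """Keep the names of hits whose identity to the query reaches the threshold.

    `aligned_hits` maps a hit name to a (query_alignment, hit_alignment) pair.
    """
    return [
        name
        for name, (q_aln, h_aln) in aligned_hits.items()
        if pairwise_identity(q_aln, h_aln) >= min_identity
    ]


# Hypothetical aligned candidates for one query sRNA.
candidates = {
    "hit1": ("ACGU-ACGU", "ACGUAACGA"),  # 7 of 9 columns identical
    "hit2": ("ACGU-ACGU", "UUUUUUUUU"),  # 2 of 9 columns identical
}
kept = filter_hits(candidates, min_identity=60.0)
```

A filter of this kind discards low-identity candidates before any more expensive structure-based comparison is attempted.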
Affiliation(s)
- Steffen C Lott
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Richard A Schäfer
- Institute of Biochemical Engineering, University of Stuttgart, Stuttgart, Germany
- Martin Mann
- Bioinformatics Group, Faculty of Computer Science, University of Freiburg, Freiburg, Germany
- Forest Growth and Dendroecology, Institute of Forest Sciences, University of Freiburg, Freiburg, Germany
- Rolf Backofen
- Bioinformatics Group, Faculty of Computer Science, University of Freiburg, Freiburg, Germany
- ZBSA Center for Biological Systems Analysis, University of Freiburg, Freiburg, Germany
- BIOSS Centre for Biological Signalling Studies, Cluster of Excellence, University of Freiburg, Freiburg, Germany
- Center for Non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark
- Wolfgang R Hess
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg, Germany
- Freiburg Institute for Advanced Studies, University of Freiburg, Freiburg, Germany
- Björn Voß
- Institute of Biochemical Engineering, University of Stuttgart, Stuttgart, Germany
- Jens Georg
- Genetics and Experimental Bioinformatics, Faculty of Biology, University of Freiburg, Freiburg, Germany
13
Abstract
The 7SK RNA is a small nuclear RNA that is involved in the regulation of Pol-II transcription. It is very well conserved in vertebrates, but shows extensive variations in both sequence and structure across invertebrates. A systematic homology search extended the collection of 7SK genes in both Arthropods and Lophotrochozoa, making use of the large number of recently published invertebrate genomes. The extended data set made it possible to infer complete consensus structures for invertebrate 7SK RNAs. These show that not only the well-conserved 5'- and 3'-domains but also the interior Stem A domain is universally conserved. In contrast, the Stem B region exhibits substantial structural variation and does not adhere to a common structural model beyond the phylum level.
Collapse
Affiliation(s)
- Ali M Yazbeck
- a Bioinformatics Group, Department of Computer Science , Leipzig University , Härtelstraße 16-18, Leipzig , Germany.,b Lebanese University, Doctoral School for Science and Technology, Rafic Hariri University Campus , Hadath , Lebanon
| | - Kifah R Tout
- b Lebanese University, Doctoral School for Science and Technology, Rafic Hariri University Campus , Hadath , Lebanon
| | - Peter F Stadler
- a Bioinformatics Group, Department of Computer Science , Leipzig University , Härtelstraße 16-18, Leipzig , Germany.,c Interdisciplinary Center for Bioinformatics, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases , Leipzig University.,d Department of Diagnostics , Fraunhofer Institute for Cell Therapy and Immunology - IZI , Perlickstraße 1, D-04103 Leipzig , Germany.,e Max Planck Institute for Mathematics in the Sciences , Inselstraße 22, D-04103 Leipzig , Germany.,f Department of Theoretical Chemistry , University of Vienna , Währingerstraße 17, A-1090 Wien , Austria.,g Center for non-coding RNA in Technology and Health , University of Copenhagen , Grønnegårdsvej 3, DK-1870 Frederiksberg C , Denmark.,h Santa Fe Institute , 1399 Hyde Park Rd., Santa Fe , NM 87501 , USA
|
14
|
Samarajeewa DA, Manitchotpisit P, Henderson M, Xiao H, Rehard DG, Edwards KA, Shiu PKT, Hammond TM. An RNA Recognition Motif-Containing Protein Functions in Meiotic Silencing by Unpaired DNA. G3 (Bethesda) 2017; 7:2871-82. [PMID: 28667016 DOI: 10.1534/g3.117.041848] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 01/03/2023]
Abstract
Meiotic silencing by unpaired DNA (MSUD) is a biological process that searches pairs of homologous chromosomes (homologs) for segments of DNA that are unpaired. Genes found within unpaired segments are silenced for the duration of meiosis. In this report, we describe the identification and characterization of Neurospora crassa sad-7, a gene that encodes a protein with an RNA recognition motif (RRM). Orthologs of sad-7 are found in a wide range of ascomycete fungi. In N. crassa, sad-7 is required for a fully efficient MSUD response to unpaired genes. Additionally, at least one parent must have a functional sad-7 allele for a cross to produce ascospores. Although sad-7-null crosses are barren, sad-7Δ strains grow at a wild-type (wt) rate and appear normal under vegetative growth conditions. With respect to expression, sad-7 is transcribed at baseline levels in early vegetative cultures, at slightly higher levels in mating-competent cultures, and is at its highest level during mating. These findings suggest that SAD-7 is specific to mating-competent and sexual cultures. Although the role of SAD-7 in MSUD remains elusive, green fluorescent protein (GFP)-based tagging studies place SAD-7 within nuclei, perinuclear regions, and cytoplasmic foci of meiotic cells. This localization pattern is unique among known MSUD proteins and raises the possibility that SAD-7 coordinates nuclear, perinuclear, and cytoplasmic aspects of MSUD.
|
15
|
Piazza A, Wright WD, Heyer WD. Multi-invasions Are Recombination Byproducts that Induce Chromosomal Rearrangements. Cell 2017; 170:760-773.e15. [PMID: 28781165 DOI: 10.1016/j.cell.2017.06.052] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Received: 01/10/2017] [Revised: 05/02/2017] [Accepted: 06/30/2017] [Indexed: 11/18/2022]
Abstract
Inaccurate repair of broken chromosomes generates structural variants that can fuel evolution and inflict pathology. We describe a novel rearrangement mechanism in which translocation between intact chromosomes is induced by a lesion on a third chromosome. This multi-invasion-induced rearrangement (MIR) stems from a homologous recombination byproduct, where a broken DNA end simultaneously invades two intact donors. No homology is required between the donors, and the intervening sequence from the invading molecule is inserted at the translocation site. MIR is stimulated by increasing homology length and spatial proximity of the donors and depends on the overlapping activities of the structure-selective endonucleases Mus81-Mms4, Slx1-Slx4, and Yen1. Conversely, the 3'-flap nuclease Rad1-Rad10 and enzymes known to disrupt recombination intermediates (Sgs1-Top3-Rmi1, Srs2, and Mph1) inhibit MIR. Resolution of MIR intermediates propagates secondary chromosome breaks that frequently cause additional rearrangements. MIR features have implications for the formation of simple and complex rearrangements underlying human pathologies.
Affiliation(s)
- Aurèle Piazza
- Department of Microbiology and Molecular Genetics, One Shields Avenue, University of California, Davis, Davis, CA 95616, USA
- William Douglass Wright
- Department of Microbiology and Molecular Genetics, One Shields Avenue, University of California, Davis, Davis, CA 95616, USA
- Wolf-Dietrich Heyer
- Department of Microbiology and Molecular Genetics, One Shields Avenue, University of California, Davis, Davis, CA 95616, USA; Department of Molecular and Cellular Biology, One Shields Avenue, University of California, Davis, Davis, CA 95616, USA.
|
16
|
Gul Z, Barozai MYK, Din M. In-silico based identification and functional analyses of miRNAs and their targets in Cowpea (Vigna unguiculata L.). AIMS Genet 2017; 4:138-165. [PMID: 31435506 PMCID: PMC6690248 DOI: 10.3934/genet.2017.2.138] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/20/2017] [Accepted: 06/15/2017] [Indexed: 11/18/2022]
Abstract
Cowpea (Vigna unguiculata L.) is an important leguminous plant and a good dietary source of carbohydrates and protein. Currently, only a few cowpea microRNAs (miRNAs) have been reported. This study was intended to identify and functionally analyze new miRNAs and their targets in cowpea. An in-silico homology search approach was applied, and a total of 46 new miRNAs belonging to 45 families were identified and functionally annotated from the cowpea expressed sequence tags (ESTs). All these potential miRNAs are reported here for the first time in cowpea. The 46 new miRNAs form stable hairpin structures with minimum free energies ranging from -10 to -132 kcal mol⁻¹, with an average of -40 kcal mol⁻¹. The lengths of the new cowpea miRNAs range from 18 to 26 nt, with an average of 21 nt. The cowpea miRNA vun-mir4414 is found as a pre-miRNA cluster for the first time in cowpea. Furthermore, a set of 138 protein targets was identified for these 46 newly identified cowpea miRNAs. These targets have significant roles in various biological processes, such as metabolism, transcriptional regulation (as transcription factors), cell transport, signal transduction, growth and development, and structural proteins. These findings provide a significant basis for utilizing and managing this important leguminous plant for better nutritional properties and tolerance to biotic and abiotic stresses.
Affiliation(s)
- Zareen Gul
- Department of Botany, University of Balochistan, Sariab Road, Quetta, Pakistan
- Muhammad Din
- Department of Botany, University of Balochistan, Sariab Road, Quetta, Pakistan
|
17
|
Mehta A, Beach A, Haber JE. Homology Requirements and Competition between Gene Conversion and Break-Induced Replication during Double-Strand Break Repair. Mol Cell 2017; 65:515-526.e3. [PMID: 28065599 DOI: 10.1016/j.molcel.2016.12.003] [Citation(s) in RCA: 54] [Impact Index Per Article: 7.7] [Received: 07/22/2016] [Revised: 09/27/2016] [Accepted: 12/01/2016] [Indexed: 11/27/2022]
Abstract
Saccharomyces cerevisiae mating-type switching is initiated by a double-strand break (DSB) at MATa, leaving one cut end perfectly homologous to the HMLα donor, while the second end must be processed to remove a non-homologous tail before completing repair by gene conversion (GC). When homology at the matched end is ≤150 bp, efficient repair depends on the recombination enhancer, which tethers HMLα near the DSB. Thus, homology shorter than an apparent minimum efficient processing segment can be rescued by tethering the donor near the break. When homology at the second end is ≤150 bp, second-end capture becomes inefficient and repair shifts from GC to break-induced replication (BIR). But when pol32 or pif1 mutants block BIR, GC increases 3-fold, indicating that the steps blocked by these mutations are reversible. With short second-end homology, absence of the RecQ helicase Sgs1 promotes gene conversion, whereas deletion of the FANCM-related Mph1 helicase promotes BIR.
Affiliation(s)
- Anuja Mehta
- Department of Biology and Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, MA 02454, USA
- Annette Beach
- Department of Biology and Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, MA 02454, USA
- James E Haber
- Department of Biology and Rosenstiel Basic Medical Sciences Research Center, Brandeis University, Waltham, MA 02454, USA.
|
18
|
Bell JC, Kowalczykowski SC. RecA: Regulation and Mechanism of a Molecular Search Engine. Trends Biochem Sci 2016; 41:491-507. [PMID: 27156117 DOI: 10.1016/j.tibs.2016.04.002] [Citation(s) in RCA: 137] [Impact Index Per Article: 17.1] [Received: 01/26/2016] [Revised: 04/04/2016] [Accepted: 04/05/2016] [Indexed: 11/19/2022]
Abstract
Homologous recombination maintains genomic integrity by repairing broken chromosomes. The broken chromosome is partially resected to produce single-stranded DNA (ssDNA) that is used to search for homologous double-stranded DNA (dsDNA). This homology driven 'search and rescue' is catalyzed by a class of DNA strand exchange proteins that are defined in relation to Escherichia coli RecA, which forms a filament on ssDNA. Here, we review the regulation of RecA filament assembly and the mechanism by which RecA quickly and efficiently searches for and identifies a unique homologous sequence among a vast excess of heterologous DNA. Given that RecA is the prototypic DNA strand exchange protein, its behavior affords insight into the actions of eukaryotic RAD51 orthologs and their regulators, BRCA2 and other tumor suppressors.
Affiliation(s)
- Jason C Bell
- Department of Microbiology and Molecular Genetics and Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA
- Stephen C Kowalczykowski
- Department of Microbiology and Molecular Genetics and Department of Molecular and Cellular Biology, University of California, Davis, CA 95616, USA.
|
19
|
Abstract
Homologous recombination allows for the regulated exchange of genetic information between two different DNA molecules of identical or nearly identical sequence composition, and is a major pathway for the repair of double-stranded DNA breaks. A key facet of homologous recombination is the ability of recombination proteins to perfectly align the damaged DNA with homologous sequence located elsewhere in the genome. This reaction is referred to as the homology search and is akin to the target searches conducted by many different DNA-binding proteins. Here I briefly highlight early investigations into the homology search mechanism, and then describe more recent research. Based on these studies, I summarize a model that includes a combination of intersegmental transfer, short-distance one-dimensional sliding, and length-specific microhomology recognition to efficiently align DNA sequences during the homology search. I also suggest some future directions to help further our understanding of the homology search. Where appropriate, I direct the reader to other recent reviews describing various issues related to homologous recombination.
Affiliation(s)
- Eric C Greene
- From the Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032
|
20
|
Lee CS, Wang RW, Chang HH, Capurso D, Segal MR, Haber JE. Chromosome position determines the success of double-strand break repair. Proc Natl Acad Sci U S A 2016; 113:E146-54. [PMID: 26715752 DOI: 10.1073/pnas.1523660113] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.0] [Indexed: 12/20/2022]
Abstract
Repair of a chromosomal double-strand break (DSB) by gene conversion depends on the ability of the broken ends to encounter a donor sequence. To understand how chromosomal location of a target sequence affects DSB repair, we took advantage of genome-wide Hi-C analysis of yeast chromosomes to create a series of strains in which an induced site-specific DSB in budding yeast is repaired by a 2-kb donor sequence inserted at different locations. The efficiency of repair, measured by cell viability or competition between each donor and a reference site, showed a strong correlation (r = 0.85 and 0.79) with the contact frequencies of each donor with the DSB repair site. Repair efficiency depends on the distance between donor and recipient rather than any intrinsic limitation of a particular donor site. These results further demonstrate that the search for homology is the rate-limiting step in DSB repair and suggest that cells often fail to repair a DSB because they cannot locate a donor before other, apparently lethal, processes arise. The repair efficiency of a donor locus can be improved by four factors: slower 5' to 3' resection of the DSB ends, increased abundance of replication protein factor A (RPA), longer shared homology, or presence of a recombination enhancer element adjacent to a donor.
|
21
|
Abstract
RNA family models describe classes of functionally related, non-coding RNAs based on sequence and structure conservation. The most important method for modeling RNA families is the use of covariance models, which are stochastic models that serve in the discovery of yet unknown, homologous RNAs. However, the performance of covariance models in finding remote homologs is poor for RNA families with high sequence conservation, while for families with high structure but low sequence conservation, these models are difficult to build in the first place. A complementary approach to RNA family modeling involves the use of thermodynamic matchers. Thermodynamic matchers are RNA folding programs, based on the established thermodynamic model, but tailored to a specific structural motif. As thermodynamic matchers focus on structure and folding energy, they unfold their potential in discovering homologs when high structure conservation is paired with low sequence conservation. In contrast to covariance models, construction of thermodynamic matchers does not require an input alignment, but requires human design decisions and experimentation, and hence, model construction is more laborious. Here we report a case study on an RNA family that was constructed by means of thermodynamic matchers. It starts from a set of known but structurally different members of the same RNA family. The consensus secondary structure of this family consists of 2 to 4 adjacent hairpins. Each hairpin loop carries the same motif, CCUCCUCCC, while the stems show high variability in their nucleotide content. The present study (1) describes a novel approach for the integration of the structurally varying family into a single RNA family model by means of the thermodynamic matcher methodology, and (2) provides the results of homology searches that were conducted with this model in a wide spectrum of bacterial species.
Key Words
- CIN, conserved intergenic neighborhood
- CM, covariance model
- HMM, hidden Markov model
- MFE, minimum free energy
- OG, orthologous group of genes
- RBS, ribosome binding site
- RFM, RNA family model
- TDM, thermodynamic matcher
- aSD, anti Shine-Dalgarno
- alphaproteobacteria
- cuckoo RNA
- dRNA-seq, differential RNA sequencing
- family model
- homology search
- sRNA, small non-coding RNA
- small RNA
- structural RNA
- thermodynamic matcher
Affiliation(s)
- Jan Reinkensmeier
- a Universität Bielefeld ; Technische Fakultät and Center of Biotechnology ; Bielefeld , Germany
|
22
|
Carroll HD, Williams AC, Davis AG, Spouge JL. Improving Retrieval Efficacy of Homology Searches Using the False Discovery Rate. IEEE/ACM Trans Comput Biol Bioinform 2015; 12:531-537. [PMID: 26357264 PMCID: PMC4568567 DOI: 10.1109/tcbb.2014.2366112] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
Over the past few decades, discovery based on sequence homology has become a widely accepted practice. Consequently, comparative accuracy of retrieval algorithms (e.g., BLAST) has been rigorously studied for improvement. Unlike most components of retrieval algorithms, the E-value threshold criterion has yet to be thoroughly investigated. An investigation of the threshold is important as it exclusively dictates which sequences are declared relevant and irrelevant. In this paper, we introduce the false discovery rate (FDR) statistic as a replacement for the uniform threshold criterion in order to improve efficacy in retrieval systems. Using NCBI's BLAST and PSI-BLAST software packages, we demonstrate the applicability of such a replacement in both non-iterative (BLAST(FDR)) and iterative (PSI-BLAST(FDR)) homology searches. For each application, we performed an evaluation of retrieval efficacy with five different multiple testing methods on a large training database. For each algorithm, we chose the best performing method, Benjamini-Hochberg, as the default statistic. As measured by the threshold average precision, BLAST(FDR) yielded 14.1 percent better retrieval performance than BLAST on a large (5,161 queries) test database and PSI-BLAST(FDR) attained 11.8 percent better retrieval performance than PSI-BLAST. The C++ source code specific to BLAST(FDR) and PSI-BLAST(FDR) and instructions are available at http://www.cs.mtsu.edu/~hcarroll/blast_fdr/.
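As an illustration of the Benjamini-Hochberg procedure named in the abstract above, the following is a minimal Python sketch of the generic step-up rule applied to a list of per-hit p-values. It is not the authors' C++ implementation; the function name `benjamini_hochberg`, the `alpha` default, and the assumption that each hit already carries a p-value are all hypothetical choices made for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hit: True if the hit is kept at FDR level alpha.

    Generic Benjamini-Hochberg step-up rule (illustrative sketch only):
    sort the p-values, find the largest rank k with p_(k) <= (k / m) * alpha,
    and keep every hit whose rank is at most k.
    """
    m = len(p_values)
    # Indices of the hits, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank  # remember the largest rank that passes

    kept = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            kept[idx] = True
    return kept


# Hypothetical p-values for four database hits.
print(benjamini_hochberg([0.02, 0.02, 0.03, 0.5]))
```

Unlike a uniform threshold, the cutoff here depends on the whole ranked list: a hit that fails its own rank-specific bound can still be kept when a lower-ranked hit passes.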
Affiliation(s)
- Hyrum D. Carroll
- Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, 37128
- Alex C. Williams
- Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, 37128
- Anthony G. Davis
- Department of Computer Science, Middle Tennessee State University, Murfreesboro, TN, 37128
- John L. Spouge
- National Center for Biotechnology Information, Bethesda, MD 20894
|
23
|
Yakimov AP, Seregina TA, Kholodnyak AA, Kreneva RA, Mironov AS, Perumov DA, Timkovskii AL. Possible Function of the ribT Gene of Bacillus subtilis: Theoretical Prediction, Cloning, and Expression. Acta Naturae 2014; 6:106-9. [PMID: 25349719 PMCID: PMC4207565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022]
Abstract
The complete decipherment of the functions and interactions of the elements of the riboflavin biosynthesis operon (rib operon) of Bacillus subtilis is necessary for the development of superproducers of this important vitamin. The function of its terminal ribT gene has not been established to date. In this work, a search for homologs of the hypothetical amino acid sequence of the gene product through databases, as well as an analysis of the homologs, was performed; the distribution of secondary structure elements was theoretically predicted; and the tertiary structure of the RibT protein was proposed. The ribT gene nucleotide sequence was amplified and cloned into the standard high-copy expression vector pET15b and then expressed after induction with IPTG in E. coli BL21 (DE3) strain cells containing the inducible phage T7 RNA polymerase gene. The ribT gene expression was confirmed by SDS-PAGE. The protein product of the expression was purified by affinity chromatography. Thus, the practical possibility of producing the RibT protein in quantities sufficient for further investigation of its structure and functional activity was demonstrated.
Affiliation(s)
- A. P. Yakimov
- B.P. Konstantinov Petersburg Nuclear Physics Institute, National Research Center “Kurchatov Institute”, Orlova Roshcha, Gatchina, Leningrad Region, Russia, 188300
- St. Petersburg State Polytechnical University, Polytechnicheskaya Str., 29, St. Petersburg, Russia, 195251
- T. A. Seregina
- State Research Institute of Genetics and Selection of Industrial Microorganisms, 1st Dorozhnyi Proezd, 1, Moscow, Russia, 117545
- A. A. Kholodnyak
- State Research Institute of Genetics and Selection of Industrial Microorganisms, 1st Dorozhnyi Proezd, 1, Moscow, Russia, 117545
- R. A. Kreneva
- B.P. Konstantinov Petersburg Nuclear Physics Institute, National Research Center “Kurchatov Institute”, Orlova Roshcha, Gatchina, Leningrad Region, Russia, 188300
- A. S. Mironov
- State Research Institute of Genetics and Selection of Industrial Microorganisms, 1st Dorozhnyi Proezd, 1, Moscow, Russia, 117545
- D. A. Perumov
- B.P. Konstantinov Petersburg Nuclear Physics Institute, National Research Center “Kurchatov Institute”, Orlova Roshcha, Gatchina, Leningrad Region, Russia, 188300
- A. L. Timkovskii
- B.P. Konstantinov Petersburg Nuclear Physics Institute, National Research Center “Kurchatov Institute”, Orlova Roshcha, Gatchina, Leningrad Region, Russia, 188300
- St. Petersburg State Polytechnical University, Polytechnicheskaya Str., 29, St. Petersburg, Russia, 195251
|
24
|
Zhou P, Silverstein KAT, Gao L, Walton JD, Nallu S, Guhlin J, Young ND. Detecting small plant peptides using SPADA (Small Peptide Alignment Discovery Application). BMC Bioinformatics 2013; 14:335. [PMID: 24256031 PMCID: PMC3924332 DOI: 10.1186/1471-2105-14-335] [Citation(s) in RCA: 74] [Impact Index Per Article: 6.7] [Received: 08/02/2013] [Accepted: 11/15/2013] [Indexed: 01/08/2023]
Abstract
BACKGROUND: Small peptides encoded as one- or two-exon genes in plants have recently been shown to affect multiple aspects of plant development, reproduction and defense responses. However, popular similarity search tools and gene prediction techniques generally fail to identify most members belonging to this class of genes. This is largely due to the high sequence divergence among family members and the limited availability of experimentally verified small peptides to use as training sets for homology search and ab initio prediction. Consequently, there is an urgent need for both experimental and computational studies in order to further advance the accurate prediction of small peptides. RESULTS: We present here a homology-based gene prediction program to accurately predict small peptides at the genome level. Given a high-quality profile alignment, SPADA identifies and annotates nearly all family members in tested genomes with better performance than all general-purpose gene prediction programs surveyed. We find numerous mis-annotations in the current Arabidopsis thaliana and Medicago truncatula genome databases using SPADA, most of which have RNA-Seq expression support. We also show that SPADA works well on other classes of small secreted peptides in plants (e.g., self-incompatibility protein homologues) as well as non-secreted peptides outside the plant kingdom (e.g., the alpha-amanitin toxin gene family in the mushroom, Amanita bisporigera). CONCLUSIONS: SPADA is a free software tool that accurately identifies and predicts the gene structure for short peptides with one or two exons. SPADA is able to incorporate information from profile alignments into the model prediction process and makes use of it to score different candidate models. SPADA achieves high sensitivity and specificity in predicting small plant peptides such as the cysteine-rich peptide families. A systematic application of SPADA to other classes of small peptides by research communities will greatly improve the genome annotation of different protein families in public genome databases.
Affiliation(s)
- Peng Zhou
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
- Kevin AT Silverstein
- Supercomputing Institute for Advanced Computational Research, University of Minnesota, Minneapolis, Minnesota 55455, USA
- Liangliang Gao
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
- Jonathan D Walton
- Department of Plant Biology and U.S. Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, Michigan 48824, USA
- Sumitha Nallu
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
- Joseph Guhlin
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
- Nevin D Young
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota 55108, USA
- Department of Plant Biology, University of Minnesota, St. Paul, Minnesota 55108, USA
|
25
|
|
26
|
Abstract
A key step toward understanding a metagenomics data set is the identification of functional sequence elements within it, such as protein coding genes and structural RNAs. Relative to protein coding genes, structural RNAs are more difficult to identify because of their reduced alphabet size, lack of open reading frames, and short length. Infernal is a software package that implements “covariance models” (CMs) for RNA homology search, which harness both sequence and structural conservation when searching for RNA homologs. Thanks to the added statistical signal inherent in the secondary structure conservation of many RNA families, Infernal is more powerful than sequence-only based methods such as BLAST and profile HMMs. Together with the Rfam database of CMs, Infernal is a useful tool for identifying RNAs in metagenomics data sets.
|
27
|
Uva P, Da Sacco L, Del Cornò M, Baldassarre A, Sestili P, Orsini M, Palma A, Gessani S, Masotti A. Rat mir-155 generated from the lncRNA Bic is 'hidden' in the alternate genomic assembly and reveals the existence of novel mammalian miRNAs and clusters. RNA 2013; 19:365-79. [PMID: 23329697 PMCID: PMC3677247 DOI: 10.1261/rna.035394.112] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 05/13/2023]
Abstract
MicroRNAs (miRNAs) are a class of small noncoding RNAs acting as post-transcriptional gene expression regulators in many physiological and pathological conditions. During the last few years, many novel mammalian miRNAs have been predicted experimentally with bioinformatics approaches and validated by next-generation sequencing. Although these strategies have prompted the discovery of several miRNAs, the total number of these genes still seems larger. Here, by exploiting the species conservation of human, mouse, and rat hairpin miRNAs, we discovered a novel rat microRNA, mir-155. We found that mature miR-155 is overexpressed in rat spleen myeloid cells treated with LPS, similarly to humans and mice. Rat mir-155 is annotated only on the alternate genome, suggesting the presence of other "hidden" miRNAs on this assembly. Therefore, we comprehensively extended the homology search also to mice and humans, finally validating 34 novel mammalian miRNAs (two in humans, five in mice, and up to 27 in rats). Surprisingly, 15 of these novel miRNAs (one for mice and 14 for rats) were found only on the alternate and not on the reference genomic assembly. To date, our findings indicate that the choice of genomic assembly, when mapping small RNA reads, is an important option that should be carefully considered, at least for these animal models. Finally, the discovery of these novel mammalian miRNA genes may contribute to a better understanding of already acquired experimental data, thereby paving the way to still unexplored investigations and to unraveling the function of miRNAs in disease models.
Affiliation(s)
- Paolo Uva
- CRS4 Bioinformatics Laboratory, Parco Scientifico e Tecnologico POLARIS, 09010 Pula, Cagliari, Italy
- Letizia Da Sacco
- Gene Expression–Microarrays Laboratory, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy
- Manuela Del Cornò
- Department of Hematology, Oncology, and Molecular Medicine, Istituto Superiore di Sanità, 00161 Rome, Italy
- Antonella Baldassarre
- Gene Expression–Microarrays Laboratory, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy
- Paola Sestili
- Department of Hematology, Oncology, and Molecular Medicine, Istituto Superiore di Sanità, 00161 Rome, Italy
- Massimiliano Orsini
- CRS4 Bioinformatics Laboratory, Parco Scientifico e Tecnologico POLARIS, 09010 Pula, Cagliari, Italy
- Alessia Palma
- Genomic Core Facility, Bambino Gesù Children’s Hospital, IRCCS, 00139 Rome, Italy
- Sandra Gessani
- Department of Hematology, Oncology, and Molecular Medicine, Istituto Superiore di Sanità, 00161 Rome, Italy
- Andrea Masotti
- Gene Expression–Microarrays Laboratory, Bambino Gesù Children’s Hospital, IRCCS, 00165 Rome, Italy
|
28
|
Kayser JP, Vallet JL, Cerny RL. Defining parameters for homology-tolerant database searching. J Biomol Tech 2004; 15:285-95. [PMID: 15585825 PMCID: PMC2291706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2023]
Abstract
De novo interpretation of tandem mass spectrometry (MS/MS) spectra provides sequences for searching protein databases when limited sequence information is present in the database. Our objective was to define a strategy for this type of homology-tolerant database search. Homology searches, using MS-Homology software, were conducted with 20, 10, or 5 of the most abundant peptides from 9 proteins, based either on precursor trigger intensity or on total ion current, and allowing for 50%, 30%, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < .01) corrected protein scores (i.e., above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on ion mass searching using MS/MS data (i.e., Mascot) was compared with that obtained using homology search. The highest-ranking protein was the same using Mascot, homology search using the 20 most intense peptides, or homology search using all peptides, for 63.4% of 112 spots from two-dimensional polyacrylamide gel electrophoresis gels. For these proteins, the percent coverage was greatest using Mascot compared with the use of all or just the 20 most intense peptides in a homology search (25.1%, 18.3%, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased when the search was limited to the 20 most intense peptides (44.0%). After identifying the protein using MS-Homology, a peptide mass search may increase the percent coverage of the protein identified.
Affiliation(s)
- J P Kayser
- USDA, ARS, RLH US Meat Animal Research Center, Clay Center, NE 68933, USA
|
29
|
Hedman M, Deloof H, Von Heijne G, Elofsson A. Improved detection of homologous membrane proteins by inclusion of information from topology predictions. Protein Sci 2002; 11:652-8. [PMID: 11847287 PMCID: PMC2373465 DOI: 10.1110/ps.39402] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.9] [Indexed: 10/17/2022]
Abstract
A total of 20%-25% of the proteins in a typical genome are helical membrane proteins. The transmembrane regions of these proteins have markedly different properties when compared with globular proteins. This presents a problem when homology search algorithms optimized for globular proteins are applied to membrane proteins. Here we present modifications of the standard Smith-Waterman and profile search algorithms that significantly improve the detection of related membrane proteins. The improvement is based on the inclusion of information about predicted transmembrane segments in the alignment algorithm. This is done by simply increasing the alignment score if two residues predicted to belong to transmembrane segments are aligned with each other. Benchmarking over a test set of G-protein-coupled receptor sequences shows that the number of false positives is significantly reduced in this way, both when closely related and distantly related proteins are searched for.
Affiliation(s)
- Maria Hedman
- Stockholm Bioinformatics Center, SCFAB, Stockholm University, SE-10691, Stockholm, Sweden
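The modification described in the abstract above, raising the alignment score when two residues that are both predicted to be transmembrane are aligned, can be sketched as follows. This is a simplified illustration and not the authors' implementation: it uses a flat match/mismatch score instead of a substitution matrix, a linear gap penalty, and an arbitrary `tm_bonus` value, all of which are assumptions. The topology strings (with `M` marking a predicted transmembrane residue) are likewise a hypothetical input format.

```python
def smith_waterman_tm(seq_a, seq_b, tm_a, tm_b,
                      match=2, mismatch=-1, gap=-2, tm_bonus=1):
    """Return the best Smith-Waterman local alignment score, adding
    tm_bonus whenever two predicted-transmembrane residues are aligned.

    tm_a and tm_b are per-residue topology strings of the same length as
    seq_a and seq_b; 'M' marks a predicted transmembrane residue.
    """
    if len(seq_a) != len(tm_a) or len(seq_b) != len(tm_b):
        raise ValueError("each topology string must match its sequence length")

    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            # The topology-aware change: reward TM-to-TM residue pairs.
            if tm_a[i - 1] == "M" and tm_b[j - 1] == "M":
                pair += tm_bonus
            score[i][j] = max(
                0,                           # start a new local alignment
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
            best = max(best, score[i][j])
    return best


if __name__ == "__main__":
    print(smith_waterman_tm("ACDE", "ACDE", "----", "----"))
    print(smith_waterman_tm("ACDE", "ACDE", "MMMM", "MMMM"))
```

With `tm_bonus=0` the function reduces to an ordinary Smith-Waterman score, which makes it easy to compare a topology-aware search against the unmodified baseline on the same pair of sequences.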
|
30
|
Abstract
An automatic procedure is proposed to identify, from the protein sequence database, conserved amino acid patterns (or sequence motifs) that are exclusive to a group of functionally related proteins. This procedure is applied to the PIR database, and a dictionary of sequence motifs that relate to specific superfamilies is constructed. The motifs have a practical relevance in identifying the membership of specific superfamilies without the need to perform sequence database searches in 20% of newly determined sequences. The sequence motifs identified represent functionally important sites on protein molecules. When multiple blocks exist in a single motif, they are often close together in the 3-D structure. Furthermore, when the correlation with exon structures was examined, these motif blocks were occasionally found to be split by introns.
Affiliation(s)
- A Ogiwara
- Institute for Chemical Research, Kyoto University, Japan
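The idea of a motif that is "exclusive to a group" in the abstract above can be illustrated with a toy sketch. The abstract does not describe the actual procedure, so the approach here is an assumption: a motif is treated as an exact, fixed-length substring (k-mer) that occurs in every sequence of the group and in no sequence outside it. The function name `exclusive_kmers` and the example sequences are hypothetical.

```python
def exclusive_kmers(group, background, k=4):
    """Return k-mers present in every sequence of `group` and absent
    from every sequence of `background`."""
    def kmers(seq):
        return {seq[i:i + k] for i in range(len(seq) - k + 1)}

    if not group:
        return set()
    # Conserved: shared by all members of the group.
    shared = set.intersection(*(kmers(s) for s in group))
    # Exclusive: never seen in any sequence outside the group.
    seen_elsewhere = set().union(*(kmers(s) for s in background))
    return shared - seen_elsewhere


if __name__ == "__main__":
    family = ["MKGDSLLA", "TTGDSLQQ"]
    others = ["AAGDSAAA"]
    print(exclusive_kmers(family, others))
```

A dictionary of such patterns keyed by superfamily would let a new sequence be assigned by simple pattern lookup, which is the practical benefit the abstract points to; real motifs tolerate substitutions and gaps, which this exact-match sketch does not.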
|