1
dos Santos IC, de Souza RDS, Tolstoy I, Oliveira LS, Gruber A. Integrating Sequence- and Structure-Based Similarity Metrics for the Demarcation of Multiple Viral Taxonomic Levels. Viruses 2025; 17:642. [PMID: 40431654 PMCID: PMC12115509 DOI: 10.3390/v17050642] [Received: 04/02/2025] [Revised: 04/23/2025] [Accepted: 04/25/2025]
Abstract
Viruses exhibit significantly greater diversity than cellular organisms, posing a complex challenge to their taxonomic classification. While primary sequences may diverge considerably, protein functional domains can maintain conserved 3D structures throughout evolution. Consequently, structural homology of viral proteins can reveal deep taxonomic relationships, overcoming limitations inherent in sequence-based methods. In this work, we introduce MPACT (Multimetric Pairwise Comparison Tool), an integrated tool that uses both sequence- and structure-based metrics. The program incorporates five metrics: sequence identity, sequence similarity, maximum likelihood distance, TM-score, and 3Di-character similarity. MPACT generates heatmaps and distance trees to visualize viral relationships across multiple taxonomic levels, enabling users to substantiate the demarcation of viral taxa. Taxa can be delineated by specifying appropriate score cutoffs for each metric, which facilitates the definition of viral groups and the storage of their corresponding sequence data. By analyzing diverse viral datasets spanning various levels of divergence, we demonstrate MPACT's capability to reveal viral relationships, even among distantly related taxa. This tool provides a comprehensive approach to assisting viral classification, going beyond current methods by integrating multiple metrics and uncovering deeper evolutionary connections.
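The MPACT abstract describes delineating taxa by applying a score cutoff to a pairwise metric. The following is a minimal sketch of that idea for one metric, sequence identity. It assumes the sequences are already aligned and that identity is counted only over ungapped columns. The single-linkage grouping is a hypothetical illustration of cutoff-based demarcation, not MPACT's implementation. The sequences and the 90% cutoff are toy values.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: identity is counted only over columns where neither
    sequence has a gap ('-'); MPACT's own definition may differ.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(x == y for x, y in compared)
    return 100.0 * matches / len(compared)


def group_by_cutoff(aligned: dict[str, str], cutoff: float) -> list[set[str]]:
    """Hypothetical single-linkage grouping: two sequences share a group if
    they are connected by a chain of pairs with identity >= cutoff."""
    parent = {name: name for name in aligned}

    def find(x: str) -> str:
        # Follow parent pointers to the group representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(aligned, 2):
        if percent_identity(aligned[a], aligned[b]) >= cutoff:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


aligned = {"v1": "MKV-LTA", "v2": "MKVILTA", "v3": "MRAQLSG"}  # toy data
print(round(percent_identity(aligned["v1"], aligned["v3"]), 1))  # 33.3
print(sorted(sorted(g) for g in group_by_cutoff(aligned, 90.0)))  # [['v1', 'v2'], ['v3']]
```

Single linkage is only one possible choice. A stricter scheme, such as requiring every pair in a group to pass the cutoff, would give different groupings at the same threshold.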
Affiliation(s)
- Igor C. dos Santos
- Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo 03828-000, Brazil
- Igor Tolstoy
- Argentys Informatics, LLC, 12 South Summit Avenue Suite 200, Gaithersburg, MD 20877, USA
- Liliane S. Oliveira
- Department of Computer Science, Federal University of Technology of Paraná (UTFPR), Alberto Carazzai Avenue, 1640, Cornélio Procópio 86300-000, Brazil
- Arthur Gruber
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, Brazil
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
2
Nawn D, Hassan SS, Hromić-Jahjefendić A, Bhattacharya T, Basu P, Redwan EM, Barh D, Andrade BS, Aljabali AA, Serrano-Aroca Á, Lundstrom K, Tambuwala MM, Uversky VN. Molecular genomic insights into melanoma associated proteins PRAME and BAP1. J Biomol Struct Dyn 2025:1-31. [PMID: 40084617 DOI: 10.1080/07391102.2025.2475228] [Received: 09/20/2024] [Accepted: 02/06/2025]
Abstract
Melanoma, a globally prevalent skin cancer with over 325,000 new cases annually, necessitates a comprehensive understanding of its molecular components. This study examines the PRAME (cutaneous melanoma-associated antigen) and BAP1 (gene controlling gene-environment interactions) proteins. Both PRAME and BAP1 are associated with critical genomic alterations that significantly influence melanoma progression and patient outcomes. PRAME is overexpressed in various cancers, especially uveal melanoma (UM), where high levels correlate with poor prognosis and with genomic instability linked to chromosome 8q12 alterations. Meanwhile, mutations in BAP1 contribute to increased genomic instability and a higher risk of metastasis in UM, highlighting its importance as a key prognostic marker in tumorigenesis. Established approaches, along with features proposed in this work, are used to investigate sequence conservation, the presence of polyglutamic acid, intrinsic protein disorder, and the arrangement of polar and nonpolar residues. The conserved residues of PRAME and BAP1 highlight their critical roles in protein function and interaction. Sequence invariance points to possible functional relevance and evolutionary conservation. PRAME shows enhanced intrinsic disorder and flexibility, whereas BAP1 has altered sequences of disorder-promoting residues. Polyglutamic acid strings are found in both proteins, emphasizing their modulatory role in protein interactions. The ratios and spatial arrangement of amino acids have a profound influence on interactions and gene dysregulation. This work contributes to a better understanding of the two melanoma-associated proteins, PRAME and BAP1, by unraveling their structural and functional complexities.
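The PRAME/BAP1 abstract reports "polyglutamic acid strings" in both proteins but does not state how a string is defined. The minimal sketch below locates runs of consecutive glutamic acid (E) residues. It assumes a run must be at least `min_len` residues long; that threshold is a hypothetical parameter, not the study's definition. The input is a made-up sequence, not PRAME or BAP1.

```python
import re


def poly_e_runs(seq: str, min_len: int = 3) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of consecutive glutamic
    acid residues ('E') that are at least `min_len` long.

    `min_len` is an illustrative threshold, not the cited study's definition.
    """
    if min_len < 1:
        raise ValueError("min_len must be positive")
    # Builds the regex E{min_len,}, e.g. "E{3,}" for the default.
    pattern = re.compile(rf"E{{{min_len},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq.upper())]


toy = "MSEEEKLEEAEEEEG"  # made-up sequence, not PRAME or BAP1
print(poly_e_runs(toy))             # [(2, 5), (10, 14)]
print(poly_e_runs(toy, min_len=2))  # [(2, 5), (7, 9), (10, 14)]
```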
Affiliation(s)
- Debaleena Nawn
- Department of Computer Science and Engineering, Adamas University, Jagannathpur, Kolkata, West Bengal, India
- Sk Sarif Hassan
- Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur, West Bengal, India
- Altijana Hromić-Jahjefendić
- Department of Genetics and Bioengineering, Faculty of Engineering and Natural Sciences, International University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Tanishta Bhattacharya
- Developmental Genetics (Dept III), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Pallab Basu
- School of Physics, University of the Witwatersrand, Johannesburg, Braamfontein, South Africa
- Adjunct Faculty, Woxsen School of Sciences, Woxsen University, Hyderabad, Telangana, India
- Elrashdy M Redwan
- Biological Science Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Protein Research Department, Therapeutic and Protective Proteins Laboratory, Genetic Engineering and Biotechnology Research Institute, City of Scientific Research and Technological Applications, New Borg EL-Arab, Alexandria, Egypt
- Debmalya Barh
- Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, India
- Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, Brazil
- Bruno Silva Andrade
- Department of Biological Sciences, Laboratory of Bioinformatics and Computational Chemistry, State University of Southwest of Bahia (UESB), Jequié, Brazil
- Alaa A Aljabali
- Department of Pharmaceutics and Pharmaceutical Technology, Faculty of Pharmacy, Yarmouk University, Irbid, Jordan
- Ángel Serrano-Aroca
- Biomaterials and Bioengineering Lab, Centro de Investigación Traslacional San Alberto Magno, Universidad Católica de Valencia San Vicente Mártir, Valencia, Spain
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
3
Zhang C, Wang Q, Li Y, Teng A, Hu G, Wuyun Q, Zheng W. The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction. Biomolecules 2024; 14:1531. [PMID: 39766238 PMCID: PMC11673352 DOI: 10.3390/biom14121531] [Received: 09/26/2024] [Revised: 11/24/2024] [Accepted: 11/27/2024]
Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research.
Affiliation(s)
- Chenyue Zhang
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Qinxin Wang
- Suzhou New & High-Tech Innovation Service Center, Suzhou 215011, China
- Yiyang Li
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Anqi Teng
- Bioscience and Biomedical Engineering Thrust, Systems Hub, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511453, China
- Gang Hu
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Qiqige Wuyun
- Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
- Wei Zheng
- NITFID, School of Statistics and Data Science, LPMC and KLMDASR, Nankai University, Tianjin 300071, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
4
Ansel M, Ramachandran K, Dey G, Brunet T. Origin and evolution of microvilli. Biol Cell 2024; 116:e2400054. [PMID: 39233537 DOI: 10.1111/boc.202400054] [Received: 05/17/2024] [Revised: 07/31/2024] [Accepted: 08/13/2024]
Abstract
BACKGROUND INFORMATION Microvilli are finger-like, straight, and stable cellular protrusions that are filled with F-actin and present a stereotypical length. They are present in a broad range of cell types across the animal tree of life and mediate several fundamental functions, including nutrient absorption, photosensation, and mechanosensation. Therefore, understanding the origin and evolution of microvilli is key to reconstructing the evolution of animal cellular form and function. Here, we review the current state of knowledge on microvilli evolution and perform a bioinformatic survey of the conservation of genes encoding microvillar proteins in animals and their unicellular relatives. RESULTS We first present a detailed description of mammalian microvilli based on two well-studied examples, the brush border microvilli of enterocytes and the stereocilia of hair cells. We also survey the broader diversity of microvilli and discuss similarities and differences between microvilli and filopodia. Based on our bioinformatic survey coupled with carefully reconstructed molecular phylogenies, we reconstitute the order of evolutionary appearance of microvillar proteins. We document the stepwise evolutionary assembly of the "molecular microvillar toolkit" with notable bursts of innovation at two key nodes: the last common filozoan ancestor (correlated with the evolution of microvilli distinct from filopodia) and the last common choanozoan ancestor (correlated with the emergence of inter-microvillar adhesions). CONCLUSION AND SIGNIFICANCE We conclude with a scenario for the evolution of microvilli from filopodia-like ancestral structures in unicellular precursors of animals.
Affiliation(s)
- Mylan Ansel
- Institut Pasteur, Université Paris-Cité, CNRS UMR3691, Evolutionary Cell Biology and Evolution of Morphogenesis Unit, Paris, France
- Cell Biology and Biophysics, European Molecular Biology Laboratory, Heidelberg, Germany
- Master BioSciences, Département de Biologie, Ecole Normale Supérieure de Lyon, Lyon, France
- Kaustubh Ramachandran
- Cell Biology and Biophysics, European Molecular Biology Laboratory, Heidelberg, Germany
- Gautam Dey
- Cell Biology and Biophysics, European Molecular Biology Laboratory, Heidelberg, Germany
- Thibaut Brunet
- Institut Pasteur, Université Paris-Cité, CNRS UMR3691, Evolutionary Cell Biology and Evolution of Morphogenesis Unit, Paris, France
5
Krause GR, Shands W, Wheeler TJ. Sensitive and error-tolerant annotation of protein-coding DNA with BATH. Bioinformatics Advances 2024; 4:vbae088. [PMID: 38966592 PMCID: PMC11223822 DOI: 10.1093/bioadv/vbae088] [Received: 02/04/2024] [Revised: 05/03/2024] [Accepted: 06/10/2024]
Abstract
Summary We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the annotation workflow for pHMM-based translated sequence annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and produces superior accuracy to all tested tools for annotation of sequences containing nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is true with long-read sequencing data and in the context of pseudogenes. Availability and implementation The software is available at https://github.com/TravisWheelerLab/BATH.
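The BATH abstract targets frameshift-inducing indels in protein-coding DNA. The minimal sketch below shows why such indels defeat a fixed-frame translated search. It translates a toy gene, then deletes a single nucleotide and translates again. The protein downstream of the deletion is garbled in the original frame, but the downstream signal reappears intact in a different frame. This illustrates the problem only. It is not BATH's frameshift-aware algorithm, which the abstract says is built on the HMMER3 code base. The DNA string is made up; the codon table is the standard genetic code.

```python
# Standard genetic code, built from the conventional TCAG ordering of codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in one forward reading frame (0, 1 or 2).

    Stops are shown as '*', codons with non-ACGT characters become 'X',
    and a trailing partial codon is ignored.
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


original = "ATGGCTGAAGAAAAATGGTAA"      # toy gene
mutant = original[:4] + original[5:]    # simulated 1-nt deletion

print(translate(original))         # MAEEKW*
print(translate(mutant))           # MVKKNG  (downstream protein garbled)
print(translate(mutant, frame=2))  # GEEKW*  (downstream signal in another frame)
```

A frameshift-aware aligner can, in effect, switch frames at the indel so that both halves of the protein are still matched. A six-frame search that translates each frame independently would report two weaker, disconnected partial hits.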
Affiliation(s)
- Genevieve R Krause
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
- Walt Shands
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
- Genomics Institute, UC Santa Cruz, Santa Cruz, CA 95060, United States
- Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, AZ 85721, United States
- Department of Computer Science, University of Montana, Missoula, MT 59812, United States
6
Krause GR, Shands W, Wheeler TJ. Sensitive and error-tolerant annotation of protein-coding DNA with BATH. bioRxiv 2024:2023.12.31.573773. [PMID: 38260252 PMCID: PMC10802276 DOI: 10.1101/2023.12.31.573773]
Abstract
We present BATH, a tool for highly sensitive annotation of protein-coding DNA based on direct alignment of that DNA to a database of protein sequences or profile hidden Markov models (pHMMs). BATH is built on top of the HMMER3 code base, and simplifies the annotation workflow for pHMM-based annotation by providing a straightforward input interface and easy-to-interpret output. BATH also introduces novel frameshift-aware algorithms to detect frameshift-inducing nucleotide insertions and deletions (indels). BATH matches the accuracy of HMMER3 for annotation of sequences containing no errors, and produces superior accuracy to all tested tools for annotation of sequences containing nucleotide indels. These results suggest that BATH should be used when high annotation sensitivity is required, particularly when frameshift errors are expected to interrupt protein-coding regions, as is true with long-read sequencing data and in the context of pseudogenes.
Affiliation(s)
- Genevieve R Krause
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, Arizona, USA
- Department of Computer Science, University of Montana, Missoula, Montana, USA
- Walt Shands
- Department of Computer Science, University of Montana, Missoula, Montana, USA
- UC Santa Cruz Genomics Institute, Santa Cruz, California, USA
- Travis J Wheeler
- R. Ken Coit College of Pharmacy, University of Arizona, Tucson, Arizona, USA
- Department of Computer Science, University of Montana, Missoula, Montana, USA
7
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023]
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
8
Oliveira LS, Reyes A, Dutilh BE, Gruber A. Rational Design of Profile HMMs for Sensitive and Specific Sequence Detection with Case Studies Applied to Viruses, Bacteriophages, and Casposons. Viruses 2023; 15:519. [PMID: 36851733 PMCID: PMC9966878 DOI: 10.3390/v15020519] [Received: 12/29/2022] [Revised: 02/01/2023] [Accepted: 02/09/2023]
Abstract
Profile hidden Markov models (HMMs) are a powerful way of modeling biological sequence diversity and constitute a very sensitive approach to detecting divergent sequences. Here, we report the development of protocols for the rational design of profile HMMs. These methods were implemented in TABAJARA, a program that can be used either to detect all biological sequences of a group or to discriminate specific groups of sequences. By calculating position-specific information scores along a multiple sequence alignment, TABAJARA automatically identifies the most informative sequence motifs and uses them to construct profile HMMs. As a proof of principle, we applied TABAJARA to generate profile HMMs for the detection and classification of two viral groups with different evolutionary rates: bacteriophages of the Microviridae family and viruses of the Flavivirus genus. We obtained conserved models for the generic detection of any Microviridae or Flavivirus sequence, as well as profile HMMs that can specifically discriminate Microviridae subfamilies or Flavivirus species. In another application, we constructed Cas1 endonuclease-derived profile HMMs that can discriminate CRISPRs and casposons, two evolutionarily related transposable elements. We believe that the protocols described here, and implemented in TABAJARA, constitute a generic toolbox for generating profile HMMs for the highly sensitive and specific detection of sequence classes.
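The TABAJARA abstract mentions computing position-specific information scores along a multiple sequence alignment to find the most informative motifs. The sketch below is a rough illustration under stated assumptions, not TABAJARA's documented algorithm. It scores each column as Shannon information, log2(20) minus the column entropy. It then down-weights gap-rich columns by their occupancy, which is a hypothetical choice. Finally, it reports the fixed-width window with the highest mean score. The alignment is toy data.

```python
import math
from collections import Counter


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Shannon information (bits) of one alignment column, weighted by occupancy."""
    residues = [c for c in column.upper() if c not in "-."]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    occupancy = n / len(column)  # assumption: down-weights gap-rich columns
    return occupancy * (math.log2(alphabet_size) - entropy)


def most_informative_window(msa: list[str], width: int) -> tuple[int, float]:
    """Start index and mean score of the highest-scoring window of `width` columns."""
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    if not 1 <= width <= length:
        raise ValueError("width must be between 1 and the alignment length")
    scores = [column_information("".join(s[i] for s in msa)) for i in range(length)]
    best_start, best_mean = 0, float("-inf")
    for start in range(length - width + 1):
        mean = sum(scores[start:start + width]) / width
        if mean > best_mean:
            best_start, best_mean = start, mean
    return best_start, best_mean


msa = ["MKWVA", "MKWLG", "MKWIS"]  # toy protein alignment
start, mean_bits = most_informative_window(msa, width=3)
print(start, round(mean_bits, 3))  # 0 4.322
```

A fully conserved protein column scores log2(20), about 4.32 bits. A column where every residue differs scores much lower. This is why conserved blocks stand out as candidate regions for building a profile HMM.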
Affiliation(s)
- Liliane S. Oliveira
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- Alejandro Reyes
- Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
- The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO 63108, USA
- Bas E. Dutilh
- Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich-Schiller-University Jena, 07743 Jena, Germany
- Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
- Arthur Gruber
- Department of Parasitology, Instituto de Ciências Biomédicas, Universidade de São Paulo, São Paulo 05508-000, SP, Brazil
- European Virus Bioinformatics Center, Leutragraben 1, 07743 Jena, Germany
9
Rodriguez-Valera F, Pushkarev A, Rosselli R, Béjà O. Searching Metagenomes for New Rhodopsins. Methods Mol Biol 2022; 2501:101-108. [PMID: 35857224 DOI: 10.1007/978-1-0716-2329-9_4]
Abstract
Most microbial groups have not yet been cultivated, and the only way to explore the enormous diversity of rhodopsins they contain within a reasonable timeframe is through the analysis of their genomes. High-throughput sequencing technologies have enabled the release of community genomes (metagenomes) from many habitats in the photic zones of oceans and lakes. The harvest is already impressive, ranging from the first bacterial rhodopsin (proteorhodopsin) to the recent discovery of heliorhodopsin by functional metagenomics. The search continues, using bioinformatic or biochemical routes.
Affiliation(s)
- Francisco Rodriguez-Valera
- Evolutionary Genomics Group, Departamento de Producción Vegetal y Microbiología, Universidad Miguel Hernández, San Juan de Alicante, Alicante, Spain
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology (National Research University), Dolgoprudny, Russia
- Alina Pushkarev
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel
- Riccardo Rosselli
- Departamento de Fisiología, Genética y Microbiología, Facultad de Ciencias, Universidad de Alicante, Alicante, Spain
- Oded Béjà
- Faculty of Biology, Technion - Israel Institute of Technology, Haifa, Israel
10
Lopez-Ibañez J, Pazos F, Chagoyen M. Predicting biological pathways of chemical compounds with a profile-inspired approach. BMC Bioinformatics 2021; 22:320. [PMID: 34118870 PMCID: PMC8199418 DOI: 10.1186/s12859-021-04252-y] [Received: 04/15/2021] [Accepted: 06/09/2021]
Abstract
BACKGROUND Assignment of chemical compounds to biological pathways is a crucial step in understanding the relationship between the chemical repertoire of an organism and its biology. Protein sequence profiles are very successful at capturing the main structural and functional features of a protein family, and can be used to assign new members to it by matching their sequences against these profiles. In this work, we extend this idea to chemical compounds, constructing a profile-inspired model for a set of related metabolites (those in the same biological pathway), based on a fragment-based vectorial representation of their chemical structures. RESULTS We use this representation to predict the biological pathway of a chemical compound with good overall accuracy (AUC 0.74-0.90 depending on the database tested), and analyze some factors that affect performance. The approach, which we compare with equivalent methods, can also detect the molecular fragments characteristic of a pathway. CONCLUSIONS The method is available as a graphical interactive web server at http://csbg.cnb.csic.es/iFragMent.
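The iFragMent abstract describes a "profile" built from the fragment-based representations of the metabolites in one pathway, which is then used to assign new compounds. The abstract does not give the scoring function, so the minimal sketch below is a hypothetical stand-in for the authors' model. Each compound is a set of fragment labels. A pathway profile records the fraction of member compounds containing each fragment. A query is assigned to the pathway whose profile gives its fragments the highest mean frequency. The fragment labels and pathway names are placeholders.

```python
from collections import Counter


def build_profile(compounds: list[set[str]]) -> dict[str, float]:
    """Fraction of a pathway's compounds that contain each fragment."""
    counts = Counter(f for fragments in compounds for f in fragments)
    return {f: c / len(compounds) for f, c in counts.items()}


def score(query: set[str], profile: dict[str, float]) -> float:
    """Mean profile frequency of the query's fragments (0 for unseen fragments)."""
    if not query:
        return 0.0
    return sum(profile.get(f, 0.0) for f in query) / len(query)


def predict_pathway(query: set[str], profiles: dict[str, dict[str, float]]) -> str:
    """Assign the query to the pathway whose profile scores it highest."""
    return max(profiles, key=lambda name: score(query, profiles[name]))


# Toy data: fragment labels are placeholders, not real chemical fragments.
profiles = {
    "pathway_A": build_profile([{"f1", "f2", "f3"}, {"f1", "f2"}]),
    "pathway_B": build_profile([{"f4", "f5"}, {"f3", "f4", "f5"}]),
}
print(predict_pathway({"f1", "f2", "f4"}, profiles))  # pathway_A
```

The fragments with the highest frequency in a pathway profile loosely correspond to the abstract's idea of "molecular fragments characteristic of a pathway".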
Affiliation(s)
- Javier Lopez-Ibañez
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
- Monica Chagoyen
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Darwin 3, 28049, Madrid, Spain
11
Smyshlyaev G, Bateman A, Barabas O. Sequence analysis of tyrosine recombinases allows annotation of mobile genetic elements in prokaryotic genomes. Mol Syst Biol 2021; 17:e9880. [PMID: 34018328 PMCID: PMC8138268 DOI: 10.15252/msb.20209880] [Received: 08/04/2020] [Revised: 04/18/2021] [Accepted: 04/20/2021]
Abstract
Mobile genetic elements (MGEs) sequester and mobilize antibiotic resistance genes across bacterial genomes. Efficient and reliable identification of such elements is necessary to follow resistance spreading. However, automated tools for MGE identification are missing. Tyrosine recombinase (YR) proteins drive MGE mobilization and could provide markers for MGE detection, but they constitute a diverse family also involved in housekeeping functions. Here, we conducted a comprehensive survey of YRs from bacterial, archaeal, and phage genomes and developed a sequence-based classification system that dissects the characteristics of MGE-borne YRs. We revealed that MGE-related YRs evolved from non-mobile YRs by acquisition of a regulatory arm-binding domain that is essential for their mobility function. Based on these results, we further identified numerous unknown MGEs. This work provides a resource for comparative analysis and functional annotation of YRs and aids the development of computational tools for MGE annotation. Additionally, we reveal how YRs adapted to drive gene transfer across species and provide a tool to better characterize antibiotic resistance dissemination.
Affiliation(s)
- Georgy Smyshlyaev
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Heidelberg, Germany
- Department of Molecular Biology, University of Geneva, Geneva, Switzerland
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, UK
- Orsolya Barabas
- European Molecular Biology Laboratory (EMBL), Structural and Computational Biology Unit, Heidelberg, Germany
- Department of Molecular Biology, University of Geneva, Geneva, Switzerland
12
Rational Design of Profile Hidden Markov Models for Viral Classification and Discovery. Bioinformatics 2021. [DOI: 10.36255/exonpublications.bioinformatics.2021.ch9]
13
Storer J, Hubley R, Rosen J, Wheeler TJ, Smit AF. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob DNA 2021; 12:2. [PMID: 33436076 PMCID: PMC7805219 DOI: 10.1186/s13100-020-00230-y] [Received: 09/10/2020] [Accepted: 12/28/2020]
Abstract
Dfam is an open access database of repetitive DNA families, sequence models, and genome annotations. The 3.0-3.3 releases of Dfam ( https://dfam.org ) represent an evolution from a proof-of-principle collection of transposable element families in model organisms into a community resource for a broad range of species, and for both curated and uncurated datasets. In addition, releases since Dfam 3.0 provide auxiliary consensus sequence models, transposable element protein alignments, and a formalized classification system to support the growing diversity of organisms represented in the resource. The latest release includes 266,740 new de novo generated transposable element families from 336 species contributed by the EBI. This expansion demonstrates the utility of many of Dfam's new features and provides insight into the long term challenges ahead for improving de novo generated transposable element datasets.
Affiliation(s)
- Robert Hubley
- Institute for Systems Biology, Seattle, WA 98109, USA
- Jeb Rosen
- Institute for Systems Biology, Seattle, WA 98109, USA
- Arian F Smit
- Institute for Systems Biology, Seattle, WA 98109, USA
14
James JE, Willis SM, Nelson PG, Weibel C, Kosinski LJ, Masel J. Universal and taxon-specific trends in protein sequences as a function of age. eLife 2021; 10:e57347. [PMID: 33416492 PMCID: PMC7819706 DOI: 10.7554/elife.57347] [Received: 03/28/2020] [Accepted: 01/05/2021]
Abstract
Extant protein-coding sequences span a huge range of ages, from those that emerged only recently to those present in the last universal common ancestor. Because evolution has had less time to act on young sequences, there might be 'phylostratigraphy' trends in any properties that evolve slowly with age. A long-term reduction in hydrophobicity and hydrophobic clustering was found in previous, taxonomically restricted studies. Here we perform integrated phylostratigraphy across 435 fully sequenced species, using sensitive HMM methods to detect protein domain homology. We find that the reduction in hydrophobic clustering is universal across lineages. However, only young animal domains have a tendency to have higher structural disorder. Among ancient domains, trends in amino acid composition reflect the order of recruitment into the genetic code, suggesting that the composition of the contemporary descendants of ancient sequences reflects amino acid availability during the earliest stages of life, when these sequences first emerged.
Affiliation(s)
- Jennifer E James
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Sara M Willis
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Paul G Nelson
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
- Catherine Weibel
- Department of Physics, University of Arizona, Tucson, United States
- Department of Mathematics, University of Arizona, Tucson, United States
- Luke J Kosinski
- Department of Molecular and Cellular Biology, University of Arizona, Tucson, United States
- Joanna Masel
- Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, United States
15
Characterization of a Novel Mitovirus of the Sand Fly Lutzomyia longipalpis Using Genomic and Virus-Host Interaction Signatures. Viruses 2020; 13:v13010009. [PMID: 33374584 PMCID: PMC7822452 DOI: 10.3390/v13010009] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 11/30/2020] [Revised: 12/17/2020] [Accepted: 12/21/2020] [Indexed: 02/06/2023]
Abstract
Hematophagous insects act as the major reservoirs of infectious agents due to their intimate contact with a large variety of vertebrate hosts. Lutzomyia longipalpis is the main vector of Leishmania chagasi in the New World, but its role as a host of viruses is poorly understood. In this work, Lu. longipalpis RNA libraries were subjected to progressive assembly using viral profile HMMs as seeds. A sequence phylogenetically related to fungal viruses of the genus Mitovirus was identified and this novel virus was named Lul-MV-1. The 2697-base genome presents a single gene coding for an RNA-directed RNA polymerase with an organellar genetic code. To determine the possible host of Lul-MV-1, we analyzed the molecular characteristics of the viral genome. Dinucleotide composition and codon usage showed profiles similar to mitochondrial DNA of invertebrate hosts. Also, the virus-derived small RNA profile was consistent with the activation of the siRNA pathway, with size distribution and 5′ base enrichment analogous to those observed in viruses of sand flies, reinforcing Lu. longipalpis as a putative host. Finally, RT-PCR of different insect pools and sequences of public Lu. longipalpis RNA libraries confirmed the high prevalence of Lul-MV-1. This is the first report of a mitovirus infecting an insect host.
16
Kirsip H, Abroi A. Protein Structure-Guided Hidden Markov Models (HMMs) as A Powerful Method in the Detection of Ancestral Endogenous Viral Elements. Viruses 2019; 11:v11040320. [PMID: 30986983 PMCID: PMC6520822 DOI: 10.3390/v11040320] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/29/2019] [Revised: 03/23/2019] [Accepted: 03/27/2019] [Indexed: 12/19/2022]
Abstract
It has long been believed that the transfer and fixation of genetic material from RNA viruses to eukaryote genomes is very unlikely. However, during the last decade, several cases of “virus-to-host” gene transfer from various viral families into various eukaryotic phyla have been described. These transfers have been identified by sequence similarity, which may disappear very quickly, especially in the case of RNA viruses. However, compared to sequences, protein structure is known to be more conserved. Applying protein structure-guided, domain-specific Hidden Markov Models, we detected homologues of the Virgaviridae capsid protein in Schizophora flies. Further data analysis supported “virus-to-host” transfer into Schizophora ancestors as a single transfer event. This transfer was not identifiable by BLAST or by other methods we applied. Our data show that structure-guided Hidden Markov Models should be used to detect ancestral virus-to-host transfers.
Affiliation(s)
- Heleri Kirsip
- Department of Bioinformatics, University of Tartu, Tartu, 51010, Riia 23, Estonia.
- Aare Abroi
- Institute of Technology, University of Tartu, Tartu, 50411, Nooruse 1, Estonia.
17
Bordenave CD, Granados Mendoza C, Jiménez Bremont JF, Gárriz A, Rodríguez AA. Defining novel plant polyamine oxidase subfamilies through molecular modeling and sequence analysis. BMC Evol Biol 2019; 19:28. [PMID: 30665356 PMCID: PMC6341606 DOI: 10.1186/s12862-019-1361-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/24/2018] [Accepted: 01/14/2019] [Indexed: 01/19/2023]
Abstract
BACKGROUND The polyamine oxidases (PAOs) catalyze the oxidative deamination of the polyamines (PAs) spermine (Spm) and spermidine (Spd). Most of the phylogenetic studies performed to analyze the plant PAO family took into account only a limited number and/or taxonomic representation of plant PAO sequences. RESULTS Here, we constructed a plant PAO protein sequence database and identified four subfamilies. Subfamily PAO back conversion 1 (PAObc1) was present in every lineage included in these analyses, suggesting that BC-type PAOs might play an important role in plants, although their precise function is unknown. Subfamily PAObc2 was exclusively present in vascular plants, suggesting that t-Spm oxidase activity might play an important role in the development of the vascular system. The only terminal catabolism (TC) PAO subfamily (subfamily PAOtc) was lost in Superasterids but was present in all other land plants. This indicated that TC-type reactions are fundamental for land plants and that their function could have been taken over by other enzymes in Superasterids. Subfamily PAObc3 was the result of a gene duplication event preceding Angiosperm diversification, followed by a gene extinction in Monocots. Differential conserved protein motifs were found for each subfamily of plant PAOs. Automatic assignment using these motifs was found to be comparable to the assignment by rough clustering performed in this work. CONCLUSIONS The results presented in this work revealed that the plant PAO family is bigger than previously conceived. They also delineate important background information for future specific structure-function and evolutionary investigations and lay a foundation for the deeper characterization of each plant PAO subfamily.
Affiliation(s)
- Cesar Daniel Bordenave
- Laboratorio de Fisiología de Estrés Abiótico en Plantas, Unidad de Biotecnología, INTECH - CONICET - UNSAM, Intendente Marino KM 8.2 - B7130IWA Chascomús, Buenos Aires, Argentina
- Carolina Granados Mendoza
- Departamento de Botánica, Instituto de Biología, Universidad Nacional Autónoma de México, Apartado Postal 70-367, Coyoacán, 04510, México City, Mexico
- Juan Francisco Jiménez Bremont
- División de Biología Molecular, Instituto Potosino de Investigación Científica y Tecnológica (IPICYT), San Luis Potosí, Mexico
- Andrés Gárriz
- Laboratorio de Fisiología de Estrés Abiótico en Plantas, Unidad de Biotecnología, INTECH - CONICET - UNSAM, Intendente Marino KM 8.2 - B7130IWA Chascomús, Buenos Aires, Argentina
- Andrés Alberto Rodríguez
- Laboratorio de Fisiología de Estrés Abiótico en Plantas, Unidad de Biotecnología, INTECH - CONICET - UNSAM, Intendente Marino KM 8.2 - B7130IWA Chascomús, Buenos Aires, Argentina.
18
Harish A. What is an archaeon and are the Archaea really unique? PeerJ 2018; 6:e5770. [PMID: 30357005 PMCID: PMC6196074 DOI: 10.7717/peerj.5770] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/15/2018] [Accepted: 09/05/2018] [Indexed: 12/05/2022]
Abstract
The recognition of the group Archaea as a major branch of the tree of life (ToL) prompted a new view of the evolution of biodiversity. The genomic representation of archaeal biodiversity has since increased significantly. In addition, advances in phylogenetic modeling of multi-locus datasets have resolved many recalcitrant branches of the ToL. Despite the technical advances and an expanded taxonomic representation, two important aspects of the origins and evolution of the Archaea remain controversial, even as we celebrate the 40th anniversary of the monumental discovery. These issues concern (i) the uniqueness (monophyly) of the Archaea, and (ii) the evolutionary relationships of the Archaea to the Bacteria and the Eukarya; both of these are relevant to the deep structure of the ToL. To explore the causes of this persistent ambiguity, I examine multiple datasets and different phylogenetic approaches that support contradictory conclusions. I find that the uncertainty is primarily due to a scarcity of information in standard datasets (universal core-gene datasets) to reliably resolve the conflicts. These conflicts can be resolved efficiently by comparing patterns of variation in the distribution of functional genomic signatures, which are less diffuse than patterns of primary sequence variation. Relatively lower heterogeneity in distribution patterns minimizes uncertainties and supports statistically robust phylogenetic inferences, especially of the earliest divergences of life. This case study further highlights the limitations of primary sequence data in resolving difficult phylogenetic problems, and raises questions about evolutionary inferences drawn from the analyses of sequence alignments of a small set of core genes.
In particular, the findings of this study corroborate the growing consensus that reversible substitution mutations may not be optimal phylogenetic markers for resolving early divergences in the ToL, nor for determining the polarity of evolutionary transitions across the ToL.
Affiliation(s)
- Ajith Harish
- Department of Cell and Molecular Biology, Program in Molecular Biology, Uppsala University, Uppsala, Sweden
19
Iyer MS, Joshi AG, Sowdhamini R. Genome-wide survey of remote homologues for protein domain superfamilies of known structure reveals unequal distribution across structural classes. Mol Omics 2018; 14:266-280. [PMID: 29971307 DOI: 10.1039/c8mo00008e] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023]
Abstract
Domains are the basic building blocks of proteins and can combine to give rise to different domain architectures. Annotation of domains in a sequence is the first step towards understanding its biological function. Since there are a limited number of folds and evolutionarily related proteins have similar structures, function can be inferred through remote homology. Computational sequence searches for remote homologues were performed on the genomes of around 160,000 different organisms, starting from nearly 11,000 superfamily queries of known structure. Case studies revealed that most of the associated domains are involved in the same biological process. Using all the proteins predicted to have at least one structural domain, a coverage of 61% of Pfam families was achieved, which is higher than that of existing methods (43.36% by SIFTS). Taxonomic analysis of the proteins revealed 493 superfamilies present in all the major kingdoms of life and a few lateral gene transfers between viruses and cellular organisms. The distribution of remote homologues across different classes, folds and superfamilies was studied and revealed that sequences are unequally distributed across structural classes. Finally, domain architectures were computed for the homologues and these data were compiled for each superfamily and organism.
Affiliation(s)
- Meenakshi S Iyer
- National Centre for Biological Sciences (TIFR), GKVK Campus, Bellary Road, Bangalore, Karnataka 560 065, India.
20
Castillo-Lara S, Abril JF. PlanNET: homology-based predicted interactome for multiple planarian transcriptomes. Bioinformatics 2018; 34:1016-1023. [PMID: 29186384 PMCID: PMC5860622 DOI: 10.1093/bioinformatics/btx738] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 05/23/2017] [Revised: 10/24/2017] [Accepted: 11/23/2017] [Indexed: 01/30/2023]
Abstract
Motivation Planarians are emerging as a model organism to study regeneration in animals. However, the scarcity of protein-protein interaction data hinders advances in understanding the mechanisms underlying their regenerative capabilities. Results We have developed a protocol to predict protein-protein interactions using sequence homology data and a reference human interactome. This methodology was applied to 11 Schmidtea mediterranea transcriptomic sequence datasets. Then, using Neo4j as our database manager, we developed PlanNET, a web application to explore the multiplicity of networks and the associated sequence annotations. By mapping RNA-seq expression experiments onto the predicted networks, and allowing a transcript-centric exploration of the planarian interactome, we provide researchers with a useful tool to analyse possible pathways and to design new experiments, as well as a reproducible methodology to predict, store, and explore protein interaction networks for non-model organisms. Availability and implementation The web application PlanNET is available at https://compgen.bio.ub.edu/PlanNET. The source code used is available at https://compgen.bio.ub.edu/PlanNET/downloads. Contact jabril@ub.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- S Castillo-Lara
- Computational Genomics Laboratory, Genetics, Microbiology and Statistics Department, Institut de Biomedicina (IBUB), Universitat de Barcelona, Barcelona, Catalonia, Spain
- J F Abril
- Computational Genomics Laboratory, Genetics, Microbiology and Statistics Department, Institut de Biomedicina (IBUB), Universitat de Barcelona, Barcelona, Catalonia, Spain
21
Abstract
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has led to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships. In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
22
Abstract
The study of evolutionary relationships among protein sequences was one of the first applications of bioinformatics. Since then, and accompanying the wealth of biological data produced by genome sequencing and other high-throughput techniques, the use of bioinformatics in general and phylogenetics in particular has been gaining ground in the study of protein and proteome evolution. Nowadays, the use of phylogenetics is instrumental not only to infer the evolutionary relationships among species and their genome sequences, but also to reconstruct ancestral states of proteins and proteomes and hence trace the paths followed by evolution. Here I survey recent progress in the elucidation of mechanisms of protein and proteome evolution in which phylogenetics has played a determinant role.
Affiliation(s)
- Toni Gabaldón
- Bioinformatics Department, Centro de Investigación Principe Felipe
23
Harish A, Kurland CG. Mitochondria are not captive bacteria. J Theor Biol 2017; 434:88-98. [PMID: 28754286 DOI: 10.1016/j.jtbi.2017.07.011] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 04/01/2017] [Revised: 07/10/2017] [Accepted: 07/14/2017] [Indexed: 10/19/2022]
Abstract
Lynn Sagan's conjecture (1967) that three of the fundamental organelles observed in eukaryote cells, specifically mitochondria, plastids and flagella, were once free-living primitive (prokaryotic) cells was accepted after considerable opposition. Even though the idea was swiftly refuted for the specific case of origins of flagella in eukaryotes, the symbiosis model in general was accepted for decades as a realistic hypothesis to describe the endosymbiotic origins of eukaryotes. However, a systematic analysis of the origins of the mitochondrial proteome based on empirical genome evolution models now indicates that 97% of modern mitochondrial protein domains as well as their homologues in bacteria and archaea were present in the universal common ancestor (UCA) of the modern tree of life (ToL). These protein domains are universal modular building blocks of modern genes and genomes, each of which is identified by a unique tertiary structure and a specific biochemical function as well as a characteristic sequence profile. Further, phylogeny reconstructed from genome-scale evolution models reveals that Eukaryotes and Akaryotes (archaea and bacteria) descend independently from UCA. That is to say, Eukaryotes and Akaryotes are both primordial lineages that evolved in parallel. Finally, there is no indication of massive inter-lineage exchange of coding sequences during the descent of the two lineages. Accordingly, we suggest that the evolution of the mitochondrial proteome was autogenic (endogenic) and not endosymbiotic (exogenic).
Affiliation(s)
- Ajith Harish
- Department of Cell and Molecular Biology, Section of Structural and Molecular Biology, Uppsala University, Uppsala, Sweden.
- Charles G Kurland
- Department of Biology, Section of Microbial Ecology, Lund University, Lund, Sweden.
24
Walsh CJ, Guinane CM, O'Toole PW, Cotter PD. A Profile Hidden Markov Model to investigate the distribution and frequency of LanB-encoding lantibiotic modification genes in the human oral and gut microbiome. PeerJ 2017; 5:e3254. [PMID: 28462050 PMCID: PMC5410138 DOI: 10.7717/peerj.3254] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 11/13/2016] [Accepted: 03/31/2017] [Indexed: 01/04/2023]
Abstract
Background The human microbiota plays a key role in health and disease, and bacteriocins, which are small, bacterially produced, antimicrobial peptides, are likely to have an important function in the stability and dynamics of this community. Here we examined the density and distribution of the subclass I lantibiotic modification protein, LanB, in human oral and stool microbiome datasets using a specially constructed profile Hidden Markov Model (HMM). Methods The model was validated by correctly identifying known lanB genes in the genomes of known bacteriocin producers more effectively than other methods, while being sensitive enough to differentiate between different subclasses of lantibiotic modification proteins. This approach was compared with two existing methods to screen both genomic and metagenomic datasets obtained from the Human Microbiome Project (HMP). Results Of the methods evaluated, the new profile HMM identified the greatest number of putative LanB proteins in the stool and oral metagenome data while BlastP identified the fewest. In addition, the model identified more LanB proteins than a pre-existing Pfam lanthionine dehydratase model. Searching the gastrointestinal tract subset of the HMP reference genome database with the new HMM identified seven putative subclass I lantibiotic producers, including two members of the Coprobacillus genus. Conclusions These findings establish custom profile HMMs as a potentially powerful tool in the search for novel bioactive producers with the power to benefit human health, and reinforce the repertoire of apparent bacteriocin-encoding gene clusters that may have been overlooked by culture-dependent mining efforts to date.
Affiliation(s)
- Calum J Walsh
- Teagasc Food Research Centre, Moorepark, Co. Cork, Ireland; School of Microbiology, University College Cork, Co. Cork, Ireland
- Paul W O'Toole
- School of Microbiology, University College Cork, Co. Cork, Ireland; APC Microbiome Institute, University College Cork, Co. Cork, Ireland
- Paul D Cotter
- Teagasc Food Research Centre, Moorepark, Co. Cork, Ireland; APC Microbiome Institute, University College Cork, Co. Cork, Ireland
25
Lacerda Júnior GV, Noronha MF, de Sousa STP, Cabral L, Domingos DF, Sáber ML, de Melo IS, Oliveira VM. Potential of semiarid soil from Caatinga biome as a novel source for mining lignocellulose-degrading enzymes. FEMS Microbiol Ecol 2016; 93:fiw248. [PMID: 27986827 DOI: 10.1093/femsec/fiw248] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Revised: 08/11/2016] [Accepted: 12/13/2016] [Indexed: 11/14/2022]
Abstract
Litterfall is the major organic material deposited in the soil of the Brazilian Caatinga biome, thus providing ideal conditions for plant biomass-degrading microorganisms to thrive. Herein, the phylogenetic composition and lignocellulose-degrading capacity have been explored for the first time from a fosmid library dataset of Caatinga soil by sequence-based screening. A complex bacterial community dominated by Proteobacteria and Actinobacteria was unraveled. SEED subsystems-based annotations revealed a broad range of genes assigned to carbohydrate and aromatic compound metabolism, indicating a microbial ability to utilize plant-derived material. CAZy-based annotation identified 7275 genes encoding 37 glycoside hydrolase (GH) families related to hydrolysis of cellulose, hemicellulose, oligosaccharides and other lignin-modifying enzymes. Taxonomic affiliation of genes showed high genetic potential of the phylum Acidobacteria for hemicellulose degradation, whereas Actinobacteria members appear to play an important role in cellulose hydrolysis. Additionally, comparative analyses revealed greater GH profile similarity among soils as compared to the digestive tract of animals capable of digesting plant biomass, particularly in hemicellulase content. Combined results suggest a complex synergistic interaction of community members required for biomass degradation into fermentable sugars. This large repertoire of lignocellulolytic enzymes opens perspectives for mining potential candidates of biochemical catalysts for biofuels production from renewable resources and other environmental applications.
Affiliation(s)
- Gileno V Lacerda Júnior
- Research Center for Chemistry, Biology and Agriculture (CPQBA), UNICAMP, Division of Microbial Resources, Zip code 13148-218, Paulínia, São Paulo, Brazil
- Melline F Noronha
- Research Center for Chemistry, Biology and Agriculture (CPQBA), UNICAMP, Division of Microbial Resources, Zip code 13148-218, Paulínia, São Paulo, Brazil
- Sanderson Tarciso P de Sousa
- Research Center for Chemistry, Biology and Agriculture (CPQBA), UNICAMP, Division of Microbial Resources, Zip code 13148-218, Paulínia, São Paulo, Brazil
- Lucélia Cabral
- Research Center for Chemistry, Biology and Agriculture (CPQBA), UNICAMP, Division of Microbial Resources, Zip code 13148-218, Paulínia, São Paulo, Brazil
- Daniela F Domingos
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093-0412, USA
- Mírian L Sáber
- Laboratory of Environmental Microbiology, Brazilian Agricultural Research Corporation, EMBRAPA Environment, Jaguariúna, Zip code 13820-000, Brazil
- Itamar S de Melo
- Laboratory of Environmental Microbiology, Brazilian Agricultural Research Corporation, EMBRAPA Environment, Jaguariúna, Zip code 13820-000, Brazil
- Valéria M Oliveira
- Research Center for Chemistry, Biology and Agriculture (CPQBA), UNICAMP, Division of Microbial Resources, Zip code 13148-218, Paulínia, São Paulo, Brazil
26
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
27
Figueroa-Yañez L, Pereira-Santana A, Arroyo-Herrera A, Rodriguez-Corona U, Sanchez-Teyer F, Espadas-Alcocer J, Espadas-Gil F, Barredo-Pool F, Castaño E, Rodriguez-Zapata LC. RAP2.4a Is Transported through the Phloem to Regulate Cold and Heat Tolerance in Papaya Tree (Carica papaya cv. Maradol): Implications for Protection Against Abiotic Stress. PLoS One 2016; 11:e0165030. [PMID: 27764197 PMCID: PMC5072549 DOI: 10.1371/journal.pone.0165030] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 11/23/2015] [Accepted: 10/05/2016] [Indexed: 11/18/2022]
Abstract
Plants respond to stress through metabolic and morphological changes that increase their ability to survive and grow. To this end, several transcription factor families are responsible for transmitting the signals that are required for these changes. Here, we studied the transcription factor superfamily AP2/ERF, particularly RAP2.4 from Carica papaya cv. Maradol. We isolated four genes (CpRap2.4a, CpRap2.4b, CpRap2.1 and CpRap2.10), and an in silico analysis showed that the four genes encode proteins that contain a conserved APETALA2 (AP2) domain located within group I and II transcription factors of the AP2/ERF superfamily. Semiquantitative PCR experiments indicated that each CpRap2 gene is differentially expressed under stress conditions, such as extreme temperatures. Moreover, genetic transformants of tobacco plants overexpressing CpRap2.4a and CpRap2.4b genes show a high level of tolerance to cold and heat stress compared to non-transformed plants. Confocal microscopy analysis of tobacco transgenic plants showed that CpRAP2.4a and CpRAP2.4b proteins were mainly localized to the nuclei of cells from the leaves and roots and also in the sieve elements. In addition, the movement of CpRap2.4a RNA in tobacco grafting was analyzed. Our results indicate that CpRap2.4a and CpRap2.4b RNA in the papaya tree have a functional role in the response to stress conditions such as exposure to extreme temperatures via direct translation outside the parental RNA cell.
Affiliation(s)
- Luis Figueroa-Yañez
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, México
- Ana Arroyo-Herrera
- Laboratorio de Farmacología, Facultad de Química, Universidad Autónoma de Yucatán, Mérida, Yucatán, México
- Ulises Rodriguez-Corona
- Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, México
- Felipe Sanchez-Teyer
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, México
- Jorge Espadas-Alcocer
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, México
- Francisco Espadas-Gil
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, México
- Felipe Barredo-Pool
- Unidad de Biotecnología, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, México
- Enrique Castaño
- Unidad de Bioquímica y Biología Molecular de Plantas, Centro de Investigación Científica de Yucatán, Mérida, Yucatán, México
29
Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Curr Protoc Bioinformatics 2016; 54:5.6.1-5.6.37. [PMID: 27322406 PMCID: PMC5031415 DOI: 10.1002/cpbi.3] [Citation(s) in RCA: 2121] [Impact Index Per Article: 235.7] [Indexed: 12/22/2022]
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
30
Cui X, Lu Z, Wang S, Jing-Yan Wang J, Gao X. CMsearch: simultaneous exploration of protein sequence space and structure space improves not only protein homology detection but also protein structure prediction. Bioinformatics 2016; 32:i332-i340. [PMID: 27307635 PMCID: PMC4908355 DOI: 10.1093/bioinformatics/btw271] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 11/16/2022]
Abstract
MOTIVATION Protein homology detection, a fundamental problem in computational biology, is an indispensable step toward predicting protein structures and understanding protein functions. Despite the advances in recent decades on sequence alignment, threading and alignment-free methods, protein homology detection remains a challenging open problem. Recently, network methods that try to find transitive paths in the protein structure space demonstrate the importance of incorporating network information of the structure space. Yet, current methods merge the sequence space and the structure space into a single space, and thus introduce inconsistency in combining different sources of information. METHOD We present a novel network-based protein homology detection method, CMsearch, based on cross-modal learning. Instead of exploring a single network built from the mixture of sequence and structure space information, CMsearch builds two separate networks to represent the sequence space and the structure space. It then learns sequence-structure correlation by simultaneously taking sequence information, structure information, sequence space information and structure space information into consideration. RESULTS We tested CMsearch on two challenging tasks, protein homology detection and protein structure prediction, by querying all 8332 PDB40 proteins. Our results demonstrate that CMsearch is insensitive to the similarity metrics used to define the sequence and the structure spaces. By using HMM-HMM alignment as the sequence similarity metric, CMsearch clearly outperforms state-of-the-art homology detection methods and the CASP-winning template-based protein structure prediction methods. AVAILABILITY AND IMPLEMENTATION Our program is freely available for download from http://sfb.kaust.edu.sa/Pages/Software.aspx CONTACT : xin.gao@kaust.edu.sa SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Xuefeng Cui
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
- Zhiwu Lu
- Beijing Key Laboratory of Big Data Management and Analysis Methods, School of Information, Renmin University of China, Beijing 100872, China
- Sheng Wang
- Toyota Technological Institute at Chicago, 6045 Kenwood Avenue, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, E. 58th St, Chicago, IL 60637, USA
- Jim Jing-Yan Wang
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
- Xin Gao
- King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Thuwal 23955-6900, Saudi Arabia
31
Khan FI, Wei DQ, Gu KR, Hassan MI, Tabrez S. Current updates on computer aided protein modeling and designing. Int J Biol Macromol 2016; 85:48-62. [DOI: 10.1016/j.ijbiomac.2015.12.072] [Citation(s) in RCA: 108] [Impact Index Per Article: 12.0] [Received: 08/21/2015] [Revised: 12/17/2015] [Accepted: 12/21/2015] [Indexed: 12/15/2022]
32
Alves JMP, de Oliveira AL, Sandberg TOM, Moreno-Gallego JL, de Toledo MAF, de Moura EMM, Oliveira LS, Durham AM, Mehnert DU, Zanotto PMDA, Reyes A, Gruber A. GenSeed-HMM: A Tool for Progressive Assembly Using Profile HMMs as Seeds and its Application in Alpavirinae Viral Discovery from Metagenomic Data. Front Microbiol 2016; 7:269. [PMID: 26973638 PMCID: PMC4777721 DOI: 10.3389/fmicb.2016.00269] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 10/29/2015] [Accepted: 02/19/2016] [Indexed: 01/01/2023]
Abstract
This work reports the development of GenSeed-HMM, a program that implements seed-driven progressive assembly, an approach to reconstruct specific sequences from unassembled data, starting from short nucleotide or protein seed sequences or profile Hidden Markov Models (HMM). The program can use any one of a number of sequence assemblers. Assembly is performed in multiple steps and relatively few reads are used in each cycle; consequently, the program demands low computational resources. As a proof-of-concept and to demonstrate the power of HMM-driven progressive assemblies, GenSeed-HMM was applied to metagenomic datasets in the search for diverse ssDNA bacteriophages from the recently described Alpavirinae subfamily. Profile HMMs were built using Alpavirinae-specific regions from multiple sequence alignments (MSA) using either the viral protein 1 (VP1; major capsid protein) or VP4 (genome replication initiation protein). These profile HMMs were used by GenSeed-HMM (running Newbler assembler) as seeds to reconstruct viral genomes from sequencing datasets of human fecal samples. All contigs obtained were annotated and taxonomically classified using similarity searches and phylogenetic analyses. The most specific profile HMM seed enabled the reconstruction of 45 partial or complete Alpavirinae genomic sequences. A comparison with conventional (global) assembly of the same original dataset, using Newbler in a standalone execution, revealed that GenSeed-HMM outperformed global genomic assembly in several metrics employed. This approach is capable of detecting organisms that have not been used in the construction of the profile HMM, which opens up the possibility of diagnosing novel viruses, without previous specific information, constituting a de novo diagnosis. Additional applications include, but are not limited to, the specific assembly of extrachromosomal elements such as plastid and mitochondrial genomes from metagenomic data.
Profile HMM seeds can also be used to reconstruct specific protein coding genes for gene diversity studies, and to determine all possible gene variants present in a metagenomic sample. Such surveys could be useful to detect the emergence of drug-resistance variants in sensitive environments such as hospitals and animal production facilities, where antibiotics are regularly used. Finally, GenSeed-HMM can be used as an adjunct for gap closure on assembly finishing projects, by using multiple contig ends as anchored seeds.
Affiliation(s)
- João M P Alves
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- André L de Oliveira
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Tatiana O M Sandberg
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Marcelo A F de Toledo
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Elisabeth M M de Moura
- Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Liliane S Oliveira
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil; Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
- Alan M Durham
- Department of Computer Science, Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil
- Dolores U Mehnert
- Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Paolo M de A Zanotto
- Department of Microbiology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
- Alejandro Reyes
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia; Center for Genome Sciences and Systems Biology, Department of Pathology and Immunology, Washington University in Saint Louis, MO, USA
- Arthur Gruber
- Department of Parasitology, Institute of Biomedical Sciences, University of São Paulo, São Paulo, Brazil
33
A protein domain-based view of the virosphere–host relationship. Biochimie 2015; 119:231-43. [DOI: 10.1016/j.biochi.2015.08.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/24/2015] [Accepted: 08/15/2015] [Indexed: 11/20/2022]
34
35
Ghouzam Y, Postic G, de Brevern AG, Gelly JC. Improving protein fold recognition with hybrid profiles combining sequence and structure evolution. Bioinformatics 2015; 31:3782-9. [PMID: 26254434 DOI: 10.1093/bioinformatics/btv462] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 05/04/2015] [Accepted: 08/02/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since protein structure is more conserved than sequence, the inclusion of structural information can improve the detection of remote homology. RESULTS Here, we present ORION, a new fold recognition method based on the pairwise comparison of hybrid profiles that contain evolutionary information from both protein sequence and structure. Our method uses the 16-state structural alphabet Protein Blocks, which provides an accurate 1D description of protein structure local conformations. ORION systematically outperforms PSI-BLAST and HHsearch on several benchmarks, including target sequences from the modeling competitions CASP8, 9 and 10, and detects ∼10% more templates at fold and superfamily SCOP levels. AVAILABILITY Software freely available for download at http://www.dsimb.inserm.fr/orion/. CONTACT jean-christophe.gelly@univ-paris-diderot.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yassine Ghouzam
- Inserm U1134, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France; Institut National de la Transfusion Sanguine, Paris, France; Laboratory of Excellence GR-Ex, Paris, France
- Guillaume Postic
- Inserm U1134, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France; Institut National de la Transfusion Sanguine, Paris, France; Laboratory of Excellence GR-Ex, Paris, France
- Alexandre G de Brevern
- Inserm U1134, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France; Institut National de la Transfusion Sanguine, Paris, France; Laboratory of Excellence GR-Ex, Paris, France
- Jean-Christophe Gelly
- Inserm U1134, Paris, France; Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France; Institut National de la Transfusion Sanguine, Paris, France; Laboratory of Excellence GR-Ex, Paris, France
36
Alternative approach to protein structure prediction based on sequential similarity of physical properties. Proc Natl Acad Sci U S A 2015; 112:5029-32. [PMID: 25848034 DOI: 10.1073/pnas.1504806112] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use. A comparison is made between our method and PSI-BLAST. We demonstrate that traditionally defined sequence similarity can be very low for pairs of sequences (which therefore cannot be identified using PSI-BLAST), but similarity of physical property distributions results in almost identical 3D structures. The performance of PFM is shown to be better than that of PSI-BLAST when sequence matching is comparable, based on a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets). It is also shown that PFM outperforms PSI-BLAST in informatically challenging targets.
37
Yang M, Wu Y, Jin S, Hou J, Mao Y, Liu W, Shen Y, Wu L. Flower bud transcriptome analysis of Sapium sebiferum (Linn.) Roxb. and primary investigation of drought induced flowering: pathway construction and G-quadruplex prediction based on transcriptome. PLoS One 2015; 10:e0118479. [PMID: 25738565 PMCID: PMC4349590 DOI: 10.1371/journal.pone.0118479] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 07/22/2014] [Accepted: 01/17/2015] [Indexed: 11/27/2022]
Abstract
Sapium sebiferum (Linn.) Roxb. (Chinese Tallow Tree) is a perennial woody tree whose seeds are rich in oil, which holds great potential for biodiesel production. Although it is a traditional woody oil plant, our understanding of S. sebiferum genetics and molecular biology remains scant. In this study, the first comprehensive transcriptome of the S. sebiferum flower has been generated by sequencing and de novo assembly. A total of 149,342 unigenes were generated from raw reads, of which 24,289 unigenes were successfully matched to public databases. A total of 61 MADS box genes and putative pathways involved in S. sebiferum flower development have been identified. An abiotic stress response network was also constructed in this work, in which 2,686 unigenes are involved. As for lipid biosynthesis, 161 unigenes have been identified in fatty acid (FA) and triacylglycerol (TAG) biosynthesis. In addition, G-quadruplexes in the RNA of S. sebiferum have been predicted. Notably, stress-induced flowering was observed in S. sebiferum for the first time. According to the results of semi-quantitative PCR, the expression trends of the flowering-related genes GA1, AP2 and CRY2 matched those of stress-related genes, such as GRX50435 and PRXⅡ39562. This transcriptome provides functional genomic information for further research on S. sebiferum, especially for genetic engineering to shorten the juvenile period and improve yield by regulating flower development. It also offers a useful database for research on other plants of the Euphorbiaceae family.
Affiliation(s)
- Minglei Yang
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- Ying Wu
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- School of Life Science, University of Science and Technology of China, Hefei, Anhui, People’s Republic of China
- College of Food and Bioengineering, Henan University of Science and Technology, Luoyang, Henan, People’s Republic of China
- Shan Jin
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- Jinyan Hou
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- Yingji Mao
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- School of Life Science, University of Science and Technology of China, Hefei, Anhui, People’s Republic of China
- Wenbo Liu
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- School of Life Science, University of Science and Technology of China, Hefei, Anhui, People’s Republic of China
- Yangcheng Shen
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- School of Life Science, Anhui University, Hefei, Anhui, People’s Republic of China
- Lifang Wu
- Key Laboratory of Ion Beam Bioengineering and Bioenergy Forest Research Center of State Forestry Administration, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, Anhui, People’s Republic of China
- School of Life Science, University of Science and Technology of China, Hefei, Anhui, People’s Republic of China
38
Ramakrishnan G, Ochoa-Montaño B, Raghavender US, Mudgal R, Joshi AG, Chandra NR, Sowdhamini R, Blundell TL, Srinivasan N. Enriching the annotation of Mycobacterium tuberculosis H37Rv proteome using remote homology detection approaches: insights into structure and function. Tuberculosis (Edinb) 2014; 95:14-25. [PMID: 25467293 DOI: 10.1016/j.tube.2014.10.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 06/24/2014] [Revised: 10/14/2014] [Accepted: 10/27/2014] [Indexed: 12/01/2022]
Abstract
The availability of the genome sequence of Mycobacterium tuberculosis H37Rv has encouraged determination of large numbers of protein structures and detailed definition of the biological information encoded therein; yet, the functions of many proteins in M. tuberculosis remain unknown. The emergence of multidrug resistant strains makes it a priority to exploit recent advances in homology recognition and structure prediction to re-analyse its gene products. Here we report the structural and functional characterization of gene products encoded in the M. tuberculosis genome, with the help of sensitive profile-based remote homology search and fold recognition algorithms resulting in an enhanced annotation of the proteome where 95% of the M. tuberculosis proteins were identified wholly or partly with information on structure or function. New information includes association of 244 proteins with 205 domain families and a separate set of new associations of folds with 64 proteins. Extending structural information across uncharacterized protein families represented in the M. tuberculosis proteome, by determining superfamily relationships between families of known and unknown structures, has contributed to an enhancement in the knowledge of structural content. In retrospect, such superfamily relationships have facilitated recognition of probable structure and/or function for several uncharacterized protein families, eventually aiding recognition of probable functions for homologous proteins corresponding to such families. A total of 183 gene products unique to mycobacteria could not be assigned any function; of these, 18 were determined to be M. tuberculosis-specific. Such pathogen-specific proteins are speculated to harbour virulence factors required for pathogenesis. A re-annotated proteome of M. tuberculosis, with greater completeness of annotated proteins and domain-assigned regions, provides a valuable basis for experimental endeavours designed to obtain a better understanding of pathogenesis and to accelerate the process of drug target discovery.
Affiliation(s)
- Gayatri Ramakrishnan
- Indian Institute of Science Mathematics Initiative, Indian Institute of Science, Bangalore 560012, India; Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
- Upadhyayula S Raghavender
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Gandhi Krishi Vignyan Kendra Campus, Bangalore 560065, India.
- Richa Mudgal
- Indian Institute of Science Mathematics Initiative, Indian Institute of Science, Bangalore 560012, India; Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
- Adwait G Joshi
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Gandhi Krishi Vignyan Kendra Campus, Bangalore 560065, India; Manipal University, Manipal, Karnataka 576104, India.
- Nagasuma R Chandra
- Department of Biochemistry, Indian Institute of Science, Bangalore 560012, India.
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Gandhi Krishi Vignyan Kendra Campus, Bangalore 560065, India.
- Tom L Blundell
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK.
39
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
40
Skewes-Cox P, Sharpton TJ, Pollard KS, DeRisi JL. Profile hidden Markov models for the detection of viruses within metagenomic sequence data. PLoS One 2014; 9:e105067. [PMID: 25140992 PMCID: PMC4139300 DOI: 10.1371/journal.pone.0105067] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Received: 02/16/2014] [Accepted: 07/20/2014] [Indexed: 01/01/2023]
Abstract
Rapid, sensitive, and specific virus detection is an important component of clinical diagnostics. Massively parallel sequencing enables new diagnostic opportunities that complement traditional serological and PCR-based techniques. While massively parallel sequencing promises the benefits of being more comprehensive and less biased than traditional approaches, it presents new analytical challenges, especially with respect to detection of pathogen sequences in metagenomic contexts. To a first approximation, the initial detection of viruses can be achieved simply through alignment of sequence reads or assembled contigs to a reference database of pathogen genomes with tools such as BLAST. However, recognition of highly divergent viral sequences is problematic, and may be further complicated by the inherently high mutation rates of some viral types, especially RNA viruses. In these cases, increased sensitivity may be achieved by leveraging position-specific information during the alignment process. Here, we constructed HMMER3-compatible profile hidden Markov models (profile HMMs) from all the virally annotated proteins in RefSeq in an automated fashion using a custom-built bioinformatic pipeline. We then tested the ability of these viral profile HMMs ("vFams") to accurately classify sequences as viral or non-viral. Cross-validation experiments with full-length gene sequences showed that the vFams were able to recall 91% of left-out viral test sequences without erroneously classifying any non-viral sequences into viral protein clusters. Thorough reanalysis of previously published metagenomic datasets with a set of the best-performing vFams showed that they were more sensitive than BLAST for detecting sequences originating from more distant relatives of known viruses. To facilitate the use of the vFams for rapid detection of remote viral homologs in metagenomic data, we provide two sets of vFams, comprising more than 4,000 vFams each, in the HMMER3 format.
We also provide the software necessary to build custom profile HMMs or update the vFams as more viruses are discovered (http://derisilab.ucsf.edu/software/vFam).
Affiliation(s)
- Peter Skewes-Cox
- Biological and Medical Informatics Graduate Program, University of California San Francisco, San Francisco, California, United States of America
- Departments of Medicine, Biochemistry and Biophysics, and Microbiology, University of California San Francisco, San Francisco, California, United States of America
- Howard Hughes Medical Institute, Bethesda, Maryland, United States of America
- Thomas J. Sharpton
- The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America
- Katherine S. Pollard
- The J. David Gladstone Institutes, University of California San Francisco, San Francisco, California, United States of America
- Institute for Human Genetics & Division of Biostatistics, University of California San Francisco, San Francisco, California, United States of America
- Joseph L. DeRisi
- Departments of Medicine, Biochemistry and Biophysics, and Microbiology, University of California San Francisco, San Francisco, California, United States of America
- Howard Hughes Medical Institute, Bethesda, Maryland, United States of America
41
Estellon J, Ollagnier de Choudens S, Smadja M, Fontecave M, Vandenbrouck Y. An integrative computational model for large-scale identification of metalloproteins in microbial genomes: a focus on iron-sulfur cluster proteins. Metallomics 2014; 6:1913-30. [PMID: 25117543 DOI: 10.1039/c4mt00156g] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
Metalloproteins represent a ubiquitous group of molecules which are crucial to the survival of all living organisms. While several metal-binding motifs have been defined, it remains challenging to confidently identify metalloproteins from primary protein sequences using computational approaches alone. Here, we describe a comprehensive strategy based on a machine learning approach to design and assess a penalized generalized linear model. We used this strategy to detect members of the iron-sulfur cluster protein family. A new category of descriptors, whose profile is based on profile hidden Markov models, encoding structural information was combined with public descriptors into a linear model. The model was trained and tested on distinct datasets composed of well-characterized iron-sulfur protein sequences, and the resulting model provided higher sensitivity compared to a motif-based approach, while maintaining a good level of specificity. Analysis of this linear model allows us to detect and quantify the contribution of each descriptor, providing us with a better understanding of this complex protein family along with valuable indications for further experimental characterization. Two newly-identified proteins, YhcC and YdiJ, were functionally validated as genuine iron-sulfur proteins, confirming the prediction. The computational model was then applied to over 550 prokaryotic genomes to screen for iron-sulfur proteomes; the results are publicly available at: . This study represents a proof-of-concept for the application of a penalized linear model to identify metalloprotein superfamilies on a large-scale. The application employed here, screening for iron-sulfur proteomes, provides new candidates for further biochemical and structural analysis as well as new resources for an extensive exploration of iron-sulfuromes in the microbial world.
Affiliation(s)
- Johan Estellon
- Univ. Grenoble Alpes, iRTSV-BGE, F-38000 Grenoble, France.
42
Wagner I, Volkmer M, Sharan M, Villaveces JM, Oswald F, Surendranath V, Habermann BH. morFeus: a web-based program to detect remotely conserved orthologs using symmetrical best hits and orthology network scoring. BMC Bioinformatics 2014; 15:263. [PMID: 25096057 PMCID: PMC4137093 DOI: 10.1186/1471-2105-15-263] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 05/19/2014] [Accepted: 07/21/2014] [Indexed: 02/04/2023]
Abstract
BACKGROUND Searching for the orthologs of a given protein or DNA sequence is one of the most important and most commonly used bioinformatics methods in biology. Programs like BLAST or the orthology search engine Inparanoid can be used to find orthologs when the similarity between two sequences is sufficiently high. However, they fail when the level of conservation is low. The detection of remotely conserved proteins often involves sophisticated manual intervention that is difficult to automate. RESULTS Here, we introduce morFeus, a search program to find remotely conserved orthologs. Based on relaxed sequence similarity searches, morFeus selects sequences based on the similarity of their alignments to the query, tests for orthology by iterative reciprocal BLAST searches and calculates a network score for the resulting network of orthologs that is a measure of orthology independent of the E-value. Detecting remotely conserved orthologs of a protein using morFeus thus requires no manual intervention. We demonstrate the performance of morFeus by comparing it to state-of-the-art orthology resources and methods. We provide an example of remotely conserved orthologs, which were experimentally shown to be functionally equivalent in the respective organisms and therefore meet the criteria of the orthology-function conjecture. CONCLUSIONS Based on our results, we conclude that morFeus is a powerful and specific search method for detecting remotely conserved orthologs. morFeus is freely available at http://bio.biochem.mpg.de/morfeus/. Its source code is available from Sourceforge.net (https://sourceforge.net/p/morfeus/). ELECTRONIC SUPPLEMENTARY MATERIAL The online version of this article (doi:10.1186/1471-2105-15-263) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Bianca H Habermann
- Max Planck Institute of Biochemistry, Am Klopferspitz 18, Martinsried 82152, Germany.
43
A comparative assessment and analysis of 20 representative sequence alignment methods for protein structure prediction. Sci Rep 2014; 3:2619. [PMID: 24018415 PMCID: PMC3965362 DOI: 10.1038/srep02619] [Citation(s) in RCA: 128] [Impact Index Per Article: 11.6] [Received: 07/01/2013] [Accepted: 08/22/2013] [Indexed: 11/08/2022]
Abstract
Protein sequence alignment is essential for template-based protein structure prediction and function annotation. We collect 20 sequence alignment algorithms, 10 published and 10 newly developed, which cover all representative sequence- and profile-based alignment approaches. These algorithms are benchmarked on 538 non-redundant proteins for protein fold recognition on a uniform template library. Results demonstrate a dominant advantage of profile-profile-based methods, which generate models with an average TM-score 26.5% higher than sequence-profile methods and 49.8% higher than sequence-sequence alignment methods. There is no obvious difference in results between methods with profiles generated from PSI-BLAST PSSMs and those with profiles generated from hidden Markov models. The accuracy of profile-profile alignments can be further improved by 9.6% or 21.4% when predicted or native structure features, respectively, are incorporated. Nevertheless, TM-scores from profile-profile methods including experimental structural features are still 37.1% lower than those from TM-align, demonstrating that the fold-recognition problem cannot be solved solely by improving the accuracy of structure feature predictions.
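The benchmark above compares models by TM-score. For a fixed superposition, the TM-score of a model against a target of length L_target is (1 / L_target) * sum_i 1 / (1 + (d_i / d0)^2), summed over aligned residue pairs with distance d_i, where d0 = 1.24 * (L_target - 15)^(1/3) - 1.8 Å. The sketch below computes only this per-superposition value from precomputed distances; the TM-score and TM-align programs additionally search for the superposition (and, in TM-align, the alignment) that maximizes it. The 0.5 Å lower bound on d0 for short chains is an assumption modeled on common implementations.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (Angstroms)."""
    # d0 = 1.24 * (L - 15)^(1/3) - 1.8; clamped below at 0.5 A
    # (assumption: mirrors common TM-score implementations for short chains).
    raw = 1.24 * max(target_length - 15, 0) ** (1.0 / 3.0) - 1.8
    return max(raw, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for ONE fixed superposition.

    distances: distances (A) between aligned residue pairs after superposition.
    target_length: length of the target structure, used for normalisation.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


if __name__ == "__main__":
    print(tm_score([0.0] * 100, 100))  # every residue superimposed exactly -> 1.0
    print(tm_score([0.0] * 50, 100))   # only half the target aligned -> 0.5
```

Because the sum is divided by the full target length, unaligned residues contribute nothing, so short partial alignments are penalized even when the aligned part fits perfectly.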
44
Piao H, Froula J, Du C, Kim TW, Hawley ER, Bauer S, Wang Z, Ivanova N, Clark DS, Klenk HP, Hess M. Identification of novel biomass-degrading enzymes from genomic dark matter: Populating genomic sequence space with functional annotation. Biotechnol Bioeng 2014; 111:1550-65. [PMID: 24728961 DOI: 10.1002/bit.25250] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 11/23/2013] [Revised: 02/21/2014] [Accepted: 03/24/2014] [Indexed: 11/06/2022]
Abstract
Although recent nucleotide sequencing technologies have significantly enhanced our understanding of microbial genomes, the function of ∼35% of genes identified in a genome currently remains unknown. To improve the understanding of microbial genomes, and consequently of microbial processes, it will be crucial to assign a function to this "genomic dark matter." Due to the urgent need for additional carbohydrate-active enzymes for improved production of transportation fuels from lignocellulosic biomass, we screened the genomes of more than 5,500 microorganisms for hypothetical proteins located near already known cellulases. We identified, synthesized and expressed a total of 17 putative cellulase genes whose sequence similarity to currently known cellulases was too low for them to be identified as such by traditional annotation techniques that rely on significant sequence similarity. The recombinant proteins of the newly identified putative cellulases were subjected to enzymatic activity assays to verify their hydrolytic activity towards cellulose and lignocellulosic biomass. Eleven (65%) of the tested enzymes had significant activity towards at least one of the substrates. This high success rate highlights that a gene context-based approach can be used to assign function to genes that are otherwise categorized as "genomic dark matter" and to identify biomass-degrading enzymes that have little sequence similarity to already known cellulases. The ability to assign function to genes that have no related sequence representatives with functional annotation will be important to enhance our understanding of microbial processes and to identify microbial proteins for a wide range of applications.
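The gene-context screen described above can be sketched as a neighborhood search over ordered gene annotations. This is a hypothetical illustration only: the `Gene` record, the exact annotation strings and the default `window` of 5 genes are assumptions, not parameters reported by the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    gene_id: str
    contig: str
    index: int       # order of the gene along its contig (0, 1, 2, ...)
    annotation: str  # e.g. "cellulase" or "hypothetical protein"


def hypotheticals_near_cellulases(genes: List[Gene], window: int = 5) -> List[str]:
    """IDs of hypothetical proteins within `window` genes of a known cellulase on the same contig."""
    # Anchors: genes already annotated as cellulases (simple substring match, an assumption).
    anchors = [g for g in genes if "cellulase" in g.annotation.lower()]
    hits = []
    for g in genes:
        if g.annotation.lower() != "hypothetical protein":
            continue
        # Keep the candidate if any anchor sits on the same contig within the window.
        if any(a.contig == g.contig and abs(a.index - g.index) <= window for a in anchors):
            hits.append(g.gene_id)
    return hits


if __name__ == "__main__":
    genes = [
        Gene("g1", "c1", 0, "cellulase"),
        Gene("g2", "c1", 2, "hypothetical protein"),
        Gene("g3", "c1", 10, "hypothetical protein"),
        Gene("g4", "c2", 1, "hypothetical protein"),
    ]
    print(hypotheticals_near_cellulases(genes))  # -> ['g2']
```

Candidates found this way are only hypotheses; as the abstract notes, they still have to be synthesized, expressed and assayed before any function is assigned.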
Affiliation(s)
- Hailan Piao
- School of Molecular Biosciences, Washington State University, Richland, Washington, 99352; Pacific Northwest National Laboratory, Richland, Washington
45
Lopes FM, Ray SS, Hashimoto RF, Cesar RM. Entropic Biological Score: a cell cycle investigation for GRNs inference. Gene 2014; 541:129-37. [DOI: 10.1016/j.gene.2014.03.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/13/2013] [Revised: 02/17/2014] [Accepted: 03/05/2014] [Indexed: 12/21/2022]
46
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints for identifying the putative fold of a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some of which are characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review developments in the field of local structure prediction and especially their implications for protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
47
Ma J, Wang S, Wang Z, Xu J. MRFalign: protein homology detection through alignment of Markov random fields. PLoS Comput Biol 2014; 10:e1003500. [PMID: 24675572 PMCID: PMC3967925 DOI: 10.1371/journal.pcbi.1003500] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 10/27/2013] [Accepted: 01/08/2014] [Indexed: 11/24/2022]
Abstract
Sequence-based protein homology detection has been extensively studied, and so far the most sensitive methods are based on the comparison of protein sequence profiles, which are derived from a multiple sequence alignment (MSA) of sequence homologs in a protein family. A sequence profile is usually represented as a position-specific scoring matrix (PSSM) or a hidden Markov model (HMM), and accordingly PSSM-PSSM or HMM-HMM comparison is used for homolog detection. This paper presents a new homology detection method, MRFalign, consisting of three key components: 1) a Markov random field (MRF) representation of a protein family; 2) a scoring function measuring the similarity of two MRFs; and 3) an efficient ADMM (Alternating Direction Method of Multipliers) algorithm for aligning two MRFs. Compared to HMMs, which can only model very short-range residue correlations, MRFs can model long-range residue interaction patterns and thus encode information about the global 3D structure of a protein family. Consequently, MRF-MRF comparison for remote homology detection should be much more sensitive than HMM-HMM or PSSM-PSSM comparison. Experiments confirm that MRFalign outperforms several popular HMM- or PSSM-based methods in terms of both alignment accuracy and remote homology detection, and that MRFalign works particularly well for mainly beta proteins. For example, tested on the SCOP40 benchmark (8353 proteins) for homology detection, PSSM-PSSM and HMM-HMM succeed on 48% and 52% of proteins, respectively, at the superfamily level, and on 15% and 27% of proteins, respectively, at the fold level. In contrast, MRFalign succeeds on 57.3% and 42.5% of proteins at the superfamily and fold levels, respectively. This study implies that long-range residue interaction patterns are very helpful for sequence-based homology detection. The software is available for download at http://raptorx.uchicago.edu/download/. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5.
Sequence-based protein homology detection has been extensively studied, but it remains very challenging for remote homologs with divergent sequences. So far the most sensitive methods employ HMM-HMM comparison, which models a protein family using an HMM (hidden Markov model) and then detects homologs using HMM-HMM alignment. HMMs cannot model long-range residue interaction patterns and thus carry very little information about the global 3D structure of a protein family. As such, HMM comparison is not sensitive enough for distantly related homologs. In this paper, we present an MRF-MRF comparison method for homology detection. In particular, we model a protein family using Markov random fields (MRFs) and then detect homologs by MRF-MRF alignment. Compared to HMMs, MRFs are able to model long-range residue interaction patterns and thus contain information about the overall 3D structure of a protein family. Consequently, MRF-MRF comparison is much more sensitive than HMM-HMM comparison. To implement MRF-MRF comparison, we have developed a new scoring function to measure the similarity of two MRFs and an efficient ADMM algorithm to optimize the scoring function. Experiments confirm that MRF-MRF comparison indeed outperforms HMM-HMM comparison in terms of both alignment accuracy and remote homology detection, especially for mainly beta proteins.
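The profile-profile comparison that MRFalign is benchmarked against can be illustrated with a minimal global alignment of two sequence profiles. This sketch is not MRFalign (which aligns MRFs, including pairwise residue terms, via ADMM) and not any specific published PSSM-PSSM or HMM-HMM scorer: it assumes each profile column is an amino-acid frequency vector, scores two columns by their dot product, and uses an arbitrary linear gap penalty of -0.5.

```python
from typing import List, Sequence, Tuple

# One probability vector (e.g. over 20 amino acids) per profile column.
Profile = Sequence[Sequence[float]]


def column_score(p: Sequence[float], q: Sequence[float]) -> float:
    """Dot product of two column distributions (one simple profile-profile score)."""
    return sum(a * b for a, b in zip(p, q))


def align_profiles(x: Profile, y: Profile, gap: float = -0.5) -> Tuple[float, List[Tuple[int, int]]]:
    """Global (Needleman-Wunsch) alignment of two profiles with a linear gap penalty.

    Returns the optimal score and the aligned column pairs (i, j).
    """
    n, m = len(x), len(y)
    # dp[i][j] = best score for aligning x[:i] with y[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(x[i - 1], y[j - 1]),  # match columns
                dp[i - 1][j] + gap,                                   # gap in y
                dp[i][j - 1] + gap,                                   # gap in x
            )
    # Traceback: recompute the same expressions to find which move produced dp[i][j].
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + column_score(x[i - 1], y[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    # Toy 3-letter alphabet: each column is a one-hot "profile".
    x = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    y = [[1, 0, 0], [0, 0, 1]]
    print(align_profiles(x, y))  # -> (1.5, [(0, 0), (2, 1)])
```

Each column here is scored independently of every other column, which is precisely the limitation the abstract attributes to PSSM- and HMM-based comparison; MRFalign's scoring function additionally rewards matching long-range residue interaction patterns.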
Affiliation(s)
- Jianzhu Ma
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Sheng Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Zhiyong Wang
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois, United States of America
48
Trunk cleavage is essential for Drosophila terminal patterning and can occur independently of Torso-like. Nat Commun 2014; 5:3419. [DOI: 10.1038/ncomms4419] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/22/2013] [Accepted: 02/10/2014] [Indexed: 02/07/2023]
49
Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins. Reference Module in Chemistry, Molecular Sciences and Chemical Engineering 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art with a number of specific examples.
50
Novotna J, Olsovska J, Novak P, Mojzes P, Chaloupkova R, Kamenik Z, Spizek J, Kutejova E, Mareckova M, Tichy P, Damborsky J, Janata J. Lincomycin biosynthesis involves a tyrosine hydroxylating heme protein of an unusual enzyme family. PLoS One 2013; 8:e79974. [PMID: 24324587 PMCID: PMC3851162 DOI: 10.1371/journal.pone.0079974] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 03/25/2013] [Accepted: 10/07/2013] [Indexed: 11/18/2022]
Abstract
The gene lmbB2 of the lincomycin biosynthetic gene cluster of Streptomyces lincolnensis ATCC 25466 was shown to code for an unusual tyrosine hydroxylating enzyme involved in the biosynthetic pathway of this clinically important antibiotic. LmbB2 was expressed in Escherichia coli, purified to near homogeneity and shown to convert tyrosine to 3,4-dihydroxyphenylalanine (DOPA). In contrast to the well-known tyrosine hydroxylases (EC 1.14.16.2) and tyrosinases (EC 1.14.18.1), LmbB2 was identified as a heme protein. Mass spectrometry and Soret band-excited Raman spectroscopy of LmbB2 showed that LmbB2 contains heme b as a prosthetic group. The CO-reduced differential absorption spectra of LmbB2 showed that the coordination of Fe was different from that of cytochrome P450 enzymes. LmbB2 exhibits sequence similarity to Orf13 of the anthramycin biosynthetic gene cluster, which has recently been classified as a heme peroxidase. Tyrosine hydroxylating activity of LmbB2 yielding DOPA in the presence of (6R)-5,6,7,8-tetrahydro-L-biopterin (BH4) was also observed. The reaction mechanism of this unique heme peroxidase family is discussed. Tyrosine hydroxylation was also confirmed as the first step of the amino acid branch of lincomycin biosynthesis.
Affiliation(s)
- Jitka Novotna
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Central-European Technology Institute, Brno, Czech Republic
- Crop Research Institute, Drnovska Prague, Czech Republic
- Jana Olsovska
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Petr Novak
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Peter Mojzes
- Institute of Physics, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- Radka Chaloupkova
- Loschmidt Laboratories, Institute of Experimental Biology and National Centre for Biomolecular Research, Brno, Czech Republic
- Zdenek Kamenik
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Jaroslav Spizek
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Eva Kutejova
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovak Republic
- Pavel Tichy
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic
- Jiri Damborsky
- Loschmidt Laboratories, Institute of Experimental Biology and National Centre for Biomolecular Research, Brno, Czech Republic
- Jiri Janata
- Institute of Microbiology, Academy of Sciences of the Czech Republic, Prague, Czech Republic