1. Scarano N, Espinoza S, Brullo C, Cichero E. Computational Methods for the Discovery and Optimization of TAAR1 and TAAR5 Ligands. Int J Mol Sci 2024; 25:8226. [PMID: 39125796 PMCID: PMC11312273 DOI: 10.3390/ijms25158226] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/25/2024] [Accepted: 07/25/2024] [Indexed: 08/12/2024] Open
Abstract
G-protein-coupled receptors (GPCRs) represent a family of druggable targets when treating several diseases and continue to be a leading part of the drug discovery process. Trace amine-associated receptors (TAARs) are GPCRs involved in many physiological functions with TAAR1 having important roles within the central nervous system (CNS). By using homology modeling methods, the responsiveness of TAAR1 to endogenous and synthetic ligands has been explored. In addition, the discovery of different chemo-types as selective murine and/or human TAAR1 ligands has helped in the understanding of the species-specificity preferences. The availability of TAAR1-ligand complexes sheds light on how different ligands bind TAAR1. TAAR5 is considered an olfactory receptor but has specific involvement in some brain functions. In this case, the drug discovery effort has been limited. Here, we review the successful computational efforts developed in the search for novel TAAR1 and TAAR5 ligands. A specific focus on applying structure-based and/or ligand-based methods has been done. We also give a perspective of the experimental data available to guide the future drug design of new ligands, probing species-specificity preferences towards more selective ligands. Hints for applying repositioning approaches are also discussed.
Affiliation(s)
- Naomi Scarano
  - Department of Pharmacy, Section of Medicinal Chemistry, School of Medical and Pharmaceutical Sciences, University of Genoa, Viale Benedetto XV, 3, 16132 Genoa, Italy; (N.S.); (C.B.)
- Stefano Espinoza
  - Department of Health Sciences and Research Center on Autoimmune and Allergic Diseases (CAAD), University of Piemonte Orientale (UPO), 28100 Novara, Italy;
  - Central RNA Laboratory, Istituto Italiano di Tecnologia (IIT), 16152 Genova, Italy
- Chiara Brullo
  - Department of Pharmacy, Section of Medicinal Chemistry, School of Medical and Pharmaceutical Sciences, University of Genoa, Viale Benedetto XV, 3, 16132 Genoa, Italy; (N.S.); (C.B.)
- Elena Cichero
  - Department of Pharmacy, Section of Medicinal Chemistry, School of Medical and Pharmaceutical Sciences, University of Genoa, Viale Benedetto XV, 3, 16132 Genoa, Italy; (N.S.); (C.B.)
2. Singh A, Amod A, Mulpuru V, Mishra N, Sahoo AK, Samanta SK. Finding Novel AMPs Secreted from the Human Microbiome as Potent Antibacterial and Antibiofilm Agents and Studying Their Synergistic Activity with Ag NCs. ACS APPLIED BIO MATERIALS 2023; 6:3674-3682. [PMID: 37603700 DOI: 10.1021/acsabm.3c00302] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 08/23/2023]
Abstract
Due to the enhanced resistance of bacteria to antibiotics, researchers always try to find effective alternatives to treat drug-resistant bacterial infections. In this context, we have explored antimicrobial peptides (AMPs), which are a broad class of small peptide molecules, and investigated their efficacy as potent antibacterial and antibiofilm agents. AMPs can cause cell death either through disruption of the cell membrane or by inhibiting vital intracellular functions, by binding to RNA, DNA, or intracellular components upon transversion through the cell membrane. We attempted to find potent intracellular cationic AMPs that can demonstrate antibacterial activity through interaction with DNA. As a source of AMPs, we have utilized those that are secreted from the human microbiome with the anticipation that these will be non-toxic in nature. Out of the total 1087 AMPs, 27 were screened on the basis of amino acid length and efficacy to cross the cell membrane barrier. From the list of 27 peptides, 4 candidates were selected through the docking score of these peptides with the DNA binding domain of H2A proteins. Further, the molecular dynamics simulation analysis demonstrated that 2 AMPs, i.e., peptides 7 and 25, are having considerable membrane permeation and DNA binding ability. Further, the in vitro analysis indicated that both peptides 7 and 25 could exhibit potent antibacterial and antibiofilm activities. In order to further enhance the antibiofilm potency, the above AMPs were used as supplements to silver nanoclusters (Ag NCs) to get synergistic activity. The synergistic activity of Ag NCs was found to be significantly increased with both the above AMPs.
Affiliation(s)
- Anirudh Singh
  - Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
- Ayush Amod
  - Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
- Viswajit Mulpuru
  - Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
- Nidhi Mishra
  - Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
- Amaresh Kumar Sahoo
  - Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
- Sintu Kumar Samanta
  - Department of Applied Sciences, Indian Institute of Information Technology Allahabad, Allahabad 211012, Uttar Pradesh, India
3. Mehta S, Bernt M, Chambers M, Fahrner M, Föll MC, Gruening B, Horro C, Johnson JE, Loux V, Rajczewski AT, Schilling O, Vandenbrouck Y, Gustafsson OJR, Thang WCM, Hyde C, Price G, Jagtap PD, Griffin TJ. A Galaxy of informatics resources for MS-based proteomics. Expert Rev Proteomics 2023; 20:251-266. [PMID: 37787106 DOI: 10.1080/14789450.2023.2265062] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/05/2023] [Accepted: 09/06/2023] [Indexed: 10/04/2023]
Abstract
INTRODUCTION: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements included availability of software that would span diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains these software and associated training resources to empower researcher-driven analyses. EXPERT OPINION: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.
Affiliation(s)
- Subina Mehta
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Matthias Bernt
  - Helmholtz Centre for Environmental Research - UFZ, Department Computational Biology, Leipzig, Germany
- Matthias Fahrner
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Melanie Christine Föll
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Bjoern Gruening
  - Bioinformatics Group, Department of Computer Science, Albert-Ludwigs-University Freiburg, Freiburg, Germany
- Carlos Horro
  - Proteomics Unit, Department of Biomedicine, University of Bergen, Bergen, Norway
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- James E Johnson
  - Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN, USA
- Valentin Loux
  - Université Paris-Saclay, INRAE, MaIAGE, Jouy-en-Josas, France
  - Université Paris-Saclay, INRAE, BioinfOmics, MIGALE bioinformatics facility, Jouy-en-Josas, France
- Andrew T Rajczewski
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Oliver Schilling
  - Institute for Surgical Pathology, Medical Center - University of Freiburg, Freiburg, Germany
  - German Cancer Consortium (DKTK) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- W C Mike Thang
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Cameron Hyde
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Sippy Downs, University of the Sunshine Coast, Australia
- Gareth Price
  - Queensland Cyber Infrastructure Foundation (QCIF), Australia
  - Institute of Molecular Bioscience, University of Queensland, St Lucia, Australia
- Pratik D Jagtap
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
- Timothy J Griffin
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, USA
4. Agrò SN, Rozza R, Movilla S, Aupič J, Magistrato A. Molecular Dynamics Simulations Elucidate the Molecular Basis of Pre-mRNA Translocation by the Prp2 Spliceosomal Helicase. J Chem Inf Model 2023. [PMID: 37379492 DOI: 10.1021/acs.jcim.3c00585] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/30/2023]
Abstract
The spliceosome machinery catalyzes precursor-messenger RNA (pre-mRNA) splicing by undergoing at each splicing cycle assembly, activation, catalysis, and disassembly processes, thanks to the concerted action of specific RNA-dependent ATPases/helicases. Prp2, a member of the DExH-box ATPase/helicase family, harnesses the energy of ATP hydrolysis to translocate a single pre-mRNA strand in the 5' to 3' direction, thus promoting spliceosome remodeling to its catalytic-competent state. Here, we established the functional coupling between ATPase and helicase activities of Prp2. Namely, extensive multi-μs molecular dynamics simulations allowed us to unlock how, after pre-mRNA selection, ATP binding, hydrolysis, and dissociation induce a functional typewriter-like rotation of the Prp2 C-terminal domain. This movement, endorsed by an iterative swing of interactions established between specific Prp2 residues with the nucleobases at 5'- and 3'-ends of pre-mRNA, promotes pre-mRNA translocation. Notably, some of these Prp2 residues are conserved in the DExH-box family, suggesting that the translocation mechanism elucidated here may be applicable to all DExH-box helicases.
Affiliation(s)
- Sefora Naomi Agrò
  - National Research Council of Italy (CNR)─Institute of Material (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Riccardo Rozza
  - National Research Council of Italy (CNR)─Institute of Material (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Santiago Movilla
  - BioComp Group, Institute of Advanced Materials (INAM), Universitat Jaume I, 12071 Castellón, Spain
- Jana Aupič
  - National Research Council of Italy (CNR)─Institute of Material (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
- Alessandra Magistrato
  - National Research Council of Italy (CNR)─Institute of Material (IOM) c/o International School for Advanced Studies (SISSA), Via Bonomea, 265, 34136 Trieste, Italy
5. Yao Y, Frith MC. Improved DNA-Versus-Protein Homology Search for Protein Fossils. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1691-1699. [PMID: 35617174 DOI: 10.1109/tcbb.2022.3177855] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/15/2023]
Abstract
Protein fossils, i.e., noncoding DNA descended from coding DNA, arise frequently from transposable elements (TEs), decayed genes, and viral integrations. They can reveal, and mislead about, evolutionary history and relationships. They have been detected by comparing DNA to protein sequences, but current methods are not optimized for this task. We describe a powerful DNA-protein homology search method. We use a 64×21 substitution matrix, which is fitted to sequence data, automatically learning the genetic code. We detect subtly homologous regions by considering alternative possible alignments between them, and calculate significance (probability of occurring by chance between random sequences). Our method detects TE protein fossils much more sensitively than blastx, and faster. Of the ∼ 7 major categories of eukaryotic TE, three were long thought absent in mammals: we find two of them in the human genome, polinton and DIRS/Ngaro. This method increases our power to find ancient fossils, and perhaps to detect non-standard genetic codes. The alternative-alignments and significance paradigm is not specific to DNA-protein comparison, and could benefit homology search generally. This is an extended version of a conference paper (Yao & Frith, 2021).
6. Movilla S, Roca M, Moliner V, Magistrato A. Molecular Basis of RNA-Driven ATP Hydrolysis in DExH-Box Helicases. J Am Chem Soc 2023; 145:6691-6701. [PMID: 36926902 DOI: 10.1021/jacs.2c11980] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 03/18/2023]
Abstract
The spliceosome machinery catalyzes precursor messenger (pre-m)RNA splicing. In each cycle, the spliceosome experiences massive compositional and conformational remodeling fueled by the concerted action of specific RNA-dependent ATPases/helicases. Intriguingly, these enzymes are allosterically activated to perform ATP hydrolysis and trigger helicase activity only upon pre-mRNA binding. Yet, the molecular mechanism underlying the RNA-driven regulation of their ATPase function remains elusive. Here, we focus on the Prp2 ATPase/helicase which contributes to reshaping the spliceosome into its catalytic competent state. By performing classical and quantum-classical molecular dynamics simulations, we unprecedentedly unlock the molecular terms governing the Prp2 ATPase/helicase function. Namely, we dissect the molecular mechanism of ATP hydrolysis, and we disclose that RNA binding allosterically triggers the formation of a set of interactions linking the RNA binding tunnel to the catalytic site. This activates the Prp2's ATPase function by optimally placing the nucleophilic water and the general base of the enzymatic process to perform ATP hydrolysis. The key structural motifs, mechanically coupling RNA gripping and the ATPase/helicase functions, are conserved across all DExH-box helicases. This mechanism could thus be broadly applicable to all DExH-box helicase family.
Affiliation(s)
- Santiago Movilla
  - BioComp Group, Institute of Advanced Materials (INAM), Universitat Jaume I, 12071 Castellón, Spain
- Maite Roca
  - BioComp Group, Institute of Advanced Materials (INAM), Universitat Jaume I, 12071 Castellón, Spain
- Vicent Moliner
  - BioComp Group, Institute of Advanced Materials (INAM), Universitat Jaume I, 12071 Castellón, Spain
- Alessandra Magistrato
  - Department National Research Council of Italy (CNR), Institute of Material (IOM) c/o International School for Advanced Studies (SISSA), 34136 Trieste, Italy
7. Lykholat YV, Rabokon AM, Blume RY, Khromykh NO, Didur OO, Sakharova VH, Kabar AM, Pirko YV, Blume YB. Characterization of β-Tubulin Genes in Prunus persica and Prunus dulcis for Fingerprinting of their Interspecific Hybrids. CYTOL GENET+ 2022. [DOI: 10.3103/s009545272206007x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
8. Bennison SA, Liu X, Toyo-Oka K. Nuak kinase signaling in development and disease of the central nervous system. Cell Signal 2022; 100:110472. [PMID: 36122883 DOI: 10.1016/j.cellsig.2022.110472] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/01/2022] [Revised: 09/11/2022] [Accepted: 09/13/2022] [Indexed: 01/14/2023]
Abstract
Protein kinases represent important signaling hubs for a variety of biological functions. Many kinases are traditionally studied for their roles in cancer cell biology, but recent advances in neuroscience research show repurposed kinase function to be important for nervous system development and function. Two members of the AMP-activated protein kinase (AMPK) related family, NUAK1 and NUAK2, have drawn attention in neuroscience due to their mutations in autism spectrum disorder (ASD), attention deficit hyperactivity disorder (ADHD), schizophrenia, and intellectual disability (ID). Furthermore, Nuak kinases have also been implicated in tauopathy and other disorders of aging. This review highlights what is known about the Nuak kinases in nervous system development and disease and explores the possibility of Nuak kinases as targets for therapeutic innovation.
Affiliation(s)
- Sarah A Bennison
  - Department of Neurobiology and Anatomy, Drexel University College of Medicine, Philadelphia, PA 19129, USA
- Xiaonan Liu
  - Department of Pharmacology and Physiology, Drexel University College of Medicine, Philadelphia, PA 19102, USA
- Kazuhito Toyo-Oka
  - Department of Neurobiology and Anatomy, Drexel University College of Medicine, Philadelphia, PA 19129, USA.
9. Enzingmüller-Bleyl TC, Boden JS, Herrmann AJ, Ebel KW, Sánchez-Baracaldo P, Frankenberg-Dinkel N, Gehringer MM. On the trail of iron uptake in ancestral Cyanobacteria on early Earth. GEOBIOLOGY 2022; 20:776-789. [PMID: 35906866 DOI: 10.1111/gbi.12515] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/16/2022] [Revised: 06/28/2022] [Accepted: 07/06/2022] [Indexed: 06/15/2023]
Abstract
Cyanobacteria oxygenated Earth's atmosphere ~2.4 billion years ago, during the Great Oxygenation Event (GOE), through oxygenic photosynthesis. Their high iron requirement was presumably met by high levels of Fe(II) in the anoxic Archean environment. We found that many deeply branching Cyanobacteria, including two Gloeobacter and four Pseudanabaena spp., cannot synthesize the Fe(II) specific transporter, FeoB. Phylogenetic and relaxed molecular clock analyses find evidence that FeoB and the Fe(III) transporters, cFTR1 and FutB, were present in Proterozoic, but not earlier Archaean lineages of Cyanobacteria. Furthermore Pseudanabaena sp. PCC7367, an early diverging marine, benthic strain grown under simulated Archean conditions, constitutively expressed cftr1, even after the addition of Fe(II). Our genetic profiling suggests that, prior to the GOE, ancestral Cyanobacteria may have utilized alternative metal iron transporters such as ZIP, NRAMP, or FicI, and possibly also scavenged exogenous siderophore bound Fe(III), as they only acquired the necessary Fe(II) and Fe(III) transporters during the Proterozoic. Given that Cyanobacteria arose 3.3-3.6 billion years ago, it is possible that limitations in iron uptake may have contributed to the delay in their expansion during the Archean, and hence the oxygenation of the early Earth.
Affiliation(s)
- Joanne S Boden
  - School of Geographical Sciences, Faculty of Science, University of Bristol, Bristol, UK
  - School of Earth and Environmental Sciences, University of St. Andrews, St. Andrews, UK
- Achim J Herrmann
  - Department of Microbiology, University of Kaiserslautern, Kaiserslautern, Germany
- Katharina W Ebel
  - Department of Microbiology, University of Kaiserslautern, Kaiserslautern, Germany
- Michelle M Gehringer
  - Department of Microbiology, University of Kaiserslautern, Kaiserslautern, Germany
10. Blume R, Yemets A, Korkhovyi V, Radchuk V, Rakhmetov D, Blume Y. Genome-wide identification and analysis of the cytokinin oxidase/dehydrogenase (ckx) gene family in finger millet (Eleusine coracana). Front Genet 2022; 13:963789. [PMID: 36299586 PMCID: PMC9589517 DOI: 10.3389/fgene.2022.963789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Accepted: 08/05/2022] [Indexed: 11/13/2022] Open
Abstract
Cytokinin dehydrogenase/oxidase (CKX) enzymes play a key role in regulating cytokinin (CK) levels in plants by degrading the excess of this phytohormone. CKX genes have proven an attractive target for genetic engineering, as their silencing boosts cytokinin accumulation in various tissues, thereby contributing to a rapid increase in biomass and overall plant productivity. We previously reported a similar effect in finger millet (Eleusine coracana) somaclonal lines, caused by downregulation of EcCKX1 and EcCKX2. However, the CKX gene family has numerous representatives, especially in allopolyploid crop species, such as E. coracana. To date, the entire CKX gene family of E. coracana and its related species has not been characterized. We offer here, for the first time, a comprehensive genome-wide identification and analysis of a panel of CKX genes in finger millet. The functional genes identified in the E. coracana genome are compared with the previously-identified genes, EcCKX1 and EcCKX2. Exon-intron structural analysis and motif analysis of FAD- and CK-binding domains are performed. The phylogeny of the EcCKX genes suggests that CKX genes are divided into several distinct groups, corresponding to certain isotypes. Finally, the phenotypic effect of EcCKX1 and EcCKX2 in partially silencing the SE7 somaclonal line is investigated, showing that lines deficient in CKX-expression demonstrate increased grain yield and greater bushiness, enhanced biomass accumulation, and a shorter vegetation cycle.
Affiliation(s)
- Rostyslav Blume
  - Department of Population Genetics, Institute of Food Biotechnology and Genomics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Alla Yemets
  - Department of Cell Biology and Biotechnology, Institute of Food Biotechnology and Genomics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Vitaliy Korkhovyi
  - Department of Cell Biology and Biotechnology, Institute of Food Biotechnology and Genomics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Volodymyr Radchuk
  - Leibniz Institute of Plant Genetics and Crop Plant Research, Gatersleben, Germany
- Dzhamal Rakhmetov
  - M. M. Gryshko National Botanic Garden of National Academy of Sciences of Ukraine, Kyiv, Ukraine
- Yaroslav Blume
  - Department of Genomics and Molecular Biotechnology, Institute of Food Biotechnology and Genomics, National Academy of Sciences of Ukraine, Kyiv, Ukraine
11. Cophylogeny and convergence shape holobiont evolution in sponge-microbe symbioses. Nat Ecol Evol 2022; 6:750-762. [PMID: 35393600 DOI: 10.1038/s41559-022-01712-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 07/21/2021] [Accepted: 02/24/2022] [Indexed: 02/07/2023]
Abstract
Symbiotic microbial communities of sponges serve critical functions that have shaped the evolution of reef ecosystems since their origins. Symbiont abundance varies tremendously among sponges, with many species classified as either low microbial abundance (LMA) or high microbial abundance (HMA), but the evolutionary dynamics of these symbiotic states remain unknown. This study examines the LMA/HMA dichotomy across an exhaustive sampling of Caribbean sponge biodiversity and predicts that the LMA symbiotic state is the ancestral state among sponges. Conversely, HMA symbioses, consisting of more specialized microorganisms, have evolved multiple times by recruiting similar assemblages, mostly since the rise of scleractinian-dominated reefs. Additionally, HMA symbioses show stronger signals of phylosymbiosis and cophylogeny, consistent with stronger co-evolutionary interaction in these complex holobionts. These results indicate that HMA holobionts are characterized by increased endemism, metabolic dependence and chemical defences. The selective forces driving these patterns may include the concurrent increase in dissolved organic matter in reef ecosystems or the diversification of spongivorous fishes.
12. Tarone L, Giacobino D, Camerino M, Ferrone S, Buracco P, Cavallo F, Riccardo F. Canine Melanoma Immunology and Immunotherapy: Relevance of Translational Research. Front Vet Sci 2022; 9:803093. [PMID: 35224082 PMCID: PMC8873926 DOI: 10.3389/fvets.2022.803093] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/27/2021] [Accepted: 01/10/2022] [Indexed: 11/17/2022] Open
Abstract
In veterinary oncology, canine melanoma is still a fatal disease for which innovative and long-lasting curative treatments are urgently required. Considering the similarities between canine and human melanoma and the clinical revolution that immunotherapy has instigated in the treatment of human melanoma patients, special attention must be paid to advancements in tumor immunology research in the veterinary field. Herein, we aim to discuss the most relevant knowledge on the immune landscape of canine melanoma and the most promising immunotherapeutic approaches under investigation. Particular attention will be dedicated to anti-cancer vaccination, and, especially, to the encouraging clinical results that we have obtained with DNA vaccines directed against chondroitin sulfate proteoglycan 4 (CSPG4), which is an appealing tumor-associated antigen with a key oncogenic role in both canine and human melanoma. In parallel with advances in therapeutic options, progress in the identification of easily accessible biomarkers to improve the diagnosis and the prognosis of melanoma should be sought, with circulating small extracellular vesicles emerging as strategically relevant players. Translational advances in melanoma management, whether achieved in the human or veterinary fields, may drive improvements with mutual clinical benefits for both human and canine patients; this is where the strength of comparative oncology lies.
Affiliation(s)
- Lidia Tarone
  - Department of Molecular Biotechnology and Health Sciences, Molecular Biotechnology Center, University of Turin, Turin, Italy
- Davide Giacobino
  - Department of Veterinary Sciences, University of Turin, Turin, Italy
- Soldano Ferrone
  - Department of Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
- Paolo Buracco
  - Department of Veterinary Sciences, University of Turin, Turin, Italy
- Federica Cavallo
  - Department of Molecular Biotechnology and Health Sciences, Molecular Biotechnology Center, University of Turin, Turin, Italy
- Federica Riccardo
  - Department of Molecular Biotechnology and Health Sciences, Molecular Biotechnology Center, University of Turin, Turin, Italy
13. Manor J, Chung H, Bhagwat PK, Wangler MF. ABCD1 and X-linked adrenoleukodystrophy: A disease with a markedly variable phenotype showing conserved neurobiology in animal models. J Neurosci Res 2021; 99:3170-3181. [PMID: 34716609 PMCID: PMC9665428 DOI: 10.1002/jnr.24953] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/30/2021] [Revised: 07/30/2021] [Accepted: 08/15/2021] [Indexed: 12/12/2022]
Abstract
X-linked adrenoleukodystrophy (X-ALD) is a phenotypically heterogeneous disorder involving defective peroxisomal β-oxidation of very long-chain fatty acids (VLCFAs), due to mutation in the ABCD1 gene. X-ALD is the most common peroxisomal inborn error of metabolism and confers a high degree of morbidity and mortality. Remarkably, a subset of patients exhibit a cerebral form with inflammatory invasion of the central nervous system and extensive demyelination, while in others only dying-back axonopathy or even isolated adrenal insufficiency is seen, without genotype-phenotype correlation. X-ALD's biochemical signature is marked elevation of VLCFAs in blood, a finding that has been utilized for massive newborn screening for early diagnosis. Investigational gene therapy approaches hold promises for improved outcomes. However, the pathophysiological mechanisms of the disease remain poorly understood, limiting investigation of targeted therapeutic options. Animal models for the disease recapitulate the biochemical signature of VLCFA accumulation and demonstrate mitochondrially generated reactive oxygen species, oxidative damage, increased glial death, and axonal damage. Most strikingly, however, cerebral invasion of leukocytes and demyelination were not observed in any animal model for X-ALD, reflecting upon pathological processes that are yet to be discovered. This review summarizes the current disease models in animals, the lessons learned from these models, and the gaps that remained to be filled in order to assist in therapeutic investigations for ALD.
Affiliation(s)
- Joshua Manor
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
  - Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, Texas, USA
  - Texas Children’s Hospital, Houston, Texas, USA
- Hyunglok Chung
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
  - Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, Texas, USA
- Pranjali K. Bhagwat
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
  - Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, Texas, USA
- Michael F. Wangler
  - Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
  - Jan and Dan Duncan Neurological Research Institute at Texas Children’s Hospital, Houston, Texas, USA
14
|
Henderson PJF, Maher C, Elbourne LDH, Eijkelkamp BA, Paulsen IT, Hassan KA. Physiological Functions of Bacterial "Multidrug" Efflux Pumps. Chem Rev 2021; 121:5417-5478. [PMID: 33761243 DOI: 10.1021/acs.chemrev.0c01226] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Indexed: 12/12/2022]
Abstract
Bacterial multidrug efflux pumps have come to prominence in human and veterinary pathogenesis because they help bacteria protect themselves against the antimicrobials used to overcome their infections. However, it is increasingly realized that many, probably most, such pumps have physiological roles that are distinct from protection of bacteria against antimicrobials administered by humans. Here we undertake a broad survey of the proteins involved, allied to detailed examples of their evolution, energetics, structures, chemical recognition, and molecular mechanisms, together with the experimental strategies that enable rapid and economical progress in understanding their true physiological roles. Once these roles are established, the knowledge can be harnessed to design more effective drugs, improve existing microbial production of drugs for clinical practice and of feedstocks for commercial exploitation, and even develop more sustainable biological processes that avoid, for example, utilization of petroleum.
Affiliation(s)
- Peter J F Henderson
- School of Biomedical Sciences and Astbury Centre for Structural Molecular Biology, University of Leeds, Leeds LS2 9JT, United Kingdom
- Claire Maher
- School of Environmental and Life Sciences, University of Newcastle, Callaghan 2308, New South Wales, Australia
- Liam D H Elbourne
- Department of Biomolecular Sciences, Macquarie University, Sydney 2109, New South Wales, Australia; ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney 2019, New South Wales, Australia
- Bart A Eijkelkamp
- College of Science and Engineering, Flinders University, Bedford Park 5042, South Australia, Australia
- Ian T Paulsen
- Department of Biomolecular Sciences, Macquarie University, Sydney 2109, New South Wales, Australia; ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney 2019, New South Wales, Australia
- Karl A Hassan
- School of Environmental and Life Sciences, University of Newcastle, Callaghan 2308, New South Wales, Australia; ARC Centre of Excellence in Synthetic Biology, Macquarie University, Sydney 2019, New South Wales, Australia
15
Zhan Q, Wang N, Jin S, Tan R, Jiang Q, Wang Y. ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function. BMC Bioinformatics 2019; 20:573. [PMID: 31760933 PMCID: PMC6876095 DOI: 10.1186/s12859-019-3132-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In multiple sequence alignment, the substitution scores of pairwise alignments are essential. To compute adaptive scores for alignment, researchers usually use hidden Markov models (HMMs) or probabilistic consistency methods such as the partition function. Recent studies show that optimizing the parameters of the HMM, as well as integrating the HMM with the partition function, can raise alignment accuracy. However, these studies did not consider combining the partition function with an optimized HMM, which could further improve alignment accuracy. RESULTS This paper presents ProbPFP, a novel algorithm for MSA that integrates an HMM optimized by particle swarm optimization (PSO) with the partition function. PSO was applied to optimize the HMM's parameters. The posterior probability obtained from the HMM was then combined with the one obtained from the partition function to calculate an integrated substitution score for alignment. To evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The alignments obtained by ProbPFP achieved the highest mean TC and SP scores on two benchmark datasets, SABmark and OXBench, and the second-highest mean TC and SP scores on BAliBASE. ProbPFP was also compared with 4 other outstanding methods by reconstructing phylogenetic trees for six protein families extracted from the TreeFam database, based on the alignments obtained by these 5 methods. The results indicate that the trees reconstructed from ProbPFP alignments are closer to the reference trees than those from the other methods. CONCLUSIONS We propose a new multiple sequence alignment method that combines an optimized HMM with the partition function. The results show that this method can substantially improve alignment accuracy.
Affiliation(s)
- Qing Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Nan Wang
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
- Shuilin Jin
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
- Renjie Tan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
16
Scaini JLR, Camargo AD, Seus VR, von Groll A, Werhli AV, da Silva PEA, Machado KDS. Molecular modelling and competitive inhibition of a Mycobacterium tuberculosis multidrug-resistance efflux pump. J Mol Graph Model 2018; 87:98-108. [PMID: 30529931 DOI: 10.1016/j.jmgm.2018.11.016] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 06/06/2018] [Revised: 11/29/2018] [Accepted: 11/29/2018] [Indexed: 02/08/2023]
Abstract
Tuberculosis is a major cause of mortality and morbidity in developing countries, and the emergence of multidrug-resistant and extensively drug-resistant cases is a critical issue for control of the disease. Despite efforts to develop new antibiotics, strains resistant to them will eventually appear as well. Efflux plays an important role in the evolution of resistance in Mycobacterium tuberculosis. Tap is an important efflux pump associated with tuberculosis resistant to isoniazid, rifampicin and ofloxacin and with multidrug resistance. The development of efflux inhibitors for Tap could raise the effectiveness of second-line drugs and reduce the duration of the current treatment. The objective of this study is therefore to build a reliable molecular model of the Tap efflux pump and to test for possible competitive inhibition between efflux inhibitors and antibiotics in the optimized structure. We built twenty-five Tap models by molecular modelling and selected the best according to the results of the validation analysis. The selected model underwent a 100 ns molecular dynamics simulation in a lipid bilayer, and the resulting optimized structure was used in docking studies to test whether the efflux inhibitors may act via competitive inhibition with antibiotics. The validation results pointed to the model built by Phyre2 as the closest to a possible native Tap structure, and it was therefore selected. RMSD analysis revealed that the model is stable: the predicted binding site stabilized between 15 and 20 ns, maintaining the RMSD at around 0.35 Å throughout the molecular dynamics simulation in a lipid bilayer. This model is therefore reliable and can be used for further studies. The docking studies showed a possibility of competitive inhibition by NUNL02 on ofloxacin and bedaquiline, and by verapamil on ofloxacin and rifampicin. This suggests that NUNL02 and verapamil are possible inhibitors of Tap efflux and highlights the importance of including efflux inhibitors as adjuvants to tuberculosis therapy, as it indicates a possible extrusion of ofloxacin, rifampicin and bedaquiline by Tap.
Affiliation(s)
- João Luís Rheingantz Scaini
- Laboratory of Computational Biology, Computational Sciences Center of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil; Research Center in Medical Microbiology of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil
- Alex Dias Camargo
- Laboratory of Computational Biology, Computational Sciences Center of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil
- Vinicius Rosa Seus
- Laboratory of Computational Biology, Computational Sciences Center of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil
- Andrea von Groll
- Research Center in Medical Microbiology of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil
- Adriano Velasque Werhli
- Laboratory of Computational Biology, Computational Sciences Center of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil
- Pedro Eduardo Almeida da Silva
- Research Center in Medical Microbiology of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil
- Karina Dos Santos Machado
- Laboratory of Computational Biology, Computational Sciences Center of the Universidade Federal do Rio Grande, Avenida Itália, Km8, Rio Grande, RS, Brazil
17
Abstract
Sequence similarity searching has become an important part of the daily routine of molecular biologists, bioinformaticians and biophysicists. With the rapidly growing sequence databanks, this computational approach is commonly applied to determine functions and structures of unannotated sequences, to investigate relationships between sequences, and to construct phylogenetic trees. We introduce arguably the most popular BLAST-based family of the sequence similarity search tools. We explain basic concepts related to the sequence alignment and demonstrate how to search the current databanks using Web site versions of BLASTP, PSI-BLAST and BLASTN. We also describe the standalone BLAST+ tool. Moreover, this unit discusses the inputs, parameter settings and outputs of these tools. Lastly, we cover recent advances in the sequence similarity searching, focusing on the fast MMseqs2 method. © 2018 by John Wiley & Sons, Inc.
Affiliation(s)
- Gang Hu
- School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia
18
Mondragón-Palomino M, Stam R, John-Arputharaj A, Dresselhaus T. Diversification of defensins and NLRs in Arabidopsis species by different evolutionary mechanisms. BMC Evol Biol 2017; 17:255. [PMID: 29246101 PMCID: PMC5731061 DOI: 10.1186/s12862-017-1099-4] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 05/30/2017] [Accepted: 11/24/2017] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Genes that encode proteins underlying host-pathogen co-evolution and that are selected for new resistance specificities are frequently under positive selection, a process that maintains diversity. Here, we tested the contribution of natural selection, recombination and transcriptional divergence to the evolutionary diversification of the plant defensins superfamily in three Arabidopsis species. The intracellular NOD-like receptor (NLR) family was used for comparison because positive selection has been well documented in its members. Similar to defensins, NLRs are encoded by a large and polymorphic gene family and many of their members are involved in the immune response. RESULTS Gene trees of Arabidopsis defensins (DEFLs) show a high prevalence of clades containing orthologs. This indicates that their diversity dates back to a common ancestor and species-specific duplications did not significantly contribute to gene family expansion. DEFLs are characterized by a pervasive pattern of neutral evolution with infrequent positive and negative selection as well as recombination. In comparison, most NLR alignment groups are characterized by frequent occurrence of positive selection and recombination in their leucine-rich repeat (LRR) domain as well as negative selection in their nucleotide-binding (NB-ARC) domain. While major NLR subgroups are expressed in pistils and leaves both in the presence and absence of pathogen infection, the members of DEFL alignment groups are predominantly transcribed in pistils. Furthermore, conserved groups of NLRs and DEFLs are differentially expressed in response to Fusarium graminearum regardless of whether these genes are under positive selection or not. CONCLUSIONS The present analyses of NLRs expand previous studies in Arabidopsis thaliana and highlight contrasting patterns of purifying and diversifying selection affecting different gene regions. 
DEFL genes show a different evolutionary trend, with fewer recombination events and significantly fewer instances of natural selection. Their heterogeneous expression pattern suggests that transcriptional divergence probably made the major contribution to functional diversification. In comparison to smaller families encoding pathogenesis-related (PR) proteins under positive selection, DEFLs are involved in a wide variety of processes that altogether might pose structural and functional trade-offs to their family-wide pattern of evolution.
Affiliation(s)
- Mariana Mondragón-Palomino
- Cell Biology and Plant Biochemistry, Biochemie-Zentrum Regensburg, University of Regensburg, Universitätstraße 31, 93053, Regensburg, Germany
- Remco Stam
- Chair of Phytopathology, Technical University of Munich, School of Life Sciences Weihenstephan, Emil-Ramann-Str. 2, 85354, Freising, Germany
- Ajay John-Arputharaj
- Cell Biology and Plant Biochemistry, Biochemie-Zentrum Regensburg, University of Regensburg, Universitätstraße 31, 93053, Regensburg, Germany
- Thomas Dresselhaus
- Cell Biology and Plant Biochemistry, Biochemie-Zentrum Regensburg, University of Regensburg, Universitätstraße 31, 93053, Regensburg, Germany
19
Rahman J, Noronha A, Thiele I, Rahman S. Leigh map: A novel computational diagnostic resource for mitochondrial disease. Ann Neurol 2017; 81:9-16. [PMID: 27977873 PMCID: PMC5347854 DOI: 10.1002/ana.24835] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 09/02/2016] [Revised: 11/27/2016] [Accepted: 11/28/2016] [Indexed: 12/24/2022]
Affiliation(s)
- Joyeeta Rahman
- Mitochondrial Research Group, Genetics and Genomic Medicine Programme, UCL Great Ormond Street Institute of Child Health, London, United Kingdom
- Alberto Noronha
- Luxembourg Center for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Ines Thiele
- Luxembourg Center for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Shamima Rahman
- Mitochondrial Research Group, Genetics and Genomic Medicine Programme, UCL Great Ormond Street Institute of Child Health, London, United Kingdom
- Mitochondrial Research Group, Genetics and Genomic Medicine Programme, UCL Great Ormond Street Institute of Child Health and Metabolic Department, Great Ormond Street Hospital NHS Foundation Trust, London, United Kingdom
20
Pirmoradian M, Aarsland D, Zubarev RA. Isoelectric point region pI≈7.4 as a treasure island of abnormal proteoforms in blood. Discoveries (Craiova) 2016; 4:e67. [PMID: 32309586 PMCID: PMC7159840 DOI: 10.15190/d.2016.14] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023] Open
Abstract
Theoretical distribution of isoelectric points (pI values) of human blood proteins exhibits multi-modality with a deep minimum in the range between pI 7.30 and 7.50. Considering that the pH of human blood is 7.4±0.1, normal forms of human proteins tend to eschew this specific pI region, thus avoiding charge neutrality that can result in enhanced precipitation. However, abnormal protein isoforms (proteoforms), which are the hallmarks and potential biomarkers of certain diseases, are likely to be found everywhere in the pI distribution, including this “forbidden” region. Therefore, we hypothesized that damaging proteoforms characteristic for neurodegenerative diseases are best detected around pI≈7.4. Blood serum samples from 14 Alzheimer's disease patients were isolated by capillary isoelectric focusing and analyzed by liquid chromatography hyphenated with tandem mass spectrometry. Consistent with the pI≈7.4 hypothesis, the 8 patients with fast memory decline had a significantly (p<0.003) higher concentration of proteoforms in the pI=7.4±0.1 region than the 6 patients with a slow memory decline. Moreover, protein compositions differed more from each other than for any other investigated pI region, providing absolute separation of the fast and slow decliner samples. The discovery of the “treasure island” of abnormal proteoforms in form of the pI≈7.4 region promises to boost biomarker development for a range of diseases.
Affiliation(s)
- Mohammad Pirmoradian
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden; Biomotif AB, Stockholm, Sweden
- Dag Aarsland
- Alzheimer's Disease Research Centre, Department of Neurobiology, Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden
- Roman A Zubarev
- Department of Medical Biochemistry and Biophysics, Karolinska Institutet, Stockholm, Sweden
21
Rivas E, Eddy SR. Parameterizing sequence alignment with an explicit evolutionary model. BMC Bioinformatics 2015; 16:406. [PMID: 26652060 PMCID: PMC4676179 DOI: 10.1186/s12859-015-0832-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 05/27/2015] [Accepted: 11/20/2015] [Indexed: 11/10/2022] Open
Abstract
Background Inference of sequence homology is inherently an evolutionary question, dependent upon evolutionary divergence. However, the insertion and deletion penalties in the most widely used methods for inferring homology by sequence alignment, including BLAST and profile hidden Markov models (profile HMMs), are not based on any explicitly time-dependent evolutionary model. Using one fixed score system (BLOSUM62 with some gap open/extend costs, for example) corresponds to making an unrealistic assumption that all sequence relationships have diverged by the same time. Adoption of explicit time-dependent evolutionary models for scoring insertions and deletions in sequence alignments has been hindered by algorithmic complexity and technical difficulty. Results We identify and implement several probabilistic evolutionary models compatible with the affine-cost insertion/deletion model used in standard pairwise sequence alignment. Assuming an affine gap cost imposes important restrictions on the realism of the evolutionary models compatible with it, as single insertion events with geometrically distributed lengths do not result in geometrically distributed insert lengths at finite times. Nevertheless, we identify one evolutionary model compatible with symmetric pair HMMs that are the basis for Smith-Waterman pairwise alignment, and two evolutionary models compatible with standard profile-based alignment. We test different aspects of the performance of these “optimized branch length” models, including alignment accuracy and homology coverage (discrimination of residues in a homologous region from nonhomologous flanking residues). We test on benchmarks of both global homologies (full length sequence homologs) and local homologies (homologous subsequences embedded in nonhomologous sequence). Conclusions Contrary to our expectations, we find that for global homologies a single long branch parameterization suffices both for distant and close homologous relationships. 
In contrast, we do see an advantage in using explicit evolutionary models for local homologies. Optimal branch parameterization reduces a known artifact called “homologous overextension”, in which local alignments erroneously extend through flanking nonhomologous residues. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0832-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Elena Rivas
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA
- Sean R Eddy
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, 02138, MA, USA; Howard Hughes Medical Institute, 4000 Jones Bridge Rd, Chevy Chase, 20815, MD, USA; John A. Paulson School of Engineering and Applied Sciences, 16 Divinity Avenue, Cambridge, 02138, MA, USA; FAS Center for Systems Biology, Harvard University, 16 Divinity Avenue, Cambridge, 02138, MA, USA
22
Petronikolou N, Nair SK. Biochemical Studies of Mycobacterial Fatty Acid Methyltransferase: A Catalyst for the Enzymatic Production of Biodiesel. Chem Biol 2015; 22:1480-1490. [PMID: 26526103 DOI: 10.1016/j.chembiol.2015.09.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 05/13/2015] [Revised: 09/04/2015] [Accepted: 09/24/2015] [Indexed: 10/22/2022]
Abstract
Transesterification of fatty acids yields the essential component of biodiesel, but current processes are cost-prohibitive and generate waste. Recent efforts make use of biocatalysts that are effective in diverting products from primary metabolism to yield fatty acid methyl esters in bacteria. These biotransformations require the fatty acid O-methyltransferase (FAMT) from Mycobacterium marinum (MmFAMT). Although this activity was first reported in the literature in 1970, the FAMTs have yet to be biochemically characterized. Here, we describe several crystal structures of MmFAMT, which highlight an unexpected structural conservation with methyltransferases that are involved in plant natural product metabolism. The determinants for ligand recognition are analyzed by kinetic analysis of structure-based active-site variants. These studies reveal how an architectural fold employed in plant natural product biosynthesis is used in bacterial fatty acid O-methylation.
Affiliation(s)
- Nektaria Petronikolou
- Department of Biochemistry, University of Illinois at Urbana Champaign, 600 South Mathews Avenue, Urbana, IL 61801, USA; Institute for Genomic Biology, University of Illinois at Urbana Champaign, 600 South Mathews Avenue, Urbana, IL 61801, USA
- Satish K Nair
- Department of Biochemistry, University of Illinois at Urbana Champaign, 600 South Mathews Avenue, Urbana, IL 61801, USA; Center for Biophysics and Computational Biology and University of Illinois at Urbana Champaign, 600 South Mathews Avenue, Roger Adams Lab Room 430, Urbana, IL 61801, USA; Institute for Genomic Biology, University of Illinois at Urbana Champaign, 600 South Mathews Avenue, Urbana, IL 61801, USA
23
Kumar RR, Goswami S, Sharma SK, Kala YK, Rai GK, Mishra DC, Grover M, Singh GP, Pathak H, Rai A, Chinnusamy V, Rai RD. Harnessing Next Generation Sequencing in Climate Change: RNA-Seq Analysis of Heat Stress-Responsive Genes in Wheat (Triticum aestivum L.). OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2015; 19:632-47. [PMID: 26406536 DOI: 10.1089/omi.2015.0097] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 11/12/2022]
Abstract
Wheat is a staple food worldwide and provides 40% of the calories in the diet. Climate change and global warming pose a threat to wheat production, however, and demand a deeper understanding of how heat stress might impact wheat production and wheat biology. However, it is difficult to identify novel heat stress associated genes when the genomic information is not available. Wheat has a very large and complex genome that is about 37 times the size of the rice genome. The present study sequenced the whole transcriptome of the wheat cv. HD2329 at the flowering stage, under control (22°±3°C) and heat stress (42°C, 2 h) conditions using Illumina HiSeq and Roche GS-FLX 454 platforms. We assembled more than 26.3 and 25.6 million high-quality reads from the control and HS-treated tissues transcriptome sequences respectively. About 76,556 (control) and 54,033 (HS-treated) contigs were assembled and annotated de novo using different assemblers and a total of 21,529 unigenes were obtained. Gene expression profile showed significant differential expression of 1525 transcripts under heat stress, of which 27 transcripts showed very high (>10) fold upregulation. Cellular processes such as metabolic processes, protein phosphorylation, oxidations-reductions, among others were highly influenced by heat stress. In summary, these observations significantly enrich the transcript dataset of wheat available on public domain and show a de novo approach to discover the heat-responsive transcripts of wheat, which can accelerate the progress of wheat stress-genomics as well as the course of wheat breeding programs in the era of climate change.
Affiliation(s)
- Ranjeet R Kumar
- Division of Biochemistry, Indian Agricultural Research Institute, New Delhi, India
- Suneha Goswami
- Division of Biochemistry, Indian Agricultural Research Institute, New Delhi, India
- Sushil K Sharma
- Division of Biochemistry, Indian Agricultural Research Institute, New Delhi, India
- Yugal K Kala
- Division of Genetics, Indian Agricultural Research Institute, New Delhi, India
- Gyanendra K Rai
- Sher-e-Kashmir University of Agricultural Sciences and Technology, Jammu, India
- Dwijesh C Mishra
- Centre for Agricultural Bio-Informatics (CAB-in), Indian Agricultural Statistics Research Institute (IASRI), New Delhi, India
- Monendra Grover
- Centre for Agricultural Bio-Informatics (CAB-in), Indian Agricultural Statistics Research Institute (IASRI), New Delhi, India
- Himanshu Pathak
- Division of CESCRA, Indian Agricultural Research Institute, New Delhi, India
- Anil Rai
- Centre for Agricultural Bio-Informatics (CAB-in), Indian Agricultural Statistics Research Institute (IASRI), New Delhi, India
- Viswanathan Chinnusamy
- Division of Plant Physiology, Indian Agricultural Research Institute, New Delhi, India
- Raj D Rai
- Division of Biochemistry, Indian Agricultural Research Institute, New Delhi, India
24
Abstract
Sequence alignment remains a fundamental task in bioinformatics. The literature contains programs that employ a host of exact and heuristic strategies available in computer science. Probcons was the first program to construct maximum expected accuracy sequence alignments with hidden Markov models and at the time of its publication achieved the highest accuracies on standard protein multiple alignment benchmarks. Probalign followed this strategy except that it used a partition function approach instead of hidden Markov models. Several programs employing both strategies have been published since then. In this chapter we describe Probcons and Probalign.
Affiliation(s)
- Usman Roshan
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
25
Kaznadzey A, Alexandrova N, Novichkov V, Kaznadzey D. PSimScan: algorithm and utility for fast protein similarity search. PLoS One 2013; 8:e58505. [PMID: 23505522 PMCID: PMC3591303 DOI: 10.1371/journal.pone.0058505] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 01/27/2012] [Accepted: 02/07/2013] [Indexed: 01/19/2023] Open
Abstract
In the era of metagenomics and diagnostics sequencing, the importance of protein comparison methods of boosted performance cannot be overstated. Here we present PSimScan (Protein Similarity Scanner), a flexible open source protein similarity search tool which provides a significant gain in speed compared to BLASTP at the price of controlled sensitivity loss. The PSimScan algorithm introduces a number of novel performance optimization methods that can be further used by the community to improve the speed and lower hardware requirements of bioinformatics software. The optimization starts at the lookup table construction, then the initial lookup table–based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity. The first step in this pipeline is a novel algorithm that builds and selects ‘similarity zones’ aggregated from neighboring matches on small arrays of adjacent diagonals. PSimScan performs 5 to 100 times faster than the standard NCBI BLASTP, depending on chosen parameters, and runs on commodity hardware. Its sensitivity and selectivity at the slowest settings are comparable to the NCBI BLASTP’s and decrease with the increase of speed, yet stay at the levels reasonable for many tasks. PSimScan is most advantageous when used on large collections of query sequences. Comparing the entire proteome of Streptococcus pneumoniae (2,042 proteins) to the NCBI’s non-redundant protein database of 16,971,855 records takes 6.5 hours on a moderately powerful PC, while the same task with the NCBI BLASTP takes over 66 hours. We describe innovations in the PSimScan algorithm in considerable detail to encourage bioinformaticians to improve on the tool and to use the innovations in their own software development.
Affiliation(s)
- Anna Kaznadzey
- Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Natalia Alexandrova
- Genome Designs, Inc., Walnut Creek, California, United States of America
- Denis Kaznadzey
- DOE Joint Genome Institute, Walnut Creek, California, United States of America
26
Boratyn GM, Schäffer AA, Agarwala R, Altschul SF, Lipman DJ, Madden TL. Domain enhanced lookup time accelerated BLAST. Biol Direct 2012; 7:12. [PMID: 22510480 PMCID: PMC3438057 DOI: 10.1186/1745-6150-7-12] [Citation(s) in RCA: 555] [Impact Index Per Article: 46.3] [Received: 12/16/2011] [Accepted: 04/17/2012] [Indexed: 11/10/2022] Open
Abstract
Background BLAST is a commonly-used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch. Results We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI’s Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST. Conclusions DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the “Protein BLAST” link at http://blast.ncbi.nlm.nih.gov. Reviewers This article was reviewed by Arcady Mushegian, Nick V. Grishin, and Frank Eisenhaber.
Affiliation(s)
- Grzegorz M Boratyn
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
27
Hiremath PJ, Farmer A, Cannon SB, Woodward J, Kudapa H, Tuteja R, Kumar A, BhanuPrakash A, Mulaosmanovic B, Gujaria N, Krishnamurthy L, Gaur PM, KaviKishor PB, Shah T, Srinivasan R, Lohse M, Xiao Y, Town CD, Cook DR, May GD, Varshney RK. Large-scale transcriptome analysis in chickpea (Cicer arietinum L.), an orphan legume crop of the semi-arid tropics of Asia and Africa. Plant Biotechnol J 2011; 9:922-31. [PMID: 21615673 PMCID: PMC3437486 DOI: 10.1111/j.1467-7652.2011.00625.x] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Indexed: 05/02/2023]
Abstract
Chickpea (Cicer arietinum L.) is an important legume crop in the semi-arid regions of Asia and Africa. Gains in crop productivity have been low however, particularly because of biotic and abiotic stresses. To help enhance crop productivity using molecular breeding techniques, next generation sequencing technologies such as Roche/454 and Illumina/Solexa were used to determine the sequence of most gene transcripts and to identify drought-responsive genes and gene-based molecular markers. A total of 103,215 tentative unique sequences (TUSs) have been produced from 435,018 Roche/454 reads and 21,491 Sanger expressed sequence tags (ESTs). Putative functions were determined for 49,437 (47.8%) of the TUSs, and gene ontology assignments were determined for 20,634 (41.7%) of the TUSs. Comparison of the chickpea TUSs with the Medicago truncatula genome assembly (Mt 3.5.1 build) resulted in 42,141 aligned TUSs with putative gene structures (including 39,281 predicted intron/splice junctions). Alignment of ∼37 million Illumina/Solexa tags generated from drought-challenged root tissues of two chickpea genotypes against the TUSs identified 44,639 differentially expressed TUSs. The TUSs were also used to identify a diverse set of markers, including 728 simple sequence repeats (SSRs), 495 single nucleotide polymorphisms (SNPs), 387 conserved orthologous sequence (COS) markers, and 2088 intron-spanning region (ISR) markers. This resource will be useful for basic and applied research for genome analysis and crop improvement in chickpea.
Affiliation(s)
- Pavana J Hiremath
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Osmania University (OU), Hyderabad, India
- Andrew Farmer
- National Centre for Genome Resources (NCGR), Santa Fe, NM, USA
- Steven B Cannon
- United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Research Unit (USDA-ARS-CICGRU), Ames, IA, USA
- Department of Agronomy, Iowa State University, Ames, IA, USA
- Jimmy Woodward
- National Centre for Genome Resources (NCGR), Santa Fe, NM, USA
- Himabindu Kudapa
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Reetu Tuteja
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Ashish Kumar
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Amindala BhanuPrakash
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Neha Gujaria
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Laxmanan Krishnamurthy
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Pooran M Gaur
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Trushar Shah
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Ramamurthy Srinivasan
- National Research Centre on Plant Biotechnology (NRCPB), IARI Campus, New Delhi, India
- Marc Lohse
- Max Planck Institute for Molecular Plant Physiology (MPIMPP), Am Muehlenberg, Potsdam-Golm, Germany
- Yongli Xiao
- J. Craig Venter Institute (JCVI), Rockville, MD, USA
- Gregory D May
- National Centre for Genome Resources (NCGR), Santa Fe, NM, USA
- Rajeev K Varshney
- International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Patancheru, India
- Generation Challenge Program (GCP), c/o CIMMYT, Mexico DF, Mexico
- *Correspondence (Tel +91 40 30713305; fax +91 40 30713074/3075; email )
28
Zouine M, Sculo Q, Labedan B. Correct assignment of homology is crucial when genomics meets molecular evolution. Comp Funct Genomics 2010; 3:488-93. [PMID: 18629254 PMCID: PMC2448416 DOI: 10.1002/cfg.214] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/09/2002] [Accepted: 10/07/2002] [Indexed: 11/30/2022] Open
Abstract
Pertinent evolutionary studies are based on a correct use of homology terms such as paralogues, metalogues and orthologues. Such crucial concepts have been applied to intragenomic and intergenomic analyses. A further requisite is a proper definition of what is a structural segment of homology. Such segments are called modules to reflect that they play a role in the mechanism of combinational construction of a gene from ready-made basic components. Since identifying a module is operationally equivalent to determining the ancestor to this gene segment, it becomes possible to track back protein history and genome evolution. Such studies underline the importance of two fundamental processes, gene duplication and gene fusion. Moreover, grouping the closest orthologues in families is a pertinent way to reconstruct a genomic tree for all available prokaryotes.
Affiliation(s)
- Mohamed Zouine
- Evolution Moleculaire et Genomique, Institut de Genetique et Microbiologie, CNRS UMR 8621, Universite Paris-Sud, Batiment 409, 91405 Orsay Cedex, France
29
Pierri CL, Parisi G, Porcelli V. Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening. Biochim Biophys Acta Proteins Proteom 2010; 1804:1695-712. [PMID: 20433957 DOI: 10.1016/j.bbapap.2010.04.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 4.2] [Received: 01/19/2010] [Revised: 03/04/2010] [Accepted: 04/14/2010] [Indexed: 12/12/2022]
Abstract
The functional characterization of proteins represents a daily challenge for biochemical, medical and computational sciences. Although finally proved on the bench, the function of a protein can be successfully predicted by computational approaches that drive the further experimental assays. Current methods for comparative modeling allow the construction of accurate 3D models for proteins of unknown structure, provided that a crystal structure of a homologous protein is available. Binding regions can be proposed by using binding site predictors, data inferred from homologous crystal structures, and data provided from a careful interpretation of the multiple sequence alignment of the investigated protein and its homologs. Once the location of a binding site has been proposed, chemical ligands that have a high likelihood of binding can be identified by using ligand docking and structure-based virtual screening of chemical libraries. Most docking algorithms allow building a list sorted by energy of the lowest energy docking configuration for each ligand of the library. In this review the state-of-the-art of computational approaches in 3D protein comparative modeling and in the study of protein-ligand interactions is provided. Furthermore a possible combined/concerted multistep strategy for protein function prediction, based on multiple sequence alignment, comparative modeling, binding region prediction, and structure-based virtual screening of chemical libraries, is described by using suitable examples. As practical examples, Abl-kinase molecular modeling studies, HPV-E6 protein multiple sequence alignment analysis, and some other model docking-based characterization reports are briefly described to highlight the importance of computational approaches in protein function prediction.
Affiliation(s)
- Ciro Leonardo Pierri
- Department of Pharmaco-Biology, Laboratory of Biochemistry and Molecular Biology, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italy.
30
Libants S, Carr K, Wu H, Teeter JH, Chung-Davidson YW, Zhang Z, Wilkerson C, Li W. The sea lamprey Petromyzon marinus genome reveals the early origin of several chemosensory receptor families in the vertebrate lineage. BMC Evol Biol 2009; 9:180. [PMID: 19646260 PMCID: PMC2728731 DOI: 10.1186/1471-2148-9-180] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.5] [Received: 08/20/2008] [Accepted: 07/31/2009] [Indexed: 01/26/2023] Open
Abstract
Background In gnathostomes, chemosensory receptors (CR) expressed in olfactory epithelia are encoded by evolutionarily dynamic gene families encoding odorant receptors (OR), trace amine-associated receptors (TAAR), V1Rs and V2Rs. A limited number of OR-like sequences have been found in invertebrate chordate genomes. Whether these gene families arose in basal or advanced vertebrates has not been resolved because these families have not been examined systematically in agnathan genomes. Results Petromyzon is the only extant jawless vertebrate whose genome has been sequenced. Known to be exquisitely sensitive to several classes of odorants, lampreys detect fewer amino acids and steroids than teleosts. This reduced number of detectable odorants is indicative of reduced numbers of CR gene families or a reduced number of genes within CR families, or both, in the sea lamprey. In the lamprey genome we identified a repertoire of 59 intact single-exon CR genes, including 27 OR, 28 TAAR, and four V1R-like genes. These three CR families were expressed in the olfactory organ of both parasitic and adult life stages. Conclusion An extensive search in the lamprey genome failed to identify potential orthologs or pseudogenes of the multi-exon V2R family that is greatly expanded in teleost genomes, but did find intact calcium-sensing receptors (CASR) and intact metabotropic glutamate receptors (MGR). We conclude that OR and V1R arose in chordates after the cephalochordate-urochordate split, but before the diversification of jawed and jawless vertebrates. The advent and diversification of V2R genes from glutamate receptor-family G protein-coupled receptors, most likely the CASR, occurred after the agnathan-gnathostome divergence.
Affiliation(s)
- Scot Libants
- Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824, USA.
31
Schmitt A, Schuchhardt J, Brockmann GA. The action of key factors in protein evolution at high temporal resolution. PLoS One 2009; 4:e4821. [PMID: 19279682 PMCID: PMC2652826 DOI: 10.1371/journal.pone.0004821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2008] [Accepted: 02/05/2009] [Indexed: 11/18/2022] Open
Abstract
Background Protein evolution is particularly shaped by the conservation of the amino acids' physico-chemical properties and the structure of the genetic code. While conservation is the result of negative selection against proteins with reduced functionality, the codon sequences determine the stochastic aspect of amino acid exchanges. Thus far, it is known that the genetic code is the dominant factor if little time has elapsed since the divergence of one gene into two, but physico-chemical forces gain importance at greater evolutionary distances. Further details on how the influence of these factors varies with time, however, are unknown to date. Methodology/Principal Findings Here, we derive 10,000 divergence-specific substitution matrices each for orthologues and paralogues from the Pfam collection of multiple protein alignments and quantify the action of three physico-chemical forces and of the structure of the genetic code at high resolution using correlation analysis. For closely related proteins, the codon sequence similarity is the most influential factor controlling protein evolution, but its influence decreases rapidly as divergence grows. From a protein sequence divergence of about 20 percent onward, the maintenance of the hydrophobic character of an amino acid is the most influential factor. All factors lose importance from about 40 percent divergence onward. This suggests that the original protein structure often no longer represents a constraint on the protein sequence. The proteins then become free to adopt new functions. We furthermore show that the constraints exerted by physico-chemical forces and by the genetic code are quite comparable for orthologues and paralogues, although somewhat weaker for paralogues than for orthologues in weakly or moderately diverged proteins.
Conclusion/Significance Our analysis substantiates earlier findings that protein evolution is mainly governed by the structure of the genetic code in the early phase after divergence and by the conservation of physico-chemical properties in the later phase. We determine that conservation of the hydrophobic character gains importance over the genetic code from a sequence divergence of about 20 percent onward. The evolution of orthologues and paralogues is shaped by evolutionary forces in quite comparable ways.
Affiliation(s)
- Armin Schmitt
- Institute for Animal Sciences, Humboldt-Universität zu Berlin, Berlin, Germany.
32
Abstract
INTRODUCTION Certain amino acid substitutions commonly occur in related proteins from different species. Because a protein still functions with these substitutions, the substituted amino acids are compatible with protein structure and function. Knowing the types of changes that are most and least common in a large number of proteins can assist with predicting alignments for any set of protein sequences. If related protein sequences are quite similar, they are easy to align, and one can readily determine the single-step amino acid changes. If ancestor relationships among a group of proteins are assessed, the most likely amino acid changes that occurred during evolution can be predicted. This type of analysis was pioneered by Margaret Dayhoff and used by her to produce a type of scoring matrix called a percent accepted mutation (PAM) matrix. This article introduces Dayhoff PAM matrices, explains how they are constructed and how they can be used for sequence alignments, and highlights their strengths and limitations.
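To make the construction concrete, here is a minimal Python sketch of the usual two-step recipe: a one-step mutation probability matrix (PAM1) is extrapolated to a larger distance by repeated matrix multiplication, and each entry is then converted to a log-odds score against the background frequency. The two-letter alphabet, the numerical values, and the function names are assumptions made for illustration and are not Dayhoff's data. The scaling by 10·log10 follows the convention commonly attributed to the Dayhoff matrices.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pam_n(m1, n):
    """Extrapolate a PAM1 mutation probability matrix to PAM-n (n multiplications)."""
    size = len(m1)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m1)
    return result

def log_odds(m, freqs):
    """score[i][j] = 10 * log10(M[i][j] / f[j]), left unrounded here."""
    size = len(freqs)
    return [[10 * math.log10(m[i][j] / freqs[j]) for j in range(size)]
            for i in range(size)]

# Toy two-letter alphabet: 1% expected change per step (illustrative values only).
FREQS = [0.5, 0.5]
PAM1 = [[0.99, 0.01],
        [0.01, 0.99]]

pam50 = pam_n(PAM1, 50)
scores = log_odds(pam50, FREQS)
```

In this toy case identities score positive and substitutions negative, and the gap between the two narrows as the PAM distance grows.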
33
Abstract
A widely used algorithm for computing an optimal local alignment between two sequences requires a parameter set with a substitution matrix and gap penalties. It is recognized that a proper parameter set should be selected to suit the level of conservation between sequences. We describe an algorithm for selecting an appropriate substitution matrix at given gap penalties for computing an optimal local alignment between two sequences. In the algorithm, a substitution matrix that leads to the maximum alignment similarity score is selected among substitution matrices at various evolutionary distances. The evolutionary distance of the selected substitution matrix is defined as the distance of the computed alignment. To show the effects of gap penalties on alignments and their distances and help select appropriate gap penalties, alignments and their distances are computed at various gap penalties. The algorithm has been implemented as a computer program named SimDist. The SimDist program was compared with an existing local alignment program named SIM for finding reciprocally best-matching pairs (RBPs) of sequences in each of 100 protein families, where RBPs are commonly used as an operational definition of orthologous sequences. SimDist produced more accurate results than SIM on 50 of the 100 families, whereas both programs produced the same results on the other 50 families. SimDist was also used to compare three types of substitution matrices in scoring 444,461 pairs of homologous sequences from the 100 families.
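The selection rule described above (score the pair under several substitution matrices and keep the one with the highest local alignment score) can be sketched as follows. This is not the SimDist implementation: the three candidate scoring schemes are reduced to hypothetical match/mismatch pairs standing in for matrices at increasing evolutionary distance, and a linear gap penalty is assumed because the abstract does not specify the gap model.

```python
def local_score(a, b, match, mismatch, gap=6):
    """Smith-Waterman local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            s = match if x == y else mismatch
            cell = max(0, prev[j - 1] + s, prev[j] - gap, curr[j - 1] - gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best

# Hypothetical stand-ins for matrices at short, medium and long distances:
# closer matrices reward identities more and punish substitutions harder.
CANDIDATES = {"near": (5, -4), "mid": (4, -2), "far": (3, -1)}

def select_matrix(a, b, candidates=CANDIDATES, gap=6):
    """Return (name, score) of the scheme giving the highest local score."""
    scored = {name: local_score(a, b, m, mm, gap)
              for name, (m, mm) in candidates.items()}
    name = max(scored, key=scored.get)
    return name, scored[name]

identical = select_matrix("ACDEFGHIK", "ACDEFGHIK")
diverged = select_matrix("ACDEFGHIK", "AWDYFSHTK")
```

Identical sequences select the stringent scheme, while the pair with alternating substitutions selects a more permissive one, which mirrors the idea of reading the distance of an alignment from the matrix that scores it best.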
Affiliation(s)
- Xiaoqiu Huang
- Department of Computer Science, Iowa State University, Ames, Iowa 50011-1040, USA.
34
Abstract
DNA and amino acid sequences contain information about both the phylogenetic relationships among species and the evolutionary processes that caused the sequences to diverge. Mathematical and statistical methods try to detect this information to determine how and why DNA and protein molecules work the way they do. This chapter describes some of the most widely used models of evolution of biological sequences. It first focuses on single nucleotide/amino acid replacement rate models. Then it discusses the modelling of evolution at gene and protein module levels. The chapter concludes with speculations about the future use of molecular evolution studies using genomic and proteomic data.
Affiliation(s)
- Pietro Liò
- Computer Laboratory, University of Cambridge, Cambridge, UK
35
Hu Y, Phelan V, Ntai I, Farnet CM, Zazopoulos E, Bachmann BO. Benzodiazepine biosynthesis in Streptomyces refuineus. Chem Biol 2007; 14:691-701. [PMID: 17584616 DOI: 10.1016/j.chembiol.2007.05.009] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Received: 03/12/2007] [Revised: 05/15/2007] [Accepted: 05/18/2007] [Indexed: 11/30/2022]
Abstract
Anthramycin is a benzodiazepine alkaloid with potent antitumor and antibiotic activity produced by the thermophilic actinomycete Streptomyces refuineus subsp. thermotolerans. In this study, the complete 32.5 kb gene cluster for the biosynthesis of anthramycin was identified by using a genome-scanning approach, and cluster boundaries were estimated via comparative genomics. A lambda-RED-mediated gene-replacement system was developed to provide supporting evidence for critical biosynthetic genes and to validate the boundaries of the proposed anthramycin gene cluster. Sequence analysis reveals that the 25 open reading frame anthramycin cluster contains genes consistent with the biosynthesis of the two halves of anthramycin: 4-methyl-3-hydroxyanthranilic acid and a "dehydroproline acrylamide" moiety. These nonproteinogenic amino acid precursors are condensed by a two-module nonribosomal peptide synthetase (NRPS) terminated by a reductase domain, consistent with the final hemiaminal oxidation state of anthramycin.
Affiliation(s)
- Yunfeng Hu
- Department of Chemistry, Vanderbilt University, Nashville, TN 37204, USA
36
Fabris F, Sgarro A, Tossi A. Splitting the BLOSUM score into numbers of biological significance. EURASIP J Bioinform Syst Biol 2007; 2007:31450. [PMID: 18369412 PMCID: PMC3171334 DOI: 10.1155/2007/31450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/02/2006] [Accepted: 03/30/2007] [Indexed: 03/31/2024]
Abstract
Mathematical tools developed in the context of Shannon information theory were used to analyze the meaning of the BLOSUM score, which was split into three components termed as the BLOSUM spectrum (or BLOSpectrum). These relate respectively to the sequence convergence (the stochastic similarity of the two protein sequences), to the background frequency divergence (typicality of the amino acid probability distribution in each sequence), and to the target frequency divergence (compliance of the amino acid variations between the two sequences to the protein model implicit in the BLOCKS database). This treatment sharpens the protein sequence comparison, providing a rationale for the biological significance of the obtained score, and helps to identify weakly related sequences. Moreover, the BLOSpectrum can guide the choice of the most appropriate scoring matrix, tailoring it to the evolutionary divergence associated with the two sequences, or indicate if a compositionally adjusted matrix could perform better.
Affiliation(s)
- Francesco Fabris
- Dipartimento di Matematica e Informatica, Università degli Studi di Trieste, via Valerio 12b, Trieste 34127, Italy
- Centro di Biomedicina Molecolare, AREA Science Park, Strada Statale 14, Basovizza, Trieste 34012, Italy
- Andrea Sgarro
- Dipartimento di Matematica e Informatica, Università degli Studi di Trieste, via Valerio 12b, Trieste 34127, Italy
- Centro di Biomedicina Molecolare, AREA Science Park, Strada Statale 14, Basovizza, Trieste 34012, Italy
- Alessandro Tossi
- Dipartimento di Biochimica, Biofisica, e Chimica delle Macromolecole, Università degli Studi di Trieste, via Licio Giorgieri 1, Trieste 34127, Italy
37
Abstract
The level of conservation between two homologous sequences often varies among sequence regions; functionally important domains are more conserved than the remaining regions. Thus, multiple parameter sets should be used in alignment of homologous sequences with a stringent parameter set for highly conserved regions and a moderate parameter set for weakly conserved regions. We describe an alignment algorithm to allow dynamic use of multiple parameter sets with different levels of stringency in computation of an optimal alignment of two sequences. The algorithm dynamically considers various candidate alignments, partitions each candidate alignment into sections, and determines the most appropriate set of parameter values for each section of the alignment. The algorithm and its local alignment version are implemented in a computer program named GAP4. The local alignment algorithm in GAP4, that in its predecessor GAP3, and an ordinary local alignment program SIM were evaluated on 257 716 pairs of homologous sequences from 100 protein families. On 168 475 of the 257 716 pairs (a rate of 65.4%), alignments from GAP4 were more statistically significant than alignments from GAP3 and SIM.
Affiliation(s)
- Xiaoqiu Huang
- Department of Computer Science, Iowa State University, Ames, IA 50011-1040, USA.
38
Roshan U, Livesay DR. Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics 2006; 22:2715-21. [PMID: 16954142 DOI: 10.1093/bioinformatics/btl472] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.9] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION The maximum expected accuracy optimization criterion for multiple sequence alignment uses pairwise posterior probabilities of residues to align sequences. The partition function methodology is one way of estimating these probabilities. Here, we combine these two ideas for the first time to construct maximal expected accuracy sequence alignments. RESULTS We bridge the two techniques within the program Probalign. Our results indicate that Probalign alignments are generally more accurate than other leading multiple sequence alignment methods (i.e. Probcons, MAFFT and MUSCLE) on the BAliBASE 3.0 protein alignment benchmark. Similarly, Probalign also outperforms these methods on the HOMSTRAD and OXBENCH benchmarks. Probalign ranks statistically highest (P-value < 0.005) on all three benchmarks. Deeper scrutiny of the technique indicates that the improvements are largest on datasets containing N/C-terminal extensions and on datasets containing long and heterogeneous length proteins. These points are demonstrated on both real and simulated data. Finally, our method also produces accurate alignments on long and heterogeneous length datasets containing protein repeats. Here, alignment accuracy scores are at least 10% and 15% higher than the other three methods when standard deviation of length is >300 and 400, respectively. AVAILABILITY Open source code implementing Probalign as well as for producing the simulated data, and all real and simulated data are freely available from http://www.cs.njit.edu/usman/probalign
Affiliation(s)
- Usman Roshan
- Department of Computer Science, New Jersey Institute of Technology GITC 4400, University Heights, NJ 07102, USA.
39
Li J, Wang W. Detailed assessment of homology detection using different substitution matrices. Chin Sci Bull 2006. [DOI: 10.1007/s11434-006-1538-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
40
Matsui T, Saeki H, Shinzato N, Matsuda H. Characterization of Rhodococcus-E. coli shuttle vector pNC9501 constructed from the cryptic plasmid of a propene-degrading bacterium. Curr Microbiol 2006; 52:445-8. [PMID: 16732453 DOI: 10.1007/s00284-005-0237-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 08/09/2005] [Accepted: 11/14/2005] [Indexed: 11/24/2022]
Abstract
Rhodococcus-E. coli shuttle vector pNC9501 was constructed using circular cryptic plasmid pNC903 from propene-degrading Rhodococcus ruber P-II-123-1. Sequence analysis of pNC903 revealed two open-reading frames encoding the replication proteins Reps A and B. In the amino acid sequence of the putative Rep B, a helix-turn-helix motif, which is responsible for the binding of DNA, was found. Sequencing of the upstream region of the putative Rep A and incompatibility tests revealed that pNC903 is a Mycobacterium-derived pAL5000-related plasmid. pNC9501 could also be transformed into Mycobacterium sp. showing good segregation stability (<0.1% plasmid loss/generation) in the absence of selective pressure.
Affiliation(s)
- Toru Matsui
- Center of Molecular Biosciences, University of the Ryukyus, 1 Sembaru, Okinawa 903-0213, Japan.
41
Reddy DA, Prasad BVLS, Mitra CK. Comparative analysis of core promoter region: information content from mono and dinucleotide substitution matrices. Comput Biol Chem 2006; 30:58-62. [PMID: 16321573 DOI: 10.1016/j.compbiolchem.2005.10.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 08/15/2005] [Revised: 10/04/2005] [Accepted: 10/04/2005] [Indexed: 10/25/2022]
Abstract
We have studied the core promoter region in five sets of promoter sequences by calculating the average mutual information content H (relative entropy). We have used specially constructed substitution matrices to calculate mono and dinucleotide replacements in a given block of aligned sequences. These substitution matrices use log-odds form of scores, which are in bits of information. Here, we constructed and applied nucleotide substitution matrices for the core promoter region to calculate the information content to study the Transcription Start Site (TSS), TATA-box and downstream regions. As expected, the information content decreases with increasing block size. This clearly implies that the TSS region is likely to be 5-10 bases in size (length). We also notice that both in the case of mouse and humans, both TATA-boxes and TSS regions are likely to play important roles in proper transcriptional initiation.
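For readers unfamiliar with the quantity, the standard relative-entropy definition of a log-odds matrix's information content can be sketched in Python as below: scores are log2(q_ij / (p_i p_j)) in bits, and H is their average weighted by the observed pair frequencies q_ij. The function names and the toy frequency tables are assumptions for this example, and the paper's own matrices and block-based counting procedure are not reproduced here.

```python
import math

def log_odds_bits(q):
    """Log-odds scores in bits from a table of pair frequencies q[a][b]."""
    letters = list(q)
    # Background frequency of each letter as the row marginal of q
    p = {a: sum(q[a][b] for b in letters) for a in letters}
    return {a: {b: math.log2(q[a][b] / (p[a] * p[b]))
                for b in letters if q[a][b] > 0}
            for a in letters}

def information_content(q):
    """H = sum over pairs of q_ij * s_ij (relative entropy, in bits)."""
    scores = log_odds_bits(q)
    return sum(q[a][b] * s for a, row in scores.items() for b, s in row.items())

BASES = "ACGT"
# Perfectly conserved column pairs: only identical bases are ever paired.
conserved = {a: {b: (0.25 if a == b else 0.0) for b in BASES} for a in BASES}
# No signal: every pair occurs at its chance frequency.
uniform = {a: {b: 1 / 16 for b in BASES} for a in BASES}

h_conserved = information_content(conserved)
h_uniform = information_content(uniform)
```

The two toy tables bracket the range for a four-letter alphabet with uniform composition: 2 bits for perfect conservation and 0 bits when pairing is random.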
Affiliation(s)
- D Ashok Reddy
- Department of Biochemistry, University of Hyderabad, Hyderabad 500046, India
42
Altschul SF, Wootton JC, Gertz EM, Agarwala R, Morgulis A, Schäffer AA, Yu YK. Protein database searches using compositionally adjusted substitution matrices. FEBS J 2005; 272:5101-9. [PMID: 16218944 PMCID: PMC1343503 DOI: 10.1111/j.1742-4658.2005.04945.x] [Citation(s) in RCA: 740] [Impact Index Per Article: 38.9] [Indexed: 11/25/2022]
Abstract
Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort has therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of blast.
Affiliation(s)
- Stephen F Altschul
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
43
|
Frimurer TM, Ulven T, Elling CE, Gerlach LO, Kostenis E, Högberg T. A physicogenetic method to assign ligand-binding relationships between 7TM receptors. Bioorg Med Chem Lett 2005; 15:3707-12. [PMID: 15993056 DOI: 10.1016/j.bmcl.2005.05.102] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.9] [Received: 03/23/2005] [Revised: 05/09/2005] [Accepted: 05/26/2005] [Indexed: 11/29/2022]
Abstract
A computational protocol has been devised to relate 7TM receptor proteins (GPCRs) with respect to physicochemical features of the core ligand-binding site as defined from the crystal structure of bovine rhodopsin. The identification of such receptors that already are associated with ligand information (e.g., small molecule ligands with mutagenesis or SAR data) is used to support structure-guided drug design of novel ligands. A case targeting the newly identified prostaglandin D2 receptor CRTH2 serves as a primary example to illustrate the procedure.
MESH Headings
- Animals
- Benzimidazoles/chemistry
- Benzimidazoles/pharmacology
- Binding Sites/physiology
- Binding, Competitive/drug effects
- Biphenyl Compounds
- Cattle
- Computer Simulation
- Drug Design
- Hydrocarbons, Aromatic/pharmacology
- Indomethacin/analogs & derivatives
- Indomethacin/chemistry
- Indomethacin/pharmacology
- Ligands
- Models, Biological
- Molecular Structure
- Receptors, G-Protein-Coupled/antagonists & inhibitors
- Receptors, G-Protein-Coupled/classification
- Receptors, G-Protein-Coupled/metabolism
- Receptors, Immunologic/antagonists & inhibitors
- Receptors, Immunologic/metabolism
- Receptors, Prostaglandin/antagonists & inhibitors
- Receptors, Prostaglandin/metabolism
- Rhodopsin/chemistry
- Structure-Activity Relationship
- Tetrazoles/chemistry
- Tetrazoles/pharmacology
|
44
|
Mazumder R, Natale DA, Murthy S, Thiagarajan R, Wu CH. Computational identification of strain-, species- and genus-specific proteins. BMC Bioinformatics 2005; 6:279. [PMID: 16305751 PMCID: PMC1310627 DOI: 10.1186/1471-2105-6-279] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 04/28/2005] [Accepted: 11/23/2005] [Indexed: 11/14/2022] Open
Abstract
Background: The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationship with close relatives. Such proteins can also serve as taxon-specific diagnostic targets.
Description: A pipeline using a combination of computational and manual analyses of BLAST results was developed to identify strain-, species-, and genus-specific proteins and to catalog the closest sequenced relative for each protein in a proteome. Proteins encoded by a given strain are preliminarily considered to be unique if BLAST, using a comprehensive protein database, fails to retrieve (with an e-value better than 0.001) any protein not encoded by the query strain, species or genus (for strain-, species- and genus-specific proteins respectively), or if BLAST, using the best hit as the query (reverse BLAST), does not retrieve the initial query protein. Results are manually inspected for homology if the initial query is retrieved in the reverse BLAST but is not the best hit. Sequences unlikely to retrieve homologs using the default BLOSUM62 matrix (usually short sequences) are re-tested using the PAM30 matrix, thereby increasing the number of retrieved homologs and increasing the stringency of the search for unique proteins. The above protocol was used to examine several food- and water-borne pathogens. We find that the reverse BLAST step filters out about 22% of proteins with homologs that would otherwise be considered unique at the genus and species levels. Analysis of the annotations of unique proteins reveals that many are remnants of prophage proteins, or may be involved in virulence. The data generated from this study can be accessed and further evaluated from the CUPID (Core and Unique Protein Identification) system web site (updated semi-annually).
Conclusion: CUPID provides a set of proteins specific to a genus, species or a strain, and identifies the most closely related organism.
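The decision rule described in this abstract can be sketched in a few lines of Python. This is only an illustration of the stated logic, not the CUPID pipeline itself: the `Hit` record, the `classify` function, and the three return labels are hypothetical names. The sketch also assumes that the reverse BLAST hit list excludes the self-hit of the reverse query and that the same 0.001 e-value cutoff applies in both directions, neither of which the abstract states.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 1e-3  # "e-value better than 0.001", as quoted in the abstract


@dataclass(frozen=True)
class Hit:
    """One BLAST hit (hypothetical record, not the CUPID data model)."""
    subject_id: str
    taxon: str
    evalue: float


def classify(query_id, query_taxon, forward_hits, reverse_hits):
    """Return 'unique', 'inspect', or 'not_unique' for one query protein.

    forward_hits: hits from searching the query against the full database.
    reverse_hits: hits from searching the best outside hit against the
        database (assumed here to exclude that protein's hit to itself).
    """
    # Significant hits to proteins outside the query's own taxon.
    outside = [h for h in forward_hits
               if h.taxon != query_taxon and h.evalue < E_VALUE_CUTOFF]
    if not outside:
        return "unique"  # nothing outside the taxon was retrieved

    # Reverse BLAST: does the best outside hit retrieve the query again?
    significant = sorted(
        (h for h in reverse_hits if h.evalue < E_VALUE_CUTOFF),
        key=lambda h: h.evalue,
    )
    ids = [h.subject_id for h in significant]
    if query_id not in ids:
        return "unique"   # query not recovered: still considered unique
    if ids[0] != query_id:
        return "inspect"  # recovered but not the best hit: manual review
    return "not_unique"   # reciprocal best hit: a homolog exists
```

Keeping the three outcomes separate mirrors the abstract's distinction between automatic calls and cases sent for manual inspection.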
Affiliation(s)
- Raja Mazumder
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
- Darren A Natale
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
- Sudhir Murthy
- DCWASA-DWT, 5000 Overlook Ave., SW, Washington, DC 20032, USA
- Rathi Thiagarajan
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
- Cathy H Wu
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
|
45
|
Abstract
MOTIVATION: Standard algorithms for pairwise protein sequence alignment make the simplifying assumption that amino acid substitutions at neighboring sites are uncorrelated. This assumption allows implementation of fast algorithms for pairwise sequence alignment, but it ignores information that could conceivably increase the power of remote homolog detection. We examine the validity of this assumption by constructing extended substitution matrices that encapsulate the observed correlations between neighboring sites, by developing an efficient and rigorous algorithm for pairwise protein sequence alignment that incorporates these local substitution correlations and by assessing the ability of this algorithm to detect remote homologies.
RESULTS: Our analysis indicates that local correlations between substitutions are not strong on the average. Furthermore, incorporating local substitution correlations into pairwise alignment did not lead to a statistically significant improvement in remote homology detection. Therefore, the standard assumption that individual residues within protein sequences evolve independently of neighboring positions appears to be an efficient and appropriate approximation.
Affiliation(s)
- Gavin E Crooks
- Department of Plant and Microbial Biology, 111 Koshland Hall #3102, University of California, Berkeley, CA 94720-3102, USA.
|
46
|
Price GA, Crooks GE, Green RE, Brenner SE. Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap. Bioinformatics 2005; 21:3824-31. [PMID: 16105900 DOI: 10.1093/bioinformatics/bti627] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Protein sequence comparison methods are routinely used to infer the intricate network of evolutionary relationships found within the rapidly growing library of protein sequences, and thereby to predict the structure and function of uncharacterized proteins. In the present study, we detail an improved statistical benchmark of pairwise protein sequence comparison algorithms. We use bootstrap resampling techniques to determine standard statistical errors and to estimate the confidence of our conclusions. We show that the underlying structure within benchmark databases causes Efron's standard, non-parametric bootstrap to be biased. Consequently, the standard bootstrap underpredicts average performance when used in the context of evaluating sequence comparison methods. We have developed, as an alternative, an unbiased statistical evaluation based on the Bayesian bootstrap, a resampling method operationally similar to the standard bootstrap.
RESULTS: We apply our analysis to the comparative study of amino acid substitution matrix families and find that using modern matrices results in a small, but statistically significant improvement in remote homology detection compared with the classic PAM and BLOSUM matrices.
AVAILABILITY: The sequence sets and code for performing these analyses are available from http://compbio.berkeley.edu/.
CONTACT: brenner@compbio.berkeley.edu.
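For readers unfamiliar with the technique named here, the following is a minimal sketch of a generic Bayesian bootstrap of a mean, assuming only the textbook formulation in which each replicate reweights the observations with Dirichlet(1, ..., 1) weights instead of resampling them with replacement. It does not reproduce the benchmark, the database structure, or the code of the study; the function names and the per-query scores in the usage note are invented for illustration.

```python
import random


def bayesian_bootstrap_means(values, n_replicates=1000, seed=0):
    """Return n_replicates weighted means of `values`.

    Each replicate draws one weight per observation from a flat
    Dirichlet distribution and computes the weighted mean.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_replicates):
        # Gamma(1, 1) draws normalized by their sum are Dirichlet(1, ..., 1).
        raw = [rng.gammavariate(1.0, 1.0) for _ in values]
        total = sum(raw)
        means.append(sum(w * v for w, v in zip(raw, values)) / total)
    return means


def standard_error(samples):
    """Sample standard deviation of the replicate statistics."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    return var ** 0.5
```

A call such as `standard_error(bayesian_bootstrap_means([0.61, 0.58, 0.70, 0.66]))` gives an error estimate for the mean of four hypothetical per-query scores. Because the weights are continuous, no observation is ever dropped entirely from a replicate, unlike in the standard bootstrap.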
Affiliation(s)
- Gavin A Price
- Department of Bioengineering, University of California, Berkeley, 94720, USA
|
47
|
Fauquet CM, Sawyer S, Idris AM, Brown JK. Sequence analysis and classification of apparent recombinant begomoviruses infecting tomato in the Nile and Mediterranean Basins. Phytopathology 2005; 95:549-55. [PMID: 18943321 DOI: 10.1094/phyto-95-0549] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Indexed: 05/10/2023]
Abstract
Numerous whitefly-transmitted viral diseases of tomato have emerged in countries around the Nile and Mediterranean Basins in the last 20 years. These diseases are caused by monopartite geminiviruses (family Geminiviridae) belonging to the genus Begomovirus that probably resulted from numerous recombination events. The molecular biodiversity of these viruses was investigated to better appreciate the role and importance of recombination and to better clarify the phylogenetic relationships and classification of these viruses. The analysis partitioned the tomato-infecting begomoviruses from this region into two major clades, Tomato yellow leaf curl virus and Tomato yellow leaf curl Sardinia virus. Phylogenetic and pairwise analyses together with an evaluation for gene conversion were performed from which taxonomic classification and virus biodiversity conclusions were drawn. Six recombination hotspots and three homogeneous zones within the genome were identified among the tomato-infecting isolates and species examined here, suggesting that the recombination events identified were not random occurrences.
|
48
|
Abstract
MOTIVATION: The observed correlations between pairs of homologous protein sequences are typically explained in terms of a Markovian dynamic of amino acid substitution. This model assumes that every location on the protein sequence has the same background distribution of amino acids, an assumption that is incompatible with the observed heterogeneity of protein amino acid profiles and with the success of profile multiple sequence alignment.
RESULTS: We propose an alternative model of amino acid replacement during protein evolution based upon the assumption that the variation of the amino acid background distribution from one residue to the next is sufficient to explain the observed sequence correlations of homologs. The resulting dynamical model of independent replacements drawn from heterogeneous backgrounds is simple and consistent, and provides a unified homology match score for sequence-sequence, sequence-profile and profile-profile alignment.
Affiliation(s)
- Gavin E Crooks
- Department of Plant and Microbial Biology, 111 Koshland Hall #3102, University of California, Berkeley, CA 94720-3102, USA.
|
49
|
Yu YK, Altschul SF. The construction of amino acid substitution matrices for the comparison of proteins with non-standard compositions. Bioinformatics 2004; 21:902-11. [PMID: 15509610 DOI: 10.1093/bioinformatics/bti070] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Amino acid substitution matrices play a central role in protein alignment methods. Standard log-odds matrices, such as those of the PAM and BLOSUM series, are constructed from large sets of protein alignments having implicit background amino acid frequencies. However, these matrices frequently are used to compare proteins with markedly different amino acid compositions, such as transmembrane proteins or proteins from organisms with strongly biased nucleotide compositions. It has been argued elsewhere that standard matrices are not ideal for such comparisons and, furthermore, a rationale has been presented for transforming a standard matrix for use in a non-standard compositional context.
RESULTS: This paper presents the mathematical details underlying the compositional adjustment of amino acid or DNA substitution matrices.
Affiliation(s)
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
50
|
Yu YK, Wootton JC, Altschul SF. The compositional adjustment of amino acid substitution matrices. Proc Natl Acad Sci U S A 2003; 100:15688-93. [PMID: 14663142 PMCID: PMC307629 DOI: 10.1073/pnas.2533904100] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022] Open
Abstract
Amino acid substitution matrices are central to protein-comparison methods. In most commonly used matrices, the substitution scores take a log-odds form, involving the ratio of "target" to "background" frequencies derived from large, carefully curated sets of protein alignments. However, such matrices often are used to compare protein sequences with amino acid compositions that differ markedly from the background frequencies used for the construction of the matrices. Of course, the target frequencies should be adjusted in such cases, but the lack of an appropriate way to do this has been a long-standing problem. This article shows that if one demands consistency between target and background frequencies, then a log-odds substitution matrix implies a unique set of target and background frequencies as well as a unique scale. Standard substitution matrices therefore are truly appropriate only for the comparison of proteins with standard amino acid composition. Accordingly, we present and evaluate a rationale for transforming the target frequencies implicit in a standard matrix to frequencies appropriate for a nonstandard context. This rationale yields asymmetric matrices for the comparison of proteins with divergent compositions. Earlier approaches are unable to deal with this case in a fully consistent manner. Composition-specific substitution matrix adjustment is shown to be of utility for comparing compositionally biased proteins, including those of organisms with nucleotide-biased, and therefore codon-biased, genomes or isochores.
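The log-odds form mentioned in this abstract is conventionally written as s_ij = (1/λ) ln(q_ij / (p_i p'_j)), where q_ij are the target frequencies, p_i and p'_j the background frequencies, and λ a scale factor. The sketch below computes such scores for a toy two-letter alphabet. It assumes that "consistency" means the background frequencies equal the row and column marginals of the target frequencies; the function names and all numbers are invented for illustration, and the compositional adjustment procedure of the paper itself is not implemented.

```python
import math


def marginals(target):
    """Row and column sums of a target-frequency matrix."""
    rows = [sum(row) for row in target]
    cols = [sum(col) for col in zip(*target)]
    return rows, cols


def log_odds_matrix(target, background_row, background_col, lam=1.0):
    """s[i][j] = ln(q_ij / (p_i * p'_j)) / lam."""
    return [[math.log(q / (p_i * p_j)) / lam
             for q, p_j in zip(row, background_col)]
            for row, p_i in zip(target, background_row)]


# Toy target frequencies for a two-letter alphabet (sum to 1).
target = [[0.40, 0.10],
          [0.10, 0.40]]
p_row, p_col = marginals(target)  # consistent backgrounds
scores = log_odds_matrix(target, p_row, p_col)
```

Here identities score positive and mismatches negative, because aligned identical letters occur more often in the target frequencies than the backgrounds alone would predict. If the two sequences had different compositions, `p_row` and `p_col` would differ and the resulting matrix would be asymmetric, which is the situation the abstract describes.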
Affiliation(s)
- Yi-Kuo Yu
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|