1
|
Abstract
Amino acid mutations in proteins are random, and those mutations that are beneficial or neutral survive during the course of evolution. Conservation or co-evolution analyses are performed on the multiple sequence alignment of homologous proteins to understand how important different amino acids, or groups of them, are. However, these traditional analyses do not explore the directed influence of amino acid mutations, such as compensatory effects. In this work we develop a method to capture the directed evolutionary impact of one amino acid on all other amino acids, and provide a visual network representation for it. The method developed for these directed networks of inter- and intra-protein evolutionary interactions can also be used to identify differences in amino acid evolution between control and experimental groups. The analysis is illustrated with a few examples, in which the method identifies several directed interactions of functionally critical amino acids. The impact of an amino acid is quantified as the number of amino acids that are influenced as a consequence of its mutation. This measure is intended to summarize the compensatory mutations in large evolutionary sequence data sets and to rationally identify targets for mutagenesis when their functional significance cannot be assessed using structure or conservation.
|
2
|
Lichtenstein F, Antoneli F, Briones MRS. MIA: Mutual Information Analyzer, a graphic user interface program that calculates entropy, vertical and horizontal mutual information of molecular sequence sets. BMC Bioinformatics 2015; 16:409. [PMID: 26652707 PMCID: PMC4676106 DOI: 10.1186/s12859-015-0837-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 09/03/2015] [Accepted: 12/02/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Short- and long-range correlations in biological sequences are central in genomic studies of covariation. These correlations can be studied using mutual information because it measures the amount of information one random variable contains about the other. Here we present MIA (Mutual Information Analyzer), a user-friendly graphical interface pipeline that calculates spectra of vertical entropy (VH), vertical mutual information (VMI) and horizontal mutual information (HMI), since there is currently no user-friendly integrated platform that performs all these calculations in a single package. MIA also calculates the Jensen-Shannon Divergence (JSD) between pairs of spectra from different species, herein called informational distances. The resulting distance matrices can be presented as distance histograms and informational dendrograms, supporting the discrimination of closely related species. RESULTS In order to test MIA we analyzed sequences from the Drosophila Adh locus, because the taxonomy and evolutionary patterns of different Drosophila species are well established and the Adh gene is extensively studied. The search retrieved 959 sequences of 291 species. From the total, 450 sequences of 17 species were selected. With this dataset MIA performed all tasks in less than three hours: gathering, storing and aligning fasta files; calculating VH, VMI and HMI spectra; and calculating JSD between pairs of spectra from different species. For each task MIA saved tables and graphics on the local disk, easily accessible for future analysis. CONCLUSIONS Our tests revealed that the "informational model free" spectra may represent species signatures. Since JSD applied to Horizontal Mutual Information spectra resulted in statistically significant distances between species, we could calculate the respective hierarchical clusters, herein called Informational Dendrograms (ID). When compared to phylogenetic trees, all Informational Dendrograms presented similar taxonomy and species clustering.
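The quantities named in this abstract (entropy, mutual information, Jensen-Shannon divergence) can be illustrated with a minimal sketch. It uses the generic textbook definitions on a toy alignment and is not MIA's code: the function names, the toy sequences, and the plain plug-in (frequency-count) estimates are all assumptions made for illustration, and MIA's own estimators and corrections may differ.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy in bits of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_a, col_b):
    """Mutual information in bits between two alignment columns,
    computed as H(A) + H(B) - H(A, B) from raw frequency counts."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    joint = list(zip(col_a, col_b))
    return entropy(col_a) + entropy(col_b) - entropy(joint)


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete
    distributions given as equal-length sequences of probabilities."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical toy alignment: four sequences, four columns.
alignment = ["ACGT", "ACGA", "TCAT", "TCAA"]
columns = list(zip(*alignment))

mi_coupled = mutual_information(columns[0], columns[2])      # 1.0 bit: columns covary perfectly
mi_independent = mutual_information(columns[0], columns[3])  # 0.0 bits: columns are independent
jsd_disjoint = jensen_shannon_divergence([1.0, 0.0], [0.0, 1.0])  # 1.0 bit: maximal divergence
```

In this toy case the first and third columns change together, so knowing one fully determines the other, while the fourth column carries no information about the first. With real alignments, finite-sample bias and phylogenetic relatedness inflate such plug-in estimates, which is one reason dedicated tools apply further corrections.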
Affiliation(s)
- Flavio Lichtenstein
- Departamento de Informática em Saúde, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Botucatu, 862, Ed. José Leal Prado, andar térreo, Vila Clementino, CEP 04023-062, São Paulo, SP, Brazil; Laboratory of Evolutionary Genomics and Biocomplexity, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Pedro de Toledo, 669, 4 andar L4E, CEP 04039-032, São Paulo, SP, Brazil.
- Fernando Antoneli
- Departamento de Informática em Saúde, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Botucatu, 862, Ed. José Leal Prado, andar térreo, Vila Clementino, CEP 04023-062, São Paulo, SP, Brazil; Laboratory of Evolutionary Genomics and Biocomplexity, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Pedro de Toledo, 669, 4 andar L4E, CEP 04039-032, São Paulo, SP, Brazil.
- Marcelo R S Briones
- Departamento de Microbiologia, Immunologia and Parasitologia, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Botucatu, 862, Ed. Ciências Biomédicas, 3 andar, Vila Clementino, CEP 04023-062, São Paulo, SP, Brazil; Laboratory of Evolutionary Genomics and Biocomplexity, Escola Paulista de Medicina, Universidade Federal de São Paulo, Rua Pedro de Toledo, 669, 4 andar L4E, CEP 04039-032, São Paulo, SP, Brazil.
|
3
|
Li G, Theys K, Verheyen J, Pineda-Peña AC, Khouri R, Piampongsant S, Eusébio M, Ramon J, Vandamme AM. A new ensemble coevolution system for detecting HIV-1 protein coevolution. Biol Direct 2015; 10:1. [PMID: 25564011 PMCID: PMC4332441 DOI: 10.1186/s13062-014-0031-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Received: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 12/31/2022]
Abstract
BACKGROUND A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed. RESULTS We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution predictions. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in the prediction of HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and that a combination of four methods outperformed any of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and were in physical contact with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under the selective pressure of protease inhibitors. CONCLUSIONS This study presents a new ensemble coevolution system which detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.
Affiliation(s)
- Guangdi Li
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Kristof Theys
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Jens Verheyen
- Institute of Virology, University hospital, University Duisburg-Essen, Essen, Germany.
- Andrea-Clemencia Pineda-Peña
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Clinical and Molecular Infectious Disease Group, Faculty of Sciences and Mathematics, Universidad del Rosario, Bogotá, Colombia.
- Ricardo Khouri
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Supinya Piampongsant
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Mónica Eusébio
- Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
- Jan Ramon
- Department of Computer Science, KU Leuven - University of Leuven, Leuven, Belgium.
- Anne-Mieke Vandamme
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
|
4
|
Gültas M, Haubrock M, Tüysüz N, Waack S. Coupled mutation finder: a new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations. BMC Bioinformatics 2012; 13:225. [PMID: 22963049 PMCID: PMC3577461 DOI: 10.1186/1471-2105-13-225] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/18/2012] [Accepted: 08/23/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. A challenging problem in bioinformatics remains the separation of significant signals between two or more non-conserved residue sites from the phylogenetic noise and unrelated pair signals. Determination of these non-conserved residue sites is as important as the recognition of strictly conserved positions for understanding the structural basis of protein functions and identifying functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF), which quantifies the phylogenetic noise for the detection of compensatory mutations. RESULTS To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from the phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins, and they are likely to affect protein stability or functionality. CONCLUSIONS The CMF is a new method that includes an MSA-specific statistical model, based on multiple testing procedures that quantify the error made in terms of the false discovery rate, and a novel entropy-based metric to upscale BLOSUM62 dissimilar compensatory mutations. Therefore, it is a helpful tool to predict and investigate compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could be used as a novel automated function prediction tool that is required for a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de.
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, Göttingen, 37077, Germany.
|
5
|
Wulfmeyer T, Polzer C, Hiepler G, Hamacher K, Shoeman R, Dunigan DD, Van Etten JL, Lolicato M, Moroni A, Thiel G, Meckel T. Structural organization of DNA in chlorella viruses. PLoS One 2012; 7:e30133. [PMID: 22359540 PMCID: PMC3281028 DOI: 10.1371/journal.pone.0030133] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 09/27/2011] [Accepted: 12/09/2011] [Indexed: 11/19/2022]
Abstract
Chlorella viruses have icosahedral capsids with an internal membrane enclosing their large dsDNA genomes and associated proteins. Their genomes are packaged in the particles with a predicted DNA density of ca. 0.2 bp nm⁻³. Occasionally, infection of an algal cell by an individual particle fails and the viral DNA is dynamically ejected from the capsid. This shows that the release of the DNA generates a force, which can aid in the transfer of the genome into the host in a successful infection. Imaging of ejected viral DNA indicates that it is intimately associated with proteins in a periodic fashion. The bulk of the protein particles detected by atomic force microscopy have a size of ∼60 kDa, and two proteins (A278L and A282L) of about this size are among 6 basic putative DNA binding proteins found in a proteomic analysis of DNA binding proteins packaged in the virion. A combination of fluorescence images of ejected DNA and a bioinformatics analysis of the DNA reveals periodic patterns in the viral DNA. The periodic distribution of GC-rich regions in the genome provides potential binding sites for basic proteins. This DNA/protein aggregation could be responsible for the periodic concentration of fluorescently labeled DNA observed in ejected viral DNA. Collectively, the data indicate that the large chlorella viruses have a DNA packaging strategy that differs from that of bacteriophages; it involves proteins and shares similarities with chromatin structure in eukaryotes.
Affiliation(s)
- Timo Wulfmeyer
- Plant Membrane Biophysics, Technische Universität Darmstadt, Darmstadt, Germany
- Christian Polzer
- Plant Membrane Biophysics, Technische Universität Darmstadt, Darmstadt, Germany
- Gregor Hiepler
- Plant Membrane Biophysics, Technische Universität Darmstadt, Darmstadt, Germany
- Kay Hamacher
- Computational Biology Group, Technische Universität Darmstadt, Darmstadt, Germany
- Robert Shoeman
- Department of Biomolecular Mechanisms, Max Planck Institute for Medical Research, Heidelberg, Germany
- David D. Dunigan
- Department of Plant Pathology and Nebraska Center for Virology, University of Nebraska, Lincoln, Nebraska, United States of America
- James L. Van Etten
- Department of Plant Pathology and Nebraska Center for Virology, University of Nebraska, Lincoln, Nebraska, United States of America
- Marco Lolicato
- Department of Biology and CNR IBF-Mi, Università degli Studi di Milano, Milano, Italy
- Anna Moroni
- Department of Biology and CNR IBF-Mi, Università degli Studi di Milano, Milano, Italy
- Gerhard Thiel
- Plant Membrane Biophysics, Technische Universität Darmstadt, Darmstadt, Germany
- Tobias Meckel
- Plant Membrane Biophysics, Technische Universität Darmstadt, Darmstadt, Germany
|
6
|
Computation of mutual information from Hidden Markov Models. Comput Biol Chem 2010; 34:328-33. [DOI: 10.1016/j.compbiolchem.2010.08.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/29/2010] [Revised: 08/30/2010] [Accepted: 08/30/2010] [Indexed: 11/22/2022]
|