1
Manapkyzy D, Joldybayeva B, Ishchenko AA, Matkarimov BT, Zharkov DO, Taipakova S, Saparbaev MK. Enhanced thermal stability enables human mismatch-specific thymine-DNA glycosylase to catalyse futile DNA repair. PLoS One 2024; 19:e0304818. [PMID: 39423202 PMCID: PMC11488719 DOI: 10.1371/journal.pone.0304818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 08/19/2024] [Indexed: 10/21/2024] Open
Abstract
Human thymine-DNA glycosylase (TDG) excises T mispaired with G in a CpG context to initiate the base excision repair (BER) pathway. TDG is also involved in epigenetic regulation of gene expression by participating in active DNA demethylation. Here we demonstrate that under extended incubation time the full-length TDG (TDGFL), but neither its isolated catalytic domain (TDGcat) nor methyl-CpG binding domain-containing protein 4 (MBD4) DNA glycosylase, exhibits significant excision activity towards T and C in regular non-damaged DNA duplex in TpG/CpA and CpG/CpG contexts. Time course of the cleavage product accumulation under single-turnover conditions shows that the apparent rate constant for TDGFL-catalysed excision of T from T•A base pairs (0.0014-0.0069 min⁻¹) is 85-330-fold lower than for the excision of T from T•G mispairs (0.47-0.61 min⁻¹). Unexpectedly, TDGFL, but not TDGcat, exhibits prolonged enzyme survival at 37 °C when incubated in the presence of equimolar concentrations of a non-specific DNA duplex, suggesting that the disordered N- and C-terminal domains of TDG can interact with DNA and stabilize the overall conformation of the protein. Notably, TDGFL was able to excise 5-hydroxymethylcytosine (5hmC), but not 5-methylcytosine residues from duplex DNA with the efficiency that could be physiologically relevant in post-mitotic cells. Our findings demonstrate that, under the experimental conditions used, TDG catalyses sequence context-dependent removal of T, C and 5hmC residues from regular DNA duplexes. We propose that in vivo the TDG-initiated futile DNA BER may lead to formation of persistent single-strand breaks in non-methylated or hydroxymethylated chromatin regions.
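The rate constants quoted in this abstract can be put on a more intuitive time scale. The sketch below assumes simple first-order (single-exponential) product formation under single-turnover conditions, which the abstract does not state explicitly. The function names are illustrative, and the way the range endpoints are paired for the fold difference is also an assumption.

```python
import math

def half_life_min(k_per_min):
    """Half-time of a first-order reaction, ln(2) / k, in minutes."""
    return math.log(2) / k_per_min

def fraction_cleaved(k_per_min, t_min):
    """Fraction of substrate converted after t_min minutes: 1 - exp(-k * t)."""
    return 1.0 - math.exp(-k_per_min * t_min)

# Apparent rate constants (per minute) quoted in the abstract, as (low, high) ranges.
rates = {"T•A": (0.0014, 0.0069), "T•G": (0.47, 0.61)}

for pair, (k_low, k_high) in rates.items():
    # The fastest rate gives the shortest half-time, so print it first.
    print(f"{pair}: half-time {half_life_min(k_high):.1f} to {half_life_min(k_low):.1f} min")

# Fold difference, pairing high with high and low with low (an assumption,
# since the abstract does not say which sequence contexts were compared).
fold_low = rates["T•G"][1] / rates["T•A"][1]
fold_high = rates["T•G"][0] / rates["T•A"][0]
print(f"T•G excision is roughly {fold_low:.0f}- to {fold_high:.0f}-fold faster")
```

Under these assumptions the T•A half-times fall between roughly 100 and 500 minutes, against one to two minutes for T•G, and the endpoint ratios land close to the 85-330-fold range reported.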
Affiliation(s)
- Diana Manapkyzy
  - Department of Molecular Biology and Genetics, Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty, Kazakhstan
  - Scientific Research Institute of Biology and Biotechnology Problems, al-Farabi Kazakh National University, Almaty, Kazakhstan
- Botagoz Joldybayeva
  - Department of Molecular Biology and Genetics, Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty, Kazakhstan
  - Scientific Research Institute of Biology and Biotechnology Problems, al-Farabi Kazakh National University, Almaty, Kazakhstan
- Alexander A. Ishchenko
  - Group «Mechanisms of DNA Repair and Carcinogenesis», CNRS UMR9019, Université Paris-Saclay, Gustave Roussy Cancer Campus, Villejuif Cedex, France
- Dmitry O. Zharkov
  - SB RAS Institute of Chemical Biology and Fundamental Medicine, Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
- Sabira Taipakova
  - Department of Molecular Biology and Genetics, Faculty of Biology and Biotechnology, al-Farabi Kazakh National University, Almaty, Kazakhstan
  - Scientific Research Institute of Biology and Biotechnology Problems, al-Farabi Kazakh National University, Almaty, Kazakhstan
  - National Laboratory Astana, Nazarbayev University, Astana, Kazakhstan
- Murat K. Saparbaev
  - Group «Mechanisms of DNA Repair and Carcinogenesis», CNRS UMR9019, Université Paris-Saclay, Gustave Roussy Cancer Campus, Villejuif Cedex, France
2
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. Biophys J 2024; 123:221-234. [PMID: 38102827 PMCID: PMC10808046 DOI: 10.1016/j.bpj.2023.12.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 09/18/2023] [Accepted: 12/12/2023] [Indexed: 12/17/2023] Open
Abstract
Quantitative understanding of cellular processes, such as cell cycle and differentiation, is impeded by various forms of complexity ranging from myriad molecular players and their multilevel regulatory interactions, cellular evolution with multiple intermediate stages, lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present a modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding rational strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally determined from experiments, augmented with dynamical network computations involving endpoint objective functions, mutual information, change-point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method, based on the strategies described above. The cybernetic-inspired method is able to distill the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model is able to predict future cell cycles consistent with experimental measurements. 
We posit that this innovative framework has the promise to extend to the dynamics of other biological processes, with a potential to provide novel mechanistic insights.
Affiliation(s)
- Rubesh Raja
  - The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Sana Khanum
  - The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
- Lina Aboulmouna
  - Department of Bioengineering, University of California San Diego, La Jolla, California
- Mano R Maurya
  - Department of Bioengineering, University of California San Diego, La Jolla, California
- Shakti Gupta
  - Department of Bioengineering, University of California San Diego, La Jolla, California
- Shankar Subramaniam
  - Department of Bioengineering, University of California San Diego, La Jolla, California; Departments of Computer Science and Engineering, Cellular and Molecular Medicine, San Diego Supercomputer Center, and the Graduate Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, California
- Doraiswami Ramkrishna
  - The Davidson School of Chemical Engineering, Purdue University, West Lafayette, Indiana
3
Raja R, Khanum S, Aboulmouna L, Maurya MR, Gupta S, Subramaniam S, Ramkrishna D. Modeling transcriptional regulation of the cell cycle using a novel cybernetic-inspired approach. bioRxiv 2023:2023.03.21.533676. [PMID: 36993235 PMCID: PMC10055344 DOI: 10.1101/2023.03.21.533676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Quantitative understanding of cellular processes, such as cell cycle and differentiation, is impeded by various forms of complexity ranging from myriad molecular players and their multilevel regulatory interactions, cellular evolution with multiple intermediate stages, lack of elucidation of cause-effect relationships among the many system players, and the computational complexity associated with the profusion of variables and parameters. In this paper, we present an elegant modeling framework based on the cybernetic concept that biological regulation is inspired by objectives embedding entirely novel strategies for dimension reduction, process stage specification through the system dynamics, and innovative causal association of regulatory events with the ability to predict the evolution of the dynamical system. The elementary step of the modeling strategy involves stage-specific objective functions that are computationally-determined from experiments, augmented with dynamical network computations involving end point objective functions, mutual information, change point detection, and maximal clique centrality. We demonstrate the power of the method through application to the mammalian cell cycle, which involves thousands of biomolecules engaged in signaling, transcription, and regulation. Starting with a fine-grained transcriptional description obtained from RNA sequencing measurements, we develop an initial model, which is then dynamically modeled using the cybernetic-inspired method (CIM), utilizing the strategies described above. The CIM is able to distill the most significant interactions from a multitude of possibilities. In addition to capturing the complexity of regulatory processes in a mechanistically causal and stage-specific manner, we identify the functional network modules, including novel cell cycle stages. Our model is able to predict future cell cycles consistent with experimental measurements. 
We posit that this state-of-the-art framework has the promise to extend to the dynamics of other biological processes, with a potential to provide novel mechanistic insights.
STATEMENT OF SIGNIFICANCE
Cellular processes like the cell cycle are highly complex, involving multiple players interacting at multiple levels, and explicit modeling of such systems is challenging. The availability of longitudinal RNA measurements provides an opportunity to "reverse-engineer" novel regulatory models. We develop a novel framework, inspired by a goal-oriented cybernetic model, to implicitly model transcriptional regulation by constraining the system using inferred temporal goals. A preliminary causal network based on information theory is used as a starting point, and our framework is used to distill the network into temporally based networks containing essential molecular players. The strength of this approach is its ability to dynamically model the RNA temporal measurements. The approach developed paves the way for inferring regulatory processes in many complex cellular processes.
4
Basu S, Bahadur RP. Conservation and coevolution determine evolvability of different classes of disordered residues in human intrinsically disordered proteins. Proteins 2021; 90:632-644. [PMID: 34626492 DOI: 10.1002/prot.26261] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/06/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 12/19/2022]
Abstract
Structure, function, and evolution are interdependent properties of proteins. Diversity of protein functions arising from structural variations is a potential driving force behind protein evolvability. Intrinsically disordered proteins or regions (IDPs or IDRs) lack well-defined structure under normal physiological conditions, yet they are highly functional. The increased occurrence of IDPs in eukaryotes compared to prokaryotes indicates a strong correlation between protein evolution and disorder. IDPs generally have a higher evolutionary rate than globular proteins. Structural pliability allows IDPs to accommodate multiple mutations without affecting their functional potential. Nevertheless, how evolutionary signals vary between different classes of disordered residues (DRs) in IDPs is poorly understood. This study addresses variation of evolutionary behavior in terms of residue conservation and intra-protein coevolution among structural and functional classes of DRs in IDPs. Analyses are performed on 579 human IDPs, which are classified based on length of IDRs, interacting partners and functional classes. We find that short IDRs are less conserved than long IDRs or full IDPs. Functional classes that require flexibility and specificity to perform their activity evolve more slowly than others. Disorder-promoting amino acids evolve faster than order-promoting amino acids. Pro, Gly, Ile, and Phe have a unique coevolving nature, which further emphasizes their roles in IDPs. This study sheds light on evolutionary footprints in different classes of DRs from human IDPs and enhances our understanding of the structural and functional potential of IDPs.
Affiliation(s)
- Sushmita Basu
  - Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
- Ranjit Prasad Bahadur
  - Computational Structural Biology Lab, Department of Biotechnology, Indian Institute of Technology Kharagpur, Kharagpur, India
5
Durairaj J, Melillo E, Bouwmeester HJ, Beekwilder J, de Ridder D, van Dijk ADJ. Integrating structure-based machine learning and co-evolution to investigate specificity in plant sesquiterpene synthases. PLoS Comput Biol 2021; 17:e1008197. [PMID: 33750949 PMCID: PMC8016262 DOI: 10.1371/journal.pcbi.1008197] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/21/2020] [Revised: 04/01/2021] [Accepted: 02/15/2021] [Indexed: 12/19/2022] Open
Abstract
Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.
Affiliation(s)
- Janani Durairaj
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Harro J. Bouwmeester
  - Swammerdam Institute for Life Sciences, University of Amsterdam, Amsterdam, The Netherlands
- Jules Beekwilder
  - Bioscience, Wageningen Plant Research, Wageningen University and Research, Wageningen, The Netherlands
  - Laboratory of Plant Physiology, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Dick de Ridder
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
- Aalt D. J. van Dijk
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
  - Biometris, Department of Plant Sciences, Wageningen University and Research, Wageningen, The Netherlands
6
Structural Insights into Carboxylic Polyester-Degrading Enzymes and Their Functional Depolymerizing Neighbors. Int J Mol Sci 2021; 22:ijms22052332. [PMID: 33652738 PMCID: PMC7956259 DOI: 10.3390/ijms22052332] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/09/2021] [Revised: 02/22/2021] [Accepted: 02/23/2021] [Indexed: 11/28/2022] Open
Abstract
Esters are organic compounds widely represented in cellular structures and metabolism, originated by the condensation of organic acids and alcohols. Esterification reactions are also used by chemical industries for the production of synthetic plastic polymers. Polyester plastics are an increasing source of environmental pollution due to their intrinsic stability and limited recycling efforts. Bioremediation of polyesters based on the use of specific microbial enzymes is an interesting alternative to the current methods for the valorization of used plastics. Microbial esterases are promising catalysts for the biodegradation of polyesters that can be engineered to improve their biochemical properties. In this work, we analyzed the structure-activity relationships in microbial esterases, with special focus on the recently described plastic-degrading enzymes isolated from marine microorganisms and their structural homologs. Our analysis, based on structure-alignment, molecular docking, coevolution of amino acids and surface electrostatics determined the specific characteristics of some polyester hydrolases that could be related with their efficiency in the degradation of aromatic polyesters, such as phthalates.
7
Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. Entropy (Basel) 2020; 22:E627. [PMID: 33286399 PMCID: PMC7517167 DOI: 10.3390/e22060627] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 04/30/2020] [Revised: 05/31/2020] [Accepted: 06/03/2020] [Indexed: 12/30/2022]
Abstract
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address problems in the field of data compression and communication over (noisy) communication channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and have played a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic information-theoretic concepts and describe their key applications in multiple major areas of research in computational biology: gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.
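As a concrete instance of the basic concepts this review covers, the sketch below computes Shannon entropy and mutual information for two discrete variables, here two columns of a toy sequence alignment. It uses plug-in (empirical frequency) estimates with no bias or small-sample correction, which is a simplifying assumption; the function names and toy columns are illustrative and are not taken from the article.

```python
import math
from collections import Counter

def entropy(symbols):
    """Shannon entropy in bits, estimated from empirical symbol frequencies."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two equal-length symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy alignment columns (hypothetical): each string is one column read
# down four aligned sequences.
covarying = mutual_information("AACC", "GGTT")    # second column tracks the first
independent = mutual_information("AACC", "GTGT")  # no relationship
print(covarying, independent)
```

Perfectly covarying columns share all of their information, while independent columns share none, which is the property coevolution and network-inference methods exploit.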
Affiliation(s)
- Pritam Chanda
  - Corteva Agriscience™, Indianapolis, IN 46268, USA
  - Computer and Information Science, Indiana University-Purdue University, Indianapolis, IN 46202, USA
- Eduardo Costa
  - Corteva Agriscience™, Mogi Mirim, Sao Paulo 13801-540, Brazil
- Jie Hu
  - Corteva Agriscience™, Indianapolis, IN 46268, USA
- Rasna Walia
  - Corteva Agriscience™, Johnston, IA 50131, USA
8
Fang C, Jia Y, Hu L, Lu Y, Wang H. IMPContact: An Interhelical Residue Contact Prediction Method. Biomed Res Int 2020; 2020:4569037. [PMID: 32309431 PMCID: PMC7140131 DOI: 10.1155/2020/4569037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2020] [Accepted: 03/09/2020] [Indexed: 11/17/2022]
Abstract
As an important category of proteins, alpha-helix transmembrane proteins (αTMPs) play an important role in various biological activities. Because few αTMP structures have been solved, predicting the residue contacts among the transmembrane segments of an αTMP can reveal the basis of the protein fold, which can in turn be used to discover more protein functions. A few efforts have been devoted to predicting interhelical residue contacts (IHRCs) using machine learning methods based on prior knowledge of transmembrane protein structure. However, improving the prediction accuracy remains a challenge, and deep learning provides an opportunity to exploit this structural knowledge in a different way. For this purpose, we proposed a novel αTMP residue-residue contact prediction method, IMPContact, in which a convolutional neural network (CNN) was applied to recognize interhelical contacts in a TMP using its specific structural features. Four sequence-based TMP-specific features were selected to describe a pair of residues, namely, evolutionary covariation, predicted topology structure, residue relative position, and evolutionary conservation. An up-to-date dataset was used to train and test IMPContact; our method achieved better performance than peer methods. In the case studies, IHRCs in the regular transmembrane helices were better predicted than those in the irregular ones.
Affiliation(s)
- Chao Fang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yajie Jia
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
- Lihong Hu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Yinghua Lu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
- Han Wang
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
  - Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
  - Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
9
Marques MC, Albuquerque IS, Vaz SH, Bernardes GJL. Overexpression of Osmosensitive Ca2+-Permeable Channel TMEM63B Promotes Migration in HEK293T Cells. Biochemistry 2019; 58:2861-2866. [PMID: 31243992 DOI: 10.1021/acs.biochem.9b00224] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
The recent discovery of the osmosensitive calcium (Ca2+) channel OSCA has revealed the potential mechanism by which plant cells sense diverse stimuli. Osmosensory transporters and mechanosensitive channels can detect and respond to osmotic shifts that play an important role in active cell homeostasis. Members of the TMEM63 family of proteins are described as the closest homologues of OSCAs. Here, we characterize TMEM63B, a mammalian homologue of OSCAs, recently classified as mechanosensitive. In HEK293T cells, TMEM63B localizes to the plasma membrane and is associated with F-actin. This Ca2+-permeable channel specifically induces Ca2+ influx across the membrane in response to extracellular Ca2+ concentration and hyperosmolarity. In addition, overexpression of TMEM63B in HEK293T cells significantly enhanced cell migration and wound healing. The link between Ca2+ osmosensitivity and cell migration might help to establish TMEM63B's pathogenesis, for example, in cancer in which it is frequently overexpressed.
Affiliation(s)
- Marta C Marques
  - Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisboa, Portugal
- Inês S Albuquerque
  - Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisboa, Portugal
- Sandra H Vaz
  - Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisboa, Portugal
- Gonçalo J L Bernardes
  - Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Avenida Professor Egas Moniz, 1649-028 Lisboa, Portugal
  - Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, U.K.
10
Phylogenetic, molecular evolution and structural analyses of the WFDC1/prostate stromal protein 20 (ps20). Gene 2019; 686:125-140. [DOI: 10.1016/j.gene.2018.10.046] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/13/2018] [Revised: 09/07/2018] [Accepted: 10/19/2018] [Indexed: 12/20/2022]
11
Endutkin AV, Koptelov SS, Popov AV, Torgasheva NA, Lomzov AA, Tsygankova AR, Skiba TV, Afonnikov DA, Zharkov DO. Residue coevolution reveals functionally important intramolecular interactions in formamidopyrimidine-DNA glycosylase. DNA Repair (Amst) 2018; 69:24-33. [DOI: 10.1016/j.dnarep.2018.07.004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/19/2018] [Revised: 07/04/2018] [Accepted: 07/04/2018] [Indexed: 10/28/2022]
12
Chambers B, Levy M, Dechery JB, MacLean JN. Ensemble stacking mitigates biases in inference of synaptic connectivity. Netw Neurosci 2018; 2:60-85. [PMID: 29911678 PMCID: PMC5989998 DOI: 10.1162/netn_a_00032] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/23/2017] [Accepted: 10/11/2017] [Indexed: 01/26/2023] Open
Abstract
A promising alternative to directly measuring the anatomical connections in a neuronal population is inferring the connections from the activity. We employ simulated spiking neuronal networks to compare and contrast commonly used inference methods that identify likely excitatory synaptic connections using statistical regularities in spike timing. We find that simple adjustments to standard algorithms improve inference accuracy: A signing procedure improves the power of unsigned mutual-information-based approaches and a correction that accounts for differences in mean and variance of background timing relationships, such as those expected to be induced by heterogeneous firing rates, increases the sensitivity of frequency-based methods. We also find that different inference methods reveal distinct subsets of the synaptic network and each method exhibits different biases in the accurate detection of reciprocity and local clustering. To correct for errors and biases specific to single inference algorithms, we combine methods into an ensemble. Ensemble predictions, generated as a linear combination of multiple inference algorithms, are more sensitive than the best individual measures alone, and are more faithful to ground-truth statistics of connectivity, mitigating biases specific to single inference methods. These weightings generalize across simulated datasets, emphasizing the potential for the broad utility of ensemble-based approaches.
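The idea of an ensemble prediction formed as a linear combination of several inference scores can be sketched as follows. This is only an illustration: the method names, scores, and equal weights are hypothetical, and standardizing each method's scores before combining is an assumption made here so that the scales are comparable. The abstract does not describe how the weights were fitted.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize scores to zero mean and unit variance (all zeros if constant)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def ensemble_scores(method_scores, weights):
    """Weighted sum of standardized per-method scores for each candidate connection."""
    standardized = {name: zscore(vals) for name, vals in method_scores.items()}
    n_pairs = len(next(iter(standardized.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in standardized)
        for i in range(n_pairs)
    ]

# Hypothetical scores from two inference methods for four candidate connections.
method_scores = {
    "mutual_information": [0.9, 0.1, 0.5, 0.2],
    "lagged_correlation": [0.8, 0.3, 0.1, 0.2],
}
weights = {"mutual_information": 0.5, "lagged_correlation": 0.5}  # illustrative only

combined = ensemble_scores(method_scores, weights)
ranking = sorted(range(len(combined)), key=lambda i: combined[i], reverse=True)
print(ranking)
```

In practice the weights would be learned against ground-truth connectivity, for example from simulated networks, rather than fixed by hand.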
Affiliation(s)
- Brendan Chambers
  - Committee on Computational Neuroscience, University of Chicago, Chicago, IL, USA
- Maayan Levy
  - Committee on Computational Neuroscience, University of Chicago, Chicago, IL, USA
- Joseph B Dechery
  - Committee on Computational Neuroscience, University of Chicago, Chicago, IL, USA
- Jason N MacLean
  - Committee on Computational Neuroscience, University of Chicago, Chicago, IL, USA
  - Department of Neurobiology, University of Chicago, Chicago, IL, USA
13
Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017; 33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022] Open
Affiliation(s)
- Dapeng Xiong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Jianyang Zeng
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
  - Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
- Haipeng Gong
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
  - Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
14
Mandloi S, Chakrabarti S. Protein sites with more coevolutionary connections tend to evolve slower, while more variable protein families acquire higher coevolutionary connections. F1000Res 2017; 6:453. [PMID: 28751967 PMCID: PMC5506539 DOI: 10.12688/f1000research.11251.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 07/05/2017] [Indexed: 11/20/2022] Open
Abstract
Background: Amino acid exchanges within proteins sometimes compensate for one another and could therefore be co-evolved. It is essential to investigate the intricate relationship between the extent of coevolution and the evolutionary variability exerted at individual protein sites, as well as the whole protein. Methods: In this study, we have used a reliable set of coevolutionary connections (sites within 10 Å spatial distance) and investigated their correlation with the evolutionary diversity within the respective protein sites. Results: Based on our observations, we propose an interesting hypothesis that higher numbers of coevolutionary connections are associated with less evolutionarily variable protein sites, while higher numbers of coevolutionary connections can be observed for a protein family that has higher evolutionary variability. Our findings also indicate that highly coevolved sites located in a solvent-accessible state tend to be less evolutionarily variable. This relationship reverses at the whole-protein level, where cytoplasmic and extracellular proteins show moderately higher anti-correlation between the number of coevolutionary connections and the average evolutionary conservation of the whole protein. Conclusions: The observations and hypothesis presented in this study provide intriguing insights towards understanding the critical relationship between coevolutionary and evolutionary changes observed within proteins. Our observations encourage further investigation to find out the reasons behind subtle variations in the relationship between coevolutionary connectivity and evolutionary diversity for proteins located at various cellular localizations and/or involved in different molecular-biological functions.
Affiliation(s)
- Sapan Mandloi
- Department of Structural Biology and Bioinformatics Division, Council of Scientific and Industrial Research, Indian Institute of Chemical Biology, Kolkata, West Bengal, 700032, India
- Saikat Chakrabarti
- Department of Structural Biology and Bioinformatics Division, Council of Scientific and Industrial Research, Indian Institute of Chemical Biology, Kolkata, West Bengal, 700032, India
|
15
|
Fares MA. Coevolution Analysis Illuminates the Evolutionary Plasticity of the Chaperonin System GroES/L. Stress and Environmental Regulation of Gene Expression and Adaptation in Bacteria 2016:796-811. [DOI: 10.1002/9781119004813.ch77] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
|
16
|
Woldring DR, Holec PV, Hackel BJ. ScaffoldSeq: Software for characterization of directed evolution populations. Proteins 2016; 84:869-74. [PMID: 27018773 DOI: 10.1002/prot.25040] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/04/2015] [Revised: 03/08/2016] [Accepted: 03/18/2016] [Indexed: 12/21/2022]
Abstract
ScaffoldSeq is software designed for the numerous applications, including directed evolution analysis, in which a user generates a population of DNA sequences encoding partially diverse proteins with related functions and would like to characterize the single-site and pairwise amino acid frequencies across the population. A common scenario for enzyme maturation, antibody screening, and alternative scaffold engineering involves naïve and evolved populations that contain diversified regions, varying in both sequence and length, within a conserved framework. Analyzing the diversified regions of such populations is facilitated by high-throughput sequencing platforms; however, length variability within these regions (e.g., antibody CDRs) encumbers the alignment process. To overcome this challenge, the ScaffoldSeq algorithm takes advantage of conserved framework sequences to quickly identify diverse regions. Beyond this, unintended biases in sequence frequency are generated throughout the experimental workflow required to evolve and isolate clones of interest prior to DNA sequencing. ScaffoldSeq software uniquely handles this issue by providing tools to quantify and remove background sequences, cluster similar protein families, and dampen the impact of dominant clones. The software produces graphical and tabular summaries for each region of interest, allowing users to evaluate diversity in a site-specific manner as well as identify epistatic pairwise interactions. The code and detailed information are freely available at http://research.cems.umn.edu/hackel.
Affiliation(s)
- Daniel R Woldring
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota, 55455
- Patrick V Holec
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota, 55455
- Benjamin J Hackel
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota, 55455
|
17
|
Li G, Theys K, Verheyen J, Pineda-Peña AC, Khouri R, Piampongsant S, Eusébio M, Ramon J, Vandamme AM. A new ensemble coevolution system for detecting HIV-1 protein coevolution. Biol Direct 2015; 10:1. [PMID: 25564011 PMCID: PMC4332441 DOI: 10.1186/s13062-014-0031-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In the past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed. RESULTS We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution predictions. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in the prediction of HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and a combination of four methods outperformed any of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and physically contacted with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under selective pressure of protease inhibitors. CONCLUSIONS This study presents a new ensemble coevolution system which detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.
Affiliation(s)
- Guangdi Li
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Kristof Theys
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Jens Verheyen
- Institute of Virology, University hospital, University Duisburg-Essen, Essen, Germany.
- Andrea-Clemencia Pineda-Peña
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Clinical and Molecular Infectious Disease Group, Faculty of Sciences and Mathematics, Universidad del Rosario, Bogotá, Colombia.
- Ricardo Khouri
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Supinya Piampongsant
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium.
- Mónica Eusébio
- Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
- Jan Ramon
- Department of Computer Science, KU Leuven - University of Leuven, Leuven, Belgium.
- Anne-Mieke Vandamme
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal.
|
18
|
Control of catalytic efficiency by a coevolving network of catalytic and noncatalytic residues. Proc Natl Acad Sci U S A 2014; 111:E2376-83. [PMID: 24912189 DOI: 10.1073/pnas.1322352111] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022] Open
Abstract
The active sites of enzymes consist of residues necessary for catalysis and structurally important noncatalytic residues that together maintain the architecture and function of the active site. Examples of evolutionary interactions between catalytic and noncatalytic residues have been difficult to define and experimentally validate due to a general intolerance of these residues to substitution. Here, using computational methods to predict coevolving residues, we identify a network of positions consisting of two catalytic metal-binding residues and two adjacent noncatalytic residues in LAGLIDADG homing endonucleases (LHEs). Distinct combinations of the four residues in the network map to distinct LHE subfamilies, with a striking distribution of the metal-binding Asp (D) and Glu (E) residues. Mutation of these four positions in three LHEs--I-LtrI, I-OnuI, and I-HjeMI--indicate that the combinations of residues tolerated are specific to each enzyme. Kinetic analyses under single-turnover conditions revealed that I-LtrI activity could be modulated over an ∼100-fold range by mutation of residues in the coevolving network. I-LtrI catalytic site variants with low activity could be rescued by compensatory mutations at adjacent noncatalytic sites that restore an optimal coevolving network and vice versa. Our results demonstrate that LHE activity is constrained by an evolutionary barrier of residues with strong context-dependent effects. Creation of optimal coevolving active-site networks is therefore an important consideration in engineering of LHEs and other enzymes.
|
19
|
Clark GW, Ackerman SH, Tillier ER, Gatti DL. Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments. BMC Bioinformatics 2014; 15:157. [PMID: 24886131 PMCID: PMC4046016 DOI: 10.1186/1471-2105-15-157] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 12/02/2013] [Accepted: 05/06/2014] [Indexed: 11/10/2022] Open
Abstract
Background Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable to understand the protein mechanism and dynamic properties. Results In this study we have sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs each containing <400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelihood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by < 8 Å in the reference X-ray structures, there was on average less than 65% overlap between the top scoring pairs detected by methods that are based on different principles. Conclusions Given the large variety of structure and evolutionary history of different proteins it is possible that a single best method to detect covariation in all proteins does not exist, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable in those cases in which the size of the MSA is small or the quality of the alignment is low, leading to significant differences in the pairs detected by different methods.
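For readers unfamiliar with the "traditional mutual information (MI)" that this abstract takes as its baseline, the following is a minimal sketch of pairwise MI between two alignment columns. It is not the authors' mdMI code; the function name `column_mi` and the four-sequence toy alignment are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def column_mi(col_a, col_b):
    """Mutual information (in bits) between two equal-length alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)              # residue counts in column a
    freq_b = Counter(col_b)              # residue counts in column b
    freq_ab = Counter(zip(col_a, col_b)) # joint residue-pair counts
    return sum(
        (c / n) * log2((c / n) / ((freq_a[a] / n) * (freq_b[b] / n)))
        for (a, b), c in freq_ab.items()
    )


# Hypothetical toy alignment: columns 0 and 2 covary, column 3 varies independently.
msa = ["ACDE", "ACDF", "GCKE", "GCKF"]
cols = list(zip(*msa))

print(column_mi(cols[0], cols[2]))  # covarying pair
print(column_mi(cols[0], cols[3]))  # independent pair
```

Plain MI of this kind is sensitive to phylogenetic and entropy background, which is part of the motivation for the corrected and multidimensional variants the abstract compares.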
Affiliation(s)
- Elisabeth R Tillier
- Department of Medical Biophysics, University of Toronto, Campbell Family Institute for Cancer Research, Ontario Cancer Institute, University Health Network, Toronto, Ontario, Canada.
|
20
|
Liu J, Duan X, Sun J, Yin Y, Li G, Wang L, Liu B. Bi-factor analysis based on noise-reduction (BIFANR): a new algorithm for detecting coevolving amino acid sites in proteins. PLoS One 2013; 8:e79764. [PMID: 24278175 PMCID: PMC3835919 DOI: 10.1371/journal.pone.0079764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/19/2013] [Accepted: 09/29/2013] [Indexed: 11/23/2022] Open
Abstract
Previous statistical analyses have shown that amino acid sites in a protein evolve in a correlated way instead of independently. Even though located distantly in the linear sequence, the coevolved amino acids could be spatially adjacent in the tertiary structure, and constitute specific protein sectors. Moreover, these protein sectors are independent of one another in structure, function, and even evolution. Thus, systematic studies on protein sectors inside a protein will contribute to the clarification of protein function. In this paper, we propose a new algorithm BIFANR (Bi-factor Analysis Based on Noise-reduction) for detecting protein sectors in amino acid sequences. After applying BIFANR on S1A family and PDZ family, we carried out internal correlation test, statistical independence test, evolutionary rate analysis, evolutionary independence analysis, and function analysis to assess the prediction. The results showed that the amino acids in certain predicted protein sector are closely correlated in structure, function, and evolution, while protein sectors are nearly statistically independent. The results also indicated that the protein sectors have distinct evolutionary directions. In addition, compared with other algorithms, BIFANR has higher accuracy and robustness under the influence of noise sites.
Affiliation(s)
- Juntao Liu
- School of Mathematics, Shandong University, Jinan, China
- Xiaoyun Duan
- School of Life Science, Shandong University, Jinan, China
- Jianyang Sun
- School of Mathematics, Shandong University, Jinan, China
- Yanbin Yin
- Department of Biological Sciences, Northern Illinois University, DeKalb, Illinois, United States of America
- Guojun Li
- School of Mathematics, Shandong University, Jinan, China
- Lushan Wang
- School of Life Science, Shandong University, Jinan, China
- Bingqiang Liu
- School of Mathematics, Shandong University, Jinan, China
|
21
|
Ruiz-González MX, Fares MA. Coevolution analyses illuminate the dependencies between amino acid sites in the chaperonin system GroES-L. BMC Evol Biol 2013; 13:156. [PMID: 23875653 PMCID: PMC3728108 DOI: 10.1186/1471-2148-13-156] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 03/25/2013] [Accepted: 07/18/2013] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND GroESL is a heat-shock protein ubiquitous in bacteria and eukaryotic organelles. This evolutionarily conserved protein is involved in the folding of a wide variety of other proteins in the cytosol, being essential to the cell. The folding activity proceeds through strong conformational changes mediated by the co-chaperonin GroES and ATP. Functions alternative to folding have been previously described for GroEL in different bacterial groups, supporting enormous functional and structural plasticity for this molecule and the existence of a hidden combinatorial code in the protein sequence enabling such functions. Describing this plasticity can shed light on the functional diversity of GroEL. We hypothesize that different overlapping sets of amino acids coevolve within GroEL, GroES and between both these proteins. Shifts in these coevolutionary relationships may inevitably lead to evolution of alternative functions. RESULTS We conducted the first coevolution analyses in an extensive bacterial phylogeny, revealing complex networks of evolutionary dependencies between residues in GroESL. These networks differed among bacterial groups and involved amino acid sites with functional importance and others with previously unsuspected functional potential. Coevolutionary networks formed statistically independent units among bacterial groups and map to structurally continuous regions in the protein, suggesting their functional link. Sites involved in coevolution fell within narrow structural regions, supporting dynamic combinatorial functional links involving similar protein domains. Moreover, coevolving sites within a bacterial group mapped to regions previously identified as involved in folding-unrelated functions, and thus, coevolution may mediate alternative functions. CONCLUSIONS Our results highlight the evolutionary plasticity of GroEL across the entire bacterial phylogeny. 
Evidence on the functional importance of coevolving sites illuminates the as yet unappreciated functional diversity of proteins.
Affiliation(s)
- Mario X Ruiz-González
- Integrative and Systems Biology Group, Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas (CSIC-UPV), Valencia, Spain
|
22
|
Accurate simulation and detection of coevolution signals in multiple sequence alignments. PLoS One 2012; 7:e47108. [PMID: 23091608 PMCID: PMC3473043 DOI: 10.1371/journal.pone.0047108] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 04/24/2012] [Accepted: 09/10/2012] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND While the conserved positions of a multiple sequence alignment (MSA) are clearly of interest, non-conserved positions can also be important because, for example, destabilizing effects at one position can be compensated by stabilizing effects at another position. Different methods have been developed to recognize the evolutionary relationship between amino acid sites, and to disentangle functional/structural dependencies from historical/phylogenetic ones. METHODOLOGY/PRINCIPAL FINDINGS We have used two complementary approaches to test the efficacy of these methods. In the first approach, we have used a new program, MSAvolve, for the in silico evolution of MSAs, which records a detailed history of all covarying positions, and builds a global coevolution matrix as the accumulated sum of individual matrices for the positions forced to co-vary, the recombinant coevolution, and the stochastic coevolution. We have simulated over 1600 MSAs for 8 protein families, which reflect sequences of different sizes and proteins with widely different functions. The calculated coevolution matrices were compared with the coevolution matrices obtained for the same evolved MSAs with different coevolution detection methods. In a second approach we have evaluated the capacity of the different methods to predict close contacts in the representative X-ray structures of an additional 150 protein families using only experimental MSAs. CONCLUSIONS/SIGNIFICANCE Methods based on the identification of global correlations between pairs were found to be generally superior to methods based only on local correlations in their capacity to identify coevolving residues using either simulated or experimental MSAs. 
However, the significant variability in the performance of different methods with different proteins suggests that the simulation of MSAs that replicate the statistical properties of the experimental MSA can be a valuable tool to identify the coevolution detection method that is most effective in each case.
|
23
|
Jeong CS, Kim D. Reliable and robust detection of coevolving protein residues. Protein Eng Des Sel 2012; 25:705-13. [DOI: 10.1093/protein/gzs081] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
|
24
|
Wang C, Huang R, He B, Du Q. Improving the thermostability of alpha-amylase by combinatorial coevolving-site saturation mutagenesis. BMC Bioinformatics 2012; 13:263. [PMID: 23057711 PMCID: PMC3478181 DOI: 10.1186/1471-2105-13-263] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 03/21/2012] [Accepted: 09/11/2012] [Indexed: 11/12/2022] Open
Abstract
Background The generation of focused mutant libraries at hotspot residues is an important strategy in directed protein evolution. Existing methods, such as combinatorial active site testing and residual coupling analysis, depend primarily on evolutionarily conserved information to find the hotspot residues. Hardly any attention has been paid to another important functional and structural determinant, the functionally correlated variation information (coevolution). Results In this paper, we suggest a new method, named combinatorial coevolving-site saturation mutagenesis (CCSM), in which the functionally correlated variation sites of proteins are chosen as the hotspot sites to construct focused mutant libraries. The CCSM approach was used to improve the thermal stability of α-amylase from Bacillus subtilis CN7 (Amy7C). The results indicate that the CCSM can identify novel beneficial mutation sites, and enhance the thermal stability of wild-type Amy7C by 8°C ($T_{50}^{30}$), which could not be achieved with the ordinary rational introduction of a single or double point mutation. Conclusions Our method is able to produce more thermostable mutant α-amylases with novel beneficial mutations at new sites. It is also verified that the coevolving sites can be used as the hotspots to construct focused mutant libraries in protein engineering. This study throws new light on active research into molecular coevolution.
Affiliation(s)
- Chenghua Wang
- Nanjing University of Technology, Nanjing, Jiangsu, China
|
25
|
Gomes M, Hamer R, Reinert G, Deane CM. Mutual information and variants for protein domain-domain contact prediction. BMC Res Notes 2012; 5:472. [PMID: 23244412 PMCID: PMC3532072 DOI: 10.1186/1756-0500-5-472] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 05/15/2012] [Accepted: 08/10/2012] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Predicting protein contacts solely based on sequence information remains a challenging problem, despite the huge amount of sequence data at our disposal. Mutual Information (MI), an information theory measure, has been extensively employed and modified to identify residues within a protein (intra-protein) that are in contact. More recently MI and its variants have also been used in the prediction of contacts between proteins (inter-protein). METHODS Here we assess the predictive power of MI and variants for domain-domain contact prediction. We test original MI and these variants, which are called MIp, MIc and ZNMI, on 40 domain-domain test cases containing 10,753 sequences. We also propose and evaluate two new versions of MI that consider triangles of residues and the physiochemical properties of the amino acids, respectively. RESULTS We found that all versions of MI are skewed towards predicting surface residues. Since domain-domain contacts are on the surface of each domain, we considered only surface residues when attempting to predict contacts. Our analysis shows that MIc is the best current MI domain-domain contact predictor. At 20% recall MIc achieved a precision of 44.9% when only surface residues were considered. Our triangle and reduced alphabet variants of MI highlight the delicate trade-off between signal and noise in the use of MI for domain-domain contact prediction. We also examine a specific "successful" case study and demonstrate that here, when considering surface residues, even the most accurate domain-domain contact predictor, MIc, performs no better than random. CONCLUSIONS All tested variants of MI are skewed towards predicting surface residues. When considering surface residues only, we find MIc to be the best current MI domain-domain contact predictor. Its performance, however, is not as good as a non-MI based contact predictor, i-Patch. 
Additionally, the intra-protein contact prediction capabilities of MIc outperform its domain-domain contact prediction abilities.
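The abstract names MI variants (MIp, MIc, ZNMI) without defining them. As a hedged illustration of how such a background correction can work, the sketch below applies the widely used average product correction (APC) to a symmetric MI matrix; that MIp in this paper corresponds exactly to this formula is an assumption, and the function name `apc_correct` and the toy matrix are hypothetical.

```python
def apc_correct(mi):
    """Subtract the average product correction from a symmetric MI matrix.

    mi is a square list of lists; diagonal entries are ignored.
    Corrected score: MI(i, j) - mean_MI(i) * mean_MI(j) / overall_mean_MI.
    """
    n = len(mi)
    # Mean MI of each column with every other column (diagonal excluded).
    col_mean = [
        sum(mi[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)
    ]
    # Mean over all off-diagonal entries.
    overall = sum(col_mean) / n
    corrected = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                apc = col_mean[i] * col_mean[j] / overall if overall > 0 else 0.0
                corrected[i][j] = mi[i][j] - apc
    return corrected


# Hypothetical toy MI matrix for three alignment columns.
toy_mi = [
    [0.0, 0.4, 0.2],
    [0.4, 0.0, 0.6],
    [0.2, 0.6, 0.0],
]
for row in apc_correct(toy_mi):
    print([round(x, 3) for x in row])
```

The design point is that a column with high MI against every partner contributes mostly background, so subtracting the product of per-column means leaves only the pair-specific signal.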
Affiliation(s)
- Mireille Gomes
- Department of Statistics, University of Oxford, Oxford, UK
|
26
|
Dickson RJ, Gloor GB. Protein sequence alignment analysis by local covariation: coevolution statistics detect benchmark alignment errors. PLoS One 2012; 7:e37645. [PMID: 22715369 PMCID: PMC3371027 DOI: 10.1371/journal.pone.0037645] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 12/30/2011] [Accepted: 04/26/2012] [Indexed: 11/19/2022] Open
Abstract
The use of sequence alignments to understand protein families is ubiquitous in molecular biology. High quality alignments are difficult to build and protein alignment remains one of the largest open problems in computational biology. Misalignments can lead to inferential errors about protein structure, folding, function, phylogeny, and residue importance. Identifying alignment errors is difficult because alignments are built and validated on the same primary criteria: sequence conservation. Local covariation identifies systematic misalignments and is independent of conservation. We demonstrate an alignment curation tool, LoCo, that integrates local covariation scores with the Jalview alignment editor. Using LoCo, we illustrate how local covariation is capable of identifying alignment errors due to the reduction of positional independence in the region of misalignment. We highlight three alignments from the benchmark database, BAliBASE 3, that contain regions of high local covariation, and investigate the causes to illustrate these types of scenarios. Two alignments contain sequential and structural shifts that cause elevated local covariation. Realignment of these misaligned segments reduces local covariation; these alternative alignments are supported with structural evidence. We also show that local covariation identifies active site residues in a validated alignment of paralogous structures. Loco is available at https://sourceforge.net/projects/locoprotein/files/
Affiliation(s)
- Gregory B. Gloor
- Department of Biochemistry, The University of Western Ontario, London, Canada
|
27
|
Lee Y, Mick J, Furdui C, Beamer LJ. A coevolutionary residue network at the site of a functionally important conformational change in a phosphohexomutase enzyme family. PLoS One 2012; 7:e38114. [PMID: 22685552 PMCID: PMC3369874 DOI: 10.1371/journal.pone.0038114] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 02/07/2012] [Accepted: 05/01/2012] [Indexed: 11/26/2022] Open
Abstract
Coevolution analyses identify residues that co-vary with each other during evolution, revealing sequence relationships unobservable from traditional multiple sequence alignments. Here we describe a coevolutionary analysis of phosphomannomutase/phosphoglucomutase (PMM/PGM), a widespread and diverse enzyme family involved in carbohydrate biosynthesis. Mutual information and graph theory were utilized to identify a network of highly connected residues with high significance. An examination of the most tightly connected regions of the coevolutionary network reveals that most of the involved residues are localized near an interdomain interface of this enzyme, known to be the site of a functionally important conformational change. The roles of four interface residues found in this network were examined via site-directed mutagenesis and kinetic characterization. For three of these residues, mutation to alanine reduces enzyme specificity to ∼10% or less of wild-type, while the other has ∼45% activity of wild-type enzyme. An additional mutant of an interface residue that is not densely connected in the coevolutionary network was also characterized, and shows no change in activity relative to wild-type enzyme. The results of these studies are interpreted in the context of structural and functional data on PMM/PGM. Together, they demonstrate that a network of coevolving residues links the highly conserved active site with the interdomain conformational change necessary for the multi-step catalytic reaction. This work adds to our understanding of the functional roles of coevolving residue networks, and has implications for the definition of catalytically important residues.
Affiliation(s)
- Yingying Lee
- Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America
- Jacob Mick
- Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
- Cristina Furdui
- Department of Internal Medicine, Wake Forest University Health Sciences Winston-Salem, North Carolina, United States of America
- Lesa J. Beamer
- Department of Chemistry, University of Missouri, Columbia, Missouri, United States of America
- Department of Biochemistry, University of Missouri, Columbia, Missouri, United States of America
|
28
|
Gulyás-Kovács A. Integrated analysis of residue coevolution and protein structure in ABC transporters. PLoS One 2012; 7:e36546. [PMID: 22590562 PMCID: PMC3348156 DOI: 10.1371/journal.pone.0036546] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 02/07/2012] [Accepted: 04/06/2012] [Indexed: 12/22/2022] Open
Abstract
Intraprotein side chain contacts can couple the evolutionary process of amino acid substitution at one position to that at another. This coupling, known as residue coevolution, may vary in strength. Conserved contacts thus not only define 3-dimensional protein structure, but also indicate which residue-residue interactions are crucial to a protein's function. Therefore, prediction of strongly coevolving residue-pairs helps clarify molecular mechanisms underlying function. Previously, various coevolution detectors have been employed separately to predict these pairs purely from multiple sequence alignments, while disregarding available structural information. This study introduces an integrative framework that improves the accuracy of such predictions, relative to previous approaches, by combining multiple coevolution detectors and incorporating structural contact information. This framework is applied to the ABC-B and ABC-C transporter families, which include the drug exporter P-glycoprotein involved in multidrug resistance of cancer cells, as well as the CFTR chloride channel linked to cystic fibrosis disease. The predicted coevolving pairs are further analyzed based on conformational changes inferred from outward- and inward-facing transporter structures. The analysis suggests that some pairs coevolved to directly regulate conformational changes of the alternating-access transport mechanism, while others to stabilize rigid-body-like components of the protein structure. Moreover, some identified pairs correspond to residues previously implicated in cystic fibrosis.
Affiliation(s)
- Attila Gulyás-Kovács
- Laboratory of Cardiac/Membrane Physiology, Rockefeller University, New York, New York, United States of America.
|
29
|
Livesay DR, Kreth KE, Fodor AA. A critical evaluation of correlated mutation algorithms and coevolution within allosteric mechanisms. Methods Mol Biol 2012; 796:385-398. [PMID: 22052502 DOI: 10.1007/978-1-61779-334-9_21] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 05/31/2023]
Abstract
The notion of using the evolutionary history encoded within multiple sequence alignments to predict allosteric mechanisms is appealing. In this approach, correlated mutations are expected to reflect coordinated changes that maintain intramolecular coupling between residue pairs. Despite much early fanfare, the general suitability of correlated mutations to predict allosteric couplings has not yet been established. Progress along these lines has been hindered by several algorithmic limitations, including phylogenetic artifacts within alignments masking true covariance and the computational intractability of considering more than two correlated residues at a time. Recent progress in algorithm development, however, has been substantial, with a new generation of correlated mutation algorithms that have made fundamental progress toward solving these difficult problems. Despite these encouraging results, there remains little evidence to suggest that the evolutionary constraints acting on allosteric couplings are sufficient to be recovered from multiple sequence alignments. In this review, we argue that due to the exquisite sensitivity of protein dynamics, and hence that of allosteric mechanisms, the latter vary widely within protein families. If it turns out to be generally true that even very similar homologs display a wide divergence of allosteric mechanisms, then even a perfect correlated mutation algorithm could not be reliably used as a general mechanism for discovery of allosteric pathways.
Affiliation(s)
- Dennis R Livesay
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
|
30
|
Hayashida M, Kamada M, Song J, Akutsu T. Conditional random field approach to prediction of protein-protein interactions using domain information. BMC Syst Biol 2011; 5 Suppl 1:S8. [PMID: 21689483 PMCID: PMC3121124 DOI: 10.1186/1752-0509-5-s1-s8] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Background: For understanding cellular systems and biological networks, it is important to analyze functions and interactions of proteins and domains. Many methods for predicting protein-protein interactions have been developed. It is known that mutual information between residues at interacting sites can be higher than that at non-interacting sites. This is based on the idea that amino acid residues at interacting sites have coevolved with the corresponding residues in the partner proteins. Several studies have shown that such mutual information is useful for identifying contact residues in interacting proteins. Results: We propose novel methods using conditional random fields for predicting protein-protein interactions. We focus on the mutual information between residues, and combine it with conditional random fields. In the methods, protein-protein interactions are modeled using domain-domain interactions. We perform computational experiments using protein-protein interaction datasets for several organisms, and calculate the AUC (area under the ROC curve) score. The results suggest that our proposed methods, with and without mutual information, outperform the EM (expectation maximization) method proposed by Deng et al., which is one of the best predictors based on domain-domain interactions. Conclusions: We propose novel methods using conditional random fields with and without mutual information between domains. Our methods based on domain-domain interactions are useful for predicting protein-protein interactions.
Affiliation(s)
- Morihiro Hayashida
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan.
|
31
|
Ackerman SH, Gatti DL. The contribution of coevolving residues to the stability of KDO8P synthase. PLoS One 2011; 6:e17459. [PMID: 21408011 PMCID: PMC3052366 DOI: 10.1371/journal.pone.0017459] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.1] [Received: 10/06/2010] [Accepted: 02/03/2011] [Indexed: 12/03/2022] Open
Abstract
Background: The evolutionary tree of 3-deoxy-D-manno-octulosonate 8-phosphate (KDO8P) synthase (KDO8PS), a bacterial enzyme that catalyzes a key step in the biosynthesis of bacterial endotoxin, is evenly divided between metal and non-metal forms, both having similar structures, but diverging to various degrees in amino acid sequence. Mutagenesis, crystallographic and computational studies have established that only a few residues determine whether or not KDO8PS requires a metal for function. The remaining divergence in the amino acid sequence of KDO8PSs is apparently unrelated to the underlying catalytic mechanism. Methodology/Principal Findings: The multiple alignment of all known KDO8PS sequences reveals that several residue pairs coevolved, an indication of their possible linkage to a structural constraint. In this study we investigated by computational means the contribution of coevolving residues to the stability of KDO8PS. We found that about 1/4 of all strongly coevolving pairs probably originated from cycles of mutation (decreasing stability) and suppression (restoring it), while the remaining pairs are best explained by a succession of neutral or nearly neutral covarions. Conclusions/Significance: Both sequence conservation and coevolution are involved in the preservation of the core structure of KDO8PS, but the contribution of coevolving residues is, in proportion, smaller. This is because small stability gains or losses associated with selection of certain residues in some regions of the stability landscape of KDO8PS are easily offset by a large number of possible changes in other regions. While this effect increases the tolerance of KDO8PS to deleterious mutations, it also decreases the probability that specific pairs of residues could have a strong contribution to the thermodynamic stability of the protein.
Affiliation(s)
- Sharon H. Ackerman
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Domenico L. Gatti
- Department of Biochemistry and Molecular Biology, Wayne State University School of Medicine, Detroit, Michigan, United States of America
- Cardiovascular Research Institute, Wayne State University School of Medicine, Detroit, Michigan, United States of America
|
32
|
Choi K, Kim S. Building interacting partner predictors using co-varying residue pairs between histidine kinase and response regulator pairs of 48 bacterial two-component systems. Proteins 2011; 79:1118-31. [DOI: 10.1002/prot.22948] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/13/2010] [Revised: 11/03/2010] [Accepted: 11/05/2010] [Indexed: 11/11/2022]
|
33
|
Dickson RJ, Wahl LM, Fernandes AD, Gloor GB. Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. PLoS One 2010; 5:e11082. [PMID: 20596526 PMCID: PMC2893159 DOI: 10.1371/journal.pone.0011082] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 03/12/2010] [Accepted: 05/17/2010] [Indexed: 11/23/2022] Open
Abstract
Background: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. Methodology/Principal Findings: We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature. Conclusions/Significance: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because they emphasize pairwise covariation over group covariation.
Affiliation(s)
- Russell J. Dickson
- Department of Biochemistry, The University of Western Ontario, London, Canada
- Lindi M. Wahl
- Department of Applied Mathematics, The University of Western Ontario, London, Canada
- Andrew D. Fernandes
- Department of Biochemistry, The University of Western Ontario, London, Canada
- Department of Applied Mathematics, The University of Western Ontario, London, Canada
- Gregory B. Gloor
- Department of Biochemistry, The University of Western Ontario, London, Canada
|
34
|
Brown CA, Brown KS. Validation of coevolving residue algorithms via pipeline sensitivity analysis: ELSC and OMES and ZNMI, oh my! PLoS One 2010; 5:e10779. [PMID: 20531955 PMCID: PMC2879359 DOI: 10.1371/journal.pone.0010779] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 01/27/2010] [Accepted: 04/25/2010] [Indexed: 11/26/2022] Open
Abstract
Correlated amino acid substitution algorithms attempt to discover groups of residues that co-fluctuate due to either structural or functional constraints. Although these algorithms could inform both ab initio protein folding calculations and evolutionary studies, their utility for these purposes has been hindered by a lack of confidence in their predictions due to hard to control sources of error. To complicate matters further, naive users are confronted with a multitude of methods to choose from, in addition to the mechanics of assembling and pruning a dataset. We first introduce a new pair scoring method, called ZNMI (Z-scored-product Normalized Mutual Information), which drastically improves the performance of mutual information for co-fluctuating residue prediction. Second and more important, we recast the process of finding coevolving residues in proteins as a data-processing pipeline inspired by the medical imaging literature. We construct an ensemble of alignment partitions that can be used in a cross-validation scheme to assess the effects of choices made during the procedure on the resulting predictions. This pipeline sensitivity study gives a measure of reproducibility (how similar are the predictions given perturbations to the pipeline?) and accuracy (are residue pairs with large couplings on average close in tertiary structure?). We choose a handful of published methods, along with ZNMI, and compare their reproducibility and accuracy on three diverse protein families. We find that (i) of the algorithms tested, while none appear to be both highly reproducible and accurate, ZNMI is one of the most accurate by far and (ii) while users should be wary of predictions drawn from a single alignment, considering an ensemble of sub-alignments can help to determine both highly accurate and reproducible couplings. Our cross-validation approach should be of interest both to developers and end users of algorithms that try to detect correlated amino acid substitutions.
Affiliation(s)
- Christopher A. Brown
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- FAS Center for Systems Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Kevin S. Brown
- Department of Physics, University of California Santa Barbara, Santa Barbara, California, United States of America
- Institute for Collaborative Biotechnologies, University of California Santa Barbara, Santa Barbara, California, United States of America
|
35
|
Chakrabarti S, Panchenko AR. Structural and functional roles of coevolved sites in proteins. PLoS One 2010; 5:e8591. [PMID: 20066038 PMCID: PMC2797611 DOI: 10.1371/journal.pone.0008591] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 08/12/2009] [Accepted: 10/19/2009] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Understanding the residue covariations between multiple positions in protein families is crucial and can be helpful for designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural-functional integrity while enabling it to acquire specific functional modifications. Despite the significant efforts in the field, there is still controversy in terms of the preferable locations of coevolved residues on different regions of protein molecules, the strength of the coevolutionary signal and the role of coevolution in functional diversification. METHODOLOGY: In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We employed a large scale study to investigate the structural and functional aspects of coevolved residues. We found that the networks representing the coevolutionary residue connections within our dataset are in general of 'small-world' type as they have clustering coefficient values higher than random networks and also show mean shortest path lengths similar to and/or lower than random and regular networks. We also found that altogether 11% of functionally important sites are coevolved with any other sites. Active sites are found more frequently to coevolve with any other sites (15%) compared to protein (11%) and ligand (9%) binding sites. Metal binding and active sites are also found to be more frequently coevolved with other metal binding and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other. Moreover, approximately 80% of charge compensatory substitutions within coevolved sites are found at very close spatial proximity (≤ 5 Å), pointing to the possible preservation of salt bridges in evolution. CONCLUSION: Our findings show that a noticeable fraction of functionally important sites undergo coevolution and also point towards compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites.
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
- Anna R. Panchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
|