1

Meng Q, Guo F, Tang J. Improved structure-related prediction for insufficient homologous proteins using MSA enhancement and pre-trained language model. Brief Bioinform 2023:bbad217. [PMID: 37321965 DOI: 10.1093/bib/bbad217] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 03/12/2023] [Revised: 04/18/2023] [Accepted: 05/21/2023] [Indexed: 06/17/2023] Open access.
Abstract
In recent years, protein structure problems have become a hotspot for understanding protein folding and function mechanisms. Most protein structure methods rely on, and benefit from, co-evolutionary information obtained from multiple sequence alignments (MSAs). For example, AlphaFold2 (AF2) is a typical MSA-based protein structure tool, known for its high accuracy. Consequently, these MSA-based methods are limited by the quality of the MSAs. Especially for orphan proteins that have no homologous sequences, AlphaFold2 performs unsatisfactorily as MSA depth decreases. This may pose a barrier to its widespread application in protein mutation and design problems, where rich homologous sequences are unavailable and rapid prediction is needed. In this paper, we constructed two standard datasets for orphan and de novo proteins with insufficient or no homology information, called Orphan62 and Design204, respectively, to fairly evaluate the performance of various methods in this setting. Then, depending on whether scarce MSA information is used, we summarized two approaches, MSA-enhanced and MSA-free methods, for handling the absence of sufficient MSAs. The MSA-enhanced model aims to improve poor MSA quality at the data source through knowledge distillation and generative models. The MSA-free model learns the relationships between residues directly from enormous numbers of protein sequences via pre-trained models, bypassing the step of extracting residue-pair representations from an MSA. Next, we evaluated four MSA-free methods (trRosettaX-Single, TRFold, ESMFold and ProtT5) and one MSA-enhanced method (Bagging MSA) against a traditional MSA-based method, AlphaFold2, in two protein structure-related prediction tasks, respectively. Comparative analyses show that the MSA-free methods trRosettaX-Single and ESMFold achieve fast prediction ($\sim 40$ s) and performance comparable to AF2 in tertiary structure prediction, especially for short peptides, $\alpha$-helical segments and targets with few homologous sequences. In secondary structure prediction, Bagging MSA, which uses MSA enhancement, improves the accuracy of our trained MSA-based base model when homology information is poor. Our study gives biologists insight into how to select rapid and appropriate prediction tools for enzyme engineering and peptide drug development. CONTACT guofei@csu.edu.cn, jj.tang@siat.ac.cn.
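The abstract ties AF2's accuracy to MSA depth. One common way to quantify depth, sketched here under our own assumptions rather than taken from Meng et al., is the effective number of sequences (Neff): each aligned row is down-weighted by the number of rows sharing at least 80% identity with it, so near-duplicates do not inflate the count. The 80% threshold, the function names and the treatment of gaps as ordinary characters are illustrative choices.

```python
from typing import List


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned columns at which two MSA rows carry the same symbol.

    Gaps ('-') are compared like any other character (a simplifying assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def effective_sequence_count(msa: List[str], threshold: float = 0.8) -> float:
    """Neff: sum over rows of 1 / (number of rows, itself included,
    whose identity to that row is >= threshold)."""
    total = 0.0
    for row in msa:
        cluster_size = sum(1 for other in msa if sequence_identity(row, other) >= threshold)
        total += 1.0 / cluster_size
    return total


if __name__ == "__main__":
    # Three near-identical rows collapse to about one effective sequence;
    # the unrelated fourth row contributes another one.
    msa = ["ACDEFGHIKL", "ACDEFGHIKL", "ACDEFGHIKM", "WWWWWWWWWW"]
    print(effective_sequence_count(msa))
```

For an orphan protein whose alignment contains only the query sequence, Neff is exactly 1, which is the regime where the abstract reports AF2 degrading.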
Affiliation(s)
- Qiaozhen Meng
- School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jijun Tang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518000, China

2

Cannarella R, Gusmano C, Condorelli RA, Bernini A, Kaftalli J, Maltese PE, Paolacci S, Dautaj A, Marceddu G, Bertelli M, La Vignera S, Calogero AE. Genetic Analysis of Patients with Congenital Hypogonadotropic Hypogonadism: A Case Series. Int J Mol Sci 2023; 24:ijms24087428. [PMID: 37108593 PMCID: PMC10138801 DOI: 10.3390/ijms24087428] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/28/2023] [Revised: 04/15/2023] [Accepted: 04/17/2023] [Indexed: 04/29/2023] Open access.
Abstract
Congenital hypogonadotropic hypogonadism (cHH)/Kallmann syndrome (KS) is a rare genetic disorder with variable penetrance and a complex inheritance pattern. Consequently, it does not always follow Mendelian laws. More recently, digenic and oligogenic transmission has been recognized in 1.5-15% of cases. We report the results of a clinical and genetic investigation of five unrelated patients with cHH/KS analyzed using a customized gene panel. Patients were diagnosed according to the clinical, hormonal, and radiological criteria of the European Consensus Statement. DNA was analyzed using next-generation sequencing with a customized panel that included 31 genes. When available, first-degree relatives of the probands were also analyzed to assess genotype-phenotype segregation. The consequences of the identified variants on gene function were evaluated by analyzing the conservation of amino acids across species and by using molecular modeling. We found one new pathogenic variant of the CHD7 gene (c.576T>A, p.Tyr192*) and three new variants of unknown significance (VUSs) in IL17RD (c.960G>A, p.Met320Ile), FGF17 (c.208G>A, p.Gly70Arg), and DUSP6 (c.434T>G, p.Leu145Arg). All were present in the heterozygous state. Previously reported heterozygous variants were also found in the PROK2 (c.163del, p.Ile55*), CHD7 (c.2750C>T, p.Thr917Met and c.7891C>T, p.Arg2631*), FLRT3 (c.1106C>T, p.Ala369Val), and CCDC103 (c.461A>C, p.His154Pro) genes. Molecular modeling, molecular dynamics, and conservation analyses were performed on three of the nine variants identified in our patients, namely FGF17 (p.Gly70Arg), DUSP6 (p.Leu145Arg), and CHD7 (p.Thr917Met). Except for DUSP6, where the L145R variant was shown to disrupt the interaction between β6 and β3 needed for extracellular signal-regulated kinase 2 (ERK2) binding and recognition, no significant changes were identified between the wild-type and mutant forms of the other proteins. We found a new pathogenic variant of the CHD7 gene. 
The molecular modeling results suggest that the VUS of the DUSP6 (c.434T>G, p.Leu145Arg) gene may play a role in the pathogenesis of cHH. However, our analysis indicates that it is unlikely that the VUSs for the IL17RD (c.960G>A, p.Met320Ile) and FGF17 (c.208G>A, p.Gly70Arg) genes are involved in the pathogenesis of cHH. Functional studies are needed to confirm this hypothesis.
Affiliation(s)
- Rossella Cannarella
- Department of Clinical and Experimental Medicine, University of Catania, Via S. Sofia 78, 95123 Catania, Italy
- Carmelo Gusmano
- Department of Clinical and Experimental Medicine, University of Catania, Via S. Sofia 78, 95123 Catania, Italy
- Rosita A Condorelli
- Department of Clinical and Experimental Medicine, University of Catania, Via S. Sofia 78, 95123 Catania, Italy
- Andrea Bernini
- Department of Biotechnology, Chemistry and Pharmacy, University of Siena, 53100 Siena, Italy
- Matteo Bertelli
- Diagnostics Unit, MAGI EUREGIO, 39100 Bolzano, Italy
- Diagnostics Unit, MAGI'S LAB, 38068 Rovereto, Italy
- Sandro La Vignera
- Department of Clinical and Experimental Medicine, University of Catania, Via S. Sofia 78, 95123 Catania, Italy
- Aldo E Calogero
- Department of Clinical and Experimental Medicine, University of Catania, Via S. Sofia 78, 95123 Catania, Italy

3

Mandloi S, Chakrabarti S. Protein sites with more coevolutionary connections tend to evolve slower, while more variable protein families acquire higher coevolutionary connections. F1000Res 2017; 6:453. [PMID: 28751967 PMCID: PMC5506539 DOI: 10.12688/f1000research.11251.2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 07/05/2017] [Indexed: 11/20/2022] Open access.
Abstract
Background: Amino acid exchanges within proteins sometimes compensate for one another and may therefore have co-evolved. It is essential to investigate the intricate relationship between the extent of coevolution and the evolutionary variability at individual protein sites, as well as across the whole protein. Methods: In this study, we used a reliable set of coevolutionary connections (sites within a spatial distance of 10 Å) and investigated their correlation with the evolutionary diversity of the respective protein sites. Results: Based on our observations, we propose the hypothesis that higher numbers of coevolutionary connections are associated with less evolutionarily variable protein sites, while higher numbers of coevolutionary connections are observed for protein families with higher evolutionary variability. Our findings also indicate that highly coevolved sites located in a solvent-accessible state tend to be less evolutionarily variable. This relationship reverts at the whole-protein level, where cytoplasmic and extracellular proteins show a moderately higher anti-correlation between the number of coevolutionary connections and the average evolutionary conservation of the whole protein. Conclusions: The observations and hypothesis presented in this study provide intriguing insights into the critical relationship between coevolutionary and evolutionary changes within proteins. Our observations encourage further investigation into the reasons behind subtle variations in the relationship between coevolutionary connectivity and evolutionary diversity for proteins located at various cellular localizations and/or involved in different molecular-biological functions.
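The analysis above relates two per-site quantities: how many coevolutionary connections a site has (counting only pairs within 10 Å) and how variable the site is. The sketch below illustrates that comparison under stated assumptions: the coevolving pairs and their inter-residue distances are taken as given inputs in a hypothetical format, variability is measured as the Shannon entropy of the MSA column (a stand-in; the paper's own conservation score may differ), and the two are compared with Pearson's r.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one MSA column; higher means more variable."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def site_variability(msa: List[str]) -> List[float]:
    """Entropy of every alignment column."""
    return [column_entropy("".join(row[k] for row in msa)) for k in range(len(msa[0]))]


def connection_counts(pairs: List[Pair], distances: Dict[Pair, float],
                      n_sites: int, cutoff: float = 10.0) -> List[int]:
    """Number of coevolving partners per site, keeping only pairs within `cutoff` Å."""
    counts = [0] * n_sites
    for i, j in pairs:
        if distances[(i, j)] <= cutoff:
            counts[i] += 1
            counts[j] += 1
    return counts


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    msa = ["ACDA", "ACEA", "ACFW", "ACGW"]
    pairs = [(0, 1), (0, 3), (1, 3), (2, 3)]
    distances = {(0, 1): 5.0, (0, 3): 8.0, (1, 3): 12.0, (2, 3): 6.0}  # (1, 3) exceeds 10 Å
    counts = connection_counts(pairs, distances, n_sites=4)
    print(counts, pearson(counts, site_variability(msa)))
```

A negative r across real protein sites would correspond to the site-level trend the authors report; on this four-column toy alignment the number carries no biological meaning.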
Affiliation(s)
- Sapan Mandloi
- Department of Structural Biology and Bioinformatics Division, Council of Scientific and Industrial Research, Indian Institute of Chemical Biology, Kolkata, West Bengal, 700032, India
- Saikat Chakrabarti
- Department of Structural Biology and Bioinformatics Division, Council of Scientific and Industrial Research, Indian Institute of Chemical Biology, Kolkata, West Bengal, 700032, India

4

Abstract
The structural organization of a protein family is investigated by devising a method based on random matrix theory (RMT) that combines the physicochemical properties of amino acids with multiple sequence alignment. A graphical method to represent protein sequences using physicochemical properties is devised, giving a fast, easy, and informative way of comparing evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where noise reduction and information filtering are done using RMT involving an ensemble of Wishart matrices. The eigenvalue statistics of the correlation matrix for the β-lactamase family show the universal features observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures both the short- and the long-range correlations (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations but long-range correlations that are the same as those of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT predictions and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. When processed, these small eigenvectors give clusters of positions with well-defined biological and structural importance that match experiments. The approach is crucial for recognizing structural motifs, as shown for β-lactamase (and other families), and selectively identifies the important positions as targets to deactivate (activate) enzymatic action.
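The RMT filtering step can be illustrated with a small sketch. It assumes a single property scale (the Kyte-Doolittle hydropathy values, chosen only as an example; the paper may use other properties), encodes gaps as 0, builds the position-by-position correlation matrix from M aligned sequences and N positions, and compares its eigenvalues with the Marchenko-Pastur bounds $\lambda_\pm = (1 \pm \sqrt{N/M})^2$, which bracket the eigenvalues of a purely random (Wishart) correlation matrix for large N and M. Eigenvalues above the upper bound are candidates for genuine signal. All function names are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale, used here purely as an example property.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_msa(msa, scale, gap_value=0.0):
    """Map an MSA (list of equal-length strings) to an M x N numeric matrix."""
    return np.array([[scale.get(res, gap_value) for res in row] for row in msa], dtype=float)


def correlation_eigenvalues(x):
    """Eigenvalues (ascending) of the correlation matrix of the columns of x.

    Zero-variance columns (fully conserved positions) are dropped because
    their correlation is undefined.
    """
    std = x.std(axis=0)
    keep = std > 0
    z = (x[:, keep] - x[:, keep].mean(axis=0)) / std[keep]
    corr = z.T @ z / z.shape[0]
    return np.linalg.eigvalsh(corr)


def marchenko_pastur_bounds(n_positions, n_sequences):
    """Bulk edges for a purely random correlation matrix with ratio q = N / M."""
    q = n_positions / n_sequences
    return (1 - np.sqrt(q)) ** 2, (1 + np.sqrt(q)) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(400, 50))                # noise: M = 400 "sequences", N = 50 positions
    x[:, :10] += 3.0 * rng.normal(size=(400, 1))  # plant one block of 10 correlated positions
    eigenvalues = correlation_eigenvalues(x)
    lower, upper = marchenko_pastur_bounds(x.shape[1], x.shape[0])
    print("eigenvalues above the random bulk:", eigenvalues[eigenvalues > upper])
```

With real data, `encode_msa` would replace the synthetic matrix; the planted block here simply shows one eigenvalue escaping the random bulk.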
Affiliation(s)
- Pradeep Bhadola
- Department of Physics and Astrophysics, University of Delhi, Delhi 110007, India
- Nivedita Deo
- Department of Physics and Astrophysics, University of Delhi, Delhi 110007, India

5

Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 148] [Impact Index Per Article: 18.5] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
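The Markov state model (MSM) idea mentioned in the review rests on a simple estimation step, sketched here in its most basic form: given a trajectory already discretized into conformational states (the clustering step is assumed done), count transitions observed at a lag time τ, row-normalize the counts into a transition matrix T, and take the stationary distribution as the left eigenvector of T with eigenvalue 1. Dedicated MSM software typically adds reversibility constraints, lag-time validation and error estimates, all of which this illustration omits; the function names are hypothetical.

```python
import numpy as np


def transition_matrix(dtraj, n_states, lag=1):
    """Row-stochastic transition matrix estimated from a discrete trajectory.

    dtraj: sequence of integer state labels (0 .. n_states - 1), one per frame.
    lag:   lag time tau, in frames.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # States never left within the data keep an all-zero row instead of dividing by zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


def stationary_distribution(T):
    """Left eigenvector of T for the eigenvalue closest to 1, normalized to sum to 1."""
    eigenvalues, eigenvectors = np.linalg.eig(T.T)
    idx = np.argmin(np.abs(eigenvalues - 1.0))
    pi = np.real(eigenvectors[:, idx])
    return pi / pi.sum()


if __name__ == "__main__":
    dtraj = [0, 0, 1, 1, 0, 1, 0, 0]
    T = transition_matrix(dtraj, n_states=2)
    print(T)
    print(stationary_distribution(T))
```

For this toy trajectory, T has rows (0.5, 0.5) and (2/3, 1/3), and the stationary distribution is (4/7, 3/7).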
Affiliation(s)
- Jeffrey R Wagner
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Christopher T Lee
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Jacob D Durrant
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Robert D Malmstrom
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Victoria A Feher
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E Amaro
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States

6

Gao H, Yu X, Dou Y, Wang J. New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein. Interdiscip Sci 2015; 7:364-72. [PMID: 26396121 DOI: 10.1007/s12539-015-0024-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2014] [Revised: 04/08/2014] [Accepted: 04/16/2014] [Indexed: 11/26/2022]
Abstract
Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent multiple adjacent residues in one distinct region. In this paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Pearson's correlation coefficient (the R value) is then used to measure the similarity between two MIMs. MSAs from the Catalytic Data Base (Catalytic Site Atlas, CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found, because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as in the catalytic environment.
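A minimal sketch of the MIM comparison described above follows, under our own assumptions: MI is estimated from raw column frequencies (natural log, no pseudocounts or background correction), a subsequence is a window of `width` adjacent alignment columns, and the R value is Pearson's correlation over the off-diagonal (i < j) entries of two equally sized MIMs. The function names are hypothetical and not taken from the paper.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List

Matrix = List[List[float]]


def mutual_information(col_a: str, col_b: str) -> float:
    """MI (nats) between two aligned MSA columns, from observed frequencies."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    joint = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log((c / n) / ((count_a[x] / n) * (count_b[y] / n)))
        for (x, y), c in joint.items()
    )


def mi_matrix(msa: List[str], start: int, width: int) -> Matrix:
    """MIM for the subsequence covering columns start .. start + width - 1."""
    cols = ["".join(row[start + k] for row in msa) for k in range(width)]
    return [[mutual_information(cols[i], cols[j]) for j in range(width)] for i in range(width)]


def mim_correlation(m1: Matrix, m2: Matrix) -> float:
    """Pearson R between the off-diagonal (i < j) entries of two MIMs of equal size."""
    idx = list(combinations(range(len(m1)), 2))
    x = [m1[i][j] for i, j in idx]
    y = [m2[i][j] for i, j in idx]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Columns 4-7 repeat columns 0-3, so the two subsequences have identical MIMs.
    msa = ["ACDEACDE", "ACFEACFE", "GHDKGHDK", "GHFKGHFK"]
    print(mim_correlation(mi_matrix(msa, 0, 4), mi_matrix(msa, 4, 4)))
```

Comparing only the upper triangle avoids the diagonal, where each entry is just a column's own entropy rather than a co-evolution signal.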
Affiliation(s)
- Hongyun Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China
- Information and Engineering College, Dalian University, Dalian, 116622, China
- Xiaoqing Yu
- College of Sciences, Shanghai Institute of Technology, Shanghai, 201418, China
- Yongchao Dou
- Center for Plant Science and Innovation, School of Biological Sciences, University of Nebraska, Lincoln, NE, 68588, USA
- Jun Wang
- Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China

7

Gao H, Yu X, Dou Y, Wang J. New measurement for correlation of co-evolution relationship of subsequences in protein. Interdiscip Sci 2015. [PMID: 25663109 DOI: 10.1007/s12539-014-0221-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2014] [Revised: 04/08/2014] [Accepted: 04/16/2014] [Indexed: 11/24/2022]
Abstract
Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent multiple adjacent residues in one distinct region. In this paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Pearson's correlation coefficient (the R value) is then used to measure the similarity between two MIMs. MSAs from the Catalytic Data Base (Catalytic Site Atlas, CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found, because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as in the catalytic environment.
Affiliation(s)
- Hongyun Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China

8

Li G, Theys K, Verheyen J, Pineda-Peña AC, Khouri R, Piampongsant S, Eusébio M, Ramon J, Vandamme AM. A new ensemble coevolution system for detecting HIV-1 protein coevolution. Biol Direct 2015; 10:1. [PMID: 25564011 PMCID: PMC4332441 DOI: 10.1186/s13062-014-0031-8] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 11/28/2014] [Accepted: 12/02/2014] [Indexed: 12/31/2022] Open access.
Abstract
BACKGROUND: A key challenge in the field of HIV-1 protein evolution is the identification of coevolving amino acids at the molecular level. In past decades, many sequence-based methods have been designed to detect position-specific coevolution within and between different proteins. However, an ensemble coevolution system that integrates different methods to improve the detection of HIV-1 protein coevolution has not been developed. RESULTS: We integrated 27 sequence-based prediction methods published between 2004 and 2013 into an ensemble coevolution system. This system allowed combinations of different sequence-based methods for coevolution prediction. Using HIV-1 protein structures and experimental data, we evaluated the performance of individual and combined sequence-based methods in predicting HIV-1 intra- and inter-protein coevolution. We showed that sequence-based methods clustered according to their methodology, and that a combination of four methods outperformed each of the 27 individual methods. This four-method combination estimated that HIV-1 intra-protein coevolving positions were mainly located in functional domains and were in physical contact with each other in the protein tertiary structures. In the analysis of HIV-1 inter-protein coevolving positions between Gag and protease, protease drug resistance positions near the active site mostly coevolved with Gag cleavage positions (V128, S373-T375, A431, F448-P453) and Gag C-terminal positions (S489-Q500) under the selective pressure of protease inhibitors. CONCLUSIONS: This study presents a new ensemble coevolution system that detects position-specific coevolution using combinations of 27 different sequence-based methods. Our findings highlight key coevolving residues within HIV-1 structural proteins and between Gag and protease, shedding light on HIV-1 intra- and inter-protein coevolution.
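The abstract does not say how the ensemble system combines its methods, so the following is only a hypothetical illustration of one scale-free way to merge coevolution predictors: rank the position pairs within each method (so raw scores on different scales become comparable), then order pairs by their mean rank across methods. The tie handling, the restriction to pairs scored by every method and all names are our assumptions.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def average_ranks(scores: Dict[Pair, float]) -> Dict[Pair, float]:
    """Rank position pairs for one method (1 = strongest signal); ties share their mean rank."""
    ordered = sorted(scores, key=lambda p: scores[p], reverse=True)
    ranks: Dict[Pair, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k]] = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def ensemble_rank(method_scores: List[Dict[Pair, float]]) -> List[Tuple[Pair, float]]:
    """Combine methods by mean rank over the pairs scored by every method; best first."""
    per_method = [average_ranks(s) for s in method_scores]
    shared = set.intersection(*(set(r) for r in per_method))
    combined = {p: sum(r[p] for r in per_method) / len(per_method) for p in shared}
    return sorted(combined.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    method_a = {(1, 2): 0.9, (1, 3): 0.5, (2, 3): 0.1}  # e.g. an MI-like score
    method_b = {(1, 2): 3.0, (1, 3): 4.0, (2, 3): 1.0}  # a score on a different scale
    print(ensemble_rank([method_a, method_b]))
```

Rank averaging is only one of many possible combination rules; the paper's four-method combination may have been built differently.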
Affiliation(s)
- Guangdi Li
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
- Kristof Theys
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
- Jens Verheyen
- Institute of Virology, University Hospital, University Duisburg-Essen, Essen, Germany
- Andrea-Clemencia Pineda-Peña
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Clinical and Molecular Infectious Disease Group, Faculty of Sciences and Mathematics, Universidad del Rosario, Bogotá, Colombia
- Ricardo Khouri
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
- Supinya Piampongsant
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium
- Mónica Eusébio
- Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal
- Jan Ramon
- Department of Computer Science, KU Leuven - University of Leuven, Leuven, Belgium
- Anne-Mieke Vandamme
- KU Leuven - University of Leuven, Department of Microbiology and Immunology, Rega Institute for Medical Research, Clinical and Epidemiological Virology, Leuven, Belgium; Centro de Malária e Outras Doenças Tropicais and Unidade de Microbiologia, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisboa, Portugal

9

Hinsen K, Vaitinadapoule A, Ostuni MA, Etchebest C, Lacapere JJ. Construction and validation of an atomic model for bacterial TSPO from electron microscopy density, evolutionary constraints, and biochemical and biophysical data. Biochim Biophys Acta 2014; 1848:568-80. [PMID: 25450341 DOI: 10.1016/j.bbamem.2014.10.028] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 08/01/2014] [Revised: 10/01/2014] [Accepted: 10/20/2014] [Indexed: 11/30/2022]
Abstract
The 18 kDa protein TSPO is a highly conserved transmembrane protein found in bacteria, yeast, animals and plants. TSPO is involved in a wide range of physiological functions, including the transport of several molecules. The atomic structure of monomeric ligand-bound mouse TSPO in detergent has been published recently. A previously published low-resolution structure of Rhodobacter sphaeroides TSPO, obtained from tubular crystals with lipids and observed by cryo-electron microscopy, revealed an oligomeric structure without any ligand. We analyze this electron microscopy density in view of available biochemical and biophysical data, building a matching atomic model for the monomer and then for the entire crystal. We compare its intra- and inter-molecular contacts with those predicted by amino acid covariation in TSPO proteins from evolutionary sequence analysis. The arrangement of the five transmembrane helices in a monomer of our model differs from that observed for mouse TSPO. We analyze possible ligand binding sites for protoporphyrin, for the high-affinity ligand PK 11195, and for cholesterol in TSPO monomers and/or oligomers, and we discuss possible functional implications.
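One simple way to quantify the comparison described above, between contacts in an atomic model and contacts predicted by amino acid covariation, is the precision of the top-ranked covariation pairs. The sketch below is illustrative only: the 8 Å cutoff, the six-residue minimum sequence separation, the single representative atom per residue and all names are our assumptions, not the procedure of Hinsen et al.

```python
import math
from typing import Dict, Tuple

Pair = Tuple[int, int]
Coord = Tuple[float, float, float]


def contact_precision(scores: Dict[Pair, float], coords: Dict[int, Coord], top_k: int,
                      cutoff: float = 8.0, min_separation: int = 6) -> float:
    """Fraction of the top_k covariation pairs that are in contact in the model.

    scores: covariation score per residue pair (i, j); higher = stronger coupling.
    coords: one representative atom position per residue (e.g. C-beta), in Å.
    Pairs closer than `min_separation` in sequence are skipped as trivially close.
    """
    candidates = [(pair, s) for pair, s in scores.items() if abs(pair[0] - pair[1]) >= min_separation]
    top = sorted(candidates, key=lambda item: item[1], reverse=True)[:top_k]
    if not top:
        return 0.0
    hits = sum(1 for (i, j), _ in top if math.dist(coords[i], coords[j]) <= cutoff)
    return hits / len(top)


if __name__ == "__main__":
    coords = {0: (0.0, 0.0, 0.0), 1: (3.8, 0.0, 0.0), 6: (5.0, 0.0, 0.0), 12: (20.0, 0.0, 0.0)}
    scores = {(0, 6): 0.9, (0, 12): 0.8, (6, 12): 0.1, (0, 1): 5.0}  # (0, 1) is too close in sequence
    print(contact_precision(scores, coords, top_k=2))
```

Applied separately to intra- and inter-molecular pairs, the same measure would indicate which parts of a model agree with the covariation signal.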
Affiliation(s)
- Konrad Hinsen
- Centre de Biophysique Moléculaire (CNRS), Rue Charles Sadron, 45071 Orléans Cedex, France; Synchrotron SOLEIL, Division Expériences, Saint Aubin, B.P. 48, 91192 Gif-sur-Yvette Cedex, France
- Aurore Vaitinadapoule
- INSERM, UMR-S1134, 6 rue Alexandre Cabanel, Université Paris 7 Denis Diderot, F-75015 Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; GR-Ex, Laboratoire d'Excellence, Paris, France; National Centre for Biological Sciences (NCBS), Tata Institute of Fundamental Research, GKVK Campus, Bangalore, Karnataka, India; Dynamique des Structures et des Interactions des Macromolécules Biologiques, France
- Mariano A Ostuni
- INSERM, UMR-S1134, 6 rue Alexandre Cabanel, Université Paris 7 Denis Diderot, F-75015 Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; GR-Ex, Laboratoire d'Excellence, Paris, France
- Catherine Etchebest
- INSERM, UMR-S1134, 6 rue Alexandre Cabanel, Université Paris 7 Denis Diderot, F-75015 Paris, France; Université Paris Diderot, Sorbonne Paris Cité, Paris, France; Institut National de la Transfusion Sanguine (INTS), Paris, France; GR-Ex, Laboratoire d'Excellence, Paris, France; Dynamique des Structures et des Interactions des Macromolécules Biologiques, France
- Jean-Jacques Lacapere
- Sorbonne Universités, UPMC Univ Paris 06, Laboratoire de Biomolécules (LBM), 4 Place Jussieu, F-75005 Paris, France; Ecole Normale Supérieure - PSL Research University, Département de Chimie, 24, rue Lhomond, 75005 Paris, France; CNRS, UMR 7203 LBM, F-75005 Paris, France

10

Pelé J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. Comparative analysis of sequence covariation methods to mine evolutionary hubs: Examples from selected GPCR families. Proteins 2014; 82:2141-56. [DOI: 10.1002/prot.24570] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/26/2013] [Revised: 03/11/2014] [Accepted: 03/19/2014] [Indexed: 01/26/2023]
Affiliation(s)
- Julien Pelé
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Matthieu Moreau
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Hervé Abdi
- The University of Texas at Dallas; School of Behavioral and Brain Sciences; Richardson, TX 75080-3021 USA
- Patrice Rodien
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Department of Endocrinology, Reference Centre for the pathologies of hormonal receptivity; Centre Hospitalier Universitaire of Angers; 4 rue Larrey 49933 Angers France
- Hélène Castel
- INSERM U982, Laboratory of Neuronal and Neuroendocrine Communication and Differentiation, DC2N; University of Rouen; 76821 Mont-Saint-Aignan France
- Marie Chabbert
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France

11

Gültas M, Düzgün G, Herzog S, Jäger SJ, Meckbach C, Wingender E, Waack S. Quantum coupled mutation finder: predicting functionally or structurally important sites in proteins using quantum Jensen-Shannon divergence and CUDA programming. BMC Bioinformatics 2014; 15:96. [PMID: 24694117 PMCID: PMC4098773 DOI: 10.1186/1471-2105-15-96] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/10/2013] [Accepted: 03/26/2014] [Indexed: 11/29/2022] Open access.
Abstract
Background: The identification of functionally or structurally important non-conserved residue sites in protein MSAs is an important challenge for understanding the structural basis and molecular mechanisms of protein function. Despite the rich literature on compensatory mutations and on sequence conservation analysis for detecting such residues, previous methods often rely on classical information-theoretic measures. However, these measures usually do not take into account the dis/similarities of amino acids, which are likely to be crucial for those residues. In this study, we present a new method, the Quantum Coupled Mutation Finder (QCMF), which incorporates significant dis/similar amino acid pair signals in the prediction of functionally or structurally important sites. Results: The result of this study is twofold. First, we tested the QCMF method using the essential sites of two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK). QCMF includes two metrics based on quantum Jensen-Shannon divergence to measure both sequence conservation and compensatory mutations. We found that QCMF achieves improved performance in identifying essential sites from the MSAs of both proteins, with a significantly higher Matthews correlation coefficient (MCC) than previous methods. Second, using a data set of 153 proteins, we made a pairwise comparison between QCMF and three conventional methods. This comparison strongly suggests that QCMF complements the conventional methods for identifying correlated mutations in MSAs. Conclusions: QCMF uses the notion of entanglement, a major resource of quantum information, to model significant dissimilar and similar amino acid pair signals in the detection of functionally or structurally important sites.
Our results suggest that, on the one hand, QCMF significantly outperforms the previous method, which mainly focuses on dissimilar amino acid signals, in detecting essential sites in proteins. On the other hand, it is complementary to existing methods for identifying correlated mutations. QCMF is computationally intensive; to ensure a feasible computation time for its algorithm, we leveraged Compute Unified Device Architecture (CUDA). The QCMF server is freely accessible at http://qcmf.informatik.uni-goettingen.de/.
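For reference, the quantum Jensen-Shannon divergence on which QCMF's metrics are based is defined for density matrices $\rho$ and $\sigma$ as $\mathrm{QJSD}(\rho,\sigma) = S\left(\tfrac{\rho+\sigma}{2}\right) - \tfrac{1}{2}\left[S(\rho) + S(\sigma)\right]$, with von Neumann entropy $S(\rho) = -\operatorname{Tr}(\rho \log_2 \rho)$. The sketch below computes only this divergence; how QCMF builds density matrices from MSA columns and amino acid dis/similarities is not described in the abstract and is not reproduced here. For diagonal density matrices the quantity reduces to the classical Jensen-Shannon divergence.

```python
import numpy as np


def von_neumann_entropy(rho: np.ndarray) -> float:
    """S(rho) = -Tr(rho log2 rho), computed from the eigenvalues of a Hermitian density matrix."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # 0 * log 0 is taken as 0
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))


def quantum_jsd(rho: np.ndarray, sigma: np.ndarray) -> float:
    """Quantum Jensen-Shannon divergence; lies in [0, 1] with base-2 logarithms."""
    mixture = 0.5 * (rho + sigma)
    return von_neumann_entropy(mixture) - 0.5 * (von_neumann_entropy(rho) + von_neumann_entropy(sigma))


if __name__ == "__main__":
    zero = np.array([[1.0, 0.0], [0.0, 0.0]])  # pure state |0><0|
    plus = np.full((2, 2), 0.5)                # pure state |+><+|
    print(quantum_jsd(zero, plus))             # non-orthogonal pure states: strictly between 0 and 1
```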
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr, 7, 37077 Göttingen, Germany.
|
12
|
Wang C, Huang R, He B, Du Q. Improving the thermostability of alpha-amylase by combinatorial coevolving-site saturation mutagenesis. BMC Bioinformatics 2012; 13:263. [PMID: 23057711 PMCID: PMC3478181 DOI: 10.1186/1471-2105-13-263] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 03/21/2012] [Accepted: 09/11/2012] [Indexed: 11/12/2022]
Abstract
Background The generation of focused mutant libraries at hotspot residues is an important strategy in directed protein evolution. Existing methods, such as combinatorial active site testing and residual coupling analysis, depend primarily on evolutionarily conserved information to find the hotspot residues. Hardly any attention has been paid to another important functional and structural determinant: functionally correlated variation, that is, coevolution. Results In this paper, we propose a new method, combinatorial coevolving-site saturation mutagenesis (CCSM), in which the functionally correlated variation sites of proteins are chosen as the hotspot sites for constructing focused mutant libraries. The CCSM approach was used to improve the thermal stability of α-amylase from Bacillus subtilis CN7 (Amy7C). The results indicate that CCSM can identify novel beneficial mutation sites and enhance the thermal stability of wild-type Amy7C by 8°C ($T_{50}^{30}$), which could not be achieved by the ordinary rational introduction of single or double point mutations. Conclusions Our method is able to produce more thermostable mutant α-amylases with novel beneficial mutations at new sites. It also verifies that coevolving sites can be used as hotspots for constructing focused mutant libraries in protein engineering. This study sheds new light on active research into molecular coevolution.
Affiliation(s)
- Chenghua Wang
- Nanjing University of Technology, Nanjing, Jiangsu, China
|
13
|
Gültas M, Haubrock M, Tüysüz N, Waack S. Coupled mutation finder: a new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations. BMC Bioinformatics 2012; 13:225. [PMID: 22963049 PMCID: PMC3577461 DOI: 10.1186/1471-2105-13-225] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/18/2012] [Accepted: 08/23/2012] [Indexed: 11/10/2022]
Abstract
BACKGROUND The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. Separating significant signals between two or more non-conserved residue sites from phylogenetic noise and unrelated pair signals remains a challenging problem in bioinformatics. Determining these non-conserved residue sites is as important as recognizing strictly conserved positions for understanding the structural basis of protein functions and identifying functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF), which quantifies the phylogenetic noise for the detection of compensatory mutations. RESULTS To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from the phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins and are likely to affect protein stability or functionality. CONCLUSIONS The CMF is a new method that includes an MSA-specific statistical model, based on multiple testing procedures that quantify the error in terms of the false discovery rate, and a novel entropy-based metric that upscales BLOSUM62-dissimilar compensatory mutations. It is therefore a helpful tool for predicting and investigating compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could serve as a novel automated function prediction tool, supporting a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de.
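The CMF abstract states that its statistical model uses multiple testing procedures that control the false discovery rate (FDR), but it does not name the procedure. As an assumed stand-in, the sketch below applies the standard Benjamini-Hochberg step-up procedure to per-pair p-values; how CMF actually scores a residue pair and derives its p-value is not reproduced, and the function name and input layout are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # hypothetical layout: (column i, column j) of the MSA


def benjamini_hochberg(pvalues: Dict[Pair, float], q: float = 0.05) -> List[Pair]:
    """Return the site pairs declared significant at FDR level q (BH step-up)."""
    ranked = sorted(pvalues.items(), key=lambda item: item[1])  # ascending p-values
    m = len(ranked)
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        # BH rejects all hypotheses up to the LARGEST rank k with p_(k) <= k*q/m,
        # even if some smaller-ranked p-values individually miss their own bound.
        if p <= rank * q / m:
            cutoff_rank = rank
    return [pair for pair, _ in ranked[:cutoff_rank]]


pair_pvalues = {(1, 5): 0.001, (2, 7): 0.008, (3, 9): 0.035, (4, 6): 0.038, (8, 10): 0.6}
print(benjamini_hochberg(pair_pvalues, q=0.05))
```

In this example the pair (3, 9) is kept although 0.035 exceeds its own bound of 0.03, because the larger-ranked pair (4, 6) satisfies 0.038 ≤ 0.04; this step-up behaviour is what distinguishes BH from a per-test threshold.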
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, Göttingen, 37077, Germany.
|
14
|
Dietrich S, Borst N, Schlee S, Schneider D, Janda JO, Sterner R, Merkl R. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments. Biochemistry 2012; 51:5633-41. [PMID: 22737967 DOI: 10.1021/bi300747r] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies ($k_{\mathrm{cat}}/K_{\mathrm{M}}$) were largely unaltered. In contrast, the apparent thermal unfolding temperature ($T_{\mathrm{M}}^{\mathrm{app}}$) was lowered in most proteins. Remarkably, the strongest observed destabilization ($\Delta T_{\mathrm{M}}^{\mathrm{app}}$ = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The $k_{\mathrm{cat}}/K_{\mathrm{M}}$ values of four H2r mutant proteins were reduced between 13- and 120-fold, and their $T_{\mathrm{M}}^{\mathrm{app}}$ values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. 
Our findings demonstrate that positions with high conn(k) values have an increased probability of being important for enzyme function or stability.
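The abstract defines conn(k) as the number of significant correlations in which position k is involved. The sketch below computes that count from an already filtered list of significant position pairs; it does not reproduce H2r's correlation score or significance test, and the function name and the 0-based position indexing are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def conn_values(significant_pairs: Iterable[Tuple[int, int]], n_positions: int) -> List[int]:
    """conn(k): number of significant correlation pairs that involve position k.

    Positions are assumed to be 0-based MSA column indices (hypothetical convention).
    """
    counts: Counter = Counter()
    for i, j in significant_pairs:
        counts[i] += 1  # each significant pair contributes once to both of its positions
        counts[j] += 1
    return [counts[k] for k in range(n_positions)]


pairs = [(0, 3), (1, 3), (3, 4), (2, 4)]
conn = conn_values(pairs, n_positions=5)
print(conn)                                   # per-position connectivity
print(max(range(len(conn)), key=conn.__getitem__))  # position with the highest conn(k)
```

Ranking positions by conn(k) in this way mirrors how the study selected candidates for mutagenesis (high conn(k)) and controls (conn(k) = 0).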
Affiliation(s)
- Susanne Dietrich
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstrasse 31, D-93053 Regensburg, Germany
|
15
|
Bay DC, Hafez M, Young MJ, Court DA. Phylogenetic and coevolutionary analysis of the β-barrel protein family comprised of mitochondrial porin (VDAC) and Tom40. Biochimica et Biophysica Acta-Biomembranes 2011; 1818:1502-19. [PMID: 22178864 DOI: 10.1016/j.bbamem.2011.11.027] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 09/07/2011] [Revised: 11/14/2011] [Accepted: 11/22/2011] [Indexed: 12/21/2022]
Abstract
Beta-barrel proteins are the main transit points across the mitochondrial outer membrane. Mitochondrial porin, the voltage-dependent, anion-selective channel (VDAC), is responsible for the passage of small molecules between the mitochondrion and the cytosol. Through interactions with other mitochondrial and cellular proteins, it is involved in regulating organellar and cellular metabolism and likely contributes to mitochondrial structure. Tom40 is part of the translocase of the outer membrane, and acts as the channel for passage of preproteins during their import into the organelle. These proteins appear to share a common evolutionary origin and structure. In the current study, the evolutionary relationships between and within both proteins were investigated through phylogenetic analysis. The two groups have a common origin and have followed independent, complex evolutionary pathways, leading to the generation of paralogues in animals and plants. Structures of diverse representatives were modeled, revealing common themes rather than sites of high identity in both groups. Within each group, intramolecular coevolution was assessed, revealing a new set of sites potentially involved in structure-function relationships in these molecules. A weak link between Tom40 and proteins related to the mitochondrial distribution and morphology protein, Mdm10, was identified. This article is part of a Special Issue entitled: VDAC structure, function, and regulation of mitochondrial metabolism.
Affiliation(s)
- Denice C Bay
- Department of Microbiology, University of Manitoba, Winnipeg, Manitoba, Canada
|