1. Hernández Berthet AS, Aptekmann AA, Tejero J, Sánchez IE, Noguera ME, Roman EA. Associating protein sequence positions with the modulation of quantitative phenotypes. Arch Biochem Biophys 2024; 755:109979. [PMID: 38583654 DOI: 10.1016/j.abb.2024.109979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/11/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]
Abstract
Although protein sequences encode the information for folding and function, understanding their link is not an easy task. Unfortunately, the prediction of how specific amino acids contribute to these features is still considerably impaired. Here, we developed a simple algorithm that finds positions in a protein sequence with potential to modulate the studied quantitative phenotypes. From a few hundred protein sequences, we perform multiple sequence alignments, obtain the per-position pairwise differences for both the sequence and the observed phenotypes, and calculate the correlation between these last two quantities. We tested our methodology with four cases: archaeal adenylate kinases and the organisms' optimal growth temperatures, microbial rhodopsins and their maximal absorption wavelengths, mammalian myoglobins and their muscular concentration, and inhibition of HIV protease clinical isolates by two different molecules. We found from 3 to 10 positions tightly associated with those phenotypes, depending on the studied case. We showed that these correlations appear using individual positions but an improvement is achieved when the most correlated positions are jointly analyzed. Notably, we performed phenotype predictions using a simple linear model that links per-position divergences and differences in the observed phenotypes. Predictions are comparable to the state-of-the-art methodologies which, in most cases, are far more complex. All of the calculations are obtained at a very low information cost since the only input needed is a multiple sequence alignment of protein sequences with their associated quantitative phenotypes. The diversity of the explored systems makes our work a valuable tool to find sequence determinants of biological activity modulation and to predict various functional features for uncharacterized members of a protein family.
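The core step this abstract describes (per-position pairwise sequence differences correlated with pairwise phenotype differences) can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the correlation statistic, so the 0/1 mismatch score, the absolute phenotype difference, the Pearson coefficient, and all function and variable names below are assumptions made for illustration only, not the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns None when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def position_correlations(alignment, phenotypes):
    """For each alignment column, correlate pairwise mismatches (assumed 0/1
    score) with pairwise absolute phenotype differences (assumed measure)."""
    pairs = list(combinations(range(len(alignment)), 2))
    dpheno = [abs(phenotypes[i] - phenotypes[j]) for i, j in pairs]
    scores = []
    for pos in range(len(alignment[0])):
        dseq = [float(alignment[i][pos] != alignment[j][pos]) for i, j in pairs]
        scores.append(pearson(dseq, dpheno))
    return scores


# Hypothetical toy data: four aligned sequences and one phenotype value each.
toy_alignment = ["AKL", "AKL", "ARL", "ARL"]
toy_phenotypes = [1.0, 1.1, 5.0, 5.2]
scores = position_correlations(toy_alignment, toy_phenotypes)
print(scores)
```

In this toy alignment only the middle column varies, and it varies together with the phenotype, so it is the only column that receives a (high) correlation; fully conserved columns carry no signal and are reported as `None`.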
Affiliation(s)
- Ayelén S Hernández Berthet
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina.
- Ariel A Aptekmann
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina; Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08873, USA; Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA.
- Jesús Tejero
- Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15260, USA; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Ignacio E Sánchez
- Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina.
- Martín E Noguera
- Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina; Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD, Bernal, Argentina.
- Ernesto A Roman
- Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina.
2. Martinez-Gomez L, Cerdán-Vélez D, Abascal F, Tress ML. Origins and Evolution of Human Tandem Duplicated Exon Substitution Events. Genome Biol Evol 2022; 14:6809199. [PMID: 36346145 PMCID: PMC9741552 DOI: 10.1093/gbe/evac162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/06/2022] [Revised: 10/25/2022] [Accepted: 10/29/2022] [Indexed: 11/10/2022]
Abstract
The mutually exclusive splicing of tandem duplicated exons produces protein isoforms that are identical save for a homologous region that allows for the fine tuning of protein function. Tandem duplicated exon substitution events are rare, yet highly important alternative splicing events. Most events are ancient, their isoforms are highly expressed, and they have significantly more pathogenic mutations than other splice events. Here, we analyzed the physicochemical properties and functional roles of the homologous polypeptide regions produced by the 236 tandem duplicated exon substitutions annotated in the human gene set. We find that the most important structural and functional residues in these homologous regions are maintained, and that most changes are conservative rather than drastic. Three quarters of the isoforms produced from tandem duplicated exon substitution events are tissue-specific, particularly in nervous and cardiac tissues, and tandem duplicated exon substitution events are enriched in functional terms related to structures in the brain and skeletal muscle. We find considerable evidence for the convergent evolution of tandem duplicated exon substitution events in vertebrates, arthropods, and nematodes. Twelve human gene families have orthologues with tandem duplicated exon substitution events in both Drosophila melanogaster and Caenorhabditis elegans. Six of these gene families are ion transporters, suggesting that tandem exon duplication in genes that control the flow of ions into the cell has an adaptive benefit. The ancient origins, the strong indications of tissue-specific functions, and the evidence of convergent evolution suggest that these events may have played important roles in the evolution of animal tissues and organs.
Affiliation(s)
- Laura Martinez-Gomez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Daniel Cerdán-Vélez
- Bioinformatics Unit, Spanish National Cancer Research Centre (CNIO), C. Melchor Fernandez Almagro, 3, 28029 Madrid, Spain
- Federico Abascal
- Somatic Evolution Group, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire CB10 1SA, United Kingdom
3. Kim D, Noh MH, Park M, Kim I, Ahn H, Ye DY, Jung GY, Kim S. Enzyme activity engineering based on sequence co-evolution analysis. Metab Eng 2022; 74:49-60. [PMID: 36113751 DOI: 10.1016/j.ymben.2022.09.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2022] [Revised: 08/31/2022] [Accepted: 09/05/2022] [Indexed: 11/17/2022]
Abstract
The utility of engineering enzyme activity is expanding with the development of biotechnology. Conventional methods have limited applicability as they require high-throughput screening or three-dimensional structures to direct target residues of activity control. An alternative method exploits sequence evolution under natural selection. A repertoire of mutations was selected for fine-tuning enzyme activities to adapt to varying environments during evolution. Here, we devised a strategy called sequence co-evolutionary analysis to control the efficiency of enzyme reactions (SCANEER), which scans the evolution of protein sequences and directs the mutation strategy to improve enzyme activity. We hypothesized that amino acid pairs for various enzyme activities were encoded in the evolutionary history of protein sequences, whereas loss-of-function mutations were avoided since those are depleted during evolution. SCANEER successfully predicted the enzyme activities of beta-lactamase and aminoglycoside 3'-phosphotransferase. SCANEER was further experimentally validated to control the activities of three different enzymes of great interest in chemical production: cis-aconitate decarboxylase, α-ketoglutaric semialdehyde dehydrogenase, and inositol oxygenase. Activity-enhancing mutations that improve substrate-binding affinity or turnover rate were found at sites distal from known active sites or ligand-binding pockets. We provide SCANEER to control desired enzyme activity through a user-friendly webserver.
Affiliation(s)
- Donghyo Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea
- Myung Hyun Noh
- Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea
- Minhyuk Park
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea
- Inhae Kim
- ImmunoBiome Inc., Pohang, South Korea
- Hyunsoo Ahn
- Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, South Korea
- Dae-Yeol Ye
- Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea
- Gyoo Yeol Jung
- Department of Chemical Engineering, Pohang University of Science and Technology, Pohang, South Korea; Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul, South Korea; School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang, South Korea.
- Sanguk Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang, South Korea; Graduate School of Artificial Intelligence, Pohang University of Science and Technology, Pohang, South Korea; Institute of Convergence Research and Education in Advanced Technology, Yonsei University, Seoul, South Korea; School of Interdisciplinary Bioscience and Bioengineering, Pohang University of Science and Technology, Pohang, South Korea.
4. Pazos F. Prediction of Protein Sites and Physicochemical Properties Related to Functional Specificity. Bioengineering (Basel) 2021; 8:bioengineering8120201. [PMID: 34940354 PMCID: PMC8698372 DOI: 10.3390/bioengineering8120201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022]
Abstract
Specificity Determining Positions (SDPs) are protein sites responsible for functional specificity within a family of homologous proteins. These positions are extracted from a family’s multiple sequence alignment and complement the fully conserved positions as predictors of functional sites. SDP analysis is now routinely used for locating these specificity-related sites in families of proteins of biomedical or biotechnological interest with the aim of mutating them to switch specificities or design new ones. There are many different approaches for detecting these positions in multiple sequence alignments. Nevertheless, existing methods report the potential SDP positions but they do not provide any clue on the physicochemical basis behind the functional specificity, which has to be inferred a posteriori by manually inspecting these positions in the alignment. In this work, a new methodology is presented that, concomitantly with the detection of the SDPs, automatically provides information on the amino-acid physicochemical properties more related to the change in specificity. This new method is applied to two different multiple sequence alignments of homologs of the well-studied RasH protein representing different cases of functional specificity, and the results are discussed in detail.
Affiliation(s)
- Florencio Pazos
- Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), c/Darwin, 3, 28049 Madrid, Spain
5. Pitarch B, Ranea JAG, Pazos F. Protein residues determining interaction specificity in paralogous families. Bioinformatics 2021; 37:1076-1082. [PMID: 33135068 DOI: 10.1093/bioinformatics/btaa934] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/12/2020] [Revised: 10/06/2020] [Accepted: 10/22/2020] [Indexed: 02/06/2023]
Abstract
MOTIVATION: Predicting the residues controlling a protein's interaction specificity is important not only to better understand its interactions but also to design mutations aimed at fine-tuning or swapping them. RESULTS: In this work, we present a methodology that combines sequence information (in the form of multiple sequence alignments) with interactome information to detect such residues in paralogous families of proteins. The interactome is used to define pairwise similarities of interaction contexts for the proteins in the alignment. The method looks for alignment positions with patterns of amino-acid changes reflecting the similarities/differences in the interaction neighborhoods of the corresponding proteins. We tested this new methodology in a large set of human paralogous families with structurally characterized interactions, and discuss in detail the results for the RasH family. We show that this approach is a better predictor of interfacial residues than both sequence conservation and an equivalent 'unsupervised' method that does not use interactome information. AVAILABILITY AND IMPLEMENTATION: http://csbg.cnb.csic.es/pazos/Xdet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Borja Pitarch
- Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
- Juan A G Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga 29071, Spain; CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain; Institute of Biomedical Research in Malaga (IBIMA), Malaga, Spain
- Florencio Pazos
- Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
6. Kim D, Han SK, Lee K, Kim I, Kong J, Kim S. Evolutionary coupling analysis identifies the impact of disease-associated variants at less-conserved sites. Nucleic Acids Res 2019; 47:e94. [PMID: 31199866 PMCID: PMC6895274 DOI: 10.1093/nar/gkz536] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2018] [Revised: 05/03/2019] [Accepted: 06/05/2019] [Indexed: 12/20/2022]
Abstract
Genome-wide association studies have discovered a large number of genetic variants in human patients with the disease. Thus, predicting the impact of these variants is important for sorting disease-associated variants (DVs) from neutral variants. Current methods to predict the mutational impacts depend on evolutionary conservation at the mutation site, which is determined using homologous sequences and based on the assumption that variants at well-conserved sites have high impacts. However, many DVs at less-conserved but functionally important sites cannot be predicted by the current methods. Here, we present a method to find DVs at less-conserved sites by predicting the mutational impacts using evolutionary coupling analysis. Functionally important and evolutionarily coupled sites often have compensatory variants on cooperative sites to avoid loss of function. We found that our method identified known intolerant variants in a diverse group of proteins. Furthermore, at less-conserved sites, we identified DVs that were not identified using conservation-based methods. These newly identified DVs were frequently found at protein interaction interfaces, where species-specific mutations often alter interaction specificity. This work presents a means to identify less-conserved DVs and provides insight into the relationship between evolutionarily coupled sites and human DVs.
Affiliation(s)
- Donghyo Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Seong Kyu Han
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Kwanghwan Lee
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Inhae Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- JungHo Kong
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
- Sanguk Kim
- Department of Life Sciences, Pohang University of Science and Technology, Pohang 790-784, Korea
7. Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
8. Moratorio G, Henningsson R, Barbezange C, Carrau L, Bordería AV, Blanc H, Beaucourt S, Poirier EZ, Vallet T, Boussier J, Mounce BC, Fontes M, Vignuzzi M. Attenuation of RNA viruses by redirecting their evolution in sequence space. Nat Microbiol 2017; 2:17088. [PMID: 28581455 PMCID: PMC7098180 DOI: 10.1038/nmicrobiol.2017.88] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Received: 07/22/2016] [Accepted: 04/27/2017] [Indexed: 12/18/2022]
Abstract
RNA viruses pose serious threats to human health. Their success relies on their capacity to generate genetic variability and, consequently, on their adaptive potential. We describe a strategy to attenuate RNA viruses by altering their evolutionary potential. We rationally altered the genomes of Coxsackie B3 and influenza A viruses to redirect their evolutionary trajectories towards detrimental regions in sequence space. Specifically, viral genomes were engineered to harbour more serine and leucine codons with nonsense mutation targets: codons that could generate Stop mutations after a single nucleotide substitution. Indeed, these viruses generated more Stop mutations both in vitro and in vivo, accompanied by significant losses in viral fitness. In vivo, the viruses were attenuated, generated high levels of neutralizing antibodies and protected against lethal challenge. Our study demonstrates that cornering viruses in ‘risky’ areas of sequence space may be implemented as a broad-spectrum vaccine strategy against RNA viruses. Virus attenuation is used to obtain vaccine strains. Here, the rapid evolution of RNA viruses is exploited by engineering their genomes to encode sites that are a mutation away from a stop codon, a clever method to generate attenuated viruses.
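The "nonsense mutation targets" mentioned in this abstract, serine and leucine codons that a single nucleotide substitution can turn into a stop codon, can be enumerated directly from the standard genetic code. The sketch below is only an illustration of that definition; the function and variable names are hypothetical, and it does not reproduce the genome-engineering procedure used in the study.

```python
# Standard genetic code, RNA alphabet.
STOP_CODONS = {"UAA", "UAG", "UGA"}
CODONS = {
    "Ser": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Leu": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
}


def is_one_step_from_stop(codon):
    """True if some single-nucleotide substitution yields a stop codon."""
    for pos in range(3):
        for base in "ACGU":
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if mutant in STOP_CODONS:
                return True
    return False


# Synonymous codons of each amino acid that sit one mutation away from a stop.
risky = {
    aa: [c for c in codons if is_one_step_from_stop(c)]
    for aa, codons in CODONS.items()
}
print(risky)  # {'Ser': ['UCA', 'UCG'], 'Leu': ['UUA', 'UUG']}
```

Because each amino acid also has synonymous codons that are not adjacent to a stop, recoding toward the listed codons leaves the protein sequence unchanged while altering which mutations are reachable in one step.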
Affiliation(s)
- Gonzalo Moratorio
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
- Rasmus Henningsson
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France; International Group for Data Analysis, Institut Pasteur, C3BI, USR 3756 IP CNRS, 28 rue du Dr. Roux, 75724 Paris cedex 15, France; Centre for Mathematical Sciences, Lund University, 22100 Lund, Sweden
- Cyril Barbezange
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
- Lucia Carrau
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France; Sorbonne Paris Cité, Université Paris Diderot, Cellule Pasteur, 75013 Paris, France
- Antonio V Bordería
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France; International Group for Data Analysis, Institut Pasteur, C3BI, USR 3756 IP CNRS, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
- Hervé Blanc
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
- Stephanie Beaucourt
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
- Enzo Z Poirier
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France; Sorbonne Paris Cité, Université Paris Diderot, Cellule Pasteur, 75013 Paris, France
- Thomas Vallet
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
- Jeremy Boussier
- International Group for Data Analysis, Institut Pasteur, C3BI, USR 3756 IP CNRS, 28 rue du Dr. Roux, 75724 Paris cedex 15, France; Unité d'Immunobiologie des Cellules Dendritiques, Institut Pasteur, Inserm 1223, 25 rue du Dr. Roux, 75724 Paris cedex 15, Paris, France; Ecole doctorale Frontières du vivant, Université Paris Diderot, 75013 Paris, France
- Bryan C Mounce
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
- Magnus Fontes
- International Group for Data Analysis, Institut Pasteur, C3BI, USR 3756 IP CNRS, 28 rue du Dr. Roux, 75724 Paris cedex 15, France; Centre for Mathematical Sciences, Lund University, 22100 Lund, Sweden
- Marco Vignuzzi
- Viral Populations and Pathogenesis Unit, Institut Pasteur, CNRS UMR 3569, 28 rue du Dr. Roux, 75724 Paris cedex 15, France
9. O'Rourke KF, Gorman SD, Boehr DD. Biophysical and computational methods to analyze amino acid interaction networks in proteins. Comput Struct Biotechnol J 2016; 14:245-51. [PMID: 27441044 PMCID: PMC4939391 DOI: 10.1016/j.csbj.2016.06.002] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 03/30/2016] [Revised: 06/04/2016] [Accepted: 06/13/2016] [Indexed: 12/20/2022]
Abstract
Globular proteins are held together by interacting networks of amino acid residues. A number of different structural and computational methods have been developed to interrogate these amino acid networks. In this review, we describe some of these methods, including analyses of X-ray crystallographic data and structures, computer simulations, NMR data, and covariation among protein sequences, and indicate the critical insights that such methods provide into protein function. This information can be leveraged towards the design of new allosteric drugs, and the engineering of new protein function and protein regulation strategies.
Affiliation(s)
- Kathleen F O'Rourke
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- Scott D Gorman
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- David D Boehr
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
10. Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 148] [Impact Index Per Article: 18.5] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
Affiliation(s)
- Jeffrey R Wagner
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Christopher T Lee
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Jacob D Durrant
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Robert D Malmstrom
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Victoria A Feher
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E Amaro
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
11. Russell N, Delatycki M, Grossmann M. Metastatic phaeochromocytoma in a 23-year-old woman with an unclassified variant in the von Hippel Lindau disease gene: how can the pathogenicity of this variant be determined? Clin Endocrinol (Oxf) 2015; 83:15-9. [PMID: 25557216 DOI: 10.1111/cen.12710] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/23/2014] [Revised: 11/15/2014] [Accepted: 12/21/2014] [Indexed: 12/11/2022]
Abstract
A 23-year-old woman with metastatic phaeochromocytoma was found to have a previously unclassified variant in the von Hippel Lindau disease gene (c.361G>C). We use this case to highlight the issue of unclassified single nucleotide variants and the approaches to help predict whether they are disease causing or neutral. With increasing use of genetic testing, and widespread clinical use of next-generation sequencing around the corner, this issue is likely to become more prominent.
Affiliation(s)
- Nicholas Russell
- Department of Endocrinology, Austin Health, Heidelberg, VIC, Australia
- Martin Delatycki
- Department of Medicine Austin Health, University of Melbourne, Heidelberg, VIC, Australia
- Clinical Genetics Service, Austin Health, Heidelberg, VIC, Australia
- Mathis Grossmann
- Department of Endocrinology, Austin Health, Heidelberg, VIC, Australia
- Department of Medicine Austin Health, University of Melbourne, Heidelberg, VIC, Australia
- Clinical Genetics Service, Austin Health, Heidelberg, VIC, Australia
12. Pelé J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. Comparative analysis of sequence covariation methods to mine evolutionary hubs: Examples from selected GPCR families. Proteins 2014; 82:2141-56. [DOI: 10.1002/prot.24570] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/26/2013] [Revised: 03/11/2014] [Accepted: 03/19/2014] [Indexed: 01/26/2023]
Affiliation(s)
- Julien Pelé
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Matthieu Moreau
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Hervé Abdi
- The University of Texas at Dallas; School of Behavioral and Brain Sciences; Richardson, TX 75080-3021 USA
- Patrice Rodien
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Department of Endocrinology, Reference Centre for the pathologies of hormonal receptivity; Centre Hospitalier Universitaire of Angers; 4 rue Larrey 49933 Angers France
- Hélène Castel
- INSERM U982, Laboratory of Neuronal and Neuroendocrine Communication and Differentiation, DC2N; University of Rouen; 76821 Mont-Saint-Aignan France
- Marie Chabbert
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
|
13
|
Mendoza JL, Schmidt A, Li Q, Nuvaga E, Barrett T, Bridges RJ, Feranchak AP, Brautigam CA, Thomas PJ. Requirements for efficient correction of ΔF508 CFTR revealed by analyses of evolved sequences. Cell 2012; 148:164-74. [PMID: 22265409 DOI: 10.1016/j.cell.2011.11.023] [Citation(s) in RCA: 214] [Impact Index Per Article: 17.8] [Received: 06/06/2011] [Revised: 10/20/2011] [Accepted: 11/03/2011] [Indexed: 12/14/2022]
Abstract
Misfolding of ΔF508 cystic fibrosis (CF) transmembrane conductance regulator (CFTR) underlies pathology in most CF patients. F508 resides in the first nucleotide-binding domain (NBD1) of CFTR near a predicted interface with the fourth intracellular loop (ICL4). Efforts to identify small molecules that restore function by correcting the folding defect have revealed an apparent efficacy ceiling. To understand the mechanistic basis of this obstacle, positions statistically coupled to 508, in evolved sequences, were identified and assessed for their impact on both NBD1 and CFTR folding. The results indicate that both NBD1 folding and interaction with ICL4 are altered by the ΔF508 mutation and that correction of either individual process is only partially effective. By contrast, combination of mutations that counteract both defects restores ΔF508 maturation and function to wild-type levels. These results provide a mechanistic rationale for the limited efficacy of extant corrector compounds and suggest approaches for identifying compounds that correct both defective steps.
Affiliation(s)
- Juan L Mendoza
- Molecular Biophysics Program, and Department of Physiology, University of Texas Southwestern Medical Center, Dallas, TX 75390-9040, USA
|
14
|
Livesay DR, Kreth KE, Fodor AA. A critical evaluation of correlated mutation algorithms and coevolution within allosteric mechanisms. Methods Mol Biol 2012; 796:385-398. [PMID: 22052502 DOI: 10.1007/978-1-61779-334-9_21] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Indexed: 05/31/2023]
Abstract
The notion of using the evolutionary history encoded within multiple sequence alignments to predict allosteric mechanisms is appealing. In this approach, correlated mutations are expected to reflect coordinated changes that maintain intramolecular coupling between residue pairs. Despite much early fanfare, the general suitability of correlated mutations to predict allosteric couplings has not yet been established. Lack of progress along these lines has been hindered by several algorithmic limitations including phylogenetic artifacts within alignments masking true covariance and the computational intractability of consideration of more than two correlated residues at a time. Recent progress in algorithm development, however, has been substantial with a new generation of correlated mutation algorithms that have made fundamental progress toward solving these difficult problems. Despite these encouraging results, there remains little evidence to suggest that the evolutionary constraints acting on allosteric couplings are sufficient to be recovered from multiple sequence alignments. In this review, we argue that due to the exquisite sensitivity of protein dynamics, and hence that of allosteric mechanisms, the latter vary widely within protein families. If it turns out to be generally true that even very similar homologs display a wide divergence of allosteric mechanisms, then even a perfect correlated mutation algorithm could not be reliably used as a general mechanism for discovery of allosteric pathways.
Affiliation(s)
- Dennis R Livesay
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
|
15
|
Abstract
Contact map prediction is of great interest for its application in fold recognition and protein 3D structure determination. In this paper we present a contact-map prediction algorithm that employs Support Vector Machines as the machine learning tool and incorporates various features such as sequence profiles and their conservation, correlated mutation analysis based on various amino acid physicochemical properties, and secondary structure. In addition, we evaluated the effectiveness of the different features on contact map prediction for different fold classes. On average, our predictor achieved a prediction accuracy of 0.224, an improvement over a random predictor by a factor of 11.7, which is better than previously reported studies. Our study showed that predicted secondary structure features play an important role for proteins containing beta-structures. Models based on secondary structure features and correlated mutation analysis features produce different sets of predictions. Our study also suggests that models learned separately for different protein fold families may achieve better performance than a unified model.
Affiliation(s)
- Ying Zhao
- Department of Computer Science, University of Minnesota, Minneapolis, Minnesota 55455, USA
- George Karypis
- Department of Computer Science, University of Minnesota, Minneapolis, Minnesota 55455, USA
|
16
|
Jones DT, Buchan DWA, Cozzetto D, Pontil M. PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 2011; 28:184-90. [PMID: 22101153 DOI: 10.1093/bioinformatics/btr638] [Citation(s) in RCA: 525] [Impact Index Per Article: 40.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The accurate prediction of residue-residue contacts, critical for maintaining the native fold of a protein, remains an open problem in the field of structural bioinformatics. Interest in this long-standing problem has increased recently with algorithmic improvements and the rapid growth in the sizes of sequence families. Progress could have major impacts in both structure and function prediction to name but two benefits. Sequence-based contact predictions are usually made by identifying correlated mutations within multiple sequence alignments (MSAs), most commonly through the information-theoretic approach of calculating mutual information between pairs of sites in proteins. These predictions are often inaccurate because the true covariation signal in the MSA is often masked by biases from many ancillary indirect-coupling or phylogenetic effects. Here we present a novel method, PSICOV, which introduces the use of sparse inverse covariance estimation to the problem of protein contact prediction. Our method builds on work which had previously demonstrated corrections for phylogenetic and entropic correlation noise and allows accurate discrimination of direct from indirectly coupled mutation correlations in the MSA. RESULTS: PSICOV displays a mean precision substantially better than the best performing normalized mutual information approach and Bayesian networks. For 118 out of 150 targets, the L/5 (i.e. top-L/5 predictions for a protein of length L) precision for long-range contacts (sequence separation >23) was ≥ 0.5, which represents an improvement sufficient to be of significant benefit in protein structure prediction or model quality assessment. AVAILABILITY: The PSICOV source code can be downloaded from http://bioinf.cs.ucl.ac.uk/downloads/PSICOV.
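The L/5 long-range precision quoted in this abstract can be illustrated with a minimal sketch. It is not the PSICOV implementation: the function name, the `scores` and `contacts` matrices, and the default arguments are assumptions made for illustration, and the sketch only shows how such a metric could be evaluated for any symmetric matrix of pair scores.

```python
import numpy as np

def top_l5_long_range_precision(scores, contacts, min_sep=24, frac=5):
    """Fraction of true contacts among the top-L/frac scored residue pairs.

    scores   : (L, L) array of predicted contact scores (hypothetical input)
    contacts : (L, L) boolean array, True where residues are in contact
    min_sep  : minimum sequence separation; 24 corresponds to ">23"
    """
    L = scores.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep
    i, j = np.triu_indices(L, k=min_sep)
    if i.size == 0:
        return float("nan")  # protein too short for any long-range pair
    # Rank pairs from highest to lowest score
    order = np.argsort(-scores[i, j], kind="stable")
    top = order[: max(1, L // frac)]
    return float(np.mean(contacts[i[top], j[top]]))
```

Under these assumptions, a value of 0.5 or more for a target would correspond to the threshold the abstract reports for 118 of 150 targets.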
Affiliation(s)
- David T Jones
- Department of Computer Science, Bioinformatics Group, Centre for Computational Statistics and Machine Learning, University College London, Malet Place, London WC1E 6BT, UK.
|
17
|
Casadio R, Vassura M, Tiwari S, Fariselli P, Luigi Martelli P. Correlating disease-related mutations to their effect on protein stability: a large-scale analysis of the human proteome. Hum Mutat 2011; 32:1161-70. [PMID: 21853506 DOI: 10.1002/humu.21555] [Citation(s) in RCA: 66] [Impact Index Per Article: 5.1] [Received: 08/06/2011] [Accepted: 06/03/2011] [Indexed: 11/08/2022]
Abstract
Single residue mutations in proteins are known to affect protein stability and function. As a consequence, they can be disease associated. Available computational methods starting from protein sequence/structure can predict whether or not a mutated residue is disease associated and whether it promotes instability of the protein-folded structure. However, the relationship between stability changes in proteins and their involvement in human diseases still needs to be fully exploited. Here, we try to rationalize in a nutshell the complexity of the question by generalizing over information already stored in public databases. For each single amino acid polymorphism (SAP) type, we derive the probability of being disease-related (Pd) and compute from thermodynamic data three indexes indicating the probability of decreasing (P-), increasing (P+), and perturbing the protein structure stability (Pp). Statistically validated analysis of the different P/Pd correlations indicates that Pd best correlates with Pp. Pp/Pd correlation values are as high as 0.49, and increase up to 0.67 when data variability is taken into consideration. This is indicative of a medium/good correlation between Pd and Pp and corroborates the assumption that protein stability changes can also be disease associated at the proteome level.
Affiliation(s)
- Rita Casadio
- Laboratory of Biocomputing, Giorgio Prodi Center/CIRB/Department of Biology, University of Bologna, Bologna, Italy.
|
18
|
Di Lena P, Fariselli P, Margara L, Vassura M, Casadio R. Is there an optimal substitution matrix for contact prediction with correlated mutations? IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1017-1028. [PMID: 20855922 DOI: 10.1109/tcbb.2010.91] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 05/29/2023]
Abstract
Correlated mutations in proteins are believed to occur in order to preserve the protein functional folding through evolution. Their values can be deduced from sequence and/or structural alignments and are indicative of residue contacts in the protein three-dimensional structure. A correlation among pairs of residues is routinely evaluated with the Pearson correlation coefficient and the MCLACHLAN similarity matrix. In the literature, there is no justification for the adoption of the MCLACHLAN matrix instead of other substitution matrices. In this paper, we approach the problem of computing the optimal similarity matrix for contact prediction with correlated mutations, i.e., the similarity matrix that maximizes the accuracy of contact prediction with correlated mutations. We describe an optimization procedure, based on the gradient descent method, for computing the optimal similarity matrix and perform an extensive number of experimental tests. Our tests show that there is a large number of optimal matrices that perform similarly to MCLACHLAN. We also find that the upper limit to the accuracy achievable in protein contact prediction is independent of the optimized similarity matrix. This suggests that the poor scoring of the correlated mutations approach may be due to the choice of the linear correlation function in evaluating correlated mutations.
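As a rough illustration of the Pearson-based correlated mutation score this abstract refers to, the sketch below correlates, for two alignment columns, the residue similarities computed over all sequence pairs. It is an assumption-laden toy: `identity_similarity` is a hypothetical stand-in for a real substitution matrix such as MCLACHLAN (whose values are not reproduced here), and the function names are illustrative, not the authors' code.

```python
from itertools import combinations

import numpy as np

def identity_similarity(a, b):
    # Hypothetical placeholder for a substitution matrix lookup
    # (e.g. MCLACHLAN): 1 for identical residues, 0 otherwise.
    return 1.0 if a == b else 0.0

def correlated_mutation_score(msa, i, j, sim=identity_similarity):
    """Pearson correlation between per-pair similarities at columns i and j.

    msa : list of equal-length aligned sequences (strings)
    """
    pairs = list(combinations(range(len(msa)), 2))
    si = np.array([sim(msa[k][i], msa[l][i]) for k, l in pairs])
    sj = np.array([sim(msa[k][j], msa[l][j]) for k, l in pairs])
    if si.std() == 0 or sj.std() == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    return float(np.corrcoef(si, sj)[0, 1])

# Toy alignment: columns 0 and 1 change together, column 2 does not follow them.
toy_msa = ["AKL", "AKL", "GEL", "GEV"]
print(correlated_mutation_score(toy_msa, 0, 1))
print(correlated_mutation_score(toy_msa, 0, 2))
```

Swapping `sim` for a lookup into a different matrix is the degree of freedom that the optimization described in the abstract explores.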
Affiliation(s)
- Pietro Di Lena
- Department of Computer Science, University of Bologna, Via Mura Anteo Zamboni 7, 40127 Bologna, Italy.
|
19
|
Jeon J, Nam HJ, Choi YS, Yang JS, Hwang J, Kim S. Molecular evolution of protein conformational changes revealed by a network of evolutionarily coupled residues. Mol Biol Evol 2011; 28:2675-85. [PMID: 21470969 DOI: 10.1093/molbev/msr094] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.1] [Indexed: 12/25/2022]
Abstract
An improved understanding of protein conformational changes has broad implications for elucidating the mechanisms of various biological processes and for the design of protein engineering experiments. Understanding rearrangements of residue interactions is a key component in the challenge of describing structural transitions. Evolutionary properties of protein sequences and structures are extensively studied; however, evolution of protein motions, especially with respect to interaction rearrangements, has yet to be explored. Here, we investigated the relationship between sequence evolution and protein conformational changes and discovered that structural transitions are encoded in amino acid sequences as coevolving residue pairs. Furthermore, we found that highly coevolving residues are clustered in the flexible regions of proteins and facilitate structural transitions by forming and disrupting their interactions cooperatively. Our results provide insight into the evolution of protein conformational changes and help to identify residues important for structural transitions.
Affiliation(s)
- Jouhyun Jeon
- Division of Molecular and Life Science, Pohang University of Science and Technology, Pohang, Korea
|
20
|
Cline MS, Karchin R. Using bioinformatics to predict the functional impact of SNVs. Bioinformatics 2011; 27:441-8. [PMID: 21159622 PMCID: PMC3105482 DOI: 10.1093/bioinformatics/btq695] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.0] [Received: 10/20/2010] [Revised: 11/21/2010] [Accepted: 12/12/2010] [Indexed: 11/14/2022]
Abstract
MOTIVATION: The past decade has seen the introduction of fast and relatively inexpensive methods to detect genetic variation across the genome and exponential growth in the number of known single nucleotide variants (SNVs). There is increasing interest in bioinformatics approaches to identify variants that are functionally important from millions of candidate variants. Here, we describe the essential components of bioinformatics tools that predict functional SNVs. RESULTS: Bioinformatics tools have great potential to identify functional SNVs, but the black box nature of many tools can be a pitfall for researchers. Understanding the underlying methods, assumptions and biases of these tools is essential to their intelligent application.
Affiliation(s)
- Melissa S Cline
- Department of Molecular Cell and Developmental Biology, University of California, Santa Cruz, CA, USA
|
21
|
Gershoni M, Fuchs A, Shani N, Fridman Y, Corral-Debrinski M, Aharoni A, Frishman D, Mishmar D. Coevolution predicts direct interactions between mtDNA-encoded and nDNA-encoded subunits of oxidative phosphorylation complex I. J Mol Biol 2010; 404:158-71. [PMID: 20868692 DOI: 10.1016/j.jmb.2010.09.029] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 10/29/2009] [Revised: 09/05/2010] [Accepted: 09/13/2010] [Indexed: 10/19/2022]
Abstract
Despite years of research, the structure of the largest mammalian oxidative phosphorylation (OXPHOS) complex, NADH-ubiquinone oxidoreductase (complex I), and the interactions among its 45 subunits are not fully understood. Since complex I harbors subunits encoded by mitochondrial DNA (mtDNA) and nuclear DNA (nDNA) genomes, with the former evolving ∼10 times faster than the latter, tight cytonuclear coevolution is expected and observed. Recently, we identified three nDNA-encoded complex I subunits that underwent accelerated amino acid replacement, suggesting their adjustment to the elevated mtDNA rate of change. Hence, they constitute excellent candidates for binding mtDNA-encoded subunits. Here, we further disentangle the network of physical cytonuclear interactions within complex I by analyzing subunit coevolution. Firstly, relying on the bioinformatic analysis of 10 protein complexes possessing solved structures, we show that signals of coevolution identified physically interacting subunits with nearly 90% accuracy, thus lending support to our approach. When applying this approach to cytonuclear interaction within complex I, we predict that the 'rate-accelerated' nDNA-encoded subunits of complex I, NDUFC2 and NDUFA1, likely interact with the mtDNA-encoded subunits ND5/ND4 and ND5/ND4/ND1, respectively. Furthermore, we predicted interactions among mtDNA-encoded complex I subunits. Using the yeast two-hybrid system, we experimentally confirmed the predicted interactions of human NDUFC2 with ND4, the interactions of human NDUFA1 with ND1 and ND4, and the lack of interaction of NDUFC2 with ND3 and NDUFA1, thus providing a proof of concept for our approach. Our study shows, for the first time, evidence for direct interactions between nDNA-encoded and mtDNA-encoded subunits of human OXPHOS complex I and paves the path towards deciphering subunit interactions within complexes lacking three-dimensional structures. Our subunit-interactions-predicting method, ComplexCorr, is available at http://webclu.bio.wzw.tum.de/complexcorr.
Affiliation(s)
- Moran Gershoni
- Department of Life Sciences and the National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
|
22
|
Zurawski G, Bottomley W, Whitfeld PR. Structures of the genes for the beta and epsilon subunits of spinach chloroplast ATPase indicate a dicistronic mRNA and an overlapping translation stop/start signal. Proc Natl Acad Sci U S A 2010; 79:6260-4. [PMID: 16593238 PMCID: PMC347100 DOI: 10.1073/pnas.79.20.6260] [Citation(s) in RCA: 174] [Impact Index Per Article: 12.4] [Indexed: 11/18/2022]
Abstract
A 2.4-kilobase-pair region of spinach chloroplast DNA adjacent to the gene for the large subunit of ribulosebisphosphate carboxylase has been analyzed by RNA hybridization, in vitro transcription/translation, and DNA sequence determination. The analysis indicates that this region carries the genes for the beta and epsilon subunits of chloroplast ATPase and that the two genes are cotranscribed into a dicistronic mRNA with a 4-base-pair overlap between the stop codon of the beta-subunit gene and the start codon of the epsilon-subunit gene. The ATPase and carboxylase genes are transcribed divergently with respect to each other. The deduced amino acid sequences of the beta and epsilon subunits from spinach show 67% and 26% homology, respectively, with the published sequences of the beta and epsilon subunits of Escherichia coli ATPase.
Affiliation(s)
- G Zurawski
- Division of Plant Industry, Commonwealth Scientific and Industrial Research Organisation, Post Office Box 1600, Canberra City, A.C.T. 2601, Australia
|
23
|
Mok J, Kim PM, Lam HYK, Piccirillo S, Zhou X, Jeschke GR, Sheridan DL, Parker SA, Desai V, Jwa M, Cameroni E, Niu H, Good M, Remenyi A, Ma JLN, Sheu YJ, Sassi HE, Sopko R, Chan CSM, De Virgilio C, Hollingsworth NM, Lim WA, Stern DF, Stillman B, Andrews BJ, Gerstein MB, Snyder M, Turk BE. Deciphering protein kinase specificity through large-scale analysis of yeast phosphorylation site motifs. Sci Signal 2010; 3:ra12. [PMID: 20159853 DOI: 10.1126/scisignal.2000482] [Citation(s) in RCA: 274] [Impact Index Per Article: 19.6] [Indexed: 12/15/2022]
Abstract
Phosphorylation is a universal mechanism for regulating cell behavior in eukaryotes. Although protein kinases target short linear sequence motifs on their substrates, the rules for kinase substrate recognition are not completely understood. We used a rapid peptide screening approach to determine consensus phosphorylation site motifs targeted by 61 of the 122 kinases in Saccharomyces cerevisiae. By correlating these motifs with kinase primary sequence, we uncovered previously unappreciated rules for determining specificity within the kinase family, including a residue determining P-3 arginine specificity among members of the CMGC [CDK (cyclin-dependent kinase), MAPK (mitogen-activated protein kinase), GSK (glycogen synthase kinase), and CDK-like] group of kinases. Furthermore, computational scanning of the yeast proteome enabled the prediction of thousands of new kinase-substrate relationships. We experimentally verified several candidate substrates of the Prk1 family of kinases in vitro and in vivo and identified a protein substrate of the kinase Vhs1. Together, these results elucidate how kinase catalytic domains recognize their phosphorylation targets and suggest general avenues for the identification of previously unknown kinase substrates across eukaryotes.
Affiliation(s)
- Janine Mok
- Department of Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT 06520, USA
|
24
|
Ashkenazy H, Kliger Y. Reducing phylogenetic bias in correlated mutation analysis. Protein Eng Des Sel 2010; 23:321-6. [PMID: 20067922 DOI: 10.1093/protein/gzp078] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
Correlated mutation analysis (CMA) is a sequence-based approach for ab initio protein contact map prediction. The basis of this approach is the observed correlation between mutations in interacting amino acid residues. These correlations are often estimated by either calculating the Pearson's correlation coefficient (PCC) or the mutual information (MI) between columns in a multiple sequence alignment (MSA) of the protein of interest and its homologs. A major challenge of CMA is to filter out the background noise originating from phylogenetic relatedness between sequences included in the MSA. Recently, a procedure to reduce this background noise was demonstrated to improve an MI-based predictor. Herein, we tested whether a similar approach can also improve the performance of the classical PCC-based method. Indeed, performance improvements were achieved for all four major SCOP classes. Furthermore, the results reveal that the improved PCC-based method is superior to MI-based methods for proteins having MSAs of up to 100 sequences.
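The mutual information (MI) between two MSA columns mentioned in this abstract can be sketched as follows. This is the plain, uncorrected estimate from observed residue frequencies; it does not include the phylogenetic noise reduction the abstract discusses, and the function name and toy alignments are assumptions for illustration only.

```python
from collections import Counter
from math import log2

def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    msa : list of equal-length aligned sequences (strings)
    """
    n = len(msa)
    # Observed residue counts per column and joint counts for the column pair
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi

# Perfectly coupled columns versus statistically independent columns
print(column_mutual_information(["AK", "AK", "GE", "GE"], 0, 1))
print(column_mutual_information(["AK", "AE", "GK", "GE"], 0, 1))
```

With only a handful of sequences such frequency estimates are noisy, which is consistent with the abstract's observation that method choice matters for small MSAs.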
|
25
|
Pokarowski P, Kloczkowski A, Nowakowski S, Pokarowska M, Jernigan RL, Kolinski A. Ideal amino acid exchange forms for approximating substitution matrices. Proteins 2009; 69:379-93. [PMID: 17623859 DOI: 10.1002/prot.21509] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
We have analyzed 29 published substitution matrices (SMs) and five statistical protein contact potentials (CPs) for comparison. We find that popular, 'classical' SMs obtained mainly from sequence alignments of globular proteins are mostly correlated by at least a value of 0.9. The BLOSUM62 is the central element of this group. A second group includes SMs derived from alignments of remote homologs or transmembrane proteins. These matrices correlate better with classical SMs (0.8) than among themselves (0.7). A third group consists of intermediate links between SMs and CPs - matrices and potentials that exhibit mutual correlations of at least 0.8. Next, we show that SMs can be approximated with a correlation of 0.9 by expressions c(0) + x(i)x(j) + y(i)y(j) + z(i)z(j), 1 ≤ i, j ≤ 20, where c(0) is a constant and the vectors (x(i)), (y(i)), (z(i)) correlate highly with hydrophobicity, molecular volume and coil preferences of amino acids, respectively. The present paper is the continuation of our work (Pokarowski et al., Proteins 2005;59:49-57), where similar approximations were used to derive ideal amino acid interaction forms from CPs. Both approximations allow us to understand general trends in amino acid similarity and can help improve multiple sequence alignments using the fast Fourier transform (MAFFT), fast threading or other methods based on alignments of physicochemical profiles of protein sequences. The use of this approximation in sequence alignments instead of a classical SM yields results that differ by less than 5%. Intermediate links between SMs and CPs, new formulas for approximating these matrices, and the highly significant dependence of classical SMs on coil preferences are new findings.
Affiliation(s)
- Piotr Pokarowski
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, Warsaw University, 02-097 Warsaw, Poland.
|
26
|
Xu F, Du P, Shen H, Hu H, Wu Q, Xie J, Yu L. Correlated mutation analysis on the catalytic domains of serine/threonine protein kinases. PLoS One 2009; 4:e5913. [PMID: 19526051 PMCID: PMC2690836 DOI: 10.1371/journal.pone.0005913] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/01/2009] [Accepted: 05/11/2009] [Indexed: 01/15/2023]
Abstract
Background: Protein kinases (PKs) have emerged as the largest family of signaling proteins in eukaryotic cells and are involved in every aspect of cellular regulation. Great progress has been made in understanding how PKs phosphorylate their substrates, but the detailed mechanisms by which PKs ensure substrate specificity with their structurally conserved catalytic domains are still not adequately understood. Correlated mutation analysis based on large sets of diverse sequence data may provide new insights into this question. Methodology/Principal Findings: Statistical coupling, residue correlation and mutual information analyses along with clustering were applied to analyze the structure-based multiple sequence alignment of the catalytic domains of the Ser/Thr PK family. Two clusters of highly coupled sites were identified. Mapping these positions onto the 3D structure of the PK catalytic domain showed that these two groups of positions form two physically close networks. We named these the θ-shaped and γ-shaped networks, respectively. Conclusions/Significance: The θ-shaped network links the active site cleft and the substrate binding regions, and might participate in substrate recognition and interaction. The γ-shaped network is mainly situated on one side of the substrate binding regions, linking the activation loop and the substrate binding regions. It might play a role in supporting the activation loop and substrate binding regions before catalysis, and participate in product release after phosphoryl transfer. Our results exhibit significant correlations with experimental observations, and can be used as a guide to further experimental and theoretical studies on the mechanisms by which PKs interact with their substrates.
Affiliation(s)
- Feng Xu
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- * E-mail: (FX); (LY)
- Pan Du
- Biomedical Informatics Center, Northwestern University, Chicago, Illinois, United States of America
- Hongbo Shen
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Hairong Hu
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Qi Wu
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Jun Xie
- State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Long Yu
- Institute of Biomedical Sciences, Fudan University, Shanghai, China
- * E-mail: (FX); (LY)
|
27
|
Samsonov SA, Teyra J, Anders G, Pisabarro MT. Analysis of the impact of solvent on contacts prediction in proteins. BMC Struct Biol 2009; 9:22. [PMID: 19368710 PMCID: PMC2676287 DOI: 10.1186/1472-6807-9-22] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 08/15/2008] [Accepted: 04/15/2009] [Indexed: 11/10/2022]
Abstract
Background The correlated mutations concept is based on the assumption that interacting protein residues coevolve, so that a mutation in one of the interacting counterparts is compensated by a mutation in the other. Approaches based on this concept have been widely used for protein contacts prediction since the 90s. Previously, we have shown that water-mediated interactions play an important role in protein interfaces. We have observed that current "dry" correlated mutations approaches might not properly predict certain interactions in protein interfaces due to the fact that they are water-mediated. Results The goal of this study has been to analyze the impact of including solvent into the concept of correlated mutations. For this purpose we use linear combinations of the predictions obtained by the application of two different similarity matrices: a standard "dry" similarity matrix (DRY) and a "wet" similarity matrix (WET) derived from all water-mediated protein interfacial interactions in the PDB. We analyze two datasets containing 50 domains and 10 domain pairs from PFAM and compare the results obtained by using a combination of both matrices. We find that for both intra- and interdomain contacts predictions the introduction of a combination of a "wet" and a "dry" similarity matrix improves the predictions in comparison to the "dry" one alone. Conclusion Our analysis, despite the complexity of its possible general applicability, opens up that the consideration of water may have an impact on the improvement of the contact predictions obtained by correlated mutations approaches.
|
28
|
Xu D. Computational methods for protein sequence comparison and search. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2009; Chapter 2:2.1.1-2.1.27. [PMID: 19365790 DOI: 10.1002/0471140864.ps0201s56] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/06/2023]
Abstract
Protein sequence comparison and search has become commonplace not only for bioinformatics researchers but also for experimentalists in many cases. Because of the exponential growth in sequence data, sequence comparison in particular has become an increasingly important tool. Relating a new gene sequence to other known sequences often reveals its function, structure, and evolution. Many sequence comparison and search tools are available through public Web servers, and biologists can use them easily with little knowledge of computers or bioinformatics. This unit provides some theoretical background and describes popular tools for dot plot, sequence search against a database, multiple sequence alignments, protein tree construction, and protein family and motif search. Step-by-step examples are provided to illustrate how to use some of the most well-known tools. Finally, some general advice is given on combining different sequence analysis tools for biological inference.
Affiliation(s)
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri-Columbia, Columbia, Missouri
|
29
|
Fuchs A, Kirschner A, Frishman D. Prediction of helix-helix contacts and interacting helices in polytopic membrane proteins using neural networks. Proteins 2009; 74:857-71. [PMID: 18704938 DOI: 10.1002/prot.22194] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 11/07/2022]
Abstract
Despite rapidly increasing numbers of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a clearly broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods specifically tailored for this class of molecules. One important prediction target capturing all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We have recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here, we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction of soluble proteins, such as windowed residue profiles and residue distance in the sequence, our network also incorporates features that apply to membrane proteins only, such as residue position within the transmembrane segment and its orientation toward the lipophilic environment. The obtained neural network can predict contacts between residues in transmembrane segments with nearly 26% accuracy. It is therefore the first published contact predictor developed specifically for membrane proteins performing with equal accuracy to state-of-the-art contact predictors available for soluble proteins. The predicted helix-helix contacts were employed in a second step to identify interacting helices. For our dataset consisting of 62 membrane proteins of solved structure, we gained an accuracy of 78.1%. 
Because the reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.
Affiliation(s)
- Angelika Fuchs
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85354 Freising, Germany
|
30
|
Ashkenazy H, Unger R, Kliger Y. Optimal data collection for correlated mutation analysis. Proteins 2009; 74:545-55. [PMID: 18655065 DOI: 10.1002/prot.22168] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
The main objective of correlated mutation analysis (CMA) is to predict intraprotein residue-residue interactions from sequence alone. Despite considerable progress in algorithms and computer capabilities, the performance of CMA methods remains quite low. Here we examine whether, and to what extent, the quality of CMA methods depends on the sequences that are included in the multiple sequence alignment (MSA). The results revealed a strong correlation between the number of homologs in an MSA and CMA prediction strength. Furthermore, whereas many of the current methods include only orthologs in the MSA, we found that it is beneficial to include both orthologs and paralogs. Remarkably, even remote homologs contribute to the improved accuracy. Based on our findings we put forward an automated data collection procedure, with a minimal coverage of 50% between the query protein and its orthologs and paralogs. This procedure improves accuracy even in the absence of manual curation. In this era of massive sequencing and exploding sequence data, our results suggest that correlated mutation-based methods have not reached their inherent performance limitations and that the role of CMA in structural biology is far from being fulfilled.
|
31
|
Greller LD, Erhan S. Short length amino acid sequence homology among ancestrally unrelated proteins. INTERNATIONAL JOURNAL OF PEPTIDE AND PROTEIN RESEARCH 2009; 6:165-73. [PMID: 4370369 DOI: 10.1111/j.1399-3011.1974.tb02375.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
|
32
|
Erhan S, Greller LD. Presence of repeating sub-sequences and symmetry patterns in proteins. INTERNATIONAL JOURNAL OF PEPTIDE AND PROTEIN RESEARCH 2009; 6:175-81. [PMID: 4370278 DOI: 10.1111/j.1399-3011.1974.tb02376.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
|
33
|
Alexandrov K, Sobolev B, Filimonov D, Poroikov V. Recognition of protein function using the local similarity. J Bioinform Comput Biol 2008; 6:709-25. [PMID: 18763738 DOI: 10.1142/s021972000800359x] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 10/15/2007] [Revised: 12/14/2007] [Accepted: 01/19/2008] [Indexed: 11/18/2022]
Abstract
The functional annotation of amino acid sequences is one of the most important problems in bioinformatics. Different programs have been successfully applied for recognition of some functional classes; nevertheless, many functional groups still cannot be predicted with the required accuracy. We developed a new method for protein function recognition using the original approach of sequence description. Each sequence of the training set is compared with the query sequence, and the local similarity scores are calculated for the query sequence positions and used as input data for the original classifier. The method was tested using leave-one-out cross-validation for three data sets covering 58 enzyme classes. Two tested sets including noncrossing functional classes were recognized with high accuracy at various levels of classification hierarchy. The majority of these classes were predicted with 100% accuracy, showing a prediction ability comparable with the HMMer method and an accuracy superior to the SVM-Prot program. When the tested set was composed of intersected classes of ligand specificity, the prediction accuracy was less; however, the accuracy increased as the size of the predicted class expanded. The proposed method can be used for both predicting protein functional class and selecting the functionally significant sites in a sequence.
Affiliation(s)
- Kirill Alexandrov
- Laboratory for Structure-Function Based Drug Design, Institute of Biomedical Chemistry, Russian Academy of Medical Sciences, Pogodinskaya Str. 10, Moscow 119121, Russia.
|
34
|
Mukhopadhyay P, Basak S, Ghosh TC. Differential selective constraints shaping codon usage pattern of housekeeping and tissue-specific homologous genes of rice and arabidopsis. DNA Res 2008; 15:347-56. [PMID: 18827062 PMCID: PMC2608846 DOI: 10.1093/dnares/dsn023] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Intra-genomic variation between housekeeping and tissue-specific genes has always been a subject of interest in higher eukaryotes. To date, however, no such investigation has been done in plants. Availability of whole genome expression data for both rice and Arabidopsis has made it possible to examine the evolutionary forces shaping codon usage patterns in both housekeeping and tissue-specific genes in plants. In the present work, we have taken 4065 rice-Arabidopsis homologous gene pairs to study evolutionary forces responsible for codon usage divergence between housekeeping and tissue-specific genes. In both rice and Arabidopsis, it is mutational bias that regulates error minimization in highly expressed genes of both housekeeping and tissue-specific genes. Our results show that, in comparison to tissue-specific genes, housekeeping genes are under strong selective constraint in plants. However, in tissue-specific genes, lowly expressed genes are under stronger selective constraint compared with highly expressed genes. We demonstrated that constraint acting on mRNA secondary structure is responsible for modulating codon usage variations in rice tissue-specific genes. Thus, different evolutionary forces must underlie the evolution of synonymous codon usage of highly expressed genes of housekeeping and tissue-specific genes in rice and Arabidopsis.
Affiliation(s)
- Pamela Mukhopadhyay
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
|
35
|
Banerjee N, Sarani R, Ranjani CV, Sowmiya G, Michael D, Balakrishnan N, Sekar K. Algorithm to find distant repeats in a single protein sequence. Bioinformation 2008; 3:28-32. [PMID: 19052663 PMCID: PMC2586129 DOI: 10.6026/97320630003028] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/07/2008] [Accepted: 07/24/2008] [Indexed: 11/23/2022]
Abstract
Distant repeats in protein sequences play an important role in various aspects of protein analysis. A careful analysis of the distant repeats would make it possible to establish a firm relation between the repeats and their function and three-dimensional structure during the evolutionary process. Further, it sheds light on the diversity of duplication during evolution. To this end, an algorithm has been developed to find all distant repeats in a protein sequence. The scores from the Point Accepted Mutation (PAM) matrix have been used to identify amino acid substitutions while detecting the distant repeats. Due to the biological importance of distant repeats, the proposed algorithm will be of importance to structural biologists, molecular biologists, biochemists and researchers involved in phylogenetic and evolutionary studies.
Affiliation(s)
- Nirjhar Banerjee
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Rangarajan Sarani
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Govindaraj Sowmiya
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Daliah Michael
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Kanagaraj Sekar
- Bioinformatics Centre, Centre of Excellence in Structural Biology and Bio-computing
- Supercomputer Education and Research Centre, Indian Institute of Science, Bangalore 560 012, India
|
36
|
The long coming of computational structural biology. J Struct Biol 2008; 163:254-7. [DOI: 10.1016/j.jsb.2008.02.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/10/2008] [Revised: 02/25/2008] [Accepted: 02/26/2008] [Indexed: 11/20/2022]
|
37
|
Michaels G, Garian R. Computational methods for protein sequence analysis. CURRENT PROTOCOLS IN PROTEIN SCIENCE 2008; Chapter 2:Unit2.1. [PMID: 18429149 DOI: 10.1002/0471140864.ps0201s00] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
This unit is presented as a guide to addressing the issue of what to do with a protein sequence once it is obtained. A theoretical background for protein sequence analysis is provided first, followed by a discussion of matrix methods for sequence comparison (Matrix Methods for Sequence Comparison: Dot Plots). Sequence similarity searching is then presented, including the BLAST and FASTA databases. Other aspects of protein sequence analysis covered here are alignment methods, scoring matrices, multiple alignments, cluster methods and trees, and identification of functional sites.
Affiliation(s)
- G Michaels
- George Mason University, Fairfax, Virginia, USA
|
38
|
|
39
|
Fuchs A, Martin-Galiano AJ, Kalman M, Fleishman S, Ben-Tal N, Frishman D. Co-evolving residues in membrane proteins. Bioinformatics 2007; 23:3312-9. [DOI: 10.1093/bioinformatics/btm515] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
|
40
|
Yip KY, Patel P, Kim PM, Engelman DM, McDermott D, Gerstein M. An integrated system for studying residue coevolution in proteins. Bioinformatics 2007; 24:290-2. [PMID: 18056067 DOI: 10.1093/bioinformatics/btm584] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.5] [Indexed: 11/14/2022]
Abstract
Residue coevolution has recently emerged as an important concept, especially in the context of protein structures. While a multitude of different functions for quantifying it have been proposed, not much is known about their relative strengths and weaknesses. Also, subtle algorithmic details have discouraged implementing and comparing them. We addressed this issue by developing an integrated online system that enables comparative analyses with a comprehensive set of commonly used scoring functions, including Statistical Coupling Analysis (SCA), Explicit Likelihood of Subset Variation (ELSC), mutual information and correlation-based methods. A set of data preprocessing options is provided for improving the sensitivity and specificity of coevolution signal detection, including sequence weighting, residue grouping and the filtering of sequences, sites and site pairs. A total of more than 100 scoring variations are available. The system also provides facilities for studying the relationship between coevolution scores and inter-residue distances from a crystal structure if provided, which may help in understanding protein structures. Availability: The system is available at http://coevolution.gersteinlab.org. The source code and JavaDoc API can also be downloaded from the web site.
Affiliation(s)
- Kevin Y Yip
- Department of Computer Science, Yale University, 51 Prospect Street, New Haven, CT 06511, USA
|
41
|
On the origin of synonymous codon usage divergence between thermophilic and mesophilic prokaryotes. FEBS Lett 2007; 581:5825-30. [DOI: 10.1016/j.febslet.2007.11.054] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/19/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 01/24/2023]
|
42
|
Gouveia-Oliveira R, Pedersen AG. Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation. Algorithms Mol Biol 2007; 2:12. [PMID: 17915013 PMCID: PMC2234412 DOI: 10.1186/1748-7188-2-12] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 05/25/2007] [Accepted: 10/03/2007] [Indexed: 11/10/2022]
Abstract
Background: Some amino acid residues functionally interact with each other. This interaction will result in an evolutionary co-variation between these residues, i.e. coevolution. Our goal is to find these coevolving residues. Results: We present six new methods for detecting coevolving residues. Among other things, we suggest measures that are variants of Mutual Information, and measures that use a multidimensional representation of each residue in order to capture the physico-chemical similarities between amino acids. We created a benchmarking system, in silico, able to evaluate these methods through a wide range of realistic conditions. Finally, we use the combination of different methods as a way of improving performance. Conclusion: Our best method (Row and Column Weighed Mutual Information) has an estimated accuracy increase of 63% over Mutual Information. Furthermore, we show that the combination of different methods is efficient, and that the methods are quite sensitive to the different conditions tested.
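For readers unfamiliar with the baseline that this abstract compares against, the following is a minimal sketch of plain (unweighted) mutual information between two alignment columns. It is not the authors' Row and Column Weighted variant; the function name `mutual_information` and the toy columns are illustrative assumptions, and gaps are treated as ordinary symbols.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plain mutual information (in bits) between two alignment columns.

    col_a and col_b are equal-length strings, one residue per sequence.
    Hypothetical helper for illustration; no sequence weighting is applied.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")

    freq_a = Counter(col_a)                # residue counts in column A
    freq_b = Counter(col_b)                # residue counts in column B
    freq_ab = Counter(zip(col_a, col_b))   # joint counts of residue pairs

    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


if __name__ == "__main__":
    # Toy columns: the first pair covaries perfectly, the second is independent.
    print(mutual_information("AAGG", "LLVV"))
    print(mutual_information("AAGG", "LVLV"))
```

In this toy input the perfectly covarying pair scores one bit and the independent pair scores zero; real alignments additionally need corrections for phylogenetic bias and column entropy, which is what the weighted variants address.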
Affiliation(s)
- Rodrigo Gouveia-Oliveira
- Center for Biological sequence analysis, The Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
- Anders G Pedersen
- Center for Biological sequence analysis, The Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
|
43
|
Eyal E, Frenkel-Morgenstern M, Sobolev V, Pietrokovski S. A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction. Proteins 2007; 67:142-53. [PMID: 17243158 DOI: 10.1002/prot.21223] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large amount of integrated high quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Predicting protein residue-residue contacts from sequence information alone, by our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets of (300-2000) protein decoys. On the basis of evolutionary information alone our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of sets the native structure is ranked in the top 10% of the decoys. The method can, thus, be used to assist filtering wrong models, complementing traditional scoring functions.
Affiliation(s)
- Eran Eyal
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel.
|
44
|
Fabris F, Sgarro A, Tossi A. Splitting the BLOSUM score into numbers of biological significance. EURASIP JOURNAL ON BIOINFORMATICS & SYSTEMS BIOLOGY 2007; 2007:31450. [PMID: 18369412 PMCID: PMC3171334 DOI: 10.1155/2007/31450] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/02/2006] [Accepted: 03/30/2007] [Indexed: 03/31/2024]
Abstract
Mathematical tools developed in the context of Shannon information theory were used to analyze the meaning of the BLOSUM score, which was split into three components termed as the BLOSUM spectrum (or BLOSpectrum). These relate respectively to the sequence convergence (the stochastic similarity of the two protein sequences), to the background frequency divergence (typicality of the amino acid probability distribution in each sequence), and to the target frequency divergence (compliance of the amino acid variations between the two sequences to the protein model implicit in the BLOCKS database). This treatment sharpens the protein sequence comparison, providing a rationale for the biological significance of the obtained score, and helps to identify weakly related sequences. Moreover, the BLOSpectrum can guide the choice of the most appropriate scoring matrix, tailoring it to the evolutionary divergence associated with the two sequences, or indicate if a compositionally adjusted matrix could perform better.
Affiliation(s)
- Francesco Fabris
- Dipartimento di Matematica e Informatica, Università degli Studi di Trieste, via Valerio 12b, Trieste 34127, Italy
- Centro di Biomedicina Molecolare, AREA Science Park, Strada Statale 14, Basovizza, Trieste 34012, Italy
- Andrea Sgarro
- Dipartimento di Matematica e Informatica, Università degli Studi di Trieste, via Valerio 12b, Trieste 34127, Italy
- Centro di Biomedicina Molecolare, AREA Science Park, Strada Statale 14, Basovizza, Trieste 34012, Italy
- Alessandro Tossi
- Dipartimento di Biochimica, Biofisica, e Chimica delle Macromolecole, Università degli Studi di Trieste, via Licio Giorgieri 1, Trieste 34127, Italy
|
45
|
Nishimoto Y, Takasaka T, Hasegawa M, Zheng HY, Chen Q, Sugimoto C, Kitamura T, Yogo Y. Evolution of BK virus based on complete genome data. J Mol Evol 2006; 63:341-52. [PMID: 16897259 DOI: 10.1007/s00239-005-0092-5] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Received: 04/20/2005] [Accepted: 03/29/2006] [Indexed: 02/02/2023]
Abstract
The human polyomavirus BK virus (BKV) is ubiquitous in humans, infecting children asymptomatically. BKV is the only primate polyomavirus that has subtypes (I-IV) distinguishable by immunological reactivity. Nucleotide (nt) variations in a major capsid protein (VP1) gene region (designated the epitope region), probably responsible for antigenic diversity, have been used to classify BKV isolates into subtypes. Here, with all the protein-encoding gene sequences, we attempted to elucidate the evolutionary relationships among 28 BKV isolates belonging to subtypes I, III, and IV (no isolate belonging to subtype II, a minor one, was included). First, using the GTR + Gamma + I model, maximum likelihood trees were reconstructed for individual viral genes as well as for concatenated viral genes. On the resultant trees, the 28 BKV isolates were consistently divided into three clades corresponding to subtypes I, III, and IV, although bootstrap probabilities are not always high. Then we used more sophisticated likelihood models, one of which takes account of codon structure, to elucidate the phylogenetic relationships among BKV subtypes, but the phylogeny of the deep branchings remained ambiguous. Furthermore, the possibility of positive selection in the evolution of BKV was examined using the nonsynonymous/synonymous rate ratio as a measure of selection. An analysis based on entire genes could not detect any strong evidence for positive selection, but that based on the epitope region identified a few sites potentially under positive selection (these sites were among those showing subtype linked polymorphisms).
Affiliation(s)
- Yuriko Nishimoto
- The Institute of Statistical Mathematics, Research Organization of Information and Systems, Minato-ku, Tokyo, 106-8569, Japan
|
46
|
Conant GC, Wagner GP, Stadler PF. Modeling amino acid substitution patterns in orthologous and paralogous genes. Mol Phylogenet Evol 2006; 42:298-307. [PMID: 16942891 DOI: 10.1016/j.ympev.2006.07.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 02/01/2006] [Revised: 06/12/2006] [Accepted: 07/06/2006] [Indexed: 11/29/2022]
Abstract
We study to what degree patterns of amino acid substitution vary between genes using two models of protein-coding gene evolution. The first divides the amino acids into groups, with one substitution rate for pairs of residues in the same group and a second for those in differing groups. Unlike previous applications of this model, the groups themselves are estimated from data by simulated annealing. The second model makes substitution rates a function of the physical and chemical similarity between two residues. Because we model the evolution of coding DNA sequences as opposed to protein sequences, artifacts arising from the differing numbers of nucleotide substitutions required to bring about various amino acid substitutions are avoided. Using 10 alignments of related sequences (five of orthologous genes and five gene families), we do find differences in substitution patterns. We also find that, although patterns of amino acid substitution vary temporally within the history of a gene, variation is not greater in paralogous than in orthologous genes. Improved understanding of such gene-specific variation in substitution patterns may have implications for applications such as sequence alignment and phylogenetic inference.
Affiliation(s)
- Gavin C Conant
- Smurfit Institute of Genetics, Trinity College, University of Dublin, Dublin 2, Ireland.
|
47
|
Abstract
Synonymous codons are neutral at the protein level, therefore natural selection at the protein level should have no effect on their frequencies. Synonymous codons, however, differ in their capacity to reduce the effects of errors: after mutation, certain codons keep on coding for the same amino acid or for amino acids with similar properties, while other synonymous codons produce very different amino acids. Therefore, the impact of errors on a coding sequence (genetic robustness) can be measured by analysing its codon usage. I analyse the codon usage of sequenced nuclear and cytoplasmic genomes and I show that there is an extensive variation in genetic robustness at the DNA sequence level, both among genomes and among genes of the same genome. I also show theoretically that robustness can be adaptive, that is natural selection may lead to a preference for codons that reduce the impact of errors. If selection occurs only among the mutants of a codon (e.g. among the progeny before the adult phase), however, the codons that are more sensitive to the effects of mutations may increase in frequency because they manage to get rid more easily of deleterious mutations. I also suggest other possible explanations for the evolution of genetic robustness at the codon level.
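The idea that synonymous codons differ in how often a point mutation leaves the encoded amino acid unchanged can be illustrated with a short sketch. It assumes the standard genetic code and counts only single-nucleotide substitutions, all weighted equally; the raw count is an illustrative proxy, not the robustness measure used in the article, and the name `synonymous_neighbors` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_neighbors(codon):
    """Count the 9 single-nucleotide mutants of `codon` that keep its amino acid.

    Hypothetical helper for illustration; every substitution is weighted equally.
    """
    original = CODON_TABLE[codon]
    count = 0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == original:
                count += 1
    return count


if __name__ == "__main__":
    # Compare the six leucine codons.
    for codon in ("TTA", "TTG", "CTT", "CTC", "CTA", "CTG"):
        print(codon, synonymous_neighbors(codon))
```

Under these assumptions the leucine codon CTA has four synonymous single-nucleotide neighbours while TTA has only two, which is the kind of difference between synonymous codons that the abstract refers to.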
Affiliation(s)
- M Archetti
- Department of Zoology, Oxford University, Oxford, UK.
|
48
|
Lise S, Walker-Taylor A, Jones DT. Docking protein domains in contact space. BMC Bioinformatics 2006; 7:310. [PMID: 16790041 PMCID: PMC1559650 DOI: 10.1186/1471-2105-7-310] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/20/2006] [Accepted: 06/21/2006] [Indexed: 11/10/2022]
Abstract
Background: Many biological processes involve the physical interaction between protein domains. Understanding these functional associations requires knowledge of the molecular structure. Experimental investigations, however, present considerable difficulties, and there is therefore a need for accurate and reliable computational methods. In this paper we present a novel method that seeks to dock protein domains using a contact map representation. Rather than providing a full three-dimensional model of the complex, the method predicts contacting residues across the interface. We use a scoring function that combines structural, physicochemical and evolutionary information, where each potential residue contact is assigned a value according to the scoring function and the hypothesis is that the real configuration of contacts is the one that maximizes the score. The search is performed with a simulated annealing algorithm directly in contact space. Results: We have tested the method on interacting domain pairs that are part of the same protein (intra-molecular domains). We show that it correctly predicts some contacts and that predicted residues tend to be significantly closer to each other than other pairs of residues in the same domains. Moreover, we find that predicted contacts can often discriminate the best model (or the native structure, if present) among a set of optimal solutions generated by a standard docking procedure. Conclusion: Contact docking appears feasible and able to complement other computational methods for the prediction of protein-protein interactions. Compared with more standard docking algorithms, it might be more suitable for handling protein conformational changes and for predicting complexes starting from protein models.
Affiliation(s)
- Stefano Lise
- Department of Biochemistry and Molecular Biology, University College London, UK
- David T Jones
- Department of Biochemistry and Molecular Biology, University College London, UK
- Department of Computer Science, University College London, UK
|
49
|
Halperin I, Wolfson H, Nussinov R. Correlated mutations: advances and limitations. A study on fusion proteins and on the Cohesin-Dockerin families. Proteins 2006; 63:832-45. [PMID: 16508975 DOI: 10.1002/prot.20933] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
Correlated mutations have been repeatedly exploited for intramolecular contact map prediction. Over the last decade these efforts yielded several methods for measuring correlated mutations. Nevertheless, the application of correlated mutations for the prediction of intermolecular interactions has not yet been explored. This gap is due to several obstacles, such as 3D complexes availability, paralog discrimination, and the availability of sequence pairs that are required for inter- but not intramolecular analyses. Here we selected for analysis fusion protein families that bypass some of these obstacles. We find that several correlated mutation measurements yield reasonable accuracy for intramolecular contact map prediction on the fusion dataset. However, the accuracy level drops sharply in intermolecular contacts prediction. This drop in accuracy does not occur always. In the Cohesin-Dockerin family, reasonable accuracy is achieved in the prediction of both intra- and intermolecular contacts. The Cohesin-Dockerin family is well suited for correlated mutation analysis. Because, however, this family constitutes a special case (it has radical mutations, has domain repeats, within each species each Dockerin domain interacts with each Cohesin domain, see below), the successful prediction in this family does not point to a general potential in using correlated mutations for predicting intermolecular contacts. Overall, the results of our study indicate that current methodologies of correlated mutations analysis are not suitable for large-scale intermolecular contact prediction, and thus cannot assist in docking. With current measurements, sequence availability, sequence annotations, and underdeveloped sequence pairing methods, correlated mutations can yield reasonable accuracy only for a handful of families.
Affiliation(s)
- Inbal Halperin
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
|
50
|
Chen Y, Reilly K, Chang Y. Evolutionarily conserved allosteric network in the Cys loop family of ligand-gated ion channels revealed by statistical covariance analyses. J Biol Chem 2006; 281:18184-92. [PMID: 16595655 DOI: 10.1074/jbc.m600349200] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 12/21/2022] Open
Abstract
The Cys loop family of ligand-gated ion channels mediate fast synaptic transmission for communication between neurons. They are allosteric proteins, in which binding of a neurotransmitter to its binding site in the extracellular amino-terminal domain triggers structural changes in distant transmembrane domains to open a channel for ion flow. Although the locations of binding site and channel gating machinery are well defined, the structural basis of the activation pathway coupling binding and channel opening remains to be determined. In this paper, by analyzing amino acid covariance in a multiple sequence alignment, we have identified an energetically interconnected network in the Cys loop family of ligand-gated ion channels. Statistical coupling and correlated mutational analyses along with clustering revealed a highly coupled cluster. Mapping the positions in the cluster onto a three-dimensional structural model demonstrated that these highly coupled positions form an interconnected network linking experimentally identified binding domains through the coupling region to the gating machinery. In addition, these highly coupled positions are also condensed in the transmembrane domains, which are a recent focus for the sites of action of many allosteric modulators. Thus, our results revealed a genetically interconnected network that potentially plays an important role in the allosteric activation and modulation of the Cys loop family of ligand-gated ion channels.
Affiliation(s)
- Yonghui Chen
- Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL 35294, USA
|