1. Hernández Berthet AS, Aptekmann AA, Tejero J, Sánchez IE, Noguera ME, Roman EA. Associating protein sequence positions with the modulation of quantitative phenotypes. Arch Biochem Biophys 2024; 755:109979. [PMID: 38583654 DOI: 10.1016/j.abb.2024.109979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/11/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]
Abstract
Although protein sequences encode the information for folding and function, understanding the link between them is not an easy task. Unfortunately, predicting how specific amino acids contribute to these features remains difficult. Here, we developed a simple algorithm that finds positions in a protein sequence with the potential to modulate the studied quantitative phenotypes. From a few hundred protein sequences, we perform multiple sequence alignments, obtain the per-position pairwise differences for both the sequence and the observed phenotypes, and calculate the correlation between these two quantities. We tested our methodology on four cases: archaeal adenylate kinases and the organisms' optimal growth temperatures, microbial rhodopsins and their maximal absorption wavelengths, mammalian myoglobins and their muscular concentration, and inhibition of HIV protease clinical isolates by two different molecules. We found from 3 to 10 positions tightly associated with those phenotypes, depending on the studied case. We showed that these correlations appear when using individual positions, but an improvement is achieved when the most correlated positions are jointly analyzed. Notably, we performed phenotype predictions using a simple linear model that links per-position divergences and differences in the observed phenotypes. Predictions are comparable to those of state-of-the-art methodologies which, in most cases, are far more complex. All of the calculations are obtained at a very low information cost, since the only input needed is a multiple sequence alignment of protein sequences with their associated quantitative phenotypes. The diversity of the explored systems makes our work a valuable tool to find sequence determinants of biological activity modulation and to predict various functional features for uncharacterized members of a protein family.
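The procedure outlined in this abstract (per-position pairwise sequence differences correlated with pairwise phenotype differences) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the function names `pearson` and `position_phenotype_correlations` are hypothetical, a simple 0/1 mismatch indicator stands in for whatever per-position difference measure the paper uses, Pearson correlation is assumed, and the toy alignment and phenotype values are invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def position_phenotype_correlations(alignment, phenotypes):
    """For each alignment column, correlate pairwise residue differences
    with pairwise absolute phenotype differences over all sequence pairs."""
    pairs = list(combinations(range(len(alignment)), 2))
    pheno_diffs = [abs(phenotypes[i] - phenotypes[j]) for i, j in pairs]
    scores = []
    for pos in range(len(alignment[0])):
        # Assumption: 1.0 if the two residues differ at this position, else 0.0.
        seq_diffs = [float(alignment[i][pos] != alignment[j][pos]) for i, j in pairs]
        scores.append(pearson(seq_diffs, pheno_diffs))
    return scores


# Toy aligned sequences with invented phenotype values (not data from the paper).
toy_alignment = ["ACDA", "ACEA", "GCDA", "GCEA"]
toy_phenotypes = [10.0, 12.0, 30.0, 32.0]

scores = position_phenotype_correlations(toy_alignment, toy_phenotypes)
best_position = max(range(len(scores)), key=lambda p: scores[p])
```

In this toy case the first column separates the low-phenotype from the high-phenotype sequences, so it receives the highest score, while fully conserved columns score zero because they carry no pairwise differences. A real analysis would need a gap-handling policy and possibly a substitution-aware distance, neither of which is specified in the abstract.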
Affiliation(s)
- Ayelén S Hernández Berthet
  - Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina.
- Ariel A Aptekmann
  - Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina; Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08873, USA; Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA.
- Jesús Tejero
  - Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15260, USA; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
- Ignacio E Sánchez
  - Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina.
- Martín E Noguera
  - Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina; Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD, Bernal, Argentina.
- Ernesto A Roman
  - Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina.
2. Jin R, He B, Qin Y, Du Z, Cao C, Li J. Unveiling the role of bZIP transcription factors CREB and CEBP in detoxification metabolism of Nilaparvata lugens (Stål). Int J Biol Macromol 2023; 253:126576. [PMID: 37648128 DOI: 10.1016/j.ijbiomac.2023.126576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 08/24/2023] [Accepted: 08/26/2023] [Indexed: 09/01/2023]
Abstract
The basic leucine zipper (bZIP) superfamily is a crucial group of transcription factors involved in the response to xenobiotics in insects. However, little is known about the function of CCAAT enhancer binding proteins (CEBP) and cAMP response element binding protein (CREB) in Nilaparvata lugens. In the present study, NlCEBP and NlCREB were cloned and identified. Quantitative real-time polymerase chain reaction (qRT-PCR) analysis showed that the expression of NlCEBP and NlCREB was significantly induced after exposure to chemical insecticides. Silencing of NlCEBP and NlCREB increased the susceptibility of N. lugens to insecticides, and the detoxification enzyme activities were also significantly decreased. In addition, comparative transcriptome analysis revealed that 174 genes were significantly co-down-regulated after interfering with the two transcription factors. GO analysis showed that the co-down-regulated genes are mostly related to energy transport and metabolic functions, indicating a potential regulatory role of NlCEBP and NlCREB in detoxification metabolism. Our research sheds light on the functional roles of the transcription factors NlCEBP and NlCREB in the detoxification metabolism of N. lugens, providing a theoretical basis for pest management and comprehensive control of this pest and increasing our understanding of insect toxicology.
Affiliation(s)
- Ruoheng Jin
  - National Biopesticide Engineering Research Centre, Hubei Biopesticide Engineering Research Centre, Hubei Academy of Agricultural Science, Wuhan 430064, PR China; Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
- Biyan He
  - Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China; Tongling Municipal Bureau of Agricultural and Rural Affairs, Tongling 244002, PR China
- Yao Qin
  - Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
- Zuyi Du
  - Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China
- Chunxia Cao
  - National Biopesticide Engineering Research Centre, Hubei Biopesticide Engineering Research Centre, Hubei Academy of Agricultural Science, Wuhan 430064, PR China.
- Jianhong Li
  - Hubei Insect Resources Utilization and Sustainable Pest Management Key Laboratory, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan 430070, PR China.
3. Bhadola P, Deo N. Exploring complexity of class-A Beta-lactamase family using physiochemical-based multiplex networks. Sci Rep 2023; 13:20626. [PMID: 37996629 PMCID: PMC10667273 DOI: 10.1038/s41598-023-48128-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023] [Open Access]
Abstract
The Beta-lactamase protein family is vital in countering Beta-lactam antibiotics, a widely used class of antimicrobials. To enhance our understanding of this family, we adopted a novel approach employing a multiplex network representation of its multiple sequence alignment. Each network layer, derived from the physiochemical properties of amino acids, unveils distinct insights into the intricate interactions among nodes, thereby enabling the identification of key motifs. Nodes with identical property signs tend to aggregate, providing evidence of consequential functional and evolutionary constraints shaping the Beta-lactamase family. We further investigate the distribution of evolutionary links across various layers. We observe that polarity manifests the highest number of unique links at lower thresholds, followed by hydrophobicity and polarizability, whereas hydrophobicity dominates at higher thresholds. Further, the combination of polarizability and volume exhibits multiple simultaneous connections at all thresholds. The combination of hydrophobicity, polarizability, and volume uncovers shared links exclusive to these layers, implying substantial evolutionary impacts that may have functional or structural implications. By assessing the multi-degree of nodes, we unveil the hierarchical influence of properties at each position, identifying crucial properties responsible for the protein's functionality and providing valuable insights into potential targets for modulating enzymatic activity.
Affiliation(s)
- Pradeep Bhadola
  - Centre for Theoretical Physics & Natural Philosophy, Mahidol University, Nakhonsawan Campus, Phayuha Khiri, NakhonSawan, 60130, Thailand.
- Nivedita Deo
  - Department of Physics and Astrophysics, University of Delhi, Delhi, 110007, India.
4. Szatkownik A, Zea DJ, Richard H, Laine E. Building alternative splicing and evolution-aware sequence-structure maps for protein repeats. J Struct Biol 2023; 215:107997. [PMID: 37453591 DOI: 10.1016/j.jsb.2023.107997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/15/2023] [Accepted: 07/05/2023] [Indexed: 07/18/2023]
Abstract
Alternative splicing of repeats in proteins provides a mechanism for rewiring and fine-tuning protein interaction networks. In this work, we developed a robust and versatile method, ASPRING, to identify alternatively spliced protein repeats from gene annotations. ASPRING leverages evolutionarily meaningful alternative splicing-aware hierarchical graphs to provide maps between protein repeat sequences and 3D structures. We re-think the definition of repeats by explicitly accounting for transcript diversity across several genes/species. Using a stringent sequence-based similarity criterion, we detected over 5,000 evolutionarily conserved repeats by screening virtually all human protein-coding genes and their orthologs across a dozen species. Through a joint analysis of their sequences and structures, we extracted specificity-determining sequence signatures and assessed their implication in experimentally resolved and modelled protein interactions. Our findings demonstrate the widespread alternative usage of protein repeats in modulating protein interactions and open avenues for targeting repeat-mediated interactions.
Affiliation(s)
- Antoine Szatkownik
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Bioinformatics Unit, Genome Competence Center (MF1), Robert Koch Institute, 13353 Berlin, Germany
- Diego Javier Zea
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Hugues Richard
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France; Bioinformatics Unit, Genome Competence Center (MF1), Robert Koch Institute, 13353 Berlin, Germany.
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France.
5. Pascarelli S, Laurino P. Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins. PLoS Comput Biol 2022; 18:e1010016. [PMID: 35377869 PMCID: PMC9009777 DOI: 10.1371/journal.pcbi.1010016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2022] [Revised: 04/14/2022] [Accepted: 03/12/2022] [Indexed: 11/25/2022] [Open Access]
Abstract
Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids in these sites are masked from being recognized by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications in protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.
Proteins are critical components of living systems because they facilitate most biological processes, such as protein synthesis, DNA replication, and chemical catalysis. Proteins are encoded in their genes. During evolution, genes accumulate mutations that get translated at the protein level. These mutations can be “neutral” if they do not affect the protein function immediately and directly; otherwise, mutations can be functional if they directly modify protein function.
An event that provides an opportunity to study protein function is gene duplication, namely when two copies of a gene encoding the same protein appear. One copy of the protein often retains the same function while the other is free to diverge and specialize to a different function. This work sheds light on an alternative outcome of gene duplication that might be critical to discern between neutral and functional mutations. By looking at 88 fish genomes, we found proteins in which the evolution of their sequences does not follow the expected pattern of divergence after gene duplication. In this case, the protein sequence of a subgroup of species diverges in the copy expected to retain its function, while the sequence is retained in the copy expected to diverge. We called this event “inter-paralog amino acid inversion”. Our data show that this “inversion” event is correlated with function, and its detection has to be considered for assigning protein functions correctly.
Affiliation(s)
- Stefano Pascarelli
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan
- Paola Laurino
  - Protein Engineering and Evolution Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa, Japan
6. Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. Adv Protein Chem Struct Biol 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain.
7. Pazos F. Prediction of Protein Sites and Physicochemical Properties Related to Functional Specificity. Bioengineering (Basel) 2021; 8:bioengineering8120201. [PMID: 34940354 PMCID: PMC8698372 DOI: 10.3390/bioengineering8120201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022] [Open Access]
Abstract
Specificity Determining Positions (SDPs) are protein sites responsible for functional specificity within a family of homologous proteins. These positions are extracted from a family’s multiple sequence alignment and complement the fully conserved positions as predictors of functional sites. SDP analysis is now routinely used for locating these specificity-related sites in families of proteins of biomedical or biotechnological interest with the aim of mutating them to switch specificities or design new ones. There are many different approaches for detecting these positions in multiple sequence alignments. Nevertheless, existing methods report the potential SDPs but do not provide any clue about the physicochemical basis behind the functional specificity, which has to be inferred a posteriori by manually inspecting these positions in the alignment. In this work, a new methodology is presented that, concomitantly with the detection of the SDPs, automatically provides information on the amino-acid physicochemical properties most related to the change in specificity. This new method is applied to two different multiple sequence alignments of homologs of the well-studied RasH protein, representing different cases of functional specificity, and the results are discussed in detail.
Affiliation(s)
- Florencio Pazos
  - Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), c/Darwin, 3, 28049 Madrid, Spain
8. Zea DJ, Laskina S, Baudin A, Richard H, Laine E. Assessing conservation of alternative splicing with evolutionary splicing graphs. Genome Res 2021; 31:1462-1473. [PMID: 34266979 PMCID: PMC8327911 DOI: 10.1101/gr.274696.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/27/2020] [Accepted: 06/11/2021] [Indexed: 12/29/2022]
Abstract
Understanding how protein function has evolved and diversified is of great importance for human genetics and medicine. Here, we tackle the problem of describing the whole transcript variability observed in several species by generalizing the definition of splicing graph. We provide a practical solution to construct parsimonious evolutionary splicing graphs where each node is a minimal transcript building block defined across species. We show a clear link between the functional relevance, tissue regulation, and conservation of alternative transcripts on a set of 50 genes. By scaling up to the whole human protein-coding genome, we identify a few thousand genes where alternative splicing modulates the number and composition of pseudorepeats. We have implemented our approach in ThorAxe, an efficient, versatile, robust, and freely available computational tool.
Affiliation(s)
- Diego Javier Zea
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Sofya Laskina
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Alexis Baudin
  - Sorbonne Université, CNRS, LIP6, F-75005 Paris, France
- Hugues Richard
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
  - Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, 13353 Berlin, Germany
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
9. Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches to predict protein functional families and functional sites. Curr Opin Struct Biol 2021; 70:108-122. [PMID: 34225010 DOI: 10.1016/j.sbi.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/29/2021] [Revised: 05/13/2021] [Accepted: 05/25/2021] [Indexed: 01/06/2023]
Abstract
Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features.
Affiliation(s)
- Clemens Rauer
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Neeladri Sen
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Vaishali P Waman
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Mahnaz Abbasian
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, London, WC1E 6BT, UK.
10. Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021; 8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022] [Open Access]
Abstract
Owing to its clinical significance, modulation of functionally relevant amino acids in protein-protein complexes has attracted a great deal of attention. To this end, many approaches have been proposed to predict the partner-selecting amino acid positions in evolutionarily close complexes. These approaches can be grouped into sequence-based machine learning and structure-based energy-driven methods. In this work, we assessed these methods’ ability to map the specificity-determining positions of Axl, a receptor tyrosine kinase involved in cancer progression and immune system diseases. For sequence-based predictions, we used SDPpred, Multi-RELIEF, and Sequence Harmony. For structure-based predictions, we utilized HADDOCK refinement and molecular dynamics simulations. As a result, we observed that (i) sequence-based methods overpredict partner-selecting residues of Axl and that (ii) combining Multi-RELIEF with HADDOCK-based predictions provides the key Axl residues, covered by the extensive molecular dynamics simulations. Expanding on these results, we propose that a sequence-structure-based approach is necessary to determine specificity-determining positions of Axl, which can guide the development of therapeutic molecules to combat Axl misregulation.
Affiliation(s)
- Tülay Karakulak
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey; Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland; Department of Pathology and Molecular Pathology, University Hospital Zurich, Zurich, Switzerland; Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Ahmet Sureyya Rifaioglu
  - Department of Electrical - Electronics Engineering, İskenderun Technical University, Hatay, Turkey
- João P G L M Rodrigues
  - Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, United States
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey; Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
11. Fonseca NJ, Afonso MQL, Carrijo L, Bleicher L. CONAN: a web application to detect specificity determinants and functional sites by amino acids co-variation network analysis. Bioinformatics 2021; 37:1026-1028. [PMID: 32780795 DOI: 10.1093/bioinformatics/btaa713] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/29/2020] [Revised: 08/01/2020] [Accepted: 08/05/2020] [Indexed: 11/12/2022] [Open Access]
Abstract
SUMMARY: CONAN is a web application developed to detect specificity determinants and function-related sites by amino acid co-variation network analysis, emphasizing local coevolutionary constraints. The software allows the characterization of structurally and functionally relevant groups of residues and their relationship with subsets of sequences by automatic cross-referencing with GO terms, UniProtKB annotations and InterPro. AVAILABILITY AND IMPLEMENTATION: CONAN is free and open-source, distributed under the terms of the GPLv3 license. The software is available as a web application and as Python scripts and can be accessed at http://bioinfo.icb.ufmg.br/conan. We also provide running instructions, the source code and a user guide.
Affiliation(s)
- N J Fonseca
  - Cellular Structure and 3D Bioimaging, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK; Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- M Q L Afonso
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- L Carrijo
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- L Bleicher
  - Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
12. Pitarch B, Ranea JAG, Pazos F. Protein residues determining interaction specificity in paralogous families. Bioinformatics 2021; 37:1076-1082. [PMID: 33135068 DOI: 10.1093/bioinformatics/btaa934] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/12/2020] [Revised: 10/06/2020] [Accepted: 10/22/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
MOTIVATION: Predicting the residues controlling a protein's interaction specificity is important not only to better understand its interactions but also to design mutations aimed at fine-tuning or swapping them. RESULTS: In this work, we present a methodology that combines sequence information (in the form of multiple sequence alignments) with interactome information to detect such residues in paralogous families of proteins. The interactome is used to define pairwise similarities of interaction contexts for the proteins in the alignment. The method looks for alignment positions with patterns of amino-acid changes reflecting the similarities/differences in the interaction neighborhoods of the corresponding proteins. We tested this new methodology on a large set of human paralogous families with structurally characterized interactions, and discuss in detail the results for the RasH family. We show that this approach is a better predictor of interfacial residues than both sequence conservation and an equivalent 'unsupervised' method that does not use interactome information. AVAILABILITY AND IMPLEMENTATION: http://csbg.cnb.csic.es/pazos/Xdet/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Borja Pitarch
  - Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
- Juan A G Ranea
  - Department of Molecular Biology and Biochemistry, University of Malaga, Malaga 29071, Spain; CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain; Institute of Biomedical Research in Malaga (IBIMA), Malaga, Spain
- Florencio Pazos
  - Computational Systems Biology Group, Systems Biology Department, National Centre for Biotechnology (CNB-CSIC), 28049 Madrid, Spain
13. Pontes C, Ruiz-Serra V, Lepore R, Valencia A. Unraveling the molecular basis of host cell receptor usage in SARS-CoV-2 and other human pathogenic β-CoVs. Comput Struct Biotechnol J 2021; 19:759-766. [PMID: 33456724 PMCID: PMC7802526 DOI: 10.1016/j.csbj.2021.01.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/29/2020] [Revised: 01/07/2021] [Accepted: 01/07/2021] [Indexed: 01/13/2023] [Open Access]
Abstract
The recent emergence of the novel SARS-CoV-2 in China and its rapid spread in the human population has led to a public health crisis worldwide. As with SARS-CoV, horseshoe bats currently represent the most likely candidate animal source for SARS-CoV-2. Yet, the specific mechanisms of cross-species transmission and adaptation to the human host remain unknown. Here we show that the unsupervised analysis of conservation patterns across the β-CoV spike protein family, using sequence information alone, can provide valuable insights into the molecular basis of the specificity of β-CoVs for different host cell receptors. More precisely, our results indicate that host cell receptor usage is encoded in the amino acid sequences of different CoV spike proteins in the form of a set of specificity determining positions (SDPs). Furthermore, by integrating structural data, in silico mutagenesis and coevolution analysis, we could elucidate the role of SDPs in mediating ACE2 binding across the Sarbecovirus lineage, either by engaging the receptor through direct intermolecular interactions or by affecting the local environment of the receptor binding motif. Finally, by analyzing coevolving mutations across a paired MSA, we were able to identify key intermolecular contacts occurring at the spike-ACE2 interface. These results show that effective mining of the evolutionary records held in the sequence of the spike protein family can help trace the molecular mechanisms behind the evolution and host-receptor adaptation of circulating and future novel β-CoVs.
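The key words for this entry list mutual information (MI) and the average product correction (APC), which are commonly combined to score coevolving column pairs in an alignment. The sketch below shows that general technique only; it is an assumption about the kind of calculation involved, not the authors' pipeline. The function names `mutual_information` and `apc_corrected_mi` are hypothetical, gaps and sequence weighting are ignored, and the toy columns are invented. For a paired MSA, the columns of the two proteins would be concatenated per matched sequence pair before scoring.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a, count_b = Counter(col_a), Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # p(a,b) * log2(p(a,b) / (p(a) * p(b))), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def apc_corrected_mi(columns):
    """Pairwise MI matrix with the average product correction:
    MIp(i, j) = MI(i, j) - mean_MI(i) * mean_MI(j) / overall_mean_MI."""
    m = len(columns)
    mi = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            mi[i][j] = mi[j][i] = mutual_information(columns[i], columns[j])
    # The diagonal is zero, so each row sum covers only the other m - 1 columns.
    col_mean = [sum(row) / (m - 1) for row in mi]
    overall_mean = sum(col_mean) / m
    corrected = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j and overall_mean > 0:
                corrected[i][j] = mi[i][j] - col_mean[i] * col_mean[j] / overall_mean
    return corrected


# Invented toy columns: each string holds one column read down four sequences.
toy_columns = ["AAGG", "LLVV", "KRKR"]
corrected = apc_corrected_mi(toy_columns)
```

Here the first two columns change together and keep a positive corrected score, whereas the third column varies independently of both and scores zero. Real alignments need many more sequences than this toy input for MI estimates to be meaningful.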
Key Words
- APC, average product correction
- CoVs, Coronaviruses
- EV, evolutionary rate
- Functional specificity
- MCA, multiple correspondence analysis
- MI, mutual information
- MSA, multiple sequence alignment
- NTD, N-terminal domain
- Phylogenetic analysis
- Protein subfamilies
- RBD, receptor binding domain
- RBM, receptor binding motif
- SARS-CoV-2
- SDPs, specificity determining positions
- Specificity Determining Positions
- Spike protein evolution
- hACE2, human angiotensin converting enzyme 2
Affiliation(s)
- Camila Pontes
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- University of Brasília (UnB), 70910-900, Brasília - DF, Brazil
- Rosalba Lepore
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Alfonso Valencia
- Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
14
Bradley D, Viéitez C, Rajeeve V, Selkrig J, Cutillas PR, Beltrao P. Sequence and Structure-Based Analysis of Specificity Determinants in Eukaryotic Protein Kinases. Cell Rep 2021; 34:108602. [PMID: 33440154 PMCID: PMC7809594 DOI: 10.1016/j.celrep.2020.108602] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/20/2018] [Revised: 11/03/2020] [Accepted: 12/14/2020] [Indexed: 01/04/2023] Open
Abstract
Protein kinases lie at the heart of cell-signaling processes and are often mutated in disease. Kinase target recognition at the active site is in part determined by a few amino acids around the phosphoacceptor residue. However, relatively little is known about how most preferences are encoded in the kinase sequence or how these preferences evolved. Here, we used alignment-based approaches to predict 30 specificity-determining residues (SDRs) for 16 preferences. These were studied with structural models and were validated by activity assays of mutant kinases. Cancer mutation data revealed that kinase SDRs are mutated more frequently than catalytic residues. We have observed that, throughout evolution, kinase specificity has been strongly conserved across orthologs but can diverge after gene duplication, as illustrated by the G protein-coupled receptor kinase family. The identified SDRs can be used to predict kinase specificity from sequence and aid in the interpretation of evolutionary or disease-related genomic variants.
Affiliation(s)
- David Bradley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Cristina Viéitez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK; European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Vinothini Rajeeve
- Integrative Cell Signalling & Proteomics, Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Joel Selkrig
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
- Pedro R Cutillas
- Integrative Cell Signalling & Proteomics, Centre for Haemato-Oncology, Barts Cancer Institute, Queen Mary University of London, Charterhouse Square, London EC1M 6BQ, UK
- Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
15
Mier P, Andrade-Navarro MA. MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments. Evol Bioinform Online 2020; 16:1176934320916199. [PMID: 32425492 PMCID: PMC7218316 DOI: 10.1177/1176934320916199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2020] [Accepted: 03/10/2020] [Indexed: 11/17/2022] Open
Abstract
Multiple sequence alignments are usually phylogenetically driven. They are studied in the framework of evolution. But sometimes, it is interesting to study residue conservation at positions unconstrained by evolutionary rules. We present a supervised method to access a layer of information difficult to appreciate visually when many protein sequences are aligned. This new tool (MAGA; http://cbdm-01.zdv.uni-mainz.de/~munoz/maga/) locates positions in multiple sequence alignments differentially conserved in manually defined groups of sequences.
Affiliation(s)
- Pablo Mier
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, Mainz 55128, Germany
- Miguel A Andrade-Navarro
- Institute of Organismic and Molecular Evolution, Faculty of Biology, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, Mainz 55128, Germany
16
Sergeeva AP, Katsamba PS, Cosmanescu F, Brewer JJ, Ahlsen G, Mannepalli S, Shapiro L, Honig B. DIP/Dpr interactions and the evolutionary design of specificity in protein families. Nat Commun 2020; 11:2125. [PMID: 32358559 PMCID: PMC7195491 DOI: 10.1038/s41467-020-15981-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/07/2019] [Accepted: 04/06/2020] [Indexed: 01/10/2023] Open
Abstract
Differential binding affinities among closely related protein family members underlie many biological phenomena, including cell-cell recognition. Drosophila DIP and Dpr proteins mediate neuronal targeting in the fly through highly specific protein-protein interactions. We show here that DIPs/Dprs segregate into seven specificity subgroups defined by binding preferences between their DIP and Dpr members. We then describe a sequence-, structure- and energy-based computational approach, combined with experimental binding affinity measurements, to reveal how specificity is coded on the canonical DIP/Dpr interface. We show that binding specificity of DIP/Dpr subgroups is controlled by "negative constraints", which interfere with binding. To achieve specificity, each subgroup utilizes a different combination of negative constraints, which are broadly distributed and cover the majority of the protein-protein interface. We discuss the structural origins of negative constraints, and potential general implications for the evolutionary origins of binding specificity in multi-protein families.
Affiliation(s)
- Alina P Sergeeva
- Department of Systems Biology, Columbia University, New York, NY, USA
- Phinikoula S Katsamba
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Filip Cosmanescu
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Joshua J Brewer
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Goran Ahlsen
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Seetha Mannepalli
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Lawrence Shapiro
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Barry Honig
- Department of Systems Biology, Columbia University, New York, NY, USA
- Zuckerman Mind, Brain and Behavior Institute, Columbia University, New York, NY, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA
- Department of Medicine, Columbia University, New York, NY, USA
17
Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity. Sci Rep 2020; 10:1691. [PMID: 32015389 PMCID: PMC6997377 DOI: 10.1038/s41598-019-55118-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/26/2019] [Accepted: 11/23/2019] [Indexed: 01/03/2023] Open
Abstract
Protein functional constraints are manifest as superfamily and functional-subgroup conserved residues, and as pairwise correlations. Deep Analysis of Residue Constraints (DARC) aids the visualization of these constraints, characterizes how they correlate with each other and with structure, and estimates statistical significance. This can identify determinants of protein functional specificity, as we illustrate for bacterial DNA clamp loader ATPases. These load ring-shaped sliding clamps onto DNA to keep polymerase attached during replication and contain one δ, three γ, and one δ’ AAA+ subunits semi-circularly arranged in the order δ-γ1-γ2-γ3-δ’. Only γ is active, though both γ and δ’ functionally influence an adjacent γ subunit. DARC identifies, as functionally-congruent features linking allosterically the ATP, DNA, and clamp binding sites: residues distinctive of γ and of γ/δ’ that mutually interact in trans, centered on the catalytic base; several γ/δ’-residues and six γ/δ’-covariant residue pairs within the DNA binding N-termini of helices α2 and α3; and γ/δ’-residues associated with the α2 C-terminus and the clamp-binding loop. Most notable is a trans-acting γ/δ’ hydroxyl group that 99% of other AAA+ proteins lack. Mutation of this hydroxyl to a methyl group impedes clamp binding and opening, DNA binding, and ATP hydrolysis—implying a remarkably clamp-loader-specific function.
18
Malinverni D, Barducci A. Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting. Entropy (Basel) 2020; 21:1127. [PMID: 32002010 PMCID: PMC6992422 DOI: 10.3390/e21111127] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/17/2019] [Accepted: 11/14/2019] [Indexed: 01/07/2023]
Abstract
Extracting structural information from sequence co-variation has become a common computational biology practice in recent years, mainly due to the availability of large sequence alignments of protein families. However, identifying features that are specific to sub-classes and not shared by all members of the family using sequence-based approaches has remained an elusive problem. We here present a coevolutionary-based method to differentially analyze subfamily specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have been previously shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data is relatively scarce. Furthermore, sequence reweighting allows assessing if individual structural contacts pertain to specific subfamilies and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
Affiliation(s)
- Duccio Malinverni
- Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge CB20QH, UK
- Alessandro Barducci
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
19
Alballa M, Aplop F, Butler G. TranCEP: Predicting the substrate class of transmembrane transport proteins using compositional, evolutionary, and positional information. PLoS One 2020; 15:e0227683. [PMID: 31935244 PMCID: PMC6959595 DOI: 10.1371/journal.pone.0227683] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/12/2019] [Accepted: 12/26/2019] [Indexed: 11/24/2022] Open
Abstract
Transporters mediate the movement of compounds across the membranes that separate the cell from its environment and across the inner membranes surrounding cellular compartments. It is estimated that one third of a proteome consists of membrane proteins, and many of these are transport proteins. Given the increase in the number of genomes being sequenced, there is a need for computational tools that predict the substrates that are transported by the transmembrane transport proteins. In this paper, we present TranCEP, a predictor of the type of substrate transported by a transmembrane transport protein. TranCEP combines the traditional use of the amino acid composition of the protein, with evolutionary information captured in a multiple sequence alignment (MSA), and restriction to important positions of the alignment that play a role in determining the specificity of the protein. Our experimental results show that TranCEP significantly outperforms the state-of-the-art predictors. The results quantify the contribution made by each type of information used.
Affiliation(s)
- Munira Alballa
- Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
- College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Faizah Aplop
- School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, Malaysia
- Gregory Butler
- Department of Computer Science and Software Engineering, Concordia University, Montréal, Québec, Canada
- Centre for Structural and Functional Genomics, Concordia University, Montréal, Québec, Canada
20
Karasev D, Sobolev B, Lagunin A, Filimonov D, Poroikov V. Prediction of Protein-Ligand Interaction Based on the Positional Similarity Scores Derived from Amino Acid Sequences. Int J Mol Sci 2019; 21:ijms21010024. [PMID: 31861473 PMCID: PMC6981593 DOI: 10.3390/ijms21010024] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/31/2019] [Revised: 12/13/2019] [Accepted: 12/16/2019] [Indexed: 12/14/2022] Open
Abstract
The affinity of different drug-like ligands to multiple protein targets reflects general chemical–biological interactions. Computational methods estimating such interactions analyze the available information about the structure of the targets, ligands, or both. Prediction of protein–ligand interactions based on pairwise sequence alignment provides reasonable accuracy if the ligands’ specificity well coincides with the phylogenic taxonomy of the proteins. Methods using multiple alignment require an accurate match of functionally significant residues. Such conditions may not be met in the case of diverged protein families. To overcome these limitations, we propose an approach based on the analysis of local sequence similarity within the set of analyzed proteins. The positional scores, calculated by sequence fragment comparisons, are used as input data for the Bayesian classifier. Our approach provides a prediction accuracy comparable or exceeding those of other methods. It was demonstrated on the popular Gold Standard test sets, presenting different sequence heterogeneity and varying from the group, including different protein families to the more specific groups. A reasonable prediction accuracy was also found for protein kinases, displaying weak relationships between sequence phylogeny and inhibitor specificity. Thus, our method can be applied to the broad area of protein–ligand interactions.
Affiliation(s)
- Dmitry Karasev
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Boris Sobolev
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Alexey Lagunin
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Department of Bioinformatics, Russian National Research Medical University, Moscow 117997, Russia
- Dmitry Filimonov
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
- Vladimir Poroikov
- Department of Bioinformatics, Institute of Biomedical Chemistry, Moscow 119121, Russia
21
Molecular mechanisms of the protein-protein interaction-regulated binding specificity of basic-region leucine zipper transcription factors. J Mol Model 2019; 25:246. [PMID: 31342181 DOI: 10.1007/s00894-019-4138-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/08/2019] [Accepted: 07/14/2019] [Indexed: 10/26/2022]
Abstract
It is well known that the DNA-binding specificity of transcription factors (TFs) is influenced by protein-protein interactions (PPIs). However, the underlying molecular mechanisms remain largely unknown. In this work, we adopted the cAMP-response element-binding protein (CREB) of the basic leucine zipper (bZIP) TF family as a model system, and a workflow of combined bioinformatics and molecular modeling analysis of protein-DNA interaction was tested. First, the multiple sequence alignment and SDPsite method were used to find potential bZIP family binding specificity determining positions (SDPs) within the protein-protein interaction region. Second, the mutation system was analyzed using molecular dynamics simulation. Molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) free energy calculations confirmed the enhancement of the binding affinity of the mutation, which was in agreement with experimental results. The root mean square fluctuation (RMSF) and hydrogen bonding changes suggested an open and close protein dimerization process after the system was mutated, which resulted in the change of the hydrogen bonding of the protein-DNA interface and a slight conformational change. We believe that this work will contribute to understanding the protein-protein interaction-regulated binding specificity of bZIP transcription factors.
22
Bradley D, Beltrao P. Evolution of protein kinase substrate recognition at the active site. PLoS Biol 2019; 17:e3000341. [PMID: 31233486 PMCID: PMC6611643 DOI: 10.1371/journal.pbio.3000341] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 01/23/2019] [Revised: 07/05/2019] [Accepted: 06/12/2019] [Indexed: 02/05/2023] Open
Abstract
Protein kinases catalyse the phosphorylation of target proteins, controlling most cellular processes. The specificity of serine/threonine kinases is partly determined by interactions with a few residues near the phospho-acceptor residue, forming the so-called kinase-substrate motif. Kinases have been extensively duplicated throughout evolution, but little is known about when in time new target motifs have arisen. Here, we show that sequence variation occurring early in the evolution of kinases is dominated by changes in specificity-determining residues. We then analysed kinase specificity models, based on known target sites, observing that specificity has remained mostly unchanged for recent kinase duplications. Finally, analysis of phosphorylation data from a taxonomically broad set of 48 eukaryotic species indicates that most phosphorylation motifs are broadly distributed in eukaryotes but are not present in prokaryotes. Overall, our results suggest that the set of eukaryotes kinase motifs present today was acquired around the time of the eukaryotic last common ancestor and that early expansions of the protein kinase fold rapidly explored the space of possible target motifs.
Affiliation(s)
- David Bradley
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
23
Gil N, Fiser A. The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics 2019; 35:12-19. [PMID: 29947739 PMCID: PMC6298051 DOI: 10.1093/bioinformatics/bty523] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/23/2018] [Revised: 04/20/2018] [Accepted: 06/26/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation: The analysis of sequence conservation patterns has been widely utilized to identify functionally important (catalytic and ligand-binding) protein residues for over a half-century. Despite decades of development, on average state-of-the-art non-template-based functional residue prediction methods must predict ∼25% of a protein's total residues to correctly identify half of the protein's functional site residues. The overwhelming proportion of false positives results in reported 'F-Scores' of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs). Results: The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-Scores of ∼0.3, a performance matching that reported by state-of-the-art methods, which often consider additional features for the prediction in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-Scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the unused potential of optimally composed MSAs for conservation analysis. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nelson Gil
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
- Andras Fiser
- Department of Systems & Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
24
Pazos F, Garcia-Moreno A, Oliveros JC. Automatic detection of genomic regions with informative epigenetic patterns. BMC Genomics 2018; 19:847. [PMID: 30486775 PMCID: PMC6264639 DOI: 10.1186/s12864-018-5286-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/13/2018] [Accepted: 11/20/2018] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Epigenetic phenomena are crucial for explaining the phenotypic plasticity seen in the cells of different tissues, developmental stages and diseases, all holding the same DNA sequence. As technology is allowing to retrieve epigenetic information in a genome-wide fashion, massive epigenomic datasets are being accumulated in public repositories. New approaches are required to mine those data to extract useful knowledge. We present here an automatic approach for detecting genomic regions with epigenetic variation patterns across samples related to a grouping of these samples, as a way of detecting regions functionally associated to the phenomenon behind the classification. RESULTS: We show that the regions automatically detected by the method in the whole human genome associated to three different classifications of a set of epigenomes (cancer vs. healthy, brain vs. other organs, and fetal vs. adult tissues) are enriched in genes associated to these processes. CONCLUSIONS: The method is fully automatic and can exhaustively scan the whole human genome at any resolution using large collections of epigenomes as input, although it also produces good results with small datasets. Consequently, it will be valuable for obtaining functional information from the incoming epigenomic information as it continues to accumulate.
Affiliation(s)
- Florencio Pazos
- National Center for Biotechnology (CNB-CSIC), c/ Darwin, 3, 28049 Madrid, Spain
- Juan C. Oliveros
- National Center for Biotechnology (CNB-CSIC), c/ Darwin, 3, 28049 Madrid, Spain
25
da Fonseca NJ, Afonso MQL, de Oliveira LC, Bleicher L. A new method bridging graph theory and residue co-evolutionary networks for specificity determinant positions detection. Bioinformatics 2018; 35:1478-1485. [DOI: 10.1093/bioinformatics/bty846] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/19/2018] [Revised: 09/11/2018] [Accepted: 10/04/2018] [Indexed: 12/22/2022] Open
Affiliation(s)
- Néli José da Fonseca
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Pampulha, Belo Horizonte – MG, Brazil
- Marcelo Querino Lima Afonso
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Pampulha, Belo Horizonte – MG, Brazil
- Lucas Carrijo de Oliveira
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Pampulha, Belo Horizonte – MG, Brazil
- Lucas Bleicher
- Departamento de Bioquímica e Imunologia, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Pampulha, Belo Horizonte – MG, Brazil
26
Kress A, Lecompte O, Poch O, Thompson JD. PROBE: analysis and visualization of protein block-level evolution. Bioinformatics 2018; 34:3390-3392. [DOI: 10.1093/bioinformatics/bty367] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/24/2017] [Accepted: 05/04/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Arnaud Kress
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Odile Lecompte
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Olivier Poch
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
- Julie D Thompson
- Department of Computer Science, ICube, UMR 7357, University of Strasbourg, CNRS, Fédération de Médecine Translationnelle de Strasbourg, Strasbourg, France
27
Fonseca-Júnior NJ, Afonso MQ, Oliveira LC, Bleicher L. PFstats: A Network-Based Open Tool for Protein Family Analysis. J Comput Biol 2018; 25:480-486. [DOI: 10.1089/cmb.2017.0181] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open
Affiliation(s)
- Néli J. Fonseca-Júnior
- Departamento de Bioquimica e Imunologia, Instituto de Ciências Biologicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Marcelo Q.L. Afonso
- Departamento de Bioquimica e Imunologia, Instituto de Ciências Biologicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Lucas C. Oliveira
- Departamento de Bioquimica e Imunologia, Instituto de Ciências Biologicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
- Lucas Bleicher
- Departamento de Bioquimica e Imunologia, Instituto de Ciências Biologicas (ICB), Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Brazil
28
Lajkó DB, Valkai I, Domoki M, Ménesi D, Ferenc G, Ayaydin F, Fehér A. In silico identification and experimental validation of amino acid motifs required for the Rho-of-plants GTPase-mediated activation of receptor-like cytoplasmic kinases. Plant Cell Reports 2018; 37:627-639. [PMID: 29340786 DOI: 10.1007/s00299-018-2256-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/05/2017] [Accepted: 01/08/2018] [Indexed: 06/07/2023]
Abstract
Several amino acid motifs required for Rop-dependent activity were found to form a common surface on RLCKVI_A kinases. This indicates a unique mechanism for Rho-type GTPase-mediated kinase activation in plants. Rho-of-plants (Rop) G-proteins are implicated in the regulation of various cellular processes, including cell growth, cell polarity, hormonal and pathogen responses. Our knowledge about the signalling pathways downstream of Rops is continuously increasing. However, there are still substantial gaps in this knowledge. One reason for this is that these pathways are considerably different from those described for yeast and/or animal Rho-type GTPases. Among others, plants lack all Rho/Rac/Cdc42-activated kinase families. Only a small group of plant-specific receptor-like cytoplasmic kinases (RLCK VI_A) has been shown to exhibit Rop-binding-dependent in vitro activity. These kinases do not carry any known GTPase-binding motifs. Based on the sequence comparison of the Rop-activated RLCK VI_A and the closely related but constitutively active RLCK VI_B kinases, several distinguishing amino acid residues/motifs were identified. All but one of these were found to be required for the Rop-mediated regulation of the in vitro activity of two RLCK VI_A kinases. Structural modelling indicated that these motifs might form a common Rop-binding surface. Based on in silico data mining, kinases that have the identified Rop-binding motifs are present in Embryophyta but not in unicellular green algae. It can, therefore, be supposed that Rops recruited these plant-specific kinases for signalling at an early stage of land plant evolution.
Affiliation(s)
- Dézi Bianka Lajkó
- Biological Research Centre, Institute of Plant Biology, Hungarian Academy of Sciences, Temesvári krt. 62, P.O. Box 521, Szeged, 6701, Hungary
| | - Ildikó Valkai
- Biological Research Centre, Institute of Plant Biology, Hungarian Academy of Sciences, Temesvári krt. 62, P.O. Box 521, Szeged, 6701, Hungary
| | - Mónika Domoki
- Biological Research Centre, Institute of Plant Biology, Hungarian Academy of Sciences, Temesvári krt. 62, P.O. Box 521, Szeged, 6701, Hungary
| | - Dalma Ménesi
- Biological Research Centre, Institute of Plant Biology, Hungarian Academy of Sciences, Temesvári krt. 62, P.O. Box 521, Szeged, 6701, Hungary
| | - Györgyi Ferenc
- Biological Research Centre, Institute of Plant Biology, Hungarian Academy of Sciences, Temesvári krt. 62, P.O. Box 521, Szeged, 6701, Hungary
| | - Ferhan Ayaydin
- Biological Research Centre, Institute of Plant Biology, Hungarian Academy of Sciences, Temesvári krt. 62, P.O. Box 521, Szeged, 6701, Hungary
| | - Attila Fehér
- Biological Research Centre, Institute of Plant Biology, Hungarian Academy of Sciences, Temesvári krt. 62, P.O. Box 521, Szeged, 6701, Hungary.
- Department of Plant Biology, University of Szeged, Közép fasor 52, Szeged, 6726, Hungary.
| |
|
29
|
Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018; 19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as the only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. RESULTS In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do so by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. CONCLUSIONS These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.
Affiliation(s)
- Diego Garrido-Martín
- Present address: Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, c/ Dr. Aiguader, 88, 08003, Barcelona, Spain; Present address: Universitat Pompeu Fabra (UPF), Plaça de la Mercè, 10-12, 08002, Barcelona, Spain
| | - Florencio Pazos
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), c/ Darwin, 3, 28049, Madrid, Spain.
| |
|
30
|
Brown T, Brown N, Stollar EJ. Most yeast SH3 domains bind peptide targets with high intrinsic specificity. PLoS One 2018; 13:e0193128. [PMID: 29470497 PMCID: PMC5823434 DOI: 10.1371/journal.pone.0193128] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/10/2017] [Accepted: 02/04/2018] [Indexed: 01/07/2023] Open
Abstract
A need exists to develop bioinformatics methods for predicting differences in protein function, especially for members of a domain family that share a common fold yet are found in a diverse array of proteins. Many domain families have been conserved over large evolutionary spans, and representative genomic data from these periods are now available. This allows a simple method for grouping domain sequences to reveal common and unique/specific binding residues. As such, we hypothesize that sequence alignment analysis of the yeast SH3 domain family across ancestral species in the fungal kingdom can determine whether each member encodes specific information to bind unique peptide targets. With this approach, we identify important specific residues for a given domain as those that show little conservation within an alignment of yeast domain family members (paralogs) but are conserved in an alignment of its direct relatives (orthologs). We find that most of the yeast SH3 domain family members have maintained unique amino acid conservation patterns that suggest they bind peptide targets with high intrinsic specificity through varying degrees of non-canonical recognition. For a minority of domains, we predict a less diverse binding surface, likely requiring additional factors to bind targets specifically. We observe that our predictions are consistent with high-throughput binding data, which suggests our approach can probe intrinsic binding specificity in any other interaction domain family that is maintained during evolution.
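The selection rule described in this abstract (low conservation among paralogs, high conservation among orthologs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the majority-residue conservation score, the `high` and `low` thresholds, and the toy sequences are all assumptions made for illustration, and both alignments are assumed to share the same column coordinates.

```python
from collections import Counter


def conservation(column):
    """Fraction of non-gap residues that match the most common residue."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(residues)


def specific_positions(orthologs, paralogs, high=0.8, low=0.5):
    """Return 0-based columns conserved among orthologs but variable among paralogs.

    The thresholds `high` and `low` are hypothetical cut-offs, not values
    taken from the cited paper.
    """
    hits = []
    for i in range(len(orthologs[0])):
        orth_cons = conservation([seq[i] for seq in orthologs])
        para_cons = conservation([seq[i] for seq in paralogs])
        if orth_cons >= high and para_cons <= low:
            hits.append(i)
    return hits


# Toy, made-up aligned sequences (same alignment frame in both sets).
orthologs = ["ALYDW", "ALYDW", "ALYDF", "ALYDW"]
paralogs = ["ALYDW", "AVFEW", "AIHKW", "ALSRW"]
print(specific_positions(orthologs, paralogs))  # [1, 2, 3]
```

In the toy data, column 0 is conserved in both sets (a family-wide position), column 4 varies among orthologs, and only columns 1 to 3 meet both criteria. A real analysis would likely use an entropy-based or substitution-aware conservation score rather than the simple majority fraction used here.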
Affiliation(s)
- Tom Brown
- Math and Computer Science Department, Eastern New Mexico University, Portales, NM, United States of America
| | - Nick Brown
- Portales High School, Portales, NM, United States of America
| | - Elliott J. Stollar
- Physical Sciences Department, Eastern New Mexico University, Portales, NM, United States of America
| |
|
31
|
Neuwald AF, Aravind L, Altschul SF. Inferring joint sequence-structural determinants of protein functional specificity. eLife 2018; 7. [PMID: 29336305 PMCID: PMC5770160 DOI: 10.7554/elife.29880] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/23/2017] [Accepted: 12/22/2017] [Indexed: 01/05/2023] Open
Abstract
Residues responsible for allostery, cooperativity, and other subtle but functionally important interactions remain difficult to detect. To aid such detection, we employ statistical inference based on the assumption that residues distinguishing a protein subgroup from evolutionarily divergent subgroups often constitute an interacting functional network. We identify such networks with the aid of two measures of statistical significance. One measure aids identification of divergent subgroups based on distinguishing residue patterns. For each subgroup, a second measure identifies structural interactions involving pattern residues. Such interactions are derived either from atomic coordinates or from Direct Coupling Analysis scores, used as surrogates for structural distances. Applying this approach to N-acetyltransferases, P-loop GTPases, RNA helicases, synaptojanin-superfamily phosphatases and nucleases, and thymine/uracil DNA glycosylases yielded results congruent with biochemical understanding of these proteins, and also revealed striking sequence-structural features overlooked by other methods. These and similar analyses can aid the design of drugs targeting allosteric sites.
Affiliation(s)
- Andrew F Neuwald
- Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, United States; Department of Biochemistry and Molecular Biology, University of Maryland School of Medicine, Baltimore, United States
| | - L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
| | - Stephen F Altschul
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, United States
| |
|
32
|
Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017; 31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
| | - Sara Guirao-Rico
- Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Bellaterra, Spain
| | - Silvia Hinojosa-Alvarez
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
| | - Julio Rozas
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
| |
|
33
|
Effective estimation of the minimum number of amino acid residues required for functional divergence between duplicate genes. Mol Phylogenet Evol 2017; 113:126-138. [DOI: 10.1016/j.ympev.2017.05.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/14/2017] [Revised: 03/19/2017] [Accepted: 05/10/2017] [Indexed: 01/10/2023]
|
34
|
Swint-Kruse L. Using Evolution to Guide Protein Engineering: The Devil IS in the Details. Biophys J 2016; 111:10-8. [PMID: 27410729 DOI: 10.1016/j.bpj.2016.05.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/10/2015] [Revised: 04/18/2016] [Accepted: 05/20/2016] [Indexed: 10/21/2022] Open
Abstract
For decades, protein engineers have endeavored to reengineer existing proteins for novel applications. Overall, protein folds and gross functions can be readily transferred from one protein to another by transplanting large blocks of sequence (i.e., domain recombination). However, predictably fine-tuning function (e.g., by adjusting ligand affinity, specificity, catalysis, and/or allosteric regulation) remains a challenge. One approach has been to use the sequences of protein families to identify amino acid positions that change during the evolution of functional variation. The rationale is that these nonconserved positions could be mutated to predictably fine-tune function. Evolutionary approaches to protein design have had some success, but the engineered proteins seldom replicate the functional performances of natural proteins. This Biophysical Perspective reviews several complexities that have been revealed by evolutionary and experimental studies of protein function. These include 1) challenges in defining computational and biological thresholds that define important amino acids; 2) the co-occurrence of many different patterns of amino acid changes in evolutionary data; 3) difficulties in mapping the patterns of amino acid changes to discrete functional parameters; 4) the nonconventional mutational outcomes that occur for a particular group of functionally important, nonconserved positions; 5) epistasis (nonadditivity) among multiple mutations; and 6) the fact that a large fraction of a protein's amino acids contribute to its overall function. To overcome these challenges, new goals are identified for future studies.
Affiliation(s)
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas.
| |
|
35
|
Medvedev KE, Kolchanov NA, Afonnikov DA. Identification of residues of the archaeal RNA-binding Nip7 proteins specific to environmental conditions. J Bioinform Comput Biol 2017; 15:1650036. [DOI: 10.1142/s0219720016500360] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Understanding the biological and molecular mechanisms that allow cells to survive under extreme temperatures and pressures will help to answer fundamental questions related to the origin of life and to design biotechnologically important enzymes with new properties. Here, we analyze amino acid sequences of the Nip7 proteins from 35 archaeal species to identify positions containing mutations specific to the hydrostatic pressure and temperature of the organism’s habitat. The number of such positions related to pressure change is much lower than the number related to temperature change. The results suggest that adaptation of the Nip7 protein to temperature changes causes more pronounced modifications in sequence and structure than adaptation to pressure changes. Structural analysis of residues at these positions demonstrated their involvement in salt-bridge formation, which may reflect the importance of protein structure stabilization by salt bridges under extreme environmental conditions.
Affiliation(s)
- Kirill E. Medvedev
- Department of Biophysics, University of Texas Southwestern, Medical Center, Dallas, Texas 75390, USA
- Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
| | - Nikolay A. Kolchanov
- Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
- NRC Kurchatov Institute, Akademika Kurchatova pl., 1, Moscow 123182, Russia
- Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
| | - Dmitry A. Afonnikov
- Institute of Cytology and Genetics Siberian Branch of the Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
- Novosibirsk State University, Pirogova str. 2, Novosibirsk 630090, Russia
| |
|
36
|
Neuwald AF, Altschul SF. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations. PLoS Comput Biol 2016; 12:e1005294. [PMID: 28002465 PMCID: PMC5225019 DOI: 10.1371/journal.pcbi.1005294] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/27/2016] [Revised: 01/10/2017] [Accepted: 12/08/2016] [Indexed: 11/25/2022] Open
Abstract
Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).
Protein sequence data, when gathered in great quantity, contain important but implicit biological information manifest as statistical correlations. Here we describe an approach to access this information by comprehensively modeling and characterizing the distribution of sequences belonging to a major protein superfamily. This approach takes as input a large set of unaligned sequences belonging to the superfamily. By applying the minimum description length principle, it seeks the statistical model that best explains the sequences while avoiding over-fitting the data. It concurrently aligns the sequences and, to model evolutionary divergence, partitions them into subgroups that are hierarchically-arranged based upon correlated residue patterns. Auxiliary routines create PyMOL scripts to visualize the locations of correlated residues within available structures. Because these correlations likely arise from structural and biochemical constraints, they can help elucidate protein properties important for functional specificity. Comparing and contrasting sequence and structural features in this way may therefore suggest, in the light of published studies, plausible biological hypotheses for experimental investigation. We illustrate this approach with N-acetyltransferases.
Affiliation(s)
- Andrew F. Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, BioPark II, Room 617, Baltimore, MD, United States of America
| | - Stephen F. Altschul
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States of America
| |
|
37
|
Blagus R, Goeman JJ. What (not) to expect when classifying rare events. Brief Bioinform 2016; 19:341-349. [DOI: 10.1093/bib/bbw107] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/19/2016] [Indexed: 01/23/2023] Open
Affiliation(s)
- Rok Blagus
- Univerza v Ljubljani Medicinska Fakulteta, Institute for Biostatistics and Medical Informatics, Ljubljana, Slovenia
| | - Jelle J Goeman
- Leiden University Medical Center, Department of Medical Statistics and Bioinformatics, Leiden, The Netherlands
| |
|
38
|
A Bioinformatics Analysis Reveals a Group of MocR Bacterial Transcriptional Regulators Linked to a Family of Genes Coding for Membrane Proteins. Biochem Res Int 2016; 2016:4360285. [PMID: 27446613 PMCID: PMC4944035 DOI: 10.1155/2016/4360285] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/27/2015] [Accepted: 05/26/2016] [Indexed: 01/30/2023] Open
Abstract
The MocR bacterial transcriptional regulators are characterized by an N-terminal domain, 60 residues long on average, possessing the winged-helix-turn-helix (wHTH) architecture responsible for DNA recognition and binding, linked to a large C-terminal domain (350 residues on average) that is homologous to fold type-I pyridoxal 5′-phosphate (PLP) dependent enzymes like aspartate aminotransferase (AAT). These regulators are involved in the expression of genes taking part in several metabolic pathways directly or indirectly connected to PLP chemistry, many of which are still uncharacterized. Here we report a bioinformatics analysis of the features of a distinct group of MocR regulators predicted to be functionally linked to a family of homologous genes coding for integral membrane proteins of unknown function. This group occurs mainly in the Actinobacteria and Gammaproteobacteria phyla. An analysis of the multiple sequence alignments of their wHTH and AAT domains suggested the presence of specificity-determining positions (SDPs). Mapping of SDPs onto a homology model of the AAT domain hinted at possible structural/functional roles in effector recognition. Likewise, SDPs in the wHTH domain suggested the basis of specificity of Transcription Factor Binding Site recognition. The results reported represent a framework for rational design of experiments and for bioinformatics analysis of other MocR subgroups.
|
39
|
Schwarz RF, Tamuri AU, Kultys M, King J, Godwin J, Florescu AM, Schultz J, Goldman N. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments. Nucleic Acids Res 2016; 44:e77. [PMID: 26819408 PMCID: PMC4856975 DOI: 10.1093/nar/gkw022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/03/2015] [Accepted: 01/08/2016] [Indexed: 12/19/2022] Open
Abstract
Sequence Logos and their variants are the most commonly used methods for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles).
Affiliation(s)
- Roland F Schwarz
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Asif U Tamuri
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Marek Kultys
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - James King
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - James Godwin
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - Ana M Florescu
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - Jörg Schultz
- Center for Computational and Theoretical Biology and Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany
| | - Nick Goldman
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| |
|
40
|
Das S, Dawson NL, Orengo CA. Diversity in protein domain superfamilies. Curr Opin Genet Dev 2015; 35:40-9. [PMID: 26451979 PMCID: PMC4686048 DOI: 10.1016/j.gde.2015.09.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/09/2015] [Revised: 09/07/2015] [Accepted: 09/08/2015] [Indexed: 01/25/2023]
Abstract
Whilst ∼93% of domain superfamilies appear to be relatively structurally and functionally conserved based on the available data from the CATH-Gene3D domain classification resource, the remainder are much more diverse. In this review, we consider how domains in some of the most ubiquitous and promiscuous superfamilies have evolved, in particular the plasticity in their functional sites and surfaces which expands the repertoire of molecules they interact with and actions performed on them. To what extent can we identify a core function for these superfamilies which would allow us to develop a ‘domain grammar of function’ whereby a protein's biological role can be proposed from its constituent domains? Clearly the first step is to understand the extent to which these components vary and how changes in their molecular make-up modify function.
Affiliation(s)
- Sayoni Das
- Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
| | - Natalie L Dawson
- Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, UCL, 627 Darwin Building, Gower Street, WC1E 6BT, UK.
| |
|
41
|
Chagoyen M, García-Martín JA, Pazos F. Practical analysis of specificity-determining residues in protein families. Brief Bioinform 2015; 17:255-61. [DOI: 10.1093/bib/bbv045] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/01/2015] [Accepted: 06/15/2015] [Indexed: 12/17/2022] Open
|
42
|
Das S, Lee D, Sillitoe I, Dawson NL, Lees JG, Orengo CA. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics 2015; 31:3460-7. [PMID: 26139634 PMCID: PMC4612221 DOI: 10.1093/bioinformatics/btv398] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 05/02/2015] [Accepted: 06/24/2015] [Indexed: 11/18/2022] Open
Abstract
Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap, especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer. Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110 439 FunFams in 2735 superfamilies which can be used to functionally annotate >16 million domain sequences. Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam. Contact: sayoni.das.12@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sayoni Das
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
| | - David Lee
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
| | - Natalie L Dawson
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
| | - Jonathan G Lees
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
| | - Christine A Orengo
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, WC1E 6BT, UK
| |
|
43
|
Tiwari P, Singh N, Dixit A, Choudhury D. Multivariate sequence analysis reveals additional function impacting residues in the SDR superfamily. Proteins 2014; 82:2842-56. [DOI: 10.1002/prot.24647] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/05/2014] [Revised: 06/19/2014] [Accepted: 07/15/2014] [Indexed: 11/08/2022]
Affiliation(s)
- Pratibha Tiwari
- School of Biotechnology, Jawaharlal Nehru University; New Delhi 110 067 India
| | - Noopur Singh
- School of Biotechnology, Jawaharlal Nehru University; New Delhi 110 067 India
| | - Aparna Dixit
- School of Biotechnology, Jawaharlal Nehru University; New Delhi 110 067 India
| | - Devapriya Choudhury
- School of Biotechnology, Jawaharlal Nehru University; New Delhi 110 067 India
| |
|