Reference Citation Analysis


Cited by Other Article(s)

Hernández Berthet AS, Aptekmann AA, Tejero J, Sánchez IE, Noguera ME, Roman EA. Associating protein sequence positions with the modulation of quantitative phenotypes. Arch Biochem Biophys 2024;755:109979. [PMID: 38583654 DOI: 10.1016/j.abb.2024.109979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 03/11/2024] [Accepted: 03/27/2024] [Indexed: 04/09/2024]

Abstract

Although protein sequences encode the information for folding and function, understanding the link between them is not an easy task. Unfortunately, predicting how specific amino acids contribute to these features remains difficult. Here, we developed a simple algorithm that finds positions in a protein sequence with the potential to modulate the studied quantitative phenotypes. Starting from a few hundred protein sequences, we perform multiple sequence alignments, obtain the per-position pairwise differences for both the sequences and the observed phenotypes, and calculate the correlation between these two quantities. We tested our methodology on four cases: archaeal adenylate kinases and the organisms' optimal growth temperatures, microbial rhodopsins and their maximal absorption wavelengths, mammalian myoglobins and their muscular concentration, and inhibition of HIV protease clinical isolates by two different molecules. We found from 3 to 10 positions tightly associated with those phenotypes, depending on the case studied. We showed that these correlations appear when using individual positions, but an improvement is achieved when the most correlated positions are analyzed jointly. Notably, we performed phenotype predictions using a simple linear model that links per-position divergences and differences in the observed phenotypes. Predictions are comparable to those of state-of-the-art methodologies which, in most cases, are far more complex. All of the calculations are obtained at a very low information cost, since the only input needed is a multiple sequence alignment of protein sequences with their associated quantitative phenotypes. The diversity of the explored systems makes our work a valuable tool to find sequence determinants of biological activity modulation and to predict various functional features for uncharacterized members of a protein family.
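The core idea of the abstract above (correlating per-position pairwise sequence differences with pairwise phenotype differences) can be illustrated with a minimal sketch. This is not the authors' implementation: the 0/1 mismatch distance, the absolute phenotype difference, the use of Pearson correlation, and the names `pearson` and `position_scores` are all assumptions made for illustration, and the toy alignment is invented.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a fully conserved column carries no signal
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def position_scores(alignment, phenotypes):
    """Score each alignment column by how well pairwise residue differences
    track pairwise phenotype differences (assumed 0/1 mismatch distance)."""
    pairs = list(combinations(range(len(alignment)), 2))
    pheno_diff = [abs(phenotypes[i] - phenotypes[j]) for i, j in pairs]
    scores = []
    for pos in range(len(alignment[0])):
        seq_diff = [
            0.0 if alignment[i][pos] == alignment[j][pos] else 1.0
            for i, j in pairs
        ]
        scores.append(pearson(seq_diff, pheno_diff))
    return scores


# Invented toy data: four aligned sequences and one phenotype value each.
toy_alignment = ["ACA", "ACG", "ATA", "ATG"]
toy_phenotypes = [10.0, 11.0, 30.0, 31.0]

scores = position_scores(toy_alignment, toy_phenotypes)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)
```

In the toy data only the middle column separates the low-phenotype sequences from the high-phenotype ones, so it receives the highest score. A real analysis would likely use a substitution-aware distance and handle gaps explicitly.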

Affiliation(s)

Ayelén S Hernández Berthet: Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina.
Ariel A Aptekmann: Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina; Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ, 08873, USA; Institute of Marine and Coastal Sciences, Rutgers University, New Brunswick, NJ, 08901, USA.
Jesús Tejero: Heart, Lung, Blood and Vascular Medicine Institute, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Division of Pulmonary, Allergy and Critical Care Medicine, University of Pittsburgh, Pittsburgh, PA, 15261, USA; Department of Bioengineering, Swanson School of Engineering, University of Pittsburgh, Pittsburgh, PA, 15260, USA; Department of Pharmacology and Chemical Biology, University of Pittsburgh, Pittsburgh, PA, 15261, USA.
Ignacio E Sánchez: Universidad de Buenos Aires, Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Facultad de Ciencias Exactas y Naturales, Laboratorio de Fisiología de Proteínas, Buenos Aires, Argentina.
Martín E Noguera: Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina; Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Saenz Peña 352, B1876BXD, Bernal, Argentina.
Ernesto A Roman: Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Intendente Güiraldes 2160 - Ciudad Universitaria, 1428EGA, C.A.B.A., Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Química y Fisicoquímica Biológicas Dr. Alejandro Paladini, Junín 956, 1113AAD, C.A.B.A., Argentina.

Jin R, He B, Qin Y, Du Z, Cao C, Li J. Unveiling the role of bZIP transcription factors CREB and CEBP in detoxification metabolism of Nilaparvata lugens (Stål). Int J Biol Macromol 2023;253:126576. [PMID: 37648128 DOI: 10.1016/j.ijbiomac.2023.126576] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2023] [Revised: 08/24/2023] [Accepted: 08/26/2023] [Indexed: 09/01/2023]

Bhadola P, Deo N. Exploring complexity of class-A Beta-lactamase family using physiochemical-based multiplex networks. Sci Rep 2023;13:20626. [PMID: 37996629 PMCID: PMC10667273 DOI: 10.1038/s41598-023-48128-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2023] [Accepted: 11/22/2023] [Indexed: 11/25/2023] Open

Szatkownik A, Zea DJ, Richard H, Laine E. Building alternative splicing and evolution-aware sequence-structure maps for protein repeats. J Struct Biol 2023;215:107997. [PMID: 37453591 DOI: 10.1016/j.jsb.2023.107997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/15/2023] [Accepted: 07/05/2023] [Indexed: 07/18/2023]

Pascarelli S, Laurino P. Inter-paralog amino acid inversion events in large phylogenies of duplicated proteins. PLoS Comput Biol 2022;18:e1010016. [PMID: 35377869 PMCID: PMC9009777 DOI: 10.1371/journal.pcbi.1010016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/23/2022] [Revised: 04/14/2022] [Accepted: 03/12/2022] [Indexed: 11/25/2022] Open

Abstract

Connecting protein sequence to function is becoming increasingly relevant since high-throughput sequencing studies accumulate large amounts of genomic data. In order to go beyond the existing database annotation, it is fundamental to understand the mechanisms underlying functional inheritance and divergence. If the homology relationship between proteins is known, can we determine whether the function diverged? In this work, we analyze different possibilities of protein sequence evolution after gene duplication and identify “inter-paralog inversions”, i.e., sites where the relationship between the ancestry and the functional signal is decoupled. The amino acids in these sites are masked from being recognized by other prediction tools. Still, they play a role in functional divergence and could indicate a shift in protein function. We develop a method to specifically recognize inter-paralog amino acid inversions in a phylogeny and test it on real and simulated datasets. In a dataset built from the Epidermal Growth Factor Receptor (EGFR) sequences found in 88 fish species, we identify 19 amino acid sites that went through inversion after gene duplication, mostly located at the ligand-binding extracellular domain. Our work uncovers an outcome of protein duplications with direct implications in protein functional annotation and sequence evolution. The developed method is optimized to work with large protein datasets and can be readily included in a targeted protein analysis pipeline.

Proteins are critical components of living systems because they facilitate most biological processes, such as protein synthesis, DNA replication, and chemical catalysis. Proteins are encoded in their genes. During evolution, genes accumulate mutations that get translated at the protein level. These mutations can be “neutral” if they do not affect the protein function immediately and directly; otherwise, mutations can be functional if they directly modify protein function. An event that provides an opportunity to study protein function is gene duplication, namely when two copies of a gene encoding the same protein appear. One copy of the protein often retains the same function while the other is free to diverge and specialize to a different function. This work sheds light on an alternative outcome of gene duplication that might be critical to discern between neutral and functional mutations. By looking at 88 fish genomes, we found proteins in which the evolution of their sequences does not follow the expected pattern of divergence after gene duplication. In this case, the protein sequence of a subgroup of species diverges in the copy expected to retain its function, while the sequence is retained in the copy expected to diverge. We called this event “inter-paralog amino acid inversion”. Our data show that this “inversion” event is correlated with function, and its detection has to be considered for assigning protein functions correctly.

Pazos F. Computational prediction of protein functional sites: Applications in biotechnology and biomedicine. Adv Protein Chem Struct Biol 2022;130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]

Pazos F. Prediction of Protein Sites and Physicochemical Properties Related to Functional Specificity. Bioengineering (Basel) 2021;8:bioengineering8120201. [PMID: 34940354 PMCID: PMC8698372 DOI: 10.3390/bioengineering8120201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 11/16/2022] Open

Zea DJ, Laskina S, Baudin A, Richard H, Laine E. Assessing conservation of alternative splicing with evolutionary splicing graphs. Genome Res 2021;31:1462-1473. [PMID: 34266979 PMCID: PMC8327911 DOI: 10.1101/gr.274696.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/27/2020] [Accepted: 06/11/2021] [Indexed: 12/29/2022]

Rauer C, Sen N, Waman VP, Abbasian M, Orengo CA. Computational approaches to predict protein functional families and functional sites. Curr Opin Struct Biol 2021;70:108-122. [PMID: 34225010 DOI: 10.1016/j.sbi.2021.05.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/29/2021] [Revised: 05/13/2021] [Accepted: 05/25/2021] [Indexed: 01/06/2023]

Karakulak T, Rifaioglu AS, Rodrigues JPGLM, Karaca E. Predicting the Specificity-Determining Positions of Receptor Tyrosine Kinase Axl. Front Mol Biosci 2021;8:658906. [PMID: 34195226 PMCID: PMC8236827 DOI: 10.3389/fmolb.2021.658906] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/26/2021] [Accepted: 04/20/2021] [Indexed: 11/22/2022] Open

Fonseca NJ, Afonso MQL, Carrijo L, Bleicher L. CONAN: a web application to detect specificity determinants and functional sites by amino acids co-variation network analysis. Bioinformatics 2021;37:1026-1028. [PMID: 32780795 DOI: 10.1093/bioinformatics/btaa713] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/29/2020] [Revised: 08/01/2020] [Accepted: 08/05/2020] [Indexed: 11/12/2022] Open

Pitarch B, Ranea JAG, Pazos F. Protein residues determining interaction specificity in paralogous families. Bioinformatics 2021;37:1076-1082. [PMID: 33135068 DOI: 10.1093/bioinformatics/btaa934] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/12/2020] [Revised: 10/06/2020] [Accepted: 10/22/2020] [Indexed: 02/06/2023] Open

Pontes C, Ruiz-Serra V, Lepore R, Valencia A. Unraveling the molecular basis of host cell receptor usage in SARS-CoV-2 and other human pathogenic β-CoVs. Comput Struct Biotechnol J 2021;19:759-766. [PMID: 33456724 PMCID: PMC7802526 DOI: 10.1016/j.csbj.2021.01.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/29/2020] [Revised: 01/07/2021] [Accepted: 01/07/2021] [Indexed: 01/13/2023] Open

Bradley D, Viéitez C, Rajeeve V, Selkrig J, Cutillas PR, Beltrao P. Sequence and Structure-Based Analysis of Specificity Determinants in Eukaryotic Protein Kinases. Cell Rep 2021;34:108602. [PMID: 33440154 PMCID: PMC7809594 DOI: 10.1016/j.celrep.2020.108602] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/20/2018] [Revised: 11/03/2020] [Accepted: 12/14/2020] [Indexed: 01/04/2023] Open

Mier P, Andrade-Navarro MA. MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments. Evol Bioinform Online 2020;16:1176934320916199. [PMID: 32425492 PMCID: PMC7218316 DOI: 10.1177/1176934320916199] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/10/2020] [Accepted: 03/10/2020] [Indexed: 11/17/2022] Open

Sergeeva AP, Katsamba PS, Cosmanescu F, Brewer JJ, Ahlsen G, Mannepalli S, Shapiro L, Honig B. DIP/Dpr interactions and the evolutionary design of specificity in protein families. Nat Commun 2020;11:2125. [PMID: 32358559 PMCID: PMC7195491 DOI: 10.1038/s41467-020-15981-8] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/07/2019] [Accepted: 04/06/2020] [Indexed: 01/10/2023] Open

Deep Analysis of Residue Constraints (DARC): identifying determinants of protein functional specificity. Sci Rep 2020;10:1691. [PMID: 32015389 PMCID: PMC6997377 DOI: 10.1038/s41598-019-55118-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 07/26/2019] [Accepted: 11/23/2019] [Indexed: 01/03/2023] Open

Malinverni D, Barducci A. Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting. Entropy (Basel) 2020;21:1127. [PMID: 32002010 PMCID: PMC6992422 DOI: 10.3390/e21111127] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/17/2019] [Accepted: 11/14/2019] [Indexed: 01/07/2023]

Alballa M, Aplop F, Butler G. TranCEP: Predicting the substrate class of transmembrane transport proteins using compositional, evolutionary, and positional information. PLoS One 2020;15:e0227683. [PMID: 31935244 PMCID: PMC6959595 DOI: 10.1371/journal.pone.0227683] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/12/2019] [Accepted: 12/26/2019] [Indexed: 11/24/2022] Open

Karasev D, Sobolev B, Lagunin A, Filimonov D, Poroikov V. Prediction of Protein-Ligand Interaction Based on the Positional Similarity Scores Derived from Amino Acid Sequences. Int J Mol Sci 2019;21:ijms21010024. [PMID: 31861473 PMCID: PMC6981593 DOI: 10.3390/ijms21010024] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 10/31/2019] [Revised: 12/13/2019] [Accepted: 12/16/2019] [Indexed: 12/14/2022] Open

Molecular mechanisms of the protein-protein interaction-regulated binding specificity of basic-region leucine zipper transcription factors. J Mol Model 2019;25:246. [PMID: 31342181 DOI: 10.1007/s00894-019-4138-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/08/2019] [Accepted: 07/14/2019] [Indexed: 10/26/2022]

Bradley D, Beltrao P. Evolution of protein kinase substrate recognition at the active site. PLoS Biol 2019;17:e3000341. [PMID: 31233486 PMCID: PMC6611643 DOI: 10.1371/journal.pbio.3000341] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 01/23/2019] [Revised: 07/05/2019] [Accepted: 06/12/2019] [Indexed: 02/05/2023] Open

Gil N, Fiser A. The choice of sequence homologs included in multiple sequence alignments has a dramatic impact on evolutionary conservation analysis. Bioinformatics 2019;35:12-19. [PMID: 29947739 PMCID: PMC6298051 DOI: 10.1093/bioinformatics/bty523] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 02/23/2018] [Revised: 04/20/2018] [Accepted: 06/26/2018] [Indexed: 11/12/2022] Open

Abstract

Motivation

The analysis of sequence conservation patterns has been widely utilized to identify functionally important (catalytic and ligand-binding) protein residues for over a half-century. Despite decades of development, on average state-of-the-art non-template-based functional residue prediction methods must predict ∼25% of a protein's total residues to correctly identify half of the protein's functional site residues. The overwhelming proportion of false positives results in reported 'F-Scores' of ∼0.3. We investigated the limits of current approaches, focusing on the so-far neglected impact of the specific choice of homologs included in multiple sequence alignments (MSAs).

Results

The limits of conservation-based functional residue prediction were explored by surveying the binding sites of 1023 proteins. A straightforward conservation analysis of MSAs composed of randomly selected homologs sampled from a PSI-BLAST search achieves average F-Scores of ∼0.3, a performance matching that reported by state-of-the-art methods, which often consider additional features for the prediction in a machine learning setting. Interestingly, we found that a simple combinatorial MSA sampling algorithm will in almost every case produce an MSA with an optimal set of homologs whose conservation analysis reaches average F-Scores of ∼0.6, doubling state-of-the-art performance. We also show that this is nearly at the theoretical limit of possible performance given the agreement between different binding site definitions. Additionally, we showcase the progress in this direction made by Selection of Alignment by Maximal Mutual Information (SAMMI), an information-theory-based approach to identifying biologically informative MSAs. This work highlights the importance and the unused potential of optimally composed MSAs for conservation analysis.
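To make the evaluation described above concrete, here is a minimal sketch of a per-column conservation score and the F-score used to compare predicted and known functional residues. It is not the authors' pipeline: the entropy-based conservation measure, the treatment of gap characters as ordinary symbols, and the names `column_conservation`, `top_fraction` and `f_score` are assumptions made for illustration, and the toy alignment is invented.

```python
from collections import Counter
from math import log2


def column_conservation(msa):
    """Return 1 - (Shannon entropy / log2(20)) for every alignment column.
    Gap characters are counted as ordinary symbols (a simplifying assumption)."""
    max_entropy = log2(20)
    scores = []
    for column in zip(*msa):
        total = len(column)
        entropy = -sum(
            (count / total) * log2(count / total)
            for count in Counter(column).values()
        )
        scores.append(1.0 - entropy / max_entropy)
    return scores


def top_fraction(scores, fraction):
    """Indices of the most conserved columns, keeping the given fraction."""
    keep = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:keep])


def f_score(predicted, true_sites):
    """Harmonic mean of precision and recall over residue index sets."""
    predicted, true_sites = set(predicted), set(true_sites)
    true_positives = len(predicted & true_sites)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(true_sites)
    return 2 * precision * recall / (precision + recall)


# Invented toy alignment: columns 0 and 3 are fully conserved.
toy_msa = ["ACDE", "AGHE", "AKLE", "AMNE"]
conservation = column_conservation(toy_msa)
predicted_sites = top_fraction(conservation, 0.5)
print(predicted_sites, f_score(predicted_sites, {0, 1}))
```

Because the F-score is a harmonic mean, many false positives pull it down even when recall is reasonable, which is why over-predicting residues is penalized in the results quoted above.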

Supplementary information

Supplementary data are available at Bioinformatics online.

Pazos F, Garcia-Moreno A, Oliveros JC. Automatic detection of genomic regions with informative epigenetic patterns. BMC Genomics 2018;19:847. [PMID: 30486775 PMCID: PMC6264639 DOI: 10.1186/s12864-018-5286-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/13/2018] [Accepted: 11/20/2018] [Indexed: 12/14/2022] Open

da Fonseca NJ, Afonso MQL, de Oliveira LC, Bleicher L. A new method bridging graph theory and residue co-evolutionary networks for specificity determinant positions detection. Bioinformatics 2018;35:1478-1485. [DOI: 10.1093/bioinformatics/bty846] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/19/2018] [Revised: 09/11/2018] [Accepted: 10/04/2018] [Indexed: 12/22/2022] Open

Kress A, Lecompte O, Poch O, Thompson JD. PROBE: analysis and visualization of protein block-level evolution. Bioinformatics 2018;34:3390-3392. [DOI: 10.1093/bioinformatics/bty367] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/24/2017] [Accepted: 05/04/2018] [Indexed: 11/13/2022] Open

Fonseca-Júnior NJ, Afonso MQ, Oliveira LC, Bleicher L. PFstats: A Network-Based Open Tool for Protein Family Analysis. J Comput Biol 2018;25:480-486. [DOI: 10.1089/cmb.2017.0181] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022] Open

Lajkó DB, Valkai I, Domoki M, Ménesi D, Ferenc G, Ayaydin F, Fehér A. In silico identification and experimental validation of amino acid motifs required for the Rho-of-plants GTPase-mediated activation of receptor-like cytoplasmic kinases. Plant Cell Rep 2018;37:627-639. [PMID: 29340786 DOI: 10.1007/s00299-018-2256-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/05/2017] [Accepted: 01/08/2018] [Indexed: 06/07/2023]

Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018;19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022] Open

Brown T, Brown N, Stollar EJ. Most yeast SH3 domains bind peptide targets with high intrinsic specificity. PLoS One 2018;13:e0193128. [PMID: 29470497 PMCID: PMC5823434 DOI: 10.1371/journal.pone.0193128] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/10/2017] [Accepted: 02/04/2018] [Indexed: 01/07/2023] Open

Neuwald AF, Aravind L, Altschul SF. Inferring joint sequence-structural determinants of protein functional specificity. eLife 2018;7. [PMID: 29336305 PMCID: PMC5770160 DOI: 10.7554/elife.29880] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/23/2017] [Accepted: 12/22/2017] [Indexed: 01/05/2023] Open

Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017;31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]

Effective estimation of the minimum number of amino acid residues required for functional divergence between duplicate genes. Mol Phylogenet Evol 2017;113:126-138. [DOI: 10.1016/j.ympev.2017.05.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 01/14/2017] [Revised: 03/19/2017] [Accepted: 05/10/2017] [Indexed: 01/10/2023]

Swint-Kruse L. Using Evolution to Guide Protein Engineering: The Devil IS in the Details. Biophys J 2017;111:10-8. [PMID: 27410729 DOI: 10.1016/j.bpj.2016.05.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/10/2015] [Revised: 04/18/2016] [Accepted: 05/20/2016] [Indexed: 10/21/2022] Open

Medvedev KE, Kolchanov NA, Afonnikov DA. Identification of residues of the archaeal RNA-binding Nip7 proteins specific to environmental conditions. J Bioinform Comput Biol 2017;15:1650036. [DOI: 10.1142/s0219720016500360] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]

Neuwald AF, Altschul SF. Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations. PLoS Comput Biol 2016;12:e1005294. [PMID: 28002465 PMCID: PMC5225019 DOI: 10.1371/journal.pcbi.1005294] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 05/27/2016] [Revised: 01/10/2017] [Accepted: 12/08/2016] [Indexed: 11/25/2022] Open

Abstract

Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).

Protein sequence data, when gathered in great quantity, contain important but implicit biological information manifest as statistical correlations. Here we describe an approach to access this information by comprehensively modeling and characterizing the distribution of sequences belonging to a major protein superfamily. This approach takes as input a large set of unaligned sequences belonging to the superfamily. By applying the minimum description length principle, it seeks the statistical model that best explains the sequences while avoiding over-fitting the data. It concurrently aligns the sequences and, to model evolutionary divergence, partitions them into subgroups that are hierarchically-arranged based upon correlated residue patterns. Auxiliary routines create PyMOL scripts to visualize the locations of correlated residues within available structures. Because these correlations likely arise from structural and biochemical constraints, they can help elucidate protein properties important for functional specificity. Comparing and contrasting sequence and structural features in this way may therefore suggest, in the light of published studies, plausible biological hypotheses for experimental investigation. We illustrate this approach with N-acetyltransferases.

Blagus R, Goeman JJ. What (not) to expect when classifying rare events. Brief Bioinform 2016;19:341-349. [DOI: 10.1093/bib/bbw107] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/19/2016] [Indexed: 01/23/2023] Open

A Bioinformatics Analysis Reveals a Group of MocR Bacterial Transcriptional Regulators Linked to a Family of Genes Coding for Membrane Proteins. Biochem Res Int 2016;2016:4360285. [PMID: 27446613 PMCID: PMC4944035 DOI: 10.1155/2016/4360285] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/27/2015] [Accepted: 05/26/2016] [Indexed: 01/30/2023] Open

Schwarz RF, Tamuri AU, Kultys M, King J, Godwin J, Florescu AM, Schultz J, Goldman N. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments. Nucleic Acids Res 2016;44:e77. [PMID: 26819408 PMCID: PMC4856975 DOI: 10.1093/nar/gkw022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/03/2015] [Accepted: 01/08/2016] [Indexed: 12/19/2022] Open

Das S, Dawson NL, Orengo CA. Diversity in protein domain superfamilies. Curr Opin Genet Dev 2015;35:40-9. [PMID: 26451979 PMCID: PMC4686048 DOI: 10.1016/j.gde.2015.09.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/09/2015] [Revised: 09/07/2015] [Accepted: 09/08/2015] [Indexed: 01/25/2023]

Chagoyen M, García-Martín JA, Pazos F. Practical analysis of specificity-determining residues in protein families. Brief Bioinform 2015;17:255-61. [DOI: 10.1093/bib/bbv045] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/01/2015] [Accepted: 06/15/2015] [Indexed: 12/17/2022] Open

Das S, Lee D, Sillitoe I, Dawson NL, Lees JG, Orengo CA. Functional classification of CATH superfamilies: a domain-based approach for protein function annotation. Bioinformatics 2015;31:3460-7. [PMID: 26139634 PMCID: PMC4612221 DOI: 10.1093/bioinformatics/btv398] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 05/02/2015] [Accepted: 06/24/2015] [Indexed: 11/18/2022] Open

Tiwari P, Singh N, Dixit A, Choudhury D. Multivariate sequence analysis reveals additional function impacting residues in the SDR superfamily. Proteins 2014;82:2842-56. [DOI: 10.1002/prot.24647] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 04/05/2014] [Revised: 06/19/2014] [Accepted: 07/15/2014] [Indexed: 11/08/2022]