Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Chang MSS, Benner SA. Empirical analysis of protein insertions and deletions determining parameters for the correct placement of gaps in protein sequence alignments. J Mol Biol 2004;341:617-31. [PMID: 15276848 DOI: 10.1016/j.jmb.2004.05.045] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.7] [Received: 11/10/2003] [Revised: 05/17/2004] [Accepted: 05/24/2004] [Indexed: 10/26/2022]

Cited by Other Article(s)

Truong A, Myerscough D, Campbell I, Atkinson J, Silberg JJ. A cellular selection identifies elongated flavodoxins that support electron transfer to sulfite reductase. Protein Sci 2023;32:e4746. [PMID: 37551563 PMCID: PMC10503412 DOI: 10.1002/pro.4746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2023] [Revised: 07/17/2023] [Accepted: 08/04/2023] [Indexed: 08/09/2023]

Sumanaweera D, Allison L, Konagurthu AS. Bridging the gaps in statistical models of protein alignment. Bioinformatics 2022;38:i229-i237. [PMID: 35758809 PMCID: PMC9235498 DOI: 10.1093/bioinformatics/btac246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). Methods in Molecular Biology (Clifton, N.J.) 2022;2397:85-110. [PMID: 34813061 DOI: 10.1007/978-1-0716-1826-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]

Dasmeh P, Doronin R, Wagner A. The length scale of multivalent interactions is evolutionarily conserved in fungal and vertebrate phase-separating proteins. Genetics 2022;220:iyab184. [PMID: 34791214 PMCID: PMC8733453 DOI: 10.1093/genetics/iyab184] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/10/2021] [Accepted: 10/06/2021] [Indexed: 11/14/2022]

Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021;38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 12/27/2022]

Aadland K, Kolaczkowski B. Alignment-Integrated Reconstruction of Ancestral Sequences Improves Accuracy. Genome Biol Evol 2021;12:1549-1565. [PMID: 32785673 PMCID: PMC7523730 DOI: 10.1093/gbe/evaa164] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 08/03/2020] [Indexed: 12/31/2022]

Abstract

Ancestral sequence reconstruction (ASR) uses an alignment of extant protein sequences, a phylogeny describing the history of the protein family and a model of the molecular-evolutionary process to infer the sequences of ancient proteins, allowing researchers to directly investigate the impact of sequence evolution on protein structure and function. Like all statistical inferences, ASR can be sensitive to violations of its underlying assumptions. Previous studies have shown that, whereas phylogenetic uncertainty has only a very weak impact on ASR accuracy, uncertainty in the protein sequence alignment can more strongly affect inferred ancestral sequences. Here, we show that errors in sequence alignment can produce errors in ASR across a range of realistic and simplified evolutionary scenarios. Importantly, sequence reconstruction errors can lead to errors in estimates of structural and functional properties of ancestral proteins, potentially undermining the reliability of analyses relying on ASR. We introduce an alignment-integrated ASR approach that combines information from many different sequence alignments. We show that integrating alignment uncertainty improves ASR accuracy and the accuracy of downstream structural and functional inferences, often performing as well as highly accurate structure-guided alignment. Given the growing evidence that sequence alignment errors can impact the reliability of ASR studies, we recommend that future studies incorporate approaches to mitigate the impact of alignment uncertainty. Probabilistic modeling of insertion and deletion events has the potential to radically improve ASR accuracy when the model reflects the true underlying evolutionary history, but further studies are required to thoroughly evaluate the reliability of these approaches under realistic conditions.

Ferguson AL, Ranganathan R. 100th Anniversary of Macromolecular Science Viewpoint: Data-Driven Protein Design. ACS Macro Lett 2021;10:327-340. [PMID: 35549066 DOI: 10.1021/acsmacrolett.0c00885] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]

Selberg AGA, Gaucher EA, Liberles DA. Ancestral Sequence Reconstruction: From Chemical Paleogenetics to Maximum Likelihood Algorithms and Beyond. J Mol Evol 2021;89:157-164. [PMID: 33486547 PMCID: PMC7828096 DOI: 10.1007/s00239-021-09993-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/04/2020] [Accepted: 01/04/2021] [Indexed: 12/13/2022]

Kuitche E, Jammali S, Ouangraoua A. SimSpliceEvol: alternative splicing-aware simulation of biological sequence evolution. BMC Bioinformatics 2019;20:640. [PMID: 31842741 PMCID: PMC6916212 DOI: 10.1186/s12859-019-3207-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/01/2023]

Zhang Z, Wang J, Gong Y, Li Y. Contributions of substitutions and indels to the structural variations in ancient protein superfamilies. BMC Genomics 2018;19:771. [PMID: 30355304 PMCID: PMC6201574 DOI: 10.1186/s12864-018-5178-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/02/2018] [Accepted: 10/16/2018] [Indexed: 11/10/2022]

Abstract

Background

Quantitative evaluation of protein structural evolution is important for our understanding of protein biological functions and their evolutionary adaptation, and is useful in guiding protein engineering. However, compared to the models for sequence evolution, quantitative models for protein structural evolution have received less attention. Ancient protein superfamilies are often considered versatile, allowing genetic and functional diversifications during long-term evolution. In this study, we investigated the quantitative impacts of sequence variations on the structural evolution of homologues in 68 ancient protein superfamilies that exist widely in sequenced eukaryotic, bacterial and archaeal genomes.

Results

We found that the accumulated structural variations within ancient superfamilies could be explained largely by a bilinear model that simultaneously considers amino acid substitution and insertion/deletion (indel). Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those ancient superfamilies with high bilinear multiple correlation coefficients, the influence of each unit of substitution or indel on structural variations is almost constant within each superfamily, but varies greatly among different superfamilies. The influence of each unit indel on structural variations is always larger than that of each unit substitution within each superfamily, but the accumulated contributions of indels to structural variations are lower than those of substitutions in most superfamilies. The total contributions of sequence indels and substitutions (46% and 54%, respectively) to the structural variations that result from sequence variations are slightly different in ancient superfamilies.

Conclusions

Structural variations within ancient protein superfamilies accumulated under the significantly bilinear influence of amino acid substitutions and indels in sequences. Both substitutions and indels are essential for explaining the structural variations within ancient superfamilies. For those structural variations resulting from sequence variations, the total contribution of indels is slightly lower than that of amino acid substitutions. The regular clock exists not only in protein sequences, but also probably in protein structures.

Electronic supplementary material

The online version of this article (10.1186/s12864-018-5178-8) contains supplementary material, which is available to authorized users.
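
As a rough illustration of the two-term model described in the Results above, the sketch below fits structural variation as a weighted sum of a substitution measure and an indel measure by ordinary least squares. The functional form (no intercept), the function name `fit_two_term_linear`, and the sample numbers are all assumptions made for illustration; they are not taken from the article and the data are synthetic.

```python
def fit_two_term_linear(subs, indels, struct_var):
    """Least-squares fit of struct_var ~ a*subs + b*indels (no intercept).

    Hypothetical helper: solves the 2x2 normal equations by hand so that
    no third-party library is needed.
    """
    sxx = sum(x * x for x in subs)
    syy = sum(y * y for y in indels)
    sxy = sum(x * y for x, y in zip(subs, indels))
    sxz = sum(x * z for x, z in zip(subs, struct_var))
    syz = sum(y * z for y, z in zip(indels, struct_var))

    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("predictors are collinear; coefficients not identifiable")

    a = (sxz * syy - sxy * syz) / det  # weight per unit substitution
    b = (sxx * syz - sxy * sxz) / det  # weight per unit indel
    return a, b


# Synthetic example (not real data): variation generated with a=2, b=5.
subs = [0.1, 0.2, 0.3, 0.4]
indels = [0.0, 0.05, 0.02, 0.1]
struct_var = [2 * s + 5 * d for s, d in zip(subs, indels)]
a, b = fit_two_term_linear(subs, indels, struct_var)
```

In this toy setup the per-unit indel weight `b` comes out larger than the substitution weight `a` only because the synthetic data were built that way; it merely mirrors the kind of comparison the abstract reports.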

Levy Karin E, Ashkenazy H, Hein J, Pupko T. A Simulation-Based Approach to Statistical Alignment. Syst Biol 2018;68:252-266. [DOI: 10.1093/sysbio/syy059] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 02/04/2018] [Accepted: 09/10/2018] [Indexed: 12/26/2022]

Levy Karin E, Shkedy D, Ashkenazy H, Cartwright RA, Pupko T. Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation. Genome Biol Evol 2018;9:1280-1294. [PMID: 28453624 PMCID: PMC5438127 DOI: 10.1093/gbe/evx084] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Accepted: 04/25/2017] [Indexed: 02/07/2023]

Holmes IH. Solving the master equation for Indels. BMC Bioinformatics 2017;18:255. [PMID: 28494756 PMCID: PMC5427538 DOI: 10.1186/s12859-017-1665-1] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 01/09/2017] [Accepted: 04/30/2017] [Indexed: 01/09/2023]

Internal epitope tagging informed by relative lack of sequence conservation. Sci Rep 2016;6:36986. [PMID: 27892520 PMCID: PMC5125009 DOI: 10.1038/srep36986] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/11/2016] [Accepted: 10/20/2016] [Indexed: 01/03/2023]

Levy Karin E, Rabin A, Ashkenazy H, Shkedy D, Avram O, Cartwright RA, Pupko T. Inferring Indel Parameters using a Simulation-based Approach. Genome Biol Evol 2015;7:3226-38. [PMID: 26537226 PMCID: PMC4700945 DOI: 10.1093/gbe/evv212] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]

Wright ES. DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics 2015;16:322. [PMID: 26445311 PMCID: PMC4595117 DOI: 10.1186/s12859-015-0749-z] [Citation(s) in RCA: 206] [Impact Index Per Article: 22.9] [Received: 06/26/2015] [Accepted: 09/23/2015] [Indexed: 12/20/2022]

Abstract

BACKGROUND

Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, with accuracy then diminishing steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments.

RESULTS

Two predictors based on local sequence context were assessed: (i) single sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be utilized to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on large sequence sets.

CONCLUSIONS

Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins. Moreover, secondary structure predictions increase in accuracy as more sequences are used in the prediction. This enables the scalable generation of large sequence alignments that maintain high accuracy even on diverse sequence sets. The DECIPHER R package and source code are freely available for download at DECIPHER.cee.wisc.edu and from the Bioconductor repository.

Reply to Tan et al.: Differences between real and simulated proteins in multiple sequence alignments. Proc Natl Acad Sci U S A 2015;112:E101. [PMID: 25564671 DOI: 10.1073/pnas.1419351112] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]

Peng B. Reproducible simulations of realistic samples for next-generation sequencing studies using Variant Simulation Tools. Genet Epidemiol 2015;39:45-52. [PMID: 25395236 PMCID: PMC6432799 DOI: 10.1002/gepi.21867] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/16/2014] [Revised: 09/14/2014] [Accepted: 09/26/2014] [Indexed: 12/31/2022]

Lechner M, Hernandez-Rosales M, Doerr D, Wieseke N, Thévenin A, Stoye J, Hartmann RK, Prohaska SJ, Stadler PF. Orthology detection combining clustering and synteny for very large datasets. PLoS One 2014;9:e105015. [PMID: 25137074 PMCID: PMC4138177 DOI: 10.1371/journal.pone.0105015] [Citation(s) in RCA: 73] [Impact Index Per Article: 7.3] [Received: 04/24/2014] [Accepted: 07/14/2014] [Indexed: 11/18/2022]

Affiliation(s)

Marcus Lechner: Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany
Maribel Hernandez-Rosales: Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Departamento de Ciência da Computação, Instituto de Ciências Exatas, Universidade de Brasília, Brasília, Brasil
Daniel Doerr: Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany; Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Nicolas Wieseke: Faculty of Mathematics and Computer Science, University of Leipzig, Leipzig, Germany
Annelyse Thévenin: Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany; Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Jens Stoye: Genome Informatics, Faculty of Technology, Bielefeld University, Bielefeld, Germany; Institute for Bioinformatics, Center for Biotechnology, Bielefeld University, Bielefeld, Germany
Roland K. Hartmann: Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marburg, Germany
Sonja J. Prohaska: Computational EvoDevo Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany
Peter F. Stadler: Bioinformatics Group, Department of Computer Science, Universität Leipzig, Leipzig, Germany; Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany; Institute for Theoretical Chemistry, University of Vienna, Vienna, Austria; Center for non-coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark; The Santa Fe Institute, Santa Fe, New Mexico, United States of America; RNomics Group, Fraunhofer Institut for Cell Therapy and Immunology, Leipzig, Germany

Malleshappa Gowder S, Chatterjee J, Chaudhuri T, Paul K. Prediction and analysis of surface hydrophobic residues in tertiary structure of proteins. ScientificWorldJournal 2014;2014:971258. [PMID: 24672404 PMCID: PMC3930195 DOI: 10.1155/2014/971258] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 08/13/2013] [Accepted: 10/17/2013] [Indexed: 11/17/2022]

SIFT Indel: predictions for the functional effects of amino acid insertions/deletions in proteins. PLoS One 2013;8:e77940. [PMID: 24194902 PMCID: PMC3806772 DOI: 10.1371/journal.pone.0077940] [Citation(s) in RCA: 94] [Impact Index Per Article: 8.5] [Received: 07/02/2013] [Accepted: 09/05/2013] [Indexed: 12/02/2022]

Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013;449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]

Warnow T. Large-Scale Multiple Sequence Alignment and Phylogeny Estimation. Models and Algorithms for Genome Evolution 2013. [DOI: 10.1007/978-1-4471-5298-9_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]

Varón A, Wheeler WC. The tree alignment problem. BMC Bioinformatics 2012;13:293. [PMID: 23140486 PMCID: PMC3605350 DOI: 10.1186/1471-2105-13-293] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 05/17/2012] [Accepted: 10/22/2012] [Indexed: 11/28/2022]

Abstract

BACKGROUND

The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred.

RESULTS

For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities compared to DO. Here, we introduce, and present experiments with, a new algorithm, Affine-DO, designed to accommodate the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement.

CONCLUSIONS

Our results show that Affine-DO is likely producing near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and optimality to be a significant improvement in the real-world phylogenetic analysis of sequence data.
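
For readers unfamiliar with the affine gap edit distance mentioned above, the following is a minimal pairwise sketch using the standard three-matrix (Gotoh-style) dynamic programme. It is not Affine-DO and does not operate on a tree; the function name, the cost values, and the convention that a gap of length k costs `gap_open + gap_extend * k` are assumptions chosen for illustration.

```python
INF = float("inf")


def affine_gap_distance(a, b, mismatch=1, gap_open=2, gap_extend=1):
    """Minimum edit cost between strings a and b under affine gap costs.

    A run of k consecutive gap characters costs gap_open + gap_extend * k
    (an assumed convention; other texts use gap_open + gap_extend * (k - 1)).
    """
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns a gap with b[j-1]
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + gap_extend * i
    for j in range(1, m + 1):
        Y[0][j] = gap_open + gap_extend * j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + sub
            # Opening a new gap pays gap_open + gap_extend; extending pays gap_extend.
            X[i][j] = min(M[i - 1][j] + gap_open + gap_extend,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open + gap_extend)
            Y[i][j] = min(M[i][j - 1] + gap_open + gap_extend,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open + gap_extend)

    return min(M[n][m], X[n][m], Y[n][m])
```

With the default costs, `affine_gap_distance("ACGT", "AT")` evaluates to 4: one gap of length two (2 + 1 + 1) is cheaper than two separate single-residue gaps (3 + 3), which is the behaviour that distinguishes affine from linear gap penalties.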

Schaper E, Kajava AV, Hauser A, Anisimova M. Repeat or not repeat?--Statistical validation of tandem repeat prediction in genomic sequences. Nucleic Acids Res 2012;40:10005-17. [PMID: 22923522 PMCID: PMC3488214 DOI: 10.1093/nar/gks726] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Indexed: 01/11/2023]

Carrigan MA, Uryasev O, Davis RP, Zhai L, Hurley TD, Benner SA. The natural history of class I primate alcohol dehydrogenases includes gene duplication, gene loss, and gene conversion. PLoS One 2012;7:e41175. [PMID: 22859968 PMCID: PMC3409193 DOI: 10.1371/journal.pone.0041175] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/18/2012] [Accepted: 06/18/2012] [Indexed: 01/29/2023]

Abstract

BACKGROUND

Gene duplication is a source of molecular innovation throughout evolution. However, even with massive amounts of genome sequence data, correlating gene duplication with speciation and other events in natural history can be difficult. This is especially true in its most interesting cases, where rapid and multiple duplications are likely to reflect adaptation to rapidly changing environments and life styles. This may be so for Class I of alcohol dehydrogenases (ADH1s), where multiple duplications occurred in primate lineages in Old and New World monkeys (OWMs and NWMs) and hominoids.

METHODOLOGY/PRINCIPAL FINDINGS

To build a preferred model for the natural history of ADH1s, we determined the sequences of nine new ADH1 genes, finding for the first time multiple paralogs in various prosimians (lemurs, strepsirhines). Database mining then identified novel ADH1 paralogs in both macaque (an OWM) and marmoset (a NWM). These were used with the previously identified human paralogs to resolve controversies relating to dates of duplication and gene conversion in the ADH1 family. Central to these controversies are differences in the topologies of trees generated from exonic (coding) sequences and intronic sequences.

CONCLUSIONS/SIGNIFICANCE

We provide evidence that gene conversions are the primary source of difference, using molecular clock dating of duplications and analyses of microinsertions and deletions (micro-indels). The tree topology inferred from intron sequences appears to more correctly represent the natural history of ADH1s, with the ADH1 paralogs in platyrrhines (NWMs) and catarrhines (OWMs and hominoids) having arisen by duplications shortly predating the divergence of OWMs and NWMs. We also conclude that paralogs in lemurs arose independently. Finally, we identify errors in database interpretation as the source of controversies concerning gene conversion. These analyses provide a model for the natural history of ADH1s that posits four ADH1 paralogs in the ancestor of Catarrhine and Platyrrhine primates, followed by the loss of an ADH1 paralog in the human lineage.

Joseph AP, Valadié H, Srinivasan N, de Brevern AG. Local structural differences in homologous proteins: specificities in different SCOP classes. PLoS One 2012;7:e38805. [PMID: 22745680 PMCID: PMC3382195 DOI: 10.1371/journal.pone.0038805] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/26/2011] [Accepted: 05/10/2012] [Indexed: 11/19/2022]

Abstract

The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly implicated in studying protein function and for drug design. The backbone of a protein structure favours certain local conformations which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions, among groups of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for the indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.

Liberles DA, Teichmann SA, Bahar I, Bastolla U, Bloom J, Bornberg-Bauer E, Colwell LJ, de Koning APJ, Dokholyan NV, Echave J, Elofsson A, Gerloff DL, Goldstein RA, Grahnen JA, Holder MT, Lakner C, Lartillot N, Lovell SC, Naylor G, Perica T, Pollock DD, Pupko T, Regan L, Roger A, Rubinstein N, Shakhnovich E, Sjölander K, Sunyaev S, Teufel AI, Thorne JL, Thornton JW, Weinreich DM, Whelan S. The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 2012;21:769-85. [PMID: 22528593 PMCID: PMC3403413 DOI: 10.1002/pro.2071] [Citation(s) in RCA: 149] [Impact Index Per Article: 12.4] [Received: 03/02/2012] [Revised: 03/22/2012] [Accepted: 03/23/2012] [Indexed: 12/20/2022]

Affiliation(s)

David A Liberles: Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
Sarah A Teichmann: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
Ivet Bahar: Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15213
Ugo Bastolla: Bioinformatics Unit. Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autonoma de Madrid, 28049 Cantoblanco Madrid, Spain
Jesse Bloom: Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109
Erich Bornberg-Bauer: Evolutionary Bioinformatics Group, Institute for Evolution and Biodiversity, University of Muenster, Germany
Lucy J Colwell: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
A P Jason de Koning: Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado, Aurora, Colorado
Nikolay V Dokholyan: Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, North Carolina 27599
Julian Echave: Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, Martín de Irigoyen 3100, 1650 San Martín, Buenos Aires, Argentina
Arne Elofsson: Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm Bioinformatics Center, Science for Life Laboratory, Swedish E-science Research Center, Stockholm University, 106 91 Stockholm, Sweden
Dietlind L Gerloff: Biomolecular Engineering Department, University of California, Santa Cruz, California 95064
Richard A Goldstein: Division of Mathematical Biology, National Institute for Medical Research (MRC), Mill Hill, London NW7 1AA, United Kingdom
Johan A Grahnen: Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
Mark T Holder: Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, Kansas 66045
Clemens Lakner: Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
Nicholas Lartillot: Département de Biochimie, Faculté de Médecine, Université de Montréal, Montréal, QC H3T1J4, Canada
Simon C Lovell: Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom
Gavin Naylor: Department of Biology, College of Charleston, Charleston, South Carolina 29424
Tina Perica: MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 0QH, United Kingdom
David D Pollock: Department of Biochemistry and Molecular Genetics, School of Medicine, University of Colorado, Aurora, Colorado
Tal Pupko: Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
Lynne Regan: Department of Molecular Biophysics and Biochemistry, Yale University, New Haven 06511
Andrew Roger: Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
Nimrod Rubinstein: Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
Eugene Shakhnovich: Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138
Kimmen Sjölander: Department of Bioengineering, University of California, Berkeley, Berkeley, California 94720
Shamil Sunyaev: Division of Genetics, Brigham and Women's Hospital, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, Massachusetts 02115
Ashley I Teufel: Department of Molecular Biology, University of Wyoming, Laramie, Wyoming 82071
Jeffrey L Thorne: Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695
Joseph W Thornton: Howard Hughes Medical Institute and Institute for Ecology and Evolution, University of Oregon, Eugene, Oregon 97403; Department of Human Genetics, University of Chicago, Chicago, Illinois 60637; Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637
Daniel M Weinreich: Department of Ecology and Evolutionary Biology, and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island 02912
Simon Whelan: Faculty of Life Sciences, University of Manchester, Manchester M13 9PT, United Kingdom

Koestler T, von Haeseler A, Ebersberger I. REvolver: modeling sequence evolution under domain constraints. Mol Biol Evol 2012;29:2133-45. [PMID: 22383532 DOI: 10.1093/molbev/mss078] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022] Open

Abstract

Simulating the change of protein sequences over time in a biologically realistic way is fundamental for a broad range of studies with a focus on evolution. It is, thus, problematic that typically simulators evolve individual sites of a sequence identically and independently. More realistic simulations are possible; however, they are often prohibited by limited knowledge concerning site-specific evolutionary constraints or functional dependencies between amino acids. As a consequence, a protein's functional and structural characteristics are rapidly lost in the course of simulated evolution. Here, we present REvolver (www.cibiv.at/software/revolver), a program that simulates protein sequence alteration such that evolutionarily stable sequence characteristics, like functional domains, are maintained. For this purpose, REvolver recruits profile hidden Markov models (pHMMs) for parameterizing site-specific models of sequence evolution in an automated fashion. pHMMs derived from alignments of homologous proteins or protein domains capture information regarding which sequence sites remained conserved over time and where in a sequence insertions or deletions are more likely to occur. Thus, they describe constraints on the evolutionary process acting on these sequences. To demonstrate the performance of REvolver as well as its applicability in large-scale simulation studies, we evolved the entire human proteome up to 1.5 expected substitutions per site. Simultaneously, we analyzed the preservation of Pfam and SMART domains in the simulated sequences over time. REvolver preserved 92% of the Pfam domains originally present in the human sequences. This value drops to 15% when traditional models of amino acid sequence evolution are used. Thus, REvolver represents a significant advance toward a realistic simulation of protein sequence evolution on a proteome-wide scale. 
Further, REvolver facilitates the simulation of a protein family with a user-defined domain architecture at the root.

Dalquen DA, Anisimova M, Gonnet GH, Dessimoz C. ALF--a simulation framework for genome evolution. Mol Biol Evol 2011;29:1115-23. [PMID: 22160766 PMCID: PMC3341827 DOI: 10.1093/molbev/msr268] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023] Open

Wang C, Yan RX, Wang XF, Si JN, Zhang Z. Comparison of linear gap penalties and profile-based variable gap penalties in profile–profile alignments. Comput Biol Chem 2011;35:308-18. [DOI: 10.1016/j.compbiolchem.2011.07.006] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2011] [Revised: 05/06/2011] [Accepted: 07/11/2011] [Indexed: 10/18/2022]

Kamneva OK, Liberles DA, Ward NL. Genome-wide influence of indel substitutions on evolution of bacteria of the PVC superphylum, revealed using a novel computational method. Genome Biol Evol 2010;2:870-86. [PMID: 21048002 PMCID: PMC3000692 DOI: 10.1093/gbe/evq071] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open

Kim R, Guo JT. Systematic analysis of short internal indels and their impact on protein folding. BMC Struct Biol 2010;10:24. [PMID: 20684774 PMCID: PMC2924343 DOI: 10.1186/1472-6807-10-24] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/11/2010] [Accepted: 08/04/2010] [Indexed: 12/03/2022]

Abstract

Background

Protein sequence insertions/deletions (indels) can be introduced during evolution or through alternative splicing (AS). Alternative splicing is an important biological phenomenon and is considered the major means of expanding structural and functional diversity in eukaryotes. Knowledge of the structural changes due to indels is critical to our understanding of the evolution of protein structure and function. In addition, it can help us probe the evolution of alternative splicing and the diversity of functional isoforms. However, little is known about the effects of indels, in particular the ones involving core secondary structures, on the folding of protein structures. The long-term goal of our study is to accurately predict protein AS isoform structures. As a first step towards this goal, we performed a systematic analysis of the structural changes caused by short internal indels by mining highly homologous proteins in the Protein Data Bank (PDB).

Results

We compiled a non-redundant dataset of short internal indels (2-40 amino acids) from highly homologous protein pairs and analyzed the sequence and structural features of the indels. We found that about one third of indel residues are in a disordered state and the majority of the residues are exposed to solvent, suggesting that these indels are generally located on the surface of proteins. Though naturally occurring indels are fewer than engineered ones in the dataset, there are no statistically significant differences in terms of amino acid frequencies and secondary structure types between the "Natural" indels and "All" indels in the dataset. Structural comparisons show that all the protein pairs with short internal indels in the dataset preserve the structural folds and about 85% of protein pairs have global RMSDs (root mean square deviations) of 2 Å or less, suggesting that protein structures tend to be conserved and can tolerate short insertions and deletions. A few pairs with high RMSDs are the result of relative domain positions of the proteins, probably due to the intrinsically dynamic nature of the proteins.

Conclusions

The analysis demonstrated that protein structures have the "plasticity" to tolerate short indels. This study can provide valuable guidance for modeling protein AS isoform structures and homologous proteins with indels by placing the indels at the right locations, since the accuracy of sequence alignments dictates model quality in homology modeling.

Zhang Z, Huang J, Wang Z, Wang L, Gao P. Impact of indels on the flanking regions in structural domains. Mol Biol Evol 2010;28:291-301. [PMID: 20671041 DOI: 10.1093/molbev/msq196] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Schönhuth A, Salari R, Hormozdiari F, Cherkasov A, Cenk Sahinalp S. Towards Improved Assessment of Functional Similarity in Large-Scale Screens: A Study on Indel Length. J Comput Biol 2010;17:1-20. [DOI: 10.1089/cmb.2009.0031] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Zhang J, Xiao L, Yin Y, Sirois P, Gao H, Li K. A law of mutation: power decay of small insertions and small deletions associated with human diseases. Appl Biochem Biotechnol 2009;162:321-8. [PMID: 19816659 DOI: 10.1007/s12010-009-8793-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2009] [Accepted: 09/24/2009] [Indexed: 11/28/2022]

Strope CL, Abel K, Scott SD, Moriyama EN. Biological sequence simulation for testing complex evolutionary hypotheses: indel-Seq-Gen version 2.0. Mol Biol Evol 2009;26:2581-93. [PMID: 19651852 PMCID: PMC2760465 DOI: 10.1093/molbev/msp174] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Fletcher W, Yang Z. INDELible: a flexible simulator of biological sequence evolution. Mol Biol Evol 2009;26:1879-88. [PMID: 19423664 PMCID: PMC2712615 DOI: 10.1093/molbev/msp098] [Citation(s) in RCA: 319] [Impact Index Per Article: 21.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

An indel in transmembrane helix 2 helps to trace the molecular evolution of class A G-protein-coupled receptors. J Mol Evol 2009;68:475-89. [PMID: 19357801 DOI: 10.1007/s00239-009-9214-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2008] [Revised: 02/05/2009] [Accepted: 02/16/2009] [Indexed: 10/25/2022]

Hormozdiari F, Salari R, Hsing M, Schönhuth A, Chan SK, Sahinalp SC, Cherkasov A. The Effect of Insertions and Deletions on Wirings in Protein-Protein Interaction Networks: A Large-Scale Study. J Comput Biol 2009;16:159-67. [DOI: 10.1089/cmb.2008.03tt] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open

Liberles DA. Reading the Story in DNA: A Beginner's Guide to Molecular Evolution. Syst Biol 2009. [DOI: 10.1093/sysbio/syp003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open

Rayan A. New tips for structure prediction by comparative modeling. Bioinformation 2009;3:263-7. [PMID: 19255646 PMCID: PMC2646861 DOI: 10.6026/97320630003263] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2008] [Accepted: 12/29/2008] [Indexed: 11/23/2022] Open

Cartwright RA. Problems and solutions for estimating indel rates and length distributions. Mol Biol Evol 2008;26:473-80. [PMID: 19042944 DOI: 10.1093/molbev/msn275] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open

Probabilistic phylogenetic inference with insertions and deletions. PLoS Comput Biol 2008;4:e1000172. [PMID: 18787703 PMCID: PMC2527138 DOI: 10.1371/journal.pcbi.1000172] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2007] [Accepted: 07/31/2008] [Indexed: 11/19/2022] Open

The rates and patterns of insertions, deletions and substitutions in mouse and rat inferred from introns. Sci Bull (Beijing) 2008. [DOI: 10.1007/s11434-008-0352-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]

Stojmirović A, Gertz EM, Altschul SF, Yu YK. The effectiveness of position- and composition-specific gap costs for protein similarity searches. Bioinformatics 2008;24:i15-23. [PMID: 18586708 PMCID: PMC2718649 DOI: 10.1093/bioinformatics/btn171] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Liberles DA, Dittmar K. Characterizing gene family evolution. Biol Proced Online 2008;10:66-73. [PMID: 19461954 PMCID: PMC2683547 DOI: 10.1251/bpo144] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/18/2007] [Revised: 03/17/2008] [Accepted: 04/07/2008] [Indexed: 11/23/2022] Open

Tanay A, Siggia ED. Sequence context affects the rate of short insertions and deletions in flies and primates. Genome Biol 2008;9:R37. [PMID: 18291026 PMCID: PMC2374710 DOI: 10.1186/gb-2008-9-2-r37] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2007] [Revised: 09/25/2007] [Accepted: 02/21/2008] [Indexed: 01/04/2023] Open

Simmons MP, Müller K, Norton AP. The relative performance of indel-coding methods in simulations. Mol Phylogenet Evol 2007;44:724-40. [PMID: 17512758 DOI: 10.1016/j.ympev.2007.04.001] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2006] [Revised: 04/02/2007] [Accepted: 04/04/2007] [Indexed: 11/26/2022]

Cartwright RA. Ngila: global pairwise alignments with logarithmic and affine gap costs. Bioinformatics 2007;23:1427-8. [PMID: 17387111 PMCID: PMC4739816 DOI: 10.1093/bioinformatics/btm095] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open