Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Axe DD. Estimating the Prevalence of Protein Sequences Adopting Functional Enzyme Folds. J Mol Biol 2004;341:1295-315. [PMID: 15321723 DOI: 10.1016/j.jmb.2004.06.058] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2003] [Revised: 05/02/2004] [Accepted: 06/18/2004] [Indexed: 11/25/2022]

For:	Axe DD. Estimating the Prevalence of Protein Sequences Adopting Functional Enzyme Folds. J Mol Biol 2004;341:1295-315. [PMID: 15321723 DOI: 10.1016/j.jmb.2004.06.058] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/28/2003] [Revised: 05/02/2004] [Accepted: 06/18/2004] [Indexed: 11/25/2022]

Number

Cited by Other Article(s)

Abel DL. Selection in molecular evolution. STUDIES IN HISTORY AND PHILOSOPHY OF SCIENCE 2024;107:54-63. [PMID: 39137534 DOI: 10.1016/j.shpsa.2024.07.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/15/2023] [Revised: 05/29/2024] [Accepted: 07/29/2024] [Indexed: 08/15/2024]

Ardern Z. Alternative Reading Frames are an Underappreciated Source of Protein Sequence Novelty. J Mol Evol 2023;91:570-580. [PMID: 37326679 DOI: 10.1007/s00239-023-10122-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2022] [Accepted: 05/31/2023] [Indexed: 06/17/2023]

Ding W, Nakai K, Gong H. Protein design via deep learning. Brief Bioinform 2022;23:bbac102. [PMID: 35348602 PMCID: PMC9116377 DOI: 10.1093/bib/bbac102] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 12/11/2022] Open

Bitard-Feildel T. Navigating the amino acid sequence space between functional proteins using a deep learning framework. PeerJ Comput Sci 2021;7:e684. [PMID: 34616884 PMCID: PMC8459775 DOI: 10.7717/peerj-cs.684] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2021] [Accepted: 07/30/2021] [Indexed: 06/13/2023]

Abstract

MOTIVATION

Shedding light on the relationships between protein sequences and functions is a challenging task with many implications in protein evolution, diseases understanding, and protein design. The protein sequence space mapping to specific functions is however hard to comprehend due to its complexity. Generative models help to decipher complex systems thanks to their abilities to learn and recreate data specificity. Applied to proteins, they can capture the sequence patterns associated with functions and point out important relationships between sequence positions. By learning these dependencies between sequences and functions, they can ultimately be used to generate new sequences and navigate through uncharted area of molecular evolution.

RESULTS

This study presents an Adversarial Auto-Encoder (AAE) approached, an unsupervised generative model, to generate new protein sequences. AAEs are tested on three protein families known for their multiple functions the sulfatase, the HUP and the TPP families. Clustering results on the encoded sequences from the latent space computed by AAEs display high level of homogeneity regarding the protein sequence functions. The study also reports and analyzes for the first time two sampling strategies based on latent space interpolation and latent space arithmetic to generate intermediate protein sequences sharing sequential properties of original sequences linked to known functional properties issued from different families and functions. Generated sequences by interpolation between latent space data points demonstrate the ability of the AAE to generalize and produce meaningful biological sequences from an evolutionary uncharted area of the biological sequence space. Finally, 3D structure models computed by comparative modelling using generated sequences and templates of different sub-families point out to the ability of the latent space arithmetic to successfully transfer protein sequence properties linked to function between different sub-families. All in all this study confirms the ability of deep learning frameworks to model biological complexity and bring new tools to explore amino acid sequence and functional spaces.

Collapse

Ferguson AL, Ranganathan R. 100th Anniversary of Macromolecular Science Viewpoint: Data-Driven Protein Design. ACS Macro Lett 2021;10:327-340. [PMID: 35549066 DOI: 10.1021/acsmacrolett.0c00885] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Repecka D, Jauniskis V, Karpus L, Rembeza E, Rokaitis I, Zrimec J, Poviloniene S, Laurynenas A, Viknander S, Abuajwa W, Savolainen O, Meskys R, Engqvist MKM, Zelezniak A. Expanding functional protein sequence spaces using generative adversarial networks. NAT MACH INTELL 2021. [DOI: 10.1038/s42256-021-00310-5] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]

Thorvaldsen S, Hössjer O. Using statistical methods to model the fine-tuning of molecular machines and systems. J Theor Biol 2020;501:110352. [PMID: 32505827 DOI: 10.1016/j.jtbi.2020.110352] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2019] [Revised: 05/26/2020] [Accepted: 05/27/2020] [Indexed: 10/24/2022]

Chowdhury R, Maranas CD. From directed evolution to computational enzyme engineering—A review. AIChE J 2019. [DOI: 10.1002/aic.16847] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Rai J. Peptide and protein mimetics by retro and retroinverso analogs. Chem Biol Drug Des 2019;93:724-736. [PMID: 30582286 DOI: 10.1111/cbdd.13472] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/03/2018] [Revised: 12/10/2018] [Accepted: 12/16/2018] [Indexed: 12/19/2022]

Lipinski CA. Rule of five in 2015 and beyond: Target and ligand structural limitations, ligand chemistry structure and drug discovery project decisions. Adv Drug Deliv Rev 2016;101:34-41. [PMID: 27154268 DOI: 10.1016/j.addr.2016.04.029] [Citation(s) in RCA: 275] [Impact Index Per Article: 34.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2015] [Revised: 04/22/2016] [Accepted: 04/27/2016] [Indexed: 12/13/2022]

Quantifying protein sequences with reference to the genetic code. J Theor Biol 2015;372:39-46. [DOI: 10.1016/j.jtbi.2015.02.017] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2014] [Revised: 01/28/2015] [Accepted: 02/16/2015] [Indexed: 11/21/2022]

Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015;44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 251] [Impact Index Per Article: 27.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Indexed: 12/21/2022]

Abstract

The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.

Collapse

Molecular Dynamics Simulations for the Ranking, Evaluation, and Refinement of Computationally Designed Proteins. Methods Enzymol 2013;523:145-70. [DOI: 10.1016/b978-0-12-394292-0.00007-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/15/2023]

Navigating the protein fitness landscape with Gaussian processes. Proc Natl Acad Sci U S A 2012;110:E193-201. [PMID: 23277561 DOI: 10.1073/pnas.1215251110] [Citation(s) in RCA: 183] [Impact Index Per Article: 15.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open

Romero PA, Arnold FH. Random field model reveals structure of the protein recombinational landscape. PLoS Comput Biol 2012;8:e1002713. [PMID: 23055915 PMCID: PMC3464211 DOI: 10.1371/journal.pcbi.1002713] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2012] [Accepted: 08/03/2012] [Indexed: 11/28/2022] Open

Abstract

We are interested in how intragenic recombination contributes to the evolution of proteins and how this mechanism complements and enhances the diversity generated by random mutation. Experiments have revealed that proteins are highly tolerant to recombination with homologous sequences (mutation by recombination is conservative); more surprisingly, they have also shown that homologous sequence fragments make largely additive contributions to biophysical properties such as stability. Here, we develop a random field model to describe the statistical features of the subset of protein space accessible by recombination, which we refer to as the recombinational landscape. This model shows quantitative agreement with experimental results compiled from eight libraries of proteins that were generated by recombining gene fragments from homologous proteins. The model reveals a recombinational landscape that is highly enriched in functional sequences, with properties dominated by a large-scale additive structure. It also quantifies the relative contributions of parent sequence identity, crossover locations, and protein fold to the tolerance of proteins to recombination. Intragenic recombination explores a unique subset of sequence space that promotes rapid molecular diversification and functional adaptation.

Mutation and recombination are the primary sources of genetic variation in evolving populations. The relative benefit of these two diversification mechanisms and how they complement each other has been a long-standing question in evolutionary biology. While it is clear what types of genetic diversity these two mechanisms can create, a significant challenge is relating these sequence changes to changes in fitness. The fitness landscape, which describes this mapping from genotype to phenotype, is extraordinarily complex and defined over an incomprehensibly large space of sequences. Here, we develop a model of the landscape that relies not on the details of this mapping, but rather on the statistical relationships between sequences. By studying the expected values of landscape properties, we can gain insights into the structure of the landscape that are independent of the details of how genotype dictates phenotype. We use this random field model to understand how recombination explores a functionally enriched and diverse subset of protein sequence space.

Collapse

Computational design of chimeric protein libraries for directed evolution. Methods Mol Biol 2010. [PMID: 20835798 DOI: 10.1007/978-1-60761-842-3_10] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]

Ferrada E, Wagner A. Evolutionary innovations and the organization of protein functions in genotype space. PLoS One 2010;5:e14172. [PMID: 21152394 PMCID: PMC2994758 DOI: 10.1371/journal.pone.0014172] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/21/2010] [Accepted: 10/28/2010] [Indexed: 11/18/2022] Open

Abel DL. The Universal Plausibility Metric (UPM) & Principle (UPP). Theor Biol Med Model 2009;6:27. [PMID: 19958539 PMCID: PMC2796651 DOI: 10.1186/1742-4682-6-27] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/29/2009] [Accepted: 12/03/2009] [Indexed: 11/10/2022] Open

Romero PA, Arnold FH. Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 2009;10:866-76. [PMID: 19935669 PMCID: PMC2997618 DOI: 10.1038/nrm2805] [Citation(s) in RCA: 728] [Impact Index Per Article: 48.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

Patel SC, Bradley LH, Jinadasa SP, Hecht MH. Cofactor binding and enzymatic activity in an unevolved superfamily of de novo designed 4-helix bundle proteins. Protein Sci 2009;18:1388-400. [PMID: 19544578 PMCID: PMC2775209 DOI: 10.1002/pro.147] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2009] [Revised: 04/12/2009] [Accepted: 04/13/2009] [Indexed: 11/09/2022]

Dryden DTF, Thomson AR, White JH. How much of protein sequence space has been explored by life on Earth? J R Soc Interface 2008;5:953-6. [PMID: 18426772 PMCID: PMC2459213 DOI: 10.1098/rsif.2008.0085] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open

Stylus: a system for evolutionary experimentation based on a protein/proteome model with non-arbitrary functional constraints. PLoS One 2008;3:e2246. [PMID: 18523658 PMCID: PMC2405935 DOI: 10.1371/journal.pone.0002246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2008] [Accepted: 04/15/2008] [Indexed: 11/28/2022] Open

Abstract

The study of protein evolution is complicated by the vast size of protein sequence space, the huge number of possible protein folds, and the extraordinary complexity of the causal relationships between protein sequence, structure, and function. Much simpler model constructs may therefore provide an attractive complement to experimental studies in this area. Lattice models, which have long been useful in studies of protein folding, have found increasing use here. However, while these models incorporate actual sequences and structures (albeit non-biological ones), they incorporate no actual functions—relying instead on largely arbitrary structural criteria as a proxy for function. In view of the central importance of function to evolution, and the impossibility of incorporating real functional constraints without real function, it is important that protein-like models be developed around real structure–function relationships. Here we describe such a model and introduce open-source software that implements it. The model is based on the structure–function relationship in written language, where structures are two-dimensional ink paths and functions are the meanings that result when these paths form legible characters. To capture something like the hierarchical complexity of protein structure, we use the traditional characters of Chinese origin. Twenty coplanar vectors, encoded by base triplets, act like amino acids in building the character forms. This vector-world model captures many aspects of real proteins, including life-size sequences, a life-size structural repertoire, a realistic genetic code, secondary, tertiary, and quaternary structure, structural domains and motifs, operon-like genetic structures, and layered functional complexity up to a level resembling bacterial genomes and proteomes. Stylus is a full-featured implementation of the vector world for Unix systems. To demonstrate the utility of Stylus, we generated a sample set of homologous vector proteins by evolving successive lines from a single starting gene. These homologues show sequence and structure divergence resembling those of natural homologues in many respects, suggesting that the system may be sufficiently life-like for informative comparison to biology.

Collapse

Rao AG. The outlook for protein engineering in crop improvement. PLANT PHYSIOLOGY 2008;147:6-12. [PMID: 18443101 PMCID: PMC2330291 DOI: 10.1104/pp.108.117929] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2008] [Accepted: 03/10/2008] [Indexed: 05/26/2023]

Leisola M, Turunen O. Protein engineering: opportunities and challenges. Appl Microbiol Biotechnol 2007;75:1225-32. [PMID: 17404726 DOI: 10.1007/s00253-007-0964-2] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2007] [Revised: 03/20/2007] [Accepted: 03/21/2007] [Indexed: 11/26/2022]

Otey CR, Landwehr M, Endelman JB, Hiraga K, Bloom JD, Arnold FH. Structure-guided recombination creates an artificial family of cytochromes P450. PLoS Biol 2006;4:e112. [PMID: 16594730 PMCID: PMC1431580 DOI: 10.1371/journal.pbio.0040112] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2005] [Accepted: 02/09/2006] [Indexed: 11/19/2022] Open

Bloom JD, Silberg JJ, Wilke CO, Drummond DA, Adami C, Arnold FH. Thermodynamic prediction of protein neutrality. Proc Natl Acad Sci U S A 2005;102:606-11. [PMID: 15644440 PMCID: PMC545518 DOI: 10.1073/pnas.0406744102] [Citation(s) in RCA: 261] [Impact Index Per Article: 13.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open