201
Pucci F, Zerihun MB, Peter EK, Schug A. Evaluating DCA-based method performances for RNA contact prediction by a well-curated data set. RNA (New York, N.Y.) 2020; 26:794-802. [PMID: 32276988 PMCID: PMC7297115 DOI: 10.1261/rna.073809.119] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/28/2019] [Accepted: 03/31/2020] [Indexed: 06/11/2023]
Abstract
RNA molecules play many pivotal roles in a cell that are still not fully understood. Any detailed understanding of RNA function requires knowledge of its three-dimensional structure, yet experimental RNA structure resolution remains demanding. Recent advances in sequencing provide unprecedented amounts of sequence data that can be statistically analyzed by methods such as direct coupling analysis (DCA) to determine spatial proximity or contacts of specific nucleic acid pairs, which improve the quality of structure prediction. To quantify this structure prediction improvement, we here present a well curated data set of about 70 RNA structures of high resolution and compare different nucleotide-nucleotide contact prediction methods available in the literature. We observe only minor differences between the performances of the different methods. Moreover, we discuss how robust these predictions are for different contact definitions and how strongly they depend on procedures used to curate and align the families of homologous RNA sequences.
Affiliation(s)
- Fabrizio Pucci
  - John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Mehari B Zerihun
  - John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
  - Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
  - Department of Physics, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
- Emanuel K Peter
  - John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
- Alexander Schug
  - John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany
202
Li Y, Hu J, Zhang C, Yu DJ, Zhang Y. ResPRE: high-accuracy protein contact prediction by coupling precision matrix with deep residual neural networks. Bioinformatics 2020; 35:4647-4655. [PMID: 31070716 DOI: 10.1093/bioinformatics/btz291] [Citation(s) in RCA: 109] [Impact Index Per Article: 21.8] [Received: 11/18/2018] [Revised: 03/18/2019] [Accepted: 04/17/2019] [Indexed: 12/20/2022]
Abstract
MOTIVATION: Contact-map of a protein sequence dictates the global topology of structural fold. Accurate prediction of the contact-map is thus essential to protein 3D structure prediction, which is particularly useful for the protein sequences that do not have close homology templates in the Protein Data Bank.

RESULTS: We developed a new method, ResPRE, to predict residue-level protein contacts using inverse covariance matrix (or precision matrix) of multiple sequence alignments (MSAs) through deep residual convolutional neural network training. The approach was tested on a set of 158 non-homologous proteins collected from the CASP experiments and achieved an average accuracy of 50.6% in the top-L long-range contact prediction with L being the sequence length, which is 11.7% higher than the best of other state-of-the-art approaches ranging from coevolution coupling analysis to deep neural network training. Detailed data analyses show that the major advantage of ResPRE lies at the utilization of precision matrix that helps rule out transitional noises of contact-maps compared with the previously used covariance matrix. Meanwhile, the residual network with parallel shortcut layer connections increases the learning ability of deep neural network training. It was also found that appropriate collection of MSAs can further improve the accuracy of final contact-map predictions. The standalone package and online server of ResPRE are made freely available, which should bring important impact on protein structure and function modeling studies in particular for the distant- and non-homology protein targets.

AVAILABILITY AND IMPLEMENTATION: https://zhanglab.ccmb.med.umich.edu/ResPRE and https://github.com/leeyang/ResPRE.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
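The "top-L long-range" accuracy quoted in this abstract can be illustrated with a minimal sketch. It is not code from ResPRE: the function name, the input layout (square score and contact matrices), and the default separation of 24 residues (a commonly used long-range threshold, not stated in the abstract) are all assumptions made for illustration.

```python
import numpy as np


def top_l_long_range_precision(scores, contacts, min_sep=24):
    """Fraction of the L best-scoring long-range pairs that are true contacts.

    scores   -- (L, L) array of predicted contact scores (higher = more likely)
    contacts -- (L, L) boolean array of true contacts
    min_sep  -- assumed minimum sequence separation |i - j| for "long-range"
    """
    scores = np.asarray(scores, dtype=float)
    contacts = np.asarray(contacts, dtype=bool)
    seq_len = scores.shape[0]

    # Upper-triangle pairs (i < j) with j - i >= min_sep.
    i, j = np.triu_indices(seq_len, k=min_sep)
    if i.size == 0:
        raise ValueError("sequence too short for the requested separation")

    # Keep the (at most) seq_len highest-scoring pairs.
    top = np.argsort(scores[i, j])[::-1][:seq_len]
    return float(contacts[i[top], j[top]].mean())
```

Called with a predicted score matrix and a reference contact map of the same shape, the function returns a value between 0 and 1; averaging it over a benchmark set would give a figure comparable in kind to the percentage reported above, subject to the contact definition used.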
Affiliation(s)
- Yang Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Jun Hu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
203
Crean RM, Gardner JM, Kamerlin SCL. Harnessing Conformational Plasticity to Generate Designer Enzymes. J Am Chem Soc 2020; 142:11324-11342. [PMID: 32496764 PMCID: PMC7467679 DOI: 10.1021/jacs.0c04924] [Citation(s) in RCA: 71] [Impact Index Per Article: 14.2] [Received: 05/05/2020] [Indexed: 02/08/2023]
Abstract
Recent years have witnessed an explosion of interest in understanding the role of conformational dynamics both in the evolution of new enzymatic activities from existing enzymes and in facilitating the emergence of enzymatic activity de novo on scaffolds that were previously non-catalytic. There are also an increasing number of examples in the literature of targeted engineering of conformational dynamics being successfully used to alter enzyme selectivity and activity. Despite the obvious importance of conformational dynamics to both enzyme function and evolvability, many (although not all) computational design approaches still focus either on pure sequence-based approaches or on using structures with limited flexibility to guide the design. However, there exist a wide variety of computational approaches that can be (re)purposed to introduce conformational dynamics as a key consideration in the design process. Coupled with laboratory evolution and more conventional existing sequence- and structure-based approaches, these techniques provide powerful tools for greatly expanding the protein engineering toolkit. This Perspective provides an overview of evolutionary studies that have dissected the role of conformational dynamics in facilitating the emergence of novel enzymes, as well as advances in computational approaches that allow one to target conformational dynamics as part of enzyme design. Harnessing conformational dynamics in engineering studies is a powerful paradigm with which to engineer the next generation of designer biocatalysts.
Affiliation(s)
- Rory M. Crean
  - Department of Chemistry - BMC, Uppsala University, Box 576, 751 23 Uppsala, Sweden
- Jasmine M. Gardner
  - Department of Chemistry - BMC, Uppsala University, Box 576, 751 23 Uppsala, Sweden
- Shina C. L. Kamerlin
  - Department of Chemistry - BMC, Uppsala University, Box 576, 751 23 Uppsala, Sweden
204
Leman JK, Weitzner BD, Lewis SM, Adolf-Bryfogle J, Alam N, Alford RF, Aprahamian M, Baker D, Barlow KA, Barth P, Basanta B, Bender BJ, Blacklock K, Bonet J, Boyken SE, Bradley P, Bystroff C, Conway P, Cooper S, Correia BE, Coventry B, Das R, De Jong RM, DiMaio F, Dsilva L, Dunbrack R, Ford AS, Frenz B, Fu DY, Geniesse C, Goldschmidt L, Gowthaman R, Gray JJ, Gront D, Guffy S, Horowitz S, Huang PS, Huber T, Jacobs TM, Jeliazkov JR, Johnson DK, Kappel K, Karanicolas J, Khakzad H, Khar KR, Khare SD, Khatib F, Khramushin A, King IC, Kleffner R, Koepnick B, Kortemme T, Kuenze G, Kuhlman B, Kuroda D, Labonte JW, Lai JK, Lapidoth G, Leaver-Fay A, Lindert S, Linsky T, London N, Lubin JH, Lyskov S, Maguire J, Malmström L, Marcos E, Marcu O, Marze NA, Meiler J, Moretti R, Mulligan VK, Nerli S, Norn C, Ó'Conchúir S, Ollikainen N, Ovchinnikov S, Pacella MS, Pan X, Park H, Pavlovicz RE, Pethe M, Pierce BG, Pilla KB, Raveh B, Renfrew PD, Burman SSR, Rubenstein A, Sauer MF, Scheck A, Schief W, Schueler-Furman O, Sedan Y, Sevy AM, Sgourakis NG, Shi L, Siegel JB, Silva DA, Smith S, Song Y, Stein A, Szegedy M, Teets FD, Thyme SB, Wang RYR, Watkins A, Zimmerman L, Bonneau R. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat Methods 2020; 17:665-680. [PMID: 32483333 PMCID: PMC7603796 DOI: 10.1038/s41592-020-0848-2] [Citation(s) in RCA: 473] [Impact Index Per Article: 94.6] [Received: 04/29/2019] [Accepted: 04/22/2020] [Indexed: 12/12/2022]
Abstract
The Rosetta software for macromolecular modeling, docking and design is extensively used in laboratories worldwide. During two decades of development by a community of laboratories at more than 60 institutions, Rosetta has been continuously refactored and extended. Its advantages are its performance and interoperability between broad modeling capabilities. Here we review tools developed in the last 5 years, including over 80 methods. We discuss improvements to the score function, user interfaces and usability. Rosetta is available at http://www.rosettacommons.org.
Affiliation(s)
- Julia Koehler Leman
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
  - Department of Biology, New York University, New York, New York, USA
- Brian D Weitzner
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Lyell Immunopharma Inc., Seattle, WA, USA
- Steven M Lewis
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
  - Department of Biochemistry, Duke University, Durham, NC, USA
  - Cyrus Biotechnology, Seattle, WA, USA
- Jared Adolf-Bryfogle
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
- Nawsad Alam
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Rebecca F Alford
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Melanie Aprahamian
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Kyle A Barlow
  - Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
- Patrick Barth
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Baylor College of Medicine, Department of Pharmacology, Houston, TX, USA
- Benjamin Basanta
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Biological Physics Structure and Design PhD Program, University of Washington, Seattle, WA, USA
- Brian J Bender
  - Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- Kristin Blacklock
  - Institute of Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Jaume Bonet
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Scott E Boyken
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Lyell Immunopharma Inc., Seattle, WA, USA
- Phil Bradley
  - Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Chris Bystroff
  - Department of Biological Sciences, Rensselaer Polytechnic Institute, Troy, NY, USA
- Patrick Conway
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Seth Cooper
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Bruno E Correia
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Brian Coventry
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Rhiju Das
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Frank DiMaio
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Lorna Dsilva
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Roland Dunbrack
  - Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA, USA
- Alexander S Ford
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Brandon Frenz
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Cyrus Biotechnology, Seattle, WA, USA
- Darwin Y Fu
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Caleb Geniesse
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Ragul Gowthaman
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
- Jeffrey J Gray
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD, USA
- Dominik Gront
  - Faculty of Chemistry, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Sharon Guffy
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Scott Horowitz
  - Department of Chemistry & Biochemistry, University of Denver, Denver, CO, USA
  - The Knoebel Institute for Healthy Aging, University of Denver, Denver, CO, USA
- Po-Ssu Huang
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Thomas Huber
  - Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory, Australia
- Tim M Jacobs
  - Program in Bioinformatics and Computational Biology, Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- David K Johnson
  - Center for Computational Biology, University of Kansas, Lawrence, KS, USA
- Kalli Kappel
  - Biophysics Program, Stanford University, Stanford, CA, USA
- John Karanicolas
  - Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, PA, USA
- Hamed Khakzad
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute for Computational Science, University of Zurich, Zurich, Switzerland
  - S3IT, University of Zurich, Zurich, Switzerland
- Karen R Khar
  - Cyrus Biotechnology, Seattle, WA, USA
  - Center for Computational Biology, University of Kansas, Lawrence, KS, USA
- Sagar D Khare
  - Institute of Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
  - Department of Chemistry and Chemical Biology, The State University of New Jersey, Piscataway, NJ, USA
  - Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
  - Computational Biology and Molecular Biophysics Program, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Firas Khatib
  - Department of Computer and Information Science, University of Massachusetts Dartmouth, Dartmouth, MA, USA
- Alisa Khramushin
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Indigo C King
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Cyrus Biotechnology, Seattle, WA, USA
- Robert Kleffner
  - Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
- Brian Koepnick
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Tanja Kortemme
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Georg Kuenze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Brian Kuhlman
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Daisuke Kuroda
  - Medical Device Development and Regulation Research Center, School of Engineering, University of Tokyo, Tokyo, Japan
  - Department of Bioengineering, School of Engineering, University of Tokyo, Tokyo, Japan
- Jason W Labonte
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
  - Department of Chemistry, Franklin & Marshall College, Lancaster, PA, USA
- Jason K Lai
  - Baylor College of Medicine, Department of Pharmacology, Houston, TX, USA
- Gideon Lapidoth
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Andrew Leaver-Fay
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Steffen Lindert
  - Department of Chemistry and Biochemistry, Ohio State University, Columbus, OH, USA
- Thomas Linsky
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Nir London
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Joseph H Lubin
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Sergey Lyskov
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jack Maguire
  - Program in Bioinformatics and Computational Biology, Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Lars Malmström
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
  - Institute for Computational Science, University of Zurich, Zurich, Switzerland
  - S3IT, University of Zurich, Zurich, Switzerland
  - Division of Infection Medicine, Department of Clinical Sciences Lund, Faculty of Medicine, Lund University, Lund, Sweden
- Enrique Marcos
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Research in Biomedicine Barcelona, The Barcelona Institute of Science and Technology, Barcelona, Spain
- Orly Marcu
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Nicholas A Marze
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Jens Meiler
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - Departments of Chemistry, Pharmacology and Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
  - Institute for Chemical Biology, Vanderbilt University, Nashville, TN, USA
- Rocco Moretti
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Vikram Khipple Mulligan
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Santrupti Nerli
  - Department of Computer Science, University of California Santa Cruz, Santa Cruz, CA, USA
- Christoffer Norn
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot, Israel
- Shane Ó'Conchúir
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Noah Ollikainen
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Sergey Ovchinnikov
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
- Michael S Pacella
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Xingjie Pan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Hahnbeom Park
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Ryan E Pavlovicz
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Cyrus Biotechnology, Seattle, WA, USA
- Manasi Pethe
  - Department of Chemistry and Chemical Biology, The State University of New Jersey, Piscataway, NJ, USA
  - Center for Integrative Proteomics Research, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Brian G Pierce
  - University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD, USA
- Kala Bharath Pilla
  - Research School of Chemistry, Australian National University, Canberra, Australian Capital Territory, Australia
- Barak Raveh
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- P Douglas Renfrew
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
- Shourya S Roy Burman
  - Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD, USA
- Aliza Rubenstein
  - Institute of Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
  - Computational Biology and Molecular Biophysics Program, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Marion F Sauer
  - Chemical and Physical Biology Program, Vanderbilt Vaccine Center, Vanderbilt University, Nashville, TN, USA
- Andreas Scheck
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Lausanne, Switzerland
- William Schief
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
- Ora Schueler-Furman
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Yuval Sedan
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Alexander M Sevy
  - Chemical and Physical Biology Program, Vanderbilt Vaccine Center, Vanderbilt University, Nashville, TN, USA
- Nikolaos G Sgourakis
  - Department of Chemistry and Biochemistry, University of California Santa Cruz, Santa Cruz, CA, USA
- Lei Shi
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Justin B Siegel
  - Department of Chemistry, University of California, Davis, Davis, CA, USA
  - Department of Biochemistry and Molecular Medicine, University of California, Davis, Davis, California, USA
  - Genome Center, University of California, Davis, Davis, CA, USA
- Shannon Smith
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Yifan Song
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Cyrus Biotechnology, Seattle, WA, USA
- Amelie Stein
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Maria Szegedy
  - Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Frank D Teets
  - Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Summer B Thyme
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Ray Yu-Ruei Wang
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
- Andrew Watkins
  - Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA
- Lior Zimmerman
  - Department of Microbiology and Molecular Genetics, IMRIC, Ein Kerem Faculty of Medicine, Hebrew University of Jerusalem, Jerusalem, Israel
- Richard Bonneau
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, NY, USA
  - Department of Biology, New York University, New York, New York, USA
  - Department of Computer Science, New York University, New York, NY, USA
  - Center for Data Science, New York University, New York, NY, USA
205
Abideen ZU, Ahmad A, Usman M, Majaz S, Ali W, Noreen S, Mahmood T, Nouroz F. Dynamics and conformational propensities of staphylococcal CntA. J Biomol Struct Dyn 2020; 39:4923-4935. [PMID: 32573341 DOI: 10.1080/07391102.2020.1782263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
Abstract
Enzymes use transition metals as co-factors for catalytic roles in biological processes. Notably, manganese, iron, cobalt, nickel, copper and zinc are abundantly used. Staphylococcus aureus, a commensal bacterium asymptomatically, lies on the human body causing variety of infections. S. aureus is equipped by advanced virulence-regulatory circuits of metal acquisition like Cnt that acquires metals at infection sites by utilizing a nicotianamine-like metallophore staphylopine. Despite significant growth in structural studies, how CntA of Cnt system transmits conformational signal upon staphylopine recognition remains elusive. Here, we analyzed the structural changes adopted by CntA during close-to-open transition by computational approaches. CntA uses a bi-domain architectural form of domain II which performed 37° rigid body rotation and 1.1 Å translation assisted by inter-domain hinge cluster residues. Important clustered communities were found regulating the conformational changes in CntA where communities 4 and 5 are found crucial. Besides open and close states, the fluctuating regions sampled two additional intermediate states which were considered close or open previously. CntA prefers fluctuating the non-conserved regions rather than conserved where domain II turned out to be rigid and maintains a stable fold. Overall, the CntA system is a potential target for structural biologist to hamper such conformational behaviors at family level. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Zain Ul Abideen
  - Department of Bioinformatics, Hazara University, Mansehra, KPK, Pakistan
- Ashfaq Ahmad
  - Department of Bioinformatics, Hazara University, Mansehra, KPK, Pakistan
- Muhammad Usman
  - Department of Bioinformatics, Hazara University, Mansehra, KPK, Pakistan
- Sidra Majaz
  - Department of Bioinformatics, Hazara University, Mansehra, KPK, Pakistan
- Waqar Ali
  - Department of Bioinformatics, Hazara University, Mansehra, KPK, Pakistan
- Shumaila Noreen
  - Department of Zoology, University of Peshawar, Peshawar, KPK, Pakistan
- Tariq Mahmood
  - Department of Bioinformatics, Hazara University, Mansehra, KPK, Pakistan
  - Department of Agriculture, Hazara University, Mansehra, KPK, Pakistan
- Faisal Nouroz
  - Department of Bioinformatics, Hazara University, Mansehra, KPK, Pakistan
  - Department of Botany, Hazara University, Mansehra, KPK, Pakistan
206
Cooper CJ, Zheng K, Rush KW, Johs A, Sanders BC, Pavlopoulos GA, Kyrpides NC, Podar M, Ovchinnikov S, Ragsdale SW, Parks JM. Structure determination of the HgcAB complex using metagenome sequence data: insights into microbial mercury methylation. Commun Biol 2020; 3:320. [PMID: 32561885 PMCID: PMC7305189 DOI: 10.1038/s42003-020-1047-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/27/2020] [Accepted: 05/27/2020] [Indexed: 11/09/2022]
Abstract
Bacteria and archaea possessing the hgcAB gene pair methylate inorganic mercury (Hg) to form highly toxic methylmercury. HgcA consists of a corrinoid binding domain and a transmembrane domain, and HgcB is a dicluster ferredoxin. However, their detailed structure and function have not been thoroughly characterized. We modeled the HgcAB complex by combining metagenome sequence data mining, coevolution analysis, and Rosetta structure calculations. In addition, we overexpressed HgcA and HgcB in Escherichia coli, confirmed spectroscopically that they bind cobalamin and [4Fe-4S] clusters, respectively, and incorporated these cofactors into the structural model. Surprisingly, the two domains of HgcA do not interact with each other, but HgcB forms extensive contacts with both domains. The model suggests that conserved cysteines in HgcB are involved in shuttling Hg(II), methylmercury, or both. These findings refine our understanding of the mechanism of Hg methylation and expand the known repertoire of corrinoid methyltransferases in nature.

Connor J. Cooper et al. expressed HgcA and HgcB in Escherichia coli and modeled the structure of the HgcAB complex by combining metagenome sequence data, coevolution analysis, and ab initio structure calculations. This study provides insights into the biochemical mechanism of mercury (Hg) methylation.
Affiliation(s)
- Connor J Cooper
- Graduate School of Genome Science and Technology, University of Tennessee, F225 Walters Life Science, Knoxville, TN, 37996, USA.,Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
| | - Kaiyuan Zheng
- Department of Biological Chemistry, University of Michigan Medical School, 1150 West Medical Center Drive, Ann Arbor, MI, 48109-0606, USA
| | - Katherine W Rush
- Department of Biological Chemistry, University of Michigan Medical School, 1150 West Medical Center Drive, Ann Arbor, MI, 48109-0606, USA
| | - Alexander Johs
- Environmental Sciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
| | - Brian C Sanders
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
| | - Georgios A Pavlopoulos
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA.,Institute for Fundamental Biomedical Research, Biomedical Science Research Center "Alexander Fleming", 34 Fleming Street, 16672, Vari, Greece
| | - Nikos C Kyrpides
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA.,Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Mircea Podar
- Graduate School of Genome Science and Technology, University of Tennessee, F225 Walters Life Science, Knoxville, TN, 37996, USA.,Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, 02138, USA
| | - Stephen W Ragsdale
- Department of Biological Chemistry, University of Michigan Medical School, 1150 West Medical Center Drive, Ann Arbor, MI, 48109-0606, USA
| | - Jerry M Parks
- Graduate School of Genome Science and Technology, University of Tennessee, F225 Walters Life Science, Knoxville, TN, 37996, USA.,Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
| |
|
207
|
Robins WP, Mekalanos JJ. Protein covariance networks reveal interactions important to the emergence of SARS coronaviruses as human pathogens. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020. [PMID: 32577639 DOI: 10.1101/2020.06.05.136887] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]
Abstract
SARS-CoV-2 is one of three recognized coronaviruses (CoVs) that have caused epidemics or pandemics in the 21st century and that have likely emerged from animal reservoirs based on genomic similarities to bat and other animal viruses. Here we report the analysis of conserved interactions between amino acid residues in proteins encoded by SARS-CoV-related viruses. We identified pairs and networks of residue variants that exhibited statistically high frequencies of covariance with each other. While these interactions are likely key to both protein structure and other protein-protein interactions, we have also found that they can be used to provide a new computational approach (CoVariance-based Phylogeny Analysis) for understanding viral evolution and adaptation. Our data provide evidence that the evolutionary processes that converted a bat virus into a human pathogen occurred through recombination with other viruses in combination with new adaptive mutations important for entry into human cells.
|
208
|
Chanda P, Costa E, Hu J, Sukumar S, Van Hemert J, Walia R. Information Theory in Computational Biology: Where We Stand Today. ENTROPY (BASEL, SWITZERLAND) 2020; 22:E627. [PMID: 33286399 PMCID: PMC7517167 DOI: 10.3390/e22060627] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 04/30/2020] [Revised: 05/31/2020] [Accepted: 06/03/2020] [Indexed: 12/30/2022]
Abstract
"A Mathematical Theory of Communication" was published in 1948 by Claude Shannon to address the problems in the field of data compression and communication over (noisy) communication channels. Since then, the concepts and ideas developed in Shannon's work have formed the basis of information theory, a cornerstone of statistical learning and inference, and have played a key role in disciplines such as physics and thermodynamics, probability and statistics, computational sciences and biological sciences. In this article we review the basic information-theoretic concepts and describe their key applications in multiple major areas of research in computational biology: gene expression and transcriptomics, alignment-free sequence comparison, sequencing and error correction, genome-wide disease-gene association mapping, metabolic networks and metabolomics, and protein sequence, structure and interaction analysis.
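As an illustration of the kind of information-theoretic quantity this review covers, the following minimal sketch computes Shannon entropy and the mutual information between two columns of a sequence alignment. It is not taken from the article; the function names and the toy columns are hypothetical, and the estimate uses raw empirical frequencies with no pseudocounts or sequence weighting.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(col_a, col_b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two equally long alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same length")
    joint = list(zip(col_a, col_b))  # paired symbols, one pair per sequence
    return entropy(col_a) + entropy(col_b) - entropy(joint)


# Toy alignment columns (hypothetical data, one character per sequence).
col_i = "AAGG"
col_j = "CCTT"  # varies together with col_i
col_k = "CTCT"  # varies independently of col_i

mi_coupled = mutual_information(col_i, col_j)
mi_independent = mutual_information(col_i, col_k)
```

In this toy case the perfectly co-varying pair carries one bit of mutual information and the independent pair carries none. Real alignments are far noisier, which is one reason corrected or model-based scores are often preferred over raw mutual information.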
Affiliation(s)
- Pritam Chanda
- Corteva Agriscience™, Indianapolis, IN 46268, USA
- Computer and Information Science, Indiana University-Purdue University, Indianapolis, IN 46202, USA
| | - Eduardo Costa
- Corteva Agriscience™, Mogi Mirim, Sao Paulo 13801-540, Brazil
| | - Jie Hu
- Corteva Agriscience™, Indianapolis, IN 46268, USA
| | | | | | - Rasna Walia
- Corteva Agriscience™, Johnston, IA 50131, USA
| |
|
209
|
Lee GR, Won J, Heo L, Seok C. GalaxyRefine2: simultaneous refinement of inaccurate local regions and overall protein structure. Nucleic Acids Res 2019; 47:W451-W455. [PMID: 31001635 PMCID: PMC6602442 DOI: 10.1093/nar/gkz288] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 01/31/2019] [Revised: 04/01/2019] [Accepted: 04/11/2019] [Indexed: 11/12/2022]
Abstract
The 3D structure of a protein can be predicted from its amino acid sequence with high accuracy for a large fraction of cases because of the availability of large quantities of experimental data and the advance of computational algorithms. Recently, deep learning methods exploiting the coevolution information obtained by comparing related protein sequences have been successfully used to generate highly accurate model structures even in the absence of template structure information. However, structures predicted based on either template structures or related sequences require further improvement in regions for which information is missing. Refining a predicted protein structure with insufficient information on certain regions is critical because these regions may be connected to functional specificity that is not conserved among related proteins. The GalaxyRefine2 web server, freely available via http://galaxy.seoklab.org/refine2, is an upgraded version of the GalaxyRefine protein structure refinement server and reflects recent developments successfully tested through CASP blind prediction experiments. This method adopts an iterative optimization approach involving various structure move sets to refine both local and global structures. The estimation of local error and hybridization of available homolog structures are also employed for effective conformation search.
Affiliation(s)
- Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| | - Jonghun Won
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| | - Lim Heo
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| | - Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
| |
|
210
|
Baldessari F, Capelli R, Carloni P, Giorgetti A. Coevolutionary data-based interaction networks approach highlighting key residues across protein families: The case of the G-protein coupled receptors. Comput Struct Biotechnol J 2020; 18:1153-1159. [PMID: 32489528 PMCID: PMC7260681 DOI: 10.1016/j.csbj.2020.05.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 02/27/2020] [Revised: 05/01/2020] [Accepted: 05/06/2020] [Indexed: 12/26/2022]
Abstract
We present an approach that, by integrating structural data with Direct Coupling Analysis, is able to pinpoint most of the interaction hotspots (i.e. key residues for the biological activity) across very sparse protein families in a single run. An application to the Class A G-protein coupled receptors (GPCRs), both in their active and inactive states, demonstrates the predictive power of our approach. The latter can be easily extended to any other kind of protein family, where it is expected to highlight most key sites involved in their functional activity.
Affiliation(s)
- Filippo Baldessari
- Department of Biotechnology, Università di Verona, Ca Vignal 1, strada Le Grazie 15, I-37134 Verona, Italy
| | - Riccardo Capelli
- Computational Biomedicine Section, IAS-5/INM-9, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany
| | - Paolo Carloni
- Computational Biomedicine Section, IAS-5/INM-9, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany
| | - Alejandro Giorgetti
- Department of Biotechnology, Università di Verona, Ca Vignal 1, strada Le Grazie 15, I-37134 Verona, Italy
- Computational Biomedicine Section, IAS-5/INM-9, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany
| |
|
211
|
Shapovalov M, Dunbrack RL, Vucetic S. Multifaceted analysis of training and testing convolutional neural networks for protein secondary structure prediction. PLoS One 2020; 15:e0232528. [PMID: 32374785 PMCID: PMC7202669 DOI: 10.1371/journal.pone.0232528] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/24/2019] [Accepted: 04/16/2020] [Indexed: 11/30/2022]
Abstract
Protein secondary structure prediction remains a vital topic with broad applications. Due to lack of a widely accepted standard in secondary structure predictor evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) new test sets, Test2018, Test2019, and Test2018-2019, consisting of proteins from structures released in 2018 and 2019 with less than 25% identity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins ≤25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) an additional test set that shares no homologous domains with the training set proteins, according to the Evolutionary Classification of Proteins (ECOD) database; (4) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction accuracy; (5) new 4- and 5-label prediction alphabets that may be more practical for tertiary structure prediction methods. The 3-label accuracy (helix, sheet, coil) of the leading predictors on both Test2018 and CB513 is 81-82%, while SecNet's accuracy is 84% for both sets. Accuracy on the non-homologous ECOD set is only 0.6 points (83.9%) lower than the results on the Test2018-2019 set (84.5%). The ablation study of features, neural network architecture, and training hyper-parameters suggests the best accuracy results are achieved with good choices for each of them while the neural network architecture is not as critical as long as it is not too simple. Protocols for generating and using unbiased test, validation, and training sets are provided. Our data sets, including input features and assigned labels, and SecNet software including third-party dependencies and databases, are downloadable from dunbrack.fccc.edu/ss and github.com/sh-maxim/ss.
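The 3-label accuracy quoted above is a per-residue agreement rate between predicted and observed labels. The sketch below shows one way such a score could be computed. It is not the SecNet code: the function names are hypothetical, and the 8-to-3 state reduction used here (H, G, I to helix; E, B to sheet; everything else to coil) is an assumed convention, since other reductions exist and shift the resulting number.

```python
# Assumed reduction of 8-state DSSP codes to 3 labels (helix H, sheet E, coil C).
# Any code not listed here (T, S, '-', ...) is treated as coil.
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def to_three_state(dssp):
    """Map a string of 8-state DSSP codes to a 3-label string."""
    return "".join(DSSP_TO_3.get(code, "C") for code in dssp)


def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have the same length")
    if not observed:
        raise ValueError("empty label strings")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy example with hypothetical labels for a 10-residue protein.
observed = to_three_state("HHHGTEEB-S")
predicted = "HHHCCEEEEC"
accuracy = q3_accuracy(predicted, observed)
```

Because the score depends on the reduction and on how similar the test proteins are to the training set, a figure such as 84% is only comparable across predictors when both are held fixed, which is the motivation for the test sets described in the abstract.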
Affiliation(s)
- Maxim Shapovalov
- Fox Chase Cancer Center, Philadelphia, PA, United States of America
- Temple University, Philadelphia, PA, United States of America
| | | | | |
|
212
|
Chen MC, Li Y, Zhu YH, Ge F, Yu DJ. SSCpred: Single-Sequence-Based Protein Contact Prediction Using Deep Fully Convolutional Network. J Chem Inf Model 2020; 60:3295-3303. [PMID: 32338512 DOI: 10.1021/acs.jcim.9b01207] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Ming-Cai Chen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Washtenaw 100, Ann Arbor, Michigan 48109-2218, United States
| | - Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| |
|
213
|
Katoh K, Rozewicki J, Yamada KD. MAFFT online service: multiple sequence alignment, interactive sequence choice and visualization. Brief Bioinform 2019; 20:1160-1166. [PMID: 28968734 PMCID: PMC6781576 DOI: 10.1093/bib/bbx108] [Citation(s) in RCA: 4203] [Impact Index Per Article: 840.6] [Received: 06/30/2017] [Revised: 07/27/2017] [Indexed: 11/28/2022]
Abstract
This article describes several features in the MAFFT online service for multiple sequence alignment (MSA). As a result of recent advances in sequencing technologies, huge numbers of biological sequences are available and the need for MSAs with large numbers of sequences is increasing. To extract biologically relevant information from such data, sophistication of algorithms is necessary but not sufficient. Intuitive and interactive tools for experimental biologists to semiautomatically handle large data are becoming important. We are working on development of MAFFT toward these two directions. Here, we explain (i) the Web interface for recently developed options for large data and (ii) interactive usage to refine sequence data sets and MSAs.
Affiliation(s)
- Kazutaka Katoh
- Corresponding author: Kazutaka Katoh, 3-1 Yamadaoka, Suita, Osaka 565-0871, Japan
| | | | | |
|
214
|
Protein Contact Map Prediction Based on ResNet and DenseNet. BIOMED RESEARCH INTERNATIONAL 2020; 2020:7584968. [PMID: 32337273 PMCID: PMC7165324 DOI: 10.1155/2020/7584968] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 02/04/2020] [Accepted: 03/05/2020] [Indexed: 11/18/2022]
Abstract
Residue-residue contact prediction has become an increasingly important tool for modeling the three-dimensional structure of a protein when no homologous structure is available. Ultradeep residual neural network (ResNet) has become the most popular method for making contact predictions because it captures the contextual information between residues. In this paper, we propose a novel deep neural network framework for contact prediction which combines ResNet and DenseNet. This framework uses 1D ResNet to process sequential features, and besides PSSM, SS3, and solvent accessibility, we have introduced a new feature, position-specific frequency matrix (PSFM), as an input. Using ResNet's residual module and identity mapping, it can effectively process sequential features after which the outer concatenation function is used for sequential and pairwise features. Prediction accuracy is improved following a final processing step using the dense connection of DenseNet. The prediction accuracy of the protein contact map shows that our method is more effective than other popular methods due to the new network architecture and the added feature input.
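The outer concatenation step mentioned above turns per-residue (sequential) features into per-pair features, so that a 2D network can consume them alongside pairwise inputs. A minimal sketch of one common form of this operation follows. It is an assumption about the general idea and not the authors' implementation; the function name is hypothetical, plain Python lists stand in for tensors, and some published variants also append features of the midpoint residue.

```python
def outer_concatenate(features):
    """Build an L x L grid of pair features from L per-residue feature vectors.

    Cell [i][j] holds the features of residue i followed by those of residue j,
    so each pair vector is twice as long as a single residue vector.
    """
    return [[f_i + f_j for f_j in features] for f_i in features]


# Toy input: 3 residues with 2 features each (hypothetical values).
residue_features = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]
pair_features = outer_concatenate(residue_features)
```

The result is not symmetric: cell (i, j) and cell (j, i) hold the same values in swapped order. This is why contact predictors typically symmetrize the final output map.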
|
215
|
Fantini M, Lisi S, De Los Rios P, Cattaneo A, Pastore A. Protein Structural Information and Evolutionary Landscape by In Vitro Evolution. Mol Biol Evol 2020; 37:1179-1192. [PMID: 31670785 PMCID: PMC7086169 DOI: 10.1093/molbev/msz256] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 11/13/2022]
Abstract
Protein structure is tightly intertwined with function according to the laws of evolution. Understanding how structure determines function has been the aim of structural biology for decades. Here, we have wondered instead whether it is possible to exploit the function for which a protein was evolutionarily selected to gain information on protein structure and on the landscape explored during the early stages of molecular and natural evolution. To answer this question, we developed a new methodology, which we named CAMELS (Coupling Analysis by Molecular Evolution Library Sequencing), that is able to obtain the in vitro evolution of a protein from an artificial selection based on function. We were able to observe with CAMELS many features of the TEM-1 beta-lactamase local fold exclusively by generating and sequencing large libraries of mutational variants. We demonstrated that we can, whenever a functional phenotypic selection of a protein is available, sketch the structural and evolutionary landscape of a protein without utilizing purified proteins, collecting physical measurements, or relying on the pool of natural protein variants.
Affiliation(s)
- Marco Fantini
- BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
| | - Simonetta Lisi
- BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
| | - Paolo De Los Rios
- Institute of Physics, School of Basic Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
| | - Antonino Cattaneo
- BioSNS Laboratory of Biology, Scuola Normale Superiore (SNS), Pisa, Italy
- European Brain Research Institute, Rome, Italy
| | - Annalisa Pastore
- Department of Clinical and Basic Neuroscience, Maurice Wohl Institute, King's College London, London, United Kingdom
- Dementia Research Institute, King’s College London, London, United Kingdom
| |
|
216
|
Koukos P, Bonvin A. Integrative Modelling of Biomolecular Complexes. J Mol Biol 2020; 432:2861-2881. [DOI: 10.1016/j.jmb.2019.11.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 09/14/2019] [Revised: 11/12/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022]
|
217
|
Fang C, Jia Y, Hu L, Lu Y, Wang H. IMPContact: An Interhelical Residue Contact Prediction Method. BIOMED RESEARCH INTERNATIONAL 2020; 2020:4569037. [PMID: 32309431 PMCID: PMC7140131 DOI: 10.1155/2020/4569037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2020] [Accepted: 03/09/2020] [Indexed: 11/17/2022]
Abstract
As an important category of proteins, alpha-helix transmembrane proteins (αTMPs) play an important role in various biological activities. Because few αTMP structures have been solved, predicting the residue contacts among the transmembrane segments of an αTMP can reveal the basis of the protein fold, which can in turn help uncover protein function. A few efforts have been devoted to predicting interhelical residue contacts (IHRCs) using machine learning methods based on prior knowledge of transmembrane protein structure. However, improving the prediction accuracy remains a challenge, and deep learning provides an opportunity to use this structural knowledge in a different way. For this purpose, we proposed a novel αTMP residue-residue contact prediction method, IMPContact, in which a convolutional neural network (CNN) was applied to recognize interhelical contacts in a TMP using its specific structural features. Four sequence-based TMP-specific features were selected to describe a pair of residues, namely, evolutionary covariation, predicted topology structure, residue relative position, and evolutionary conservation. An up-to-date dataset was used to train and test IMPContact; our method achieved better performance than peer methods. In the case studies, IHRCs in the regular transmembrane helices were better predicted than those in the irregular ones.
Affiliation(s)
- Chao Fang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Yajie Jia
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
| | - Lihong Hu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| | - Yinghua Lu
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
| | - Han Wang
- School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Institute of Computational Biology, Northeast Normal University, Changchun 130117, China
- Department of Computer Science, College of Humanities & Sciences of Northeast Normal University, Changchun 130117, China
| |
|
218
|
Blazejewski T, Ho HI, Wang HH. Synthetic sequence entanglement augments stability and containment of genetic information in cells. Science 2019; 365:595-598. [PMID: 31395784 DOI: 10.1126/science.aav5477] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 09/28/2018] [Revised: 06/21/2019] [Accepted: 07/15/2019] [Indexed: 12/28/2022]
Abstract
In synthetic biology, methods for stabilizing genetically engineered functions and confining recombinant DNA to intended hosts are necessary to cope with natural mutation accumulation and pervasive lateral gene flow. We present a generalizable strategy to preserve and constrain genetic information through the computational design of overlapping genes. Overlapping a sequence with an essential gene altered its fitness landscape and produced a constrained evolutionary path, even for synonymous mutations. Embedding a toxin gene in a gene of interest restricted its horizontal propagation. We further demonstrated a multiplex and scalable approach to build and test >7500 overlapping sequence designs, yielding functional yet highly divergent variants from natural homologs. This work enables deeper exploration of natural and engineered overlapping genes and facilitates enhanced genetic stability and biocontainment in emerging applications.
Affiliation(s)
- Tomasz Blazejewski
- Department of Systems Biology, Columbia University, New York, NY, USA.,Integrated Program in Cellular, Molecular, and Biomedical Studies, Columbia University, New York, NY, USA
| | - Hsing-I Ho
- Department of Systems Biology, Columbia University, New York, NY, USA
| | - Harris H Wang
- Department of Systems Biology, Columbia University, New York, NY, USA.,Department of Pathology and Cell Biology, Columbia University, New York, NY, USA
| |
|
219
|
Abriata LA, Dal Peraro M. Will Cryo-Electron Microscopy Shift the Current Paradigm in Protein Structure Prediction? J Chem Inf Model 2020; 60:2443-2447. [PMID: 32134661 DOI: 10.1021/acs.jcim.0c00177] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/29/2022]
Abstract
Protein dynamics is undoubtedly a pervasive ingredient in all biological functions. However, structural biology has been strongly driven by a static-centered view of protein architecture. We argue that the recent advances of cryo-electron microscopy (EM) have the potential to more broadly explore the conformational landscapes of protein complexes and therefore will enhance our ability to predict the diverse conformations of tertiary and quaternary protein structures that are functionally relevant in physiological conditions.
Affiliation(s)
- Luciano A Abriata
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland.,Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
| | - Matteo Dal Peraro
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland.,Swiss Institute of Bioinformatics (SIB), CH-1015 Lausanne, Switzerland
| |
|
220
|
|
221
|
Karczyńska AS, Ziȩba K, Uciechowska U, Mozolewska MA, Krupa P, Lubecka EA, Lipska AG, Sikorska C, Samsonov SA, Sieradzan AK, Giełdoń A, Liwo A, Ślusarz R, Ślusarz M, Lee J, Joo K, Czaplewski C. Improved Consensus-Fragment Selection in Template-Assisted Prediction of Protein Structures with the UNRES Force Field in CASP13. J Chem Inf Model 2020; 60:1844-1864. [PMID: 31999919 PMCID: PMC7588044 DOI: 10.1021/acs.jcim.9b00864] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/30/2022]
Abstract
The method for protein-structure prediction, which combines the physics-based coarse-grained UNRES force field with knowledge-based modeling, has been developed further and tested in the 13th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13). The method implements restraints from the consensus fragments common to server models. In this work, the server models to derive fragments have been chosen on the basis of quality assessment; a fully automatic fragment-selection procedure has been introduced, and Dynamic Fragment Assembly pseudopotentials have been fully implemented. The Global Distance Test Score (GDT_TS), averaged over our “Model 1” predictions, increased by over 10 units with respect to CASP12 for the free-modeling category to reach 40.82. Our “Model 1” predictions ranked 20 and 14 for all and free-modeling targets, respectively (upper 20.2% and 14.3% of all models submitted to CASP13 in these categories, respectively), compared to 27 (upper 21.1%) and 24 (upper 18.9%) in CASP12, respectively. For oligomeric targets, the Interface Patch Similarity (IPS) and Interface Contact Similarity (ICS) averaged over our best oligomer models increased from 0.28 to 0.36 and from 12.4 to 17.8, respectively, from CASP12 to CASP13, and top-ranking models of 2 targets (H0968 and T0997o) were obtained (none in CASP12). The improvement of our method in CASP13 over CASP12 was ascribed to the combined effect of the overall enhancement of server-model quality, our success in selecting server models and fragments to derive restraints, and improvements of the restraint and potential-energy functions.
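For readers unfamiliar with the GDT_TS figures quoted above, the score averages, over distance cutoffs of 1, 2, 4, and 8 Å, the percentage of residues whose modeled position lies within the cutoff of the reference position. The sketch below is a simplified illustration and not the CASP evaluation code: it assumes the per-residue distances come from a single, already computed superposition, whereas the official score searches for the best superposition at each cutoff. The function name and the distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    `distances` holds one model-to-reference distance (in angstroms) per residue,
    taken from a single fixed superposition (a simplifying assumption).
    """
    if not distances:
        raise ValueError("at least one residue distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean over all cutoffs.
    fractions = [sum(d <= cutoff for d in distances) / n for cutoff in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: five residues with hypothetical deviations from the reference.
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
score = gdt_ts(example_distances)
```

Because a fixed superposition can only underestimate the per-cutoff coverage, this simplified value is a lower bound on the score an optimizing implementation would report for the same model.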
Affiliation(s)
| | - Karolina Ziȩba
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Urszula Uciechowska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Magdalena A Mozolewska
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, Warsaw PL-02668, Poland
| | - Paweł Krupa
- Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46, Warsaw PL-02668, Poland
| | - Emilia A Lubecka
- Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Wita Stwosza 57, Gdańsk 80-308, Poland
| | - Agnieszka G Lipska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Celina Sikorska
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Sergey A Samsonov
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Adam K Sieradzan
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland.,School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Artur Giełdoń
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland.,School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Rafał Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Magdalena Ślusarz
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| | - Jooyoung Lee
- School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
| | - Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, Gdańsk 80-308, Poland
| |
|
222
|
Fukuda H, Tomii K. DeepECA: an end-to-end learning framework for protein contact prediction from a multiple sequence alignment. BMC Bioinformatics 2020; 21:10. [PMID: 31918654 PMCID: PMC6953294 DOI: 10.1186/s12859-019-3190-x] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/16/2019] [Accepted: 11/04/2019] [Indexed: 12/30/2022]
Abstract
Background: Recently developed methods of protein contact prediction, a crucially important step for protein structure prediction, depend heavily on deep neural networks (DNNs) and multiple sequence alignments (MSAs) of target proteins. Protein sequences are accumulating to an increasing degree such that abundant sequences to construct an MSA of a target protein are readily obtainable. Nevertheless, the number of sequences that can be included in an MSA used for contact prediction differs greatly from case to case. Abundant sequences might degrade prediction results, whereas for other targets only a limited number of sequences is available to construct an MSA. To resolve these persistent issues, we strove to develop a novel framework using DNNs in an end-to-end manner for contact prediction. Results: We developed neural network models to improve precision for both deep and shallow MSAs. Results show that higher prediction accuracy was achieved by assigning weights to sequences in a deep MSA. Moreover, for shallow MSAs, adding a few sequential features was useful to increase the prediction accuracy of long-range contacts in our model. Based on these models, we expanded our model to a multi-task model to achieve higher accuracy by incorporating predictions of secondary structures and solvent-accessible surface areas. Moreover, we demonstrated that ensemble averaging of our models can raise accuracy. Using past CASP target protein domains, we tested our models and demonstrated that our final model is superior to or equivalent to existing meta-predictors. Conclusions: The end-to-end learning framework we built can use information derived from either deep or shallow MSAs for contact prediction. Recently, an increasing number of protein sequences have become accessible, including metagenomic sequences, which might degrade contact prediction results. Under such circumstances, our model can provide a means to reduce noise automatically. According to results of tertiary structure prediction based on contacts and secondary structures predicted by our model, more accurate three-dimensional models of a target protein are obtainable than those from existing ECA methods, starting from its MSA. DeepECA is available from https://github.com/tomiilab/DeepECA.
Affiliation(s)
- Hiroyuki Fukuda
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba-ken, 277-8562, Japan
- Kentaro Tomii
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba-ken, 277-8562, Japan
  - Artificial Intelligence Research Center (AIRC), Biotechnology Research Institute for Drug Discovery, Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo, 135-0064, Japan
223
Improved protein structure prediction using predicted interresidue orientations. Proc Natl Acad Sci U S A 2020; 117:1496-1503. [PMID: 31896580 DOI: 10.1073/pnas.1914677117] [Citation(s) in RCA: 865] [Impact Index Per Article: 173.0] [Indexed: 02/08/2023]
Abstract
The prediction of interresidue contacts and distances from coevolutionary data using deep learning has considerably advanced protein structure prediction. Here, we build on these advances by developing a deep residual network for predicting interresidue orientations, in addition to distances, and a Rosetta-constrained energy-minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on 13th Community-Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP13)- and Continuous Automated Model Evaluation (CAMEO)-derived sets, the method outperforms all previously described structure-prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo-designed proteins, identifying the key fold-determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.
224
Teppa E, Nadalin F, Combet C, Zea DJ, David L, Carbone A. Coevolution analysis of amino-acids reveals diversified drug-resistance solutions in viral sequences: a case study of hepatitis B virus. Virus Evol 2020; 6:veaa006. [PMID: 32158552 PMCID: PMC7050494 DOI: 10.1093/ve/veaa006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 02/07/2023]
Abstract
The study of mutational landscapes of viral proteins is fundamental for the understanding of the mechanisms of cross-resistance to drugs and the design of effective therapeutic strategies based on several drugs. Antiviral therapy with nucleos(t)ide analogues targeting the hepatitis B virus (HBV) polymerase protein (Pol) can inhibit disease progression by suppression of HBV replication and makes it an important case study. In HBV, treatment may fail due to the emergence of drug-resistant mutants. Primary and compensatory mutations have been associated with lamivudine resistance, whereas more complex mutational patterns are responsible for resistance to other HBV antiviral drugs. So far, all known drug-resistance mutations are located in one of the four Pol domains, called reverse transcriptase. We demonstrate that sequence covariation identifies drug-resistance mutations in viral sequences. A new algorithmic strategy, BIS2TreeAnalyzer, is designed to apply the coevolution analysis method BIS2, successfully used in the past on small sets of conserved sequences, to large sets of evolutionary related sequences. When applied to HBV, BIS2TreeAnalyzer highlights diversified viral solutions by discovering thirty-seven positions coevolving with residues known to be associated with drug resistance and located on the four Pol domains. These results suggest a sequential mechanism of emergence for some mutational patterns. They reveal complex combinations of positions involved in HBV drug resistance and contribute with new information to the landscape of HBV evolutionary solutions. The computational approach is general and can be applied to other viral sequences when compensatory mutations are presumed.
Affiliation(s)
- Elin Teppa
  - Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
  - Sorbonne Université, Institut des Sciences du Calcul et des Données (ISCD), 4 Place Jussieu, 75005 Paris, France
- Francesca Nadalin
  - Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
  - Institute Curie, PSL Research University, INSERM U932, Immunity and Cancer Department, 26 rue d’Ulm, 75248 Paris, France
- Christophe Combet
  - Univ Lyon, Université Claude Bernard Lyon 1, INSERM 1052, CNRS 5286, Centre Léon Bérard, Centre de recherche en cancérologie de Lyon, 151 Cours Albert Thomas, 69424 Lyon, France
- Diego Javier Zea
  - Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
- Laurent David
  - Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
- Alessandra Carbone
  - Sorbonne Université, Univ P6, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB) - UMR 7238, 4 Place Jussieu, 75005 Paris, France
  - Institut Universitaire de France, 1 rue Descartes, 75231 Paris, France
225
Contreras S, Bertolani SJ, Siegel JB. A Benchmark for Homomeric Enzyme Active Site Structure Prediction Highlights the Importance of Accurate Modeling of Protein Symmetry. ACS Omega 2019; 4:22356-22362. [PMID: 31909318 PMCID: PMC6941179 DOI: 10.1021/acsomega.9b02636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/15/2019] [Accepted: 12/04/2019] [Indexed: 05/15/2023]
Abstract
Accurate prediction and modeling of an enzyme's active site are critical for engineering efforts as well as providing insight into an enzyme's naturally occurring function. Previous efforts demonstrated that the integration of constraints enforcing strict geometric orientations between catalytic residues significantly improved the modeling accuracy for the active sites of monomeric enzymes. In this study, a similar approach was explored to evaluate the effect on the active sites of homomeric enzymes. A benchmark of 17 homomeric enzymes with known structures and a bound ligand relevant to the established chemistry were identified from the protein data bank. The enzymes identified span multiple classes as well as symmetries. Unlike what was observed for the monomeric enzymes, upon the application of catalytic geometric constraints, there was no significant improvement observed in modeling accuracy for either the active site of the protein structure or the accuracy of the subsequently docked ligand. Upon further analysis, it is apparent that the symmetric interface being modeled is inaccurate and prevented the active sites from being modeled at atomic-level accuracy. This is consistent with the challenge others have identified in being able to predict de novo protein symmetry. To further improve the accuracy of active site modeling for homomeric proteins, new methodologies to accurately model the symmetric interfaces of these complexes are needed.
Affiliation(s)
- Stephanie C. Contreras
  - Department of Chemistry, Department of Biochemistry and Molecular Medicine, and Genome Center, University of California, Davis, Davis, California 95616, United States
- Steve J. Bertolani
  - Department of Chemistry, Department of Biochemistry and Molecular Medicine, and Genome Center, University of California, Davis, Davis, California 95616, United States
- Justin B. Siegel
  - Department of Chemistry, Department of Biochemistry and Molecular Medicine, and Genome Center, University of California, Davis, Davis, California 95616, United States
226
Badaczewska-Dawid AE, Kolinski A, Kmiecik S. Computational reconstruction of atomistic protein structures from coarse-grained models. Comput Struct Biotechnol J 2019; 18:162-176. [PMID: 31969975 PMCID: PMC6961067 DOI: 10.1016/j.csbj.2019.12.007] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 01/02/2023]
Abstract
Three-dimensional protein structures, whether determined experimentally or theoretically, are often too low resolution. In this mini-review, we outline the computational methods for protein structure reconstruction from incomplete coarse-grained to all atomistic models. Typical reconstruction schemes can be divided into four major steps. Usually, the first step is reconstruction of the protein backbone chain starting from the C-alpha trace. This is followed by side-chains rebuilding based on protein backbone geometry. Subsequently, hydrogen atoms can be reconstructed. Finally, the resulting all-atom models may require structure optimization. Many methods are available to perform each of these tasks. We discuss the available tools and their potential applications in integrative modeling pipelines that can transfer coarse-grained information from computational predictions, or experiment, to all atomistic structures.
Affiliation(s)
- Sebastian Kmiecik
  - Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
227
Ryl PSJ, Bohlke-Schneider M, Lenz S, Fischer L, Budzinski L, Stuiver M, Mendes MML, Sinn L, O'Reilly FJ, Rappsilber J. In Situ Structural Restraints from Cross-Linking Mass Spectrometry in Human Mitochondria. J Proteome Res 2019; 19:327-336. [PMID: 31746214 PMCID: PMC7010328 DOI: 10.1021/acs.jproteome.9b00541] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Indexed: 02/07/2023]
Abstract
The field of structural biology is increasingly focusing on studying proteins in situ, i.e., in their greater biological context. Cross-linking mass spectrometry (CLMS) is contributing to this effort, typically through the use of mass spectrometry (MS)-cleavable cross-linkers. Here, we apply the popular noncleavable cross-linker disuccinimidyl suberate (DSS) to human mitochondria and identify 5518 distance restraints between protein residues. Each distance restraint on proteins or their interactions provides structural information within mitochondria. Comparing these restraints to protein data bank (PDB)-deposited structures and comparative models reveals novel protein conformations. Our data suggest, among others, substrates and protein flexibility of mitochondrial heat shock proteins. Through this study, we bring forward two central points for the progression of CLMS towards large-scale in situ structural biology: First, clustered conflicts of cross-link data reveal in situ protein conformation states in contrast to error-rich individual conflicts. Second, noncleavable cross-linkers are compatible with proteome-wide studies.
Affiliation(s)
- Petra S J Ryl
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Michael Bohlke-Schneider
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Swantje Lenz
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Lutz Fischer
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
  - Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, Scotland, United Kingdom
- Lisa Budzinski
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Marchel Stuiver
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Marta M L Mendes
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Ludwig Sinn
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Francis J O'Reilly
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Juri Rappsilber
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
  - Wellcome Centre for Cell Biology, School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3BF, Scotland, United Kingdom
228
Ding X, Zou Z, Brooks III CL. Deciphering protein evolution and fitness landscapes with latent space models. Nat Commun 2019; 10:5644. [PMID: 31822668 PMCID: PMC6904478 DOI: 10.1038/s41467-019-13633-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 01/15/2019] [Accepted: 11/12/2019] [Indexed: 12/03/2022]
Abstract
Protein sequences contain rich information about protein evolution, fitness landscapes, and stability. Here we investigate how latent space models trained using variational auto-encoders can infer these properties from sequences. Using both simulated and real sequences, we show that the low dimensional latent space representation of sequences, calculated using the encoder model, captures both evolutionary and ancestral relationships between sequences. Together with experimental fitness data and Gaussian process regression, the latent space representation also enables learning the protein fitness landscape in a continuous low dimensional space. Moreover, the model is also useful in predicting protein mutational stability landscapes and quantifying the importance of stability in shaping protein evolution. Overall, we illustrate that the latent space models learned using variational auto-encoders provide a mechanism for exploration of the rich data contained in protein sequences regarding evolution, fitness and stability and hence are well-suited to help guide protein engineering efforts.
Affiliation(s)
- Xinqiang Ding
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Zhengting Zou
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Charles L Brooks III
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
  - Biophysics Program, University of Michigan, Ann Arbor, MI, 48109, USA
229
Tang N, Sandahl TD, Ott P, Kepp KP. Computing the Pathogenicity of Wilson's Disease ATP7B Mutations: Implications for Disease Prevalence. J Chem Inf Model 2019; 59:5230-5243. [PMID: 31751128 DOI: 10.1021/acs.jcim.9b00852] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/14/2022]
Abstract
Genetic variations in the gene encoding the copper-transport protein ATP7B are the primary cause of Wilson's disease. Controversially, clinical prevalence seems much smaller than the prevalence estimated by genetic screening tools, causing fear that many people are undiagnosed, although early diagnosis and treatment is essential. To address this issue, we benchmarked 16 state-of-the-art computational disease-prediction methods against established data of missense ATP7B mutations. Our results show that the quality of the methods varies widely. We show the importance of optimizing the threshold of the methods used to distinguish pathogenic from nonpathogenic mutations against data of clinically confirmed pathogenic and nonpathogenic mutations. We find that most methods use thresholds that predict too many ATP7B mutations to be pathogenic. Thus, our findings explain the current controversy on Wilson's disease prevalence because meta-analysis and text search methods include many computational estimates that lead to higher disease prevalence than clinically observed. As proteins and diseases differ widely, a one-size-fits-all threshold cannot distinguish pathogenic and nonpathogenic mutations efficiently, as shown here. We also show that amino acid changes with small evolutionary substitution probability, mainly due to amino acid volume, are more associated with the disease, implying a pathological effect on the conformational state of the protein, which could affect copper transport or adenosine triphosphate recognition and hydrolysis. These findings may be a first step toward a more quantitative genotype-phenotype relationship of Wilson's disease.
Affiliation(s)
- Ning Tang
  - DTU Chemistry, Technical University of Denmark, Kemitorvet 206, 2800 Kongens Lyngby, Denmark
- Thomas D Sandahl
  - Department of Hepatology and Gastroenterology, Aarhus University Hospital, 8200 Aarhus, Denmark
- Peter Ott
  - Department of Hepatology and Gastroenterology, Aarhus University Hospital, 8200 Aarhus, Denmark
- Kasper P Kepp
  - DTU Chemistry, Technical University of Denmark, Kemitorvet 206, 2800 Kongens Lyngby, Denmark
230
Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, Fiser A. Assessing the accuracy of contact predictions in CASP13. Proteins 2019; 87:1058-1068. [PMID: 31587357 PMCID: PMC6851495 DOI: 10.1002/prot.25819] [Citation(s) in RCA: 50] [Impact Index Per Article: 8.3] [Received: 04/25/2019] [Revised: 09/17/2019] [Accepted: 09/17/2019] [Indexed: 01/07/2023]
Abstract
The accuracy of sequence-based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After 4 years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held 2 years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the corresponding length of the protein in residues, has exceeded 70%. As a comparison, the best-performing group at CASP12 with a 47% precision would have finished below the top 1/3 of the CASP13 groups. Extensively trained deep neural network approaches dominate the top performing algorithms, which appear to efficiently integrate information on coevolving residues and interacting fragments or possibly utilize memories of sequence similarities and sometimes can deliver accurate results even in the absence of virtually any target specific evolutionary information. If the current performance is evaluated by F-score on L contacts, it stands around 24% right now, which, despite the tremendous impact and advance in improving its utility for structure modeling, also suggests that there is much room left for further improvement.
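The two evaluation measures named in the abstract above, precision of the top L/5 predicted contacts and F-score on L contacts, can be illustrated with a minimal Python sketch. It is not the CASP13 assessors' code: the function names, the dictionary-based input format, and the default sequence-separation cutoff of 24 residues (a common long-range convention) are assumptions made for illustration.

```python
def rank_pairs(scores, min_sep):
    """Return residue pairs sorted by descending score, keeping only
    pairs whose sequence separation j - i is at least min_sep."""
    eligible = [(pair, s) for pair, s in scores.items()
                if pair[1] - pair[0] >= min_sep]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [pair for pair, _ in eligible]


def precision_at(scores, true_contacts, k, min_sep=24):
    """Fraction of the k highest-scoring eligible pairs that are true contacts."""
    top = rank_pairs(scores, min_sep)[:k]
    if not top:
        return 0.0
    hits = sum(1 for pair in top if pair in true_contacts)
    return hits / len(top)


def f_score_at(scores, true_contacts, k, min_sep=24):
    """Harmonic mean of precision and recall over the top k eligible pairs."""
    top = rank_pairs(scores, min_sep)[:k]
    relevant = {p for p in true_contacts if p[1] - p[0] >= min_sep}
    hits = sum(1 for pair in top if pair in relevant)
    if not top or not relevant or hits == 0:
        return 0.0
    precision = hits / len(top)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical data): a 10-residue protein, so L/5 = 2.
L = 10
toy_scores = {(0, 8): 0.9, (1, 9): 0.8, (0, 7): 0.7, (2, 3): 0.99}
toy_true = {(0, 8), (0, 7)}
top_l5_precision = precision_at(toy_scores, toy_true, k=L // 5, min_sep=6)
```

In the toy data the pair (2, 3) has the highest score but is discarded by the separation filter, which is why short-range pairs do not inflate the long-range figure.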
Affiliation(s)
- Rojan Shrestha
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Eduardo Fajardo
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Nelson Gil
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Andriy Kryshtafovych
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Bohdan Monastyrskyy
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Andras Fiser
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
231
Zheng W, Li Y, Zhang C, Pearce R, Mortuza SM, Zhang Y. Deep-learning contact-map guided protein structure prediction in CASP13. Proteins 2019; 87:1149-1164. [PMID: 31365149 PMCID: PMC6851476 DOI: 10.1002/prot.25792] [Citation(s) in RCA: 131] [Impact Index Per Article: 21.8] [Received: 04/30/2019] [Revised: 07/14/2019] [Accepted: 07/27/2019] [Indexed: 12/28/2022]
Abstract
We report the results of two fully automated structure prediction pipelines, "Zhang-Server" and "QUARK", in CASP13. The pipelines were built upon the C-I-TASSER and C-QUARK programs, which in turn are based on I-TASSER and QUARK but with three new modules: (a) a novel multiple sequence alignment (MSA) generation protocol to construct deep sequence-profiles for contact prediction; (b) an improved meta-method, NeBcon, which combines multiple contact predictors, including ResPRE that predicts contact-maps by coupling precision-matrices with deep residual convolutional neural-networks; and (c) an optimized contact potential to guide structure assembly simulations. For 50 CASP13 FM domains that lacked homologous templates, average TM-scores of the first models produced by C-I-TASSER and C-QUARK were 28% and 56% higher than those constructed by I-TASSER and QUARK, respectively. For the first time, contact-map predictions demonstrated usefulness on TBM domains with close homologous templates, where TM-scores of C-I-TASSER models were significantly higher than those of I-TASSER models with a P-value <.05. Detailed data analyses showed that the success of C-I-TASSER and C-QUARK was mainly due to the increased accuracy of deep-learning-based contact-maps, as well as the careful balance between sequence-based contact restraints, threading templates, and generic knowledge-based potentials. Nevertheless, challenges still remain for predicting quaternary structure of multi-domain proteins, due to the difficulties in domain partitioning and domain reassembly. In addition, contact prediction in terminal regions was often unsatisfactory due to the sparsity of MSAs. Development of new contact-based domain partitioning and assembly methods and training contact models on sparse MSAs may help address these issues.
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- S M Mortuza
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan
232
Li Y, Zhang C, Bell EW, Yu DJ, Zhang Y. Ensembling multiple raw coevolutionary features with deep residual neural networks for contact-map prediction in CASP13. Proteins 2019; 87:1082-1091. [PMID: 31407406 PMCID: PMC6851483 DOI: 10.1002/prot.25798] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 04/17/2019] [Revised: 07/20/2019] [Accepted: 08/08/2019] [Indexed: 12/26/2022]
Abstract
We report the results of residue-residue contact prediction of a new pipeline built purely on the learning of coevolutionary features in the CASP13 experiment. For a query sequence, the pipeline starts with the collection of multiple sequence alignments (MSAs) from multiple genome and metagenome sequence databases using two complementary Hidden Markov Model (HMM)-based searching tools. Three profile matrices, built on covariance, precision, and pseudolikelihood maximization respectively, are then created from the MSAs, which are used as the input features of a deep residual convolutional neural network architecture for contact-map training and prediction. Two ensembling strategies have been proposed to integrate the matrix features through end-to-end training and stacking, resulting in two complementary programs called TripletRes and ResTriplet, respectively. For the 31 free-modeling domains that do not have homologous templates in the PDB, TripletRes and ResTriplet generated comparable results with an average accuracy of 0.640 and 0.646, respectively, for the top L/5 long-range predictions, where 71% and 74% of the cases have an accuracy above 0.5. Detailed data analyses showed that the strength of the pipeline is due to the sensitive MSA construction and the advanced strategies for coevolutionary feature ensembling. Domain splitting was also found to help enhance the contact prediction performance. Nevertheless, contact models for tail regions, which often involve a high number of alignment gaps, and for targets with few homologous sequences are still suboptimal. Development of new approaches where the model is specifically trained on these regions and targets might help address these problems.
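As a rough illustration of the covariance feature mentioned in the abstract above, the sketch below computes, for a single pair of alignment columns, the covariance between amino-acid indicator variables, cov(a, b) = f_ij(a, b) - f_i(a) f_j(b). This is not the TripletRes or ResTriplet implementation: the function names, the toy alignment, and the Frobenius-norm summary are assumptions, and sequence weighting, pseudocounts, gap handling, and the inversion step that yields a precision matrix are all omitted.

```python
from collections import Counter
from math import sqrt


def pair_covariance(msa, i, j):
    """Covariance between residue indicators at columns i and j of an MSA.

    msa is a list of equal-length aligned sequences (strings). The result
    maps (a, b) to f_ij(a, b) - f_i(a) * f_j(b) for every residue a seen
    in column i and every residue b seen in column j.
    """
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)
    f_j = Counter(seq[j] for seq in msa)
    f_ij = Counter((seq[i], seq[j]) for seq in msa)
    return {
        (a, b): f_ij[(a, b)] / n - (f_i[a] / n) * (f_j[b] / n)
        for a in f_i
        for b in f_j
    }


def coupling_strength(cov):
    """Frobenius norm of a covariance block (a hypothetical one-number summary)."""
    return sqrt(sum(value * value for value in cov.values()))


# Toy alignment (hypothetical): columns 0 and 1 covary perfectly.
toy_msa = ["AK", "AK", "DE", "DE"]
toy_cov = pair_covariance(toy_msa, 0, 1)
toy_strength = coupling_strength(toy_cov)
```

In a full pipeline one such block per column pair would be assembled into a larger matrix before being passed to a network; the single-pair version is kept here only to make the formula concrete.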
Affiliation(s)
- Yang Li
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, China, 210094
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Eric W. Bell
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing, China, 210094
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109 USA
233
Coevolutive, evolutive and stochastic information in protein-protein interactions. Comput Struct Biotechnol J 2019; 17:1429-1435. [PMID: 31871588 PMCID: PMC6906720 DOI: 10.1016/j.csbj.2019.10.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/01/2019] [Revised: 10/19/2019] [Accepted: 10/22/2019] [Indexed: 11/24/2022]
Abstract
Here, we investigate the contributions of coevolutive, evolutive and stochastic information in determining protein-protein interactions (PPIs) based on primary sequences of two interacting protein families A and B. Specifically, under the assumption that coevolutive information is imprinted on the interacting amino acids of two proteins in contrast to other (evolutive and stochastic) sources spread over their sequences, we dissect those contributions in terms of compensatory mutations at physically-coupled and uncoupled amino acids of A and B. We find that physically-coupled amino-acids at short range distances store the largest per-contact mutual information content, with a significant fraction of that content resulting from coevolutive sources alone. The information stored in coupled amino acids is shown further to discriminate multi-sequence alignments (MSAs) with the largest expectation fraction of PPI matches – a conclusion that holds against various definitions of intermolecular contacts and binding modes. When compared to the informational content resulting from evolution at long-range interactions, the mutual information in physically-coupled amino-acids is the strongest signal to distinguish PPIs derived from cospeciation and likely, the unique indication in case of molecular coevolution in independent genomes as the evolutive information must vanish for uncorrelated proteins.
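The per-contact mutual information discussed in the abstract above can be sketched for one pair of alignment columns, one taken from family A and one from family B. The snippet is a minimal illustration rather than the authors' procedure: the function name and the paired toy columns are hypothetical, and no correction for finite sampling or phylogenetic bias is applied.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information in bits between two equally long alignment columns.

    Row k of col_a and row k of col_b are assumed to come from the same
    paired (interacting) sequences of families A and B.
    """
    if len(col_a) != len(col_b) or not col_a:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    p_a = Counter(col_a)
    p_b = Counter(col_b)
    p_ab = Counter(zip(col_a, col_b))
    total = 0.0
    for (a, b), count in p_ab.items():
        joint = count / n
        total += joint * log2(joint / ((p_a[a] / n) * (p_b[b] / n)))
    return total


# Toy paired columns (hypothetical data).
site_in_a = list("AADD")          # a column from family A
coupled_site_in_b = list("KKEE")  # changes together with site_in_a
conserved_site_in_b = list("LLLL")  # fully conserved, carries no information

mi_coupled = mutual_information(site_in_a, coupled_site_in_b)
mi_conserved = mutual_information(site_in_a, conserved_site_in_b)
```

A perfectly covarying pair with two equally frequent states carries one bit, whereas a conserved column carries none, which is the contrast the abstract draws between physically coupled and uncoupled positions.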
234
Wang X, Jing X, Deng Y, Nie Y, Xu F, Xu Y, Zhao YL, Hunt JF, Montelione GT, Szyperski T. Evolutionary coupling saturation mutagenesis: Coevolution-guided identification of distant sites influencing Bacillus naganoensis pullulanase activity. FEBS Lett 2019; 594:799-812. [PMID: 31665817 DOI: 10.1002/1873-3468.13652] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 09/15/2019] [Revised: 10/15/2019] [Accepted: 10/25/2019] [Indexed: 01/20/2023]
Abstract
Pullulanases are well-known debranching enzymes hydrolyzing α-1,6-glycosidic linkages. To date, engineering of pullulanase is mainly focused on catalytic pocket or domain tailoring based on structure/sequence information. Saturation mutagenesis-involved directed evolution is, however, limited by the low number of mutational sites compatible with combinatorial libraries of feasible size. Using Bacillus naganoensis pullulanase as a target protein, here we introduce the 'evolutionary coupling saturation mutagenesis' (ECSM) approach: residue pair covariances are calculated to identify residues for saturation mutagenesis, focusing directed evolution on residue pairs playing important roles in natural evolution. Evolutionary coupling (EC) analysis identified seven residue pairs as evolutionary mutational hotspots. Subsequent saturation mutagenesis yielded variants with enhanced catalytic activity. The functional pairs apparently represent distant sites affecting enzyme activity.
Collapse
Affiliation(s)
- Xinye Wang
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China
| | - Xiaoran Jing
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China
| | - Yi Deng
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China
| | - Yao Nie
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China
| | - Fei Xu
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China
| | - Yan Xu
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, China.,State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, China
| | - Yi-Lei Zhao
- State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic and Developmental Sciences, MOE-LSB & MOE-LSC, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, China
| | - John F Hunt
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.,Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.,Department of Chemistry and Chemical Biology, and Center for Biotechnology and Integrative Studies, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Thomas Szyperski
- Department of Chemistry, The State University of New York at Buffalo, NY, USA
|
235
|
Wang S, Fei S, Wang Z, Li Y, Xu J, Zhao F, Gao X. PredMP: a web server for de novo prediction and visualization of membrane proteins. Bioinformatics 2019; 35:691-693. [PMID: 30084960 DOI: 10.1093/bioinformatics/bty684] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/21/2018] [Revised: 06/29/2018] [Accepted: 08/02/2018] [Indexed: 01/21/2023] Open
Abstract
MOTIVATION PredMP is the first web service, to our knowledge, that aims at de novo prediction of the membrane protein (MP) 3D structure followed by the embedding of the MP into the lipid bilayer for visualization. Our approach is based on a high-throughput Deep Transfer Learning (DTL) method that first predicts MP contacts by learning from non-MPs and then predicts the 3D model of the MP using the predicted contacts as distance restraints. This algorithm is derived from our previous Deep Learning (DL) method originally developed for soluble protein contact prediction, which has been officially ranked No. 1 in CASP12. The DTL framework in our approach overcomes the challenge that there are only a limited number of solved MP structures for training the deep learning model. There are three modules in the PredMP server: (i) The DTL framework followed by the contact-assisted folding protocol has already been implemented in RaptorX-Contact, which serves as the key module for 3D model generation; (ii) The 1D annotation module, implemented in RaptorX-Property, is used to predict the secondary structure and disordered regions; and (iii) the visualization module to display the predicted MPs embedded in the lipid bilayer guided by the predicted transmembrane topology. RESULTS Tested on 510 non-redundant MPs, our server predicts correct folds for ∼290 MPs, which significantly outperforms existing methods. Tested on a blind and live benchmark CAMEO from September 2016 to January 2018, PredMP can successfully model all 10 MPs belonging to the hard category. AVAILABILITY AND IMPLEMENTATION PredMP is freely accessed on the web at http://www.predmp.com. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sheng Wang
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | | | - Zongan Wang
- Department of Chemistry, James Franck Institute, University of Chicago, Chicago, IL, USA
| | - Yu Li
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
| | - Feng Zhao
- Prospect Institute of Fatty Acids and Health, Qingdao University, Ningxia, China
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
|
236
|
AlQuraishi M. AlphaFold at CASP13. Bioinformatics 2019; 35:4862-4865. [PMID: 31116374 PMCID: PMC6907002 DOI: 10.1093/bioinformatics/btz422] [Citation(s) in RCA: 163] [Impact Index Per Article: 27.2] [Received: 02/15/2019] [Revised: 03/26/2019] [Accepted: 05/15/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
Affiliation(s)
- Mohammed AlQuraishi
- Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA
- Lab of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
|
237
|
Wang Y, Shi Q, Yang P, Zhang C, Mortuza SM, Xue Z, Ning K, Zhang Y. Fueling ab initio folding with marine metagenomics enables structure and function predictions of new protein families. Genome Biol 2019; 20:229. [PMID: 31676016 PMCID: PMC6825341 DOI: 10.1186/s13059-019-1823-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 01/07/2019] [Accepted: 09/13/2019] [Indexed: 02/01/2023] Open
Abstract
INTRODUCTION The ocean microbiome represents one of the largest microbiomes and produces nearly half of the primary energy on the planet through photosynthesis or chemosynthesis. Using recent advances in marine genomics, we explore new applications of oceanic metagenomes for protein structure and function prediction. RESULTS By processing 1.3 TB of high-quality reads from the Tara Oceans data, we obtain 97 million non-redundant genes. Of the 5721 Pfam families that lack experimental structures, 2801 have at least one member associated with the oceanic metagenomics dataset. We apply C-QUARK, a deep-learning contact-guided ab initio structure prediction pipeline, to model 27 families, where 20 are predicted to have a reliable fold with estimated template modeling score (TM-score) at least 0.5. Detailed analyses reveal that the abundance of microbial genera in the ocean is highly correlated to the frequency of occurrence in the modeled Pfam families, suggesting the significant role of the Tara Oceans genomes in the contact-map prediction and subsequent ab initio folding simulations. Of interesting note, PF15461, which has a majority of members coming from ocean-related bacteria, is identified as an important photosynthetic protein by structure-based function annotations. The pipeline is extended to a set of 417 Pfam families, built on the combination of Tara with other metagenomics datasets, which results in 235 families with an estimated TM-score over 0.5. CONCLUSIONS These results demonstrate a new avenue to improve the capacity of protein structure and function modeling through marine metagenomics, especially for difficult proteins with few homologous sequences.
Affiliation(s)
- Yan Wang
- College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Qiang Shi
- College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
| | - Pengshuo Yang
- College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
| | - Zhidong Xue
- College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China.
| | - Kang Ning
- College of Life Science and Technology and College of Software, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China.
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA.
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA.
|
238
|
Buchko GW, Abendroth J, Robinson JI, Phan IQ, Myler PJ, Edwards TE. Structural diversity in the Mycobacteria DUF3349 superfamily. Protein Sci 2019; 29:670-685. [PMID: 31658388 DOI: 10.1002/pro.3758] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/02/2019] [Revised: 10/17/2019] [Accepted: 10/21/2019] [Indexed: 11/11/2022]
Abstract
A protein superfamily with a "Domain of Unknown Function," DUF3349 (PF11829), is present predominantly in Mycobacterium and Rhodococcus bacterial species, suggesting that these proteins may have a biological function unique to these bacteria. We previously reported the inaugural structure of a DUF3349 superfamily member, Mycobacterium tuberculosis Rv0543c. Here, we report the structures determined for three additional DUF3349 proteins: Mycobacterium smegmatis MSMEG_1063 and MSMEG_1066 and Mycobacterium abscessus MAB_3403c. Like Rv0543c, the NMR solution structure of MSMEG_1063 revealed a monomeric five α-helix bundle with a similar overall topology. Conversely, the crystal structure of MSMEG_1066 revealed a five α-helix protein with a strikingly different topology and a tetrameric quaternary structure that was confirmed by size exclusion chromatography. The NMR solution structure of a fourth member of the DUF3349 superfamily, MAB_3403c, with 18 residues missing at the N-terminus, revealed a monomeric α-helical protein with a folding topology similar to the three C-terminal helices in the protomer of the MSMEG_1066 tetramer. These structures, together with a GREMLIN-based bioinformatics analysis of the DUF3349 primary amino acid sequences, suggest two subfamilies within the DUF3349 family. The division of the DUF3349 family into two distinct subfamilies would have been lost if structure solution had stopped with the first structure in the DUF3349 family, highlighting the insights generated by solving multiple structures within a protein superfamily. Future studies will determine if the structural diversity at the tertiary and quaternary levels in the DUF3349 protein superfamily has functional roles in Mycobacteria and Rhodococcus species with potential implications for structure-based drug discovery.
Affiliation(s)
- Garry W Buchko
- Seattle Structural Genomics Center for Infectious Disease, Seattle, Washington.,Earth and Biological Sciences Directorate, Pacific Northwest National Laboratory, Richland, Washington.,School of Molecular Biosciences, Washington State University, Pullman, Washington
| | - Jan Abendroth
- Seattle Structural Genomics Center for Infectious Disease, Seattle, Washington.,UCB, Bainbridge Island, Washington
| | - John I Robinson
- Seattle Structural Genomics Center for Infectious Disease, Seattle, Washington.,UCB, Bainbridge Island, Washington
| | - Isabelle Q Phan
- Seattle Structural Genomics Center for Infectious Disease, Seattle, Washington.,Center for Global Infectious Disease Research, Seattle Children's Hospital, Seattle, Washington
| | - Peter J Myler
- Seattle Structural Genomics Center for Infectious Disease, Seattle, Washington.,Center for Global Infectious Disease Research, Seattle Children's Hospital, Seattle, Washington.,Department of Medical Education and Biomedical Informatics, University of Washington, Seattle, Washington.,Department of Global Health, University of Washington, Seattle, Washington
| | - Thomas E Edwards
- Seattle Structural Genomics Center for Infectious Disease, Seattle, Washington.,UCB, Bainbridge Island, Washington
|
239
|
Zhang H, Zhang Q, Ju F, Zhu J, Gao Y, Xie Z, Deng M, Sun S, Zheng WM, Bu D. Predicting protein inter-residue contacts using composite likelihood maximization and deep learning. BMC Bioinformatics 2019; 20:537. [PMID: 31664895 PMCID: PMC6821021 DOI: 10.1186/s12859-019-3051-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/14/2019] [Accepted: 08/22/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Accurate prediction of inter-residue contacts of a protein is important to calculating its tertiary structure. Analysis of co-evolutionary events among residues has been proved effective in inferring inter-residue contacts. The Markov random field (MRF) technique, although being widely used for contact prediction, suffers from the following dilemma: the actual likelihood function of MRF is accurate but time-consuming to calculate; in contrast, approximations to the actual likelihood, say pseudo-likelihood, are efficient to calculate but inaccurate. Thus, how to achieve both accuracy and efficiency simultaneously remains a challenge. RESULTS In this study, we present such an approach (called clmDCA) for contact prediction. Unlike plmDCA using pseudo-likelihood, i.e., the product of conditional probability of individual residues, our approach uses composite-likelihood, i.e., the product of conditional probability of all residue pairs. Composite likelihood has been theoretically proved as a better approximation to the actual likelihood function than pseudo-likelihood. Meanwhile, composite likelihood is still efficient to maximize, thus ensuring the efficiency of clmDCA. We present comprehensive experiments on popular benchmark datasets, including PSICOV dataset and CASP-11 dataset, to show that: i) clmDCA alone outperforms the existing MRF-based approaches in prediction accuracy. ii) When equipped with deep learning technique for refinement, the prediction accuracy of clmDCA was further significantly improved, suggesting the suitability of clmDCA for subsequent refinement procedure. We further present a successful application of the predicted contacts to accurately build tertiary structures for proteins in the PSICOV dataset. CONCLUSIONS Composite likelihood maximization algorithm can efficiently estimate the parameters of Markov Random Fields and can improve the prediction accuracy of protein inter-residue contacts.
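For reference, the two approximations contrasted in this abstract can be written out as below. This is a sketch in notation chosen here for illustration, not taken from the paper: x = (x_1, ..., x_L) is one aligned sequence of length L, θ denotes the MRF parameters, and x_{\setminus S} denotes all positions except those in the set S. For a full alignment, each expression would additionally be multiplied over sequences.

```latex
\[
\mathrm{PL}(\theta) \;=\; \prod_{i=1}^{L} P\!\left(x_i \,\middle|\, x_{\setminus i};\, \theta\right),
\qquad
\mathrm{CL}(\theta) \;=\; \prod_{1 \le i < j \le L} P\!\left(x_i, x_j \,\middle|\, x_{\setminus \{i,j\}};\, \theta\right).
\]
```

With q possible states per position, each pseudo-likelihood factor is normalized over q states and each pairwise composite-likelihood factor over q^2 joint states, whereas the exact likelihood requires a sum over q^L configurations.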
Affiliation(s)
- Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Qi Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Fusong Ju
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Jianwei Zhu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China
| | - Yujuan Gao
- Center for Quantitative Biology, School of Mathematical Sciences, Center for Statistical Sciences, Peking University, Beijing, China
| | - Ziwei Xie
- College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, China
| | - Minghua Deng
- Center for Quantitative Biology, School of Mathematical Sciences, Center for Statistical Sciences, Peking University, Beijing, China
| | - Shiwei Sun
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
| | - Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China.
| | - Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.,University of Chinese Academy of Sciences, Beijing, China.
|
240
|
Hanson J, Paliwal K, Litfin T, Yang Y, Zhou Y. Accurate prediction of protein contact maps by coupling residual two-dimensional bidirectional long short-term memory with convolutional neural networks. Bioinformatics 2019; 34:4039-4045. [PMID: 29931279 DOI: 10.1093/bioinformatics/bty481] [Citation(s) in RCA: 70] [Impact Index Per Article: 11.7] [Received: 02/09/2018] [Accepted: 06/13/2018] [Indexed: 11/12/2022] Open
Abstract
Motivation Accurate prediction of a protein contact map depends greatly on capturing as much contextual information as possible from surrounding residues for a target residue pair. Recently, ultra-deep residual convolutional networks were found to be state-of-the-art in the latest Critical Assessment of Structure Prediction techniques (CASP12) for protein contact map prediction by attempting to provide a protein-wide context at each residue pair. Recurrent neural networks have seen great success in recent protein residue classification problems due to their ability to propagate information through long protein sequences, especially Long Short-Term Memory (LSTM) cells. Here, we propose a novel protein contact map prediction method by stacking residual convolutional networks with two-dimensional residual bidirectional recurrent LSTM networks, and using both one-dimensional sequence-based and two-dimensional evolutionary coupling-based information. Results We show that the proposed method achieves a robust performance over validation and independent test sets with the Area Under the receiver operating characteristic Curve (AUC) > 0.95 in all tests. When compared to several state-of-the-art methods for independent testing of 228 proteins, the method yields an AUC value of 0.958, whereas the next-best method obtains an AUC of 0.909. More importantly, the improvement is over contacts at all sequence-position separations. Specifically, increases in precision of 8.95%, 5.65% and 2.84% were observed for the top L/10 predictions over the next best for short, medium and long-range contacts, respectively. This confirms the usefulness of ResNets to congregate the short-range relations and 2D-BRLSTM to propagate the long-range dependencies throughout the entire protein contact map 'image'. Availability and implementation SPOT-Contact server url: http://sparks-lab.org/jack/server/SPOT-Contact/. Supplementary information Supplementary data are available at Bioinformatics online.
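The "top L/10" precision metric quoted in this abstract can be illustrated with a minimal sketch. It is not the SPOT-Contact evaluation code: the function name, argument names and the separation bands (6-11, 12-23 and 24 or more residues, a convention commonly used in CASP-style assessments) are assumptions made for illustration.

```python
import numpy as np

# Assumed separation bands (|i - j| in residues); not stated in the abstract.
SEPARATION_BANDS = {"short": (6, 11), "medium": (12, 23), "long": (24, None)}

def top_precision(scores, contacts, min_sep, max_sep=None, divisor=10):
    """Precision of the top L/divisor scored pairs within a separation band.

    scores   : (L, L) array of predicted contact scores (upper triangle is used)
    contacts : (L, L) array of 0/1 true contacts
    """
    L = scores.shape[0]
    i, j = np.triu_indices(L, k=1)            # all residue pairs with i < j
    sep = j - i
    mask = sep >= min_sep
    if max_sep is not None:
        mask &= sep <= max_sep
    i, j = i[mask], j[mask]                   # keep pairs inside the band
    n_top = max(1, L // divisor)              # e.g. L/10 predictions
    order = np.argsort(-scores[i, j], kind="stable")[:n_top]
    return float(contacts[i[order], j[order]].mean())
```

Calling `top_precision(scores, contacts, *SEPARATION_BANDS["long"])` would give the long-range value; the band boundaries should be adjusted to whatever definition a given benchmark uses.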
Affiliation(s)
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Australia
| | - Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
| | - Yuedong Yang
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
- School of Data and Computer Science, Sun-Yat Sen University, Guangzhou, Guangdong, China
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Australia
|
241
|
Zheng W, Wuyun Q, Li Y, Mortuza SM, Zhang C, Pearce R, Ruan J, Zhang Y. Detecting distant-homology protein structures by aligning deep neural-network based contact maps. PLoS Comput Biol 2019; 15:e1007411. [PMID: 31622328 PMCID: PMC6818797 DOI: 10.1371/journal.pcbi.1007411] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 05/02/2019] [Revised: 10/29/2019] [Accepted: 09/21/2019] [Indexed: 12/31/2022] Open
Abstract
Accurate prediction of atomic-level protein structure is important for annotating the biological functions of protein molecules and for designing new compounds to regulate the functions. Template-based modeling (TBM), which aims to construct structural models by copying and refining the structural frameworks of other known proteins, remains the most accurate method for protein structure prediction. Due to the difficulty in recognizing distant-homology templates, however, the accuracy of TBM decreases rapidly when the evolutionary relationship between the query and template vanishes. In this study, we propose a new method, CEthreader, which first predicts residue-residue contacts by coupling evolutionary precision matrices with deep residual convolutional neural-networks. The predicted contact maps are then integrated with sequence profile alignments to recognize structural templates from the PDB. The method was tested on two independent benchmark sets consisting collectively of 1,153 non-homologous protein targets, where CEthreader detected 176% or 36% more correct templates with a TM-score >0.5 than the best state-of-the-art profile- or contact-based threading methods, respectively, for the Hard targets that lacked homologous templates. Moreover, CEthreader was able to identify 114% or 20% more correct templates with the same Fold as the query, after excluding structures from the same SCOPe Superfamily, than the best profile- or contact-based threading methods. Detailed analyses show that the major advantage of CEthreader lies in the efficient coupling of contact maps with profile alignments, which helps recognize global fold of protein structures when the homologous relationship between the query and template is weak. These results demonstrate an efficient new strategy to combine ab initio contact map prediction with profile alignments to significantly improve the accuracy of template-based structure prediction, especially for distant-homology proteins. 
Despite decades of effort in computational method development, template-based modeling (TBM) remains the most reliable approach to high-resolution protein structure prediction. Previous studies have shown that the PDB library is complete for single-domain proteins and TBM is in principle sufficient to solve the structure prediction problem if the most similar structure in the PDB could be reliably identified and used as template for model reconstruction. But in reality, the success of TBM depends on the availability of closely-homologous templates, where its accuracy and reliability decrease sharply when the evolutionary relationship between query and template becomes more distant. We developed a new threading approach, CEthreader, which allows for dynamic programming alignments of predicted contact-maps through eigen-decomposition. The large-scale benchmark tests show that the coupling of contact map with profile and secondary structure alignments through the proposed protocol can significantly improve the accuracy of template recognition for distantly-homologous protein targets.
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
- College of Mathematical Sciences and LPMC, Nankai University, Tianjin, PR China
| | - Qiqige Wuyun
- College of Mathematical Sciences and LPMC, Nankai University, Tianjin, PR China
- Computer Science and Engineering Department, Michigan State University, East Lansing, MI, United States of America
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - S. M. Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
| | - Jishou Ruan
- College of Mathematical Sciences and LPMC, Nankai University, Tianjin, PR China
- State Key Laboratory of Medicinal Chemical Biology, Nankai University, Tianjin, PR China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States of America
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, United States of America
|
242
|
Kandathil SM, Greener JG, Jones DT. Recent developments in deep learning applied to protein structure prediction. Proteins 2019; 87:1179-1189. [PMID: 31589782 PMCID: PMC6899861 DOI: 10.1002/prot.25824] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 05/15/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 12/29/2022]
Abstract
Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.
Affiliation(s)
- Shaun M Kandathil
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - Joe G Greener
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
| | - David T Jones
- Department of Computer Science, University College London, London, UK.,Biomedical Data Science Laboratory, The Francis Crick Institute, London, UK
|
243
|
Levine TP. Remote homology searches identify bacterial homologues of eukaryotic lipid transfer proteins, including Chorein-N domains in TamB and AsmA and Mdm31p. BMC Mol Cell Biol 2019; 20:43. [PMID: 31607262 PMCID: PMC6791001 DOI: 10.1186/s12860-019-0226-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 04/15/2019] [Accepted: 09/05/2019] [Indexed: 02/07/2023] Open
Abstract
Background All cells rely on lipids for key functions. Lipid transfer proteins allow lipids to exit the hydrophobic environment of bilayers, and cross aqueous spaces. One lipid transfer domain fold present in almost all eukaryotes is the TUbular LIPid binding (TULIP) domain. Three TULIP families have been identified in bacteria (P47, OrfX2 and YceB), but their homology to eukaryotic proteins is too low to specify a common origin. Another recently described eukaryotic lipid transfer domain in VPS13 and ATG2 is Chorein-N, which has no known bacterial homologues. There has been no systematic search for bacterial TULIPs or Chorein-N domains. Results Remote homology predictions for bacterial TULIP domains using HHsearch identified four new TULIP domains in three bacterial families. DUF4403 is a full length pseudo-dimeric TULIP with a 6 strand β-meander dimer interface like eukaryotic TULIPs. A similar sheet is also present in YceB, suggesting it homo-dimerizes. TULIP domains were also found in DUF2140 and in the C-terminus DUF2993. Remote homology predictions for bacterial Chorein-N domains identified strong hits in the N-termini of AsmA and TamB in diderm bacteria, which are related to Mdm31p in eukaryotic mitochondria. The N-terminus of DUF2993 has a Chorein-N domain adjacent to its TULIP domain. Conclusions TULIP lipid transfer domains are widespread in bacteria. Chorein-N domains are also found in bacteria, at the N-terminus of multiple proteins in the intermembrane space of diderms (AsmA, TamB and their relatives) and in Mdm31p, a protein that is likely to have evolved from an AsmA/TamB-like protein in the endosymbiotic mitochondrial ancestor. This indicates that both TULIP and Chorein-N lipid transfer domains may have originated in bacteria.
Affiliation(s)
- Timothy P Levine
- UCL Institute of Ophthalmology, 11-43 Bath Street, London, EC1V 9EL, UK.
|
244
|
Abstract
Homologous sequence alignments contain important information about the constraints that shape protein family evolution. Correlated changes between different residues, for instance, can be highly predictive of physical contacts within three-dimensional structures. Detecting such co-evolutionary signals via direct coupling analysis is particularly challenging given the shared phylogenetic history and uneven sampling of different lineages from which protein sequences are derived. Current best practices for mitigating such effects include sequence-identity-based weighting of input sequences and post-hoc re-scaling of evolutionary coupling scores. However, numerous weighting schemes have been previously developed for other applications, and it is unknown whether any of these schemes may better account for phylogenetic artifacts in evolutionary coupling analyses. Here, we show across a dataset of 150 diverse protein families that the current best practices out-perform several alternative sequence- and tree-based weighting methods. Nevertheless, we find that sequence weighting in general provides only a minor benefit relative to post-hoc transformations that re-scale the derived evolutionary couplings. While our findings do not rule out the possibility that an as-yet-untested weighting method may show improved results, the similar predictive accuracies that we observe across distinct weighting methods suggest that there may be little room for further improvement on top of existing strategies.
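The two "current best practices" named in this abstract can be sketched roughly as follows. This is an illustrative reading, not the authors' code: the 0.8 identity threshold, the function names, and the choice of the average product correction (APC) as the post-hoc re-scaling are assumptions, since the abstract does not specify them.

```python
import numpy as np

def identity_weights(msa, threshold=0.8):
    """Sequence-identity-based weights for an (N, L) integer-encoded alignment.

    Each sequence gets weight 1 / (number of sequences, itself included,
    whose fractional identity to it is >= threshold).
    Note: builds an (N, N, L) array, so only suitable for small alignments.
    """
    identity = (msa[:, None, :] == msa[None, :, :]).mean(axis=2)
    neighbours = (identity >= threshold).sum(axis=1)
    return 1.0 / neighbours

def apc(raw):
    """Average product correction of a symmetric (L, L) coupling-score matrix.

    corrected[i, j] = raw[i, j] - mean_i * mean_j / overall_mean,
    with all means taken over off-diagonal entries only.
    """
    s = raw.astype(float).copy()
    np.fill_diagonal(s, 0.0)
    L = s.shape[0]
    row_mean = s.sum(axis=1) / (L - 1)
    total_mean = s.sum() / (L * (L - 1))
    corrected = s - np.outer(row_mean, row_mean) / total_mean
    np.fill_diagonal(corrected, 0.0)
    return corrected
```

The sum of the weights is often reported as the effective number of sequences; the corrected matrix is what would then be ranked to propose contacts.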
|
245
|
Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022] Open
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
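For orientation, the simpler correlation-based baseline that this abstract contrasts with (phylogenetic profiling) can be sketched as a Pearson correlation between presence/absence vectors across species. This is only the baseline, not the direct phyletic couplings the authors introduce; the function name `profile_correlation` and the toy profiles are hypothetical.

```python
from math import sqrt


def profile_correlation(p, q):
    """Pearson correlation between two phylogenetic profiles.

    p and q are equal-length lists of 0/1 values, one entry per species
    (1 = domain present in that genome, 0 = absent).
    Returns 0.0 if either profile is constant, since the correlation
    is undefined in that case.
    """
    n = len(p)
    mean_p = sum(p) / n
    mean_q = sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    var_p = sum((a - mean_p) ** 2 for a in p)
    var_q = sum((b - mean_q) ** 2 for b in q)
    if var_p == 0 or var_q == 0:
        return 0.0
    return cov / sqrt(var_p * var_q)


# Toy profiles over four species.
co_occurring = profile_correlation([1, 1, 0, 0], [1, 1, 0, 0])
mutually_exclusive = profile_correlation([1, 1, 0, 0], [0, 0, 1, 1])
unrelated = profile_correlation([1, 0, 1, 0], [1, 1, 0, 0])
```

Domains that always occur together score near +1, while mutually exclusive domains score near -1, loosely mirroring the abstract's remark that negative couplings can point to alternative solutions for the same function.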
Affiliation(s)
- Giancarlo Croce
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Maria Virginia Ruiz Cuevas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Victoria Keidel
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Matteo Figliuzzi
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
|
246
|
Porter KA, Padhorny D, Desta I, Ignatov M, Beglov D, Kotelnikov S, Sun Z, Alekseenko A, Anishchenko I, Cong Q, Ovchinnikov S, Baker D, Vajda S, Kozakov D. Template-based modeling by ClusPro in CASP13 and the potential for using co-evolutionary information in docking. Proteins 2019; 87:1241-1248. [PMID: 31444975 DOI: 10.1002/prot.25808] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/02/2019] [Revised: 07/21/2019] [Accepted: 07/30/2019] [Indexed: 12/29/2022]
Abstract
As a participant in the joint CASP13-CAPRI46 assessment, the ClusPro server debuted its new template-based modeling functionality. The addition of this feature, called ClusPro TBM, was motivated by the previous CASP-CAPRI assessments and by the proven ability of template-based methods to produce higher-quality models, provided templates are available. In prior assessments, ClusPro submissions consisted of models that were produced via free docking of pre-generated homology models. This method was successful in terms of the number of acceptable predictions across targets; however, analysis of results showed that purely template-based methods produced a substantially higher number of medium-quality models for targets for which there were good templates available. The addition of template-based modeling has expanded ClusPro's ability to produce higher accuracy predictions, primarily for homomeric but also for some heteromeric targets. Here we review the newest additions to the ClusPro web server and discuss examples of CASP-CAPRI targets that continue to drive further development. We also describe ongoing work not yet implemented in the server. This includes the development of methods to improve template-based models and the use of co-evolutionary information for data-assisted free docking.
Affiliation(s)
- Kathryn A Porter
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Dzmitry Padhorny
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Israel Desta
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Mikhail Ignatov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Dmitri Beglov
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Sergei Kotelnikov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York; Moscow Institute of Physics and Technology, Dolgoprudny, Russia
- Zhuyezi Sun
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts
- Andrey Alekseenko
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
- Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, Washington; Institute for Protein Design, University of Washington, Seattle, Washington
- Qian Cong
- Department of Biochemistry, University of Washington, Seattle, Washington; Institute for Protein Design, University of Washington, Seattle, Washington
- Sergey Ovchinnikov
- Center for Systems Biology, Harvard University, Cambridge, Massachusetts
- David Baker
- Department of Biochemistry, University of Washington, Seattle, Washington; Institute for Protein Design, University of Washington, Seattle, Washington; Howard Hughes Medical Institute, University of Washington, Seattle, Washington
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts; Department of Chemistry, Boston University, Boston, Massachusetts
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York
|
247
|
Martinez-Ortiz W, Cardozo TJ. An Improved Method for Modeling Voltage-Gated Ion Channels at Atomic Accuracy Applied to Human Cav Channels. Cell Rep 2019; 23:1399-1408. [PMID: 29719253 PMCID: PMC5957504 DOI: 10.1016/j.celrep.2018.04.024] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 08/23/2017] [Revised: 11/01/2017] [Accepted: 04/04/2018] [Indexed: 12/26/2022] Open
Abstract
Voltage-gated ion channels (VGICs) are associated with hundreds of human diseases. To date, 3D structural models of human VGICs have not been reported. We developed a 3D structural integrity metric to rank the accuracy of all VGIC structures deposited in the PDB. The metric revealed inaccuracies in structural models built from recent single-particle, non-crystalline cryo-electron microscopy maps and enabled the building of highly accurate homology models of human Cav channel α1 subunits at atomic resolution. Human Cav Mendelian mutations mostly located to segments involved in the mechanism of voltage sensing and gating within the 3D structure, with multiple mutations targeting equivalent 3D structural locations despite eliciting distinct clinical phenotypes. The models also revealed that the architecture of the ion selectivity filter is highly conserved from bacteria to humans and between sodium and calcium VGICs.
Affiliation(s)
- Wilnelly Martinez-Ortiz
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
- Timothy J Cardozo
- Department of Biochemistry and Molecular Pharmacology, NYU Langone Health, New York, NY 10016, USA
|
248
|
Cross KL, Campbell JH, Balachandran M, Campbell AG, Cooper SJ, Griffen A, Heaton M, Joshi S, Klingeman D, Leys E, Yang Z, Parks JM, Podar M. Targeted isolation and cultivation of uncultivated bacteria by reverse genomics. Nat Biotechnol 2019; 37:1314-1321. [PMID: 31570900 PMCID: PMC6858544 DOI: 10.1038/s41587-019-0260-6] [Citation(s) in RCA: 190] [Impact Index Per Article: 31.7] [Received: 01/20/2019] [Accepted: 08/15/2019] [Indexed: 12/16/2022]
Abstract
Most microorganisms from all taxonomic levels are uncultured. Single-cell genomes and metagenomes continue to increase the known diversity of Bacteria and Archaea, but while ’omics can be used to infer physiological or ecological roles for species in a community, most of those hypothetical roles remain unvalidated. Here we report an approach to capture specific microorganisms from complex communities into pure cultures using genome-informed antibody engineering. We apply our reverse genomics approach to isolate and sequence single cells and to cultivate three different species-level lineages of human oral Saccharibacteria/TM7. Using our pure cultures we show that all three saccharibacteria species are epibionts of diverse Actinobacteria. We also isolate and cultivate human oral SR1 bacteria, which are members of a lineage of previously uncultured bacteria. Reverse-genomics-enabled cultivation of microorganisms can be applied to any species from any environment and has the potential to unlock the isolation, cultivation and characterization of species from as-yet-uncultured branches of the microbial tree of life.
Affiliation(s)
- Karissa L Cross
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Department of Microbiology, University of Tennessee, Knoxville, TN, USA
- James H Campbell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Department of Natural Sciences, Northwest Missouri State University, Maryville, MO, USA
- Alisha G Campbell
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Genome Science and Technology Program, University of Tennessee, Knoxville, TN, USA; Department of Natural Sciences, Northwest Missouri State University, Maryville, MO, USA
- Sarah J Cooper
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Genome Science and Technology Program, University of Tennessee, Knoxville, TN, USA
- Ann Griffen
- College of Dentistry, The Ohio State University, Columbus, OH, USA
- Snehal Joshi
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Dawn Klingeman
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Eugene Leys
- College of Dentistry, The Ohio State University, Columbus, OH, USA
- Zamin Yang
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Jerry M Parks
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Genome Science and Technology Program, University of Tennessee, Knoxville, TN, USA
- Mircea Podar
- Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA; Department of Microbiology, University of Tennessee, Knoxville, TN, USA; Genome Science and Technology Program, University of Tennessee, Knoxville, TN, USA
|
249
|
Zhu L, Hofestadt R, Ester M. Tissue-Specific Subcellular Localization Prediction Using Multi-Label Markov Random Fields. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1471-1482. [PMID: 30736003 DOI: 10.1109/tcbb.2019.2897683] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/09/2023]
Abstract
Understanding the subcellular localization (SCL) of proteins and the variation of the proteome across the tissues and organs of the human body are two crucial aspects of increasing our knowledge of the dynamic rules of proteins, cell biology, and the mechanisms of disease. Although there have been tremendous contributions to these two fields independently, little is known about how the spatial distribution of proteins varies between tissues. Here, we propose an approach for predicting tissue-specific protein SCL through the use of tissue-specific functional associations and physical protein-protein interactions (PPIs). We applied our previously developed Bayesian collective Markov random fields (BCMRFs) to tissue-specific protein-protein interaction networks (PPI networks) for nine types of tissue, focusing on eight high-level SCLs. The evaluation results demonstrate the strength of our approach in predicting tissue-specific SCL. We identified 1,314 proteins whose SCL was previously shown to be cell-line dependent. We predicted 549 novel tissue-specific localized candidate proteins, some of which were validated via text mining.
|
250
|
Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals. Sci Rep 2019; 9:12603. [PMID: 31471543 PMCID: PMC6717244 DOI: 10.1038/s41598-019-48913-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/10/2017] [Accepted: 08/14/2019] [Indexed: 11/08/2022] Open
Abstract
Proteins often work as oligomers or multimers in vivo. Therefore, elucidating their oligomeric or multimeric form (quaternary structure) is crucially important to ascertain their function. X-ray crystal structures of numerous proteins have been accumulated, providing information related to their biological units. Extracting information of biological units from protein crystal structures represents a meaningful task for modern biology. Nevertheless, although many methods have been proposed for identifying biological units appearing in protein crystal structures, it is difficult to distinguish biological protein-protein interfaces from crystallographic ones. Therefore, our simple but highly accurate classifier was developed to infer biological units in protein crystal structures using large amounts of protein sequence information and a modern contact prediction method to exploit covariation signals (CSs) in proteins. We demonstrate that our proposed method is promising even for weak signals of biological interfaces. We also discuss the relation between classification accuracy and conservation of biological units, and illustrate how the selection of sequences included in multiple sequence alignments as sources for obtaining CSs affects the results. With increased amounts of sequence data, the proposed method is expected to become increasingly useful.
|