1
Varadi M, Bertoni D, Magana P, Paramval U, Pidruchna I, Radhakrishnan M, Tsenkov M, Nair S, Mirdita M, Yeo J, Kovalevskiy O, Tunyasuvunakool K, Laydon A, Žídek A, Tomlinson H, Hariharan D, Abrahamson J, Green T, Jumper J, Birney E, Steinegger M, Hassabis D, Velankar S. AlphaFold Protein Structure Database in 2024: providing structure coverage for over 214 million protein sequences. Nucleic Acids Res 2024; 52:D368-D375. [PMID: 37933859 PMCID: PMC10767828 DOI: 10.1093/nar/gkad1011] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/29/2023] [Revised: 10/13/2023] [Accepted: 10/18/2023] [Indexed: 11/08/2023] Open
Abstract
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) has significantly impacted structural biology by amassing over 214 million predicted protein structures, expanding from the initial 300k structures released in 2021. Enabled by the groundbreaking AlphaFold2 artificial intelligence (AI) system, the predictions archived in AlphaFold DB have been integrated into primary data resources such as PDB, UniProt, Ensembl, InterPro and MobiDB. Our manuscript details subsequent enhancements in data archiving, covering successive releases encompassing model organisms, global health proteomes, Swiss-Prot integration, and a host of curated protein datasets. We detail the data access mechanisms of AlphaFold DB, from direct file access via FTP to advanced queries using Google Cloud Public Datasets and the programmatic access endpoints of the database. We also discuss the improvements and services added since its initial release, including enhancements to the Predicted Aligned Error viewer, customisation options for the 3D viewer, and improvements in the search engine of AlphaFold DB.
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Damian Bertoni
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Paulyna Magana
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Urmila Paramval
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ivanna Pidruchna
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Maxim Tsenkov
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sreenath Nair
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Jingi Yeo
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
2
Jumper J, Hassabis D. The Protein Structure Prediction Revolution and Its Implications for Medicine: 2023 Albert Lasker Basic Medical Research Award. JAMA 2023; 330:1425-1426. [PMID: 37732824 DOI: 10.1001/jama.2023.17095] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/22/2023]
Abstract
In this Viewpoint, 2023 Lasker award winners John Jumper and Demis Hassabis describe their invention, the artificial intelligence–based system AlphaFold, which is able to predict protein structure with great accuracy.
3
Cheng J, Novati G, Pan J, Bycroft C, Žemgulytė A, Applebaum T, Pritzel A, Wong LH, Zielinski M, Sargeant T, Schneider RG, Senior AW, Jumper J, Hassabis D, Kohli P, Avsec Ž. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 2023; 381:eadg7492. [PMID: 37733863 DOI: 10.1126/science.adg7492] [Citation(s) in RCA: 128] [Impact Index Per Article: 128.0] [Received: 01/19/2023] [Accepted: 08/23/2023] [Indexed: 09/23/2023]
Abstract
The vast majority of missense variants observed in the human genome are of unknown clinical significance. We present AlphaMissense, an adaptation of AlphaFold fine-tuned on human and primate variant population frequency databases to predict missense variant pathogenicity. By combining structural context and evolutionary conservation, our model achieves state-of-the-art results across a wide range of genetic and experimental benchmarks, all without explicitly training on such data. The average pathogenicity score of genes is also predictive for their cell essentiality, capable of identifying short essential genes that existing statistical approaches are underpowered to detect. As a resource to the community, we provide a database of predictions for all possible human single amino acid substitutions and classify 89% of missense variants as either likely benign or likely pathogenic.
4
Varadi M, Nair S, Sillitoe I, Tauriello G, Anyango S, Bienert S, Borges C, Deshpande M, Green T, Hassabis D, Hatos A, Hegedus T, Hekkelman ML, Joosten R, Jumper J, Laydon A, Molodenskiy D, Piovesan D, Salladini E, Salzberg SL, Sommer MJ, Steinegger M, Suhajda E, Svergun D, Tenorio-Ku L, Tosatto S, Tunyasuvunakool K, Waterhouse AM, Žídek A, Schwede T, Orengo C, Velankar S. 3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources. Gigascience 2022; 11:6854872. [PMID: 36448847 PMCID: PMC9709962 DOI: 10.1093/gigascience/giac118] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2022] [Revised: 09/20/2022] [Accepted: 11/11/2022] [Indexed: 12/02/2022] Open
Abstract
While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.
Affiliation(s)
- Mihaly Varadi
- Correspondence address. Mihaly Varadi, PDBe team, Wellcome Trust Genome Campus, Saffron Walden CB10 1SA, UK. E-mail:
- Stephen Anyango
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
- Stefan Bienert
- Biozentrum, University of Basel, Basel 4056, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Clemente Borges
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland; European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
- Mandar Deshpande
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
- Andras Hatos
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy; Department of Oncology, Lausanne University Hospital, Lausanne 1015, Switzerland; Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland; Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland; Swiss Cancer Center Leman, Lausanne 1005, Switzerland
- Tamas Hegedus
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
- Robbie Joosten
- Netherlands Cancer Institute, Amsterdam 1066 CX, The Netherlands
- Dmitry Molodenskiy
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland; European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
- Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Steven L Salzberg
- Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Markus J Sommer
- Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
- Martin Steinegger
- School of Biology, Seoul National University, Seoul 82-2-880-6971, 6977, South Korea
- Erzsebet Suhajda
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
- Dmitri Svergun
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland; European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
- Luiggi Tenorio-Ku
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Silvio Tosatto
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Andrew Mark Waterhouse
- Biozentrum, University of Basel, Basel 4056, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Torsten Schwede
- Biozentrum, University of Basel, Basel 4056, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- Christine Orengo
- Department of Structural and Molecular Biology, UCL, London WC1E 6BT, UK
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
5
Judge RA, Sridar J, Tunyasuvunakool K, Jain R, Wang JCK, Ouch C, Xu J, Mafi A, Nile AH, Remarcik C, Smith CL, Ghosh C, Xu C, Stoll V, Jumper J, Singh AH, Eaton D, Hao Q. Author Correction: Structure of the PAPP-A BP5 complex reveals mechanism of substrate recognition. Nat Commun 2022; 13:5694. [PMID: 36171222 PMCID: PMC9519949 DOI: 10.1038/s41467-022-33522-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open
Affiliation(s)
- Janani Sridar
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Rinku Jain
- AbbVie, 1 North Waukegan Road, North Chicago, IL, USA
- John C K Wang
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Christna Ouch
- Department of Biochemistry & Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Jun Xu
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Aaron H Nile
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Crystal Ghosh
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Chen Xu
- Department of Biochemistry & Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Vincent Stoll
- AbbVie, 1 North Waukegan Road, North Chicago, IL, USA
- Amoolya H Singh
- Calico Life Sciences LLC, South San Francisco, CA, USA; GRAIL, Menlo Park, CA, USA
- Dan Eaton
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Qi Hao
- Calico Life Sciences LLC, South San Francisco, CA, USA
6
Judge RA, Sridar J, Tunyasuvunakool K, Jain R, Wang JCK, Ouch C, Xu J, Mafi A, Nile AH, Remarcik C, Smith CL, Ghosh C, Xu C, Stoll V, Jumper J, Singh AH, Eaton D, Hao Q. Structure of the PAPP-A BP5 complex reveals mechanism of substrate recognition. Nat Commun 2022; 13:5500. [PMID: 36127359 PMCID: PMC9489782 DOI: 10.1038/s41467-022-33175-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2022] [Accepted: 09/03/2022] [Indexed: 11/09/2022] Open
Abstract
Insulin-like growth factor (IGF) signaling is highly conserved and tightly regulated by proteases including Pregnancy-Associated Plasma Protein A (PAPP-A). PAPP-A and its paralog PAPP-A2 are metalloproteases that mediate IGF bioavailability through cleavage of IGF binding proteins (IGFBPs). Here, we present single-particle cryo-EM structures of the catalytically inactive mutant PAPP-A (E483A) in complex with a peptide from its substrate IGFBP5 (PAPP-ABP5) and also in its substrate-free form, by leveraging the power of AlphaFold to generate a high quality predicted model as a starting template. We show that PAPP-A is a flexible trans-dimer that binds IGFBP5 via a 25-amino acid anchor peptide which extends into the metalloprotease active site. This unique IGFBP5 anchor peptide that mediates the specific PAPP-A-IGFBP5 interaction is not found in other PAPP-A substrates. Additionally, we illustrate the critical role of the PAPP-A central domain as it mediates both IGFBP5 recognition and trans-dimerization. We further demonstrate that PAPP-A trans-dimer formation and distal inter-domain interactions are both required for efficient proteolysis of IGFBP4, but dispensable for IGFBP5 cleavage. Together the structural and biochemical studies reveal the mechanism of PAPP-A substrate binding and selectivity.
Affiliation(s)
- Janani Sridar
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Rinku Jain
- AbbVie, 1 North Waukegan Road, North Chicago, IL, USA
- John C K Wang
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Christna Ouch
- Department of Biochemistry & Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Jun Xu
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Aaron H Nile
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Crystal Ghosh
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Chen Xu
- Department of Biochemistry & Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, MA, USA
- Vincent Stoll
- AbbVie, 1 North Waukegan Road, North Chicago, IL, USA
- Amoolya H Singh
- Calico Life Sciences LLC, South San Francisco, CA, USA; GRAIL, Menlo Park, CA, USA
- Dan Eaton
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Qi Hao
- Calico Life Sciences LLC, South San Francisco, CA, USA
7
Fraser A, Sokolova ML, Drobysheva AV, Gordeeva JV, Borukhov S, Jumper J, Severinov KV, Leiman PG. Structural basis of template strand deoxyuridine promoter recognition by a viral RNA polymerase. Nat Commun 2022; 13:3526. [PMID: 35725571 PMCID: PMC9209446 DOI: 10.1038/s41467-022-31214-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/25/2021] [Accepted: 06/07/2022] [Indexed: 11/23/2022] Open
Abstract
Recognition of promoters in bacterial RNA polymerases (RNAPs) is controlled by sigma subunits. The key sequence motif recognized by the sigma, the -10 promoter element, is located in the non-template strand of the double-stranded DNA molecule ~10 nucleotides upstream of the transcription start site. Here, we explain the mechanism by which the phage AR9 non-virion RNAP (nvRNAP), a bacterial RNAP homolog, recognizes the -10 element of its deoxyuridine-containing promoter in the template strand. The AR9 sigma-like subunit, the nvRNAP enzyme core, and the template strand together form two nucleotide base-accepting pockets whose shapes dictate the requirement for the conserved deoxyuridines. A single amino acid substitution in the AR9 sigma-like subunit allows one of these pockets to accept a thymine thus expanding the promoter consensus. Our work demonstrates the extent to which viruses can evolve host-derived multisubunit enzymes to make transcription of their own genes independent of the host.
Affiliation(s)
- Alec Fraser
- Department of Biochemistry and Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX 77555-0647 USA
- Maria L. Sokolova
- Department of Biochemistry and Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX 77555-0647 USA; Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205 Russia
- Arina V. Drobysheva
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205 Russia
- Julia V. Gordeeva
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205 Russia
- Sergei Borukhov
- Department of Cell Biology and Neuroscience, Rowan University School of Osteopathic Medicine at Stratford, Stratford, NJ 08084-1489 USA
- John Jumper
- DeepMind Technologies Limited, London, UK
- Konstantin V. Severinov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, 121205 Russia; Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, 123182 Russia; Waksman Institute for Microbiology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854 USA
- Petr G. Leiman
- Department of Biochemistry and Molecular Biology, Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX 77555-0647 USA
8
Stsiapanava A, Xu C, Nishio S, Han L, Yamakawa N, Carroni M, Tunyasuvunakool K, Jumper J, de Sanctis D, Wu B, Jovine L. Structure of the decoy module of human glycoprotein 2 and uromodulin and its interaction with bacterial adhesin FimH. Nat Struct Mol Biol 2022; 29:190-193. [PMID: 35273390 PMCID: PMC8930769 DOI: 10.1038/s41594-022-00729-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/19/2021] [Accepted: 01/21/2022] [Indexed: 12/25/2022]
Abstract
Glycoprotein 2 (GP2) and uromodulin (UMOD) filaments protect against gastrointestinal and urinary tract infections by acting as decoys for bacterial fimbrial lectin FimH. By combining AlphaFold2 predictions with X-ray crystallography and cryo-EM, we show that these proteins contain a bipartite decoy module whose new fold presents the high-mannose glycan recognized by FimH. The structure rationalizes UMOD mutations associated with kidney diseases and visualizes a key epitope implicated in cast nephropathy. AlphaFold2 predictions, X-ray crystallography and cryo-EM analyses reveal how related human glycoproteins GP2 and uromodulin catch pathogenic bacteria by presenting a high-mannose glycan that acts as a decoy for fimbrial adhesin FimH.
Affiliation(s)
- Alena Stsiapanava
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Chenrui Xu
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore; NTU Institute of Structural Biology, Nanyang Technological University, Singapore, Singapore
- Shunsuke Nishio
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Ling Han
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Nao Yamakawa
- US 41-UMS 2014-PLBS, Université de Lille, CNRS, INSERM, CHU Lille, Institut Pasteur de Lille, Lille, France
- Marta Carroni
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
- Bin Wu
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore; NTU Institute of Structural Biology, Nanyang Technological University, Singapore, Singapore
- Luca Jovine
- Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden; School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
9
Varadi M, Anyango S, Deshpande M, Nair S, Natassia C, Yordanova G, Yuan D, Stroe O, Wood G, Laydon A, Žídek A, Green T, Tunyasuvunakool K, Petersen S, Jumper J, Clancy E, Green R, Vora A, Lutfi M, Figurnov M, Cowie A, Hobbs N, Kohli P, Kleywegt G, Birney E, Hassabis D, Velankar S. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 2021; 50:D439-D444. [PMID: 34791371 PMCID: PMC8728224 DOI: 10.1093/nar/gkab1061] [Citation(s) in RCA: 2945] [Impact Index Per Article: 981.7] [Received: 09/07/2021] [Revised: 10/14/2021] [Accepted: 10/19/2021] [Indexed: 12/16/2022] Open
Abstract
The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000 predicted structures across 21 model-organism proteomes, which will soon be expanded to cover most of the (over 100 million) representative sequences from the UniRef90 data set. The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an extensive, public database of highly accurate protein structure models. The models are the products of AlphaFold2, an Artificial Intelligence algorithm developed by DeepMind. AlphaFold enabled scientists to investigate an unprecedented number of protein structures. The database we describe here provides access to these predicted models and information on their accuracy. The first version of AlphaFold DB contains over 360,000 models of 21 biologically essential species.
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Stephen Anyango
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Mandar Deshpande
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sreenath Nair
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Cindy Natassia
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Galabina Yordanova
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- David Yuan
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Oana Stroe
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Gemma Wood
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Gerard Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
10
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Applying and improving AlphaFold at CASP14. Proteins 2021; 89:1711-1721. [PMID: 34599769 PMCID: PMC9299164 DOI: 10.1002/prot.26257] [Citation(s) in RCA: 173] [Impact Index Per Article: 57.7] [Received: 06/22/2021] [Revised: 09/06/2021] [Accepted: 09/21/2021] [Indexed: 12/27/2022]
Abstract
We describe the operation and improvement of AlphaFold, the system that was entered by the team AlphaFold2 to the “human” category in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The AlphaFold system entered in CASP14 is entirely different to the one entered in CASP13. It used a novel end‐to‐end deep neural network trained to produce protein structures from amino acid sequence, multiple sequence alignments, and homologous proteins. In the assessors' ranking by summed z scores (>2.0), AlphaFold scored 244.0 compared to 90.8 by the next best group. The predictions made by AlphaFold had a median domain GDT_TS of 92.4; this is the first time that this level of average accuracy has been achieved during CASP, especially on the more difficult Free Modeling targets, and represents a significant improvement in the state of the art in protein structure prediction. We reported how AlphaFold was run as a human team during CASP14 and improved such that it now achieves an equivalent level of performance without intervention, opening the door to highly accurate large‐scale structure prediction.
Affiliation(s)
- Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul, South Korea; Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
11
Avsec Ž, Agarwal V, Visentin D, Ledsam JR, Grabska-Barwinska A, Taylor KR, Assael Y, Jumper J, Kohli P, Kelley DR. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods 2021; 18:1196-1203. [PMID: 34608324 PMCID: PMC8490152 DOI: 10.1038/s41592-021-01252-x] [Citation(s) in RCA: 241] [Impact Index Per Article: 80.3] [Received: 02/17/2021] [Accepted: 07/27/2021] [Indexed: 02/08/2023]
Abstract
How noncoding DNA determines gene expression in different cell types is a major unsolved problem, and critical downstream applications in human genetics depend on improved solutions. Here, we report substantially improved gene expression prediction accuracy from DNA sequences through the use of a deep learning architecture, called Enformer, that is able to integrate information from long-range interactions (up to 100 kb away) in the genome. This improvement yielded more accurate variant effect predictions on gene expression for both natural genetic variants and saturation mutagenesis measured by massively parallel reporter assays. Furthermore, Enformer learned to predict enhancer-promoter interactions directly from the DNA sequence competitively with methods that take direct experimental data as input. We expect that these advances will enable more effective fine-mapping of human disease associations and provide a framework to interpret cis-regulatory evolution.
12
Tunyasuvunakool K, Adler J, Wu Z, Green T, Zielinski M, Žídek A, Bridgland A, Cowie A, Meyer C, Laydon A, Velankar S, Kleywegt GJ, Bateman A, Evans R, Pritzel A, Figurnov M, Ronneberger O, Bates R, Kohl SAA, Potapenko A, Ballard AJ, Romera-Paredes B, Nikolov S, Jain R, Clancy E, Reiman D, Petersen S, Senior AW, Kavukcuoglu K, Birney E, Kohli P, Jumper J, Hassabis D. Highly accurate protein structure prediction for the human proteome. Nature 2021; 596:590-596. [PMID: 34293799 PMCID: PMC8387240 DOI: 10.1038/s41586-021-03828-1] [Citation(s) in RCA: 1336] [Impact Index Per Article: 445.3] [Received: 05/11/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
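The residue-level coverage figures quoted in this abstract (58% confident, 36% very high confidence) are fractions of residues whose per-residue confidence score exceeds a cutoff. A minimal sketch of that calculation follows. The function name `coverage_fractions` and the toy scores are hypothetical, and the cutoffs of 70 and 90 are an assumption taken from the commonly used pLDDT confidence bands, since the abstract itself does not state them.

```python
from typing import Dict, Sequence


def coverage_fractions(
    plddt: Sequence[float],
    confident: float = 70.0,   # assumed cutoff for a "confident" residue
    very_high: float = 90.0,   # assumed cutoff for "very high confidence"
) -> Dict[str, float]:
    """Return the fraction of residues above each confidence cutoff."""
    if not plddt:
        raise ValueError("plddt must contain at least one score")
    n = len(plddt)
    return {
        # Very-high residues are a subset of confident residues,
        # so both fractions are taken over all residues.
        "confident": sum(score > confident for score in plddt) / n,
        "very_high": sum(score > very_high for score in plddt) / n,
    }


# Toy per-residue scores, not real data.
example = coverage_fractions([95.0, 80.0, 60.0, 40.0])
print(example)
```

Both fractions use the total residue count as the denominator, which matches the abstract's phrasing that the very-high-confidence subset is "36% of all residues".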
Affiliation(s)
- Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Gerard J Kleywegt
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Ewan Birney
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
Collapse
|
13
|
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature 2021; 596:583-589. [PMID: 34265844 PMCID: PMC8371605 DOI: 10.1038/s41586-021-03819-2] [Citation(s) in RCA: 13457] [Impact Index Per Article: 4485.7] [Received: 05/11/2021] [Accepted: 07/12/2021] [Indexed: 02/07/2023]
Abstract
Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort [1-4], the structures of around 100,000 unique proteins have been determined [5], but this represents a small fraction of the billions of known protein sequences [6,7]. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence, the structure prediction component of the 'protein folding problem' [8], has been an important open research problem for more than 50 years [9]. Despite recent progress [10-14], existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known. We validated an entirely redesigned version of our neural network-based model, AlphaFold, in the challenging 14th Critical Assessment of protein Structure Prediction (CASP14) [15], demonstrating accuracy competitive with experimental structures in a majority of cases and greatly outperforming other methods. Underpinning the latest version of AlphaFold is a novel machine learning approach that incorporates physical and biological knowledge about protein structure, leveraging multi-sequence alignments, into the design of the deep learning algorithm.
Affiliation(s)
- Martin Steinegger
  - School of Biological Sciences, Seoul National University, Seoul, South Korea
  - Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
|
14
|
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Improved protein structure prediction using potentials from deep learning. Nature 2020; 577:706-710. [PMID: 31942072 DOI: 10.1038/s41586-019-1923-7] [Citation(s) in RCA: 1309] [Impact Index Per Article: 327.3] [Received: 04/02/2019] [Accepted: 12/10/2019] [Indexed: 12/16/2022]
Abstract
Protein structure prediction can be used to determine the three-dimensional shape of a protein from its amino acid sequence [1]. This problem is of fundamental importance as the structure of a protein largely determines its function [2]; however, protein structures can be difficult to determine experimentally. Considerable progress has recently been made by leveraging genetic information. It is possible to infer which amino acid residues are in contact by analysing covariation in homologous sequences, which aids in the prediction of protein structures [3]. Here we show that we can train a neural network to make accurate predictions of the distances between pairs of residues, which convey more information about the structure than contact predictions. Using this information, we construct a potential of mean force [4] that can accurately describe the shape of a protein. We find that the resulting potential can be optimized by a simple gradient descent algorithm to generate structures without complex sampling procedures. The resulting system, named AlphaFold, achieves high accuracy, even for sequences with fewer homologous sequences. In the recent Critical Assessment of Protein Structure Prediction [5] (CASP13), a blind assessment of the state of the field, AlphaFold created high-accuracy structures (with template modelling (TM) scores [6] of 0.7 or higher) for 24 out of 43 free modelling domains, whereas the next best method, which used sampling and contact information, achieved such accuracy for only 14 out of 43 domains. AlphaFold represents a considerable advance in protein-structure prediction. We expect this increased accuracy to enable insights into the function and malfunction of proteins, especially in cases for which no structures for homologous proteins have been experimentally determined [7].
Affiliation(s)
- David T Jones
  - The Francis Crick Institute, London, UK
  - University College London, London, UK
|
15
|
Senior AW, Evans R, Jumper J, Kirkpatrick J, Sifre L, Green T, Qin C, Žídek A, Nelson AWR, Bridgland A, Penedones H, Petersen S, Simonyan K, Crossan S, Kohli P, Jones DT, Silver D, Kavukcuoglu K, Hassabis D. Protein structure prediction using multiple deep neural networks in the 13th Critical Assessment of Protein Structure Prediction (CASP13). Proteins 2019; 87:1141-1148. [PMID: 31602685 PMCID: PMC7079254 DOI: 10.1002/prot.25834] [Citation(s) in RCA: 166] [Impact Index Per Article: 33.2] [Received: 05/03/2019] [Revised: 09/25/2019] [Accepted: 09/27/2019] [Indexed: 12/17/2022]
Abstract
We describe AlphaFold, the protein structure prediction system that was entered by the group A7D in CASP13. Submissions were made by three free-modeling (FM) methods which combine the predictions of three neural networks. All three systems were guided by predictions of distances between pairs of residues produced by a neural network. Two systems assembled fragments produced by a generative neural network, one using scores from a network trained to regress GDT_TS. The third system shows that simple gradient descent on a properly constructed potential is able to perform on par with more expensive traditional search techniques and without requiring domain segmentation. In the CASP13 FM assessors' ranking by summed z-scores, this system scored highest with 68.3 vs 48.2 for the next closest group (an average GDT_TS of 61.4). The system produced high-accuracy structures (with GDT_TS scores of 70 or higher) for 11 out of 43 FM domains. Despite not explicitly using template information, the results in the template category were comparable to the best performing template-based methods.
Affiliation(s)
- David T. Jones
  - The Francis Crick Institute, London, UK
  - University College London, London, UK
|
16
|
Riback JA, Bowman MA, Zmyslowski A, Knoverek CR, Jumper J, Kaye EB, Freed KF, Clark PL, Sosnick TR. Response to Comment on “Innovative scattering analysis shows that hydrophobic disordered proteins are expanded in water”. Science 2018; 361:eaar7949. [PMID: 30166460 DOI: 10.1126/science.aar7949] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 01/03/2018] [Accepted: 07/30/2018] [Indexed: 12/30/2022]
Abstract
Best et al. claim that we provide no convincing basis to assert that a discrepancy remains between FRET and SAXS results on the dimensions of disordered proteins under physiological conditions. We maintain that a clear discrepancy is apparent in our and other recent publications, including results shown in the Best et al. comment. A plausible origin is fluorophore interactions in FRET experiments.
Affiliation(s)
- Joshua A Riback
  - Graduate Program in Biophysical Sciences, University of Chicago, Chicago, IL 60637, USA
- Micayla A Bowman
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Adam Zmyslowski
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
- Catherine R Knoverek
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- John Jumper
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, IL 60637, USA
- Emily B Kaye
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Karl F Freed
  - Department of Chemistry and James Franck Institute, University of Chicago, Chicago, IL 60637, USA
- Patricia L Clark
  - Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, IN 46556, USA
- Tobin R Sosnick
  - Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL 60637, USA
  - Institute for Biophysical Dynamics, University of Chicago, Chicago, IL 60637, USA
|
17
|
Wang S, Wang Z, Jumper J, Freed KF, Sosnick TR, Xu J. Folding Membrane Proteins by Contacts Inferred from Non-Membrane Proteins and Near-Atomic Level Refinement. Biophys J 2017. [DOI: 10.1016/j.bpj.2016.11.1130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022] Open
|
18
|
Riback JA, Bowman MA, Zmyslowski A, Knoverek CR, Jumper J, Hinshaw J, Kaye EB, Freed KF, Clark PL, Sosnick TR. Measuring the (Good) Solvent Quality of Water for Disordered Proteins from a Single SAXS Measurement. Biophys J 2017. [DOI: 10.1016/j.bpj.2016.11.1712] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
19
|
Faruk NF, Jumper J, Roux B, Sosnick TR. Extending Upside, a Near-Atomic Level Model for Fast Protein Folding, for Predicting Protein-Protein Interactions. Biophys J 2017. [DOI: 10.1016/j.bpj.2016.11.1946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
|
20
|
Shan Y, Eastwood M, Zhang X, Kim E, Arkhipov A, Dror R, Jumper J, Kuriyan J, Shaw D. Oncogenic Mutations Counteract Intrinsic Disorder in the EGFR Kinase and Promote Receptor Dimerization. Cell 2012; 149:860-70. [DOI: 10.1016/j.cell.2012.02.063] [Citation(s) in RCA: 219] [Impact Index Per Article: 18.3] [Received: 08/02/2011] [Revised: 01/20/2012] [Accepted: 02/23/2012] [Indexed: 10/28/2022]
|