1
|
Tauriello G, Waterhouse AM, Haas J, Behringer D, Bienert S, Garello T, Schwede T. ModelArchive: A Deposition Database for Computational Macromolecular Structural Models. J Mol Biol 2025:168996. [PMID: 39947281 DOI: 10.1016/j.jmb.2025.168996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2024] [Revised: 02/04/2025] [Accepted: 02/07/2025] [Indexed: 02/27/2025]
Abstract
A wide range of applications in life science research benefit from the availability of three-dimensional structures of biological macromolecules as they provide valuable insights into their molecular function. Recent advances in structure prediction techniques have made it possible to generate high quality computational macromolecular structural models for almost all known proteins. In this context, ModelArchive (https://modelarchive.org/) serves as a deposition database for computational models, complementing the Protein Data Bank (PDB) and PDB-IHM, which require experimental data, and specialised databases such as the AlphaFold DB. ModelArchive contains over 600,000 models contributed by researchers using a variety of modelling techniques. It supports single biological macromolecules and complexes, including any combination of polymers and small molecules. Each deposited model can be referenced in manuscripts using an immutable accession code provided by ModelArchive. Depositors are required to provide a minimal set of information about the modelling process and the expected accuracy of the resulting model, enabling scientific reproducibility and maximising the potential reuse of the models. The vast majority of models in ModelArchive use the ModelCIF format which includes coordinates and metadata, allows for programmatic validation of the models, and makes the models interoperable with structures obtained from other sources such as the PDB. The ModelArchive web service provides access to the models and search queries. Model findability is also provided in external services either through APIs or by importing data from ModelArchive.
Collapse
Affiliation(s)
- Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Andrew M Waterhouse
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Juergen Haas
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Dario Behringer
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Stefan Bienert
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Thomas Garello
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland.
| |
Collapse
|
2
|
Fleming J, Magana P, Nair S, Tsenkov M, Bertoni D, Pidruchna I, Lima Afonso MQ, Midlik A, Paramval U, Žídek A, Laydon A, Kovalevskiy O, Pan J, Cheng J, Avsec Ž, Bycroft C, Wong LH, Last M, Mirdita M, Steinegger M, Kohli P, Váradi M, Velankar S. AlphaFold Protein Structure Database and 3D-Beacons: New Data and Capabilities. J Mol Biol 2025:168967. [PMID: 40133787 DOI: 10.1016/j.jmb.2025.168967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2024] [Revised: 01/18/2025] [Accepted: 01/24/2025] [Indexed: 03/27/2025]
Abstract
The AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/) has made significant strides in enhancing its utility and accessibility for the life science research community. The recent integration of AlphaMissense predictions enables access to the pathogenicity of human protein missense variants, with an innovative and interactive heatmap and 3D visualisation that display variant data at the residue level. Users can now toggle between structure model quality (pLDDT) and average pathogenicity scores, providing insights into the implications of specific residue changes. The Foldseek integration offers a rapid and accurate method for protein structure searches and comparisons. Bulk data download options further facilitate comprehensive data analysis and integration with other computational tools. The 3D-Beacons framework (https://www.ebi.ac.uk/pdbe/pdbe-kb/3dbeacons/) has also been enhanced with detailed annotation endpoints (such as AlphaMissense data) and integrates LevyLab's dataset of homomeric AlphaFold 2 models. These advancements significantly improve the functionality and accessibility of these resources, enabling discoveries using structure data.
Collapse
Affiliation(s)
- Jennifer Fleming
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | - Paulyna Magana
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | - Sreenath Nair
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | - Maxim Tsenkov
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | - Damian Bertoni
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | - Ivanna Pidruchna
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | | | - Adam Midlik
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | - Urmila Paramval
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | | | | | | | | | | | | | | | | | | | - Milot Mirdita
- School of Biological Sciences, Seoul National University, Seoul 08826 Republic of Korea
| | - Martin Steinegger
- School of Biological Sciences, Seoul National University, Seoul 08826 Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826 Republic of Korea; Institute of Molecular Biology and Genetics, Seoul 08826 Republic of Korea; Artificial Intelligence Institute, Seoul National University, Seoul 08826 Republic of Korea
| | | | - Mihály Váradi
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK
| | - Sameer Velankar
- European Molecular Biology Laboratory European Bioinformatics Institute Hinxton UK.
| |
Collapse
|
3
|
Varadi M, Tsenkov M, Velankar S. Challenges in bridging the gap between protein structure prediction and functional interpretation. Proteins 2025; 93:400-410. [PMID: 37850517 PMCID: PMC11623436 DOI: 10.1002/prot.26614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/28/2023] [Revised: 09/26/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The rapid evolution of protein structure prediction tools has significantly broadened access to protein structural data. Although predicted structure models have the potential to accelerate and impact fundamental and translational research significantly, it is essential to note that they are not validated and cannot be considered the ground truth. Thus, challenges persist, particularly in capturing protein dynamics, predicting multi-chain structures, interpreting protein function, and assessing model quality. Interdisciplinary collaborations are crucial to overcoming these obstacles. Databases like the AlphaFold Protein Structure Database, the ESM Metagenomic Atlas, and initiatives like the 3D-Beacons Network provide FAIR access to these data, enabling their interpretation and application across a broader scientific community. Whilst substantial advancements have been made in protein structure prediction, further progress is required to address the remaining challenges. Developing training materials, nurturing collaborations, and ensuring open data sharing will be paramount in this pursuit. The continued evolution of these tools and methodologies will deepen our understanding of protein function and accelerate disease pathogenesis and drug development discoveries.
Collapse
Affiliation(s)
- Mihaly Varadi
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Genome CampusHinxtonCambridgeUK
| | - Maxim Tsenkov
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Genome CampusHinxtonCambridgeUK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Genome CampusHinxtonCambridgeUK
| |
Collapse
|
4
|
Al-Hilfi A, Li Z, Merz KM, Walker KD. Mg 2+-Ion Dependence Revealed for a BAHD 13- O-β-Aminoacyltransferase from Taxus Plants. JACS AU 2024; 4:4249-4262. [PMID: 39610752 PMCID: PMC11600153 DOI: 10.1021/jacsau.4c00577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/10/2024] [Revised: 09/11/2024] [Accepted: 09/13/2024] [Indexed: 11/30/2024]
Abstract
A Taxus baccatin III:3-amino-3-phenylpropanoyltransferase (BAPT, Accession: AY082804) in clade 6 of the BAHD family catalyzed a Mg2+-dependent transfer of isoserines from their corresponding CoA thioesters. An advanced taxane baccatin III on the paclitaxel biosynthetic pathway in Taxus plants was incubated BAPT and phenylisoserine CoA or isobutenylisoserinyl CoA with and without MgCl2. BAPT biocatalytically converted baccatin III to its 13-O-phenylisoserinyl and 3-(1',1'-dimethylvinyl)isoserinyl analogs, an activity that abrogated when Mg2+ ions were omitted. Baccatin III analogs that are precursors to new generation taxanes were also assayed with BAPT, the Mg2+ cofactor, and 3-(1',1'-dimethylvinyl)isoserinyl CoA to make paclitaxel derivatives at k cat/K M ranging between 27 and 234 s-1 M-1. Molecular dynamics simulations of the BAPT active site modeled on the crystal structure of a BAHD family member (PDB: 4G0B) suggest that Mg2+ causes BAPT to use an unconventional active site space compared to those of other BAHD catalysts, studied over the last 25 years, that use a conserved catalytic histidine residue that is glycine in BAPT. The simulated six-membered Mg2+-coordination complex includes an interaction that disrupts an intramolecular hydrogen bond between the C13-hydroxyl and the carbonyl oxygen of the C4-acetate of baccatin III. A simulation snapshot captured an active site conformation showing the liberated C13-hydroxyl of baccatin III poised for acylation by BAPT through a potential substrate-assisted mechanism.
Collapse
Affiliation(s)
- Aimen Al-Hilfi
- Department
of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
| | - Zhen Li
- Department
of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
| | - Kenneth M. Merz
- Department
of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Department
of Biochemistry and Molecular Biology, Michigan
State University, East Lansing, Michigan 48824, United States
| | - Kevin D. Walker
- Department
of Chemistry, Michigan State University, East Lansing, Michigan 48824, United States
- Department
of Biochemistry and Molecular Biology, Michigan
State University, East Lansing, Michigan 48824, United States
| |
Collapse
|
5
|
Vollmar M, Tirunagari S, Harrus D, Armstrong D, Gáborová R, Gupta D, Afonso MQL, Evans G, Velankar S. Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature. Sci Data 2024; 11:1032. [PMID: 39333508 PMCID: PMC11436914 DOI: 10.1038/s41597-024-03841-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/11/2024] [Accepted: 08/29/2024] [Indexed: 09/29/2024] Open
Abstract
We present a novel system that leverages curators in the loop to develop a dataset and model for detecting structure features and functional annotations at residue-level from standard publication text. Our approach involves the integration of data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, combined with annotation guidelines from UniProt, and LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we utilized to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model with commendable performance metrics of 0.90 for precision, 0.92 for recall, and 0.91 for F1-measure. Our proposed system showcases a successful synergy of machine learning techniques and human expertise in curating a dataset for residue-level functional annotations and protein structure features. The results demonstrate the potential for broader applications in protein research, bridging the gap between advanced machine learning models and the indispensable insights of domain experts.
Collapse
Affiliation(s)
- Melanie Vollmar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
| | - Santosh Tirunagari
- Literature Services, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Deborah Harrus
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - David Armstrong
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Romana Gáborová
- CEITEC - Central European Institute of Technology, Masaryk University, Kamenice 5, 62500, Brno, Czech Republic
| | - Deepti Gupta
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Marcelo Querino Lima Afonso
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Genevieve Evans
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
Collapse
|
6
|
Schweke H, Pacesa M, Levin T, Goverde CA, Kumar P, Duhoo Y, Dornfeld LJ, Dubreuil B, Georgeon S, Ovchinnikov S, Woolfson DN, Correia BE, Dey S, Levy ED. An atlas of protein homo-oligomerization across domains of life. Cell 2024; 187:999-1010.e15. [PMID: 38325366 DOI: 10.1016/j.cell.2024.01.022] [Citation(s) in RCA: 26] [Impact Index Per Article: 26.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2023] [Revised: 11/03/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024]
Abstract
Protein structures are essential to understanding cellular processes in molecular detail. While advances in artificial intelligence revealed the tertiary structure of proteins at scale, their quaternary structure remains mostly unknown. We devise a scalable strategy based on AlphaFold2 to predict homo-oligomeric assemblies across four proteomes spanning the tree of life. Our results suggest that approximately 45% of an archaeal proteome and a bacterial proteome and 20% of two eukaryotic proteomes form homomers. Our predictions accurately capture protein homo-oligomerization, recapitulate megadalton complexes, and unveil hundreds of homo-oligomer types, including three confirmed experimentally by structure determination. Integrating these datasets with omics information suggests that a majority of known protein complexes are symmetric. Finally, these datasets provide a structural context for interpreting disease mutations and reveal coiled-coil regions as major enablers of quaternary structure evolution in human. Our strategy is applicable to any organism and provides a comprehensive view of homo-oligomerization in proteomes.
Collapse
Affiliation(s)
- Hugo Schweke
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Martin Pacesa
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Tal Levin
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Casper A Goverde
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Prasun Kumar
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Bristol BS8 1TQ, UK; Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
| | - Yoan Duhoo
- Protein Production and Structure Characterization Core Facility (PTPSP), School of Life Sciences, École polytechnique Fédérale de Lausanne, Lausanne, Switzerland
| | - Lars J Dornfeld
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Benjamin Dubreuil
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel
| | - Sandrine Georgeon
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland
| | - Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, USA
| | - Derek N Woolfson
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Bristol BS8 1TQ, UK; Max Planck-Bristol Centre for Minimal Biology, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK.
| | - Bruno E Correia
- Laboratory of Protein Design and Immunoengineering, École Polytechnique Fédérale de Lausanne and Swiss Institute of Bioinformatics, Lausanne, Switzerland.
| | - Sucharita Dey
- Department of Bioscience and Bioengineering, Indian Institute of Technology Jodhpur, Rajasthan, India.
| | - Emmanuel D Levy
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot, Israel.
| |
Collapse
|
7
|
Thakur M, Buniello A, Brooksbank C, Gurwitz KT, Hall M, Hartley M, Hulcoop DG, Leach AR, Marques D, Martin M, Mithani A, McDonagh EM, Mutasa-Gottgens E, Ochoa D, Perez-Riverol Y, Stephenson J, Varadi M, Velankar S, Vizcaino JA, Witham R, McEntyre J. EMBL's European Bioinformatics Institute (EMBL-EBI) in 2023. Nucleic Acids Res 2024; 52:D10-D17. [PMID: 38015445 PMCID: PMC10767983 DOI: 10.1093/nar/gkad1088] [Citation(s) in RCA: 15] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2023] [Revised: 10/23/2023] [Accepted: 10/30/2023] [Indexed: 11/29/2023] Open
Abstract
The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the latest developments in the services provided by EMBL-EBI data resources to scientific communities globally. These developments aim to ensure EMBL-EBI resources meet the current and future needs of these scientific communities, accelerating the impact of open biological data for all.
Collapse
Affiliation(s)
- Matthew Thakur
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Annalisa Buniello
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Catherine Brooksbank
- Training Team, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Kim T Gurwitz
- Training Team, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matthew Hall
- Industry Partnerships, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Matthew Hartley
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - David G Hulcoop
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Andrew R Leach
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
- Industry Partnerships, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Diana Marques
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Maria Martin
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Aziz Mithani
- Training Team, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Ellen M McDonagh
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Euphemia Mutasa-Gottgens
- Industry Partnerships, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - David Ochoa
- Open Targets, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Yasset Perez-Riverol
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - James Stephenson
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Mihaly Varadi
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Sameer Velankar
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Juan Antonio Vizcaino
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Rick Witham
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| | - Johanna McEntyre
- Data Services Teams, EMBL’s European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
| |
Collapse
|
8
|
Ghafouri H, Lazar T, Del Conte A, Tenorio Ku LG, Tompa P, Tosatto SCE, Monzon AM. PED in 2024: improving the community deposition of structural ensembles for intrinsically disordered proteins. Nucleic Acids Res 2024; 52:D536-D544. [PMID: 37904608 PMCID: PMC10767937 DOI: 10.1093/nar/gkad947] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2023] [Revised: 10/10/2023] [Accepted: 10/13/2023] [Indexed: 11/01/2023] Open
Abstract
The Protein Ensemble Database (PED) (URL: https://proteinensemble.org) is the primary resource for depositing structural ensembles of intrinsically disordered proteins. This updated version of PED reflects advancements in the field, denoting a continual expansion with a total of 461 entries and 538 ensembles, including those generated without explicit experimental data through novel machine learning (ML) techniques. With this significant increment in the number of ensembles, a few yet-unprecedented new entries entered the database, including those also determined or refined by electron paramagnetic resonance or circular dichroism data. In addition, PED was enriched with several new features, including a novel deposition service, improved user interface, new database cross-referencing options and integration with the 3D-Beacons network-all representing efforts to improve the FAIRness of the database. Foreseeably, PED will keep growing in size and expanding with new types of ensembles generated by accurate and fast ML-based generative models and coarse-grained simulations. Therefore, among future efforts, priority will be given to further develop the database to be compatible with ensembles modeled at a coarse-grained level.
Collapse
Affiliation(s)
| | - Tamas Lazar
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering, Vrije Universiteit Brussel (VUB), Brussels, Belgium
| | - Alessio Del Conte
- Department of Biomedical Sciences, University of Padova, Padova, Italy
| | | | - Peter Tompa
- VIB-VUB Center for Structural Biology, Vlaams Instituut voor Biotechnologie (VIB), Brussels, Belgium
- Structural Biology Brussels, Department of Bioengineering, Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences (RCNS), Budapest, Hungary
| | | | | |
Collapse
|
9
|
Degn K, Beltrame L, Tiberti M, Papaleo E. PDBminer to Find and Annotate Protein Structures for Computational Analysis. J Chem Inf Model 2023; 63:7274-7281. [PMID: 37977136 DOI: 10.1021/acs.jcim.3c00884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2023]
Abstract
Computational methods relying on protein structure strongly depend on the structure selected for investigation. Typical sources of protein structures include experimental structures available at the Protein Data Bank (PDB) and high-quality in silico model structures, such as those available at the AlphaFold Protein Structure Database. Either option has significant advantages and drawbacks, and exploring the wealth of available structures to identify the most suitable ones for specific applications can be a daunting task. We provide an open-source software package, PDBminer, with the purpose of making structure identification and selection easier, faster, and less error prone. PDBminer searches the AlphaFold Database and the PDB for available structures of interest and provides an up-to-date, quality-ranked table of structures applicable for further use. PDBminer provides an overview of the available protein structures to one or more input proteins, parallelizing the runs if multiple cores are specified. The output table reports the coverage of the protein structures aligned to the UniProt sequence, overcoming numbering differences in PDB structures and providing information regarding model quality, protein complexes, ligands, and nucleic acid chain binding. The PDBminer2coverage and PDBminer2network tools assist in visualizing the results. PDBminer can be applied to overcome the tedious task of choosing a PDB structure without losing the wealth of additional information available in the PDB. Here, we showcase the main functionalities of the package on the p53 tumor suppressor protein. The package is available at http://github.com/ELELAB/PDBminer.
Collapse
Affiliation(s)
- Kristine Degn
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
| | - Ludovica Beltrame
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
| | - Matteo Tiberti
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
| | - Elena Papaleo
- Cancer Systems Biology, Section for Bioinformatics, Department of Health and Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
| |
Collapse
|
10
|
Vallat B, Tauriello G, Bienert S, Haas J, Webb BM, Žídek A, Zheng W, Peisach E, Piehl DW, Anischanka I, Sillitoe I, Tolchard J, Varadi M, Baker D, Orengo C, Zhang Y, Hoch JC, Kurisu G, Patwardhan A, Velankar S, Burley SK, Sali A, Schwede T, Berman HM, Westbrook JD. ModelCIF: An Extension of PDBx/mmCIF Data Representation for Computed Structure Models. J Mol Biol 2023; 435:168021. [PMID: 36828268 PMCID: PMC10293049 DOI: 10.1016/j.jmb.2023.168021] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 02/24/2023]
Abstract
ModelCIF (github.com/ihmwg/ModelCIF) is a data information framework developed for and by computational structural biologists to enable delivery of Findable, Accessible, Interoperable, and Reusable (FAIR) data to users worldwide. ModelCIF describes the specific set of attributes and metadata associated with macromolecular structures modeled by solely computational methods and provides an extensible data representation for deposition, archiving, and public dissemination of predicted three-dimensional (3D) models of macromolecules. It is an extension of the Protein Data Bank Exchange / macromolecular Crystallographic Information Framework (PDBx/mmCIF), which is the global data standard for representing experimentally-determined 3D structures of macromolecules and associated metadata. The PDBx/mmCIF framework and its extensions (e.g., ModelCIF) are managed by the Worldwide Protein Data Bank partnership (wwPDB, wwpdb.org) in collaboration with relevant community stakeholders such as the wwPDB ModelCIF Working Group (wwpdb.org/task/modelcif). This semantically rich and extensible data framework for representing computed structure models (CSMs) accelerates the pace of scientific discovery. Herein, we describe the architecture, contents, and governance of ModelCIF, and tools and processes for maintaining and extending the data standard. Community tools and software libraries that support ModelCIF are also described.
Collapse
Affiliation(s)
- Brinda Vallat
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA.
| | - Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Stefan Bienert
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Juergen Haas
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Benjamin M Webb
- Department of Bioengineering and Therapeutic Sciences, the Quantitative Biosciences Institute (QBI), and the Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94157, USA
| | | | - Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Ezra Peisach
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Dennis W Piehl
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Ivan Anischanka
- Department of Biochemistry, and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA
| | - Ian Sillitoe
- Department of Structural and Molecular Biology, UCL, London, UK
| | - James Tolchard
- AlphaFold Protein Structure Database, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
| | - Mihaly Varadi
- AlphaFold Protein Structure Database, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
| | - David Baker
- Department of Biochemistry, and Institute for Protein Design, University of Washington, Seattle, WA 98195, USA; Howard Hughes Medical Institute, University of Washington, Seattle, WA 98195, USA
| | | | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Jeffrey C Hoch
- Biological Magnetic Resonance Data Bank, Department of Molecular Biology and Biophysics, University of Connecticut, Farmington, CT 06030, USA
| | - Genji Kurisu
- Protein Data Bank Japan, Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
| | - Ardan Patwardhan
- Electron Microscopy Data Bank, European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
| | - Sameer Velankar
- AlphaFold Protein Structure Database, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK; Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
| | - Stephen K Burley
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA; Research Collaboratory for Structural Bioinformatics Protein Data Bank, San Diego Supercomputer Center, University of California, La Jolla, CA 92093, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - Andrej Sali
- Department of Bioengineering and Therapeutic Sciences, the Quantitative Biosciences Institute (QBI), and the Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA 94157, USA. https://twitter.com/salilab_ucsf
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland; Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Helen M Berman
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
| | - John D Westbrook
- Research Collaboratory for Structural Bioinformatics Protein Data Bank, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA; Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901, USA
| |
Collapse
|
11
|
Bolz SN, Schroeder M. Promiscuity in drug discovery on the verge of the structural revolution: recent advances and future chances. Expert Opin Drug Discov 2023; 18:973-985. [PMID: 37489516 DOI: 10.1080/17460441.2023.2239700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2023] [Accepted: 07/19/2023] [Indexed: 07/26/2023]
Abstract
INTRODUCTION Promiscuity denotes the ability of ligands and targets to specifically interact with multiple binding partners. Despite negative aspects like side effects, promiscuity is receiving increasing attention in drug discovery as it can enhance drug efficacy and provides a molecular basis for drug repositioning. The three-dimensional structure of ligand-target complexes delivers exclusive insights into the molecular mechanisms of promiscuity and structure-based methods enable the identification of promiscuous interactions. With the recent breakthrough in protein structure prediction, novel possibilities open up to reveal unknown connections in ligand-target interaction networks. AREAS COVERED This review highlights the significance of structure in the identification and characterization of promiscuity and evaluates the potential of protein structure prediction to advance our knowledge of drug-target interaction networks. It discusses the definition and relevance of promiscuity in drug discovery and explores different approaches to detecting promiscuous ligands and targets. EXPERT OPINION Examination of structural data is essential for understanding and quantifying promiscuity. The recent advancements in structure prediction have resulted in an abundance of targets that are well-suited for structure-based methods like docking. In silico approaches may eventually completely transform our understanding of drug-target networks by complementing the millions of predicted protein structures with billions of predicted drug-target interactions.
Collapse
Affiliation(s)
- Sarah Naomi Bolz
- Biotechnology Center (BIOTEC), CMCB, Technische Universität Dresden, Dresden, Germany
| | - Michael Schroeder
- Biotechnology Center (BIOTEC), CMCB, Technische Universität Dresden, Dresden, Germany
| |
Collapse
|
12
|
AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023; 6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 18.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023] Open
Abstract
Deep-learning (DL) methods like DeepMind's AlphaFold2 (AF2) have led to substantial improvements in protein structure prediction. We analyse confident AF2 models from 21 model organisms using a new classification protocol (CATH-Assign) which exploits novel DL methods for structural comparison and classification. Of ~370,000 confident models, 92% can be assigned to 3253 superfamilies in our CATH domain superfamily classification. The remaining cluster into 2367 putative novel superfamilies. Detailed manual analysis on 618 of these, having at least one human relative, reveal extremely remote homologies and further unusual features. Only 25 novel superfamilies could be confirmed. Although most models map to existing superfamilies, AF2 domains expand CATH by 67% and increases the number of unique 'global' folds by 36% and will provide valuable insights on structure function relationships. CATH-Assign will harness the huge expansion in structural data provided by DeepMind to rationalise evolutionary changes driving functional divergence.
Collapse
|
13
|
Varadi M, Nair S, Sillitoe I, Tauriello G, Anyango S, Bienert S, Borges C, Deshpande M, Green T, Hassabis D, Hatos A, Hegedus T, Hekkelman ML, Joosten R, Jumper J, Laydon A, Molodenskiy D, Piovesan D, Salladini E, Salzberg SL, Sommer MJ, Steinegger M, Suhajda E, Svergun D, Tenorio-Ku L, Tosatto S, Tunyasuvunakool K, Waterhouse AM, Žídek A, Schwede T, Orengo C, Velankar S. 3D-Beacons: decreasing the gap between protein sequences and structures through a federated network of protein structure data resources. Gigascience 2022; 11:giac118. [PMID: 36448847 PMCID: PMC9709962 DOI: 10.1093/gigascience/giac118] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2022] [Revised: 09/20/2022] [Accepted: 11/11/2022] [Indexed: 12/02/2022] Open
Abstract
While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational protein modeling approaches. While often powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis to identify what models can best address their specific use cases. To make data from a large array of model providers more easily accessible to the broader scientific community, we established 3D-Beacons, a collaborative initiative to create a federated network with unified data access mechanisms. The 3D-Beacons Network allows researchers to collate coordinate files and metadata for experimentally determined and theoretical protein models from state-of-the-art and specialist model providers and also from the Protein Data Bank.
Collapse
Affiliation(s)
- Mihaly Varadi
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
| | - Sreenath Nair
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
| | - Ian Sillitoe
- Department of Structural and Molecular Biology, UCL, London WC1E 6BT, UK
| | - Gerardo Tauriello
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Stephen Anyango
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
| | - Stefan Bienert
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Clemente Borges
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
| | - Mandar Deshpande
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
| | | | | | - Andras Hatos
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
- Department of Oncology, Lausanne University Hospital, Lausanne 1015, Switzerland
- Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland
- Swiss Institute of Bioinformatics, Lausanne 1015, Switzerland
- Swiss Cancer Center Leman, Lausanne 1005, Switzerland
| | - Tamas Hegedus
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
| | | | - Robbie Joosten
- Netherlands Cancer Institute, Amsterdam 1066 CX, The Netherlands
| | | | | | - Dmitry Molodenskiy
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
| | - Damiano Piovesan
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
| | - Edoardo Salladini
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
| | - Steven L Salzberg
- Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Markus J Sommer
- Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Martin Steinegger
- School of Biology, Seoul National University, Seoul 82-2-880-6971, 6977, South Korea
| | - Erzsebet Suhajda
- Department of Biophysics and Radiation Biology, Semmelweis University, Budapest 1094, Hungary
| | - Dmitri Svergun
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
- European Molecular Biology Laboratory, EMBL Hamburg, Hamburg 69117, Germany
| | - Luiggi Tenorio-Ku
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
| | - Silvio Tosatto
- Department of Biomedical Sciences, University of Padova, Padova 35129, Italy
| | | | - Andrew Mark Waterhouse
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | | | - Torsten Schwede
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Computational Structural Biology, SIB Swiss Institute of Bioinformatics, Basel 4056, Switzerland
| | - Christine Orengo
- Department of Structural and Molecular Biology, UCL, London WC1E 6BT, UK
| | - Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton CB10 1SA, UK
| |
Collapse
|