1. Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, Griss J, Sevilla C, Matthews L, Gong C, Deng C, Varusai T, Ragueneau E, Haider Y, May B, Shamovsky V, Weiser J, Brunson T, Sanati N, Beckman L, Shao X, Fabregat A, Sidiropoulos K, Murillo J, Viteri G, Cook J, Shorser S, Bader G, Demir E, Sander C, Haw R, Wu G, Stein L, Hermjakob H, D’Eustachio P. The reactome pathway knowledgebase 2022. Nucleic Acids Res 2022; 50:D687-D692. [PMID: 34788843 PMCID: PMC8689983 DOI: 10.1093/nar/gkab1028] [Citation(s) in RCA: 1139] [Impact Index Per Article: 379.7] [Received: 09/30/2021] [Revised: 10/11/2021] [Accepted: 10/13/2021] [Indexed: 11/13/2022]
Abstract
The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.
Publication type: Research Support, N.I.H., Extramural
2. Milacic M, Beavers D, Conley P, Gong C, Gillespie M, Griss J, Haw R, Jassal B, Matthews L, May B, Petryszak R, Ragueneau E, Rothfels K, Sevilla C, Shamovsky V, Stephan R, Tiwari K, Varusai T, Weiser J, Wright A, Wu G, Stein L, Hermjakob H, D’Eustachio P. The Reactome Pathway Knowledgebase 2024. Nucleic Acids Res 2024; 52:D672-D678. [PMID: 37941124 PMCID: PMC10767911 DOI: 10.1093/nar/gkad1025] [Citation(s) in RCA: 341] [Impact Index Per Article: 341.0] [Received: 09/29/2023] [Revised: 10/14/2023] [Accepted: 10/20/2023] [Indexed: 11/10/2023]
Abstract
The Reactome Knowledgebase (https://reactome.org), an Elixir and GCBR core biological data resource, provides manually curated molecular details of a broad range of normal and disease-related biological processes. Processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Here we review progress towards annotation of the entire human proteome, targeted annotation of disease-causing genetic variants of proteins and of small-molecule drugs in a pathway context, and towards supporting explicit annotation of cell- and tissue-specific pathways. Finally, we briefly discuss issues involved in making Reactome more fully interoperable with other related resources such as the Gene Ontology and maintaining the resulting community resource network.
Publication type: research-article
3. Zdrazil B, Felix E, Hunter F, Manners EJ, Blackshaw J, Corbett S, de Veij M, Ioannidis H, Lopez DM, Mosquera J, Magarinos M, Bosc N, Arcila R, Kizilören T, Gaulton A, Bento A, Adasme M, Monecke P, Landrum G, Leach A. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Res 2024; 52:D1180-D1192. [PMID: 37933841 PMCID: PMC10767899 DOI: 10.1093/nar/gkad1004] [Citation(s) in RCA: 281] [Impact Index Per Article: 281.0] [Received: 09/09/2023] [Revised: 10/09/2023] [Accepted: 10/23/2023] [Indexed: 11/08/2023]
Abstract
ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ∼270 000 bioactivity measurements.
Publication type: research-article
4. Laumer CE, Gruber-Vodicka H, Hadfield MG, Pearse VB, Riesgo A, Marioni JC, Giribet G. Support for a clade of Placozoa and Cnidaria in genes with minimal compositional bias. eLife 2018; 7:e36278. [PMID: 30373720 PMCID: PMC6277202 DOI: 10.7554/elife.36278] [Citation(s) in RCA: 66] [Impact Index Per Article: 9.4] [Received: 02/27/2018] [Accepted: 10/11/2018] [Indexed: 12/22/2022]
Abstract
The phylogenetic placement of the morphologically simple placozoans is crucial to understanding the evolution of complex animal traits. Here, we examine the influence of adding new genomes from placozoans to a large dataset designed to study the deepest splits in the animal phylogeny. Using site-heterogeneous substitution models, we show that it is possible to obtain strong support, in both amino acid and reduced-alphabet matrices, for either a sister-group relationship between Cnidaria and Placozoa, or for Cnidaria and Bilateria as seen in most published work to date, depending on the orthologues selected to construct the matrix. We demonstrate that a majority of genes show evidence of compositional heterogeneity, and that support for the Cnidaria + Bilateria clade can be assigned to this source of systematic error. In interpreting these results, we caution against a peremptory reading of placozoans as secondarily reduced forms of little relevance to broader discussions of early animal evolution.
Publication type: research-article
5. Urban L, Holzer A, Baronas JJ, Hall MB, Braeuninger-Weimer P, Scherm MJ, Kunz DJ, Perera SN, Martin-Herranz DE, Tipper ET, Salter SJ, Stammnitz MR. Freshwater monitoring by nanopore sequencing. eLife 2021; 10:e61504. [PMID: 33461660 PMCID: PMC7815314 DOI: 10.7554/elife.61504] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 07/28/2020] [Accepted: 12/01/2020] [Indexed: 12/15/2022]
Abstract
While traditional microbiological freshwater tests focus on the detection of specific bacterial indicator species, including pathogens, direct tracing of all aquatic DNA through metagenomics poses a profound alternative. Yet, in situ metagenomic water surveys face substantial challenges in cost and logistics. Here, we present a simple, fast, cost-effective and remotely accessible freshwater diagnostics workflow centred around the portable nanopore sequencing technology. Using defined compositions and spatiotemporal microbiota from surface water of an example river in Cambridge (UK), we provide optimised experimental and bioinformatics guidelines, including a benchmark with twelve taxonomic classification tools for nanopore sequences. We find that nanopore metagenomics can depict the hydrological core microbiome and fine temporal gradients in line with complementary physicochemical measurements. In a public health context, these data feature relevant sewage signals and pathogen maps at species level resolution. We anticipate that this framework will gather momentum for new environmental monitoring initiatives using portable devices.
Publication type: research-article
6. PDBe-KB consortium, Varadi M, Anyango S, Armstrong D, Berrisford J, Choudhary P, Deshpande M, Nadzirin N, Nair SS, Pravda L, Tanweer A, Al-Lazikani B, Andreini C, Barton GJ, Bednar D, Berka K, Blundell T, Brock KP, Carazo JM, Damborsky J, David A, Dey S, Dunbrack R, Recio JF, Fraternali F, Gibson T, Helmer-Citterich M, Hoksza D, Hopf T, Jakubec D, Kannan N, Krivak R, Kumar M, Levy ED, London N, Macias JR, Srivatsan MM, Marks DS, Martens L, McGowan SA, McGreig JE, Modi V, Parra RG, Pepe G, Piovesan D, Prilusky J, Putignano V, Radusky LG, Ramasamy P, Rausch AO, Reuter N, Rodriguez LA, Rollins NJ, Rosato A, Rubach P, Serrano L, Singh G, Skoda P, Sorzano COS, Stourac J, Sulkowska JI, Svobodova R, Tichshenko N, Tosatto SCE, Vranken W, Wass MN, Xue D, Zaidman D, Thornton J, Sternberg M, Orengo C, Velankar S. PDBe-KB: collaboratively defining the biological context of structural data. Nucleic Acids Res 2022; 50:D534-D542. [PMID: 34755867 PMCID: PMC8728252 DOI: 10.1093/nar/gkab988] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 09/14/2021] [Revised: 10/01/2021] [Accepted: 10/14/2021] [Indexed: 12/15/2022]
Abstract
The Protein Data Bank in Europe - Knowledge Base (PDBe-KB, https://pdbe-kb.org) is an open collaboration between world-leading specialist data resources contributing functional and biophysical annotations derived from or relevant to the Protein Data Bank (PDB). The goal of PDBe-KB is to place macromolecular structure data in their biological context by developing standardised data exchange formats and integrating functional annotations from the contributing partner resources into a knowledge graph that can provide valuable biological insights. Since we described PDBe-KB in 2019, there have been significant improvements in the variety of available annotation data sets and user functionality. Here, we provide an overview of the consortium, highlighting the addition of annotations such as predicted covalent binders, phosphorylation sites, effects of mutations on the protein structure and energetic local frustration. In addition, we describe a library of reusable web-based visualisation components and introduce new features such as a bulk download data service and a novel superposition service that generates clusters of superposed protein chains weekly for the whole PDB archive.
Publication type: research-article
7. The wwPDB Consortium, Turner J, Abbott S, Fonseca N, Pye R, Carrijo L, Duraisamy AK, Salih O, Wang Z, Kleywegt GJ, Morris KL, Patwardhan A, Burley SK, Crichlow G, Feng Z, Flatt JW, Ghosh S, Hudson BP, Lawson CL, Liang Y, Peisach E, Persikova I, Sekharan M, Shao C, Young J, Velankar S, Armstrong D, Bage M, Bueno WM, Evans G, Gaborova R, Ganguly S, Gupta D, Harrus D, Tanweer A, Bansal M, Rangannan V, Kurisu G, Cho H, Ikegawa Y, Kengaku Y, Kim JY, Niwa S, Sato J, Takuwa A, Yu J, Hoch JC, Baskaran K, Xu W, Zhang W, Ma X. EMDB - the Electron Microscopy Data Bank. Nucleic Acids Res 2024; 52:D456-D465. [PMID: 37994703 PMCID: PMC10767987 DOI: 10.1093/nar/gkad1019] [Citation(s) in RCA: 48] [Impact Index Per Article: 48.0] [Received: 09/28/2023] [Revised: 10/18/2023] [Accepted: 10/20/2023] [Indexed: 11/24/2023]
Abstract
The Electron Microscopy Data Bank (EMDB) is the global public archive of three-dimensional electron microscopy (3DEM) maps of biological specimens derived from transmission electron microscopy experiments. As of 2021, EMDB is managed by the Worldwide Protein Data Bank consortium (wwPDB; wwpdb.org) as a wwPDB Core Archive, and the EMDB team is a core member of the consortium. Today, EMDB houses over 30 000 entries with maps containing macromolecules, complexes, viruses, organelles and cells. Herein, we provide an overview of the rapidly growing EMDB archive, including its current holdings, recent updates, and future plans.
Publication type: Review
8. Meldal BHM, Perfetto L, Combe C, Lubiana T, Ferreira Cavalcante JV, Bye-A-Jee H, Waagmeester A, del-Toro N, Shrivastava A, Barrera E, Wong E, Mlecnik B, Bindea G, Panneerselvam K, Willighagen E, Rappsilber J, Porras P, Hermjakob H, Orchard S. Complex Portal 2022: new curation frontiers. Nucleic Acids Res 2022; 50:D578-D586. [PMID: 34718729 PMCID: PMC8689886 DOI: 10.1093/nar/gkab991] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 09/15/2021] [Revised: 10/07/2021] [Accepted: 10/10/2021] [Indexed: 01/02/2023]
Abstract
The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (i.e. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.
Publication type: Research Support, N.I.H., Extramural
9. Haim-Vilmovsky L, Henriksson J, Walker JA, Miao Z, Natan E, Kar G, Clare S, Barlow JL, Charidemou E, Mamanova L, Chen X, Proserpio V, Pramanik J, Woodhouse S, Protasio AV, Efremova M, Griffin JL, Berriman M, Dougan G, Fisher J, Marioni JC, McKenzie ANJ, Teichmann SA. Mapping Rora expression in resting and activated CD4+ T cells. PLoS One 2021; 16:e0251233. [PMID: 34003838 PMCID: PMC8130942 DOI: 10.1371/journal.pone.0251233] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/01/2021] [Accepted: 04/22/2021] [Indexed: 11/19/2022]
Abstract
The transcription factor Rora has been shown to be important for the development of ILC2 and the regulation of ILC3, macrophages and Treg cells. Here we investigate the role of Rora across CD4+ T cells in general, but with an emphasis on Th2 cells, both in vitro as well as in the context of several in vivo type 2 infection models. We dissect the function of Rora using overexpression and a CD4-conditional Rora-knockout mouse, as well as a RORA-reporter mouse. We establish the importance of Rora in CD4+ T cells for controlling lung inflammation induced by Nippostrongylus brasiliensis infection, and have measured the effect on downstream genes using RNA-seq. Using a systematic stimulation screen of CD4+ T cells, coupled with RNA-seq, we identify upstream regulators of Rora, most importantly IL-33 and CCL7. Our data suggest that Rora is a negative regulator of the immune system, possibly through several downstream pathways, and is under control of the local microenvironment.
MESH Headings
- Animals
- Antigens, Helminth/immunology
- Antigens, Helminth/metabolism
- CD4-Positive T-Lymphocytes/immunology
- Cells, Cultured
- Cytokines/metabolism
- Disease Models, Animal
- Female
- Gene Expression Regulation/immunology
- Lymphocyte Activation
- Macrophages/immunology
- Mice
- Mice, Inbred C57BL
- Mice, Knockout
- Mice, Transgenic
- Nippostrongylus/immunology
- Nuclear Receptor Subfamily 1, Group F, Member 1/immunology
- Nuclear Receptor Subfamily 1, Group F, Member 1/metabolism
- Pneumonia/immunology
- Pneumonia/parasitology
- Pneumonia/pathology
- Strongylida Infections/immunology
- Strongylida Infections/parasitology
- Th2 Cells/immunology
Publication type: research-article
10. Matentzoglu N, Balhoff JP, Bello SM, Bizon C, Brush M, Callahan TJ, Chute CG, Duncan WD, Evelo CT, Gabriel D, Graybeal J, Gray A, Gyori BM, Haendel M, Harmse H, Harris NL, Harrow I, Hegde HB, Hoyt AL, Hoyt CT, Jiao D, Jiménez-Ruiz E, Jupp S, Kim H, Koehler S, Liener T, Long Q, Malone J, McLaughlin JA, McMurry JA, Moxon S, Munoz-Torres MC, Osumi-Sutherland D, Overton JA, Peters B, Putman T, Queralt-Rosinach N, Shefchek K, Solbrig H, Thessen A, Tudorache T, Vasilevsky N, Wagner AH, Mungall CJ. A Simple Standard for Sharing Ontological Mappings (SSSOM). Database (Oxford) 2022; 2022:baac035. [PMID: 35616100 PMCID: PMC9216545 DOI: 10.1093/database/baac035] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 12/15/2021] [Revised: 03/08/2022] [Accepted: 05/11/2022] [Indexed: 02/03/2023]
Abstract
Despite progress in the development of standards for describing and exchanging scientific information, the lack of easy-to-use standards for mapping between different representations of the same or similar objects in different databases poses a major impediment to data integration and interoperability. Mappings often lack the metadata needed to be correctly interpreted and applied. For example, are two terms equivalent or merely related? Are they narrow or broad matches? Or are they associated in some other way? Such relationships between the mapped terms are often not documented, which leads to incorrect assumptions and makes them hard to use in scenarios that require a high degree of precision (such as diagnostics or risk prediction). Furthermore, the lack of descriptions of how mappings were done makes it hard to combine and reconcile mappings, particularly curated and automated ones. We have developed the Simple Standard for Sharing Ontological Mappings (SSSOM) which addresses these problems by: (i) Introducing a machine-readable and extensible vocabulary to describe metadata that makes imprecision, inaccuracy and incompleteness in mappings explicit. (ii) Defining an easy-to-use simple table-based format that can be integrated into existing data science pipelines without the need to parse or query ontologies, and that integrates seamlessly with Linked Data principles. (iii) Implementing open and community-driven collaborative workflows that are designed to evolve the standard continuously to address changing requirements and mapping practices. (iv) Providing reference tools and software libraries for working with the standard. In this paper, we present the SSSOM standard, describe several use cases in detail and survey some of the existing work on standardizing the exchange of mappings, with the goal of making mappings Findable, Accessible, Interoperable and Reusable (FAIR). The SSSOM specification can be found at http://w3id.org/sssom/spec. 
Database URL: http://w3id.org/sssom/spec.
Publication type: Research Support, N.I.H., Extramural
11. Kapourani CA, Argelaguet R, Sanguinetti G, Vallejos CA. scMET: Bayesian modeling of DNA methylation heterogeneity at single-cell resolution. Genome Biol 2021; 22:114. [PMID: 33879195 PMCID: PMC8056718 DOI: 10.1186/s13059-021-02329-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/22/2020] [Accepted: 03/25/2021] [Indexed: 02/06/2023]
Abstract
High-throughput single-cell measurements of DNA methylomes can quantify methylation heterogeneity and uncover its role in gene regulation. However, technical limitations and sparse coverage can preclude this task. scMET is a hierarchical Bayesian model which overcomes sparsity, sharing information across cells and genomic features to robustly quantify genuine biological heterogeneity. scMET can identify highly variable features that drive epigenetic heterogeneity, and perform differential methylation and variability analyses. We illustrate how scMET facilitates the characterization of epigenetically distinct cell populations and how it enables the formulation of novel hypotheses on the epigenetic regulation of gene expression. scMET is available at https://github.com/andreaskapou/scMET .
Publication type: research-article
12. Orlic-Milacic M, Rothfels K, Matthews L, Wright A, Jassal B, Shamovsky V, Trinh Q, Gillespie ME, Sevilla C, Tiwari K, Ragueneau E, Gong C, Stephan R, May B, Haw R, Weiser J, Beavers D, Conley P, Hermjakob H, Stein LD, D’Eustachio P, Wu G. Pathway-based, reaction-specific annotation of disease variants for elucidation of molecular phenotypes. Database (Oxford) 2024; 2024:baae031. [PMID: 38713862 PMCID: PMC11184451 DOI: 10.1093/database/baae031] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/08/2023] [Revised: 02/23/2024] [Accepted: 04/01/2024] [Indexed: 05/09/2024]
Abstract
Germline and somatic mutations can give rise to proteins with altered activity, including both gain and loss-of-function. The effects of these variants can be captured in disease-specific reactions and pathways that highlight the resulting changes to normal biology. A disease reaction is defined as an aberrant reaction in which a variant protein participates. A disease pathway is defined as a pathway that contains a disease reaction. Annotation of disease variants as participants of disease reactions and disease pathways can provide a standardized overview of molecular phenotypes of pathogenic variants that is amenable to computational mining and mathematical modeling. Reactome (https://reactome.org/), an open source, manually curated, peer-reviewed database of human biological pathways, in addition to providing annotations for >11 000 unique human proteins in the context of ∼15 000 wild-type reactions within more than 2000 wild-type pathways, also provides annotations for >4000 disease variants of close to 400 genes as participants of ∼800 disease reactions in the context of ∼400 disease pathways. Functional annotation of disease variants proceeds from normal gene functions, described in wild-type reactions and pathways, through disease variants whose divergence from normal molecular behaviors has been experimentally verified, to extrapolation from molecular phenotypes of characterized variants to variants of unknown significance using criteria of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Reactome's data model enables mapping of disease variant datasets to specific disease reactions within disease pathways, providing a platform to infer pathway output impacts of numerous human disease variants and model organism orthologs, complementing computational predictions of variant pathogenicity. Database URL: https://reactome.org/.
Publication type: Research Support, N.I.H., Extramural
13. Ormazábal A, Carletti MS, Saldaño TE, Gonzalez Buitron M, Marchetti J, Palopoli N, Bateman A. Expanding the repertoire of human tandem repeat RNA-binding proteins. PLoS One 2023; 18:e0290890. [PMID: 37729217 PMCID: PMC10511089 DOI: 10.1371/journal.pone.0290890] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2023] [Accepted: 08/15/2023] [Indexed: 09/22/2023]
Abstract
Protein regions consisting of arrays of tandem repeats are known to bind other molecular partners, including nucleic acid molecules. Although the interactions between repeat proteins and DNA are already widely explored, studies characterising tandem repeat RNA-binding proteins are lacking. We performed a large-scale analysis of human proteins devoted to expanding the knowledge about tandem repeat proteins experimentally reported as RNA-binding molecules. This work is timely because of the release of a full set of accurate structural models for the human proteome amenable to repeat detection using structural methods. The main goal of our analysis was to build a comprehensive set of human RNA-binding proteins that contain repeats at the sequence or structure level. Our results showed that the combination of sequence and structural methods finds significantly more tandem repeat proteins than either method alone. We identified 219 tandem repeat proteins that bind RNA molecules and characterised the overlap between repeat regions and RNA-binding regions as a first step towards assessing their functional relationship. We observed differences in the characteristics of repeat regions predicted by sequence-based or structure-based methods in terms of their sequence composition, their functions and their protein domains.
Publication type: research-article
14. Cacheiro P, Pava D, Parkinson H, VanZanten M, Wilson R, Gunes O, The International Mouse Phenotyping Consortium, Smedley D. Computational identification of disease models through cross-species phenotype comparison. Dis Model Mech 2024; 17:dmm050604. [PMID: 38881316 PMCID: PMC11247498 DOI: 10.1242/dmm.050604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 06/11/2024] [Indexed: 06/18/2024]
Abstract
The use of standardised phenotyping screens to identify abnormal phenotypes in mouse knockouts, together with the use of ontologies to describe such phenotypic features, allows the implementation of an automated and unbiased pipeline to identify new models of disease by performing phenotype comparisons across species. Using data from the International Mouse Phenotyping Consortium (IMPC), approximately half of mouse mutants are able to mimic, at least partially, the human ortholog disease phenotypes as computed by the PhenoDigm algorithm. We found the number of phenotypic abnormalities in the mouse and the corresponding Mendelian disorder, the pleiotropy and severity of the disease, and the viability and zygosity status of the mouse knockout to be associated with the ability of mouse models to recapitulate the human disorder. An analysis of the IMPC impact on disease gene discovery through a publication-tracking system revealed that the resource has been implicated in at least 109 validated rare disease-gene associations over the last decade.
Publication type: Comparative Study
15. Missarova A, Dann E, Rosen L, Satija R, Marioni J. Leveraging neighborhood representations of single-cell data to achieve sensitive DE testing with miloDE. Genome Biol 2024; 25:189. [PMID: 39026254 PMCID: PMC11256449 DOI: 10.1186/s13059-024-03334-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 07/10/2024] [Indexed: 07/20/2024]
Abstract
Single-cell RNA-sequencing enables testing for differential expression (DE) between conditions at a cell type level. While powerful, one of the limitations of such approaches is that the sensitivity of DE testing is dictated by the sensitivity of clustering, which is often suboptimal. To overcome this, we present miloDE, a cluster-free framework for DE testing (available as an open-source R package). We illustrate the performance of miloDE on both simulated and real data. Using miloDE, we identify a transient hemogenic endothelia-like state in mouse embryos lacking Tal1 and detect distinct programs during macrophage activation in idiopathic pulmonary fibrosis.
Publication type: research-article
16. Kunnakkattu IR, Choudhary P, Pravda L, Nadzirin N, Smart OS, Yuan Q, Anyango S, Nair S, Varadi M, Velankar S. PDBe CCDUtils: an RDKit-based toolkit for handling and analysing small molecules in the Protein Data Bank. J Cheminform 2023; 15:117. [PMID: 38042830 PMCID: PMC10693035 DOI: 10.1186/s13321-023-00786-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2023] [Accepted: 11/17/2023] [Indexed: 12/04/2023]
Abstract
While the Protein Data Bank (PDB) contains a wealth of structural information on ligands bound to macromolecules, their analysis can be challenging due to the large amount and diversity of data. Here, we present PDBe CCDUtils, a versatile toolkit for processing and analysing small molecules from the PDB in PDBx/mmCIF format. PDBe CCDUtils provides streamlined access to all the metadata for small molecules in the PDB and offers a set of convenient methods to compute various properties using RDKit, such as 2D depictions, 3D conformers, physicochemical properties, scaffolds, common fragments, and cross-references to small molecule databases using UniChem. The toolkit also provides methods for identifying all the covalently attached chemical components in a macromolecular structure and calculating similarity among small molecules. By providing a broad range of functionality, PDBe CCDUtils caters to the needs of researchers in cheminformatics, structural biology, bioinformatics and computational chemistry.
Publication type: research-article
17. Grentner A, Ragueneau E, Gong C, Prinz A, Gansberger S, Oyarzun I, Hermjakob H, Griss J. ReactomeGSA: new features to simplify public data reuse. Bioinformatics 2024; 40:btae338. [PMID: 38806182 PMCID: PMC11147800 DOI: 10.1093/bioinformatics/btae338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/20/2024] [Accepted: 05/26/2024] [Indexed: 05/30/2024]
Abstract
MOTIVATION ReactomeGSA is part of the Reactome knowledgebase and one of the leading multi-omics pathway analysis platforms. ReactomeGSA provides access to quantitative pathway analysis methods supporting different 'omics data types. Additionally, ReactomeGSA can process different datasets simultaneously, leading to a comparative pathway analysis that can also be performed across different species. RESULTS We present a major update to the ReactomeGSA analysis platforms that greatly simplifies the reuse and direct integration of public data. In order to increase the number of available datasets, we developed the new grein_loader Python application that can directly fetch experiments from the GREIN resource. This enabled us to support both EMBL-EBI's Expression Atlas and GEO RNA-seq Experiments Interactive Navigator within ReactomeGSA. To further increase the visibility and simplify the reuse of public datasets, we integrated a novel search function into ReactomeGSA that enables users to search for public datasets across both supported resources. Finally, we completely re-developed ReactomeGSA's web-frontend and R/Bioconductor package to support the new search and loading features, and greatly simplify the use of ReactomeGSA. AVAILABILITY AND IMPLEMENTATION The new ReactomeGSA web frontend is available at https://www.reactome.org/gsa with a built-in, interactive tutorial. The ReactomeGSA R package (https://bioconductor.org/packages/release/bioc/html/ReactomeGSA.html) is available through Bioconductor and shipped with detailed documentation and vignettes. The grein_loader Python application is available through the Python Package Index (pypi). The complete source code for all applications is available on GitHub at https://github.com/grisslab/grein_loader and https://github.com/reactome.
Research Support, N.I.H., Extramural | 1
18
Ribeiro AJM, Riziotis IG, Borkakoti N, Fernandes PA, Ramos MJ, Thornton JM. Measuring catalytic mechanism similarity - a new approach to study enzyme function and evolution. FEBS J 2025. [PMID: 40260653 DOI: 10.1111/febs.70106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2024] [Revised: 03/07/2025] [Accepted: 04/11/2025] [Indexed: 04/23/2025]
Abstract
Similarity measures for protein sequence, structure and enzyme reactions have been essential tools for translating an abundance of experimental data about enzymes into biological insights. Enzymes with similar sequence and structure, for example, can be organised into evolutionary families, and within families, reaction similarity can highlight examples of conservation or divergent evolution. When it comes to reaction mechanisms, despite their importance in explaining the catalytic power of enzymes and their evolution, no similarity measures have been developed until now. We addressed this gap by developing a method to calculate mechanism similarity based on the bond changes and charge transfers occurring at each catalytic step, where we have the ability to adjust the size of the chemical environment surrounding the atoms directly involved in these transformations. Using this newly developed method, we performed a pairwise comparison of all the mechanisms stored in the Mechanism and Catalytic Site Atlas (M-CSA) database. This analysis illustrates how mechanism similarity can be a powerful tool to navigate the known catalytic space and to discover and characterise both convergent and divergent evolutionary relationships.
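The abstract does not give the scoring formula. As a minimal sketch only, assuming each catalytic step is encoded as a set of hypothetical descriptor strings for its bond changes and charge transfers, two mechanisms can be compared with a symmetrised best-match Jaccard score. This is a generic set-similarity stand-in, not the authors' method, and it ignores the adjustable chemical-environment size they describe.

```python
from typing import Sequence, Set

# Hypothetical encoding: each catalytic step is a set of descriptor strings such as
# "break:C-O", "form:O-H" or "charge:N+". This is NOT the published M-CSA scoring scheme.
Step = Set[str]


def step_similarity(a: Step, b: Step) -> float:
    """Jaccard similarity between the change descriptors of two steps."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def directed_score(x: Sequence[Step], y: Sequence[Step]) -> float:
    """Average, over the steps of x, of each step's best match among the steps of y."""
    return sum(max(step_similarity(s, t) for t in y) for s in x) / len(x)


def mechanism_similarity(m1: Sequence[Step], m2: Sequence[Step]) -> float:
    """Symmetrised best-match similarity in [0, 1]; 0.0 if either mechanism is empty."""
    if not m1 or not m2:
        return 0.0
    return (directed_score(m1, m2) + directed_score(m2, m1)) / 2


# Toy mechanisms, invented for illustration.
mech_a = [{"break:C-O", "form:O-H"}, {"charge:N+", "form:C-N"}]
mech_b = [{"break:C-O", "form:O-H"}]
print(mechanism_similarity(mech_a, mech_b))  # 0.75
```

Here the first step of `mech_a` matches `mech_b` exactly and the second has no counterpart, so the two directed scores are 0.5 and 1.0 and their mean is 0.75. Averaging both directions keeps the score symmetric when mechanisms have different numbers of steps.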
1
19
Haselimashhadi H, Babalola K, Wilson R, Groza T, Muñoz-Fuentes V. A consensus score to combine inferences from multiple centres. Mamm Genome 2023; 34:379-388. [PMID: 37154937 PMCID: PMC10382396 DOI: 10.1007/s00335-023-09993-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2022] [Accepted: 03/02/2023] [Indexed: 05/10/2023]
Abstract
Experiments in which data are collected by multiple independent resources, including multicentre data, different laboratories within the same centre or with different operators, are challenging in design, data collection and interpretation. Indeed, inconsistent results across the resources are possible. In this paper, we propose a statistical solution for the problem of multi-resource consensus inferences when statistical results from different resources show variation in magnitude, directionality, and significance. Our proposed method allows combining the corrected p-values, effect sizes and the total number of centres into a global consensus score. We apply this method to obtain a consensus score for data collected by the International Mouse Phenotyping Consortium (IMPC) across 11 centres. We show the application of this method to detect sexual dimorphism in haematological data and discuss the suitability of the methodology.
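The abstract does not spell out how the consensus score is computed. For orientation only, a standard way to pool signed evidence across centres is the weighted Stouffer Z method, sketched below as a swapped-in technique rather than the authors' score: each two-sided p-value is converted to a Z-score, signed by the direction of its effect size, and the weighted sum is divided by the root-sum-of-squares of the weights. The choice of weights (for example the square root of each centre's sample size) is an assumption.

```python
from math import copysign, sqrt
from statistics import NormalDist
from typing import List, Optional, Tuple

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_consensus(
    p_values: List[float],
    effects: List[float],
    weights: Optional[List[float]] = None,
) -> Tuple[float, float]:
    """Weighted Stouffer combination of two-sided p-values, signed by effect direction.

    Returns (combined Z, combined two-sided p). Not the score defined in the paper.
    """
    if not p_values or len(p_values) != len(effects):
        raise ValueError("need one effect size per p-value")
    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("need one weight per centre")

    z_scores = []
    for p, effect in zip(p_values, effects):
        p = min(max(p, 1e-15), 1.0)             # keep inv_cdf's argument strictly inside (0, 1)
        z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)  # |Z| corresponding to a two-sided p-value
        z_scores.append(copysign(z, effect) if effect != 0 else 0.0)

    z_comb = sum(w * z for w, z in zip(weights, z_scores)) / sqrt(sum(w * w for w in weights))
    p_comb = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z_comb)))
    return z_comb, p_comb


# Two centres agree in direction: the evidence accumulates.
print(stouffer_consensus([0.05, 0.05], [0.3, 0.4]))
# Two centres disagree in direction: the evidence cancels (Z = 0, p = 1).
print(stouffer_consensus([0.05, 0.05], [0.3, -0.4]))
```

Signing each Z by its effect direction is what lets centres with inconsistent directionality cancel rather than reinforce each other, which is the situation the abstract describes.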
Research Support, N.I.H., Extramural | 2
20
Tagirdzhanova G, Saary P, Cameron ES, Allen CCG, Garber AI, Escandón DD, Cook AT, Goyette S, Nogerius VT, Passo A, Mayrhofer H, Holien H, Tønsberg T, Stein LY, Finn RD, Spribille T. Microbial occurrence and symbiont detection in a global sample of lichen metagenomes. PLoS Biol 2024; 22:e3002862. [PMID: 39509454 PMCID: PMC11542873 DOI: 10.1371/journal.pbio.3002862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 09/24/2024] [Indexed: 11/15/2024]
Abstract
In lichen research, metagenomes are increasingly being used for evaluating symbiont composition and metabolic potential, but the overall content and limitations of these metagenomes have not been assessed. We reassembled over 400 publicly available metagenomes, generated metagenome-assembled genomes (MAGs), constructed phylogenomic trees, and mapped MAG occurrence and frequency across the data set. Ninety-seven percent of the 1,000 recovered MAGs were bacterial or the fungal symbiont that provides most cellular mass. Our mapping of recovered MAGs provides the most detailed survey to date of bacteria in lichens and shows that 4 family-level lineages from 2 phyla accounted for as many bacterial occurrences in lichens as all other 71 families from 16 phyla combined. Annotation of highly complete bacterial, fungal, and algal MAGs reveals functional profiles that suggest interdigitated vitamin prototrophies and auxotrophies, with most lichen fungi auxotrophic for biotin, most bacteria auxotrophic for thiamine and the few annotated algae with partial or complete pathways for both, suggesting a novel dimension of microbial cross-feeding in lichen symbioses. Contrary to longstanding hypotheses, we found no annotations consistent with nitrogen fixation in bacteria other than known cyanobacterial symbionts. Core lichen symbionts such as algae were recovered as MAGs in only a fraction of the lichen symbioses in which they are known to occur. However, the presence of these and other microbes could be detected at high frequency using small subunit rRNA analysis, including in many lichens in which they are not otherwise recognized to occur. The rate of MAG recovery correlates with sequencing depth, but is almost certainly influenced by biological attributes of organisms that affect the likelihood of DNA extraction, sequencing and successful assembly, including cellular abundance, ploidy and strain co-occurrence. 
Our results suggest that, though metagenomes are a powerful tool for surveying microbial occurrence, they are of limited use in assessing absence, and their interpretation should be guided by an awareness of the interacting effects of microbial community complexity and sequencing depth.
research-article | 1
21
Cacheiro P, Spielmann N, Mashhadi HH, Fuchs H, Gailus-Durner V, Smedley D, de Angelis MH. Knockout mice are an important tool for human monogenic heart disease studies. Dis Model Mech 2023; 16:dmm049770. [PMID: 36825469 PMCID: PMC10073007 DOI: 10.1242/dmm.049770] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2022] [Accepted: 02/15/2023] [Indexed: 02/25/2023]
Abstract
Mouse models are relevant to studying the functionality of genes involved in human diseases; however, translation of phenotypes can be challenging. Here, we investigated genes related to monogenic forms of cardiovascular disease based on the Genomics England PanelApp and aligned them to International Mouse Phenotyping Consortium (IMPC) data. We found 153 genes associated with cardiomyopathy, cardiac arrhythmias or congenital heart disease in humans, of which 151 have one-to-one mouse orthologues. For 37.7% (57/151), viability and heart data captured by electrocardiography, transthoracic echocardiography, morphology and pathology from embryos and young adult mice are available. In knockout mice, 75.4% (43/57) of these genes showed non-viable phenotypes, whereas records of prenatal, neonatal or infant death in humans were found for 35.1% (20/57). Multisystem phenotypes are common, with 58.8% (20/34) of heterozygous (homozygous lethal) and 78.6% (11/14) of homozygous (viable) mice showing cardiovascular, metabolic/homeostasis, musculoskeletal, hematopoietic, nervous system and/or growth abnormalities mimicking the clinical manifestations observed in patients. These IMPC data are critical beyond cardiac diagnostics given their multisystemic nature, allowing detection of abnormalities across physiological systems and providing a valuable resource to understand pleiotropic effects.
Research Support, N.I.H., Extramural | 2
22
Camacho OM, Ramsbottom KA, Prakash A, Sun Z, Perez Riverol Y, Bowler-Barnett E, Martin M, Fan J, Deutsch EW, Vizcaíno JA, Jones AR. Phosphorylation in the Plasmodium falciparum Proteome: A Meta-Analysis of Publicly Available Data Sets. J Proteome Res 2024; 23:5326-5341. [PMID: 39475123 PMCID: PMC11629380 DOI: 10.1021/acs.jproteome.4c00418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Revised: 10/07/2024] [Accepted: 10/11/2024] [Indexed: 12/07/2024]
Abstract
Malaria is a deadly disease caused by Apicomplexan parasites of the Plasmodium genus. Several species of the Plasmodium genus are known to be infectious to humans, of which P. falciparum is the most virulent. Post-translational modifications (PTMs) of proteins coordinate cell signaling and hence regulate many biological processes in P. falciparum homeostasis and host infection, of which the most highly studied is phosphorylation. Phosphosites on proteins can be identified by tandem mass spectrometry (MS) performed on enriched samples (phosphoproteomics), followed by downstream computational analyses. We have performed a large-scale meta-analysis of 11 publicly available phosphoproteomics data sets to build a comprehensive atlas of phosphosites in the P. falciparum proteome, using robust pipelines aimed at strict control of false identifications. We identified a total of 26,609 phosphorylated sites on P. falciparum proteins, split across three categories of data reliability (gold/silver/bronze). We identified significant sequence motifs, likely indicative of different groups of kinases responsible for different groups of phosphosites. Conservation analysis identified clusters of phosphoproteins that are highly conserved and others that are evolving faster within the Plasmodium genus, and implicated in different pathways. We were also able to identify over 180,000 phosphosites within Plasmodium species beyond falciparum, based on orthologue mapping. We also explored the structural context of phosphosites, identifying a strong enrichment for phosphosites on fast-evolving (low conservation) intrinsically disordered regions (IDRs) of proteins. In other species, IDRs have been shown to have an important role in modulating protein-protein interactions, particularly in signaling, and thus warrant further study for their roles in host-pathogen interactions.
All data have been made available via UniProtKB, PRIDE, and PeptideAtlas, with visualization interfaces for exploring phosphosites in the context of other data on Plasmodium proteins.
Meta-Analysis | 1
23
Rosonovski S, Levchenko M, Bhatnagar R, Chandrasekaran U, Faulk L, Hassan I, Jeffryes M, Mubashar SI, Nassar M, Jayaprabha Palanisamy M, Parkin M, Poluru J, Rogers F, Saha S, Selim M, Shafique Z, Ide-Smith M, Stephenson D, Tirunagari S, Venkatesan A, Xing L, Harrison M. Europe PMC in 2023. Nucleic Acids Res 2024; 52:D1668-D1676. [PMID: 37994696 PMCID: PMC10767826 DOI: 10.1093/nar/gkad1085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 10/23/2023] [Accepted: 10/30/2023] [Indexed: 11/24/2023]
Abstract
Europe PMC (https://europepmc.org/) is an open access database of life science journal articles and preprints, which contains over 42 million abstracts and over 9 million full text articles accessible via the website, APIs and bulk download. This publication outlines new developments to the Europe PMC platform since the last database update in 2020 (1) and focuses on five main areas. (i) Improving discoverability, reproducibility and trust in preprints by indexing new preprint content, enriching preprint metadata and identifying withdrawn and removed preprints. (ii) Enhancing support for text and data mining by expanding the types of annotations provided and developing the Europe PMC Annotations Corpus, which can be used to train machine learning models to increase their accuracy and precision. (iii) Developing the Article Status Monitor tool and email alerts, to notify users about new articles and updates to existing records. (iv) Positioning Europe PMC as an open scholarly infrastructure through increasing the portion of open source core software, improving sustainability and accessibility of the service.
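The abstract notes that Europe PMC content is accessible through APIs. A minimal sketch of building a search request against the public RESTful web service follows, assuming the `search` endpoint and the `query`, `format`, `resultType`, `pageSize` and `cursorMark` parameters as documented by Europe PMC; these names should be checked against the current API documentation before use. Only `search_page` touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Europe PMC RESTful search endpoint; confirm against the current API documentation.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query: str, page_size: int = 25, cursor_mark: str = "*") -> str:
    """Build a JSON search URL; cursor_mark="*" requests the first page of results."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",  # compact records; "core" returns fuller metadata
        "pageSize": page_size,
        "cursorMark": cursor_mark,
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


def search_page(query: str, page_size: int = 25, cursor_mark: str = "*") -> dict:
    """Fetch and decode one page of results (requires network access)."""
    with urlopen(build_search_url(query, page_size, cursor_mark), timeout=30) as resp:
        return json.load(resp)


print(build_search_url('TITLE:"Europe PMC"', page_size=10))
```

Calling `search_page('TITLE:"Europe PMC"', page_size=5)` should return the decoded JSON response. In the layout assumed here, records sit in a list under `resultList`, and the response's `nextCursorMark` is passed back as `cursor_mark` to fetch the next page; cursor-based paging avoids deep offset queries on large result sets.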
research-article | 1
24
Bielow C, Hoffmann N, Jimenez-Morales D, Van Den Bossche T, Vizcaíno JA, Tabb DL, Bittremieux W, Walzer M. Communicating Mass Spectrometry Quality Information in mzQC with Python, R, and Java. J Am Soc Mass Spectrom 2024; 35:1875-1882. [PMID: 38918936 PMCID: PMC11311537 DOI: 10.1021/jasms.4c00174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Revised: 06/07/2024] [Accepted: 06/11/2024] [Indexed: 06/27/2024]
Abstract
Mass spectrometry is a powerful technique for analyzing molecules in complex biological samples. However, inter- and intralaboratory variability and bias can affect the data due to various factors, including sample handling and preparation, instrument calibration and performance, and data acquisition and processing. To address this issue, the Quality Control (QC) working group of the Human Proteome Organization's Proteomics Standards Initiative has established the standard mzQC file format for reporting and exchanging information relating to data quality. mzQC is based on the JavaScript Object Notation (JSON) format and provides a lightweight yet versatile file format that can be easily implemented in software. Here, we present open-source software libraries to process mzQC data in three programming languages: Python, using pymzqc; R, using rmzqc; and Java, using jmzqc. The libraries follow a common data model and provide shared functionalities, including the (de)serialization and validation of mzQC files. We demonstrate use of the software libraries in a workflow for extracting, analyzing, and visualizing QC metrics from different sources. Additionally, we show how these libraries can be integrated with each other, with existing software tools, and in automated workflows for the QC of mass spectrometry data. All software libraries are available as open source under the MS-Quality-Hub organization on GitHub (https://github.com/MS-Quality-Hub).
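Because mzQC is plain JSON, a file can be inspected with any standard JSON parser. The sketch below uses only Python's `json` module, not the pymzqc API, and reads a hand-written, simplified document whose layout (a top-level `mzQC` object with `runQualities`, each holding `metadata` and `qualityMetrics`) is assumed from the mzQC 1.0 structure. The accessions are placeholders, not real controlled-vocabulary terms; a real file carries more required fields and should be validated with one of the libraries described in the abstract.

```python
import json
from typing import Dict

# Hand-written, simplified mzQC-shaped document. Field layout assumed from mzQC 1.0;
# the accessions "XX:..." are placeholders, not real controlled-vocabulary terms.
EXAMPLE_MZQC = """
{
  "mzQC": {
    "version": "1.0.0",
    "creationDate": "2024-06-01T12:00:00",
    "runQualities": [
      {
        "metadata": {"label": "run_01"},
        "qualityMetrics": [
          {"accession": "XX:0000001", "name": "example metric A", "value": 42},
          {"accession": "XX:0000002", "name": "example metric B", "value": 0.87}
        ]
      }
    ],
    "controlledVocabularies": []
  }
}
"""


def metrics_by_run(text: str) -> Dict[str, Dict[str, object]]:
    """Map each run's label to a {metric name: value} dictionary (no schema validation)."""
    root = json.loads(text)["mzQC"]
    table: Dict[str, Dict[str, object]] = {}
    for run in root.get("runQualities", []):
        label = run.get("metadata", {}).get("label", "unlabelled")
        table[label] = {m["name"]: m.get("value") for m in run.get("qualityMetrics", [])}
    return table


print(metrics_by_run(EXAMPLE_MZQC))
```

Flattening run-level metrics into a per-run dictionary is a convenient first step before tabulating or plotting QC values across runs.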
research-article | 1
25
Waman V, Bordin N, Lau A, Kandathil S, Wells J, Miller D, Velankar S, Jones D, Sillitoe I, Orengo C. CATH v4.4: major expansion of CATH by experimental and predicted structural data. Nucleic Acids Res 2025; 53:D348-D355. [PMID: 39565206 PMCID: PMC11701635 DOI: 10.1093/nar/gkae1087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2024] [Revised: 10/18/2024] [Accepted: 10/24/2024] [Indexed: 11/21/2024]
Abstract
CATH (https://www.cathdb.info) is a structural classification database that assigns domains to the structures in the Protein Data Bank (PDB) and AlphaFold Protein Structure Database (AFDB) and adds layers of biological information, including homology and functional annotation. This article covers developments in the CATH classification since 2021. We report the significant expansion of structural information (180-fold) for CATH superfamilies through classification of PDB domains and predicted domain structures from the Encyclopedia of Domains (TED) resource. TED provides information on predicted domains in AFDB. CATH v4.4 represents an expansion of ∼64 844 experimentally determined domain structures from PDB. We also present a mapping of ∼90 million predicted domains from TED to CATH superfamilies. New PDB and TED data increases the number of superfamilies from 5841 to 6573, folds from 1349 to 2078 and architectures from 41 to 77. TED data comprises predicted structures, so these new folds and architectures remain hypothetical until experimentally confirmed. CATH also classifies domains into functional families (FunFams) within a superfamily. We have updated sequences in FunFams by scanning FunFam-HMMs against UniProt release 2024_02, giving a 276% increase in FunFams coverage. The mapping of TED structural domains has resulted in a 4-fold increase in FunFams with structural information.
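CATH codes encode the four levels of its hierarchy (Class, Architecture, Topology or fold, Homologous superfamily) as dot-separated numbers, so counting superfamilies, folds and architectures, as in the figures quoted above, amounts to counting distinct code prefixes at each level. A minimal sketch, assuming the input is a list of (domain ID, superfamily code) pairs; the domain IDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

LEVELS = ("class", "architecture", "topology", "superfamily")


def cath_prefixes(code: str) -> Tuple[str, ...]:
    """Split a superfamily code into its four hierarchy prefixes."""
    parts = code.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a C.A.T.H superfamily code: {code!r}")
    return tuple(".".join(parts[: i + 1]) for i in range(4))


def distinct_nodes(assignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct nodes at each level from (domain_id, superfamily_code) pairs."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for _domain_id, code in assignments:
        for level, prefix in zip(LEVELS, cath_prefixes(code)):
            seen[level].add(prefix)
    return {level: len(seen[level]) for level in LEVELS}


# Hypothetical domain assignments for illustration.
example_assignments = [
    ("1abcA01", "3.40.50.300"),
    ("2xyzB02", "3.40.50.720"),
    ("3pqrA01", "2.60.40.10"),
]
print(cath_prefixes("3.40.50.300"))  # ('3', '3.40', '3.40.50', '3.40.50.300')
print(distinct_nodes(example_assignments))
```

Because each level is a prefix of the next, a new superfamily can appear without a new fold or architecture, which is why the abstract reports separate growth figures for each level.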
research-article | 1