1
Wermers Z, Yoo S, Radenbaugh B, Douglass A, Biesecker LG, Johnston JJ. Comparison of literature mining tools for variant classification: Through the lens of 50 RYR1 variants. Genet Med 2024; 26:101083. [PMID: 38281099 DOI: 10.1016/j.gim.2024.101083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2023] [Revised: 01/19/2024] [Accepted: 01/22/2024] [Indexed: 01/29/2024]
Abstract
PURPOSE The American College of Medical Genetics and Genomics and the Association for Molecular Pathology have outlined a schema that allows for systematic classification of variant pathogenicity. Although gnomAD is generally accepted as a reliable source of population frequency data and ClinGen has provided guidance on the utility of specific bioinformatic predictors, there is no consensus source for identifying publications relevant to a variant. Multiple tools are available to aid in the identification of relevant variant literature, including manually curated databases and literature search engines. We set out to determine the utility of 4 literature mining tools used for ascertainment to inform the discussion of the use of these tools. METHODS Four literature mining tools including the Human Gene Mutation Database, Mastermind, ClinVar, and LitVar 2.0 were used to identify relevant variant literature for 50 RYR1 variants. Sensitivity and precision were determined for each tool. RESULTS Sensitivity among the 4 tools ranged from 0.332 to 0.687. Precision ranged from 0.389 to 0.906. No single tool retrieved all relevant publications. CONCLUSION At the current time, the use of multiple tools is necessary to completely identify the literature relevant to curate a variant.
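The two metrics reported in this abstract can be made concrete with a short sketch. It assumes sensitivity is the fraction of the relevant publications (a curated reference set) that a tool retrieves, and precision is the fraction of retrieved publications that are relevant. The tool names and publication identifiers are hypothetical placeholders, not data from the study. Taking the union of two tools' results shows why combining tools can raise sensitivity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scores:
    sensitivity: float  # relevant items retrieved / all relevant items
    precision: float    # relevant items retrieved / all items retrieved


def score(retrieved: set[str], relevant: set[str]) -> Scores:
    """Compare one tool's retrieved publications against a reference set."""
    hits = retrieved & relevant
    sensitivity = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return Scores(sensitivity, precision)


# Hypothetical reference set of publications judged relevant for one variant.
relevant = {"pub1", "pub2", "pub3", "pub4", "pub5"}

# Hypothetical outputs from two literature mining tools.
tool_a = {"pub1", "pub2", "pub9"}
tool_b = {"pub3", "pub4", "pub8"}

for name, found in [("tool_a", tool_a), ("tool_b", tool_b), ("union", tool_a | tool_b)]:
    s = score(found, relevant)
    print(f"{name}: sensitivity={s.sensitivity:.2f} precision={s.precision:.2f}")
```

In this toy case each tool alone reaches a sensitivity of 0.40, and the union reaches 0.80 at the same precision. That mirrors, in miniature, the abstract's conclusion that no single tool retrieves everything.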
Affiliation(s)
- Zara Wermers
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Seeley Yoo
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Bailey Radenbaugh
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Amber Douglass
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Leslie G Biesecker
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD
- Jennifer J Johnston
- Center for Precision Health Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD.
2
Attrill H, Antonazzo G, Goodman JL, Thurmond J, Strelets VB, Brown NH, the FlyBase Consortium. A new experimental evidence-weighted signaling pathway resource in FlyBase. Development 2024; 151:dev202255. [PMID: 38230566 PMCID: PMC10911275 DOI: 10.1242/dev.202255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 01/09/2024] [Indexed: 01/18/2024]
Abstract
Research in model organisms is central to the characterization of signaling pathways in multicellular organisms. Here, we present the comprehensive and systematic curation of 17 Drosophila signaling pathways using the Gene Ontology framework to establish a dynamic resource that has been incorporated into FlyBase, providing visualization and data integration tools to aid research projects. By restricting to experimental evidence reported in the research literature and quantifying the amount of such evidence for each gene in a pathway, we captured the landscape of empirical knowledge of signaling pathways in Drosophila.
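The idea of quantifying experimental evidence for each gene in a pathway can be sketched as follows. The sketch assumes annotations are simple (gene, term, evidence code, reference) records. It counts distinct supporting references per gene and keeps only the core GO experimental evidence codes (EXP, IDA, IPI, IMP, IGI, IEP). The gene names, the placeholder term "PATHWAY:X" and the reference IDs are hypothetical, and this is not FlyBase's actual weighting scheme.

```python
from collections import defaultdict
from typing import NamedTuple

# Core GO experimental evidence codes; inferred or electronic codes are excluded.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


class Annotation(NamedTuple):
    gene: str
    go_term: str
    evidence: str
    reference: str


def evidence_per_gene(annotations: list[Annotation], pathway_term: str) -> dict[str, int]:
    """Count distinct references with experimental evidence linking each gene to a pathway term."""
    refs: dict[str, set[str]] = defaultdict(set)
    for ann in annotations:
        if ann.go_term == pathway_term and ann.evidence in EXPERIMENTAL_CODES:
            refs[ann.gene].add(ann.reference)
    return {gene: len(r) for gene, r in refs.items()}


# Hypothetical annotations for a placeholder pathway term.
annotations = [
    Annotation("geneA", "PATHWAY:X", "IMP", "ref1"),
    Annotation("geneA", "PATHWAY:X", "IGI", "ref2"),
    Annotation("geneA", "PATHWAY:X", "IMP", "ref1"),  # same reference, counted once
    Annotation("geneB", "PATHWAY:X", "IDA", "ref3"),
    Annotation("geneC", "PATHWAY:X", "IEA", "ref4"),  # electronic annotation, excluded
]

print(evidence_per_gene(annotations, "PATHWAY:X"))
```

Counting distinct references rather than raw annotation rows keeps one paper from inflating a gene's evidence score.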
Affiliation(s)
- Helen Attrill
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Giulia Antonazzo
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
- Joshua L. Goodman
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Jim Thurmond
- Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Nicholas H. Brown
- Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3DY, UK
3
Maraver P, Tecuatl C, Ascoli GA. Automatic identification of scientific publications describing digital reconstructions of neural morphology. Brain Inform 2023; 10:23. [PMID: 37684527 PMCID: PMC10491540 DOI: 10.1186/s40708-023-00202-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 08/06/2023] [Indexed: 09/10/2023]
Abstract
The increasing number of peer-reviewed publications constitutes a challenge for biocuration. For example, NeuroMorpho.Org, a sharing platform for digital reconstructions of neural morphology, must evaluate more than 6000 potentially relevant articles per year to identify data of interest. Here, we describe a tool that uses natural language processing and deep learning to assess the likelihood of a publication to be relevant for the project. The tool automatically identifies articles describing digitally reconstructed neural morphologies with high accuracy. Its processing rate of 900 publications per hour is not only amply sufficient to autonomously track new research, but also allowed the successful evaluation of older publications backlogged due to limited human resources. The number of bio-entities found since launching the tool almost doubled while greatly reducing manual labor. The classification tool is open source, configurable, and simple to use, making it extensible to other biocuration projects.
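A rough check of the throughput claim uses the figures quoted in the abstract: more than 6000 candidate articles per year, processed at 900 publications per hour. At that rate a year of new literature takes only a few hours to screen. The backlog size below is a hypothetical value chosen for illustration, not a figure from the paper.

```python
import math


def hours_needed(articles: int, rate_per_hour: float) -> float:
    """Processing time in hours at a constant classification rate."""
    return articles / rate_per_hour


RATE = 900                # publications per hour, as reported in the abstract
YEARLY_CANDIDATES = 6000  # lower bound on yearly candidate articles, from the abstract

yearly = hours_needed(YEARLY_CANDIDATES, RATE)
print(f"~{yearly:.1f} h per year of new literature")

# Hypothetical backlog size, not a figure from the paper.
BACKLOG = 50_000
print(f"backlog: ~{math.ceil(hours_needed(BACKLOG, RATE))} h")
```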
Affiliation(s)
- Patricia Maraver
- Bioengineering Department; College of Engineering and Computing, George Mason University, Fairfax, VA, USA
- Center for Neural Informatics, Structures, & Plasticity; Krasnow Institute for Advanced Study, George Mason University, Fairfax, VA, USA
- Carolina Tecuatl
- Bioengineering Department; College of Engineering and Computing, George Mason University, Fairfax, VA, USA
- Center for Neural Informatics, Structures, & Plasticity; Krasnow Institute for Advanced Study, George Mason University, Fairfax, VA, USA
- Giorgio A Ascoli
- Bioengineering Department; College of Engineering and Computing, George Mason University, Fairfax, VA, USA.
- Center for Neural Informatics, Structures, & Plasticity; Krasnow Institute for Advanced Study, George Mason University, Fairfax, VA, USA.
4
Chatterjee A, Swierstra T, Kuiper M. Dealing with different conceptions of pollution in the Gene Regulation Knowledge Commons. Biochim Biophys Acta Gene Regul Mech 2022; 1865:194779. [PMID: 34971789 DOI: 10.1016/j.bbagrm.2021.194779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2021] [Revised: 11/28/2021] [Accepted: 11/29/2021] [Indexed: 06/14/2023]
Abstract
Current research of gene regulatory mechanisms is increasingly dependent on the availability of high-quality information from manually curated databases. Biocurators undertake the task of extracting knowledge claims from scholarly publications, organizing these claims in a meaningful format and making them computable. In doing so, they enhance the value of existing scientific knowledge by making it accessible to the users of their databases. In this capacity, biocurators are well positioned to identify and weed out information that is of insufficient quality. The criteria that define information quality are typically outlined in curation guidelines developed by biocurators. These guidelines have been prudently developed to reflect the needs of the user community the database caters to. The guidelines depict the standard evidence that this community recognizes as sufficient justification for trustworthy data. Additionally, these guidelines determine the process by which data should be organized and maintained to be valuable to users. Following these guidelines, biocurators assess the quality, reliability, and validity of the information they encounter. In this article we explore to what extent different use cases agree with the inclusion criteria that define positive and negative data, implemented by the database. What are the drawbacks to users who have queries that would be well served by results that fall just short of the criteria used by a database? Finally, how can databases (and biocurators) accommodate the needs of such more explorative use cases?
Affiliation(s)
- Anamika Chatterjee
- Department of Philosophy and Religious Studies, Norwegian University of Science and Technology (NTNU), Trondheim, Norway.
- Tsjalling Swierstra
- Department of Philosophy, Maastricht University, Maastricht, the Netherlands
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
5
Kuiper M, Bonello J, Fernández-Breis JT, Bucher P, Futschik ME, Gaudet P, Kulakovskiy IV, Licata L, Logie C, Lovering RC, Makeev VJ, Orchard S, Panni S, Perfetto L, Sant D, Schulz S, Zerbino DR, Lægreid A. The Gene Regulation Knowledge Commons: The action area of GREEKC. Biochim Biophys Acta Gene Regul Mech 2021; 1865:194768. [PMID: 34757206 DOI: 10.1016/j.bbagrm.2021.194768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/20/2021] [Revised: 10/18/2021] [Accepted: 10/20/2021] [Indexed: 02/08/2023]
Abstract
The COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC, CA15205, www.greekc.org) organized nine workshops in a four-year period, starting September 2016. The workshops brought together a wide range of experts from all over the world working on various parts of the knowledge cycle that is central to understanding gene regulatory mechanisms. The discussions between ontologists, curators, text miners, biologists, bioinformaticians, philosophers and computational scientists spawned a host of activities aimed to update and standardise existing knowledge management workflows, encourage new experimental approaches and thoroughly involve end-users in the process to design the Gene Regulation Knowledge Commons (GRKC). The GREEKC consortium describes its main achievements, contextualised in a state-of-the-art of current tools and resources that today represent the GRKC.
Affiliation(s)
- Martin Kuiper
- Systems Biology Group, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway.
- Joseph Bonello
- Faculty of Information & Communication Technology, University of Malta, Msida, Malta
- Philipp Bucher
- Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015 Lausanne, Switzerland
- Matthias E Futschik
- Systems Biology and Bioinformatics Laboratory (SysBioLab), Centre of Marine Sciences (CCMAR), University of Algarve, 8005-139 Faro, Portugal
- Pascale Gaudet
- SIB Swiss Institute of Bioinformatics, 1 Rue Michel-Servet, 1204 Geneva, Switzerland
- Ivan V Kulakovskiy
- Institute of Protein Research, Russian Academy of Sciences, Institutskaya 4, 142290 Pushchino, Russia
- Luana Licata
- Department of Biology, University of Rome Tor Vergata, Rome, Italy
- Colin Logie
- Department of Molecular Biology, Faculty of Science, Radboud University, PO Box 9101, Nijmegen 6500HG, the Netherlands
- Ruth C Lovering
- Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London, 5 University Street, London WC1E 6JF, UK
- Vsevolod J Makeev
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, 119991 Moscow, Russia
- Sandra Orchard
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Simona Panni
- Department DIBEST, University of Calabria, Rende, Italy
- Livia Perfetto
- Fondazione Human Technopole, Department of Biology, Via Cristina Belgioioso, 171, 20157 Milan, Italy
- David Sant
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way #140, Salt Lake City, UT 84108, United States
- Stefan Schulz
- Institute of Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerpl. 2, Graz, Austria
- Daniel R Zerbino
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Astrid Lægreid
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, 7491 Trondheim, Norway
6
Lovering RC, Gaudet P, Acencio ML, Ignatchenko A, Jolma A, Fornes O, Kuiper M, Kulakovskiy IV, Lægreid A, Martin MJ, Logie C. A GO catalogue of human DNA-binding transcription factors. Biochim Biophys Acta Gene Regul Mech 2021; 1864:194765. [PMID: 34673265 DOI: 10.1016/j.bbagrm.2021.194765] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 12/28/2020] [Revised: 10/08/2021] [Accepted: 10/09/2021] [Indexed: 12/27/2022]
Abstract
To control gene transcription, DNA-binding transcription factors recognise specific sequence motifs in gene regulatory regions. A complete and reliable GO annotation of all DNA-binding transcription factors is key to investigating the delicate balance of gene regulation in response to environmental and developmental stimuli. The need for such information is demonstrated by the many lists of transcription factors that have been produced over the past decade. The COST Action Gene Regulation Ensemble Effort for the Knowledge Commons (GREEKC) Consortium brought together experts in the field of transcription with the aim of providing high quality and interoperable gene regulatory data. The Gene Ontology (GO) Consortium provides strict definitions for gene product function, including factors that regulate transcription. The collaboration between the GREEKC and GO Consortia has enabled the application of those definitions to produce a new curated catalogue of over 1400 human DNA-binding transcription factors, that can be accessed at https://www.ebi.ac.uk/QuickGO/targetset/dbTF. This catalogue has facilitated an improvement in the GO annotation of human DNA-binding transcription factors and led to the GO annotation of almost sixty thousand DNA-binding transcription factors in over a hundred species. Thus, this work will aid researchers investigating the regulation of transcription in both biomedical and basic science.
Affiliation(s)
- Ruth C Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London WC1E 6BT, United Kingdom.
- Pascale Gaudet
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, 1 Rue Michel-Servet, 1211 Geneve 4, Switzerland.
- Marcio L Acencio
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim NO-7491, Norway.
- Alex Ignatchenko
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
- Arttu Jolma
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada.
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, 950 W 28th Ave, Vancouver, British Columbia V5Z 4H4, Canada.
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, Trondheim NO-7491, Norway.
- Ivan V Kulakovskiy
- Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119991, Russia; Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow 119991, Russia.
- Astrid Lægreid
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim NO-7491, Norway.
- Maria J Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
- Colin Logie
- Molecular Biology Department, Faculty of Science, Radboud University, PO Box 9101, 6500HB Nijmegen, the Netherlands.
7
Gaudet P, Logie C, Lovering RC, Kuiper M, Lægreid A, Thomas PD. Gene Ontology representation for transcription factor functions. Biochim Biophys Acta Gene Regul Mech 2021; 1864:194752. [PMID: 34461313 DOI: 10.1016/j.bbagrm.2021.194752] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 01/06/2021] [Revised: 08/24/2021] [Accepted: 08/25/2021] [Indexed: 12/31/2022]
Abstract
Transcription plays a central role in defining the identity and functionalities of cells, as well as in their responses to changes in the cellular environment. The Gene Ontology (GO) provides a rigorously defined set of concepts that describe the functions of gene products. A GO annotation is a statement about the function of a particular gene product, represented as an association between a gene product and the biological concept a GO term defines. Critically, each GO annotation is based on traceable scientific evidence. Here, we describe the different GO terms that are associated with proteins involved in transcription and its regulation, focusing on the standard of evidence required to support these associations. This article is intended to help users of GO annotations understand how to interpret the annotations and can contribute to the consistency of GO annotations. We distinguish between three classes of activities involved in transcription or directly regulating it - general transcription factors, DNA-binding transcription factors, and transcription co-regulators.
Affiliation(s)
- Pascale Gaudet
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, 1 Rue Michel-Servet, 1211 Genève, Switzerland.
- Colin Logie
- Molecular Biology Department, Faculty of Science, Radboud University, PO box 9101, 6500HB Nijmegen, the Netherlands
- Ruth C Lovering
- Functional Gene Annotation, Preclinical and Fundamental Science, UCL Institute of Cardiovascular Science, University College London, London, UK
- Martin Kuiper
- Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
- Astrid Lægreid
- Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Paul D Thomas
- Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA, USA
8
Díaz-Rodríguez M, Lithgow-Serrano O, Guadarrama-García F, Tierrafría VH, Gama-Castro S, Solano-Lira H, Salgado H, Rinaldi F, Méndez-Cruz CF, Collado-Vides J. Lisen&Curate: A platform to facilitate gathering textual evidence for curation of regulation of transcription initiation in bacteria. Biochim Biophys Acta Gene Regul Mech 2021; 1864:194753. [PMID: 34461312 PMCID: PMC10155859 DOI: 10.1016/j.bbagrm.2021.194753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2021] [Revised: 07/12/2021] [Accepted: 08/25/2021] [Indexed: 10/20/2022]
Abstract
The number of published papers in biomedical research makes it practically impossible for a researcher to keep up to date. This is where manually curated databases help, by facilitating access to knowledge. However, the structure required by databases strongly limits the type of valuable information that can be incorporated. Here, we present Lisen&Curate, a curation system that facilitates linking sentences or parts of sentences (both considered sources) in articles with their corresponding curated objects, so that rich additional information about these objects is easily available to users. These sources are going to be offered both within RegulonDB and a new database, L-Regulon. To show the relevance of our work, two senior curators performed a curation of 31 articles on the regulation of transcription initiation of E. coli using Lisen&Curate. As a result, 194 objects were curated and 781 sources were recorded. We also found that these sources are useful to develop automatic approaches to detect objects in articles by observing word frequency patterns and by carrying out an open information extraction task. Sources may help to elaborate a controlled vocabulary of experimental methods. Finally, we discuss our ecosystem of interconnected applications, RegulonDB, L-Regulon, and Lisen&Curate, to facilitate access to knowledge on regulation of transcription initiation in bacteria. We see our proposal as the starting point to change the way experimentalists connect a piece of knowledge with its evidence using RegulonDB.
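The core idea of linking sentences or sentence fragments (sources) to the curated objects they support can be sketched as a mapping from object IDs to character spans in article text. The class names, fields, object ID and example sentence are all hypothetical; this is not the Lisen&Curate or RegulonDB data model. The last line only restates the abstract's figures (194 objects, 781 sources) as an average.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    article_id: str  # hypothetical article identifier
    start: int       # character offset where the supporting text begins
    end: int         # character offset where it ends (exclusive)


class EvidenceIndex:
    """Maps curated object IDs to the text spans (sources) that support them."""

    def __init__(self) -> None:
        self._sources: dict[str, list[Source]] = defaultdict(list)

    def link(self, object_id: str, source: Source) -> None:
        if source.end <= source.start:
            raise ValueError("empty or inverted text span")
        self._sources[object_id].append(source)

    def sources_for(self, object_id: str) -> list[Source]:
        # .get avoids creating empty entries for unknown objects
        return list(self._sources.get(object_id, []))

    @staticmethod
    def excerpt(text: str, source: Source) -> str:
        """Return the supporting fragment from the article text."""
        return text[source.start:source.end]


# Hypothetical sentence with placeholder regulator and gene names.
article = "RegX activates transcription of genY under anaerobic conditions."
fragment = "RegX activates transcription of genY"
start = article.index(fragment)

index = EvidenceIndex()
index.link("regulatory_interaction_1", Source("article_1", start, start + len(fragment)))
src = index.sources_for("regulatory_interaction_1")[0]
print(index.excerpt(article, src))

# Average number of sources per curated object, from the abstract's figures.
print(f"{781 / 194:.2f} sources per object")
```

Storing offsets instead of copied text keeps each source traceable to its exact position in the article.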
Affiliation(s)
- Martín Díaz-Rodríguez
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
- Oscar Lithgow-Serrano
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico; Dalle Molle Institute for Artificial Intelligence Research, IDSIA USI-SUPSI, Polo universitario Lugano-Campus Est, Via la Santa 1, CH-6962 Lugano, Switzerland
- Francisco Guadarrama-García
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
- Víctor H Tierrafría
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
- Socorro Gama-Castro
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
- Hilda Solano-Lira
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
- Heladia Salgado
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
- Fabio Rinaldi
- Dalle Molle Institute for Artificial Intelligence Research, IDSIA USI-SUPSI, Polo universitario Lugano-Campus Est, Via la Santa 1, CH-6962 Lugano, Switzerland; Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
- Carlos-Francisco Méndez-Cruz
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico.
- Julio Collado-Vides
- Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico; Department of Biomedical Engineering, Boston University, 44 Cummington Mall Room 403, 02215 Boston, MA, USA; Center for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain
9
Aravind A, Palollathil A, Rex DAB, Kumar KMK, Vijayakumar M, Shetty R, Codi JAK, Prasad TSK, Raju R. A multi-cellular molecular signaling and functional network map of C-C motif chemokine ligand 18 (CCL18): a chemokine with immunosuppressive and pro-tumor functions. J Cell Commun Signal 2021. [PMID: 34196939 DOI: 10.1007/s12079-021-00633-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/10/2021] [Accepted: 06/23/2021] [Indexed: 12/09/2022]
Abstract
The C-C Motif Chemokine Ligand 18 (CCL18) is a beta-chemokine sub-family member with immunomodulatory functions in primates. CCL18-dependent migration and epithelial-to-mesenchymal transition of oral squamous cell carcinoma, squamous cell carcinoma of head and neck, breast cancer, hepatocellular carcinoma, non-small cell lung carcinoma, ovarian cancer, pancreatic ductal carcinoma and bladder cancer cells are well-established. In the tumor niche, tumor-associated macrophages produce CCL18 and its overexpression is correlated with reduced patient survival in multiple cancers. Although multiple receptors including C-C chemokine receptor type 3 (CCR3), type 6 (CCR6), type 8 (CCR8) and G-protein coupled estrogen receptor (GPER1) are reported for CCL18, the Phosphatidylinositol Transfer Protein, Membrane-Associated 3 (PITPNM3) receptor is currently considered as its predominant receptor. Characterization of the molecular events and check points associated with the immunosuppressive and cancer progression support functions induced by CCL18 for their potential towards therapeutic applications is an area of active research. Hence, in this study, we assembled 917 signaling events reported to be induced by CCL18 through their studied receptors in diverse cell types as an integrated knowledgebase for reference, data integration and gene-set enrichment analysis of global transcriptomic and/or proteomics datasets.
10
Holinski A, Burke ML, Morgan SL, McQuilton P, Palagi PM. Biocuration - mapping resources and needs. F1000Res 2020; 9:ELIXIR-1094. [PMID: 33145007 PMCID: PMC7590901 DOI: 10.12688/f1000research.25413.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/25/2020] [Indexed: 03/31/2024]
Abstract
Background: Biocuration involves a variety of teams and individuals across the globe. However, they may not self-identify as biocurators, as they may be unaware of biocuration as a career path or because biocuration is only part of their role. The lack of a clear, up-to-date profile of biocuration creates challenges for organisations like ELIXIR, the ISB and GOBLET to systematically support biocurators and for biocurators themselves to develop their own careers. Therefore, the ELIXIR Training Platform launched an Implementation Study in order to i) identify communities of biocurators, ii) map the type of curation work being done, iii) assess biocuration training, and iv) draw a picture of biocuration career development. Methods: To achieve the goals of the study, we carried out a global survey on the nature of biocuration work, the tools and resources that are used, training that has been received and additional training needs. To examine these topics in more detail we ran workshop-based discussions at ISB Biocuration Conference 2019 and the ELIXIR All Hands Meeting 2019. We also had guided conversations with selected people from the EMBL-European Bioinformatics Institute. Results: The study illustrates that biocurators have diverse job titles, are highly skilled, perform a variety of activities and use a wide range of tools and resources. The study emphasises the need for training in programming and coding skills, but also highlights the difficulties curators face in terms of career development and community building. Conclusion: Biocurators themselves, as well as organisations like ELIXIR, GOBLET and ISB must work together towards structural change to overcome these difficulties. In this article we discuss recommendations to ensure that biocuration as a role is visible and valued, thereby helping biocurators to proceed with their career.
Affiliation(s)
- Alexandra Holinski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Melissa L. Burke
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah L. Morgan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Peter McQuilton
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, Oxfordshire, OX1 3QG, UK
11
|
Abstract
Background: Biocuration involves a variety of teams and individuals across the globe. However, they may not self-identify as biocurators, as they may be unaware of biocuration as a career path or because biocuration is only part of their role. The lack of a clear, up-to-date profile of biocuration creates challenges for organisations like ELIXIR, the ISB and GOBLET to systematically support biocurators and for biocurators themselves to develop their own careers. Therefore, the ELIXIR Training Platform launched an Implementation Study in order to i) identify communities of biocurators, ii) map the type of curation work being done, iii) assess biocuration training, and iv) draw a picture of biocuration career development. Methods: To achieve the goals of the study, we carried out a global survey on the nature of biocuration work, the tools and resources that are used, training that has been received and additional training needs. To examine these topics in more detail we ran workshop-based discussions at ISB Biocuration Conference 2019 and the ELIXIR All Hands Meeting 2019. We also had guided conversations with selected people from the EMBL-European Bioinformatics Institute. Results: The study illustrates that biocurators have diverse job titles, are highly skilled, perform a variety of activities and use a wide range of tools and resources. The study emphasises the need for training in programming and coding skills, but also highlights the difficulties curators face in terms of career development and community building. Conclusion: Biocurators themselves, as well as organisations like ELIXIR, GOBLET and ISB must work together towards structural change to overcome these difficulties. In this article we discuss recommendations to ensure that biocuration as a role is visible and valued, thereby helping biocurators to proceed with their career.
Collapse
Affiliation(s)
- Alexandra Holinski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Melissa L. Burke
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Sarah L. Morgan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, CB10 1SD, UK
- Peter McQuilton
- Oxford e-Research Centre, Department of Engineering Science, University of Oxford, Oxford, Oxfordshire, OX1 3QG, UK
12
Abstract
Data-intensive science comes with increased risks to the quality and reliability of data. Trust in science has traditionally been framed as a matter of scientists adhering to certain technical and moral norms of behaviour, but emerging discourses of open science present openness and transparency as substitutes for established trust mechanisms. By ensuring access to all available information, open science makes quality a matter of informed judgement by users, so that trust no longer seems necessary. This strategy, however, does not take into account the networks of professionals who already enable data-intensive science by providing high-quality data. In the life sciences, biological databases and knowledge bases managed by expert biocurators have become crucial for data-intensive research. In this paper, I use the case of biocurators to argue that openness and transparency will not diminish the need for trust in data-intensive science. On the contrary, data-intensive science requires a reconfiguration of existing trust mechanisms to include those who care for and manage scientific data after its production.
13
Bachman JA, Gyori BM, Sorger PK. FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining. BMC Bioinformatics 2018; 19:248. [PMID: 29954318 PMCID: PMC6022344 DOI: 10.1186/s12859-018-2211-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 11/27/2017] [Accepted: 05/17/2018] [Indexed: 11/29/2022]
Abstract
Background For automated reading of scientific publications to extract useful information about molecular mechanisms, it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or “grounding.” Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process depends largely on the availability of machine-readable resources that associate the synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. Results In a task involving automated reading of ∼215,000 articles using the REACH event extraction software, we found that grounding was disproportionately inaccurate for multi-protein families (e.g., “AKT”) and complexes with multiple subunits (e.g., “NF-κB”). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ∼54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. Applications of FamPlex to the TRIPS/DRUM reading system and the BioCreative VI Bioentity Normalization Task dataset demonstrated its utility in other settings. Conclusion FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.
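The idea of grounding a text string to a family or complex and then resolving its multi-level membership can be sketched in a few lines. The rows below are only loosely modeled on the shape of FamPlex's relation and grounding tables (child namespace and identifier, an isa/partof relation, parent namespace and identifier); they are a small hand-written illustrative subset, not FamPlex content, and the function names are hypothetical. Walking isa/partof edges downward from a grounded entity recovers its gene-level members, including through intermediate subfamilies.

```python
from collections import defaultdict

# Illustrative rows: (namespace1, id1, relation, namespace2, id2).
# Hand-written for this sketch; consult the FamPlex repository for real content.
RELATIONS = [
    ("HGNC", "AKT1", "isa", "FPLX", "AKT"),
    ("HGNC", "AKT2", "isa", "FPLX", "AKT"),
    ("HGNC", "AKT3", "isa", "FPLX", "AKT"),
    ("HGNC", "RELA", "partof", "FPLX", "NFkappaB"),
    ("HGNC", "NFKB1", "partof", "FPLX", "NFkappaB"),
    ("HGNC", "MAPK1", "isa", "FPLX", "ERK"),
    ("HGNC", "MAPK3", "isa", "FPLX", "ERK"),
    ("FPLX", "ERK", "isa", "FPLX", "MAPK"),  # a second level of hierarchy
]

# Hypothetical grounding map: text string as seen in papers -> (namespace, id).
GROUNDING_MAP = {
    "AKT": ("FPLX", "AKT"),
    "Akt": ("FPLX", "AKT"),
    "NF-kB": ("FPLX", "NFkappaB"),
    "NF-κB": ("FPLX", "NFkappaB"),
}


def build_children(relations):
    """Index each parent entity to the set of its direct children."""
    children = defaultdict(set)
    for ns1, id1, rel, ns2, id2 in relations:
        if rel in ("isa", "partof"):
            children[(ns2, id2)].add((ns1, id1))
    return children


def ground(text, grounding_map):
    """Return the (namespace, id) for a text string, or None if it is unknown."""
    return grounding_map.get(text)


def leaf_genes(entity, children):
    """Depth-first walk collecting every gene-level (HGNC) member of an entity."""
    genes, stack, seen = set(), [entity], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node[0] == "HGNC":
            genes.add(node[1])
        stack.extend(children.get(node, ()))
    return genes


if __name__ == "__main__":
    children = build_children(RELATIONS)
    entity = ground("NF-κB", GROUNDING_MAP)
    print(entity, sorted(leaf_genes(entity, children)))
    # prints: ('FPLX', 'NFkappaB') ['NFKB1', 'RELA']
```

Keeping isa (family) and partof (complex) edges in one index is a simplification; a real consumer would probably want to distinguish "any member of the family" from "all subunits of the complex" when resolving relationships.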
Affiliation(s)
- John A Bachman
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, 02115, USA
- Benjamin M Gyori
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, 02115, USA
- Peter K Sorger
- Laboratory of Systems Pharmacology, Harvard Medical School, 200 Longwood Ave, Boston, MA, 02115, USA.
14
Abstract
The Gene Ontology Consortium (GOC) produces a wealth of resources widely used throughout the scientific community. In this chapter, we discuss the different ways in which researchers can access the resources of the GOC. We describe how to obtain GO annotations both manually, by browsing, querying and downloading data from the GO website, and computationally, by accessing the resources from the command line, including how to restrict the retrieved data to subsets with only certain attributes.
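As a rough illustration of restricting retrieved annotations to subsets with certain attributes, the sketch below filters lines of a GO annotation file in GAF 2.x format (17 tab-separated columns, header lines prefixed with "!"). The choice of GAF as the input, the function name and the sample rows are assumptions for illustration; the gene symbols and object identifiers in the sample are hypothetical placeholders.

```python
# 0-based column indices in a GAF 2.x line (17 tab-separated columns).
SYMBOL, QUALIFIER, GO_ID, EVIDENCE, ASPECT = 2, 3, 4, 6, 8


def filter_gaf(lines, aspects=None, evidence_codes=None, include_not=False):
    """Yield (symbol, GO ID, evidence code, aspect) for matching annotation rows.

    Header lines start with '!'. A filter left as None accepts any value.
    Annotations whose qualifier contains NOT are skipped unless include_not is True.
    """
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # malformed or truncated row
        if not include_not and "NOT" in cols[QUALIFIER].split("|"):
            continue
        if aspects is not None and cols[ASPECT] not in aspects:
            continue
        if evidence_codes is not None and cols[EVIDENCE] not in evidence_codes:
            continue
        yield cols[SYMBOL], cols[GO_ID], cols[EVIDENCE], cols[ASPECT]


def _row(symbol, qualifier, go_id, evidence, aspect):
    """Build a hypothetical 17-column GAF row for demonstration only."""
    return "\t".join(["UniProtKB", "EXAMPLE_" + symbol, symbol, qualifier, go_id,
                      "PMID:0", evidence, "", aspect, "", "", "protein",
                      "taxon:9606", "20240101", "EXAMPLE", "", ""])


SAMPLE = [
    "!gaf-version: 2.2",
    _row("GENE1", "enables", "GO:0003674", "IDA", "F"),
    _row("GENE2", "involved_in", "GO:0008150", "IEA", "P"),
    _row("GENE3", "NOT|located_in", "GO:0005575", "IDA", "C"),
    _row("GENE4", "located_in", "GO:0005575", "IDA", "C"),
]

if __name__ == "__main__":
    # Direct-assay (IDA) cellular-component annotations only.
    for hit in filter_gaf(SAMPLE, aspects={"C"}, evidence_codes={"IDA"}):
        print(hit)  # ('GENE4', 'GO:0005575', 'IDA', 'C')
```

Because `filter_gaf` accepts any iterable of lines, the same function could be applied to an opened (or decompressed) annotation file line by line without loading it all into memory.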
15
Abstract
The specificity of knowledge that Gene Ontology (GO) annotations can represent is still restricted by the legacy format of the GO annotation file, a format intentionally designed for simplicity to keep barriers to entry low and thus encourage initial adoption. Historically, the information that could be captured in a GO annotation was simply the role or location of a gene product, although genetically interacting or binding partners could be specified. Although the original GO annotation format had no mechanism for capturing additional information about the context of a GO term, such as the target gene of an activity or the location of a molecular function, the long-term vision of the GO Consortium was to provide greater expressivity in its annotations to capture physiologically relevant information. As a step forward, the GO Consortium has therefore introduced a new field into the annotation format, annotation extensions, which can be used to capture valuable contextual detail. Annotation extensions provide experimentally verified links between gene products and other physiological information that is crucial for accurate analysis of pathway and network data. This chapter provides a simple overview of annotation extensions, illustrated with examples of their usage, and explains why they are useful for scientists and bioinformaticians alike.
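In GAF 2.x files, annotation extensions occupy column 16 and are written as relation(entity) terms: commas join terms that apply together, and pipes separate independent extension sets. Assuming that syntax, a minimal parser might look like the following; the function name is hypothetical, and the example string, including its UniProtKB identifier, is a made-up placeholder rather than a real annotation.

```python
import re

# One relation(entity) term, e.g. "occurs_in(GO:0005634)".
_TERM = re.compile(r"^\s*([^()\s]+)\(([^()]+)\)\s*$")


def parse_extensions(field):
    """Parse a GAF column-16 string into a list of extension sets.

    Each set is a list of (relation, entity) pairs that hold together;
    separate sets (split on '|') are independent of one another.
    """
    if not field.strip():
        return []
    extension_sets = []
    for group in field.split("|"):
        terms = []
        for term in group.split(","):
            match = _TERM.match(term)
            if match is None:
                raise ValueError(f"malformed extension term: {term!r}")
            terms.append((match.group(1), match.group(2)))
        extension_sets.append(terms)
    return extension_sets


if __name__ == "__main__":
    # Hypothetical: an activity acting on EXAMPLE1 and occurring in the nucleus
    # (GO:0005634), plus an independent part_of(CL:...) context.
    ext = "has_input(UniProtKB:EXAMPLE1),occurs_in(GO:0005634)|part_of(CL:0000000)"
    for extension_set in parse_extensions(ext):
        print(extension_set)
```

Returning a list of lists keeps the AND-within-a-set and independence-between-sets distinction explicit, which matters when the extensions are later used to filter or interpret pathway and network data.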
16
Venkatesan A, Kim JH, Talo F, Ide-Smith M, Gobeill J, Carter J, Batista-Navarro R, Ananiadou S, Ruch P, McEntyre J. SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data. Wellcome Open Res 2017. [PMID: 28948232 DOI: 10.12688/wellcomeopenres.10210.1] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Indexed: 12/24/2022]
Abstract
The tremendous growth in biological data has resulted in an increase in the number of research papers being published. This makes it challenging for scientists to search and assimilate the facts described in those papers. In particular, biological databases depend on curators to add highly precise and useful information that is usually extracted by reading research articles. There is therefore an urgent need to improve the linking of literature to the underlying data, thereby minimising the effort of browsing content and identifying key biological concepts. As part of the development of Europe PMC, we have developed a new platform, SciLite, which integrates text-mined annotations from different sources and overlays them on research articles. The aim is to help researchers and curators using Europe PMC find key concepts more easily and to provide links to related resources or tools, bridging the gap between literature and biological data.
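The general idea of merging text-mined annotations from several providers and overlaying them on article text can be sketched as follows. This is not SciLite's implementation: the `Annotation` fields, the provider names, the inline `[mention]{Type}` markup and the rule for resolving overlapping spans (keep the earliest span, then the longest) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int    # character offset where the mention begins (inclusive)
    end: int      # character offset where the mention ends (exclusive)
    kind: str     # entity type label, e.g. "Gene" (hypothetical labels)
    source: str   # which text-mining provider produced it (hypothetical)


def overlay(text, *providers):
    """Merge annotations from several providers and mark them up inline.

    Exact duplicates are collapsed; for overlapping spans the earliest,
    then longest, span is kept and the others are dropped.
    """
    merged = sorted({a for provider in providers for a in provider},
                    key=lambda a: (a.start, -(a.end - a.start), a.source, a.kind))
    kept, last_end = [], 0
    for a in merged:
        if a.start >= last_end:  # no overlap with the last kept span
            kept.append(a)
            last_end = a.end
    out, pos = [], 0
    for a in kept:
        out.append(text[pos:a.start])
        out.append(f"[{text[a.start:a.end]}]{{{a.kind}}}")
        pos = a.end
    out.append(text[pos:])
    return "".join(out)


if __name__ == "__main__":
    text = "BRCA1 variants raise breast cancer risk."
    provider_a = [Annotation(0, 5, "Gene", "providerA"),
                  Annotation(21, 34, "Disease", "providerA")]
    provider_b = [Annotation(28, 34, "Disease", "providerB")]  # overlaps "breast cancer"
    print(overlay(text, provider_a, provider_b))
    # prints: [BRCA1]{Gene} variants raise [breast cancer]{Disease} risk.
```

A display platform would more likely keep all overlapping annotations and let the interface decide how to render them; dropping overlaps simply keeps the plain-text markup unambiguous in this sketch.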
Affiliation(s)
- Aravind Venkatesan
- Literature Service group, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Jee-Hyub Kim
- Literature Service group, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Francesco Talo
- Literature Service group, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Michele Ide-Smith
- Literature Service group, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Julien Gobeill
- SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland
- Jacob Carter
- National Centre for Text Mining (NaCTeM), Manchester Institute of Biotechnology, Manchester, UK
- Riza Batista-Navarro
- National Centre for Text Mining (NaCTeM), Manchester Institute of Biotechnology, Manchester, UK
- Sophia Ananiadou
- National Centre for Text Mining (NaCTeM), Manchester Institute of Biotechnology, Manchester, UK
- Patrick Ruch
- SIB Text Mining, Swiss Institute of Bioinformatics, Geneva, Switzerland; Bibliomics and Text Mining Group (BiTeM), HES-SO, Geneva, Switzerland
- Johanna McEntyre
- Literature Service group, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
17
Venkatesan A, Kim JH, Talo F, Ide-Smith M, Gobeill J, Carter J, Batista-Navarro R, Ananiadou S, Ruch P, McEntyre J. SciLite: a platform for displaying text-mined annotations as a means to link research articles with biological data. Wellcome Open Res 2017; 1:25. [PMID: 28948232 PMCID: PMC5527546 DOI: 10.12688/wellcomeopenres.10210.2] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Accepted: 07/06/2017] [Indexed: 12/31/2022]
18
Abstract
The Evidence and Conclusion Ontology (ECO) is a community resource for describing the various types of evidence that are generated during the course of a scientific study and that are typically used to support assertions made by researchers. ECO describes multiple evidence types, including evidence resulting from experimental (i.e., wet lab) techniques, evidence arising from computational methods, statements made by authors (whether or not supported by evidence), and inferences drawn by researchers curating the literature. In addition to summarizing the evidence that supports a particular assertion, ECO also offers a means to document whether a computer or a human made the annotation. Incorporating ECO into an annotation system makes it possible to leverage the structure of the ontology so that associated data can be grouped hierarchically, users can select data associated with particular evidence types, and quality control pipelines can be optimized. Today, over 30 resources, including the Gene Ontology, use ECO to represent both evidence and how annotations are made.
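To illustrate how an evidence ontology's hierarchy lets users select data by evidence type, the sketch below uses a toy is_a hierarchy together with a separate manual/automatic flag on each annotation. The term labels, gene names and annotations are hypothetical: they are not actual ECO term IDs or names. Because a class in a real ontology can have several parents, each term maps to a set of parents here.

```python
# Toy is_a hierarchy: child label -> set of parent labels (illustrative only).
IS_A = {
    "experimental evidence": {"evidence"},
    "direct assay evidence": {"experimental evidence"},
    "mutant phenotype evidence": {"experimental evidence"},
    "computational evidence": {"evidence"},
    "sequence similarity evidence": {"computational evidence"},
}

# Hypothetical annotations: (gene, evidence term, how the assertion was made).
ANNOTATIONS = [
    ("geneA", "direct assay evidence", "manual"),
    ("geneB", "sequence similarity evidence", "automatic"),
    ("geneC", "mutant phenotype evidence", "manual"),
    ("geneD", "sequence similarity evidence", "manual"),
]


def is_subclass(term, target, is_a):
    """True if term equals target or reaches it by following is_a edges upward."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(is_a.get(current, ()))
    return False


def select(annotations, evidence_class, is_a, assertion=None):
    """Keep annotations whose evidence falls under evidence_class.

    If assertion is given ("manual" or "automatic"), also filter on how
    the annotation was made.
    """
    return [a for a in annotations
            if is_subclass(a[1], evidence_class, is_a)
            and (assertion is None or a[2] == assertion)]


if __name__ == "__main__":
    for gene, term, how in select(ANNOTATIONS, "experimental evidence", IS_A):
        print(gene, term, how)
```

Selecting on a high-level class such as "experimental evidence" picks up every more specific evidence type beneath it, which is the grouping behaviour the abstract describes; the assertion flag is kept separate so the two questions (what kind of evidence, and who made the call) can be combined freely.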
Affiliation(s)
- Marcus C Chibucos
- Department of Microbiology and Immunology, Institute for Genome Sciences, University of Maryland School of Medicine, 801 W. Baltimore Street, Baltimore, MD, 21201, USA.
- Deborah A Siegele
- Department of Biology, Texas A&M University, College Station, TX, 77843, USA
- James C Hu
- Department of Biochemistry and Biophysics, Texas A&M University and Texas AgriLife Research, College Station, TX, 77843, USA
- Michelle Giglio
- Department of Medicine, Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
19
Abstract
Collaborations between the scientific community and members of the Gene Ontology (GO) Consortium have led to an increase in the number and specificity of GO terms, as well as in the number of GO annotations. A variety of approaches have been taken to encourage research scientists to contribute to the GO, with variable success. This chapter reviews both the successes and failures of engaging the scientific community in GO development and annotation, and provides motivation and advice to encourage individual researchers to contribute to the GO.
Affiliation(s)
- Ruth C Lovering
- Functional Gene Annotation Initiative, Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, University College London, 5 University Street, London, WC1E 6JF, UK.
20
Abstract
The Gene Ontology (GO) is a framework designed to represent biological knowledge about the biological roles of gene products and the cellular locations in which they act. Biocuration is a complex process: the body of scientific literature is large, and selecting appropriate GO terms can be challenging. Both issues are compounded by the fact that our understanding of biology is still incomplete; hence it is important to appreciate that GO is inherently an evolving model. In this chapter, we describe how biocurators create GO annotations from the experimental findings reported in research articles. We describe current best practices for high-quality literature curation and how GO curators succeed in modeling biology using a relatively simple framework. We also highlight a number of difficulties encountered when translating experimental assays into GO annotations.
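One step in turning an experimental finding into a GO annotation is choosing the evidence code that matches the kind of assay and recording it in an annotation file. The sketch below maps a few assay classes to standard GO evidence codes and writes a 17-column GAF 2.2 row. The `Finding` structure, the fixed "protein" object type and the single validation rule are simplifying assumptions, and all identifiers used with it are placeholders; real curation involves far more judgment than a lookup table.

```python
from dataclasses import dataclass

# Standard GO evidence codes for a few classes of experimental finding.
EVIDENCE_FOR_ASSAY = {
    "direct assay": "IDA",          # Inferred from Direct Assay
    "mutant phenotype": "IMP",      # Inferred from Mutant Phenotype
    "physical interaction": "IPI",  # Inferred from Physical Interaction
    "genetic interaction": "IGI",   # Inferred from Genetic Interaction
    "expression pattern": "IEP",    # Inferred from Expression Pattern
}

# Single-letter aspect codes used in column 9 of a GAF file.
ASPECT_CODE = {
    "molecular_function": "F",
    "biological_process": "P",
    "cellular_component": "C",
}


@dataclass
class Finding:
    """One curated statement from a paper (hypothetical structure)."""
    db: str
    db_object_id: str
    symbol: str
    relation: str      # GAF 2.2 qualifier, e.g. "located_in"
    go_id: str
    aspect: str        # a key of ASPECT_CODE
    reference: str     # e.g. a PMID
    assay: str         # a key of EVIDENCE_FOR_ASSAY
    taxon_id: str
    date: str          # YYYYMMDD
    assigned_by: str


def to_gaf_row(f: Finding) -> str:
    """Format a Finding as a tab-separated, 17-column GAF 2.2 line."""
    evidence = EVIDENCE_FOR_ASSAY[f.assay]  # KeyError: assay needs a curator's decision
    aspect = ASPECT_CODE[f.aspect]
    # GO evidence-code guidelines restrict IEP to biological process annotations.
    if evidence == "IEP" and aspect != "P":
        raise ValueError("IEP may only support biological_process annotations")
    cols = [
        f.db, f.db_object_id, f.symbol, f.relation, f.go_id, f.reference,
        evidence, "", aspect, "", "", "protein", f"taxon:{f.taxon_id}",
        f.date, f.assigned_by, "", "",
    ]
    return "\t".join(cols)
```

A finding that does not fit one of these assay classes raises `KeyError`, which is arguably the right outcome for a sketch like this: such cases are exactly the difficult translations that need a curator rather than a rule.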
Affiliation(s)
- Sylvain Poux
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211, Geneva 4, Switzerland
- Pascale Gaudet
- CALIPHO group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, 1 rue Michel Servet, 1211, Geneva 4, Switzerland; Department of Human Protein Sciences, Faculty of Medicine, University of Geneva, Geneva, Switzerland
21
Verspoor KM, Heo GE, Kang KY, Song M. Establishing a baseline for literature mining human genetic variants and their relationships to disease cohorts. BMC Med Inform Decis Mak 2016; 16 Suppl 1:68. [PMID: 27454860 PMCID: PMC4959367 DOI: 10.1186/s12911-016-0294-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Variome corpus, a small collection of published articles about inherited colorectal cancer, includes annotations of 11 entity types and 13 relation types related to the curation of the relationship between genetic variation and disease. Because of the richness of these annotations, the corpus provides a good testbed for evaluating biomedical literature information extraction systems. METHODS In this paper, we focus on assessing performance on extracting the relations in the corpus, using gold standard entities as a starting point, to establish a baseline for extracting the relations that are important for mining genetic variant information from the literature. We test the Public Knowledge Discovery Engine for Java (PKDE4J), a natural language processing system designed for information extraction of entities and relations in text, on the relation extraction task using this corpus. RESULTS For the relations attested at least 100 times in the Variome corpus, we achieve a precision-weighted F-score ranging from 0.78 to 0.84, depending on the relation. We find that the PKDE4J system adapted straightforwardly to the range of relation types represented in the corpus; some extensions to the original methodology were required to adapt to the multi-relational classification context. The results are competitive with state-of-the-art relation extraction performance on more heavily studied corpora, although the analysis shows that for many relations the recall of a co-occurrence baseline outweighs the benefit of improved precision, indicating the value of simple semantic constraints on relations. CONCLUSIONS This work represents the first attempt to apply relation extraction methods to the Variome corpus. The results demonstrate that automated methods have good potential to structure the information expressed in the published literature on genetic variants, connecting mutations to genes, diseases, and patient cohorts. Further development of such approaches will facilitate more efficient biocuration of genetic variant information into structured databases, leveraging the knowledge embedded in the vast publication literature.
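The co-occurrence baseline referred to in this abstract predicts a relation for every pair of entities that appear in the same sentence, which tends to give high recall at the cost of precision. Below is a minimal sketch of that baseline and of precision, recall and F-score, using hypothetical entity labels (G1 for a gene, V1 for a variant, D1 for a disease, C1 for a cohort). The paper's exact precision weighting is not specified here, so the sketch exposes a generic F-beta parameter (beta below 1 weights precision more heavily) and defaults to the balanced F1.

```python
from itertools import combinations


def _norm(pairs):
    """Treat relations as unordered: ('V1', 'G1') and ('G1', 'V1') are the same."""
    return {tuple(sorted(p)) for p in pairs}


def cooccurrence_baseline(sentences):
    """Predict a relation between every pair of entities in the same sentence."""
    predicted = set()
    for entities in sentences:
        for a, b in combinations(sorted(set(entities)), 2):
            predicted.add((a, b))
    return predicted


def prf(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for predicted vs. gold relation pairs."""
    predicted, gold = _norm(predicted), _norm(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    f = (1 + b2) * precision * recall / denom if denom else 0.0
    return precision, recall, f


if __name__ == "__main__":
    sentences = [["G1", "V1", "D1"], ["V1", "C1"]]   # entities per sentence
    gold = {("V1", "G1"), ("V1", "D1"), ("V1", "C1")}
    p, r, f = prf(cooccurrence_baseline(sentences), gold)
    print(p, r, round(f, 3))  # 0.75 1.0 0.857
```

In this toy case the baseline recovers every gold relation (recall 1.0) but also predicts a spurious gene-disease pair, which is the precision cost that simple semantic constraints on relation types would be meant to remove.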
Affiliation(s)
- Karin M Verspoor
- Department of Computing and Information Systems, The University of Melbourne, Melbourne, Australia
- Go Eun Heo
- Department of Library and Information Science, Yonsei University, Seoul, Korea
- Keun Young Kang
- Department of Library and Information Science, Yonsei University, Seoul, Korea
- Min Song
- Department of Library and Information Science, Yonsei University, Seoul, Korea.
22
Abstract
Biocuration involves adding value to biomedical data through standardization, quality control and information transfer (also known as data annotation). It enhances data interoperability and consistency, and is critical in translating biomedical data into scientific discovery. Although China is becoming a leading producer of scientific data, biocuration is still very new to the Chinese biomedical data community. In fact, there is currently no acknowledged Chinese equivalent of the word “curation”. Here we propose the Chinese translation “审编” (Pinyin: shěn biān), based on the meanings the term carries in the biomedical data community. The 8th International Biocuration Conference, to be held in China next year (http://biocuration2015.tilsi.org), has the potential to raise general awareness in China of the significant role of biocuration in scientific discovery. However, challenges lie ahead in its implementation.
Affiliation(s)
- Zhang Zhang
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
- Weimin Zhu
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, Beijing 100730, China; Taicang Institute of Life Sciences Information, Taicang 215400, China
- Jingchu Luo
- College of Life Sciences and Center for Bioinformatics, Peking University, Beijing 100871, China