Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Biocuration: Distilling data into knowledge. PLoS Biol 2018;16:e2002846. [PMID: 29659566 PMCID: PMC5919672 DOI: 10.1371/journal.pbio.2002846] [Citation(s) in RCA: 59] [Impact Index Per Article: 9.8] [Revised: 04/26/2018] [Indexed: 11/18/2022] Open

Cited by Other Article(s)

Lin D, McAuliffe M, Pruitt KD, Gururaj A, Melchior C, Schmitt C, Wright SN. Biomedical Data Repository Concepts and Management Principles. Sci Data 2024;11:622. [PMID: 38871749 PMCID: PMC11176378 DOI: 10.1038/s41597-024-03449-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 05/31/2024] [Indexed: 06/15/2024] Open

Novoa J, López-Ibáñez J, Chagoyen M, Ranea JAG, Pazos F. CoMentG: comprehensive retrieval of generic relationships between biomedical concepts from the scientific literature. Database (Oxford) 2024;2024:baae025. [PMID: 38564426 PMCID: PMC10986793 DOI: 10.1093/database/baae025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 03/01/2024] [Accepted: 03/15/2024] [Indexed: 04/04/2024]

Zhang B, Chen L, Xiao S, Dang C, Wang F, Fang Q, Ye X, Stanley DW, Ye G. iSalivaomicDB: A comprehensive saliva omics database for insects. Insect Science 2024. [PMID: 38450904 DOI: 10.1111/1744-7917.13349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 01/26/2024] [Accepted: 02/05/2024] [Indexed: 03/08/2024]

Affiliation(s)

Bo Zhang State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests & Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou, China
Longfei Chen State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests & Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou, China
Shan Xiao State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests & Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou, China
Cong Dang College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, China
Fang Wang State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests & Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou, China
Qi Fang State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests & Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou, China
Xinhai Ye State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests & Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou, China
David W Stanley Biological Control of Insects Research Laboratory USDA/Agricultural Research Service, Columbia MO, USA
Gongyin Ye State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Key Laboratory of Molecular Biology of Crop Pathogens and Insect Pests & Key Laboratory of Biology of Crop Pathogens and Insects of Zhejiang Province, Zhejiang University, Hangzhou, China

Ma L, Zou D, Liu L, Shireen H, Abbasi AA, Bateman A, Xiao J, Zhao W, Bao Y, Zhang Z. Database Commons: A Catalog of Worldwide Biological Databases. Genomics, Proteomics & Bioinformatics 2023;21:1054-1058. [PMID: 36572336 PMCID: PMC10928426 DOI: 10.1016/j.gpb.2022.12.004] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 09/21/2022] [Revised: 12/13/2022] [Accepted: 12/14/2022] [Indexed: 12/25/2022]

Affiliation(s)

Lina Ma National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Dong Zou National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
Lin Liu National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China
Huma Shireen National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
Amir A Abbasi National Center for Bioinformatics, Programme of Comparative and Evolutionary Genomics, Faculty of Biological Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
Alex Bateman European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge CB10 1SD, United Kingdom
Jingfa Xiao National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
Wenming Zhao National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
Yiming Bao National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China
Zhang Zhang National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China; University of Chinese Academy of Sciences, Beijing 100049, China.

Bowler-Barnett EH, Fan J, Luo J, Magrane M, Martin MJ, Orchard S. UniProt and Mass Spectrometry-Based Proteomics-A 2-Way Working Relationship. Mol Cell Proteomics 2023;22:100591. [PMID: 37301379 PMCID: PMC10404557 DOI: 10.1016/j.mcpro.2023.100591] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Revised: 05/20/2023] [Accepted: 06/07/2023] [Indexed: 06/12/2023] Open

Cuzick A, Seager J, Wood V, Urban M, Rutherford K, Hammond-Kosack KE. A framework for community curation of interspecies interactions literature. eLife 2023;12:e84658. [PMID: 37401199 DOI: 10.7554/elife.84658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 05/18/2023] [Indexed: 07/05/2023] Open

Picot C, Ajiji P, Jurek L, Nourredine M, Massardier J, Peron A, Cucherat M, Cottin J. Risk of drug use during pregnancy: master protocol for living systematic reviews and meta-analyses performed in the metaPreg project. Syst Rev 2023;12:101. [PMID: 37344917 DOI: 10.1186/s13643-023-02256-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Accepted: 05/12/2023] [Indexed: 06/23/2023] Open

Pérez-Pérez M, Ferreira T, Igrejas G, Fdez-Riverola F. A novel gluten knowledge base of potential biomedical and health-related interactions extracted from the literature: using machine learning and graph analysis methodologies to reconstruct the bibliome. J Biomed Inform 2023:104398. [PMID: 37230405 DOI: 10.1016/j.jbi.2023.104398] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/17/2022] [Revised: 05/12/2023] [Accepted: 05/15/2023] [Indexed: 05/27/2023]

Abstract

BACKGROUND

In return for their nutritional properties and broad availability, cereal crops have been associated with different alimentary disorders and symptoms, with the majority of the responsibility being attributed to gluten. Therefore, the research of gluten-related literature data continues to be produced at ever-growing rates, driven in part by the recent exploratory studies that link gluten to non-traditional diseases and the popularity of gluten-free diets, making it increasingly difficult to access and analyse practical and structured information. In this sense, the accelerated discovery of novel advances in diagnosis and treatment, as well as exploratory studies, produce a favourable scenario for disinformation and misinformation.

OBJECTIVES

Aligned with the European Union strategy "Delivering on EU Food Safety and Nutrition in 2050", which emphasizes the inextricable links between imbalanced diets, the increased exposure to unreliable sources of information and misleading information, and the increased dependency on reliable sources of information, this paper presents GlutKNOIS, a public and interactive literature-based database that reconstructs and represents the experimental biomedical knowledge extracted from the gluten-related literature. The developed platform includes different external database knowledge, bibliometrics statistics and social media discussion to propose a novel and enhanced way to search, visualise and analyse potential biomedical and health-related interactions in relation to the gluten domain.

METHODS

For this purpose, the presented study applies a semi-supervised curation workflow that combines natural language processing techniques, machine learning algorithms, ontology-based normalization and integration approaches, named entity recognition methods, and graph knowledge reconstruction methodologies to process, classify, represent and analyse the experimental findings contained in the literature, which is also complemented by data from the social discussion.

RESULTS AND CONCLUSIONS

In this sense, 5,814 documents were manually annotated and 7,424 were fully automatically processed to reconstruct the first online gluten-related knowledge database of evidenced health-related interactions that produce health or metabolic changes based on the literature. In addition, the automatic processing of the literature combined with the knowledge representation methodologies proposed has the potential to assist in the revision and analysis of years of gluten research. The reconstructed knowledge base is public and accessible at https://sing-group.org/glutknois/.

Launer-Wachs S, Taub-Tabib H, Tokarev Madem J, Bar-Natan O, Goldberg Y, Shamay Y. From Centralized to Ad-Hoc Knowledge Base Construction for Hypotheses Generation. J Biomed Inform 2023;142:104383. [PMID: 37196989 DOI: 10.1016/j.jbi.2023.104383] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/15/2022] [Revised: 04/27/2023] [Accepted: 05/03/2023] [Indexed: 05/19/2023]

Abstract

OBJECTIVE

To demonstrate and develop an approach enabling individual researchers or small teams to create their own ad-hoc, lightweight knowledge bases tailored for specialized scientific interests, using text-mining over scientific literature, and demonstrate the effectiveness of these knowledge bases in hypothesis generation and literature-based discovery (LBD).

METHODS

We propose a lightweight process using an extractive search framework to create ad-hoc knowledge bases, which require minimal training and no background in bio-curation or computer science. These knowledge bases are particularly effective for LBD and hypothesis generation using Swanson's ABC method. The personalized nature of the knowledge bases allows for a somewhat higher level of noise than "public facing" ones, as researchers are expected to have prior domain experience to separate signal from noise. Fact verification is shifted from exhaustive verification of the knowledge base to post-hoc verification of specific entries of interest, allowing researchers to assess the correctness of relevant knowledge base entries by considering the paragraphs in which the facts were introduced.

RESULTS

We demonstrate the methodology by constructing several knowledge bases of different kinds: three knowledge bases that support lab-internal hypothesis generation: Drug Delivery to Ovarian Tumors (DDOT); Tissue Engineering and Regeneration; Challenges in Cancer Research; and an additional comprehensive, accurate knowledge base designated as a public resource for the wider community on the topic of Cell Specific Drug Delivery (CSDD). In each case, we show the design and construction process, along with relevant visualizations for data exploration, and hypothesis generation. For CSDD and DDOT we also show meta-analysis, human evaluation, and in vitro experimental evaluation.

CONCLUSION

Our approach enables researchers to create personalized, lightweight knowledge bases for specialized scientific interests, effectively facilitating hypothesis generation and literature-based discovery (LBD). By shifting fact verification efforts to post-hoc verification of specific entries, researchers can focus on exploring and generating hypotheses based on their expertise. The constructed knowledge bases demonstrate the versatility and adaptability of our approach to versatile research interests. The web-based platform, available at https://spike-kbc.apps.allenai.org, provides researchers with a valuable tool for rapid construction of knowledge bases tailored to their needs.
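As an illustration of the ABC method named in the methods above: a start concept A is linked to a candidate concept C through intermediate B terms that are related to both, while A and C are not yet directly related. The following is a minimal sketch only, assuming the knowledge base can be reduced to undirected (term, term) pairs. The function name `abc_candidates` and all term names are hypothetical placeholders, not part of the cited platform.

```python
from collections import defaultdict


def abc_candidates(relations, a_term):
    """Return {c_term: sorted bridging b_terms} for open-discovery ABC.

    relations: iterable of (term, term) pairs, treated as undirected.
    a_term:    the start concept A.
    """
    # Build an undirected adjacency map from the relation pairs.
    neighbours = defaultdict(set)
    for left, right in relations:
        neighbours[left].add(right)
        neighbours[right].add(left)

    direct = set(neighbours[a_term])  # everything already linked to A
    candidates = defaultdict(set)
    for b_term in direct:
        for c_term in neighbours[b_term]:
            # Keep C only if it is not A itself and not already linked to A.
            if c_term != a_term and c_term not in direct:
                candidates[c_term].add(b_term)
    return {c: sorted(bs) for c, bs in candidates.items()}


# Toy, made-up relations (placeholders, not extracted from any literature).
toy_relations = [
    ("compound_A", "process_B1"),
    ("compound_A", "process_B2"),
    ("process_B1", "disease_C"),
    ("process_B2", "disease_C"),
    ("process_B1", "disease_D"),
    ("compound_A", "disease_D"),  # already known, so not a new hypothesis
]

result = abc_candidates(toy_relations, "compound_A")
print(result)
```

In this toy input only `disease_C` is returned, because `disease_D` is already directly linked to the start concept. Each returned B term is a pointer back to the entries a researcher would verify post hoc.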

Mayer C, Vogt A, Uslu T, Scalzitti N, Chennen K, Poch O, Thompson JD. CeGAL: Redefining a Widespread Fungal-Specific Transcription Factor Family Using an In Silico Error-Tracking Approach. J Fungi (Basel) 2023;9:jof9040424. [PMID: 37108879 PMCID: PMC10141177 DOI: 10.3390/jof9040424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 03/21/2023] [Accepted: 03/28/2023] [Indexed: 03/31/2023] Open

Shaw F, Minotto A, McTaggart S, Providence A, Harrison P, Paupério J, Rajan J, Burgin J, Cochrane G, Kilias E, Lawniczak M, Davey R. Managing sample metadata for biodiversity: considerations from the Darwin Tree of Life project. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.18499.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open

Abstract

Large-scale reference genome sequencing projects for all of biodiversity are underway and common standards have been in place for some years to enable the understanding and sharing of sequence data. However, the metadata that describes the collection, processing and management of samples, and link to the associated sequencing and genome data, are not yet adequately developed and standardised for these projects. At the time of writing, the Darwin Tree of Life (DToL) Project is over two years into its ten-year ambition to sequence all described eukaryotic species in Britain and Ireland. We have sought consensus from a wide range of scientists across taxonomic domains to determine the minimal set of metadata that we collectively deem as critically important to accompany each sequenced specimen. These metadata are made available throughout the subsequent laboratory processes, and once collected, need to be adequately managed to fulfil the requirements of good data management practice. Due to the size and scale of management required, software tools are needed. These tools need to implement rigorous development pathways and change management procedures to ensure that effective research data management of key project and sample metadata is maintained. Tracking of sample properties through the sequencing process is handled by Lab Information Management Systems (LIMS), so publication of the sequenced data is achieved via technical integration of LIMS and data management tools. Discussions with community members on how metadata standards need to be managed within large-scale programmes is a priority in the planning process. Here we report on the standards we developed with respect to a robust and reusable mechanism of metadata collection, in the hopes that other projects forthcoming or underway will adopt these practices for metadata.

Chen Q, Allot A, Leaman R, Wei CH, Aghaarabi E, Guerrerio J, Xu L, Lu Z. LitCovid in 2022: an information resource for the COVID-19 literature. Nucleic Acids Res 2022;51:D1512-D1518. [PMID: 36350613 PMCID: PMC9825538 DOI: 10.1093/nar/gkac1005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2022] [Revised: 10/11/2022] [Accepted: 10/19/2022] [Indexed: 11/11/2022] Open

A group theoretic approach to model comparison with simplicial representations. J Math Biol 2022;85:48. [PMID: 36209430 PMCID: PMC9548478 DOI: 10.1007/s00285-022-01807-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2021] [Revised: 05/31/2022] [Accepted: 07/25/2022] [Indexed: 10/28/2022]

Abstract

The complexity of biological systems, and the increasingly large amount of associated experimental data, necessitates that we develop mathematical models to further our understanding of these systems. Because biological systems are generally not well understood, most mathematical models of these systems are based on experimental data, resulting in a seemingly heterogeneous collection of models that ostensibly represent the same system. To understand the system we therefore need to understand how the different models are related to each other, with a view to obtaining a unified mathematical description. This goal is complicated by the fact that a number of distinct mathematical formalisms may be employed to represent the same system, making direct comparison of the models very difficult. A methodology for comparing mathematical models based on their underlying conceptual structure is therefore required. In previous work we developed an appropriate framework for model comparison where we represent models, specifically the conceptual structure of the models, as labelled simplicial complexes and compare them with the two general methodologies of comparison by distance and comparison by equivalence. In this article we continue the development of our model comparison methodology in two directions. First, we present a rigorous and automatable methodology for the core process of comparison by equivalence, namely determining the vertices in a simplicial representation, corresponding to model components, that are conceptually related and the identification of these vertices via simplicial operations. Our methodology is based on considerations of vertex symmetry in the simplicial representation, for which we develop the required mathematical theory of group actions on simplicial complexes. This methodology greatly simplifies and expedites the process of determining model equivalence. Second, we provide an alternative mathematical framework for our model-comparison methodology by representing models as groups, which allows for the direct application of group-theoretic techniques within our model-comparison methodology.

Schuler R, Bugacov A, Hacia J, Ho T, Iwata J, Pearlman L, Samuels B, Williams C, Zhao Z, Kesselman C, Chai Y. FaceBase: A Community-Driven Hub for Data-Intensive Research. J Dent Res 2022;101:1289-1298. [PMID: 35912790 PMCID: PMC9516628 DOI: 10.1177/00220345221107905] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023] Open

“KRiShI”: a manually curated knowledgebase on rice sheath blight disease. Funct Integr Genomics 2022;22:1403-1410. [DOI: 10.1007/s10142-022-00899-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2022] [Revised: 06/28/2022] [Accepted: 09/04/2022] [Indexed: 11/04/2022]

Xu Q, Liu Y, Hu J, Duan X, Song N, Zhou J, Zhai J, Su J, Liu S, Chen F, Zheng W, Guo Z, Li H, Zhou Q, Niu B. OncoPubMiner: a platform for mining oncology publications. Brief Bioinform 2022;23:6691792. [PMID: 36058206 DOI: 10.1093/bib/bbac383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/12/2022] Open

Affiliation(s)

Quan Xu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Yueyue Liu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Jifang Hu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China
Xiaohong Duan ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Niuben Song ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Jiale Zhou ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Jincheng Zhai ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Junyan Su ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Siyao Liu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Fan Chen ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Wei Zheng The Department of Nephrology and Hypertension Medicine, Beijing Electric Power Hospital, Beijing 100073, China
Zhongjia Guo ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Hexiang Li ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China
Qiming Zhou ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; ChosenMed Gene Technology Co. Ltd., Nanjing, China
Beifang Niu ChosenMed Technology (Beijing) Company Limited, Jinghai Industrial Park, Economic and Technological Development Area, Beijing 100176, China; Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100190, China

Chen Q, Du J, Allot A, Lu Z. LitMC-BERT: Transformer-Based Multi-Label Classification of Biomedical Literature With An Application on COVID-19 Literature Curation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022;19:2584-2595. [PMID: 35536809 PMCID: PMC9647722 DOI: 10.1109/tcbb.2022.3173562] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/11/2021] [Revised: 04/19/2022] [Accepted: 04/22/2022] [Indexed: 05/20/2023]

Chen Q, Allot A, Leaman R, Islamaj R, Du J, Fang L, Wang K, Xu S, Zhang Y, Bagherzadeh P, Bergler S, Bhatnagar A, Bhavsar N, Chang YC, Lin SJ, Tang W, Zhang H, Tavchioski I, Pollak S, Tian S, Zhang J, Otmakhova Y, Yepes AJ, Dong H, Wu H, Dufour R, Labrak Y, Chatterjee N, Tandon K, Laleye FAA, Rakotoson L, Chersoni E, Gu J, Friedrich A, Pujari SC, Chizhikova M, Sivadasan N, VG S, Lu Z. Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. Database (Oxford) 2022;2022:baac069. [PMID: 36043400 PMCID: PMC9428574 DOI: 10.1093/database/baac069] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/20/2022] [Revised: 08/02/2022] [Accepted: 08/13/2022] [Indexed: 05/03/2023]

Abstract

The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. The related findings such as vaccine and drug development have been reported in biomedical literature at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g. Diagnosis and Treatment) to the articles in LitCovid. The annotated topics have been widely used for navigating the COVID literature, rapidly locating articles of interest and other downstream studies. However, annotating the topics has been the bottleneck of manual curation. Despite the continuing advances in biomedical text-mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30 000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181 and 0.9394 for macro-F1-score, micro-F1-score and instance-based F1-score, respectively. Notably, these scores are substantially higher (e.g. 12% higher for macro F1-score) than the corresponding scores of the state-of-the-art multi-label classification method. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/
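The three F1 variants reported in the abstract above aggregate the same per-article predictions in different ways. The sketch below is illustrative only and is not the track's evaluation script. It assumes macro-F1 averages per-label F1 scores, micro-F1 pools counts over all labels, and instance-based F1 averages per-article F1 scores. The toy label sets are made up, and the function names are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold, pred):
    """Return (macro, micro, instance) F1 for parallel lists of label sets.

    Assumes gold and pred are non-empty and contain at least one label.
    """
    labels = sorted(set().union(*gold, *pred))
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for g, p in zip(gold, pred):
        for label in labels:
            if label in g and label in p:
                counts[label][0] += 1
            elif label in p:
                counts[label][1] += 1
            elif label in g:
                counts[label][2] += 1

    # Macro: average the per-label F1 scores, so rare labels weigh equally.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    # Micro: pool the counts over all labels, then compute a single F1.
    tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
    micro = f1(tp, fp, fn)

    # Instance-based: compute F1 per article, then average over articles.
    instance = sum(
        f1(len(g & p), len(p - g), len(g - p)) for g, p in zip(gold, pred)
    ) / len(gold)
    return macro, micro, instance


# Toy annotations for three articles (made up for illustration).
gold = [{"Treatment", "Diagnosis"}, {"Prevention"}, {"Treatment"}]
pred = [{"Treatment"}, {"Prevention", "Diagnosis"}, {"Treatment"}]

macro, micro, instance = multilabel_f1(gold, pred)
print(f"macro={macro:.4f} micro={micro:.4f} instance={instance:.4f}")
```

On this toy input the one label that is never predicted correctly pulls macro-F1 down the most, which is why macro-F1 is usually the lowest of the three when rare topics are hard.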

Affiliation(s)

Qingyu Chen National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
Alexis Allot National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
Robert Leaman National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
Rezarta Islamaj National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA
Jingcheng Du School of Biomedical Informatics, UT Health, Houston, TX 77030, USA
Li Fang Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Kai Wang Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Shuo Xu College of Economics and Management, Beijing University of Technology, Beijing, China
Yuefu Zhang College of Economics and Management, Beijing University of Technology, Beijing, China
Parsa Bagherzadeh CLaC Labs, Concordia University, Montreal, Canada
Sabine Bergler CLaC Labs, Concordia University, Montreal, Canada
Aakash Bhatnagar Navrachana University, Vadodara, India
Nidhir Bhavsar Navrachana University, Vadodara, India
Yung-Chun Chang Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
Sheng-Jie Lin Graduate Institute of Data Science, Taipei Medical University, Taipei, Taiwan
Wentai Tang College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Hongtong Zhang College of Computer Science and Technology, Dalian University of Technology, Dalian, China
Ilija Tavchioski Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia; Jožef Stefan Institute, Ljubljana, Slovenia
Senja Pollak Jožef Stefan Institute, Ljubljana, Slovenia
Shubo Tian Department of Statistics, Florida State University, Tallahassee, FL, USA
Jinfeng Zhang Department of Statistics, Florida State University, Tallahassee, FL, USA
Yulia Otmakhova School of Computing and Information Systems, University of Melbourne, Melbourne, AU-VIC, Australia
Antonio Jimeno Yepes School of Computing Technologies, RMIT University, Melbourne, AU-VIC, Australia
Hang Dong Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, UK
Honghan Wu Institute of Health Informatics, University College London, London, UK
Richard Dufour LS2N, Nantes University, Nantes, France
Yanis Labrak LIA, Avignon University, Avignon, France
Niladri Chatterjee Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
Kushagri Tandon Department of Mathematics, Indian Institute of Technology Delhi, New Delhi, India
Fréjus A A Laleye Opscidia, Paris, France
Loïc Rakotoson Opscidia, Paris, France
Emmanuele Chersoni Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
Jinghang Gu Department of Chinese and Bilingual Studies, The Hong Kong Polytechnic University, Hong Kong, China
Annemarie Friedrich Bosch Center for Artificial Intelligence, Renningen, Germany
Subhash Chandra Pujari Institute of Computer Science, Heidelberg University, Heidelberg, Germany; Bosch Center for Artificial Intelligence, Renningen, Germany
Mariia Chizhikova SINAI Group, Department of Computer Science, Advanced Studies Center in ICT (CEATIC), Universidad de Jaén, Jaén, Spain
Naveen Sivadasan TCS Research, Life Sciences, Hyderabad, India
Saipradeep VG TCS Research, Life Sciences, Hyderabad, India
Zhiyong Lu National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20892, USA


Raboudi A, Allanic M, Balvay D, Hervé PY, Viel T, Yoganathan T, Certain A, Hilbey J, Charlet J, Durupt A, Boutinaud P, Eynard B, Tavitian B. The BMS-LM ontology for biomedical data reporting throughout the lifecycle of a research study: From data model to ontology. J Biomed Inform 2022;127:104007. [DOI: 10.1016/j.jbi.2022.104007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/06/2021] [Revised: 12/24/2021] [Accepted: 01/28/2022] [Indexed: 11/16/2022]

Nadendla S, Jackson R, Munro J, Quaglia F, Mészáros B, Olley D, Hobbs ET, Goralski SM, Chibucos M, Mungall CJ, Tosatto SCE, Erill I, Giglio MG. ECO: the Evidence and Conclusion Ontology, an update for 2022. Nucleic Acids Res 2022;50:D1515-D1521. [PMID: 34986598 PMCID: PMC8728134 DOI: 10.1093/nar/gkab1025] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2021] [Revised: 10/12/2021] [Accepted: 10/18/2021] [Indexed: 11/12/2022] Open

Charles WM, Delgado BM. Health Datasets as Assets: Blockchain-Based Valuation and Transaction Methods. BLOCKCHAIN IN HEALTHCARE TODAY 2022;5:185. [PMID: 36779021 PMCID: PMC9907414 DOI: 10.30953/bhty.v5.185] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/10/2021] [Revised: 12/19/2021] [Accepted: 12/21/2021] [Indexed: 05/13/2023]

Fitzpatrick R, Stefan MI. Validation Through Collaboration: Encouraging Team Efforts to Ensure Internal and External Validity of Computational Models of Biochemical Pathways. Neuroinformatics 2022;20:277-284. [PMID: 35543917 PMCID: PMC9537119 DOI: 10.1007/s12021-022-09584-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 03/17/2022] [Indexed: 01/09/2023]

Chen Q, Rankine A, Peng Y, Aghaarabi E, Lu Z. Benchmarking Effectiveness and Efficiency of Deep Learning Models for Semantic Textual Similarity in the Clinical Domain: Validation Study. JMIR Med Inform 2021;9:e27386. [PMID: 34967748 PMCID: PMC8759018 DOI: 10.2196/27386] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 01/23/2023] Open

Abstract

Background

Semantic textual similarity (STS) measures the degree of relatedness between sentence pairs. The Open Health Natural Language Processing (OHNLP) Consortium released an expertly annotated STS data set and called for the National Natural Language Processing Clinical Challenges. This work describes our entry, an ensemble model that leverages a range of deep learning (DL) models. Our team from the National Library of Medicine obtained a Pearson correlation of 0.8967 on the official test set during the 2019 National Natural Language Processing Clinical Challenges/Open Health Natural Language Processing shared task and ranked second.

Objective

Although our models strongly correlate with manual annotations, annotator-level correlation was only moderate (weighted Cohen κ=0.60). We are cautious of the potential use of DL models in production systems and argue that it is more critical to evaluate the models in-depth, especially those with extremely high correlations. In this study, we benchmark the effectiveness and efficiency of top-ranked DL models. We quantify their robustness and inference times to validate their usefulness in real-time applications.

Methods

We benchmarked five DL models, which are the top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT. We evaluated a random forest model as an additional baseline. For each model, we repeated the experiment 10 times, using the official training and testing sets. We reported 95% CI of the Wilcoxon rank-sum test on the average Pearson correlation (official evaluation metric) and running time. We further evaluated Spearman correlation, R², and mean squared error as additional measures.
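The four effectiveness measures named in these Methods can be computed directly from gold and predicted similarity scores. The following is a minimal standard-library sketch and not the authors' evaluation code: the function names and sample scores are hypothetical, and R² is taken here to be the coefficient of determination (1 minus residual over total sum of squares), which is an assumption because the abstract does not state which definition was used.

```python
from math import sqrt
from typing import List, Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def mse(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Mean squared error between gold and predicted scores."""
    return mean([(g - p) ** 2 for g, p in zip(gold, pred)])


def r_squared(gold: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination (assumed definition, see lead-in)."""
    mg = mean(gold)
    ss_res = sum((g - p) ** 2 for g, p in zip(gold, pred))
    ss_tot = sum((g - mg) ** 2 for g in gold)
    return 1 - ss_res / ss_tot


# Hypothetical similarity scores on a 0-5 scale.
gold = [0.0, 1.0, 2.0, 3.0, 4.0]
pred = [0.5, 1.0, 2.5, 3.0, 4.5]
print(pearson(gold, pred), spearman(gold, pred), r_squared(gold, pred), mse(gold, pred))
```

Pearson and Spearman reward agreement in trend or ordering, whereas mean squared error penalizes the size of each individual miss, so a model can correlate well while still making large errors on particular sentence pairs.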

Results

Using only the official training set, all models obtained highly effective results. BioSentVec and BioBERT achieved the highest average Pearson correlations (0.8497 and 0.8481, respectively). BioSentVec also had the highest results in 3 of 4 effectiveness measures, followed by BioBERT. However, their robustness to sentence pairs of different similarity levels varies significantly. A particular observation is that BERT models made the most errors (a mean squared error of over 2.5) on highly similar sentence pairs. They cannot capture highly similar sentence pairs effectively when they have different negation terms or word orders. In addition, time efficiency is dramatically different from the effectiveness results. On average, the BERT models were approximately 20 times and 50 times slower than the Convolutional Neural Network and BioSentVec models, respectively. This results in challenges for real-time applications.

Conclusions

Despite the excitement of further improving Pearson correlations in this data set, our results highlight that evaluations of the effectiveness and efficiency of STS models are critical. In future, we suggest more evaluations on the generalization capability and user-level testing of the models. We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence-relatedness.


Kuiper M, Bonello J, Fernández-Breis JT, Bucher P, Futschik ME, Gaudet P, Kulakovskiy IV, Licata L, Logie C, Lovering RC, Makeev VJ, Orchard S, Panni S, Perfetto L, Sant D, Schulz S, Zerbino DR, Lægreid A. The Gene Regulation Knowledge Commons: The action area of GREEKC. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2021;1865:194768. [PMID: 34757206 DOI: 10.1016/j.bbagrm.2021.194768] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 10/18/2021] [Accepted: 10/20/2021] [Indexed: 02/08/2023]

Affiliation(s)

Martin Kuiper Systems Biology Group, Department of Biology, Norwegian University of Science and Technology, Trondheim, Norway
Joseph Bonello Faculty of Information & Communication Technology, University of Malta, Msida, Malta
Jesualdo Tomás Fernández-Breis Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca, CP 30100 Murcia, Spain
Philipp Bucher Swiss Institute of Bioinformatics, Quartier Sorge, Bâtiment Amphipôle, 1015 Lausanne, Switzerland
Matthias E Futschik Systems Biology and Bioinformatics Laboratory (SysBioLab), Centre of Marine Sciences (CCMAR), University of Algarve, 8005-139 Faro, Portugal
Pascale Gaudet SIB Swiss Institute of Bioinformatics, 1 Rue Michel-Servet, 1204 Geneva, Switzerland
Ivan V Kulakovskiy Institute of Protein Research, Russian Academy of Sciences, Institutskaya 4, 142290 Pushchino, Russia
Luana Licata Department of Biology, University of Rome Tor Vergata, Rome, Italy
Colin Logie Department of Molecular Biology, Faculty of Science, Radboud University, PO Box 9101, Nijmegen 6500HG, the Netherlands
Ruth C Lovering Functional Gene Annotation, Pre-clinical and Fundamental Science, Institute of Cardiovascular Science, University College London, 5 University Street, London WC1E 6JF, UK
Vsevolod J Makeev Vavilov Institute of General Genetics, Russian Academy of Sciences, Gubkina 3, 119991 Moscow, Russia
Sandra Orchard European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
Simona Panni Department DIBEST, University of Calabria, Rende, Italy
Livia Perfetto Fondazione Human Technopole, Department of Biology, Via Cristina Belgioioso, 171, 20157 Milan, Italy
David Sant Department of Biomedical Informatics, University of Utah, 421 Wakara Way #140, Salt Lake City, UT 84108, United States
Stefan Schulz Institute of Medical Informatics, Statistics and Documentation, Medical University of Graz, Auenbruggerpl. 2, Graz, Austria
Daniel R Zerbino European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Astrid Lægreid Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, 7491 Trondheim, Norway


Pourreza Shahri M, Kahanda I. Deep semi-supervised learning ensemble framework for classifying co-mentions of human proteins and phenotypes. BMC Bioinformatics 2021;22:500. [PMID: 34656098 PMCID: PMC8520253 DOI: 10.1186/s12859-021-04421-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2021] [Accepted: 10/04/2021] [Indexed: 11/13/2022] Open

Abstract

Background

Identifying human protein-phenotype relationships has attracted researchers in bioinformatics and biomedical natural language processing due to its importance in uncovering rare and complex diseases. Since experimental validation of protein-phenotype associations is prohibitive, automated tools capable of accurately extracting these associations from the biomedical text are in high demand. However, while the manual annotation of protein-phenotype co-mentions required for training such models is highly resource-consuming, extracting millions of unlabeled co-mentions is straightforward.

Results

In this study, we propose a novel deep semi-supervised ensemble framework that combines deep neural networks, semi-supervised learning, and ensemble learning for classifying human protein-phenotype co-mentions with the help of unlabeled data. This framework can incorporate an extensive collection of unlabeled sentence-level co-mentions of human proteins and phenotypes alongside a small labeled dataset to enhance overall performance. We develop PPPredSS, a prototype of our proposed semi-supervised framework that combines sophisticated language models, convolutional networks, and recurrent networks. Our experimental results demonstrate that the proposed approach provides a new state-of-the-art performance in classifying human protein-phenotype co-mentions by outperforming other supervised and semi-supervised counterparts. Furthermore, we highlight the utility of PPPredSS in powering a curation assistant system through case studies involving a group of biologists.

Conclusions

This article presents a novel approach for human protein-phenotype co-mention classification based on deep, semi-supervised, and ensemble learning. The insights and findings from this work have implications for biomedical researchers, biocurators, and the text mining community working on biomedical relationship extraction.


Glavaški M, Velicki L. Humans and machines in biomedical knowledge curation: hypertrophic cardiomyopathy molecular mechanisms' representation. BioData Min 2021;14:45. [PMID: 34600580 PMCID: PMC8487578 DOI: 10.1186/s13040-021-00279-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2021] [Accepted: 09/14/2021] [Indexed: 11/25/2022] Open

Abstract

Background

Biomedical knowledge is dispersed in scientific literature and is growing constantly. Curation is the extraction of knowledge from unstructured data into a computable form and could be done manually or automatically. Hypertrophic cardiomyopathy (HCM) is the most common inherited cardiac disease, with genotype–phenotype associations still incompletely understood. We compared human- and machine-curated HCM molecular mechanisms’ models and examined the performance of different machine approaches for that task.

Results

We created six models representing HCM molecular mechanisms using different approaches and made them publicly available, analyzed them as networks, and tried to explain the models’ differences by the analysis of factors that affect the quality of machine-curated models (query constraints and reading systems’ performance). A result of this work is also the Interactive HCM map, the only publicly available knowledge resource dedicated to HCM. Sizes and topological parameters of the networks differed notably, and a low consensus was found in terms of centrality measures between networks. Consensus about the most important nodes was achieved only with respect to one element (calcium). Models with a reduced level of noise were generated and cooperatively working elements were detected. REACH and TRIPS reading systems showed much higher accuracy than Sparser, but at the cost of extraction performance. TRIPS proved to be the best single reading system for text segments about HCM, in terms of the compromise between accuracy and extraction performance.

Conclusions

Different approaches in curation can produce models of the same disease with diverse characteristics, and they give rise to utterly different conclusions in subsequent analysis. The final purpose of the model should direct the choice of curation techniques. Manual curation represents the gold standard for information extraction in biomedical research and is most suitable when only high-quality elements for models are required. Automated curation provides more substance, but a high level of noise is expected. Different curation strategies can reduce the level of human input needed. Biomedical knowledge would benefit overwhelmingly, especially given its rapid growth, if computers were able to assist in analysis on a larger scale.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13040-021-00279-2.


Ramsey J, McIntosh B, Renfro D, Aleksander SA, LaBonte S, Ross C, Zweifel AE, Liles N, Farrar S, Gill JJ, Erill I, Ades S, Berardini TZ, Bennett JA, Brady S, Britton R, Carbon S, Caruso SM, Clements D, Dalia R, Defelice M, Doyle EL, Friedberg I, Gurney SMR, Hughes L, Johnson A, Kowalski JM, Li D, Lovering RC, Mans TL, McCarthy F, Moore SD, Murphy R, Paustian TD, Perdue S, Peterson CN, Prüß BM, Saha MS, Sheehy RR, Tansey JT, Temple L, Thorman AW, Trevino S, Vollmer AC, Walbot V, Willey J, Siegele DA, Hu JC. Crowdsourcing biocuration: The Community Assessment of Community Annotation with Ontologies (CACAO). PLoS Comput Biol 2021;17:e1009463. [PMID: 34710081 PMCID: PMC8553046 DOI: 10.1371/journal.pcbi.1009463] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open

Affiliation(s)

Jolene Ramsey Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America; Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
Brenley McIntosh Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Daniel Renfro Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Suzanne A. Aleksander Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Sandra LaBonte Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Curtis Ross Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America; Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America
Adrienne E. Zweifel Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Nathan Liles Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Shabnam Farrar Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America
Jason J. Gill Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America; Department of Animal Science, Texas A&M University, College Station, Texas, United States of America
Ivan Erill Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America; Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
Sarah Ades Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
Tanya Z. Berardini The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
Jennifer A. Bennett Department of Biology and Earth Science, Otterbein University, Westerville, Ohio, United States of America
Siobhan Brady Department of Plant Biology and Genome Center, University of California Davis, Davis, California, United States of America
Robert Britton Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan, United States of America
Seth Carbon Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
Steven M. Caruso Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, Maryland, United States of America
Dave Clements Department of Biology, Johns Hopkins University, Baltimore, Maryland, United States of America
Ritu Dalia Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
Meredith Defelice Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
Erin L. Doyle Biology Department, Doane University, Crete, Nebraska, United States of America
Iddo Friedberg Department of Microbiology, Miami University, Oxford, Ohio, United States of America
Susan M. R. Gurney Department of Biology, Drexel University, Philadelphia, Pennsylvania, United States of America
Lee Hughes Department of Biological Sciences, University of North Texas, Denton, Texas, United States of America
Allison Johnson Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, Virginia, United States of America
Jason M. Kowalski Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
Donghui Li The Arabidopsis Information Resource, Phoenix Bioinformatics, Newark, California, United States of America
Ruth C. Lovering Institute of Cardiovascular Science, University College London, London, United Kingdom
Tamara L. Mans Department of Biochemistry and Biotechnology, Minnesota State University Moorhead, Brooklyn Park, Minnesota, United States of America
Fiona McCarthy Department of Basic Science, College of Veterinary Medicine, Mississippi State University, Starkville, Mississippi, United States of America
Sean D. Moore Burnett School of Biomedical Sciences, University of Central Florida, Orlando, Florida, United States of America
Rebecca Murphy Department of Biology, Centenary College of Louisiana, Shreveport, Louisiana, United States of America
Timothy D. Paustian Department of Bacteriology, University of Wisconsin, Madison, Wisconsin, United States of America
Sarah Perdue Biological Sciences Department, University of Wisconsin-Parkside, Kenosha, Wisconsin, United States of America
Celeste N. Peterson Biology Department, Suffolk University, Boston, Massachusetts, United States of America
Birgit M. Prüß Microbiological Sciences Department, North Dakota State University, Fargo, North Dakota, United States of America
Margaret S. Saha Department of Biology, College of William & Mary, Williamsburg, Virginia, United States of America
Robert R. Sheehy Biology Department, Radford University, Radford, Virginia, United States of America
John T. Tansey Department of Biochemistry and Molecular Biology, Otterbein University, Westerville, Ohio, United States of America
Louise Temple School of Integrated Sciences, James Madison University, Harrisonburg, Virginia, United States of America
Alexander William Thorman Department of Environmental and Public Health Sciences, University of Cincinnati, Cincinnati, Ohio, United States of America
Saul Trevino Department of Chemistry, Math, and Physics, Houston Baptist University, Houston, Texas, United States of America
Amy Cheng Vollmer Department of Biology, Swarthmore College, Swarthmore, Pennsylvania, United States of America
Virginia Walbot Department of Biology, Stanford University, Stanford, California, United States of America
Joanne Willey Department of Science Education, Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, New York, United States of America
Deborah A. Siegele Department of Biology, Texas A&M University, College Station, Texas, United States of America
James C. Hu Department of Biochemistry & Biophysics, Texas A&M University, College Station, Texas, United States of America; Center for Phage Technology, Texas A&M University, College Station, Texas, United States of America


Díaz-Rodríguez M, Lithgow-Serrano O, Guadarrama-García F, Tierrafría VH, Gama-Castro S, Solano-Lira H, Salgado H, Rinaldi F, Méndez-Cruz CF, Collado-Vides J. Lisen&Curate: A platform to facilitate gathering textual evidence for curation of regulation of transcription initiation in bacteria. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2021;1864:194753. [PMID: 34461312 PMCID: PMC10155859 DOI: 10.1016/j.bbagrm.2021.194753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2021] [Revised: 07/12/2021] [Accepted: 08/25/2021] [Indexed: 10/20/2022]

Affiliation(s)

Martín Díaz-Rodríguez Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
Oscar Lithgow-Serrano Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico; Dalle Molle Institute for Artificial Intelligence Research, IDSIA USI-SUPSI, Polo universitario Lugano-Campus Est, Via la Santa 1, CH-6962 Lugano, Switzerland
Francisco Guadarrama-García Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
Víctor H Tierrafría Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
Socorro Gama-Castro Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
Hilda Solano-Lira Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
Heladia Salgado Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
Fabio Rinaldi Dalle Molle Institute for Artificial Intelligence Research, IDSIA USI-SUPSI, Polo universitario Lugano-Campus Est, Via la Santa 1, CH-6962 Lugano, Switzerland; Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Carlos-Francisco Méndez-Cruz Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico
Julio Collado-Vides Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n Col. Chamilpa, 62210 Cuernavaca, Mor., Mexico; Department of Biomedical Engineering, Boston University, 44 Cummington Mall Room 403, 02215 Boston, MA, USA; Center for Genomic Regulation (CRG), Dr. Aiguader 88, 08003, Barcelona, Spain


Allot A, Lee K, Chen Q, Luo L, Lu Z. LitSuggest: a web-based system for literature recommendation and curation using machine learning. Nucleic Acids Res 2021;49:W352-W358. [PMID: 33950204 DOI: 10.1093/nar/gkab326] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 01/02/2023] Open

Staton M, Cannon E, Sanderson LA, Wegrzyn J, Anderson T, Buehler S, Cobo-Simón I, Faaberg K, Grau E, Guignon V, Gunoskey J, Inderski B, Jung S, Lager K, Main D, Poelchau M, Ramnath R, Richter P, West J, Ficklin S. Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Brief Bioinform 2021;22:6318561. [PMID: 34251419 PMCID: PMC8574961 DOI: 10.1093/bib/bbab238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/01/2022] Open

Foerster H, Battey JND, Sierro N, Ivanov NV, Mueller LA. Metabolic networks of the Nicotiana genus in the spotlight: content, progress and outlook. Brief Bioinform 2021;22:bbaa136. [PMID: 32662816 PMCID: PMC8138835 DOI: 10.1093/bib/bbaa136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/25/2020] [Revised: 05/19/2020] [Accepted: 06/04/2020] [Indexed: 01/09/2023] Open

Hatos A, Quaglia F, Piovesan D, Tosatto SCE. APICURON: a database to credit and acknowledge the work of biocurators. Database (Oxford) 2021;2021:baab019. [PMID: 33882120 PMCID: PMC8060004 DOI: 10.1093/database/baab019] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2021] [Revised: 03/12/2021] [Accepted: 04/12/2021] [Indexed: 11/14/2022]

Arnaboldi V, Cho J, Sternberg PW. Wormicloud: a new text summarization tool based on word clouds to explore the C. elegans literature. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021;2021:6206631. [PMID: 33787871 DOI: 10.1093/database/baab015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/15/2020] [Revised: 02/19/2021] [Accepted: 03/24/2021] [Indexed: 11/12/2022]

Touré V, Zobolas J, Kuiper M, Vercruysse S. CausalBuilder: bringing the MI2CAST causal interaction annotation standard to the curator. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2021;2021:6129748. [PMID: 33547799 PMCID: PMC7904049 DOI: 10.1093/database/baaa107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/27/2020] [Revised: 11/16/2020] [Accepted: 12/07/2020] [Indexed: 12/23/2022]

Pancsa R, Vranken W, Mészáros B. Computational resources for identifying and describing proteins driving liquid-liquid phase separation. Brief Bioinform 2021;22:6124912. [PMID: 33517364 PMCID: PMC8425267 DOI: 10.1093/bib/bbaa408] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2020] [Revised: 11/23/2020] [Accepted: 12/12/2020] [Indexed: 01/06/2023] Open

Bastian FB, Roux J, Niknejad A, Comte A, Fonseca Costa SS, de Farias TM, Moretti S, Parmentier G, de Laval VR, Rosikiewicz M, Wollbrett J, Echchiki A, Escoriza A, Gharib WH, Gonzales-Porta M, Jarosz Y, Laurenczy B, Moret P, Person E, Roelli P, Sanjeev K, Seppey M, Robinson-Rechavi M. The Bgee suite: integrated curated expression atlas and comparative transcriptomics in animals. Nucleic Acids Res 2021;49:D831-D847. [PMID: 33037820 PMCID: PMC7778977 DOI: 10.1093/nar/gkaa793] [Citation(s) in RCA: 76] [Impact Index Per Article: 25.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2020] [Revised: 08/24/2020] [Accepted: 09/15/2020] [Indexed: 01/24/2023] Open

Affiliation(s)

Frederic B Bastian Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Julien Roux Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Anne Niknejad Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Aurélie Comte Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Sara S Fonseca Costa Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Tarcisio Mendes de Farias Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Sébastien Moretti Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Gilles Parmentier Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Valentine Rech de Laval Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Marta Rosikiewicz Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Julien Wollbrett Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Amina Echchiki Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Angélique Escoriza Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Walid H Gharib Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Mar Gonzales-Porta Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Yohan Jarosz Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Balazs Laurenczy Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Philippe Moret Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Emilie Person Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Patrick Roelli Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Komal Sanjeev Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Mathieu Seppey Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
Marc Robinson-Rechavi Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland; SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland


Chen Q, Allot A, Lu Z. LitCovid: an open database of COVID-19 literature. Nucleic Acids Res 2021;49:D1534-D1540. [PMID: 33166392 PMCID: PMC7778958 DOI: 10.1093/nar/gkaa952] [Citation(s) in RCA: 130] [Impact Index Per Article: 43.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 10/02/2020] [Accepted: 10/08/2020] [Indexed: 12/22/2022] Open

Egorova KS, Smirnova NS, Toukach PV. CSDB_GT, a curated glycosyltransferase database with close-to-full coverage on three most studied nonanimal species. Glycobiology 2020;31:524-529. [PMID: 33242091 DOI: 10.1093/glycob/cwaa107] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2020] [Revised: 11/13/2020] [Accepted: 11/18/2020] [Indexed: 11/13/2022] Open

Gabrielsen AM. Openness and trust in data-intensive science: the case of biocuration. Med Health Care Philos 2020;23:497-504. [PMID: 32524312 PMCID: PMC7426290 DOI: 10.1007/s11019-020-09960-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

Nydal R, Bennett G, Kuiper M, Lægreid A. Silencing trust: confidence and familiarity in re-engineering knowledge infrastructures. Med Health Care Philos 2020;23:471-484. [PMID: 32468194 PMCID: PMC7426298 DOI: 10.1007/s11019-020-09957-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/03/2023]

Shaw F, Etuk A, Minotto A, Gonzalez-Beltran A, Johnson D, Rocca-Serra P, Laporte MA, Arnaud E, Devare M, Kersey P, Sansone SA, Davey RP. COPO: a metadata platform for brokering FAIR data in the life sciences. F1000Res 2020. [DOI: 10.12688/f1000research.23889.1] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

Leaman R, Wei CH, Allot A, Lu Z. Ten tips for a text-mining-ready article: How to improve automated discoverability and interpretability. PLoS Biol 2020;18:e3000716. [PMID: 32479517 PMCID: PMC7289435 DOI: 10.1371/journal.pbio.3000716] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Revised: 06/11/2020] [Indexed: 12/22/2022] Open

Teodoro D, Knafou J, Naderi N, Pasche E, Gobeill J, Arighi CN, Ruch P. UPCLASS: a deep learning-based classifier for UniProtKB entry publications. Database (Oxford) 2020;2020:5822772. [PMID: 32367111 PMCID: PMC7198315 DOI: 10.1093/database/baaa026] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/31/2019] [Revised: 02/19/2020] [Accepted: 03/11/2020] [Indexed: 12/20/2022]

Lock A, Harris MA, Rutherford K, Hayles J, Wood V. Community curation in PomBase: enabling fission yeast experts to provide detailed, standardized, sharable annotation from research publications. Database (Oxford) 2020;2020:5827230. [PMID: 32353878 PMCID: PMC7192550 DOI: 10.1093/database/baaa028] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/10/2020] [Revised: 02/28/2020] [Accepted: 03/22/2020] [Indexed: 11/22/2022]

Southan C. Opening up connectivity between documents, structures and bioactivity. Beilstein J Org Chem 2020;16:596-606. [PMID: 32280387 PMCID: PMC7136548 DOI: 10.3762/bjoc.16.54] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2019] [Accepted: 03/12/2020] [Indexed: 12/17/2022] Open

Abstract

Bioscientists reading papers or patents strive to discern the key relationships reported within a document "D" where a bioactivity "A" with a quantitative result "R" (e.g., an IC₅₀) is reported for chemical structure "C" that modulates (e.g., inhibits) a protein target "P". A useful shorthand for this connectivity thus becomes DARCP. The problem at the core of this article is that the community has spent millions effectively burying these relationships in PDFs over many decades but must now spend millions more trying to get them back out. The key imperative for this is to increase the flow into structured open databases. The positive impacts will include expanded data mining opportunities for drug discovery and chemical biology. Over the last decade commercial sources have manually extracted DARCP from ≈300,000 documents encompassing ≈7 million compounds interacting with ≈10,000 targets. Over a similar time, the Guide to Pharmacology, BindingDB and ChEMBL have carried out analogous DARCP extractions. Although their expert-curated numbers are lower (i.e., ≈2 million compounds against ≈3700 human proteins), these open sources have the great advantage of being merged within PubChem. Parallel efforts have focused on the extraction of document-to-compound (D-C-only) connectivity. In the absence of molecular mechanism of action (mmoa) annotation, this is of less value but can be automatically extracted. This has been significantly accomplished for patents (e.g., by IBM, SureChEMBL and WIPO) for over 30 million compounds in PubChem. These have recently been joined by 1.4 million D-C submissions from three major chemistry publishers. In addition, both the European and US PubMed Central portals now add chemistry look-ups from abstracts and full-text papers. However, the fully automated extraction of DARCP has not yet been achieved. This stands in contrast to the ability of biocurators to discern these relationships in minutes.
Unfortunately, no journals have yet instigated a flow of author-specified DARCP directly into open databases. Progress may come from trends such as open science, open access (OA), findable, accessible, interoperable and reusable (FAIR), resource description framework (RDF) and WikiData. However, we will need to await the technical applicability in respect to DARCP capture to see if this opens up connectivity.


Baryshnikova A. Data libraries - the missing element for modeling biological systems. FEBS J 2020;287:4594-4601. [PMID: 32100391 PMCID: PMC7687078 DOI: 10.1111/febs.15261] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 02/19/2020] [Accepted: 02/24/2020] [Indexed: 11/29/2022]

Breuza L, Arighi CN, Argoud-Puy G, Casals-Casas C, Estreicher A, Famiglietti ML, Georghiou G, Gos A, Gruaz-Gumowski N, Hinz U, Hyka-Nouspikel N, Kramarz B, Lovering RC, Lussi Y, Magrane M, Masson P, Perfetto L, Poux S, Rodriguez-Lopez M, Stoeckert C, Sundaram S, Wang LS, Wu E, Orchard S. A Coordinated Approach by Public Domain Bioinformatics Resources to Aid the Fight Against Alzheimer's Disease Through Expert Curation of Key Protein Targets. J Alzheimers Dis 2020;77:257-273. [PMID: 32716361 PMCID: PMC7592670 DOI: 10.3233/jad-200206] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/05/2020] [Indexed: 01/08/2023]

Affiliation(s)

Lionel Breuza Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Cecilia N. Arighi Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA; Protein Information Resource, University of Delaware, Newark, DE, USA
Ghislaine Argoud-Puy Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Cristina Casals-Casas Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Anne Estreicher Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Maria Livia Famiglietti Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
George Georghiou European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
Arnaud Gos Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Nadine Gruaz-Gumowski Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Ursula Hinz Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Nevila Hyka-Nouspikel Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Barbara Kramarz Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
Ruth C. Lovering Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK
Yvonne Lussi European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
Michele Magrane European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
Patrick Masson Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Livia Perfetto European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
Sylvain Poux Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Milagros Rodriguez-Lopez European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
Christian Stoeckert Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Shyamala Sundaram Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
Li-San Wang Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
Elizabeth Wu Alzforum, Cambridge, MA, USA
Sandra Orchard European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK
IMEx Consortium, UniProt Consortium Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland; Protein Information Resource, Georgetown University Medical Center, Washington, DC, USA; Protein Information Resource, University of Delaware, Newark, DE, USA; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Campus, Hinxton, Cambridge, UK; Functional Gene Annotation, Preclinical and Fundamental Science, Institute of Cardiovascular Science, University College London (UCL), London, UK; Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA; Alzforum, Cambridge, MA, USA


Davis AP, Wiegers J, Wiegers TC, Mattingly CJ. Public data sources to support systems toxicology applications. Curr Opin Toxicol 2019;16:17-24. [PMID: 33604492 PMCID: PMC7889036 DOI: 10.1016/j.cotox.2019.03.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]

Tang YA, Pichler K, Füllgrabe A, Lomax J, Malone J, Munoz-Torres MC, Vasant DV, Williams E, Haendel M. Ten quick tips for biocuration. PLoS Comput Biol 2019;15:e1006906. [PMID: 31048830 PMCID: PMC6497217 DOI: 10.1371/journal.pcbi.1006906] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open

Thompson R, Abicht A, Beeson D, Engel AG, Eymard B, Maxime E, Lochmüller H. A nomenclature and classification for the congenital myasthenic syndromes: preparing for FAIR data in the genomic era. Orphanet J Rare Dis 2018;13:211. [PMID: 30477555 PMCID: PMC6260762 DOI: 10.1186/s13023-018-0955-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2018] [Accepted: 11/14/2018] [Indexed: 12/30/2022] Open

Abstract

BACKGROUND

Congenital myasthenic syndromes (CMS) are a heterogeneous group of inherited neuromuscular disorders sharing the common feature of fatigable weakness due to defective neuromuscular transmission. Despite rapidly increasing knowledge about the genetic origins, specific features and potential treatments for the known CMS entities, the lack of standardized classification at the most granular level has hindered the implementation of computer-based systems for knowledge capture and reuse. Where individual clinical or genetic entities do not exist in disease coding systems, they are often invisible in clinical records and inadequately annotated in information systems, and features that apply to one disease but not another cannot be adequately differentiated.

RESULTS

We created a detailed classification of all CMS disease entities suitable for use in clinical and genetic databases and decision support systems. To avoid conflict with existing coding systems as well as with expert-defined group-level classifications, we developed a collaboration with the Orphanet nomenclature for rare diseases, creating a clinically understandable name for each entity and placing it within a logical hierarchy that paves the way towards computer-aided clinical systems and improved knowledge bases for CMS that can adequately differentiate between types and ascribe relevant expert knowledge to each.

CONCLUSIONS

We suggest that data science approaches can be used effectively in the clinical domain in a way that does not disrupt preexisting expert classification and that enhances the utility of existing coding systems. Our classification provides a comprehensive view of the individual CMS entities in a manner that supports differential diagnosis and understanding of the range and heterogeneity of the disease but that also enables robust computational coding and hierarchy for machine-readability. It can be extended as required in the light of future scientific advances, but already provides the starting point for the creation of FAIR (Findable, Accessible, Interoperable and Reusable) knowledge bases of data on the congenital myasthenic syndromes.
