1
|
Lyons EL, Watson D, Alodadi MS, Haugabook SJ, Tawa GJ, Hannah-Shmouni F, Porter FD, Collins JR, Ottinger EA, Mudunuri US. Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus. BMC Genomics 2023; 24:460. [PMID: 37587458 PMCID: PMC10433598 DOI: 10.1186/s12864-023-09561-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Accepted: 08/08/2023] [Indexed: 08/18/2023] Open
Abstract
BACKGROUND Approximately 4-8% of the world suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain. RESULTS This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed and pathogenicity assignments were compared with impact algorithm predictions. 24% of the pathogenic variants found in PubMed articles were not captured in any database used in this analysis while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm. CONCLUSIONS Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important to make pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants, however such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Developing text mining workflows that use natural language processing for identifying diseases, genes and variants, along with impact prediction algorithms and integrating with details on clinical phenotypes and functional assessments might be a promising approach to scale literature mining of variants and assigning correct pathogenicity. The curated variants list created by this effort includes context details to improve any such efforts on variant curation for rare diseases.
Collapse
Affiliation(s)
- Erica L Lyons
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Daniel Watson
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Mohammad S Alodadi
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Sharie J Haugabook
- Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Gregory J Tawa
- Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Fady Hannah-Shmouni
- Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Forbes D Porter
- Division of Translational Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, 20892, USA
| | - Jack R Collins
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA
| | - Elizabeth A Ottinger
- Division of Preclinical Innovation, Therapeutic Development Branch, Therapeutics for Rare and Neglected Diseases (TRND) Program, National Center for Advancing Translational Sciences, National Institutes of Health, Bethesda, MD, 20892, USA.
| | - Uma S Mudunuri
- Advanced Biomedical Computational Science, Frederick National Laboratory for Cancer Research, Frederick, MD, 21702, USA.
| |
Collapse
|
2
|
Leaman R, Islamaj R, Adams V, Alliheedi MA, Almeida JR, Antunes R, Bevan R, Chang YC, Erdengasileng A, Hodgskiss M, Ida R, Kim H, Li K, Mercer RE, Mertová L, Mobasher G, Shin HC, Sung M, Tsujimura T, Yeh WC, Lu Z. Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. Database (Oxford) 2023; 2023:7071696. [PMID: 36882099 PMCID: PMC9991492 DOI: 10.1093/database/baad005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2022] [Revised: 01/06/2023] [Accepted: 02/15/2023] [Indexed: 03/09/2023]
Abstract
The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and-as highlighted during the coronavirus disease 2019 pandemic-their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall). This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.
Collapse
Affiliation(s)
| | | | - Virginia Adams
- NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
| | - Mohammed A Alliheedi
- Department of Computer Science, Al Baha University, 4781 King Fahd Rd, Al Aqiq 65779, Saudi Arabia
| | - João Rafael Almeida
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
- Department of Information and Communications Technologies, University of A Coruña, Camiño do Lagar de Castro, A Coruña 15008, Spain
| | - Rui Antunes
- Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
| | - Robert Bevan
- Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
| | - Yung-Chun Chang
- Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da’an District, Taipei City , Taipei 106, Taiwan
| | - Arslan Erdengasileng
- Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
| | - Matthew Hodgskiss
- Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
| | - Ryuki Ida
- Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
| | - Hyunjae Kim
- Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
| | - Keqiao Li
- Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
| | - Robert E Mercer
- Department of Computer Science, The University of Western Ontario, Room 355, Middlesex College, Ontario , London N6A 5B7, Canada
| | - Lukrécia Mertová
- Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
| | - Ghadeer Mobasher
- Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
- Institute of Computer Science, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
| | - Hoo-Chang Shin
- NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
| | - Mujeen Sung
- Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
| | - Tomoki Tsujimura
- Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
| | - Wen-Chao Yeh
- Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 30013, Taiwan
| | - Zhiyong Lu
- *Corresponding author: Tel: +1-301-594-7089; Fax: +1-301-480-2290;
| |
Collapse
|
3
|
Papadimitriou S, Gravel B, Nachtegael C, De Baere E, Loeys B, Vikkula M, Smits G, Lenaerts T. Toward reporting standards for the pathogenicity of variant combinations involved in multilocus/oligogenic diseases. HGG ADVANCES 2022; 4:100165. [PMID: 36578772 PMCID: PMC9791921 DOI: 10.1016/j.xhgg.2022.100165] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open
Abstract
Although standards and guidelines for the interpretation of variants identified in genes that cause Mendelian disorders have been developed, this is not the case for more complex genetic models including variant combinations in multiple genes. During a large curation process conducted on 318 research articles presenting oligogenic variant combinations, we encountered several recurring issues concerning their proper reporting and pathogenicity assessment. These mainly concern the absence of strong evidence that refutes a monogenic model and the lack of a proper genetic and functional assessment of the joint effect of the involved variants. With the increasing accumulation of such cases, it has become essential to develop standards and guidelines on how these oligogenic/multilocus variant combinations should be interpreted, validated, and reported in order to provide high-quality data and supporting evidence to the scientific community.
Collapse
Affiliation(s)
- Sofia Papadimitriou
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium,Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium,Corresponding author
| | - Barbara Gravel
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium,Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
| | - Charlotte Nachtegael
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
| | - Elfride De Baere
- Center for Medical Genetics, Ghent University Hospital, Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
| | - Bart Loeys
- Center for Medical Genetics, Antwerp University Hospital/University of Antwerp, 2650 Antwerp, Belgium
| | - Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, UCLouvain, Brussels, Belgium
| | - Guillaume Smits
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,Center of Human Genetics, Hôpital Erasme, Université Libre de Bruxelles, 1070 Brussels, Belgium,Hôpital Universitaire des Enfants Reine Fabiola, Université Libre de Bruxelles, 1020 Brussels, Belgium
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium,Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium,Corresponding author
| |
Collapse
|
4
|
Comprehensively identifying Long Covid articles with human-in-the-loop machine learning. PATTERNS (NEW YORK, N.Y.) 2022; 4:100659. [PMID: 36471749 PMCID: PMC9712067 DOI: 10.1016/j.patter.2022.100659] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 09/19/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022]
Abstract
A significant percentage of COVID-19 survivors experience ongoing multisystemic symptoms that often affect daily living, a condition known as Long Covid or post-acute-sequelae of SARS-CoV-2 infection. However, identifying scientific articles relevant to Long Covid is challenging since there is no standardized or consensus terminology. We developed an iterative human-in-the-loop machine learning framework combining data programming with active learning into a robust ensemble model, demonstrating higher specificity and considerably higher sensitivity than other methods. Analysis of the Long Covid Collection shows that (1) most Long Covid articles do not refer to Long Covid by any name, (2) when the condition is named, the name used most frequently in the literature is Long Covid, and (3) Long Covid is associated with disorders in a wide variety of body systems. The Long Covid Collection is updated weekly and is searchable online at the LitCovid portal: https://www.ncbi.nlm.nih.gov/research/coronavirus/docsum?filters=e_condition.LongCovid.
Collapse
|
5
|
Rodriguez-Esteban R. New reasons for biologists to write with a formal language. Database (Oxford) 2022; 2022:6600538. [PMID: 35657112 PMCID: PMC9216469 DOI: 10.1093/database/baac039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 05/17/2022] [Indexed: 12/03/2022]
Abstract
Current biological writing is afflicted by the use of ambiguous names, convoluted sentences, vague statements and narrative-fitted storylines. This represents a challenge for biological research in general and in particular for fields such as biological database curation and text mining, which have been tasked to cope with exponentially growing content. Improving the quality of biological writing by encouraging unambiguity and precision would foster expository discipline and machine reasoning. More specifically, the routine inclusion of formal languages in biological writing would improve our ability to describe, compile and model biology.
Collapse
Affiliation(s)
- Raul Rodriguez-Esteban
- Roche Pharmaceutical Research and Early Development, Roche Innovation Center Basel, Grenzacherstrasse 124 , Basel 4070, Switzerland
| |
Collapse
|
6
|
Lee K, Wei CH, Lu Z. Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature. Brief Bioinform 2021; 22:bbaa142. [PMID: 32770181 PMCID: PMC8138883 DOI: 10.1093/bib/bbaa142] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Revised: 06/07/2020] [Accepted: 06/25/2020] [Indexed: 12/28/2022] Open
Abstract
MOTIVATION To obtain key information for personalized medicine and cancer research, clinicians and researchers in the biomedical field are in great need of searching genomic variant information from the biomedical literature now than ever before. Due to the various written forms of genomic variants, however, it is difficult to locate the right information from the literature when using a general literature search system. To address the difficulty of locating genomic variant information from the literature, researchers have suggested various solutions based on automated literature-mining techniques. There is, however, no study for summarizing and comparing existing tools for genomic variant literature mining in terms of how to search easily for information in the literature on genomic variants. RESULTS In this article, we systematically compared currently available genomic variant recognition and normalization tools as well as the literature search engines that adopted these literature-mining techniques. First, we explain the problems that are caused by the use of non-standard formats of genomic variants in the PubMed literature by considering examples from the literature and show the prevalence of the problem. Second, we review literature-mining tools that address the problem by recognizing and normalizing the various forms of genomic variants in the literature and systematically compare them. Third, we present and compare existing literature search engines that are designed for a genomic variant search by using the literature-mining techniques. We expect this work to be helpful for researchers who seek information about genomic variants from the literature, developers who integrate genomic variant information from the literature and beyond.
Collapse
Affiliation(s)
- Kyubum Lee
- National Center for Biotechnology Information
| | | | - Zhiyong Lu
- National Center for Biotechnology Information
| |
Collapse
|
7
|
Caufield JH, Fu J, Wang D, Guevara-Gonzalez V, Wang W, Ping P. A Second Look at FAIR in Proteomic Investigations. J Proteome Res 2021; 20:2182-2186. [PMID: 33719446 DOI: 10.1021/acs.jproteome.1c00177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Proteomics is, by definition, comprehensive and large-scale, seeking to unravel ome-level protein features with phenotypic information on an entire system, an organ, cells, or organisms. This scope consistently involves and extends beyond single experiments. Multitudinous resources now exist to assist in making the results of proteomics experiments more findable, accessible, interoperable, and reusable (FAIR), yet many tools are awaiting to be adopted by our community. Here we highlight strategies for expanding the impact of proteomics data beyond single studies. We show how linking specific terminologies, identifiers, and text (words) can unify individual data points across a wide spectrum of studies and, more importantly, how this approach may potentially reveal novel relationships. In this effort, we explain how data sets and methods can be rendered more linkable and how this maximizes their value. We also include a discussion on how data linking strategies benefit stakeholders across the proteomics community and beyond.
Collapse
|
8
|
De Silva K, Mathews N, Teede H, Forbes A, Jönsson D, Demmer RT, Enticott J. Clinical notes as prognostic markers of mortality associated with diabetes mellitus following critical care: A retrospective cohort analysis using machine learning and unstructured big data. Comput Biol Med 2021; 132:104305. [PMID: 33705995 DOI: 10.1016/j.compbiomed.2021.104305] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2020] [Revised: 02/23/2021] [Accepted: 02/27/2021] [Indexed: 12/14/2022]
Abstract
BACKGROUND Clinical notes are ubiquitous resources offering potential value in optimizing critical care via data mining technologies. OBJECTIVE To determine the predictive value of clinical notes as prognostic markers of 1-year all-cause mortality among people with diabetes following critical care. MATERIALS AND METHODS Mortality of diabetes patients were predicted using three cohorts of clinical text in a critical care database, written by physicians (n = 45253), nurses (159027), and both (n = 204280). Natural language processing was used to pre-process text documents and LASSO-regularized logistic regression models were trained and tested. Confusion matrix metrics of each model were calculated and AUROC estimates between models were compared. All predictive words and corresponding coefficients were extracted. Outcome probability associated with each text document was estimated. RESULTS Models built on clinical text of physicians, nurses, and the combined cohort predicted mortality with AUROC of 0.996, 0.893, and 0.922, respectively. Predictive performance of the models significantly differed from one another whereas inter-rater reliability ranged from substantial to almost perfect across them. Number of predictive words with non-zero coefficients were 3994, 8159, and 10579, respectively, in the models of physicians, nurses, and the combined cohort. Physicians' and nursing notes, both individually and when combined, strongly predicted 1-year all-cause mortality among people with diabetes following critical care. CONCLUSION Clinical notes of physicians and nurses are strong and novel prognostic markers of diabetes-associated mortality in critical care, offering potentially generalizable and scalable applications. Clinical text-derived personalized risk estimates of prognostic outcomes such as mortality could be used to optimize patient care.
Collapse
Affiliation(s)
- Kushan De Silva
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, 3168, Australia.
| | - Noel Mathews
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, 3168, Australia
| | - Helena Teede
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, 3168, Australia
| | - Andrew Forbes
- Biostatistics Unit, Division of Research Methodology, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Melbourne, 3004, Australia
| | - Daniel Jönsson
- Department of Periodontology, Faculty of Odontology, Malmö University, Malmö, 21119, Sweden; Swedish Dental Service of Skane, Lund, 22647, Sweden
| | - Ryan T Demmer
- Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN, USA; Mailman School of Public Health, Columbia University, New York, USA
| | - Joanne Enticott
- Monash Centre for Health Research and Implementation, School of Public Health and Preventive Medicine, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Clayton, 3168, Australia
| |
Collapse
|
9
|
Chen Q, Allot A, Lu Z. LitCovid: an open database of COVID-19 literature. Nucleic Acids Res 2021; 49:D1534-D1540. [PMID: 33166392 PMCID: PMC7778958 DOI: 10.1093/nar/gkaa952] [Citation(s) in RCA: 130] [Impact Index Per Article: 43.3] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 10/02/2020] [Accepted: 10/08/2020] [Indexed: 12/22/2022] Open
Abstract
Since the outbreak of the current pandemic in 2020, there has been a rapid growth of published articles on COVID-19 and SARS-CoV-2, with about 10 000 new articles added each month. This is causing an increasingly serious information overload, making it difficult for scientists, healthcare professionals and the general public to remain up to date on the latest SARS-CoV-2 and COVID-19 research. Hence, we developed LitCovid (https://www.ncbi.nlm.nih.gov/research/coronavirus/), a curated literature hub, to track up-to-date scientific information in PubMed. LitCovid is updated daily with newly identified relevant articles organized into curated categories. To support manual curation, advanced machine-learning and deep-learning algorithms have been developed, evaluated and integrated into the curation workflow. To the best of our knowledge, LitCovid is the first-of-its-kind COVID-19-specific literature resource, with all of its collected articles and curated data freely available. Since its release, LitCovid has been widely used, with millions of accesses by users worldwide for various information needs, such as evidence synthesis, drug discovery and text and data mining, among others.
Collapse
Affiliation(s)
- Qingyu Chen
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Alexis Allot
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| | - Zhiyong Lu
- National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD 20892, USA
| |
Collapse
|