Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Leaman R, Wei CH, Allot A, Lu Z. Ten tips for a text-mining-ready article: How to improve automated discoverability and interpretability. PLoS Biol 2020;18:e3000716. [PMID: 32479517 PMCID: PMC7289435 DOI: 10.1371/journal.pbio.3000716] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Revised: 06/11/2020] [Indexed: 12/22/2022] Open

For:	Leaman R, Wei CH, Allot A, Lu Z. Ten tips for a text-mining-ready article: How to improve automated discoverability and interpretability. PLoS Biol 2020;18:e3000716. [PMID: 32479517 PMCID: PMC7289435 DOI: 10.1371/journal.pbio.3000716] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Revised: 06/11/2020] [Indexed: 12/22/2022] Open

Number

Cited by Other Article(s)

Lyons EL, Watson D, Alodadi MS, Haugabook SJ, Tawa GJ, Hannah-Shmouni F, Porter FD, Collins JR, Ottinger EA, Mudunuri US. Rare disease variant curation from literature: assessing gaps with creatine transport deficiency in focus. BMC Genomics 2023;24:460. [PMID: 37587458 PMCID: PMC10433598 DOI: 10.1186/s12864-023-09561-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2023] [Accepted: 08/08/2023] [Indexed: 08/18/2023] Open

Abstract

BACKGROUND

Approximately 4-8% of the world suffers from a rare disease. Rare diseases are often difficult to diagnose, and many do not have approved therapies. Genetic sequencing has the potential to shorten the current diagnostic process, increase mechanistic understanding, and facilitate research on therapeutic approaches but is limited by the difficulty of novel variant pathogenicity interpretation and the communication of known causative variants. It is unknown how many published rare disease variants are currently accessible in the public domain.

RESULTS

This study investigated the translation of knowledge of variants reported in published manuscripts to publicly accessible variant databases. Variants, symptoms, biochemical assay results, and protein function from literature on the SLC6A8 gene associated with X-linked Creatine Transporter Deficiency (CTD) were curated and reported as a highly annotated dataset of variants with clinical context and functional details. Variants were harmonized, their availability in existing variant databases was analyzed and pathogenicity assignments were compared with impact algorithm predictions. 24% of the pathogenic variants found in PubMed articles were not captured in any database used in this analysis while only 65% of the published variants received an accurate pathogenicity prediction from at least one impact prediction algorithm.

CONCLUSIONS

Despite being published in the literature, pathogenicity data on patient variants may remain inaccessible for genetic diagnosis, therapeutic target identification, mechanistic understanding, or hypothesis generation. Clinical and functional details presented in the literature are important to make pathogenicity assessments. Impact predictions remain imperfect but are improving, especially for single nucleotide exonic variants, however such predictions are less accurate or unavailable for intronic and multi-nucleotide variants. Developing text mining workflows that use natural language processing for identifying diseases, genes and variants, along with impact prediction algorithms and integrating with details on clinical phenotypes and functional assessments might be a promising approach to scale literature mining of variants and assigning correct pathogenicity. The curated variants list created by this effort includes context details to improve any such efforts on variant curation for rare diseases.

Collapse

Leaman R, Islamaj R, Adams V, Alliheedi MA, Almeida JR, Antunes R, Bevan R, Chang YC, Erdengasileng A, Hodgskiss M, Ida R, Kim H, Li K, Mercer RE, Mertová L, Mobasher G, Shin HC, Sung M, Tsujimura T, Yeh WC, Lu Z. Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII. Database (Oxford) 2023;2023:7071696. [PMID: 36882099 PMCID: PMC9991492 DOI: 10.1093/database/baad005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2022] [Revised: 01/06/2023] [Accepted: 02/15/2023] [Indexed: 03/09/2023]

Abstract

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and-as highlighted during the coronavirus disease 2019 pandemic-their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall). This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.

Collapse

Affiliation(s)

Robert Leaman
Rezarta Islamaj
Virginia Adams NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
Mohammed A Alliheedi Department of Computer Science, Al Baha University, 4781 King Fahd Rd, Al Aqiq 65779, Saudi Arabia
João Rafael Almeida Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal Department of Information and Communications Technologies, University of A Coruña, Camiño do Lagar de Castro, A Coruña 15008, Spain
Rui Antunes Department of Electronics, Telecommunications and Informatics (DETI), Institute of Electronics and Informatics Engineering of Aveiro (IEETA), University of Aveiro, Campus Universitário de Santiago, Aveiro 3810-193, Portugal
Robert Bevan Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
Yung-Chun Chang Graduate Institute of Data Science, Taipei Medical University, No. 172-1, Section 2, Keelung Rd, Da’an District, Taipei City , Taipei 106, Taiwan
Arslan Erdengasileng Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
Matthew Hodgskiss Informatics Department, Medicines Discovery Catapult, Alderley Park, Block 35, Mereside, Macclesfield SK10 4ZF, UK
Ryuki Ida Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
Hyunjae Kim Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
Keqiao Li Department of Statistics, Florida State University, 117 N. Woodward Ave, Tallahassee, FL 32306, USA
Robert E Mercer Department of Computer Science, The University of Western Ontario, Room 355, Middlesex College, Ontario , London N6A 5B7, Canada
Lukrécia Mertová Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
Ghadeer Mobasher Scientific Databases and Visualization Group, Heidelberg Institute for Theoretical Studies (HITS gGmbH), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany Institute of Computer Science, Heidelberg University, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
Hoo-Chang Shin NVIDIA, 2788 San Tomas Expressway, Santa Clara, CA 95051, USA
Mujeen Sung Department of Computer Science and Engineering, Korea University, 145 Anam-ro, Seongbuk-gu, Seoul 02841, South Korea
Tomoki Tsujimura Computational Intelligence Laboratory, Toyota Technological Institute, 2-12-1 Hisakata, Tempaku-ku, Nagoya, Aichi 468-8511, Japan
Wen-Chao Yeh Institute of Information Systems and Applications, National Tsing Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 30013, Taiwan
Zhiyong Lu *Corresponding author: Tel: +1-301-594-7089; Fax: +1-301-480-2290;

Collapse

Papadimitriou S, Gravel B, Nachtegael C, De Baere E, Loeys B, Vikkula M, Smits G, Lenaerts T. Toward reporting standards for the pathogenicity of variant combinations involved in multilocus/oligogenic diseases. HGG ADVANCES 2022;4:100165. [PMID: 36578772 PMCID: PMC9791921 DOI: 10.1016/j.xhgg.2022.100165] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/03/2022] Open

Affiliation(s)

Sofia Papadimitriou Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,2Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium,3Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium,∗Corresponding author
Barbara Gravel Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,2Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium,3Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
Charlotte Nachtegael Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,2Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
Elfride De Baere Center for Medical Genetics, Ghent University Hospital, Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium
Bart Loeys Center for Medical Genetics, Antwerp University Hospital/University of Antwerp, 2650 Antwerp, Belgium
Miikka Vikkula Human Molecular Genetics, de Duve Institute, UCLouvain, Brussels, Belgium
Guillaume Smits Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,7Center of Human Genetics, Hôpital Erasme, Université Libre de Bruxelles, 1070 Brussels, Belgium,8Hôpital Universitaire des Enfants Reine Fabiola, Université Libre de Bruxelles, 1020 Brussels, Belgium
Tom Lenaerts Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium,2Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium,3Artificial Intelligence Laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium,∗∗Corresponding author

Collapse

Comprehensively identifying Long Covid articles with human-in-the-loop machine learning. PATTERNS (NEW YORK, N.Y.) 2022;4:100659. [PMID: 36471749 PMCID: PMC9712067 DOI: 10.1016/j.patter.2022.100659] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/29/2022] [Revised: 09/19/2022] [Accepted: 11/17/2022] [Indexed: 12/05/2022]

Rodriguez-Esteban R. New reasons for biologists to write with a formal language. Database (Oxford) 2022;2022:6600538. [PMID: 35657112 PMCID: PMC9216469 DOI: 10.1093/database/baac039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 03/18/2022] [Accepted: 05/17/2022] [Indexed: 12/03/2022]

Lee K, Wei CH, Lu Z. Recent advances of automated methods for searching and extracting genomic variant information from biomedical literature. Brief Bioinform 2021;22:bbaa142. [PMID: 32770181 PMCID: PMC8138883 DOI: 10.1093/bib/bbaa142] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2020] [Revised: 06/07/2020] [Accepted: 06/25/2020] [Indexed: 12/28/2022] Open

Caufield JH, Fu J, Wang D, Guevara-Gonzalez V, Wang W, Ping P. A Second Look at FAIR in Proteomic Investigations. J Proteome Res 2021;20:2182-2186. [PMID: 33719446 DOI: 10.1021/acs.jproteome.1c00177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

De Silva K, Mathews N, Teede H, Forbes A, Jönsson D, Demmer RT, Enticott J. Clinical notes as prognostic markers of mortality associated with diabetes mellitus following critical care: A retrospective cohort analysis using machine learning and unstructured big data. Comput Biol Med 2021;132:104305. [PMID: 33705995 DOI: 10.1016/j.compbiomed.2021.104305] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2020] [Revised: 02/23/2021] [Accepted: 02/27/2021] [Indexed: 12/14/2022]

Abstract

BACKGROUND

Clinical notes are ubiquitous resources offering potential value in optimizing critical care via data mining technologies.

OBJECTIVE

To determine the predictive value of clinical notes as prognostic markers of 1-year all-cause mortality among people with diabetes following critical care.

MATERIALS AND METHODS

Mortality of diabetes patients were predicted using three cohorts of clinical text in a critical care database, written by physicians (n = 45253), nurses (159027), and both (n = 204280). Natural language processing was used to pre-process text documents and LASSO-regularized logistic regression models were trained and tested. Confusion matrix metrics of each model were calculated and AUROC estimates between models were compared. All predictive words and corresponding coefficients were extracted. Outcome probability associated with each text document was estimated.

RESULTS

Models built on clinical text of physicians, nurses, and the combined cohort predicted mortality with AUROC of 0.996, 0.893, and 0.922, respectively. Predictive performance of the models significantly differed from one another whereas inter-rater reliability ranged from substantial to almost perfect across them. Number of predictive words with non-zero coefficients were 3994, 8159, and 10579, respectively, in the models of physicians, nurses, and the combined cohort. Physicians' and nursing notes, both individually and when combined, strongly predicted 1-year all-cause mortality among people with diabetes following critical care.

CONCLUSION

Clinical notes of physicians and nurses are strong and novel prognostic markers of diabetes-associated mortality in critical care, offering potentially generalizable and scalable applications. Clinical text-derived personalized risk estimates of prognostic outcomes such as mortality could be used to optimize patient care.

Collapse

Chen Q, Allot A, Lu Z. LitCovid: an open database of COVID-19 literature. Nucleic Acids Res 2021;49:D1534-D1540. [PMID: 33166392 PMCID: PMC7778958 DOI: 10.1093/nar/gkaa952] [Citation(s) in RCA: 137] [Impact Index Per Article: 45.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2020] [Revised: 10/02/2020] [Accepted: 10/08/2020] [Indexed: 12/22/2022] Open