1
Yan S, Luo L, Lai PT, Veltri D, Oler AJ, Xirasagar S, Ghosh R, Similuk M, Robinson PN, Lu Z. PhenoRerank: A re-ranking model for phenotypic concept recognition pre-trained on human phenotype ontology. J Biomed Inform 2022; 129:104059. [PMID: 35351638; PMCID: PMC11040548; DOI: 10.1016/j.jbi.2022.104059] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/20/2021] [Revised: 02/23/2022] [Accepted: 03/22/2022] [Indexed: 11/29/2022]
Abstract
This study aims to develop a neural network model that improves the performance of Human Phenotype Ontology (HPO) concept recognition tools. We used the terms, definitions, and comments about the phenotypic concepts in the HPO database to train our model. The document to be analyzed is first split into sentences and annotated with a base method to generate candidate concepts. The sentences, along with the candidate concepts, are then fed into the pre-trained model for re-ranking. Our model comprises the pre-trained BlueBERT and a feature selection module, followed by a contrastive loss. We re-ranked the results generated by three robust HPO annotation tools and compared the performance against most of the existing approaches. The experimental results show that our model can improve the performance of the existing methods. Notably, it improved the F1 score by 3.0% and 5.6% on the two evaluated datasets compared with the base methods. It removed more than 80% of the false positives predicted by the base methods, resulting in up to 18% improvement in precision. Our model utilizes the descriptive data in the ontology and the contextual information in the sentences for re-ranking. The results indicate that the additional information and the re-ranking model can significantly enhance the precision of HPO concept recognition compared with the base method.
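The candidate-then-re-rank flow described in this abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two concept entries and their IDs are invented, `base_annotate` is a toy word matcher standing in for a real HPO annotation tool, `score` uses plain token overlap rather than the BlueBERT-based model the abstract describes, and the `threshold` value is arbitrary.

```python
import re

# Hypothetical concept entries: ID -> (term, definition). Invented for this
# sketch; not copied from the HPO database.
CONCEPTS = {
    "C1": ("short stature", "height below the expected range for age and sex"),
    "C2": ("seizure", "episode of abnormal electrical activity in the brain"),
}

def split_sentences(text):
    # Naive splitter: break after sentence-ending punctuation.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokens(text):
    # Lowercased set of alphabetic tokens.
    return set(re.findall(r"[a-z]+", text.lower()))

def base_annotate(sentence):
    # Toy base method: propose a concept if any word of its term appears.
    words = tokens(sentence)
    return [cid for cid, (term, _) in CONCEPTS.items() if tokens(term) & words]

def score(sentence, cid):
    # Stand-in scorer (assumption): Jaccard overlap between the sentence and
    # the concept's term plus definition, in place of a trained neural model.
    term, definition = CONCEPTS[cid]
    a, b = tokens(sentence), tokens(term + " " + definition)
    return len(a & b) / len(a | b) if a | b else 0.0

def rerank(text, threshold=0.1):
    # For each sentence: generate candidates, score them, drop low scorers,
    # and sort the survivors by descending score.
    results = []
    for sentence in split_sentences(text):
        scored = [(cid, score(sentence, cid)) for cid in base_annotate(sentence)]
        kept = sorted((p for p in scored if p[1] >= threshold), key=lambda p: -p[1])
        results.append((sentence, kept))
    return results

out = rerank("The patient has short stature. He ran a short race.")
```

In this toy input the base matcher proposes `C1` for both sentences because each contains the word "short"; the scoring step keeps the candidate in the first sentence and drops it in the second, which is the kind of false-positive removal the abstract reports.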
Affiliation(s)
- Shankai Yan
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
- Ling Luo
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
- Po-Ting Lai
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
- Daniel Veltri
  - Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Andrew J Oler
  - Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Sandhya Xirasagar
  - Bioinformatics and Computational Biosciences Branch, Office of Cyber Infrastructure and Computational Biology, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Rajarshi Ghosh
  - Centralized Sequencing Program, Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Morgan Similuk
  - Centralized Sequencing Program, Division of Intramural Research, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Peter N Robinson
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT 06032, USA
- Zhiyong Lu
  - National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), Bethesda, MD, USA
2
Li T, Zhou HP, Zhou ZJ, Guo LQ, Zhou L. Computed tomography-identified phenotypes of small airway obstructions in chronic obstructive pulmonary disease. Chin Med J (Engl) 2021; 134:2025-2036. [PMID: 34517376; PMCID: PMC8440009; DOI: 10.1097/cm9.0000000000001724] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2021] [Indexed: 12/02/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is a heterogeneous disease characterized by small airway inflammation, obstruction, and emphysema. It is well known that spirometry alone cannot differentiate each separate component. Computed tomography (CT) is widely used to determine the extent of emphysema and small airway involvement in COPD. Compared with the pulmonary function test, small airway CT phenotypes can accurately reflect disease severity in patients with COPD, which is conducive to improving the prognosis of this disease. CT measurement of central airway morphology has been applied in clinical, epidemiologic, and genetic investigations as an inference of the presence and severity of small airway disease. This review will focus on presenting the current knowledge and methodologies in chest CT that aid in identifying discrete COPD phenotypes.
Affiliation(s)
- Tao Li
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu 210029, China
  - Department of Respiratory Medicine, Xuzhou First People's Hospital, Xuzhou, Jiangsu 221116, China
- Hao-Peng Zhou
  - Department of Medicine, Jiangsu University School of Medicine, Zhenjiang, Jiangsu 212013, China
- Zhi-Jun Zhou
  - Institute of Radio Frequency & Optical Electronics-Integrated Circuits, School of Information and Engineering, Southeast University, Nanjing, Jiangsu 210096, China
- Li-Quan Guo
  - Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, Jiangsu 215163, China
- Linfu Zhou
  - Department of Respiratory and Critical Care Medicine, The First Affiliated Hospital, Nanjing Medical University, Nanjing, Jiangsu 210029, China
  - Institute of Integrative Medicine, Nanjing Medical University, Nanjing, Jiangsu 210029, China
3
Alnazzawi N. Building a semantically annotated corpus for chronic disease complications using two document types. PLoS One 2021; 16:e0247319. [PMID: 33735207; PMCID: PMC7971867; DOI: 10.1371/journal.pone.0247319] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/11/2020] [Accepted: 02/04/2021] [Indexed: 11/19/2022]
Abstract
Narrative information in electronic health records (EHRs) contains a wealth of information related to patient health conditions. In addition, people use Twitter to express their experiences regarding personal health issues, such as medical complaints, symptoms, treatments, lifestyle, and other factors. Both genres of text include different types of health-related information concerning disease complications and risk factors. Detailed knowledge of how to control disease risk factors has a great impact on modifying these risks and subsequently preventing disease complications. Text-mining tools provide efficient solutions to extract and integrate vital information related to disease complications hidden in the large volume of narrative text. However, the development of text-mining tools depends on the availability of an annotated corpus. In response, we have developed the PrevComp corpus, which is annotated with information relevant to the identification of disease complications, underlying risk factors, and prevention measures, in the context of the interaction between hypertension and diabetes. The corpus is unique and novel in terms of its very specific topic in the biomedical domain and its integration of information from both EHRs and tweets collected from Twitter. The annotation scheme was designed with guidance from a domain expert, and two further domain experts performed the annotation, resulting in a high-quality annotation, with agreement rate F-scores as high as 0.60 and 0.75 for EHRs and tweets, respectively.
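An agreement rate expressed as an F-score is commonly computed by treating one annotator's spans as the reference and the other's as the prediction. The sketch below shows that calculation under stated assumptions: annotations are represented as hypothetical `(start, end, label)` tuples, only exact matches count as agreement, and the label names are invented for illustration. The abstract does not specify the matching criterion actually used for PrevComp.

```python
def agreement_f_score(annotator_a, annotator_b):
    """F-score between two annotation sets, using exact span-and-label matches.

    Annotator A is treated as the reference and annotator B as the prediction.
    Swapping the two only exchanges precision and recall, so the F-score is
    the same either way.
    """
    a, b = set(annotator_a), set(annotator_b)
    if not a or not b:
        return 0.0
    matched = len(a & b)
    precision = matched / len(b)
    recall = matched / len(a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical annotations: (start offset, end offset, label).
ann_a = [(0, 12, "Complication"), (20, 31, "RiskFactor"), (40, 48, "Prevention")]
ann_b = [(0, 12, "Complication"), (20, 31, "RiskFactor"),
         (50, 60, "Prevention"), (70, 75, "RiskFactor")]

f = agreement_f_score(ann_a, ann_b)  # 2 matches, 3 vs 4 annotations: 4/7
```

Relaxed matching (for example, counting overlapping spans as agreement) would generally yield higher scores than the exact matching assumed here.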
Affiliation(s)
- Noha Alnazzawi
  - Department of Computer Science and Engineering, Royal Commission for Jubail and Yanbu, Yanbu University College, Yanbu Industrial City, Saudi Arabia