1
Zheng L, Perl Y, He Y. Big knowledge visualization of the COVID-19 CIDO ontology evolution. BMC Med Inform Decis Mak 2023; 23:88. [PMID: 37161560 PMCID: PMC10169115 DOI: 10.1186/s12911-023-02184-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Accepted: 04/20/2023] [Indexed: 05/11/2023] Open
Abstract
BACKGROUND The extensive international research on medications and vaccines for the devastating COVID-19 pandemic requires a standard reference ontology. Among the current COVID-19 ontologies, the Coronavirus Infectious Disease Ontology (CIDO) is the largest, and it continues to grow through frequent releases. Researchers using CIDO as a reference ontology need a quick update on the content added in a recent release to judge how relevant the new concepts are to their research. Although CIDO is only a medium-sized ontology, it is still a large knowledge base, which poses a challenge for a user interested in obtaining the "big picture" of content changes between releases. Both a theoretical framework and a proper visualization are required to provide such a "big picture".
METHODS The child-of-based layout of the weighted aggregate partial-area taxonomy summarization network (WAT) provides a convenient "big picture" visualization of the content of an ontology. In this paper, we address the "big picture" of content changes between two releases of an ontology. We introduce a new DIFF framework named Diff Weighted Aggregate Taxonomy (DWAT) to display the differences between the WATs of two releases of an ontology. We use a layered approach that starts with a DWAT of the major subjects in CIDO and then drills down into a major subject of interest in the top-level DWAT to obtain a DWAT of secondary subjects, and even further refined layers.
RESULTS A visualization of the Diff Weighted Aggregate Taxonomy is demonstrated on the CIDO ontology. The evolution of CIDO between 2020 and 2022 is demonstrated from two perspectives. Drilling down to a DWAT of secondary subject networks is also demonstrated. We illustrate how the DWAT of CIDO provides insight into its evolution.
CONCLUSIONS The new Diff Weighted Aggregate Taxonomy enables a layered approach to viewing the "big picture" of the changes in content between two releases of an ontology.
Affiliation(s)
- Ling Zheng
  - Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, USA
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Yongqun He
  - Unit for Laboratory Animal Medicine, Department of Microbiology and Immunology, and Center for Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA
2
Keloth VK, Zhou S, Lindemann L, Zheng L, Elhanan G, Einstein AJ, Geller J, Perl Y. Mining of EHR for interface terminology concepts for annotating EHRs of COVID patients. BMC Med Inform Decis Mak 2023; 23:40. [PMID: 36829139 PMCID: PMC9951157 DOI: 10.1186/s12911-023-02136-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/22/2022] [Accepted: 02/09/2023] [Indexed: 02/26/2023] Open
Abstract
BACKGROUND Two years into the COVID-19 pandemic and with more than five million deaths worldwide, the healthcare establishment continues to struggle with every new wave of the pandemic resulting from a new coronavirus variant. Research has demonstrated that there are variations in the symptoms, and even in the order of symptom presentation, in COVID-19 patients infected by different SARS-CoV-2 variants (e.g., Alpha and Omicron). Textual data in the form of admission notes and physician notes in Electronic Health Records (EHRs) is rich in information about the symptoms and their order of presentation. Unstructured EHR data is often underutilized in research because it lacks the annotations that would enable automatic extraction of useful information from the extensive volumes of available text.
METHODS We present the design of a COVID Interface Terminology (CIT), not just a generic COVID-19 terminology, but one serving the specific purpose of enabling automatic annotation of EHRs of COVID-19 patients. CIT was constructed by integrating existing COVID-related ontologies and mining additional fine-granularity concepts from clinical notes. The iterative mining approach used the techniques of 'anchoring' and 'concatenation' to identify potential fine-granularity concepts to be added to the CIT. We also tested the generalizability of our approach on a hold-out dataset and compared its annotation coverage to the coverage obtained for the dataset used to build the CIT.
RESULTS Our experiments demonstrate that this approach results in higher annotation coverage than existing ontologies such as SNOMED CT and the Coronavirus Infectious Disease Ontology (CIDO). The final version of CIT achieved about 20% more coverage than SNOMED CT and 50% more coverage than CIDO. In the future, the concepts mined and added into CIT could be used as training data for machine learning models that mine even more concepts into CIT and further increase the annotation coverage.
CONCLUSION In this paper, we demonstrated the construction of a COVID interface terminology that can be used for automatically annotating EHRs of COVID-19 patients. The techniques presented can identify frequently documented fine-granularity concepts that are missing in other ontologies, thereby increasing the annotation coverage.
Affiliation(s)
- Vipina K Keloth
  - School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
- Shuxin Zhou
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Luke Lindemann
  - School of Medicine and Health Sciences, The George Washington University, Washington (D.C.), USA
- Ling Zheng
  - Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, USA
- Gai Elhanan
  - Renown Institute for Health Innovation, Desert Research Institute, Reno, NV, USA
- Andrew J Einstein
  - Cardiology Division, Department of Medicine, Columbia University Irving Medical Center, New York, NY, USA
  - Department of Radiology, Columbia University Irving Medical Center, New York, NY, USA
- James Geller
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
3
He Y, Yu H, Huffman A, Lin AY, Natale DA, Beverley J, Zheng L, Perl Y, Wang Z, Liu Y, Ong E, Wang Y, Huang P, Tran L, Du J, Shah Z, Shah E, Desai R, Huang HH, Tian Y, Merrell E, Duncan WD, Arabandi S, Schriml LM, Zheng J, Masci AM, Wang L, Liu H, Smaili FZ, Hoehndorf R, Pendlington ZM, Roncaglia P, Ye X, Xie J, Tang YW, Yang X, Peng S, Zhang L, Chen L, Hur J, Omenn GS, Athey B, Smith B. A comprehensive update on CIDO: the community-based coronavirus infectious disease ontology. J Biomed Semantics 2022; 13:25. [PMID: 36271389 PMCID: PMC9585694 DOI: 10.1186/s13326-022-00279-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/17/2022] [Accepted: 09/13/2022] [Indexed: 11/24/2022] Open
Abstract
Background The current COVID-19 pandemic and the previous SARS/MERS outbreaks of 2003 and 2012 have resulted in a series of major global public health crises. We argue that, in the interest of developing effective and safe vaccines and drugs and of better understanding coronaviruses and associated disease mechanisms, it is necessary to integrate the large and exponentially growing body of heterogeneous coronavirus data. Ontologies play an important role in standards-based knowledge and data representation, integration, sharing, and analysis. Accordingly, we initiated the development of the community-based Coronavirus Infectious Disease Ontology (CIDO) in early 2020.
Results As an Open Biomedical Ontology (OBO) library ontology, CIDO is open source and interoperable with other existing OBO ontologies. CIDO is aligned with the Basic Formal Ontology and Viral Infectious Disease Ontology. CIDO has imported terms from over 30 OBO ontologies. For example, CIDO imports all SARS-CoV-2 protein terms from the Protein Ontology, COVID-19-related phenotype terms from the Human Phenotype Ontology, and over 100 COVID-19 terms for vaccines (both authorized and in clinical trial) from the Vaccine Ontology. CIDO systematically represents variants of SARS-CoV-2 viruses and over 300 amino acid substitutions therein, along with over 300 diagnostic kits and methods. CIDO also describes hundreds of host-coronavirus protein-protein interactions (PPIs) and the drugs that target proteins in these PPIs. CIDO has been used to model COVID-19-related phenomena in areas such as epidemiology. The scope of CIDO was evaluated by visual analysis supported by a summarization network method. CIDO has been used in various applications such as term standardization, inference, natural language processing (NLP), and clinical data integration. We have applied the amino acid variant knowledge present in CIDO to analyze differences between SARS-CoV-2 Delta and Omicron variants. CIDO's integrative host-coronavirus PPI and drug-target knowledge has also been used to support drug repurposing for COVID-19 treatment.
Conclusion CIDO represents entities and relations in the domain of coronavirus diseases with a special focus on COVID-19. It supports shared knowledge representation, data and metadata standardization and integration, and has been used in a range of applications.
Supplementary Information The online version contains supplementary material available at 10.1186/s13326-022-00279-z.
Affiliation(s)
- Yongqun He
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Hong Yu
  - People's Hospital of Guizhou Province, Guiyang, Guizhou, China
- Asiyah Yu Lin
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
  - National Center for Ontological Research, Buffalo, NY, USA
- John Beverley
  - National Center for Ontological Research, Buffalo, NY, USA
  - The Johns Hopkins University Applied Physics Laboratory, Laurel, MD, USA
- Ling Zheng
  - Computer Science and Software Engineering Department, Monmouth University, West Long Branch, NJ, USA
- Yehoshua Perl
  - Department of Computer Science, New Jersey Institute of Technology, Newark, NJ, USA
- Zhigang Wang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing, China
- Yingtong Liu
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Edison Ong
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Yang Wang
  - University of Michigan Medical School, Ann Arbor, MI, USA
  - People's Hospital of Guizhou Province, Guiyang, Guizhou, China
- Philip Huang
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Long Tran
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Jinyang Du
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Zalan Shah
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Easheta Shah
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Roshan Desai
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Hsin-Hui Huang
  - University of Michigan Medical School, Ann Arbor, MI, USA
  - National Yang-Ming University, Taipei, Taiwan
- Yujia Tian
  - Rutgers University, New Brunswick, NJ, USA
- Lynn M Schriml
  - University of Maryland School of Medicine, Baltimore, MD, USA
- Jie Zheng
  - Department of Biology, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Anna Maria Masci
  - Office of Data Science, National Institute of Environmental Health Sciences, Research Triangle Park, NC, USA
- Robert Hoehndorf
  - King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Zoë May Pendlington
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- Paola Roncaglia
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
- Xianwei Ye
  - People's Hospital of Guizhou Province, Guiyang, Guizhou, China
- Jiangan Xie
  - School of Bioinformatics, Chongqing University of Posts and Telecommunications, Chongqing, China
- Yi-Wei Tang
  - Cepheid, Danaher Diagnostic Platform, Shanghai, China
- Xiaolin Yang
  - Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences & School of Basic Medicine, Peking Union Medical College, Beijing, China
- Suyuan Peng
  - National Institute of Health Data Science, Peking University, Beijing, China
- Luxia Zhang
  - National Institute of Health Data Science, Peking University, Beijing, China
- Luonan Chen
  - Shanghai Institute of Biochemistry and Cell Biology, Chinese Academy of Sciences, Shanghai, China
- Junguk Hur
  - University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND, USA
- Brian Athey
  - University of Michigan Medical School, Ann Arbor, MI, USA
- Barry Smith
  - National Center for Ontological Research, Buffalo, NY, USA
  - University at Buffalo, Buffalo, NY, 14260, USA