Goli R, Hubig N, Min H, Gong Y, Sittig DF, Rennert L, Robinson D, Biondich P, Wright A, Nøhr C, Law T, Faxvaag A, Weaver A, Gimbel R, Jing X. Keyphrase Identification Using Minimal Labeled Data with Hierarchical Context and Transfer Learning. medRxiv 2023:2023.01.26.23285060. [PMID: 37292830 PMCID: PMC10246160 DOI: 10.1101/2023.01.26.23285060] [Indexed: 06/10/2023]
Abstract
Interoperable clinical decision support system (CDSS) rules provide a pathway to interoperability, a well-recognized challenge in health information technology. Building an ontology facilitates the creation of interoperable CDSS rules, and such an ontology can be built by identifying keyphrases (KPs) in the existing literature. However, KP identification for data labeling requires human expertise, consensus, and contextual understanding. This paper presents a semi-supervised KP identification framework that uses minimal labeled data and is based on hierarchical attention over documents and domain adaptation. Our method outperforms prior neural architectures by combining initial training on synthetic labels, document-level contextual learning, language modeling, and fine-tuning with limited gold-standard labeled data. To the best of our knowledge, this is the first functional framework for the CDSS sub-domain that identifies KPs and is trained on limited labeled data. It contributes to general natural language processing (NLP) architectures in areas such as clinical NLP, where manual data labeling is challenging and lightweight deep learning models can support real-time KP identification as a complement to human experts' effort.
Affiliation(s)
- Rohan Goli
  - School of Computing, College of Engineering, Computing and Applied Science, Clemson University, Clemson, SC, USA
- Nina Hubig
  - School of Computing, College of Engineering, Computing and Applied Science, Clemson University, Clemson, SC, USA
- Hua Min
  - Department of Health Administration and Policy, College of Public Health, George Mason University, Fairfax, VA, USA
- Yang Gong
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Dean F. Sittig
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Lior Rennert
  - Department of Public Health Sciences, College of Behavioral, Social, and Health Sciences, Clemson University, Clemson, SC, USA
- David Robinson
  - General Practitioner/Independent Consultant, Cumbria, UK
- Paul Biondich
  - Clem McDonald Biomedical Informatics Center, Regenstrief Institute, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
- Adam Wright
  - Vanderbilt University Medical Center, Nashville, TN, USA
- Christian Nøhr
  - Department of Planning, Faculty of Engineering, Aalborg University, Aalborg, Denmark
- Timothy Law
  - Ohio Musculoskeletal and Neurologic Institute, Ohio University, Athens, OH, USA
- Arild Faxvaag
  - Department of Neuromedicine and Movement Science, Faculty of Medicine and Health Sciences, Norwegian University of Science and Technology, Trondheim, Norway
- Aneesa Weaver
  - Department of Public Health Sciences, College of Behavioral, Social, and Health Sciences, Clemson University, Clemson, SC, USA
- Ronald Gimbel
  - Department of Public Health Sciences, College of Behavioral, Social, and Health Sciences, Clemson University, Clemson, SC, USA
- Xia Jing
  - Department of Public Health Sciences, College of Behavioral, Social, and Health Sciences, Clemson University, Clemson, SC, USA