Reference Citation Analysis

For: Wagholikar KB, Estiri H, Murphy M, Murphy SN. Polar labeling: silver standard algorithm for training disease classifiers. Bioinformatics 2020;36:3200-3206. [PMID: 32049335 PMCID: PMC7214041 DOI: 10.1093/bioinformatics/btaa088] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/14/2019] [Revised: 01/30/2020] [Accepted: 02/04/2020] [Indexed: 01/29/2023]

Cited by Other Article(s)

Liao Y, Liu H, Spasić I. Fine-tuning coreference resolution for different styles of clinical narratives. J Biomed Inform 2024;149:104578. [PMID: 38122841 DOI: 10.1016/j.jbi.2023.104578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 11/22/2023] [Accepted: 12/12/2023] [Indexed: 12/23/2023]

Abstract

OBJECTIVE

Coreference resolution (CR) is a natural language processing (NLP) task that is concerned with finding all expressions within a single document that refer to the same entity. This makes it crucial in supporting downstream NLP tasks such as summarization, question answering and information extraction. Despite great progress in CR, our experiments have highlighted a substandard performance of the existing open-source CR tools in the clinical domain. We set out to explore some practical solutions to fine-tune their performance on clinical data.

METHODS

We first explored the possibility of automatically producing silver standards following the success of such an approach in other clinical NLP tasks. We designed an ensemble approach that leverages multiple models to automatically annotate co-referring mentions. Subsequently, we looked into other ways of incorporating human feedback to improve the performance of an existing neural network approach. We proposed a semi-automatic annotation process to facilitate the manual annotation process. We also compared the effectiveness of active learning relative to random sampling in an effort to further reduce the cost of manual annotation.
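One way to picture the ensemble step is as voting over predicted coreference links. The sketch below is illustrative only: the abstract does not say how the models are combined, so the majority-vote rule, the `min_votes` threshold, and the function names are all assumptions, and mentions are represented as plain strings for brevity.

```python
from collections import Counter


def vote_on_links(model_outputs, min_votes):
    """Keep links proposed by at least `min_votes` models (assumed rule)."""
    votes = Counter()
    for links in model_outputs:
        # Normalise each pair so (a, b) and (b, a) count as the same link,
        # and use a set so one model cannot vote twice for a link.
        votes.update({tuple(sorted(pair)) for pair in links})
    return {link for link, count in votes.items() if count >= min_votes}


def links_to_clusters(links):
    """Merge accepted links into coreference clusters with a small union-find."""
    parent = {}

    def find(m):
        parent.setdefault(m, m)
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for mention in list(parent):
        clusters.setdefault(find(mention), []).append(mention)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical outputs of three CR models on the same note.
model_a = [("Mrs. Smith", "she"), ("she", "the patient")]
model_b = [("she", "Mrs. Smith"), ("the patient", "her daughter")]
model_c = [("Mrs. Smith", "she"), ("she", "the patient")]

silver_links = vote_on_links([model_a, model_b, model_c], min_votes=2)
silver_clusters = links_to_clusters(silver_links)
print(silver_clusters)
```

Here the link to "her daughter" is proposed by only one model and is dropped, so the silver annotation keeps a single three-mention cluster.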

RESULTS

Our experiments demonstrated that the silver standard approach was ineffective in fine-tuning the CR models. Our results indicated that active learning should also be applied with caution. The semi-automatic annotation approach combined with continued training was found to be well suited for the rapid transfer of CR models under low-resource conditions. The ensemble approach demonstrated a potential to further improve accuracy by leveraging multiple fine-tuned models.

CONCLUSION

Overall, we have effectively transferred a general CR model to a clinical domain. Our findings based on extensive experimentation have been summarized into practical suggestions for the rapid transfer of CR models across different styles of clinical narratives.

Nogues IE, Wen J, Lin Y, Liu M, Tedeschi SK, Geva A, Cai T, Hong C. Weakly Semi-supervised phenotyping using Electronic Health records. J Biomed Inform 2022;134:104175. [PMID: 36064111 PMCID: PMC10112494 DOI: 10.1016/j.jbi.2022.104175] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/07/2021] [Revised: 04/23/2022] [Accepted: 08/15/2022] [Indexed: 01/07/2023]

Abstract

OBJECTIVE

Electronic Health Record (EHR) based phenotyping is a crucial yet challenging problem in the biomedical field. Though clinicians typically determine patient-level diagnoses via manual chart review, the sheer volume and heterogeneity of EHR data render such tasks challenging, time-consuming, and prohibitively expensive, thus leading to a scarcity of clinical annotations in EHRs. Weakly supervised learning algorithms have been successfully applied to various EHR phenotyping problems, due to their ability to leverage information from large quantities of unlabeled samples to better inform predictions based on a far smaller number of patients. However, most weakly supervised methods face the challenge of choosing the right cutoff value to generate an optimal classifier. Furthermore, since they only utilize the most informative features (i.e., main ICD and NLP counts), they may fail for episodic phenotypes that cannot be consistently detected via ICD and NLP data. In this paper, we propose a label-efficient, weakly semi-supervised deep learning algorithm for EHR phenotyping (WSS-DL), which overcomes the limitations above.

MATERIALS AND METHODS

WSS-DL classifies patient-level disease status through a series of learning stages: 1) generating silver standard labels, 2) deriving enhanced-silver-standard labels by fitting a weakly supervised deep learning model to data with silver standard labels as outcomes and high dimensional EHR features as input, and 3) obtaining the final prediction score and classifier by fitting a supervised learning model to data with a minimal number of gold standard labels as the outcome, and the enhanced-silver-standard labels and a minimal set of most informative EHR features as input. To assess the generalizability of WSS-DL across different phenotypes and medical institutions, we apply WSS-DL to classify a total of 17 diseases, including both acute and chronic conditions, using EHR data from three healthcare systems. Additionally, we determine the minimum quantity of training labels required by WSS-DL to outperform existing supervised and semi-supervised phenotyping methods.
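The three learning stages can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a plain logistic regression stands in for the deep learning model in stage 2, the silver label is assumed to be a simple cutoff on the main ICD count, the data are synthetic, and every function and variable name is hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent for logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def wss_sketch(features, icd_counts, nlp_counts, gold_idx, gold_labels, icd_cutoff=2):
    # Stage 1: silver standard labels from a surrogate (assumed: ICD count cutoff).
    silver = [1 if c >= icd_cutoff else 0 for c in icd_counts]
    # Stage 2: fit a model on all patients with silver labels as outcome and the
    # full feature vector as input; its scores act as enhanced silver labels.
    # (Logistic regression here is a stand-in for the deep model.)
    enhanced = predict(fit_logistic(features, silver), features)
    # Stage 3: supervised model on the few gold-labeled patients, using the
    # enhanced silver score plus a minimal set of informative features.
    small = [[enhanced[i], math.log1p(icd_counts[i]), math.log1p(nlp_counts[i])]
             for i in range(len(features))]
    final = fit_logistic([small[i] for i in gold_idx], gold_labels)
    return predict(final, small)


# Synthetic cohort: 200 patients, 40 of them with gold standard labels.
random.seed(0)
n = 200
truth = [1 if random.random() < 0.3 else 0 for _ in range(n)]
features = [[random.gauss(t, 1.0) for _ in range(5)] for t in truth]
icd_counts = [random.randint(0, 2) + 2 * t for t in truth]
nlp_counts = [random.randint(0, 3) + 2 * t for t in truth]
gold_idx = list(range(40))
gold_labels = [truth[i] for i in gold_idx]

scores = wss_sketch(features, icd_counts, nlp_counts, gold_idx, gold_labels)
```

The point of the layout is that stage 2 sees every patient but only noisy labels, while stage 3 sees clean labels but only a handful of patients and three inputs, which keeps the final model small enough to fit on very few gold labels.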

RESULTS

The proposed method, in combining the strengths of deep learning and weakly semi-supervised learning, successfully leverages the crucial phenotyping information contained in EHR features from unlabeled samples. Indeed, the deep learning model's ability to handle high-dimensional EHR features allows it to generate strong phenotype status predictions from silver standard labels. These predictions, in turn, provide highly effective features in the final logistic regression stage, leading to high phenotyping accuracy in notably small subsets of labeled data (e.g. n = 40 labeled samples).

CONCLUSION

Our method's high performance in EHR datasets with very small numbers of labels indicates its potential value in aiding doctors to diagnose rare diseases as well as conditions susceptible to misdiagnosis.

De Freitas JK, Johnson KW, Golden E, Nadkarni GN, Dudley JT, Bottinger EP, Glicksberg BS, Miotto R. Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records. Patterns (N Y) 2021;2:100337. [PMID: 34553174 PMCID: PMC8441576 DOI: 10.1016/j.patter.2021.100337] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/29/2021] [Revised: 06/30/2021] [Accepted: 08/05/2021] [Indexed: 11/23/2022]

Affiliation(s)

Jessica K. De Freitas Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Kipp W. Johnson Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Eddye Golden Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Girish N. Nadkarni Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Joel T. Dudley Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Erwin P. Bottinger Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Digital Health Center at Hasso Plattner Institute, University of Potsdam, Professor-Dr.-Helmert-Str 2–3, 14482 Potsdam, Germany
Benjamin S. Glicksberg Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
Riccardo Miotto Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA

Ferté T, Cossin S, Schaeverbeke T, Barnetche T, Jouhet V, Hejblum BP. Automatic phenotyping of electronical health record: PheVis algorithm. J Biomed Inform 2021;117:103746. [PMID: 33746080 DOI: 10.1016/j.jbi.2021.103746] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/28/2020] [Revised: 03/02/2021] [Accepted: 03/05/2021] [Indexed: 11/18/2022]

Estiri H, Vasey S, Murphy SN. Generative transfer learning for measuring plausibility of EHR diagnosis records. J Am Med Inform Assoc 2021;28:559-568. [PMID: 33043366 PMCID: PMC7936395 DOI: 10.1093/jamia/ocaa215] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 06/02/2020] [Accepted: 08/18/2020] [Indexed: 12/12/2022]

Cloud Services for Patient Cohort Identification Using the Informatics for Integrating Biology and the Bedside Platform. Biomed Res Int 2020;2020:2851713. [PMID: 32724799 PMCID: PMC7366204 DOI: 10.1155/2020/2851713] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2019] [Revised: 06/08/2020] [Accepted: 06/15/2020] [Indexed: 11/17/2022]

Estiri H, Strasser ZH, Klann JG, McCoy TH, Wagholikar KB, Vasey S, Castro VM, Murphy ME, Murphy SN. Transitive Sequencing Medical Records for Mining Predictive and Interpretable Temporal Representations. Patterns (N Y) 2020;1:100051. [PMID: 32835307 PMCID: PMC7301790 DOI: 10.1016/j.patter.2020.100051] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 02/13/2020] [Revised: 04/27/2020] [Accepted: 05/26/2020] [Indexed: 12/13/2022]

Affiliation(s)

Hossein Estiri Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Zachary H. Strasser Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
Jeffery G. Klann Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Thomas H. McCoy Harvard Medical School, Boston, MA 02115, USA Center for Quantitative Health, Massachusetts General Hospital, Boston, MA 02114, USA
Kavishwar B. Wagholikar Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA
Sebastien Vasey Department of Mathematics, Harvard University, Cambridge, MA 02138, USA
Victor M. Castro Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
MaryKate E. Murphy Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA
Shawn N. Murphy Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA Harvard Medical School, Boston, MA 02115, USA Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA Department of Neurology, Massachusetts General Hospital, Boston, MA 02114, USA
