1
Ding S, Zhang S, Hu X, Zou N. Identify and mitigate bias in electronic phenotyping: A comprehensive study from computational perspective. J Biomed Inform 2024; 156:104671. PMID: 38876452. DOI: 10.1016/j.jbi.2024.104671.
Abstract
Electronic phenotyping is a fundamental task that identifies specific groups of patients, and it plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for related biomedical research and clinical tasks, e.g., disease diagnosis, drug development, and clinical trials. With the growth of electronic health records, the performance of electronic phenotyping has been significantly boosted by advanced machine learning techniques. In the healthcare domain, precision and fairness are both essential aspects that should be taken into consideration. However, most related efforts go into designing phenotyping models with higher accuracy, and little attention is paid to the fairness of phenotyping. Neglecting bias in phenotyping leaves subgroups of patients underrepresented, which in turn affects downstream healthcare activities such as patient recruitment in clinical trials. In this work, we bridge this gap through a comprehensive experimental study that identifies the bias present in electronic phenotyping models and evaluates the performance of widely used debiasing methods on these models. We choose pneumonia and sepsis as our target phenotypes. We benchmark nine electronic phenotyping methods spanning rule-based to data-driven approaches, and we evaluate five bias mitigation strategies covering pre-processing, in-processing, and post-processing. From these extensive experiments, we summarize several insightful findings about the bias identified in phenotyping and key considerations for bias mitigation strategies in phenotyping.
Affiliation(s)
- Sirui Ding
- Department of Computer Science & Engineering, Texas A&M University, College Station, TX, United States
- Shenghan Zhang
- Department of Biomedical Informatics, Harvard University, Boston, MA, United States
- Xia Hu
- Department of Computer Science, Rice University, Houston, TX, United States
- Na Zou
- Department of Industrial Engineering, University of Houston, Houston, TX, United States
2
Oh W, Jayaraman P, Tandon P, Chaddha US, Kovatch P, Charney AW, Glicksberg BS, Nadkarni GN. A novel method leveraging time series data to improve subphenotyping and application in critically ill patients with COVID-19. Artif Intell Med 2024; 148:102750. PMID: 38325922. PMCID: PMC10864255. DOI: 10.1016/j.artmed.2023.102750.
Abstract
Computational subphenotyping, a data-driven approach to understanding disease subtypes, is a prominent topic in medical research. Numerous ongoing studies are dedicated to developing advanced computational subphenotyping methods for cross-sectional data. However, the potential of time-series data has been underexplored until now. Here, we propose a Multivariate Levenshtein Distance (MLD) that can account for correlation among multiple discrete features over time-series data. Our algorithm has two distinct components: the MLD itself, and an optimal threshold score that enhances sensitivity in discriminating between pairs of instances. We applied the proposed distance metric with the k-means clustering algorithm to derive temporal subphenotypes from time-series data of biomarkers and treatment administrations from 1039 critically ill patients with COVID-19, and compared its effectiveness to standard methods. In conclusion, the Multivariate Levenshtein Distance is a novel metric for quantifying distance over multiple discrete features in time-series data, and it demonstrates superior clustering performance among competing time-series distance metrics.
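The core of the approach is an edit distance whose symbols are vectors of discrete features rather than single characters. A minimal sketch of that idea follows; the substitution cost used here (the fraction of disagreeing features) is an illustrative assumption, not necessarily the paper's exact formulation:

```python
from typing import Sequence, Tuple

def multivariate_levenshtein(
    a: Sequence[Tuple[int, ...]],
    b: Sequence[Tuple[int, ...]],
) -> float:
    """Edit distance between two sequences whose elements are tuples
    of discrete features; insertions and deletions cost 1, and a
    substitution costs the fraction of features that disagree."""
    m, n = len(a), len(b)
    # dp[i][j] = distance between prefixes a[:i] and b[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = float(i)
    for j in range(1, n + 1):
        dp[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            k = len(a[i - 1])
            sub = sum(x != y for x, y in zip(a[i - 1], b[j - 1])) / k
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,      # deletion
                dp[i][j - 1] + 1.0,      # insertion
                dp[i - 1][j - 1] + sub,  # (partial) substitution
            )
    return dp[m][n]
```

Because this is a custom metric, pairing it with clustering in practice usually means k-medoids or hierarchical clustering over a precomputed pairwise distance matrix, since vanilla k-means assumes Euclidean geometry.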
Affiliation(s)
- Wonsuk Oh
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pushkala Jayaraman
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pranai Tandon
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Udit S Chaddha
- Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Patricia Kovatch
- Department of Scientific Computing, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Alexander W Charney
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Benjamin S Glicksberg
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Character Biosciences, New York, NY, USA
- Girish N Nadkarni
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Data-Driven and Digital Medicine, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
3
Abstract
Advancements in high-throughput sequencing have yielded vast amounts of genomic data, which are studied using genome-wide association study (GWAS)/phenome-wide association study (PheWAS) methods to identify associations between genotype and phenotype. The associated findings have contributed to pharmacogenomics and improved clinical decision support at the point of care in many healthcare systems. However, the accumulation of genomic data from sequencing and clinical data from electronic health records (EHRs) poses significant challenges for data scientists. Following the rise of artificial intelligence (AI) technology such as machine learning and deep learning, an increasing number of GWAS/PheWAS studies have successfully leveraged this technology to overcome the aforementioned challenges. In this review, we focus on the application of data science and AI technology in three areas: risk prediction and identification of causal single-nucleotide polymorphisms, EHR-based phenotyping, and CRISPR guide RNA design. Additionally, we highlight a few emerging AI technologies, such as transfer learning and multi-view learning, which are beginning to benefit, or will benefit, genomic studies.
Affiliation(s)
- Jing Lin
- NUHS Corporate Office, National University Health System, Singapore
- Kee Yuan Ngiam
- NUHS Corporate Office, National University Health System, Singapore; Department of Surgery, National University of Singapore, Singapore. Correspondence: A/Prof Kee Yuan Ngiam, Group Chief Technology Officer, NUHS Corporate Office, National University Health System, 1E Kent Ridge Road, 119228, Singapore. E-mail:
4
Data-Driven Phenotyping of Alzheimer's Disease under Epigenetic Conditions Using Partial Volume Correction of PET Studies and Manifold Learning. Biomedicines 2023; 11:273. PMID: 36830810. PMCID: PMC9953610. DOI: 10.3390/biomedicines11020273.
Abstract
Alzheimer's disease (AD) is the most common form of dementia. An increasing number of studies have confirmed epigenetic changes in AD. Consequently, a robust phenotyping mechanism must take into consideration the environmental effects on the patient in the generation of phenotypes. Positron Emission Tomography (PET) is employed for the quantification of pathological amyloid deposition in brain tissues. The objective is to develop a new methodology for the hyperparametric analysis of changes in cognitive scores and PET features to test whether there are multiple AD phenotypes. In a retrospective cohort study (532 subjects) using PET and Magnetic Resonance Imaging (MRI) images and neuropsychological assessments, we developed a novel computational phenotyping method that uses Partial Volume Correction (PVC) and subsets of neuropsychological assessments in a non-biased fashion. Our pipeline is based on a Regional Spread Function (RSF) method for PVC and a t-distributed Stochastic Neighbor Embedding (t-SNE) manifold. The results presented demonstrate that (1) the approach to data-driven phenotyping is valid, (2) the different techniques involved in the pipelines produce different results, and (3) they permit us to identify the best phenotyping pipeline. The method identifies three phenotypes and permits us to analyze them under epigenetic conditions.
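The final stages of such a pipeline (a t-SNE manifold followed by cluster assignment) can be sketched with off-the-shelf tools. The synthetic features below are stand-ins for PVC-corrected PET values and neuropsychological scores, and the choice of three clusters mirrors the three phenotypes reported; none of these specifics come from the paper itself:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for regional PET uptake values (after partial
# volume correction) concatenated with neuropsychological scores.
features = rng.normal(size=(90, 20))

# Standardize, embed into a 2-D manifold, then cluster in the
# embedded space to obtain candidate phenotypes.
scaled = StandardScaler().fit_transform(features)
embedding = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(scaled)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)
```

In practice the number of clusters would be chosen by internal validation rather than fixed in advance.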
5
Ahuja Y, Zou Y, Verma A, Buckeridge D, Li Y. MixEHR-Guided: A guided multi-modal topic modeling approach for large-scale automatic phenotyping using the electronic health record. J Biomed Inform 2022; 134:104190. PMID: 36058522. DOI: 10.1016/j.jbi.2022.104190.
Abstract
Electronic Health Records (EHRs) contain rich clinical data collected at the point of care, and their increasing adoption offers exciting opportunities for clinical informatics, disease risk prediction, and personalized treatment recommendation. However, effective use of EHR data for research and clinical decision support is often hampered by a lack of reliable disease labels. To compile gold-standard labels, researchers often rely on clinical experts to develop rule-based phenotyping algorithms from billing codes and other surrogate features. This process is tedious and error-prone due to recall and observer biases in how codes and measures are selected, and some phenotypes are incompletely captured by a handful of surrogate features. To address this challenge, we present a novel automatic phenotyping model called MixEHR-Guided (MixEHR-G), a multimodal hierarchical Bayesian topic model that efficiently models the EHR generative process by identifying latent phenotype structure in the data. Unlike existing topic modeling algorithms, wherein the inferred topics are not identifiable, MixEHR-G uses prior information from informative surrogate features to align topics with known phenotypes. We applied MixEHR-G to an openly available EHR dataset of 38,597 intensive care patients (MIMIC-III) in Boston, USA and to administrative claims data for a population-based cohort (PopHR) of 1.3 million people in Quebec, Canada. Qualitatively, we demonstrate that MixEHR-G learns interpretable phenotypes and yields meaningful insights about phenotype similarities, comorbidities, and epidemiological associations. Quantitatively, MixEHR-G outperforms existing unsupervised phenotyping methods on a phenotype label annotation task, and it can accurately estimate relative phenotype prevalence functions without gold-standard phenotype information. Altogether, MixEHR-G is an important step towards building an interpretable and automated phenotyping system using EHR data.
Affiliation(s)
- Yuri Ahuja
- Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA; Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
- Yuesong Zou
- School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada
- Aman Verma
- School of Population and Global Health, McGill University, 2001 McGill College Avenue, Montreal, Québec H3A 1G1, Canada
- David Buckeridge
- School of Population and Global Health, McGill University, 2001 McGill College Avenue, Montreal, Québec H3A 1G1, Canada
- Yue Li
- School of Computer Science, McGill University, 3480 Rue University, Montreal, QC H3A 2A7, Canada
6
Ahmad FS, Luo Y, Wehbe RM, Thomas JD, Shah SJ. Advances in Machine Learning Approaches to Heart Failure with Preserved Ejection Fraction. Heart Fail Clin 2022; 18:287-300. PMID: 35341541. PMCID: PMC8983114. DOI: 10.1016/j.hfc.2021.12.002.
Abstract
Heart failure with preserved ejection fraction (HFpEF) represents a prototypical cardiovascular condition in which machine learning may improve targeted therapies and mechanistic understanding of pathogenesis. Machine learning, which involves algorithms that learn from data, has the potential to guide precision medicine approaches for complex clinical syndromes such as HFpEF. It is therefore important to understand the potential utility and common pitfalls of machine learning so that it can be applied and interpreted appropriately. Although machine learning holds considerable promise for HFpEF, it is subject to several potential pitfalls, which are important factors to consider when interpreting machine learning studies.
Affiliation(s)
- Faraz S. Ahmad
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
- Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
- Ramsey M. Wehbe
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
- James D. Thomas
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
- Sanjiv J. Shah
- Division of Cardiology, Department of Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL
- Bluhm Cardiovascular Institute Center for Artificial Intelligence, Northwestern Medicine, Chicago, IL
7
Cruz Navarro J, Ponce Mejia LL, Robertson C. A Precision Medicine Agenda in Traumatic Brain Injury. Front Pharmacol 2022; 13:713100. PMID: 35370671. PMCID: PMC8966615. DOI: 10.3389/fphar.2022.713100.
Abstract
Traumatic brain injury remains a leading cause of death and disability across the globe. Substantial uncertainty in outcome prediction continues to be the rule notwithstanding the existing prediction models. Additionally, despite very promising preclinical data, randomized clinical trials (RCTs) of neuroprotective strategies in moderate and severe TBI have failed to demonstrate significant treatment effects. Better predictive models are needed, as the existing validated ones are more useful in prognosticating poor outcome and do not include biomarkers, genomics, proteomics, metabolomics, etc. Invasive neuromonitoring, long believed to be a "game changer" in the care of TBI patients, has shown mixed results, and the level of evidence to support its widespread use remains insufficient. This is due in part to the extremely heterogeneous nature of the disease with regard to its etiology, pathology, and severity. Currently, the diagnosis of traumatic brain injury (TBI) in the acute setting is centered on neurological examination and neuroimaging tools such as CT scanning and MRI, and its treatment has largely taken a "one-size-fits-all" approach that has left us with many unanswered questions. Precision medicine is an innovative approach for TBI treatment that considers individual variability in genes, environment, and lifestyle, and it has expanded across medical fields. In this article, we briefly explore the field of precision medicine in TBI, including biomarkers for therapeutic decision-making, multimodal neuromonitoring, and genomics.
Affiliation(s)
- Jovany Cruz Navarro
- Departments of Anesthesiology and Neurosurgery, Baylor College of Medicine, Houston, TX, United States
- Lucido L. Ponce Mejia
- Departments of Neurosurgery and Neurology, LSU Health Science Center, New Orleans, LA, United States
- Claudia Robertson
- Department of Neurosurgery, Baylor College of Medicine, Houston, TX, United States
8
Niu K, Lu Y, Peng X, Zeng J. Fusion of Sequential Visits and Medical Ontology for Mortality Prediction. J Biomed Inform 2022; 127:104012. PMID: 35144001. DOI: 10.1016/j.jbi.2022.104012.
Abstract
The goal of the mortality prediction task is to predict a patient's future death risk from their previous Electronic Healthcare Records (EHR). The main challenge of mortality prediction is designing an accurate and robust predictive model with sequential, multivariate, sparse, and irregular EHR data. In addition, model performance may suffer from the lack of sufficient information in EHRs for patients with rare diseases. To address these challenges, we propose SeMO, a model that fuses Sequential visits and Medical Ontology to predict patients' death risk. SeMO not only learns reasonable embeddings for medical concepts from sequential and irregular visits, but also exploits medical ontology to improve prediction performance. By integrating multivariate features, SeMO learns robust representations of medical codes, mitigating data insufficiency, and captures insightful sequential dependencies among a patient's visits. Experimental results on real-world datasets show that SeMO improves prediction performance over the baseline approaches. Our model achieves a precision of up to 0.975; compared with an RNN, precision improves by up to 2.204%.
Affiliation(s)
- Ke Niu
- Beijing Information Science and Technology University, Beijing, China
- You Lu
- Beijing Information Science and Technology University, Beijing, China
- Xueping Peng
- Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology, University of Technology Sydney, Australia
- Jingni Zeng
- Beijing Information Science and Technology University, Beijing, China
9
Tong L, Wu H, Wang MD, Wang G. Introduction of medical genomics and clinical informatics integration for p-Health care. Prog Mol Biol Transl Sci 2022; 190:1-37. DOI: 10.1016/bs.pmbts.2022.05.002.
10
De Freitas JK, Johnson KW, Golden E, Nadkarni GN, Dudley JT, Bottinger EP, Glicksberg BS, Miotto R. Phe2vec: Automated disease phenotyping based on unsupervised embeddings from electronic health records. Patterns (N Y) 2021; 2:100337. PMID: 34553174. PMCID: PMC8441576. DOI: 10.1016/j.patter.2021.100337.
Abstract
Robust phenotyping of patients from electronic health records (EHRs) at scale is a challenge in clinical informatics. Here, we introduce Phe2vec, an automated framework for disease phenotyping from EHRs based on unsupervised learning, and assess its effectiveness against standard rule-based algorithms from Phenotype KnowledgeBase (PheKB). Phe2vec is based on pre-computing embeddings of medical concepts and patients' clinical history. Disease phenotypes are then derived from a seed concept and its neighbors in the embedding space. Patients are linked to a disease if their embedded representation is close to the disease phenotype. Comparing Phe2vec and PheKB cohorts head-to-head using chart review, Phe2vec performed on par or better in nine out of ten diseases. Unlike other approaches, it can scale to any condition and was validated against widely adopted expert-based standards. Phe2vec aims to optimize clinical informatics research by augmenting current frameworks to characterize patients by condition and derive reliable disease cohorts.
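The mechanics described above (phenotype = seed concept plus its embedding-space neighbors, patient = aggregate of their concepts, assignment by proximity) can be sketched as follows. The random vectors, toy vocabulary, and plain cosine score are illustrative stand-ins; in Phe2vec the embeddings come from models such as word2vec trained on longitudinal EHR concept sequences:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in concept embeddings (random here; learned in practice).
vocab = ["dm2", "insulin", "metformin", "hba1c", "fracture", "cast"]
emb = {c: rng.normal(size=16) for c in vocab}

def unit(v):
    return v / np.linalg.norm(v)

def phenotype_vector(seed, k=3):
    """Average the seed concept with its k nearest neighbors in
    embedding space to form a single phenotype vector."""
    sims = {c: float(unit(emb[seed]) @ unit(emb[c]))
            for c in vocab if c != seed}
    nbrs = sorted(sims, key=sims.get, reverse=True)[:k]
    return unit(np.mean([emb[c] for c in [seed] + nbrs], axis=0))

def patient_vector(concepts):
    """Aggregate a patient's clinical history into one vector."""
    return unit(np.mean([emb[c] for c in concepts], axis=0))

pheno = phenotype_vector("dm2")
# Cosine similarity between a patient and the phenotype; a patient is
# linked to the disease when this exceeds some chosen threshold.
score = float(patient_vector(["insulin", "hba1c"]) @ pheno)
```

The threshold on `score` would be tuned per condition, e.g., against a small chart-reviewed sample.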
Affiliation(s)
- Jessica K. De Freitas
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Kipp W. Johnson
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Eddye Golden
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Girish N. Nadkarni
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Joel T. Dudley
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Erwin P. Bottinger
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Medicine, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Digital Health Center at Hasso Plattner Institute, University of Potsdam, Professor-Dr.-Helmert-Str 2–3, 14482 Potsdam, Germany
- Benjamin S. Glicksberg
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Riccardo Miotto
- Hasso Plattner Institute for Digital Health at Mount Sinai, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, NY 10029, USA
11
Ma J, Zhang Q, Lou J, Xiong L, Ho JC. Communication Efficient Federated Generalized Tensor Factorization for Collaborative Health Data Analytics. Proceedings of the ... International World-Wide Web Conference 2021; 2021:171-182. PMID: 34467367. PMCID: PMC8404412. DOI: 10.1145/3442381.3449832.
Abstract
Modern healthcare systems, knitted together by a web of entities (e.g., hospitals, clinics, pharmacy companies), are collecting a huge volume of healthcare data from a large number of individuals, covering various medical procedures, medications, diagnoses, and lab tests. To extract meaningful medical concepts (i.e., phenotypes) from such higher-arity relational healthcare data, tensor factorization has proven to be an effective approach and has received increasing research attention, due to its intrinsic capability to represent high-dimensional data. Recently, federated learning has offered a privacy-preserving paradigm for collaborative learning among different entities, which seemingly provides an ideal foundation for further enhancing tensor factorization-based collaborative phenotyping to handle sensitive personal health data. However, existing attempts at federated tensor factorization come with various limitations, including restriction to classic tensor factorization, high communication cost, and reduced accuracy. We propose a communication-efficient federated generalized tensor factorization, which is flexible enough to choose from a variety of losses to best suit different types of data in practice. We design a three-level communication reduction strategy tailored to the generalized tensor factorization, which is able to reduce the uplink communication cost by up to 99.90%. In addition, we theoretically prove that our algorithm does not compromise convergence speed despite the aggressive communication compression. Extensive experiments on two real-world electronic health record datasets demonstrate the efficiency improvements in terms of computation and communication cost.
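The communication-reduction idea can be illustrated with a generic top-k gradient sparsifier: before the uplink, each client keeps only the largest-magnitude entries of its factor-matrix gradient. This is a sketch of gradient compression in general, not the paper's specific three-level strategy:

```python
import numpy as np

def topk_sparsify(grad: np.ndarray, ratio: float = 0.001):
    """Zero out all but the k = ratio * size largest-magnitude
    entries; returns the sparsified gradient and the fraction of
    entries that no longer need to be transmitted."""
    flat = grad.ravel()
    k = max(1, int(ratio * flat.size))
    keep = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[keep] = flat[keep]
    return out.reshape(grad.shape), 1.0 - k / flat.size

# Example: compress a 100 x 100 factor-matrix gradient.
sparse, saved = topk_sparsify(np.random.default_rng(0).normal(size=(100, 100)))
```

In this simplified picture, `ratio = 0.001` corresponds to a 99.90% uplink reduction; error-feedback terms are typically added so the dropped mass is not lost across rounds.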
12
Ju R, Zhou P, Wen S, Wei W, Xue Y, Huang X, Yang X. 3D-CNN-SPP: A Patient Risk Prediction System From Electronic Health Records via 3D CNN and Spatial Pyramid Pooling. IEEE Trans Emerg Top Comput Intell 2021. DOI: 10.1109/tetci.2019.2960474.
13
Ferté T, Cossin S, Schaeverbeke T, Barnetche T, Jouhet V, Hejblum BP. Automatic phenotyping of electronical health record: PheVis algorithm. J Biomed Inform 2021; 117:103746. PMID: 33746080. DOI: 10.1016/j.jbi.2021.103746.
Abstract
Electronic Health Records (EHRs) often lack reliable annotation of patient medical conditions. PheNorm, an automated unsupervised algorithm to identify patient medical conditions from EHR data, has been developed previously. PheVis extends PheNorm to the visit resolution. PheVis combines diagnosis codes with medical concepts extracted from medical notes, incorporating past history in a machine learning approach to provide an interpretable parametric predictor of the occurrence probability of a given medical condition at each visit. PheVis is applied to two real-world use cases using the data warehouse of the University Hospital of Bordeaux: (i) rheumatoid arthritis, a chronic condition; and (ii) tuberculosis, an acute condition. Cross-validated AUROCs were 0.943 [0.940; 0.945] and 0.987 [0.983; 0.990], respectively; cross-validated AUPRCs were 0.754 [0.744; 0.763] and 0.299 [0.198; 0.403]. PheVis performs well for chronic conditions, though the lack of French natural language processing tools that exclude past medical history limits its performance for acute conditions. It achieves significantly better performance than state-of-the-art unsupervised methods, especially for chronic diseases.
Affiliation(s)
- Thomas Ferté
- Bordeaux Hospital University Center, Pôle de santé publique, Service d'information médicale, Unité Informatique et Archivistique Médicales, F-33000 Bordeaux, France; Univ. Bordeaux ISPED, Inserm Bordeaux Population Health Research Center UMR 1219, Inria BSO, team SISTM, F-33000 Bordeaux, France
- Sébastien Cossin
- Bordeaux Hospital University Center, Pôle de santé publique, Service d'information médicale, Unité Informatique et Archivistique Médicales, F-33000 Bordeaux, France; Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, team ERIAS, UMR 1219, F-33000 Bordeaux, France
- Thierry Schaeverbeke
- Rheumatology department, FHU ACRONIM, Bordeaux University Hospital, F-33076 Bordeaux, France
- Thomas Barnetche
- Rheumatology department, FHU ACRONIM, Bordeaux University Hospital, F-33076 Bordeaux, France
- Vianney Jouhet
- Bordeaux Hospital University Center, Pôle de santé publique, Service d'information médicale, Unité Informatique et Archivistique Médicales, F-33000 Bordeaux, France; Univ. Bordeaux, Inserm, Bordeaux Population Health Research Center, team ERIAS, UMR 1219, F-33000 Bordeaux, France
- Boris P Hejblum
- Univ. Bordeaux ISPED, Inserm Bordeaux Population Health Research Center UMR 1219, Inria BSO, team SISTM, F-33000 Bordeaux, France
14
Lou J, Wang Y, Li L, Zeng D. Learning latent heterogeneity for type 2 diabetes patients using longitudinal health markers in electronic health records. Stat Med 2021; 40:1930-1946. PMID: 33586187. DOI: 10.1002/sim.8880.
Abstract
Electronic health records (EHRs) from type 2 diabetes (T2D) patients consist of longitudinally and sparsely measured health markers at clinical encounters. Our goal is to use such data to learn latent patterns that can inform a patient's health status related to T2D while accounting for challenges in retrospectively collected EHRs. To handle challenges such as correlated longitudinal measurements, irregular and informative encounter times, and mixed marker types, we propose multivariate generalized linear models to learn latent patient subgroups. In our model, covariate effects are time-dependent, and latent Gaussian processes are introduced to model between-marker correlations over time. Using the inferred latent processes, we integrated the irregularly measured health markers of mixed types into composite scores and applied hierarchical clustering to learn latent subgroup structures among T2D patients. Application to an EHR dataset of T2D patients showed different trends of age, sex, and race effects on hypertension/high blood pressure, total cholesterol, glycated hemoglobin, high-density lipoprotein, and medications. The associations among these markers varied over time during the study window. Clustering revealed four subgroups, each with distinct health status; the same patterns were further confirmed using new EHR records of the same cohort. We developed a novel latent model to integrate longitudinal health markers in EHRs and characterize latent patient heterogeneity. The analysis indicated that there are distinct subgroups of T2D patients, suggesting that effective healthcare management should be tailored to each subgroup.
Affiliation(s)
- Jitong Lou
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Yuanjia Wang
- Department of Biostatistics, Columbia University, New York City, New York, USA
- Lang Li
- Department of Biomedical Informatics, Ohio State University, Columbus, Ohio, USA
- Donglin Zeng
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
15
Zhang C, Fanaee-T H, Thoresen M. Feature extraction from unequal length heterogeneous EHR time series via dynamic time warping and tensor decomposition. Data Min Knowl Discov 2021. [DOI: 10.1007/s10618-020-00724-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
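This entry carries no abstract, but the title's first ingredient, dynamic time warping (DTW) over unequal-length series, is easy to sketch. The dynamic program below is a textbook DTW implementation, not the paper's DTW-plus-tensor-decomposition pipeline, and the toy series are invented.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two unequal-length 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

s1 = [0, 1, 2, 3, 2, 1, 0]           # a peak, 7 samples
s2 = [0, 1, 1, 2, 3, 3, 2, 1, 0]     # the same peak, irregularly stretched
s3 = [3, 2, 1, 0, 1, 2, 3]           # an inverted shape
d_same = dtw_distance(s1, s2)        # 0.0: DTW absorbs the stretching
d_diff = dtw_distance(s1, s3)
```

Pairwise DTW distances of this kind can then feed a downstream factorization, which is the role tensor decomposition plays in the cited work.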
16
Wagholikar KB, Estiri H, Murphy M, Murphy SN. Polar labeling: silver standard algorithm for training disease classifiers. Bioinformatics 2020; 36:3200-3206. [PMID: 32049335 PMCID: PMC7214041 DOI: 10.1093/bioinformatics/btaa088] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2019] [Revised: 01/30/2020] [Accepted: 02/04/2020] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION Expert-labeled data are essential to train phenotyping algorithms for cohort identification. However, expert labeling is time- and labor-intensive, and the costs remain prohibitive for scaling phenotyping to wider use cases. RESULTS We present an approach, referred to as polar labeling (PL), to create a silver standard for training machine learning (ML) models for disease classification. We test the hypothesis that ML models trained on the silver standard, created by applying PL to unlabeled patient records, are comparable in performance to ML models trained on a gold standard created by clinical experts through manual review of patient records. We perform experimental validation using health records of 38,023 patients spanning six diseases. Our results demonstrate the superior performance of the proposed approach. AVAILABILITY AND IMPLEMENTATION We provide a Python implementation of the algorithm and the Python code developed for this study on GitHub. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
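The general idea behind silver-standard labeling of this kind is to label only the confident extremes ("poles") of some surrogate score and leave the ambiguous middle unlabeled. The sketch below is a hypothetical illustration of that idea only; the surrogate score (a code count), the quantile thresholds, and the cohort are all invented assumptions, not the published PL algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical surrogate score: per-patient count of disease-relevant codes.
code_counts = rng.poisson(lam=2.0, size=1000)

def polar_label(scores, lo_q=0.2, hi_q=0.8):
    """Silver-standard labels from the confident extremes ('poles') only;
    the ambiguous middle stays unlabeled. Illustrative thresholding, not the
    published PL scoring rule."""
    lo, hi = np.quantile(scores, [lo_q, hi_q])
    labels = np.full(scores.shape, -1)   # -1 = unlabeled
    labels[scores <= lo] = 0             # confident controls
    labels[scores >= hi] = 1             # confident cases
    return labels

silver = polar_label(code_counts)
```

An ML classifier would then be trained only on the labeled extremes, which is what makes the silver standard cheap to produce.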
Affiliation(s)
- Hossein Estiri
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02114, USA
- Shawn N Murphy
- Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02114, USA
17
Stevens LM, Mortazavi BJ, Deo RC, Curtis L, Kao DP. Recommendations for Reporting Machine Learning Analyses in Clinical Research. Circ Cardiovasc Qual Outcomes 2020; 13:e006556. [PMID: 33079589 DOI: 10.1161/circoutcomes.120.006556] [Citation(s) in RCA: 100] [Impact Index Per Article: 25.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
Use of machine learning (ML) in clinical research is growing steadily given the increasing availability of complex clinical data sets. ML presents important advantages in terms of predictive performance and identifying undiscovered subpopulations of patients with specific physiology and prognoses. Despite this popularity, many clinicians and researchers are not yet familiar with evaluating and interpreting ML analyses. Consequently, readers and peer reviewers alike may either overestimate or underestimate the validity and credibility of an ML-based model. Conversely, ML experts without clinical experience may present details of the analysis that are too granular for a clinical readership to assess. Overwhelming evidence has shown poor reproducibility and reporting of ML models in clinical research, suggesting the need for ML analyses to be presented in a clear, concise, and comprehensible manner to facilitate understanding and critical evaluation. We present recommendations for transparent and structured reporting of ML analysis results, specifically directed at clinical researchers. Furthermore, we provide a list of key reporting elements with examples that can be used as a template when preparing and submitting ML-based manuscripts for the same audience.
Affiliation(s)
- Laura M Stevens
- Division of Cardiology, University of Colorado School of Medicine, Aurora, CO (L.M.S., D.P.K.); Institute for Precision Cardiovascular Medicine, American Heart Association, Dallas, TX (L.M.S.)
- Bobak J Mortazavi
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX (B.J.M.)
- Rahul C Deo
- Division of Cardiovascular Medicine and One Brave Idea, Brigham and Women's Hospital, Boston, MA (R.C.D.)
- Lesley Curtis
- Department of Population Health Sciences and Duke Clinical Research Institute, Duke University School of Medicine, Durham, NC (L.C.)
- David P Kao
- Division of Cardiology, University of Colorado School of Medicine, Aurora, CO (L.M.S., D.P.K.)
18
Yin K, Afshar A, Ho JC, Cheung WK, Zhang C, Sun J. LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values. KDD : PROCEEDINGS. INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING 2020; 2020:1625-1635. [PMID: 34109054 DOI: 10.1145/3394486.3403213] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]
Abstract
Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors with varying sizes in one dimension, where value one means presence of a feature while zero means unknown (i.e., either presence or absence of a feature). Learning accurate low-rank approximations from such binary irregular tensors is a challenging task. However, none of the existing models developed for factorizing irregular tensors take the missing values into account, and they assume Gaussian distributions, resulting in a distribution mismatch when applied to binary data. In this paper, we propose Logistic PARAFAC2 (LogPar) by modeling the binary irregular tensor with Bernoulli distribution parameterized by an underlying real-valued tensor. Then we approximate the underlying tensor with a positive-unlabeled learning loss function to account for the missing values. We also incorporate uniqueness and temporal smoothness regularization to enhance the interpretability. Extensive experiments using large-scale real-world datasets show that LogPar outperforms all baselines in both irregular tensor completion and downstream predictive tasks. For the irregular tensor completion, LogPar achieves up to 26% relative improvement compared to the best baseline. Besides, LogPar obtains relative improvement of 13.2% for heart failure prediction and 14% for mortality prediction on average compared to the state-of-the-art PARAFAC2 models.
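A minimal intuition for LogPar's modeling choice, a Bernoulli likelihood with down-weighted zeros (the positive-unlabeled idea), can be sketched on an ordinary binary matrix rather than an irregular tensor. Plain gradient descent on a rank-2 logistic factorization below stands in for the paper's PARAFAC2 machinery; the rank, the zero-weight, and the data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary matrix from a rank-2 ground truth; zeros are "unknown" (PU setting).
n, m, r = 30, 20, 2
U_true, V_true = rng.normal(size=(n, r)), rng.normal(size=(m, r))
P = 1 / (1 + np.exp(-(U_true @ V_true.T)))
X = (rng.uniform(size=(n, m)) < P).astype(float)

U, V = 0.1 * rng.normal(size=(n, r)), 0.1 * rng.normal(size=(m, r))
w0 = 0.5   # down-weight zero (unlabeled) entries: the PU-style heuristic
lr = 0.05

def loss(U, V):
    S = 1 / (1 + np.exp(-(U @ V.T)))      # Bernoulli mean via logistic link
    W = np.where(X == 1, 1.0, w0)
    ll = X * np.log(S + 1e-9) + (1 - X) * np.log(1 - S + 1e-9)
    return float(-np.mean(W * ll))

loss_start = loss(U, V)
for _ in range(500):
    S = 1 / (1 + np.exp(-(U @ V.T)))
    G = np.where(X == 1, 1.0, w0) * (S - X)   # gradient of the weighted loss
    U, V = U - lr * (G @ V), V - lr * (G.T @ U)
loss_end = loss(U, V)
```

Treating observed ones as reliable and zeros as weak evidence is what distinguishes this loss from a Gaussian-loss factorization applied to the same binary data.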
Affiliation(s)
- Jimeng Sun
- University of Illinois, Urbana-Champaign
19
Zhou F, Gillespie A, Gligorijevic D, Gligorijevic J, Obradovic Z. Use of disease embedding technique to predict the risk of progression to end-stage renal disease. J Biomed Inform 2020; 105:103409. [PMID: 32304869 PMCID: PMC9885429 DOI: 10.1016/j.jbi.2020.103409] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2019] [Revised: 03/18/2020] [Accepted: 03/19/2020] [Indexed: 02/01/2023]
Abstract
The accurate prediction of progression of chronic kidney disease (CKD) to end-stage renal disease (ESRD) is of great importance to clinicians and a challenge to researchers, as there are many causes and even more comorbidities that are ignored by traditional prediction models. We examine whether a novel low-dimensional embedding model, disease2disease (D2D), learned from large-scale electronic health records (EHRs), can cluster the causes of kidney disease and its comorbidities well and further improve prediction of CKD progression to ESRD compared to traditional risk factors. The study cohort consists of 2,507 hospitalized stage 3 CKD patients, of whom 1,375 (54.8%) progressed to ESRD within 3 years. We evaluated the proposed unsupervised learning framework by applying a regularized logistic regression model and a Cox proportional hazards model, respectively, and compared the accuracies with those obtained by four alternative models. The results demonstrate that the learned low-dimensional disease representations from EHRs can capture the relationships between vast arrays of diseases and can outperform traditional risk factors in a CKD progression prediction model. These results can be used both by clinicians in patient care and by researchers developing new prediction methods.
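The abstract does not specify how the D2D embedding is learned; a common stand-in for learning low-dimensional disease representations from co-occurrence is positive pointwise mutual information (PPMI) followed by truncated SVD, sketched below on invented toy records. The diagnosis codes and the cohort are hypothetical, and this is a generic embedding recipe, not the authors' D2D model.

```python
import numpy as np

# Toy "patient records": sets of diagnosis codes. Codes that co-occur across
# patients should receive similar embeddings. (Codes/cohort are hypothetical.)
records = [
    {"CKD3", "HTN", "DM2"}, {"CKD3", "HTN"}, {"CKD3", "DM2", "HTN"},
    {"FLU", "COUGH"}, {"FLU", "COUGH", "FEVER"}, {"COUGH", "FEVER"},
] * 10
codes = sorted({c for r in records for c in r})
idx = {c: i for i, c in enumerate(codes)}

# Symmetric co-occurrence counts.
C = np.zeros((len(codes), len(codes)))
for r in records:
    for a in r:
        for b in r:
            if a != b:
                C[idx[a], idx[b]] += 1

# Positive pointwise mutual information, then truncated SVD -> embeddings.
total = C.sum()
pa = C.sum(axis=1, keepdims=True) / total
ppmi = np.maximum(np.log((C / total + 1e-12) / (pa * pa.T + 1e-12)), 0.0)
U, s, _ = np.linalg.svd(ppmi)
emb = U[:, :2] * np.sqrt(s[:2])

def cos(a, b):
    u, v = emb[idx[a]], emb[idx[b]]
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
```

Embeddings of frequently co-occurring codes (CKD3 and HTN) end up close together, while codes that never co-occur stay far apart; such vectors can then be fed into the downstream logistic regression or Cox model.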
Affiliation(s)
- Fang Zhou
- School of Data Science & Engineering, East China Normal University, Shanghai, China
- Avrum Gillespie
- Division of Nephrology, Hypertension, and Kidney Transplantation, Department of Medicine, Lewis Katz School of Medicine, Temple University, Philadelphia, PA
- Djordje Gligorijevic
- Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA
- Jelena Gligorijevic
- Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA
- Zoran Obradovic
- Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA
20
Afshar A, Perros I, Park H, deFilippi C, Yan X, Stewart W, Ho J, Sun J. TASTE: Temporal and Static Tensor Factorization for Phenotyping Electronic Health Records. PROCEEDINGS OF THE ACM CONFERENCE ON HEALTH, INFERENCE, AND LEARNING 2020; 2020:193-203. [PMID: 33659966 PMCID: PMC7924914 DOI: 10.1145/3368555.3384464] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/16/2023]
Abstract
Phenotyping electronic health records (EHR) focuses on defining meaningful patient groups (e.g., heart failure group and diabetes group) and identifying the temporal evolution of patients in those groups. Tensor factorization has been an effective tool for phenotyping. Most of the existing works assume either a static patient representation with aggregate data or only model temporal data. However, real EHR data contain both temporal (e.g., longitudinal clinical visits) and static information (e.g., patient demographics), which are difficult to model simultaneously. In this paper, we propose Temporal And Static TEnsor factorization (TASTE) that jointly models both static and temporal information to extract phenotypes. TASTE combines the PARAFAC2 model with non-negative matrix factorization to model a temporal and a static tensor. To fit the proposed model, we transform the original problem into simpler ones which are optimally solved in an alternating fashion. For each of the sub-problems, our proposed mathematical re-formulations lead to efficient sub-problem solvers. Comprehensive experiments on large EHR data from a heart failure (HF) study confirmed that TASTE is up to 14× faster than several baselines and the resulting phenotypes were confirmed to be clinically meaningful by a cardiologist. Using 60 phenotypes extracted by TASTE, a simple logistic regression can achieve the same level of area under the curve (AUC) for HF prediction compared to a deep learning model using recurrent neural networks (RNN) with 345 features.
21
Kim Y, Jiang X, Giancardo L, Pena D, Bukhbinder AS, Amran AY, Schulz PE. Multimodal Phenotyping of Alzheimer's Disease with Longitudinal Magnetic Resonance Imaging and Cognitive Function Data. Sci Rep 2020; 10:5527. [PMID: 32218482 PMCID: PMC7099007 DOI: 10.1038/s41598-020-62263-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2019] [Accepted: 03/06/2020] [Indexed: 01/26/2023] Open
Abstract
Alzheimer's disease (AD) varies a great deal cognitively, in terms of symptoms, test findings, and the rate of progression, and neuroradiologically, in terms of atrophy on magnetic resonance imaging (MRI). We hypothesized that an unbiased analysis of the progression of AD, with regard to clinical and MRI features, would reveal a number of AD phenotypes. Our objective was to develop and use a computational method for multi-modal analysis of changes in cognitive scores and MRI volumes to test whether there are multiple AD phenotypes. In this retrospective cohort study with a total of 857 subjects from the AD (n = 213), MCI (n = 322), and control (CN, n = 322) groups, we used structural MRI data and neuropsychological assessments to develop a novel computational phenotyping method that groups brain regions from MRI and subsets of neuropsychological assessments in a non-biased fashion. The phenotyping method was built on coupled nonnegative matrix factorization (C-NMF). As a result, the computational phenotyping method found four phenotypes with different combinations and progressions of neuropsychologic and neuroradiologic features. Identifying distinct AD phenotypes could help explain why only a subset of AD patients typically respond to any single treatment. This, in turn, will help us target treatments more specifically to responsive phenotypes.
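The coupled nonnegative matrix factorization (C-NMF) backbone, two data views sharing one patient factor matrix, can be sketched with standard multiplicative updates. Because a shared-W coupled factorization is equivalent to plain NMF on the column-wise concatenation [X1 X2], the Lee-Seung updates below monotonically decrease the joint reconstruction error; the view sizes, rank, and data are illustrative assumptions, not the paper's MRI and cognitive matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two nonnegative "views" of the same patients (e.g. regional volumes and
# cognitive subscores) generated from a shared 3-factor patient matrix.
n, k = 60, 3
W_true = rng.uniform(size=(n, k))
X1 = W_true @ rng.uniform(size=(k, 12))   # view 1
X2 = W_true @ rng.uniform(size=(k, 8))    # view 2

eps = 1e-9
W = rng.uniform(size=(n, k))
H1, H2 = rng.uniform(size=(k, 12)), rng.uniform(size=(k, 8))

def recon_err():
    return np.linalg.norm(X1 - W @ H1) ** 2 + np.linalg.norm(X2 - W @ H2) ** 2

err_start = recon_err()
for _ in range(200):
    # Lee-Seung multiplicative updates for ||X1 - W H1||^2 + ||X2 - W H2||^2;
    # with shared W this equals plain NMF on the concatenation [X1 X2].
    H1 *= (W.T @ X1) / (W.T @ W @ H1 + eps)
    H2 *= (W.T @ X2) / (W.T @ W @ H2 + eps)
    W *= (X1 @ H1.T + X2 @ H2.T) / (W @ (H1 @ H1.T + H2 @ H2.T) + eps)
err_end = recon_err()
```

The shared patient factors W are what make phenotypes jointly grounded in both modalities; clustering patients on W would yield the candidate phenotypes.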
Affiliation(s)
- Yejin Kim
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Xiaoqian Jiang
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Luca Giancardo
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Department of Diagnostic and Interventional Imaging, the McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Danilo Pena
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Avram S Bukhbinder
- Department of Neurology, the McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Albert Y Amran
- Department of Neurology, the McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas, USA
- Paul E Schulz
- Department of Neurology, the McGovern Medical School, University of Texas Health Science Center at Houston, Houston, Texas, USA
22
Rasmussen LV, Brandt PS, Jiang G, Kiefer RC, Pacheco JA, Adekkanattu P, Ancker JS, Wang F, Xu Z, Pathak J, Luo Y. Considerations for Improving the Portability of Electronic Health Record-Based Phenotype Algorithms. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020; 2019:755-764. [PMID: 32308871 PMCID: PMC7153055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
With the increased adoption of electronic health records, data collected for routine clinical care is used for health outcomes and population sciences research, including the identification of phenotypes. In recent years, research networks, such as eMERGE, OHDSI and PCORnet, have been able to increase statistical power and population diversity by combining patient cohorts. These networks share phenotype algorithms that are executed at each participating site. Here we observe experiences with phenotype algorithm portability across seven research networks and propose a generalizable framework for phenotype algorithm portability. Several strategies exist to increase the portability of phenotype algorithms, reducing the implementation effort needed by each site. These include using a common data model, standardized representation of the phenotype algorithm logic, and technical solutions to facilitate federated execution of queries. Portability is achieved by tradeoffs across three domains: Data, Authoring and Implementation, and multiple approaches were observed in representing portable phenotype algorithms. Our proposed framework will help guide future research in operationalizing phenotype algorithm portability at scale.
Affiliation(s)
- Fei Wang
- Northwestern University, Chicago, IL
- Yuan Luo
- Northwestern University, Chicago, IL
23
Liu X, Zhou Y, Zongrun W. Can the development of a patient's condition be predicted through intelligent inquiry under the e-health business mode? Sequential feature map-based disease risk prediction upon features selected from cognitive diagnosis big data. INTERNATIONAL JOURNAL OF INFORMATION MANAGEMENT 2020. [DOI: 10.1016/j.ijinfomgt.2019.05.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023]
24
Xu Z, Chou J, Zhang XS, Luo Y, Isakova T, Adekkanattu P, Ancker JS, Jiang G, Kiefer RC, Pacheco JA, Rasmussen LV, Pathak J, Wang F. Identifying sub-phenotypes of acute kidney injury using structured and unstructured electronic health record data with memory networks. J Biomed Inform 2020; 102:103361. [PMID: 31911172 DOI: 10.1016/j.jbi.2019.103361] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2019] [Revised: 11/18/2019] [Accepted: 12/16/2019] [Indexed: 01/08/2023]
Abstract
Acute kidney injury (AKI) is a common clinical syndrome characterized by the rapid loss of kidney excretory function, which aggravates the clinical severity of other diseases in a large number of hospitalized patients. Accurate early prediction of AKI can enable timely interventions and treatments. However, AKI is highly heterogeneous, so identification of AKI sub-phenotypes can lead to an improved understanding of the disease pathophysiology and the development of more targeted clinical interventions. This study used a memory network-based deep learning approach to discover AKI sub-phenotypes using structured and unstructured electronic health record (EHR) data of patients before AKI diagnosis. We leveraged a real-world critical care EHR corpus including 37,486 ICU stays. Our approach identified three distinct sub-phenotypes. Sub-phenotype I had an average age of 63.03 ± 17.25 years and was characterized by mild loss of kidney excretory function (serum creatinine (SCr) 1.55 ± 0.34 mg/dL, estimated glomerular filtration rate (eGFR) 107.65 ± 54.98 mL/min/1.73 m2); these patients were more likely to develop stage I AKI. Sub-phenotype II had an average age of 66.81 ± 10.43 years and was characterized by severe loss of kidney excretory function (SCr 1.96 ± 0.49 mg/dL, eGFR 82.19 ± 55.92 mL/min/1.73 m2); these patients were more likely to develop stage III AKI. Sub-phenotype III had an average age of 65.07 ± 11.32 years and was characterized by moderate loss of kidney excretory function, and thus these patients were more likely to develop stage II AKI (SCr 1.69 ± 0.32 mg/dL, eGFR 93.97 ± 56.53 mL/min/1.73 m2). Both SCr and eGFR are significantly different across the three sub-phenotypes by statistical testing plus post hoc analysis, and the conclusion still holds after age adjustment.
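The claim that SCr differs significantly across the three sub-phenotypes can be illustrated with a nonparametric group test. The simulation below draws normal samples from the means and SDs reported in the abstract (normality and n = 300 per group are simplifying assumptions for illustration) and applies a Kruskal-Wallis test; this is a stand-in for whatever specific testing and post hoc procedure the authors used.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)

# Simulated serum creatinine (mg/dL) for the three sub-phenotypes, using the
# means/SDs reported above; normality and group size are illustrative.
scr_1 = rng.normal(1.55, 0.34, 300)   # sub-phenotype I
scr_2 = rng.normal(1.96, 0.49, 300)   # sub-phenotype II
scr_3 = rng.normal(1.69, 0.32, 300)   # sub-phenotype III

stat, p = kruskal(scr_1, scr_2, scr_3)   # nonparametric k-group comparison
```

With effect sizes of this magnitude the omnibus test rejects easily; a post hoc step (e.g. pairwise tests with multiplicity correction) would then localize which pairs differ.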
Affiliation(s)
- Yuan Luo
- Northwestern University, Chicago, IL, USA
- Fei Wang
- Weill Cornell Medicine, New York, NY, USA
25
Zhang L, Zhang Y, Cai T, Ahuja Y, He Z, Ho YL, Beam A, Cho K, Carroll R, Denny J, Kohane I, Liao K, Cai T. Automated grouping of medical codes via multiview banded spectral clustering. J Biomed Inform 2019; 100:103322. [PMID: 31672532 PMCID: PMC7261410 DOI: 10.1016/j.jbi.2019.103322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Revised: 10/25/2019] [Accepted: 10/27/2019] [Indexed: 01/28/2023]
Abstract
OBJECTIVE With their increasingly widespread adoption, electronic health records (EHRs) have enabled phenotypic information extraction at an unprecedented granularity and scale. However, a medical concept (e.g. a diagnosis, prescription, or symptom) is often described by various synonyms across different EHR systems, hindering data integration for signal enhancement and complicating dimensionality reduction for knowledge discovery. Despite existing ontologies and hierarchies, tremendous human effort is needed for curation and maintenance, a process that is both unscalable and susceptible to subjective biases. This paper aims to develop a data-driven approach to automate the grouping of medical terms into clinically relevant concepts by combining multiple up-to-date data sources in an unbiased manner. METHODS We present a novel data-driven grouping approach, multi-view banded spectral clustering (mvBSC), combining summary data from multiple healthcare systems. The proposed method consists of a banding step that leverages prior knowledge from the existing coding hierarchy and a combining step that performs spectral clustering on an optimally weighted matrix. RESULTS We apply the proposed method to group ICD-9 and ICD-10-CM codes together by integrating data from two healthcare systems. We show grouping results and hierarchies for 13 representative disease categories. Individual grouping quality was evaluated using normalized mutual information, adjusted Rand index, and F1-measure, and consistently exhibited strong similarity to the existing manual grouping counterpart. The resulting ICD groupings also enjoy comparable interpretability and are well aligned with the current ICD hierarchy. CONCLUSION The proposed approach, by systematically leveraging multiple data sources, is able to overcome bias while maximizing consensus to achieve generalizability. It has the advantage of being efficient, scalable, and adaptive to the evolving human knowledge reflected in the data, marking a significant step toward automating medical knowledge integration.
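The combining step, spectral clustering on a weighted code-code similarity matrix, can be sketched in a few lines: form the symmetric normalized Laplacian and split codes by the sign of its second-smallest eigenvector. This is the two-cluster special case only; mvBSC's banding and multi-view weighting are omitted, and the similarity matrix below is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic code-code similarity with two clear concept groups plus noise.
n = 40
A = rng.uniform(0, 0.1, size=(n, n))
A[:20, :20] += 0.8   # codes 0-19 form one concept group
A[20:, 20:] += 0.8   # codes 20-39 form another
A = (A + A.T) / 2
np.fill_diagonal(A, 0)

# Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
d = A.sum(axis=1)
L = np.eye(n) - A / np.sqrt(np.outer(d, d))
vals, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order

# Two-cluster special case: split by the sign of the Fiedler-like vector.
labels = (vecs[:, 1] > 0).astype(int)
```

For more than two clusters one would keep the k smallest eigenvectors and run k-means on the rows, which is the standard normalized spectral clustering recipe.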
Affiliation(s)
- Luwan Zhang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yichi Zhang
- Department of Computer Science and Statistics, University of Rhode Island, Kingston, RI, USA
- Tianrun Cai
- Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA
- Yuri Ahuja
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zeling He
- Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA
- Yuk-Lam Ho
- Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA
- Andrew Beam
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Kelly Cho
- Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Division of Aging, Brigham and Women's Hospital, Boston, MA, USA; Department of Medicine, Harvard Medical School, Boston, MA, USA
- Robert Carroll
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Joshua Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Isaac Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Katherine Liao
- Division of Rheumatology, Brigham and Women's Hospital, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA; Division of Population Health and Data Sciences, MAVERIC, VA Boston Healthcare System, Boston, MA, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
26
Abstract
Electronic health records (EHRs) are a rich repository of valuable clinical information that exists in primary and secondary care databases. In order to utilize EHRs for medical observational research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and critically evaluates the literature on the development of EHR phenotyping systems, describing systems and techniques based on both structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 were reviewed, using search terms derived from Medical Subject Headings (MeSH). The popularity of natural language processing (NLP) techniques for extracting features from narrative text has increased, owing to the availability of open-source NLP algorithms combined with improved accuracy. Concept extraction is the most popular NLP technique in this review, having been used by more than 50% of the reviewed papers to extract features from EHRs. High-throughput phenotyping systems using unsupervised machine learning techniques have also gained popularity due to their ability to efficiently and automatically extract a phenotype with minimal human effort.
27
Zhao J, Zhang Y, Schlueter DJ, Wu P, Eric Kerchberger V, Trent Rosenbloom S, Wells QS, Feng Q, Denny JC, Wei WQ. Detecting time-evolving phenotypic topics via tensor factorization on electronic health records: Cardiovascular disease case study. J Biomed Inform 2019; 98:103270. [PMID: 31445983 PMCID: PMC6783385 DOI: 10.1016/j.jbi.2019.103270] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Revised: 07/10/2019] [Accepted: 08/16/2019] [Indexed: 12/12/2022]
Abstract
OBJECTIVE Discovering subphenotypes of complex diseases can help characterize disease cohorts for investigative studies aimed at developing better diagnoses and treatments. Recent advances in unsupervised machine learning on electronic health record (EHR) data have enabled researchers to discover phenotypes without input from domain experts. However, most existing studies have ignored time and modeled diseases as discrete events. Uncovering the evolution of phenotypes, how they emerge, evolve, and contribute to health outcomes, is essential to define more precise phenotypes and refine the understanding of disease progression. Our objective was to assess the benefits of an unsupervised approach that incorporates time to model diseases as dynamic processes in phenotype discovery. METHODS In this study, we applied a constrained non-negative tensor-factorization approach to characterize the complexity of a cardiovascular disease (CVD) patient cohort based on longitudinal EHR data. Through tensor factorization, we identified a set of phenotypic topics (i.e., subphenotypes) that these patients established over the 10 years prior to the diagnosis of CVD, and showed their progression patterns. For each identified subphenotype, we examined its association with the risk for adverse cardiovascular outcomes estimated by the American College of Cardiology/American Heart Association Pooled Cohort Risk Equations, a conventional CVD-risk assessment tool frequently used in clinical practice. Furthermore, we compared the subsequent myocardial infarction (MI) rates among the six most prevalent subphenotypes using survival analysis. RESULTS From a cohort of 12,380 adults with CVD and 1,068 unique PheCodes, we successfully identified 14 subphenotypes. Through the association analysis with estimated CVD risk for each subtype, we found that some phenotypic topics, such as "vitamin D deficiency and depression" and "urinary infections", cannot be explained by the conventional risk factors. Through a survival analysis, we found markedly different risks of subsequent MI following the diagnosis of CVD among the six most prevalent topics (p < 0.0001), indicating that these topics may capture clinically meaningful subphenotypes of CVD. CONCLUSION This study demonstrates the potential benefits of using tensor decomposition to model diseases as dynamic processes from longitudinal EHR data. Our results suggest that this data-driven approach may help researchers identify complex and chronic disease subphenotypes in precision medicine research.
Affiliation(s)
- Juan Zhao
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Yun Zhang
- Fixed Income Division, Morgan Stanley & Co LLC, New York, NY, USA
- David J Schlueter
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
- Patrick Wu
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Medical Scientist Training Program, Vanderbilt University School of Medicine, Nashville, TN, USA
- Vern Eric Kerchberger
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Division of Allergy, Pulmonary, and Critical Care Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- S Trent Rosenbloom
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Quinn S Wells
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- QiPing Feng
- Division of Clinical Pharmacology, Vanderbilt University Medical Center, Nashville, TN, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
28
Oh W, Steinbach MS, Castro MR, Peterson KA, Kumar V, Caraballo PJ, Simon GJ. Evaluating the Impact of Data Representation on EHR-Based Analytic Tasks. Stud Health Technol Inform 2019; 264:288-292. [PMID: 31437931 PMCID: PMC7666864 DOI: 10.3233/shti190229] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
Different analytic techniques operate optimally with different types of data. As the use of EHR-based analytics expands to newer tasks, data will have to be transformed into different representations so the tasks can be optimally solved. We classified representations into broad categories based on their characteristics and proposed a new knowledge-driven representation for clinical data mining as well as trajectory mining, called Severity Encoding Variables (SEVs). Additionally, we studied which characteristics make representations most suitable for particular clinical analytics tasks, including trajectory mining. Our evaluation shows that, for regression, most data representations performed similarly, with SEV achieving a slight (albeit statistically significant) advantage. For patients at high risk of diabetes, it outperformed the competing representation by 20% (relative). For association mining, SEV achieved the highest performance. Its ability to constrain the search space of patterns through clinical knowledge was key to its success.
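The core idea of a severity encoding, replacing a raw measurement with a clinically meaningful ordinal level, can be sketched for a single lab. The cut-points below follow common HbA1c clinical bands (normal / prediabetes / diabetes), but the paper's actual SEV definitions may differ.

```python
import numpy as np

def hba1c_sev(value):
    """Map raw HbA1c (%) to an ordinal severity level. Cut-points follow
    common clinical bands; the paper's actual SEV scheme may differ."""
    if value < 5.7:
        return 0   # normal
    if value < 6.5:
        return 1   # prediabetes range
    if value < 8.0:
        return 2   # diabetes, moderate control
    return 3       # diabetes, poor control

raw = np.array([5.2, 6.0, 7.1, 9.3])
sev = np.array([hba1c_sev(v) for v in raw])   # ordinal severity codes
```

Encoding labs this way shrinks the pattern space for association mining to clinically distinct states, which is the constraint the abstract credits for SEV's success.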
Affiliation(s)
- Wonsuk Oh
- Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA
- Michael S Steinbach
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- M Regina Castro
- Division of Endocrinology, Diabetes, Metabolism and Nutrition, Mayo Clinic, Rochester, MN, USA
- Kevin A Peterson
- Department of Family Medicine and Community Health, University of Minnesota, Minneapolis, MN, USA
- Vipin Kumar
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
- Pedro J Caraballo
- Department of General Internal Medicine, Mayo Clinic, Rochester, MN, USA
- Gyorgy J Simon
- Department of Medicine, University of Minnesota, Minneapolis, MN, USA
29
Rodriguez VA, Perotte A. Phenotype Inference with Semi-Supervised Mixed Membership Models. Proceedings of Machine Learning Research 2019; 106:304-324. [PMID: 32490377 PMCID: PMC7266114]
Abstract
Disease phenotyping algorithms are designed to sift through clinical data stores to identify patients with specific diseases. Supervised phenotyping methods require significant quantities of expert-labeled data, while unsupervised methods may learn spurious or non-disease phenotypes. To address these limitations, we propose the Semi-Supervised Mixed Membership Model (SS3M) - a probabilistic graphical model for learning disease phenotypes from partially labeled clinical data. We show SS3M can generate interpretable, disease-specific phenotypes which capture the clinical features of the disease concepts specified by the labels provided to the model. Furthermore, SS3M phenotypes demonstrate competitive predictive performance relative to commonly used baselines.
Affiliation(s)
- Victor A Rodriguez
- Columbia University, Department of Biomedical Informatics, New York City, NY, USA
- Adler Perotte
- Columbia University, Department of Biomedical Informatics, New York City, NY, USA
30
Gachloo M, Wang Y, Xia J. A review of drug knowledge discovery using BioNLP and tensor or matrix decomposition. Genomics Inform 2019; 17:e18. [PMID: 31307133 PMCID: PMC6808632 DOI: 10.5808/gi.2019.17.2.e18]
Abstract
Predicting the relations among drugs and other molecular or social entities is the main knowledge discovery pattern for drug-related knowledge discovery. Computational approaches have combined information from different sources and levels for drug-related knowledge discovery, which provides a sophisticated comprehension of the relationships among drugs, targets, diseases, and targeted genes at the molecular level, or among drugs, usage, side effects, safety, and user preference at a social level. In this research, previous work from the BioNLP community and on tensor or matrix decomposition was reviewed, compared, and summarized, and the BioNLP open shared task was introduced as a promising case study representing this area.
Affiliation(s)
- Mina Gachloo
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yuxing Wang
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Jingbo Xia
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
31
Henderson J, Malin BA, Denny JC, Kho AN, Sun J, Ghosh J, Ho JC. CP Tensor Decomposition with Cannot-Link Intermode Constraints. Proceedings of the SIAM International Conference on Data Mining 2019; 2019:711-719. [PMID: 31198618 PMCID: PMC6563811 DOI: 10.1137/1.9781611975673.80]
Abstract
Tensor factorization is a methodology that is applied in a variety of fields, ranging from climate modeling to medical informatics. A tensor is an n-way array that captures the relationship between n objects. These multiway arrays can be factored to study the underlying bases present in the data. Two challenges arising in tensor factorization are 1) the resulting factors can be noisy and highly overlapping with one another and 2) they may not map to insights within a domain. However, incorporating supervision to increase the number of insightful factors can be costly in terms of the time and domain expertise necessary for gathering labels or domain-specific constraints. To meet these challenges, we introduce CANDECOMP/PARAFAC (CP) tensor factorization with Cannot-Link Intermode Constraints (CP-CLIC), a framework that achieves succinct, diverse, interpretable factors. This is accomplished by gradually learning constraints that are verified with auxiliary information during the decomposition process. We demonstrate CP-CLIC's potential to extract sparse, diverse, and interpretable factors through experiments on simulated data and a real-world application in medical informatics.
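The factoring the abstract describes can be illustrated with a minimal, unconstrained CP decomposition solved by alternating least squares. This sketch (plain NumPy, synthetic data) shows only the base method; CP-CLIC's gradually learned cannot-link constraints and auxiliary-information checks are not implemented here.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n matricization: rows index the chosen mode, columns the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # Column-wise Kronecker product: (I*J) x R.
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def cp_als(T, rank, n_iter=300, seed=0):
    """Plain CP decomposition of a 3-way tensor via alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((s, rank)) for s in T.shape]
    for _ in range(n_iter):
        for mode in range(3):
            others = [factors[m] for m in range(3) if m != mode]  # increasing mode order
            kr = khatri_rao(others[0], others[1])
            gram = (others[0].T @ others[0]) * (others[1].T @ others[1])
            factors[mode] = unfold(T, mode) @ kr @ np.linalg.pinv(gram)
    return factors

# Exact rank-2 toy tensor: ALS should recover it almost perfectly.
rng = np.random.default_rng(1)
A, B, C = (rng.random((n, 2)) for n in (6, 5, 4))
T = np.einsum('ir,jr,kr->ijk', A, B, C)
Ah, Bh, Ch = cp_als(T, rank=2)
T_hat = np.einsum('ir,jr,kr->ijk', Ah, Bh, Ch)
rel_err = np.linalg.norm(T - T_hat) / np.linalg.norm(T)
```

In a phenotyping setting the three modes would typically be patients, diagnoses, and medications, and each rank-one component would be read as a candidate phenotype.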
32
Luo G. A roadmap for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling. Global Transitions 2019; 1:61-82. [PMID: 31032483 PMCID: PMC6482973 DOI: 10.1016/j.glt.2018.11.001]
Abstract
Predictive modeling based on machine learning with medical data has great potential to improve healthcare and reduce costs. However, two hurdles, among others, impede its widespread adoption in healthcare. First, medical data are by nature longitudinal. Pre-processing them, particularly for feature engineering, is labor intensive and often takes 50-80% of the model building effort. Predictive temporal features are the basis of building accurate models, but are difficult to identify. This is problematic. Healthcare systems have limited resources for model building, while inaccurate models produce sub-optimal outcomes and are often useless. Second, most machine learning models provide no explanation of their prediction results. However, offering such explanations is essential for a model to be used in usual clinical practice. To address these two hurdles, this paper outlines: 1) a data-driven method for semi-automatically extracting predictive and clinically meaningful temporal features from medical data for predictive modeling; and 2) a method of using these features to automatically explain machine learning prediction results and suggest tailored interventions. This provides a roadmap for future research.
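The kind of temporal feature engineering the roadmap discusses can be sketched as follows with pandas; the toy lab table, the feature names, and the recency/slope choices are illustrative assumptions, not the paper's method.

```python
import numpy as np
import pandas as pd

# Toy longitudinal lab table: one row per (patient, day) measurement.
labs = pd.DataFrame({
    'patient_id': [1, 1, 1, 2, 2, 2, 2],
    'day':        [0, 30, 60, 0, 20, 40, 60],
    'hba1c':      [6.1, 6.4, 6.8, 7.5, 7.3, 7.2, 7.0],
})

def temporal_features(g):
    # Trend: slope of a least-squares line through the series (per day).
    slope = np.polyfit(g['day'], g['hba1c'], 1)[0] if len(g) > 1 else 0.0
    return pd.Series({
        'last_value': g.sort_values('day')['hba1c'].iloc[-1],
        'mean_value': g['hba1c'].mean(),
        'slope_per_day': slope,
        'n_measurements': len(g),
    })

# One feature row per patient, ready to feed into a predictive model.
features = labs.groupby('patient_id').apply(temporal_features)
```

A rising `slope_per_day` is exactly the kind of temporal feature that is both predictive and clinically explainable ("HbA1c has been trending up").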
Affiliation(s)
- Gang Luo
- Department of Biomedical Informatics and Medical Education, University of Washington, UW Medicine South Lake Union, 850 Republican Street, Building C, Box 358047, Seattle, WA, 98109, USA
33
Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA). PLoS One 2019; 14:e0212112. [PMID: 30759150 PMCID: PMC6374022 DOI: 10.1371/journal.pone.0212112]
Abstract
Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and suffered from the burden of multiple adjustment due to the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) for identifying associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor since it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data of 12,759 individuals with electronic health records (EHR) and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF from 1,853 distinct phenotypes and identified six topics. We tested their associations with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P < 0.001), replicating a previous finding. We also identified a negative correlation between LPA and a topic enriched for lung cancer (P < 0.001) which was not previously identified via phenome-wide scanning. We were able to replicate the top finding in a separate dataset. Our results demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
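As a rough sketch of the workflow the abstract describes — factor a patient-by-phenotype count matrix with NMF, then test each topic loading against a variant carrier indicator — the following uses scikit-learn on synthetic data; the matrices and the carrier flag are stand-ins, not the study's EHR/biobank data.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Synthetic patients x phenotype-code count matrix (stand-in for EHR billing codes).
X = rng.poisson(1.0, size=(200, 50)).astype(float)

# Factor X ~ W @ H: W holds per-patient topic loadings, H per-topic phenotype weights.
model = NMF(n_components=6, init='nndsvda', max_iter=500, random_state=0)
W = model.fit_transform(X)   # (200, 6)
H = model.components_        # (6, 50)

# Toy association step: correlate each topic loading with a binary carrier flag
# (in the study this would be rs10455872 carrier status, tested with proper statistics).
carrier = rng.integers(0, 2, size=200)
corrs = [np.corrcoef(W[:, t], carrier)[0, 1] for t in range(6)]
```

Inspecting the largest entries of each row of `H` is how one would label a topic (e.g., "enriched for CVD and hyperlipidemia codes") before interpreting its correlation.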
34
Perros I, Papalexakis EE, Vuduc R, Searles E, Sun J. Temporal phenotyping of medically complex children via PARAFAC2 tensor factorization. J Biomed Inform 2019; 93:103125. [PMID: 30743070 DOI: 10.1016/j.jbi.2019.103125]
Abstract
OBJECTIVE Our aim is to extract clinically-meaningful phenotypes from longitudinal electronic health records (EHRs) of medically-complex children. This is a fragile set of patients who consume a disproportionate amount of pediatric care resources but often end up with sub-optimal clinical outcomes. The rise in available EHRs provides a rich data source that can be used to disentangle their complex clinical conditions into concise, clinically-meaningful groups of characteristics. We aim to identify those phenotypes and their temporal evolution in a scalable, computational manner that avoids time-consuming manual chart review. MATERIALS AND METHODS We analyze longitudinal EHRs from Children's Healthcare of Atlanta, including 1045 medically complex patients with a total of 59,948 encounters over 2 years. We apply a tensor factorization method called PARAFAC2 to extract: (a) clinically-meaningful groups of features, (b) concise patient representations indicating the presence of a phenotype for each patient, and (c) temporal signatures indicating the evolution of those phenotypes over time for each patient. RESULTS We identified four medically complex phenotypes, namely gastrointestinal disorders, oncological conditions, blood-related disorders, and neurological system disorders, which have distinct clinical characterizations among patients. We demonstrate the utility of patient representations produced by PARAFAC2 towards identifying groups of patients with significant survival variations. Finally, we showcase representative examples of the temporal phenotypic trends extracted for different patients. DISCUSSION Unsupervised temporal phenotyping is an important task since it minimizes the burden on clinical experts by limiting their involvement to validating the output phenotypes.
PARAFAC2 enjoys several compelling properties for temporal computational phenotyping: (a) it can handle high-dimensional data and variable numbers of encounters across patients, (b) it has an intuitive interpretation, and (c) it is free from ad-hoc parameter choices. Computational phenotypes, such as the ones computed by our approach, have multiple applications; we highlight three that are particularly useful for medically complex children: (1) integration into clinical decision support systems, (2) interpretable mortality prediction, and (3) clinical trial recruitment. CONCLUSION PARAFAC2 can be applied to unsupervised temporal phenotyping tasks where precise definitions of different phenotypes are absent and lengths of patient records vary.
Affiliation(s)
- Jimeng Sun
- Georgia Institute of Technology, United States.
35
Mao C, Zhao Y, Sun M, Luo Y. Are My EHRs Private Enough? Event-Level Privacy Protection. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:103-112. [PMID: 29994487 PMCID: PMC6441392 DOI: 10.1109/tcbb.2018.2850037]
Abstract
Privacy is a major concern in sharing human subject data to researchers for secondary analyses. A simple binary consent (opt-in or not) may significantly reduce the amount of sharable data, since many patients might only be concerned about a few sensitive medical conditions rather than the entire medical records. We propose event-level privacy protection, and develop a feature ablation method to protect event-level privacy in electronic medical records. Using a list of 13 sensitive diagnoses, we evaluate the feasibility and the efficacy of the proposed method. As feature ablation progresses, the identifiability of a sensitive medical condition decreases with varying speeds on different diseases. We find that these sensitive diagnoses can be divided into three categories: (1) five diseases have fast declining identifiability (AUC below 0.6 with less than 400 features excluded); (2) seven diseases with progressively declining identifiability (AUC below 0.7 with between 200 and 700 features excluded); and (3) one disease with slowly declining identifiability (AUC above 0.7 with 1,000 features excluded). The fact that the majority (12 out of 13) of the sensitive diseases fall into the first two categories suggests the potential of the proposed feature ablation method as a solution for event-level record privacy protection.
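A feature-ablation loop of the general shape described above can be sketched with scikit-learn on synthetic data; ranking features by logistic-regression coefficient magnitude is an illustrative assumption, not necessarily the authors' ablation criterion.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for "can this sensitive condition be inferred from the record?"
X, y = make_classification(n_samples=1000, n_features=50, n_informative=10,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

kept = list(range(50))
aucs = []
for _ in range(5):  # ablate the 5 most identifying features per round
    clf = LogisticRegression(max_iter=1000).fit(Xtr[:, kept], ytr)
    aucs.append(roc_auc_score(yte, clf.predict_proba(Xte[:, kept])[:, 1]))
    # Rank remaining features by |coefficient| and drop the top 5.
    top5 = set(np.argsort(-np.abs(clf.coef_[0]))[:5])
    kept = [f for i, f in enumerate(kept) if i not in top5]
```

Watching `aucs` decline as `kept` shrinks is the single-disease analogue of the paper's identifiability curves; a condition whose AUC falls quickly is "easy" to protect by ablation.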
Affiliation(s)
- Chengsheng Mao
- Department of Preventive Medicine, Northwestern University, Chicago, IL 60611.
- Yuan Zhao
- Department of Preventive Medicine, Northwestern University, Chicago, IL 60611.
- Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL 60611.
36
Zeng Z, Deng Y, Li X, Naumann T, Luo Y. Natural Language Processing for EHR-Based Computational Phenotyping. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:139-153. [PMID: 29994486 PMCID: PMC6388621 DOI: 10.1109/tcbb.2018.2849968]
Abstract
This article reviews recent advances in applying natural language processing (NLP) to Electronic Health Records (EHRs) for computational phenotyping. NLP-based computational phenotyping has numerous applications including diagnosis categorization, novel phenotype discovery, clinical trial screening, pharmacogenomics, drug-drug interaction (DDI), and adverse drug event (ADE) detection, as well as genome-wide and phenome-wide association studies. Significant progress has been made in algorithm development and resource construction for computational phenotyping. Among the surveyed methods, well-designed keyword search and rule-based systems often achieve good performance. However, the construction of keyword and rule lists requires significant manual effort, which is difficult to scale. Supervised machine learning models have been favored because they are capable of acquiring both classification patterns and structures from data. Recently, deep learning and unsupervised learning have received growing attention, with the former favored for its performance and the latter for its ability to find novel phenotypes. Integrating heterogeneous data sources has become increasingly important and has shown promise in improving model performance. Often, better performance is achieved by combining multiple modalities of information. Despite these many advances, challenges and opportunities remain for NLP-based computational phenotyping, including better model interpretability and generalizability, and proper characterization of feature relations in clinical narratives.
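A minimal example of the keyword-plus-rules approach the review highlights — with a naive negation window — might look like this; the term list and the 30-character negation window are illustrative assumptions, far simpler than curated phenotype definitions.

```python
import re

# Illustrative keyword list; real phenotype definitions are curated by clinicians.
PNEUMONIA_TERMS = [r'\bpneumonia\b', r'\blung consolidation\b', r'\binfiltrate\b']
NEGATIONS = [r'\bno\b', r'\bdenies\b', r'\bwithout\b', r'\bnegative for\b']

def flag_phenotype(note: str) -> bool:
    """Return True if any target term appears without a nearby negation cue."""
    text = note.lower()
    for term in PNEUMONIA_TERMS:
        for m in re.finditer(term, text):
            window = text[max(0, m.start() - 30):m.start()]  # preceding context
            if not any(re.search(neg, window) for neg in NEGATIONS):
                return True
    return False

notes = [
    "CXR shows right lower lobe infiltrate consistent with pneumonia.",
    "Patient denies cough; chest x-ray negative for pneumonia.",
]
flags = [flag_phenotype(n) for n in notes]
```

The manual effort the review warns about lives in those two lists: every new phenotype needs fresh terms and negation rules, which is exactly why learned models scale better.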
Affiliation(s)
- Zexian Zeng
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
- Yu Deng
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
- Xiaoyu Li
- Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, MA 02115.
- Tristan Naumann
- Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA 02139.
- Yuan Luo
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611.
37
Henderson J, He H, Malin BA, Denny JC, Kho AN, Ghosh J, Ho JC. Phenotyping through Semi-Supervised Tensor Factorization (PSST). AMIA Annu Symp Proc 2018; 2018:564-573. [PMID: 30815097 PMCID: PMC6371355]
Abstract
A computational phenotype is a set of clinically relevant and interesting characteristics that describe patients with a given condition. Various machine learning methods have been proposed to derive phenotypes in an automatic, high-throughput manner. Among these methods, computational phenotyping through tensor factorization has been shown to produce clinically interesting phenotypes. However, few of these methods incorporate auxiliary patient information into the phenotype derivation process. In this work, we introduce Phenotyping through Semi-Supervised Tensor Factorization (PSST), a method that leverages disease status knowledge about subsets of patients to generate computational phenotypes from tensors constructed from the electronic health records of patients. We demonstrate the potential of PSST to uncover predictive and clinically interesting computational phenotypes through case studies focusing on type-2 diabetes and resistant hypertension. PSST yields more discriminative phenotypes compared to the unsupervised methods and more meaningful phenotypes compared to a supervised method.
Affiliation(s)
- Huan He
- Emory University, Atlanta, GA
38
Afshar A, Perros I, Papalexakis EE, Searles E, Ho J, Sun J. COPA: Constrained PARAFAC2 for Sparse & Large Datasets. Proceedings of the ACM International Conference on Information and Knowledge Management 2018; 2018:793-802. [PMID: 32905548 PMCID: PMC7472553 DOI: 10.1145/3269206.3271775]
Abstract
PARAFAC2 has demonstrated success in modeling irregular tensors, where the tensor dimensions vary across one of the modes. An example scenario is modeling treatments across a set of patients with varying numbers of medical encounters over time. Despite recent improvements on unconstrained PARAFAC2, its model factors are usually dense and sensitive to noise, which limits their interpretability. As a result, the following open challenges remain: a) various modeling constraints, such as temporal smoothness, sparsity and non-negativity, need to be imposed for interpretable temporal modeling and b) a scalable approach is required to support those constraints efficiently for large datasets. To tackle these challenges, we propose a COnstrained PARAFAC2 (COPA) method, which carefully incorporates optimization constraints such as temporal smoothness, sparsity, and non-negativity in the resulting factors. To efficiently support all those constraints, COPA adopts a hybrid optimization framework using alternating optimization and the alternating direction method of multipliers (AO-ADMM). As evaluated on large electronic health record (EHR) datasets with hundreds of thousands of patients, COPA achieves significant speedups (up to 36× faster) over prior PARAFAC2 approaches that only attempt to handle a subset of the constraints that COPA enables. Overall, our method outperforms all the baselines attempting to handle a subset of the constraints in terms of speed, while achieving the same level of accuracy. Through a case study on temporal phenotyping of medically complex children, we demonstrate how the constraints imposed by COPA reveal concise phenotypes and meaningful temporal profiles of patients. The clinical interpretation of both the phenotypes and the temporal profiles was confirmed by a medical expert.
39
SUSTain. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2018; 2018:2080-2089. [PMID: 33680534 PMCID: PMC7935718 DOI: 10.1145/3219819.3219999]
Abstract
This paper presents a new method, which we call SUSTain, that extends real-valued matrix and tensor factorizations to data where values are integers. Such data are common when the values correspond to event counts or ordinal measures. The conventional approach is to treat integer data as real, and then apply real-valued factorizations. However, doing so fails to preserve important characteristics of the original data, thereby making it hard to interpret the results. Instead, our approach extracts factor values from integer datasets as scores that are constrained to take values from a small integer set. These scores are easy to interpret: a score of zero indicates no feature contribution and higher scores indicate distinct levels of feature importance. At its core, SUSTain relies on: a) a problem partitioning into integer-constrained subproblems, so that they can be optimally solved in an efficient manner; and b) organizing the order of the subproblems’ solution, to promote reuse of shared intermediate results. We propose two variants, SUSTainM and SUSTainT, to handle both matrix and tensor inputs, respectively. We evaluate SUSTain against several state-of-the-art baselines on both synthetic and real Electronic Health Record (EHR) datasets. Comparing to those baselines, SUSTain shows either significantly better fit or orders of magnitude speedups that achieve a comparable fit (up to 425× faster). We apply SUSTain to EHR datasets to extract patient phenotypes (i.e., clinically meaningful patient clusters). Furthermore, 87% of them were validated as clinically meaningful phenotypes related to heart failure by a cardiologist.
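The core idea of integer-constrained factor scores can be caricatured with a naive round-and-clip alternating factorization; this is only a sketch of the constraint, not SUSTain's partitioned solver, and round-and-clip generally gives a worse fit than the paper's approach.

```python
import numpy as np

def integer_mf(X, rank=3, levels=4, n_iter=50, seed=0):
    """Naive alternating factorization with entries constrained to {0, ..., levels}.

    Illustrates integer-valued scores only; SUSTain instead partitions the problem
    into integer-constrained subproblems it can solve optimally.
    """
    rng = np.random.default_rng(seed)
    U = rng.integers(0, levels + 1, size=(X.shape[0], rank)).astype(float)
    V = rng.integers(0, levels + 1, size=(X.shape[1], rank)).astype(float)
    for _ in range(n_iter):
        # Unconstrained least-squares update, then project onto the integer grid.
        U = np.clip(np.rint(X @ V @ np.linalg.pinv(V.T @ V)), 0, levels)
        V = np.clip(np.rint(X.T @ U @ np.linalg.pinv(U.T @ U)), 0, levels)
    return U, V

# Exactly integer-factorable toy matrix.
rng = np.random.default_rng(1)
U0 = rng.integers(0, 3, size=(20, 3)).astype(float)
V0 = rng.integers(0, 3, size=(15, 3)).astype(float)
X = U0 @ V0.T
U, V = integer_mf(X, rank=3)
```

The payoff of the constraint is readability: a factor entry of 0 means "no contribution" and 4 means "strong contribution", with no post-hoc thresholding of real-valued scores.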
40
Banda JM, Seneviratne M, Hernandez-Boussard T, Shah NH. Advances in Electronic Phenotyping: From Rule-Based Definitions to Machine Learning Models. Annu Rev Biomed Data Sci 2018; 1:53-68. [PMID: 31218278 PMCID: PMC6583807 DOI: 10.1146/annurev-biodatasci-080917-013315]
Abstract
With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.
Affiliation(s)
- Juan M Banda
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
- Martin Seneviratne
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford, California 94305, USA
41
Choi J, Kim Y, Kim HS, Choi IY, Yu H. Phenotyping of Korean patients with better-than-expected efficacy of moderate-intensity statins using tensor factorization. PLoS One 2018; 13:e0197518. [PMID: 29897980 PMCID: PMC5999101 DOI: 10.1371/journal.pone.0197518]
Abstract
Several studies have been conducted to evaluate the efficacy of statins in Korean and Asian patients. However, most previous studies only observed the percent reduction in low-density lipoprotein cholesterol (LDL-C) and did not consider the effects of various patient conditions simultaneously, such as abnormal test results, patient demographics, and prescribed drugs before taking a statin. Moreover, the characteristics of the patients whose percent reduction in LDL-C was higher than expected were not provided. Therefore, in this study, we aimed to derive meaningful phenotypes by using tensor factorization to observe the characteristics of the patients whose percent reduction in LDL-C was higher than expected among patients taking moderate-intensity statins. In addition, we used the derived phenotypes to predict how much the LDL-C levels of new patients decreased. We consequently identified eight phenotypes that represented the characteristics of the patients whose percent reduction in LDL-C was higher than expected. Moreover, the latent representations of the derived phenotypes achieved prediction performance similar to that obtained using the raw data. These results demonstrate that the derived phenotypes and latent representations are useful tools for observing the characteristics of patients and predicting LDL-C levels. Additionally, our findings provide direction on how to conduct clinical studies in the future.
Affiliation(s)
- Jingyun Choi
- Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, Republic of Korea
- Yejin Kim
- Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, Republic of Korea
- Hun-Sung Kim
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- In Young Choi
- Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Hwanjo Yu
- Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, Republic of Korea
42
Robinson JR, Wei WQ, Roden DM, Denny JC. Defining Phenotypes from Clinical Data to Drive Genomic Research. Annu Rev Biomed Data Sci 2018; 1:69-92. [PMID: 34109303 DOI: 10.1146/annurev-biodatasci-080917-013335]
Abstract
The rise in available longitudinal patient information in electronic health records (EHRs) and their coupling to DNA biobanks has resulted in a dramatic increase in genomic research using EHR data for phenotypic information. EHRs have the benefit of providing a deep and broad data source of health-related phenotypes, including drug response traits, expanding the phenome available to researchers for discovery. The earliest efforts at repurposing EHR data for research involved manual chart review of limited numbers of patients but now typically involve applications of rule-based and machine learning algorithms operating on sometimes huge corpora for both genome-wide and phenome-wide approaches. We highlight here the current methods, impact, challenges, and opportunities for repurposing clinical data to define patient phenotypes for genomics discovery. Use of EHR data has proven a powerful method for elucidation of genomic influences on diseases, traits, and drug-response phenotypes and will continue to have increasing applications in large cohort studies.
Affiliation(s)
- Jamie R Robinson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of General Surgery, Vanderbilt University Medical Center, Nashville, TN
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Dan M Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN; Department of Pharmacology, Vanderbilt University Medical Center
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN; Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
43
Denny JC, Van Driest SL, Wei WQ, Roden DM. The Influence of Big (Clinical) Data and Genomics on Precision Medicine and Drug Development. Clin Pharmacol Ther 2018; 103:409-418. [PMID: 29171014 PMCID: PMC5805632 DOI: 10.1002/cpt.951]
Abstract
Drug development continues to be costly and slow, with medications failing due to lack of efficacy or presence of toxicity. The promise of pharmacogenomic discovery includes tailoring therapeutics based on an individual's genetic makeup, rational drug development, and repurposing medications. Rapid growth of large research cohorts, linked to electronic health record (EHR) data, fuels discovery of new genetic variants predicting drug action, supports Mendelian randomization experiments to show drug efficacy, and suggests new indications for existing medications. New biomedical informatics and machine-learning approaches advance the ability to interpret clinical information, enabling identification of complex phenotypes and subpopulations of patients. We review the recent history of use of "big data" from EHR-based cohorts and biobanks supporting these activities. Future studies using EHR data, other information sources, and new methods will promote a foundation for discovery to more rapidly advance precision medicine.
Affiliation(s)
- Joshua C. Denny
- Department of Biomedical Informatics, Vanderbilt University Medical Center
- Department of Medicine, Vanderbilt University Medical Center
- Sara L. Van Driest
- Department of Medicine, Vanderbilt University Medical Center
- Department of Pediatrics, Vanderbilt University Medical Center
- Wei-Qi Wei
- Department of Biomedical Informatics, Vanderbilt University Medical Center
- Dan M. Roden
- Department of Biomedical Informatics, Vanderbilt University Medical Center
- Department of Medicine, Vanderbilt University Medical Center
- Department of Pharmacology, Vanderbilt University Medical Center
44
Chen Y, Kho AN, Liebovitz D, Ivory C, Osmundson S, Bian J, Malin BA. Learning bundled care opportunities from electronic medical records. J Biomed Inform 2018; 77:1-10. [PMID: 29174994 PMCID: PMC5771885 DOI: 10.1016/j.jbi.2017.11.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2017] [Revised: 10/30/2017] [Accepted: 11/21/2017] [Indexed: 01/29/2023]
Abstract
OBJECTIVE The traditional fee-for-service approach to healthcare can lead to the management of a patient's conditions in a siloed manner, inducing various negative consequences. It has been recognized that a bundled approach to healthcare - one that manages a collection of health conditions together - may enable greater efficacy and cost savings. However, it is not always evident which sets of conditions should be managed in a bundled manner. In this study, we investigate if a data-driven approach can automatically learn potential bundles. METHODS We designed a framework to infer health condition collections (HCCs) based on the similarity of their clinical workflows, according to electronic medical record (EMR) utilization. We evaluated the framework with data from over 16,500 inpatient stays from Northwestern Memorial Hospital in Chicago, Illinois. The plausibility of the inferred HCCs for bundled care was assessed through an online survey of a panel of five experts, whose responses were analyzed via an analysis of variance (ANOVA) at a 95% confidence level. We further assessed the face validity of the HCCs using evidence in the published literature. RESULTS The framework inferred four HCCs, indicative of (1) fetal abnormalities, (2) late pregnancies, (3) prostate problems, and (4) chronic diseases, with congestive heart failure featuring prominently. Each HCC was substantiated with evidence in the literature and was deemed plausible for bundled care by the experts at a statistically significant level. CONCLUSIONS The findings suggest that an automated EMR data-driven framework can provide a basis for discovering bundled care opportunities. Still, translating such findings into actual care management will require further refinement, implementation, and evaluation.
Affiliation(s)
- You Chen
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA.
- Abel N Kho
- Institute for Public Health and Medicine, Northwestern University, Chicago, IL, USA
- Catherine Ivory
- School of Nursing, Vanderbilt University, Nashville, TN, USA
- Sarah Osmundson
- Dept. of Obstetrics and Gynecology, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Jiang Bian
- Dept. of Health Outcomes and Policy, University of Florida, Gainesville, FL, USA
- Bradley A Malin
- Dept. of Biomedical Informatics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Dept. of Biostatistics, School of Medicine, Vanderbilt University, Nashville, TN, USA
- Dept. of Electrical Engineering & Computer Science, School of Engineering, Vanderbilt University, Nashville, TN, USA
45
Meystre SM, Lovis C, Bürkle T, Tognola G, Budrionis A, Lehmann CU. Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress. Yearb Med Inform 2017; 26:38-52. [PMID: 28480475 PMCID: PMC6239225 DOI: 10.15265/iy-2017-007] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2017] [Indexed: 12/30/2022] Open
Abstract
Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potentials for high quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research.
Affiliation(s)
- S. M. Meystre
- Medical University of South Carolina, Charleston, SC, USA
- C. Lovis
- Division of Medical Information Sciences, University Hospitals of Geneva, Switzerland
- T. Bürkle
- University of Applied Sciences, Bern, Switzerland
- G. Tognola
- Institute of Electronics, Computer and Telecommunication Engineering, Italian Natl. Research Council IEIIT-CNR, Milan, Italy
- A. Budrionis
- Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
- C. U. Lehmann
- Departments of Biomedical Informatics and Pediatrics, Vanderbilt University Medical Center, Nashville, TN, USA
46
Kim Y, Sun J, Yu H, Jiang X. Federated Tensor Factorization for Computational Phenotyping. KDD: Proceedings of the International Conference on Knowledge Discovery & Data Mining 2017; 2017:887-895. [PMID: 29071165 PMCID: PMC5652331 DOI: 10.1145/3097983.3098118] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/19/2022]
Abstract
Tensor factorization models offer an effective approach to convert massive electronic health records into meaningful clinical concepts (phenotypes) for data analysis. These models need a large amount of diverse samples to avoid population bias. An open challenge is how to derive phenotypes jointly across multiple hospitals, in which direct patient-level data sharing is not possible (e.g., due to institutional policies). In this paper, we developed a novel solution to enable federated tensor factorization for computational phenotyping without sharing patient-level data. We developed secure data harmonization and federated computation procedures based on alternating direction method of multipliers (ADMM). Using this method, the multiple hospitals iteratively update tensors and transfer secure summarized information to a central server, and the server aggregates the information to generate phenotypes. We demonstrated with real medical datasets that our method resembles the centralized training model (based on combined datasets) in terms of accuracy and phenotypes discovery while respecting privacy.
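The federated scheme the abstract describes - each hospital updates factors locally and sends only summarized information to a central server - can be illustrated with a minimal sketch. This is not the authors' ADMM implementation: the consensus step here is a plain average of locally fitted feature-side factors, the local solver is ordinary multiplicative-update NMF on a matricized view rather than tensor factorization, and all data, ranks, and dimensions are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each "hospital" holds a private patient-by-feature
# count matrix; only low-dimensional factor summaries ever leave the site.
R = 3                                          # number of latent phenotypes
sites = [rng.poisson(2.0, size=(20, 8)) for _ in range(3)]

def local_factor(X, rank, iters=50):
    """Multiplicative-update NMF as a stand-in for the local solver."""
    m, n = X.shape
    W = rng.random((m, rank))                  # patient loadings (stay local)
    H = rng.random((rank, n))                  # phenotype definitions (shared)
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)
    return H                                   # the shareable summary

# Server aggregates only the shared summaries, never patient-level data.
summaries = [local_factor(X, R) for X in sites]
consensus = np.mean(summaries, axis=0)
print(consensus.shape)                         # (3, 8): R phenotypes over 8 features
```

The privacy property being sketched: each site transmits an R-by-features matrix, so the server reconstructs candidate phenotypes without ever seeing individual patient rows. An ADMM-based consensus, as in the paper, would additionally carry dual variables between rounds.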
Affiliation(s)
- Yejin Kim
- Pohang University of Science and Technology, Pohang, Korea
- University of California, San Diego, La Jolla, CA
- Jimeng Sun
- Georgia Institute of Technology, Atlanta, GA
- Hwanjo Yu
- Pohang University of Science and Technology, Pohang, Korea
47
Clifton DA, Niehaus KE, Charlton P, Colopy GW. Health Informatics via Machine Learning for the Clinical Management of Patients. Yearb Med Inform 2017; 10:38-43. [PMID: 26293849 DOI: 10.15265/iy-2015-014] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open
Abstract
OBJECTIVES To review how health informatics systems based on machine learning methods have impacted the clinical management of patients, by affecting clinical practice. METHODS We reviewed literature from 2010-2015 from databases such as PubMed, IEEE Xplore, and INSPEC, in which methods based on machine learning are likely to be reported. We bring together a broad body of literature, aiming to identify those leading examples of health informatics that have advanced the methodology of machine learning. While individual methods may have further examples that might be added, we have chosen some of the most representative, informative exemplars in each case. RESULTS Our survey highlights that, while much research is taking place in this high-profile field, examples of those that affect the clinical management of patients are seldom found. We show that substantial progress is being made in terms of methodology, often by data scientists working in close collaboration with clinical groups. CONCLUSIONS Health informatics systems based on machine learning are in their infancy and the translation of such systems into clinical management has yet to be performed at scale.
Affiliation(s)
- D A Clifton
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
48
Banda JM, Halpern Y, Sontag D, Shah NH. Electronic phenotyping with APHRODITE and the Observational Health Sciences and Informatics (OHDSI) data network. AMIA Joint Summits on Translational Science Proceedings 2017; 2017:48-57. [PMID: 28815104 PMCID: PMC5543379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
The widespread usage of electronic health records (EHRs) for clinical research has produced multiple electronic phenotyping approaches. Methods for electronic phenotyping range from those needing extensive specialized medical expert supervision to those based on semi-supervised learning techniques. We present Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE), an R package phenotyping framework that combines noisy labeling and anchor learning. APHRODITE makes these cutting-edge phenotyping approaches available for use with the Observational Health Data Sciences and Informatics (OHDSI) data model for standardized and scalable deployment. APHRODITE uses EHR data available in the OHDSI Common Data Model to build classification models for electronic phenotyping. We demonstrate the utility of APHRODITE by comparing its performance versus traditional rule-based phenotyping approaches. Finally, the resulting phenotype models and model construction workflows built with APHRODITE can be shared between multiple OHDSI sites. Such sharing allows their application on large and diverse patient populations.
49
Henderson J, Bridges R, Ho JC, Wallace BC, Ghosh J. PheKnow-Cloud: A Tool for Evaluating High-Throughput Phenotype Candidates using Online Medical Literature. AMIA Joint Summits on Translational Science Proceedings 2017; 2017:149-157. [PMID: 28815124 PMCID: PMC5543339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
As the adoption of Electronic Healthcare Records has grown, the need to transform manual processes that extract and characterize medical data into automatic and high-throughput processes has also grown. Recently, researchers have tackled the problem of automatically extracting candidate phenotypes from EHR data. Since these phenotypes are usually generated using unsupervised or semi-supervised methods, it is necessary to examine and validate the clinical relevance of the generated "candidate" phenotypes. We present PheKnow-Cloud, a framework that uses co-occurrence analysis on the publicly available, online repository of journal articles, PubMed, to build sets of evidence for user-supplied candidate phenotypes. PheKnow-Cloud works in an interactive manner to present the results of the candidate phenotype analysis. This tool seeks to help researchers and clinical professionals evaluate the automatically generated phenotypes so they may tune their processes and understand the candidate phenotypes.
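The co-occurrence idea PheKnow-Cloud builds on - scoring a candidate phenotype by how often its constituent terms appear together in the literature - can be sketched in a few lines. The corpus, terms, and substring matching here are hypothetical stand-ins for querying PubMed; this is not the tool's actual pipeline.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stand-in corpus: in PheKnow-Cloud the documents would be
# PubMed titles/abstracts retrieved for the candidate phenotype's terms.
docs = [
    "hypertension and chronic kidney disease in diabetic patients",
    "metformin use in diabetic patients with hypertension",
    "chronic kidney disease progression and hypertension",
]
candidate = {"hypertension", "diabetic", "metformin"}   # one candidate phenotype

# Evidence score: count within-document co-occurrences of candidate terms.
pair_counts = Counter()
for d in docs:
    present = sorted(t for t in candidate if t in d)    # terms found in this doc
    pair_counts.update(combinations(present, 2))

print(pair_counts[("diabetic", "hypertension")])        # 2
```

Term pairs that co-occur frequently lend literature support to keeping those terms together in the phenotype; pairs that never co-occur suggest the unsupervised method may have grouped unrelated concepts.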
50
Henderson J, Ho J, Ghosh J. gamAID: Greedy CP tensor decomposition for supervised EHR-based disease trajectory differentiation. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2017; 2017:3644-3647. [PMID: 29060688 DOI: 10.1109/embc.2017.8037647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
We propose gamAID, an exploratory, supervised nonnegative tensor factorization method that iteratively extracts phenotypes from tensors constructed from medical count data. Using data from diabetic patients who are later diagnosed with chronic kidney disease (CKD) as well as diabetic patients who do not receive a CKD diagnosis, we demonstrate the potential of gamAID to discover phenotypes that characterize patients who are at risk for developing a disease.
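A greedy CP-style extraction, in the spirit of the abstract above, peels nonnegative rank-1 components off a count tensor one at a time. This is an illustrative sketch on synthetic data, not gamAID itself: the method in the paper is supervised, and the supervision step is omitted here; the alternating update below is a generic nonnegative rank-1 fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 3-way count tensor: patients x diagnoses x medications.
T = rng.poisson(1.0, size=(10, 6, 5)).astype(float)

def rank1_nonneg(X, iters=100):
    """Fit one nonnegative rank-1 component (a, b, c) by alternating
    updates; a greedy stand-in for a single CP extraction step."""
    a = rng.random(X.shape[0])
    b = rng.random(X.shape[1])
    c = rng.random(X.shape[2])
    for _ in range(iters):
        a = np.einsum('ijk,j,k->i', X, b, c)   # contract over modes 2 and 3
        a /= np.linalg.norm(a) + 1e-12         # keep a, b unit-norm;
        b = np.einsum('ijk,i,k->j', X, a, c)   # c carries the magnitude
        b /= np.linalg.norm(b) + 1e-12
        c = np.einsum('ijk,i,j->k', X, a, b)
    return a, b, c

# Greedily peel off two phenotype components, deflating the residual
# and clipping at zero to preserve nonnegativity.
components, residual = [], T.copy()
for _ in range(2):
    a, b, c = rank1_nonneg(residual)
    components.append((a, b, c))
    residual = np.maximum(residual - np.einsum('i,j,k->ijk', a, b, c), 0)
```

Each component reads as a candidate phenotype: `a` weights patients, `b` weights diagnoses, and `c` weights medications; greedy deflation trades global optimality for interpretable, one-at-a-time extraction.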