1
|
Maurits MP, Korsunsky I, Raychaudhuri S, Murphy SN, Smoller JW, Weiss ST, Petukhova LM, Weng C, Wei WQ, Huizinga TWJ, Reinders MJT, Karlson EW, van den Akker EB, Knevel R. A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history. J Am Med Inform Assoc 2022; 29:761-769. [PMID: 35139533 PMCID: PMC9122640 DOI: 10.1093/jamia/ocac008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2021] [Revised: 11/24/2021] [Accepted: 01/27/2022] [Indexed: 11/23/2022] Open
Abstract
OBJECTIVE To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects. MATERIAL AND METHODS We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features. RESULTS We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles. DISCUSSION Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data. CONCLUSION We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
Collapse
Affiliation(s)
- Marc P Maurits
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
| | - Ilya Korsunsky
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Shawn N Murphy
- Research Information Science and Computing, Mass General Brigham, Boston, MA, USA
| | - Jordan W Smoller
- Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
| | - Scott T Weiss
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Lynn M Petukhova
- Lynn M. Petukhova, Department of Dermatology at NewYork-Presbyterian/Columbia University Medical Center (CUMC)
| | - Chunhua Weng
- Chunhua Weng, Biomedical Informatics - Columbia University
| | - Wei-Qi Wei
- Wei-Qi Wei, Biomedical Informatics in the School of Medicine at Vanderbilt University Wei
| | - Thomas W J Huizinga
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
| | - Marcel J T Reinders
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- The Delft Bioinformatics Lab, Delft University of Technology, Delft, The Netherlands
| | - Elizabeth W Karlson
- Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Erik B van den Akker
- Leiden Computational Biology Center, Leiden University Medical Center, Leiden, The Netherlands
- Section of Molecular Epidemiology, Leiden University Medical Center, Leiden, The Netherlands
| | - Rachel Knevel
- Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
- Division of Rheumatology, Inflammation and Immunity, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| |
Collapse
|