1. Sikström S, Valavičiūtė I, Kuusela I, Evors N. Question-based computational language approach outperforms rating scales in quantifying emotional states. Communications Psychology 2024; 2:45. [PMID: 39242812 PMCID: PMC11332055 DOI: 10.1038/s44271-024-00097-2] [Received: 07/31/2023] [Accepted: 05/03/2024] [Indexed: 09/09/2024]
Abstract
Psychological constructs are commonly quantified with closed-ended rating scales. However, recent advancements in natural language processing (NLP) enable the quantification of open-ended language responses. Here we demonstrate that descriptive word responses analyzed using NLP show higher accuracy in categorizing emotional states compared to traditional rating scales. One group of participants (N = 297) generated narratives related to depression, anxiety, satisfaction, or harmony, summarized them with five descriptive words, and rated them using rating scales. Another group (N = 434) evaluated these narratives (with descriptive words and rating scales) from the author's perspective. The descriptive words were quantified using NLP, and machine learning was used to categorize the responses into the corresponding emotional states. The results showed a significantly higher number of accurate categorizations of the narratives based on descriptive words (64%) than on rating scales (44%), questioning the notion that rating scales are more precise in measuring emotional states than language-based measures.
Affiliation(s)
- Sverker Sikström
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden
- Ieva Valavičiūtė
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden
- Inari Kuusela
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden
- Nicole Evors
  - Department of Psychology, Lund University, Lund, SE-221 00, Sweden

2. Deo AJ, Castro VM, Baker A, Carroll D, Gonzalez-Heydrich J, Henderson DC, Holt DJ, Hook K, Karmacharya R, Roffman JL, Madsen EM, Song E, Adams WG, Camacho L, Gasman S, Gibbs JS, Fortgang RG, Kennedy CJ, Lozinski G, Perez DC, Wilson M, Reis BY, Smoller JW. Validation of an ICD-Code-Based Case Definition for Psychotic Illness Across Three Health Systems. Schizophr Bull 2024:sbae064. [PMID: 38728421 DOI: 10.1093/schbul/sbae064] [Indexed: 05/12/2024]
Abstract
BACKGROUND AND HYPOTHESIS: Psychosis-associated diagnostic codes are increasingly being utilized as case definitions for electronic health record (EHR)-based algorithms to predict and detect psychosis. However, data on the validity of psychosis-related diagnostic codes is limited. We evaluated the positive predictive value (PPV) of International Classification of Diseases (ICD) codes for psychosis. STUDY DESIGN: Using EHRs at 3 health systems, ICD codes comprising primary psychotic disorders and mood disorders with psychosis were grouped into 5 higher-order groups. A total of 1133 records were sampled for chart review using the full EHR. PPVs (the probability of chart-confirmed psychosis given ICD psychosis codes) were calculated across multiple treatment settings. STUDY RESULTS: PPVs across all diagnostic groups and hospital systems exceeded 70%: Mass General Brigham 0.72 [95% CI 0.68-0.77], Boston Children's Hospital 0.80 [0.75-0.84], and Boston Medical Center 0.83 [0.79-0.86]. Schizoaffective disorder PPVs were consistently the highest across sites (0.80-0.92), and PPVs for major depressive disorder with psychosis were the most variable (0.57-0.79). To determine if the first documented code captured first-episode psychosis (FEP), we excluded cases with prior chart evidence of a diagnosis of or treatment for a psychotic illness, yielding substantially lower PPVs (0.08-0.62). CONCLUSIONS: We found that the first documented psychosis diagnostic code accurately captured true episodes of psychosis but was a poor index of FEP. These data have important implications for the case definitions used in the development of risk prediction models designed to predict or detect undiagnosed psychosis.
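The PPV described in this abstract (chart-confirmed cases divided by sampled records carrying a psychosis code) can be illustrated with a short sketch. The counts below are hypothetical and not taken from the study, and the Wilson score interval is an assumed choice: the abstract does not say which interval method the authors used.

```python
import math


def ppv_with_wilson_ci(confirmed, sampled, z=1.96):
    """Return (ppv, lower, upper) for a proportion using the Wilson score interval.

    confirmed: records where chart review confirmed psychosis (hypothetical input)
    sampled:   records with a psychosis ICD code that were reviewed
    z:         normal quantile; 1.96 gives an approximate 95% interval
    """
    if sampled <= 0 or not 0 <= confirmed <= sampled:
        raise ValueError("need 0 <= confirmed <= sampled and sampled > 0")
    p = confirmed / sampled
    denom = 1 + z ** 2 / sampled
    centre = (p + z ** 2 / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z ** 2 / (4 * sampled ** 2)) / denom
    return p, centre - half, centre + half


# Hypothetical counts, for illustration only.
ppv, lower, upper = ppv_with_wilson_ci(confirmed=80, sampled=100)
print(f"PPV = {ppv:.2f} [95% CI {lower:.2f}-{upper:.2f}]")
```

The Wilson interval is used here only because it behaves well for proportions near 0 or 1; any other binomial interval could be substituted.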
Affiliation(s)
- Anthony J Deo
  - Department of Psychiatry and Behavioral Sciences, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Department of Psychiatry, Rutgers-Robert Wood Johnson Medical School, Piscataway, NJ, USA
  - Psychiatric Evaluation of Adolescent and Child Experiences (P.E.A.C.E.) Program, Rutgers University Behavioral Health Care, Piscataway, NJ, USA
- Victor M Castro
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA, USA
- Devon Carroll
  - Department of Psychiatry and Behavioral Sciences, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - College of Nursing, University of Rhode Island, Providence, RI, USA
- Joseph Gonzalez-Heydrich
  - Department of Psychiatry and Behavioral Sciences, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Tommy Fuss Center for Neuropsychiatric Disease Research, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - Early Psychosis Investigation Center, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
- David C Henderson
  - Boston Medical Center, Boston, MA, USA
  - Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Daphne J Holt
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Kimberly Hook
  - Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Rakesh Karmacharya
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Chemical Biology and Therapeutic Science Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Schizophrenia and Bipolar Disorder Program, McLean Hospital, Belmont, MA, USA
- Joshua L Roffman
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Emily M Madsen
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Eugene Song
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Jada S Gibbs
  - Rutgers New Jersey Medical School, Newark, NJ, USA
- Rebecca G Fortgang
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Department of Psychology, Harvard University, Cambridge, MA, USA
- Chris J Kennedy
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Daisy C Perez
  - Boston Medical Center, Boston, MA, USA
  - Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Marina Wilson
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Ben Y Reis
  - Predictive Medicine Group, Harvard Medical School, Boston, MA, USA
  - Computational Health Informatics Program, Boston Children's Hospital, Boston, MA, USA
- Jordan W Smoller
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA

3. Deo AJ, Castro VM, Baker A, Carroll D, Gonzalez-Heydrich J, Henderson DC, Holt DJ, Hook K, Karmacharya R, Roffman JL, Madsen EM, Song E, Adams WG, Camacho L, Gasman S, Gibbs JS, Fortgang RG, Kennedy CJ, Lozinski G, Perez DC, Wilson M, Reis BY, Smoller JW. Validation of an ICD-code-based case definition for psychotic illness across three health systems. medRxiv: the preprint server for health sciences 2024:2024.02.28.24303443. [PMID: 38464074 PMCID: PMC10925367 DOI: 10.1101/2024.02.28.24303443] [Indexed: 03/12/2024]
Abstract
Background and Hypothesis: Early detection of psychosis is critical for improving outcomes. Algorithms to predict or detect psychosis using electronic health record (EHR) data depend on the validity of the case definitions used, typically based on diagnostic codes. Data on the validity of psychosis-related diagnostic codes is limited. We evaluated the positive predictive value (PPV) of International Classification of Diseases (ICD) codes for psychosis. Study Design: Using EHRs at three health systems, ICD codes comprising primary psychotic disorders and mood disorders with psychosis were grouped into five higher-order groups. A total of 1,133 records were sampled for chart review using the full EHR. PPVs (the probability of chart-confirmed psychosis given ICD psychosis codes) were calculated across multiple treatment settings. Study Results: PPVs across all diagnostic groups and hospital systems exceeded 70%: Mass General Brigham 0.72 [95% CI 0.68-0.77], Boston Children's Hospital 0.80 [0.75-0.84], and Boston Medical Center 0.83 [0.79-0.86]. Schizoaffective disorder PPVs were consistently the highest across sites (0.80-0.92), and PPVs for major depressive disorder with psychosis were the most variable (0.57-0.79). To determine if the first documented code captured first-episode psychosis (FEP), we excluded cases with prior chart evidence of a diagnosis of or treatment for a psychotic illness, yielding substantially lower PPVs (0.08-0.62). Conclusions: We found that the first documented psychosis diagnostic code accurately captured true episodes of psychosis but was a poor index of FEP. These data have important implications for the development of risk prediction models designed to predict or detect undiagnosed psychosis.
Affiliation(s)
- Anthony J. Deo
  - Department of Psychiatry and Behavioral Sciences, Boston Children’s Hospital, Harvard Medical School, Boston, MA
  - Department of Psychiatry, Harvard Medical School, Boston, MA
  - Department of Psychiatry, Rutgers-Robert Wood Johnson Medical School, Piscataway, NJ
  - Rutgers University Behavioral Health Care, Piscataway, NJ
- Victor M. Castro
  - Research Information Science and Computing, Mass General Brigham, Somerville, MA
- Devon Carroll
  - Department of Psychiatry and Behavioral Sciences, Boston Children’s Hospital, Harvard Medical School, Boston, MA
  - University of Rhode Island, Providence, RI, USA
- Joseph Gonzalez-Heydrich
  - Department of Psychiatry and Behavioral Sciences, Boston Children’s Hospital, Harvard Medical School, Boston, MA
  - Department of Psychiatry, Harvard Medical School, Boston, MA
  - Tommy Fuss Center for Neuropsychiatric Disease Research, Boston Children’s Hospital, Harvard Medical School, Boston, MA
  - Early Psychosis Investigation Center, Boston Children’s Hospital, Harvard Medical School, Boston, MA
- David C. Henderson
  - Boston Medical Center, Boston, MA
  - Boston University Chobanian & Avedisian School of Medicine, Boston, MA
- Daphne J. Holt
  - Department of Psychiatry, Harvard Medical School, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Kimberly Hook
  - Harvard T.H. Chan School of Public Health, Boston, MA
- Rakesh Karmacharya
  - Department of Psychiatry, Harvard Medical School, Boston, MA
  - Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA
  - Chemical Biology and Therapeutic Science Program, Broad Institute of MIT and Harvard, Cambridge, MA
  - Schizophrenia and Bipolar Disorder Program, McLean Hospital, Belmont, MA
- Joshua L. Roffman
  - Department of Psychiatry, Harvard Medical School, Boston, MA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Emily M. Madsen
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Eugene Song
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- William G. Adams
  - Boston Medical Center, Boston, MA
  - Boston University Chobanian & Avedisian School of Medicine, Boston, MA
- Jada S. Gibbs
  - Rutgers New Jersey Medical School, Newark, New Jersey 07103
- Rebecca G. Fortgang
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Department of Psychology, Harvard University, Cambridge, MA
- Chris J. Kennedy
  - Department of Psychiatry, Harvard Medical School, Boston, MA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Daisy C. Perez
  - Boston Medical Center, Boston, MA
  - Boston University Chobanian & Avedisian School of Medicine, Boston, MA
- Marina Wilson
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Ben Y. Reis
  - Predictive Medicine Group, Harvard Medical School, Boston, MA
  - Computational Health Informatics Program, Boston Children’s Hospital, Boston, MA
- Jordan W. Smoller
  - Department of Psychiatry, Harvard Medical School, Boston, MA
  - Psychiatric & Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA

4. Walsh CG, Ripperger MA, Hu Y, Sheu YH, Lee H, Wilimitis D, Zheutlin AB, Rocha D, Choi KW, Castro VM, Kirchner HL, Chabris CF, Davis LK, Smoller JW. Development and multi-site external validation of a generalizable risk prediction model for bipolar disorder. Transl Psychiatry 2024; 14:58. [PMID: 38272862 PMCID: PMC10810911 DOI: 10.1038/s41398-023-02720-y] [Received: 02/21/2023] [Revised: 11/29/2023] [Accepted: 12/15/2023] [Indexed: 01/27/2024]
Abstract
Bipolar disorder is a leading contributor to disability, premature mortality, and suicide. Early identification of risk for bipolar disorder using generalizable predictive models trained on diverse cohorts around the United States could improve targeted assessment of high-risk individuals, reduce misdiagnosis, and improve the allocation of limited mental health resources. This observational case-control study intended to develop and validate generalizable predictive models of bipolar disorder as part of the multisite, multinational PsycheMERGE Network across diverse and large biobanks with linked electronic health records (EHRs) from three academic medical centers: in the Northeast (Massachusetts General Brigham), the Mid-Atlantic (Geisinger) and the Mid-South (Vanderbilt University Medical Center). Predictive models were developed and validated with multiple algorithms at each study site: random forests, gradient boosting machines, and penalized regression, as well as stacked ensemble learning algorithms combining them. Predictors were limited to widely available EHR-based features agnostic to a common data model including demographics, diagnostic codes, and medications. The main study outcome was bipolar disorder diagnosis as defined by the International Cohort Collection for Bipolar Disorder, 2015. In total, the study included records for 3,529,569 patients including 12,533 cases (0.3%) of bipolar disorder. After internal and external validation, algorithms demonstrated optimal performance in their respective development sites. The stacked ensemble achieved the best combination of overall discrimination (AUC = 0.82-0.87) and calibration performance with positive predictive values above 5% in the highest risk quantiles at all three study sites. In conclusion, generalizable predictive models of risk for bipolar disorder can be feasibly developed across diverse sites to enable precision medicine. Comparison of a range of machine learning methods indicated that an ensemble approach provides the best performance overall but required local retraining. These models will be disseminated via the PsycheMERGE Network website.
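The phrase "positive predictive values above 5% in the highest risk quantiles" can be made concrete with a small sketch: rank patients by predicted risk, keep the top fraction, and compute the share of true cases among them. The function name, scores, and labels below are hypothetical illustrations and do not reproduce the study's models or data.

```python
import math


def ppv_in_top_quantile(scores, labels, fraction):
    """PPV among the highest-scoring `fraction` of patients.

    scores:   predicted risks (any sortable numbers)
    labels:   1 for a case, 0 for a control
    fraction: share of patients to flag, e.g. 0.01 for the top 1%
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, math.ceil(len(scores) * fraction))
    # Sort by score, highest first, and keep the k highest-risk patients.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / k


# Hypothetical toy data: 10 patients, 2 of them cases.
toy_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.05, 0.04, 0.03, 0.02, 0.01]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(ppv_in_top_quantile(toy_scores, toy_labels, fraction=0.2))
```

With a base rate as low as the 0.3% reported in the abstract, a PPV of 5% in the flagged group would correspond to roughly a 17-fold enrichment over the base rate, which is why PPV is reported per risk quantile rather than overall.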
Affiliation(s)
- Colin G Walsh
  - Vanderbilt University Medical Center Health System, Nashville, TN, USA
- Yirui Hu
  - Geisinger Health System, Danville, PA, USA
- Yi-Han Sheu
  - Massachusetts General-Brigham Health System, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Hyunjoon Lee
  - Vanderbilt University Medical Center Health System, Nashville, TN, USA
- Drew Wilimitis
  - Vanderbilt University Medical Center Health System, Nashville, TN, USA
- Karmel W Choi
  - Massachusetts General-Brigham Health System, Boston, MA, USA
- Victor M Castro
  - Massachusetts General-Brigham Health System, Boston, MA, USA
- Lea K Davis
  - Vanderbilt University Medical Center Health System, Nashville, TN, USA
- Jordan W Smoller
  - Massachusetts General-Brigham Health System, Boston, MA, USA
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA

5. Zhu T, Kou R, Hu Y, Yuan M, Yuan C, Luo L, Zhang W. Dissecting clinical and biological heterogeneity in clinical states of bipolar disorder: a 10-year retrospective study from China. Front Psychiatry 2023; 14:1128862. [PMID: 38179244 PMCID: PMC10764613 DOI: 10.3389/fpsyt.2023.1128862] [Received: 12/21/2022] [Accepted: 12/01/2023] [Indexed: 01/06/2024]
Abstract
Objectives: To dissect clinical and biological heterogeneity in clinical states of bipolar disorder (BD), and investigate if neuropsychological symptomatology, comorbidity, vital signs, and blood laboratory indicators are predictors of distinct BD states. Methods: A retrospective BD cohort was established with data extracted from a Chinese hospital's electronic medical records (EMR) between 2009 and 2018. Subjects were inpatients with a main discharge diagnosis of BD and were assessed for clinical state at hospitalization. We categorized all subjects into manic state, depressive state, and mixed state. Four machine learning classifiers were utilized to classify the subjects. A Shapley additive explanations (SHAP) algorithm was applied to the classifiers to aid in quantifying and visualizing the contributions of each feature that drive patient-specific classifications. Results: A sample of 3,085 records was included (38.54% as manic, 56.69% as depressive, and 4.77% as mixed state). Mixed state showed more severe suicidal ideation and psychomotor abnormalities, while depressive state showed more common anxiety, sleep, and somatic-related symptoms and more comorbid conditions. Higher levels of body temperature, pulse, and systolic and diastolic blood pressures were present during manic episodes. XGBoost achieved the best AUC of 88.54% in manic/depressive states classification; logistic regression and random forest achieved the best AUCs of 75.5% and 75% in manic/mixed states and depressive/mixed states classifications, respectively. Myocardial enzymes and the non-enzymatic antioxidants uric acid and bilirubin contributed significantly to distinguishing BD clinical states. Conclusion: The observed novel biological associations with BD clinical states confirm that biological heterogeneity contributes to clinical heterogeneity of BD.
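The AUC values reported in this abstract have a simple pairwise reading: the AUC is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below computes it that way on hypothetical scores; treating one clinical state as label 1 and the other as label 0 is an assumption made only for illustration.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: classifier outputs, higher meaning "more likely positive"
    labels: 1 for the positive class, 0 for the negative class
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores for four patients (two per class).
print(auc_pairwise([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

This quadratic-time version is meant for clarity; rank-based formulas give the same value more efficiently on large cohorts.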
Affiliation(s)
- Ting Zhu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
- Ran Kou
  - Business School, Sichuan University, Chengdu, China
- Yao Hu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
- Minlan Yuan
  - Mental Health Center of West China Hospital, Sichuan University, Chengdu, China
- Cui Yuan
  - Sichuan Provincial Center for Mental Health, The Center of Psychosomatic Medicine of Sichuan Provincial People’s Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Li Luo
  - Business School, Sichuan University, Chengdu, China
- Wei Zhang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
  - Mental Health Center of West China Hospital, Sichuan University, Chengdu, China

6. Zhu T, Liu X, Wang J, Kou R, Hu Y, Yuan M, Yuan C, Luo L, Zhang W. Explainable machine-learning algorithms to differentiate bipolar disorder from major depressive disorder using self-reported symptoms, vital signs, and blood-based markers. Computer Methods and Programs in Biomedicine 2023; 240:107723. [PMID: 37480646 DOI: 10.1016/j.cmpb.2023.107723] [Received: 08/18/2022] [Revised: 06/26/2023] [Accepted: 07/15/2023] [Indexed: 07/24/2023]
Abstract
BACKGROUND AND OBJECTIVE: Because of shared genetic risk factors and similar neuropsychological symptoms, bipolar disorder (BD) and major depressive disorder (MDD) are at high risk of misdiagnosis, which is associated with ineffective treatment and worsening of outcomes. We aimed to develop a machine learning (ML)-based diagnostic system, based on electronic medical records (EMR) data, to mimic the clinical reasoning of human physicians to differentiate MDD and BD (especially BD depressive episodes) patients about to be admitted to a hospital and, hence, reduce the misdiagnosis of BD as MDD on admission. In addition, we examined to what extent our ML model could be made interpretable by quantifying and visualizing the features that drive the predictions. METHODS: By identifying 16,311 patients admitted to a hospital located in western China between 2009 and 2018 with a recorded main diagnosis of MDD or BD, we established three sub-cohorts with different combinations of features for both the MDD-BD cohort and the MDD-BD depressive episodes cohort, respectively. Four different ML algorithms (logistic regression, extreme gradient boosting (XGBoost), random forest, and support vector machine) and four train-test splits were used to train and validate diagnostic models, and explainable methods (SHAP and Break Down) were utilized to analyze the contribution of each of the features at both population-level and individual-level, including feature importance, feature interaction, and feature effect on prediction decision for a specific subject. RESULTS: The XGBoost algorithm provided the best test performance (AUC: 0.838 (0.810-0.867), PPV: 0.810 and NPV: 0.834) for separating patients with BD from those with MDD. Core predictors included symptoms (mood-up, exciting, bad sleep, loss of interest, talking, mood-down, provoke), along with age, job, myocardial enzyme markers (creatine kinase, hydroxybutyrate dehydrogenase), diabetes-associated marker (glucose), bone function marker (alkaline phosphatase), non-enzymatic antioxidant (uric acid), markers of immune/inflammation (white blood cell count, lymphocyte count, basophil percentage, monocyte count), cardiovascular function marker (low density lipoprotein), renal marker (total protein), liver biochemistry marker (indirect bilirubin), and vital signs like pulse. For separating patients with BD depressive episodes from those with MDD, the test AUC was 0.777 (0.732-0.822), with PPV 0.576 and NPV 0.899. Additional validation in models built with self-reported symptoms removed from the feature set showed test AUC of 0.701 (0.666-0.736) for differentiating BD and MDD, and AUC of 0.564 (0.515-0.614) for detecting patients in BD depressive episodes from MDD patients. Validation in the datasets without removing the patients with comorbidity showed an AUC of 0.826 (0.806-0.846). CONCLUSION: The diagnostic system accurately identified patients with BD in various clinical scenarios, and differences in patterns of peripheral markers between BD and MDD could enrich our understanding of their potential underlying pathophysiological mechanisms.
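The PPV and NPV figures quoted alongside the AUCs depend on a decision threshold applied to the model's scores. A minimal sketch of that calculation follows; the scores, the 0.5 threshold, and the convention that BD is the positive class (label 1) and MDD the negative class (label 0) are assumptions for illustration, not details given in the abstract.

```python
def ppv_npv(scores, labels, threshold=0.5):
    """Return (PPV, NPV) after thresholding scores into predictions.

    scores:    model outputs, higher meaning "more likely BD" (assumed convention)
    labels:    1 for BD, 0 for MDD (assumed convention)
    threshold: scores at or above this value are predicted positive
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 0:
            tn += 1
        else:
            fn += 1
    # PPV: share of predicted positives that are true positives.
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    # NPV: share of predicted negatives that are true negatives.
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Hypothetical scores for five patients.
print(ppv_npv([0.9, 0.8, 0.4, 0.3, 0.2], [1, 0, 1, 0, 0], threshold=0.5))
```

Unlike the AUC, both values shift with class prevalence, which may help explain why the PPV drops and the NPV rises in the BD-depressive-episode comparison reported above.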
Affiliation(s)
- Ting Zhu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
- Xiaofei Liu
  - Business School, Sichuan University, Chengdu, China
- Junren Wang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
- Ran Kou
  - Business School, Sichuan University, Chengdu, China
- Yao Hu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
- Minlan Yuan
  - Mental Health Center of West China Hospital, Sichuan University, Chengdu, China
- Cui Yuan
  - Sichuan Provincial Center for Mental Health, The Center of Psychosomatic Medicine of Sichuan Provincial People's Hospital, University of Electronic Science and Technology of China, Chengdu, China
- Li Luo
  - Business School, Sichuan University, Chengdu, China
- Wei Zhang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, China
  - Mental Health Center of West China Hospital, Sichuan University, Chengdu, China

7. Kirchner HL, Rocha D, Linner RK, Wilimitis D, Walsh CG, Ripperger M, Lee H, Liu Z, Davis L, Hu Y, Chabris CF, Smoller JW. Association Between Psychiatric Polygenic Scores, Healthcare Utilization and Comorbidity Burden. medRxiv: the preprint server for health sciences 2023:2023.09.29.23296345. [PMID: 37808705 PMCID: PMC10557834 DOI: 10.1101/2023.09.29.23296345] [Indexed: 10/10/2023]
Abstract
Purpose: To estimate the association of psychiatric polygenic scores with healthcare utilization and comorbidity burden. Methods: Observational cohort study (N = 118,882) of adolescent and adult biobank participants with linked electronic health records (EHRs) from three diverse study sites (Massachusetts General Brigham, Vanderbilt University Medical Center, Geisinger). Polygenic scores (PGS) were derived from the largest available GWAS of major depressive disorder, bipolar disorder, and schizophrenia at the time of analysis. Negative binomial regression models were used to estimate the association between each psychiatric PGS and healthcare utilization and comorbidity burden. Healthcare utilization was measured as frequency of emergency department (ED), inpatient (IP), and outpatient (OP) visits. Comorbidity burden was defined by the Elixhauser Comorbidity Index and the Charlson Comorbidity Index. Results: Participants had a median follow-up duration of 12 years in the EHR. Individuals in the top decile of polygenic score for major depressive disorder had significantly more ED visits (RR=1.22, 95% CI 1.17, 1.29) compared to those in the lowest decile. Increases were also observed with IP and comorbidity burden. Among those diagnosed with depression and in the highest decile of the PGS, there was an increase in all utilization types (ED: RR=1.56, 95% CI 1.41, 1.72; OP: RR=1.16, 95% CI 1.08, 1.24; IP: RR=1.23, 95% CI 1.12, 1.36) post-diagnosis. No clinically significant results were observed with bipolar and schizophrenia polygenic scores. Conclusions: Polygenic score for depression is modestly associated with increased healthcare resource utilization and comorbidity burden, in the absence of diagnosis. Following a diagnosis of depression, the PGS was associated with further increases in healthcare utilization. These findings suggest that depression genetic risk is associated with utilization and burden of chronic disease in real-world settings.
Affiliation(s)
- Daniel Rocha
  - Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA
- Richard K Linner
  - Department of Bioethics and Decision Sciences, Geisinger, Danville, PA
- Drew Wilimitis
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, TN
- Colin G Walsh
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, TN
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
  - Department of Psychiatry and Behavioral Sciences, Vanderbilt University Medical Center, Nashville, TN
- Michael Ripperger
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University School of Medicine, Nashville, TN
- Hyunjoon Lee
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Zhaowen Liu
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
- Lea Davis
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN
- Yirui Hu
  - Department of Population Health Sciences, Geisinger, Danville, PA
- Jordan W Smoller
  - Center for Precision Psychiatry, Department of Psychiatry, Massachusetts General Hospital, Boston, MA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA

8. Walsh CG, Ripperger MA, Hu Y, Sheu YH, Wilimitis D, Zheutlin AB, Rocha D, Choi KW, Castro VM, Kirchner HL, Chabris CF, Davis LK, Smoller JW. Development and Multi-Site External Validation of a Generalizable Risk Prediction Model for Bipolar Disorder. medRxiv: the preprint server for health sciences 2023:2023.02.21.23286251. [PMID: 36865341 PMCID: PMC9980254 DOI: 10.1101/2023.02.21.23286251] [Indexed: 02/27/2023]
Abstract
Bipolar disorder is a leading contributor to disability, premature mortality, and suicide. Early identification of risk for bipolar disorder using generalizable predictive models trained on diverse cohorts around the United States could improve targeted assessment of high risk individuals, reduce misdiagnosis, and improve the allocation of limited mental health resources. This observational case-control study intended to develop and validate generalizable predictive models of bipolar disorder as part of the multisite, multinational PsycheMERGE Consortium across diverse and large biobanks with linked electronic health records (EHRs) from three academic medical centers: in the Northeast (Massachusetts General Brigham), the Mid-Atlantic (Geisinger) and the Mid-South (Vanderbilt University Medical Center). Predictive models were developed and validated with multiple algorithms at each study site: random forests, gradient boosting machines, penalized regression, including stacked ensemble learning algorithms combining them. Predictors were limited to widely available EHR-based features agnostic to a common data model including demographics, diagnostic codes, and medications. The main study outcome was bipolar disorder diagnosis as defined by the International Cohort Collection for Bipolar Disorder, 2015. In total, the study included records for 3,529,569 patients including 12,533 cases (0.3%) of bipolar disorder. After internal and external validation, algorithms demonstrated optimal performance in their respective development sites. The stacked ensemble achieved the best combination of overall discrimination (AUC = 0.82 - 0.87) and calibration performance with positive predictive values above 5% in the highest risk quantiles at all three study sites. In conclusion, generalizable predictive models of risk for bipolar disorder can be feasibly developed across diverse sites to enable precision medicine. 
Comparison of a range of machine learning methods indicated that an ensemble approach provides the best performance overall but required local retraining. These models will be disseminated via the PsycheMERGE Consortium website.
|
9
|
Kishimoto T, Nakamura H, Kano Y, Eguchi Y, Kitazawa M, Liang KC, Kudo K, Sento A, Takamiya A, Horigome T, Yamasaki T, Sunami Y, Kikuchi T, Nakajima K, Tomita M, Bun S, Momota Y, Sawada K, Murakami J, Takahashi H, Mimura M. Understanding psychiatric illness through natural language processing (UNDERPIN): Rationale, design, and methodology. Front Psychiatry 2022; 13:954703. [PMID: 36532181 PMCID: PMC9752868 DOI: 10.3389/fpsyt.2022.954703] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2022] [Accepted: 11/11/2022] [Indexed: 12/04/2022] Open
Abstract
Introduction: Psychiatric disorders are diagnosed through observations of psychiatrists according to diagnostic criteria such as the DSM-5. Such observations, however, are mainly based on each psychiatrist's level of experience and often lack objectivity, potentially leading to disagreements among psychiatrists. In contrast, specific linguistic features can be observed in some psychiatric disorders, such as a loosening of associations in schizophrenia. Some studies explored biomarkers, but biomarkers have yet to be used in clinical practice. Aim: The purposes of this study are to create a large dataset of Japanese speech data labeled with detailed information on psychiatric disorders and neurocognitive disorders, to quantify the linguistic features of those disorders using natural language processing and, finally, to develop objective and easy-to-use biomarkers for diagnosing these disorders and assessing their severity. Methods: This study will have a multi-center prospective design. The DSM-5 or ICD-11 criteria for major depressive disorder, bipolar disorder, schizophrenia, and anxiety disorder and for major and minor neurocognitive disorders will be regarded as the inclusion criteria for the psychiatric disorder samples. For the healthy subjects, the absence of a history of psychiatric disorders will be confirmed using the Mini-International Neuropsychiatric Interview (M.I.N.I.). The absence of current cognitive decline will be confirmed using the Mini-Mental State Examination (MMSE). A psychiatrist or psychologist will conduct 30-to-60-min interviews with each participant; these interviews will include a free conversation, a picture-description task, and a story-telling task, all of which will be recorded using a microphone headset. In addition, the severity of disorders will be assessed using clinical rating scales. Data will be collected from each participant at least twice during the study period and up to a maximum of five times at an interval of at least one month.
Discussion: This study is unique in its large sample size and the novelty of its method, and has potential for applications in many fields. We have some challenges regarding inter-rater reliability and the linguistic peculiarities of Japanese. As of September 2022, we have collected a total of >1000 records from >400 participants. To the best of our knowledge, this data sample is one of the largest in this field. Clinical Trial Registration: Identifier UMIN000032141.
Affiliation(s)
- Taishiro Kishimoto
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Hills Joint Research Laboratory for Future Preventive Medicine and Wellness, Keio University School of Medicine, Tokyo, Japan
| | - Hironobu Nakamura
- Department of Psychiatry and Behavioral Sciences, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Yoshinobu Kano
- Faculty of Informatics, Shizuoka University, Shizuoka, Japan
| | - Yoko Eguchi
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Momoko Kitazawa
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Kuo-ching Liang
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Koki Kudo
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Department of Neuropsychiatry, St. Marianna University School of Medicine Hospital, Kawasaki, Japan
| | - Ayako Sento
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Akihiro Takamiya
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Toshiro Horigome
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Toshihiko Yamasaki
- Computer Vision and Media Lab (Yamasaki Lab), Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
| | - Yuki Sunami
- Keio University School of Medicine, Tokyo, Japan
| | - Toshiaki Kikuchi
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Kazuki Nakajima
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | | | - Shogyoku Bun
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
- Department of Psychiatry, Koutokukai Sato Hospital, Yamagata, Japan
| | - Yuki Momota
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | - Kyosuke Sawada
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| | | | - Hidehiko Takahashi
- Department of Psychiatry and Behavioral Sciences, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
| | - Masaru Mimura
- Department of Neuropsychiatry, Keio University School of Medicine, Tokyo, Japan
| |
|
10
|
Chen ZS, Kulkarni P(P, Galatzer-Levy IR, Bigio B, Nasca C, Zhang Y. Modern views of machine learning for precision psychiatry. PATTERNS (NEW YORK, N.Y.) 2022; 3:100602. [PMID: 36419447 PMCID: PMC9676543 DOI: 10.1016/j.patter.2022.100602] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Indexed: 11/13/2022]
Abstract
In light of the National Institute of Mental Health (NIMH)'s Research Domain Criteria (RDoC), the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for the new role of ML/AI for digital phenotyping in mobile mental health. In this review, we provide a comprehensive review of ML methodologies and applications by combining neuroimaging, neuromodulation, and advanced mobile technologies in psychiatry practice. We further review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We also discuss explainable AI (XAI) and neuromodulation in a closed human-in-the-loop manner and highlight the ML potential in multi-media information extraction and multi-modal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities in future research.
Affiliation(s)
- Zhe Sage Chen
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Department of Neuroscience and Physiology, New York University Grossman School of Medicine, New York, NY 10016, USA
- The Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA
- Department of Biomedical Engineering, New York University Tandon School of Engineering, Brooklyn, NY 11201, USA
| | | | - Isaac R. Galatzer-Levy
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- Meta Reality Lab, New York, NY, USA
| | - Benedetta Bigio
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
| | - Carla Nasca
- Department of Psychiatry, New York University Grossman School of Medicine, New York, NY 10016, USA
- The Neuroscience Institute, New York University Grossman School of Medicine, New York, NY 10016, USA
| | - Yu Zhang
- Department of Bioengineering, Lehigh University, Bethlehem, PA 18015, USA
- Department of Electrical and Computer Engineering, Lehigh University, Bethlehem, PA 18015, USA
| |
|
11
|
Benson NM, Yang Z, Weiss M, Fung V, Moran LV, Öngür D, Hsu J. Identifying Diagnoses of Schizophrenia Spectrum Disorder in Large Data Sets. Psychiatr Serv 2022; 73:1210-1216. [PMID: 35440163 PMCID: PMC9582046 DOI: 10.1176/appi.ps.202100696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
Abstract
Objective: The authors used a large clinical data set to determine which index diagnoses of schizophrenia spectrum disorder were new diagnoses. Methods: Using the Massachusetts All-Payer Claims Database (2012–2016), the authors identified patients with a schizophrenia spectrum disorder diagnosis in 2016 (index diagnosis) and then reviewed patients’ care histories for the previous 12, 24, 36, and 48 months to identify previous diagnoses. Logistic regression was used to examine patient characteristics associated with the index diagnosis being a new diagnosis. Results: Overall, 7,217 individuals ages 15–35 years had a 2016 diagnosis of schizophrenia spectrum disorder; 67.7% had at least 48 months of historical data. Among those with at least 48 months of care history, 23% had no previous diagnoses. Diagnoses from inpatient psychiatric admissions or among female or younger patients were more likely to represent new diagnoses, compared with diagnoses from most other diagnosis locations or among males or older age groups, and outpatient diagnoses were less likely to represent new diagnoses than were most other diagnosis settings. Reviewing 48 instead of 12 months of data reduced estimated rates of new diagnoses from 112 to 66 per 100,000 persons; historical diagnoses were detected for 61% and 77% of patients with 12 or 48 months of care history, respectively. Conclusions: Examining multiple years of patient history spanning all payers and providers is critical to identifying new schizophrenia spectrum disorder diagnoses in large data sets. Review of 48 months of care history resulted in lower rates of new schizophrenia spectrum disorder diagnoses than previously reported.
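The lookback idea in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, the example dates, and the whole-calendar-month approximation are hypothetical and are not taken from the study, which does not describe its implementation in the abstract.

```python
from datetime import date


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from `earlier` to `later` (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def is_new_diagnosis(index_date: date, prior_dates: list[date], lookback_months: int) -> bool:
    """Return True when no earlier diagnosis falls inside the lookback window.

    Hypothetical rule: a prior diagnosis counts only if it precedes the index
    date and lies at most `lookback_months` calendar months before it.
    """
    for prior in prior_dates:
        if prior < index_date and months_between(prior, index_date) <= lookback_months:
            return False
    return True


# Hypothetical patient: index diagnosis in June 2016, one earlier diagnosis in September 2013.
index = date(2016, 6, 15)
history = [date(2013, 9, 1)]

for window in (12, 24, 36, 48):
    print(window, is_new_diagnosis(index, history, window))
```

With this made-up history the earlier diagnosis is 33 calendar months before the index date, so the index diagnosis looks new under a 12- or 24-month window but not under a 36- or 48-month window. That is the mechanism by which a longer lookback can lower the estimated rate of new diagnoses.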
Affiliation(s)
- Nicole M. Benson
- McLean Hospital, Harvard Medical School, Belmont, MA
- Mongan Institute, Massachusetts General Hospital, Boston, MA
| | - Zhiyou Yang
- Mongan Institute, Massachusetts General Hospital, Boston, MA
| | - Max Weiss
- Mongan Institute, Massachusetts General Hospital, Boston, MA
| | - Vicki Fung
- Mongan Institute, Massachusetts General Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| | | | - Dost Öngür
- McLean Hospital, Harvard Medical School, Belmont, MA
| | - John Hsu
- Mongan Institute, Massachusetts General Hospital, Boston, MA
- Department of Medicine, Harvard Medical School, Boston, MA
| |
|
12
|
Ahuja Y, Wen J, Hong C, Xia Z, Huang S, Cai T. A semi-supervised adaptive Markov Gaussian embedding process (SAMGEP) for prediction of phenotype event times using the electronic health record. Sci Rep 2022; 12:17737. [PMID: 36273240 PMCID: PMC9588081 DOI: 10.1038/s41598-022-22585-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/27/2021] [Accepted: 10/17/2022] [Indexed: 01/18/2023] Open
Abstract
While there exist numerous methods to identify binary phenotypes (e.g., COPD) using electronic health record (EHR) data, few exist to ascertain the timings of phenotype events (e.g., COPD onset or exacerbations). Estimating event times could enable more powerful use of EHR data for longitudinal risk modeling, including survival analysis. Here we introduce Semi-supervised Adaptive Markov Gaussian Embedding Process (SAMGEP), a semi-supervised machine learning algorithm to estimate phenotype event times using EHR data with limited observed labels, which require resource-intensive chart review to obtain. SAMGEP models latent phenotype states as a binary Markov process, and it employs an adaptive weighting strategy to map timestamped EHR features to an embedding function that it models as a state-dependent Gaussian process. SAMGEP's feature weighting achieves meaningful feature selection, and its predictions significantly improve AUCs and F1 scores over existing approaches in diverse simulations and real-world settings. It is particularly adept at predicting cumulative risk and event counting process functions, and is robust to diverse generative model parameters. Moreover, it achieves high accuracy with few (50-100) labels, efficiently leveraging unlabeled EHR data to maximize information gain from costly-to-obtain event time labels. SAMGEP can be used to estimate accurate phenotype state functions for risk modeling research.
Affiliation(s)
- Yuri Ahuja
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA, USA
- Department of Medicine, NYU Langone Health, New York, NY, USA
| | - Jun Wen
- Harvard Medical School, Boston, MA, USA
| | - Chuan Hong
- Harvard Medical School, Boston, MA, USA
| | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, USA
| | - Sicong Huang
- Harvard Medical School, Boston, MA, USA
- Division of Rheumatology, Inflammation, and Immunity, Brigham and Women’s Hospital, Boston, MA, USA
- VA Boston Healthcare System, Boston, MA, USA
| | - Tianxi Cai
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Harvard Medical School, Boston, MA, USA
- VA Boston Healthcare System, Boston, MA, USA
| |
|
13
|
Swerdel JN, Schuemie M, Murray G, Ryan PB. PheValuator 2.0: Methodological improvements for the PheValuator approach to semi-automated phenotype algorithm evaluation. J Biomed Inform 2022; 135:104177. [PMID: 35995107 DOI: 10.1016/j.jbi.2022.104177] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/18/2022] [Revised: 08/11/2022] [Accepted: 08/15/2022] [Indexed: 10/31/2022]
Abstract
PURPOSE: Phenotype algorithms are central to performing analyses using observational data. These algorithms translate the clinical idea of a health condition into an executable set of rules allowing for queries of data elements from a database. PheValuator, a software package in the Observational Health Data Sciences and Informatics (OHDSI) tool stack, provides a method to assess the performance characteristics of these algorithms, namely, sensitivity, specificity, and positive and negative predictive value. It uses machine learning to develop predictive models for determining a probabilistic gold standard of subjects for assessment of cases and non-cases of health conditions. PheValuator was developed to complement or even replace the traditional approach of algorithm validation, i.e., by expert assessment of subject records through chart review. Results in our first PheValuator paper suggest a systematic underestimation of the PPV compared to previous results using chart review. In this paper we evaluate modifications made to the method designed to improve its performance. METHODS: The major changes to PheValuator included allowing all diagnostic conditions, clinical observations, drug prescriptions, and laboratory measurements to be included as predictors within the modeling process, whereas in the prior version there were significant restrictions on the included predictors. We also have allowed for the inclusion of the temporal relationships of the predictors in the model. To evaluate the performance of the new method, we compared the results from the new and original methods against results found from the literature using traditional validation of algorithms for 19 phenotypes. We performed these tests using data from five commercial databases.
RESULTS: In the assessment aggregating all phenotype algorithms, the median difference between the PheValuator estimate and the gold standard estimate for PPV was reduced from -21 (IQR -34, -3) in Version 1.0 to 4 (IQR -3, 15) using Version 2.0. We found a median difference in specificity of 3 (IQR 1, 4.25) for Version 1.0 and 3 (IQR 1, 4) for Version 2.0. The median difference between the two versions of PheValuator and the gold standard for estimates of sensitivity was reduced from -39 (-51, -20) to -16 (-34, -6). CONCLUSION: PheValuator 2.0 produces estimates for the performance characteristics for phenotype algorithms that are significantly closer to estimates from traditional validation through chart review compared to version 1.0. With this tool in researchers' toolkits, methods such as quantitative bias analysis may now be used to improve the reliability and reproducibility of research studies using observational data.
Affiliation(s)
- Joel N Swerdel
- Janssen Research and Development, Titusville, NJ, USA; Observational Health Data Sciences and Informatics (OHDSI), New York, NY.
| | - Martijn Schuemie
- Janssen Research and Development, Titusville, NJ, USA; Observational Health Data Sciences and Informatics (OHDSI), New York, NY
| | - Gayle Murray
- Janssen Research and Development, Titusville, NJ, USA
| | - Patrick B Ryan
- Janssen Research and Development, Titusville, NJ, USA; Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics (OHDSI), New York, NY
| |
|
14
|
Mahmoudi E, Wu W, Najarian C, Aikens J, Bynum J, Vydiswaran VV. Identify Caregiver Availability Using Medical Notes: Rule-Based Natural Language Processing. JMIR Aging 2022; 5:e40241. [PMID: 35998328 PMCID: PMC9539648 DOI: 10.2196/40241] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Revised: 07/28/2022] [Accepted: 08/16/2022] [Indexed: 11/23/2022] Open
Abstract
Background: Identifying caregiver availability, particularly for patients with dementia or those with a disability, is critical to informing the appropriate care planning by the health systems, hospitals, and providers. This information is not readily available, and there is a paucity of pragmatic approaches to automatically identifying caregiver availability and type. Objective: Our main objective was to use medical notes to assess caregiver availability and type for hospitalized patients with dementia. Our second objective was to identify whether the patient lived at home or resided at an institution. Methods: In this retrospective cohort study, we used 2016-2019 telephone-encounter medical notes from a single institution to develop a rule-based natural language processing (NLP) algorithm to identify the patient’s caregiver availability and place of residence. Using note-level data, we compared the results of the NLP algorithm with human-conducted chart abstraction for both training (749/976, 77%) and test sets (227/976, 23%) for a total of 223 adults aged 65 years and older diagnosed with dementia. Our outcomes included determining whether the patients (1) reside at home or in an institution, (2) have a formal caregiver, and (3) have an informal caregiver. Results: Test set results indicated that our NLP algorithm had a high level of accuracy and reliability for identifying whether patients had an informal caregiver (F1=0.94, accuracy=0.95, sensitivity=0.97, and specificity=0.93), but was relatively less able to identify whether the patient lived at an institution (F1=0.64, accuracy=0.90, sensitivity=0.51, and specificity=0.98). The most common explanations for NLP misclassifications across all categories were (1) incomplete or misspelled facility names; (2) past, uncertain, or undecided status; (3) uncommon abbreviations; and (4) irregular use of templates.
Conclusions: This innovative work was the first to use medical notes to pragmatically determine caregiver availability. Our NLP algorithm identified whether hospitalized patients with dementia have a formal or informal caregiver and, to a lesser extent, whether they lived at home or in an institutional setting. There is merit in using NLP to identify caregivers. This study serves as a proof of concept. Future work can use other approaches and further identify caregivers and the extent of their availability.
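The four metrics reported in this abstract all derive from one confusion matrix, which helps explain how accuracy can stay high while F1 drops when sensitivity is low. The sketch below uses hypothetical counts chosen only to show that pattern; they are not the study's data, and the class and function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm labels against chart abstraction."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute sensitivity, specificity, precision, accuracy, and F1."""
    sensitivity = c.tp / (c.tp + c.fn)   # recall on true cases
    specificity = c.tn / (c.tn + c.fp)   # recall on true non-cases
    precision = c.tp / (c.tp + c.fp)     # positive predictive value
    accuracy = (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical, imbalanced test set: 40 true cases among 200 notes.
metrics = classification_metrics(ConfusionCounts(tp=20, fp=4, tn=156, fn=20))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this made-up case half of the true cases are missed (sensitivity 0.5), yet accuracy is 0.88 because true negatives dominate the sample, while F1 falls to 0.625. The sketch assumes every denominator is nonzero; a real implementation would need to handle empty classes.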
Affiliation(s)
- Elham Mahmoudi
- Department of Family Medicine, Medical School, University of Michigan, Institute for Healthcare Policy and Innovation, University of Michigan, NCRC Building 14, Room G234, 2800 Plymouth Rd., Ann Arbor, US
| | - Wenbo Wu
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, US
| | - Cyrus Najarian
- University of Michigan Medical School, University of Michigan, Ann Arbor, US
| | - James Aikens
- Department of Family Medicine, Medical School, University of Michigan, Ann Arbor, US
| | - Julie Bynum
- Medical School, University of Michigan, Ann Arbor, US
| | - Vg Vinod Vydiswaran
- Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, US
| |
|
15
|
An electronic health record (EHR) phenotype algorithm to identify patients with attention deficit hyperactivity disorders (ADHD) and psychiatric comorbidities. J Neurodev Disord 2022; 14:37. [PMID: 35690720 PMCID: PMC9188139 DOI: 10.1186/s11689-022-09447-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 05/31/2022] [Indexed: 11/10/2022] Open
Abstract
Background: In over half of pediatric cases, ADHD presents with comorbidities, and often, it is unclear whether the symptoms causing impairment are due to the comorbidity or the underlying ADHD. Comorbid conditions increase the likelihood for a more severe and persistent course and complicate treatment decisions. Therefore, it is highly important to establish an algorithm that identifies ADHD and comorbidities in order to improve research on ADHD using biorepository and other electronic record data. Methods: It is feasible to accurately distinguish ADHD in isolation from ADHD with comorbidities using an electronic algorithm designed to include other psychiatric disorders. We sought to develop an EHR phenotype algorithm to discriminate cases with ADHD in isolation from cases with ADHD with comorbidities more effectively for efficient future searches in large biorepositories. We developed a multi-source algorithm allowing for a more complete view of the patient’s EHR, leveraging the biobank of the Center for Applied Genomics (CAG) at Children’s Hospital of Philadelphia (CHOP). We mined EHRs from 2009 to 2016 using International Statistical Classification of Diseases and Related Health Problems (ICD) codes, medication history and keywords specific to ADHD, and comorbid psychiatric disorders to facilitate genotype-phenotype correlation efforts. Chart abstractions and behavioral surveys added evidence in support of the psychiatric diagnoses. Most notably, the algorithm did not exclude other psychiatric disorders, as is the case in many previous algorithms. Controls lacked psychiatric and other neurological disorders. Participants enrolled in various CAG studies at CHOP and completed a broad informed consent, including consent for prospective analyses of EHRs. We created and validated an EHR-based algorithm to classify ADHD and comorbid psychiatric status in a pediatric healthcare network to be used in future genetic analyses and discovery-based studies.
Results: In this retrospective case-control study that included data from 51,293 subjects, 5840 ADHD cases were discovered, of which 46.1% had ADHD alone and 53.9% had ADHD with psychiatric comorbidities. Our primary study outcome was to examine whether the algorithm could identify and distinguish ADHD exclusive cases from ADHD comorbid cases. The results indicate ICD codes coupled with medication searches revealed the most cases. We discovered ADHD-related keywords did not increase yield. However, we found including ADHD-specific medications increased our number of cases by 21%. Positive predictive values (PPVs) were 95% for ADHD cases and 93% for controls. Conclusion: We established a new algorithm and demonstrated the feasibility of the electronic algorithm approach to accurately diagnose ADHD and comorbid conditions, verifying the efficiency of our large biorepository for further genetic discovery-based analyses. Trial registration: ClinicalTrials.gov, NCT02286817. First posted on 10 November 2014. ClinicalTrials.gov, NCT02777931. First posted on 19 May 2016. ClinicalTrials.gov, NCT03006367. First posted on 30 December 2016. ClinicalTrials.gov, NCT02895906. First posted on 12 September 2016. Supplementary Information: The online version contains supplementary material available at 10.1186/s11689-022-09447-9.
|
16
|
Klann JG, Strasser ZH, Hutch MR, Kennedy CJ, Marwaha JS, Morris M, Samayamuthu MJ, Pfaff AC, Estiri H, South AM, Weber GM, Yuan W, Avillach P, Wagholikar KB, Luo Y, Omenn GS, Visweswaran S, Holmes JH, Xia Z, Brat GA, Murphy SN. Distinguishing Admissions Specifically for COVID-19 From Incidental SARS-CoV-2 Admissions: National Retrospective Electronic Health Record Study. J Med Internet Res 2022; 24:e37931. [PMID: 35476727 PMCID: PMC9119395 DOI: 10.2196/37931] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 03/16/2022] [Revised: 04/22/2022] [Accepted: 04/22/2022] [Indexed: 01/16/2023] Open
Abstract
BACKGROUND: Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. Electronic health record (EHR)-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. Although the need to improve classification of COVID-19 versus incidental SARS-CoV-2 is well understood, the magnitude of the problems has only been characterized in small, single-center studies. Furthermore, there have been no peer-reviewed studies evaluating methods for improving classification. OBJECTIVE: The aims of this study are to, first, quantify the frequency of incidental hospitalizations over the first 15 months of the pandemic in multiple hospital systems in the United States and, second, to apply electronic phenotyping techniques to automatically improve COVID-19 hospitalization classification. METHODS: From a retrospective EHR-based cohort in 4 US health care systems in Massachusetts, Pennsylvania, and Illinois, a random sample of 1123 SARS-CoV-2 PCR-positive patients hospitalized from March 2020 to August 2021 was manually chart-reviewed and classified as "admitted with COVID-19" (incidental) versus specifically admitted for COVID-19 ("for COVID-19"). EHR-based phenotyping was used to find feature sets to filter out incidental admissions. RESULTS: EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in an average of 26% of hospitalizations (although this varied widely over time, from 0% to 75%). The top site-specific feature sets had 79%-99% specificity with 62%-75% sensitivity, while the best-performing across-site feature sets had 71%-94% specificity with 69%-81% sensitivity.
CONCLUSIONS: A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.
Affiliation(s)
- Jeffrey G Klann
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Zachary H Strasser
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Meghan R Hutch
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
| | - Chris J Kennedy
- Center for Precision Psychiatry, Massachusetts General Hospital, Boston, MA, United States
| | - Jayson S Marwaha
- Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, United States
| | - Michele Morris
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
| | | | - Ashley C Pfaff
- Department of Surgery, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, United States
| | - Hossein Estiri
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Andrew M South
- Section of Nephrology, Department of Pediatrics, Brenner Children's, Wake Forest School of Medicine, Winston Salem, NC, United States
| | | | | | | | - Kavishwar B Wagholikar
- Laboratory of Computer Science, Department of Medicine, Massachusetts General Hospital, Boston, MA, United States
| | - Yuan Luo
- Department of Preventive Medicine, Northwestern University, Chicago, IL, United States
| | - Gilbert S Omenn
- Center for Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, United States
| | - Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, United States
| | - John H Holmes
- Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
| | - Zongqi Xia
- Department of Neurology, University of Pittsburgh, Pittsburgh, PA, United States
| | | | - Shawn N Murphy
- Department of Neurology, Massachusetts General Hospital, Boston, MA, United States
| |
|
17
|
Harvey D, Lobban F, Rayson P, Warner A, Jones S. Natural Language Processing Methods and Bipolar Disorder: Scoping Review. JMIR Ment Health 2022; 9:e35928. [PMID: 35451984 PMCID: PMC9077496 DOI: 10.2196/35928] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/22/2021] [Revised: 03/15/2022] [Accepted: 03/20/2022] [Indexed: 02/05/2023] Open
Abstract
BACKGROUND Health researchers are increasingly using natural language processing (NLP) to study various mental health conditions using both social media and electronic health records (EHRs). There is currently no published synthesis that relates specifically to the use of NLP methods for bipolar disorder, and this scoping review was conducted to synthesize valuable insights that have been presented in the literature. OBJECTIVE This scoping review explored how NLP methods have been used in research to better understand bipolar disorder and identify opportunities for further use of these methods. METHODS A systematic, computerized search of index and free-text terms related to bipolar disorder and NLP was conducted using 5 databases and 1 anthology: MEDLINE, PsycINFO, Academic Search Ultimate, Scopus, Web of Science Core Collection, and the ACL Anthology. RESULTS Of 507 identified studies, a total of 35 (6.9%) studies met the inclusion criteria. A narrative synthesis was used to describe the data, and the studies were grouped into four objectives: prediction and classification (n=25), characterization of the language of bipolar disorder (n=13), use of EHRs to measure health outcomes (n=3), and use of EHRs for phenotyping (n=2). Ethical considerations were reported in 60% (21/35) of the studies. CONCLUSIONS The current literature demonstrates how language analysis can be used to assist in and improve the provision of care for people living with bipolar disorder. Individuals with bipolar disorder and the medical community could benefit from research that uses NLP to investigate risk-taking, web-based services, social and occupational functioning, and the representation of gender in bipolar disorder populations on the web. 
Future research that implements NLP methods to study bipolar disorder should be governed by ethical principles, and any decisions regarding the collection and sharing of data sets should ultimately be made on a case-by-case basis, considering the risk to the data participants and whether their privacy can be ensured.
Affiliation(s)
- Daisy Harvey
- Spectrum Centre for Mental Health Research, Division of Health Research, School of Health and Medicine, Lancaster University, Lancaster, United Kingdom
| | - Fiona Lobban
- Spectrum Centre for Mental Health Research, Division of Health Research, School of Health and Medicine, Lancaster University, Lancaster, United Kingdom
| | - Paul Rayson
- Department of Computing and Communications, Lancaster University, Lancaster, United Kingdom
| | - Aaron Warner
- Spectrum Centre for Mental Health Research, Division of Health Research, School of Health and Medicine, Lancaster University, Lancaster, United Kingdom
| | - Steven Jones
- Spectrum Centre for Mental Health Research, Division of Health Research, School of Health and Medicine, Lancaster University, Lancaster, United Kingdom
| |
|
18
|
Birnbaum R, Mahjani B, Loos RJF, Sharp AJ. Clinical Characterization of Copy Number Variants Associated With Neurodevelopmental Disorders in a Large-scale Multiancestry Biobank. JAMA Psychiatry 2022; 79:250-259. [PMID: 35080590 PMCID: PMC8792794 DOI: 10.1001/jamapsychiatry.2021.4080] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 10/19/2021] [Accepted: 11/30/2021] [Indexed: 01/28/2023]
Abstract
IMPORTANCE Past studies identified rare copy number variants (CNVs) as risk factors for neurodevelopmental disorders (NDDs), including autism spectrum disorder and schizophrenia. However, the clinical characterization of NDD CNVs is understudied in population cohorts unselected for neuropsychiatric disorders and in cohorts of diverse ancestry. OBJECTIVE To identify individuals harboring NDD CNVs in a multiancestry biobank and to query their enrichment for select neuropsychiatric disorders as well as association with multiple medical disorders. DESIGN, SETTINGS, AND PARTICIPANTS In a series of phenotypic enrichment and association analyses, NDD CNVs were clinically characterized among 24 877 participants in the BioMe biobank, an electronic health record-linked biobank derived from the Mount Sinai Health System, New York, New York. Participants have been recruited into the biobank since September 2007 across diverse ancestry and medical and neuropsychiatric specialties. For the current analyses, electronic health record data were analyzed from May 2004 through May 2019. MAIN OUTCOMES AND MEASURES NDD CNVs were identified using a consensus of 2 CNV calling algorithms, based on whole-exome sequencing and genotype array data, followed by novel in-silico clinical assessments. RESULTS Of 24 877 participants, 14 586 (58.7%) were female; self-reported ancestry categories included 5965 (24.0%) who were of African ancestry, 7892 (31.7%) who were of European ancestry, and 8536 (34.3%) who were of Hispanic ancestry; and the mean (SD) age was 50.5 (17.3) years. Among 24 877 individuals, the prevalence of 64 NDD CNVs was 2.5% (n = 627), with prevalence varying by locus, corroborating the presence of some relatively highly prevalent NDD CNVs (eg, 15q11.2 deletion/duplication). An aggregate set of NDD CNVs was enriched for congenital disorders (odds ratio, 2.0; 95% CI, 1.1-3.5; P = .01) and major depressive disorder (odds ratio, 1.5; 95% CI, 1.1-2.0; P = .01).
In a meta-analysis of medical diagnoses (n = 195 hierarchically clustered diagnostic codes), NDD CNVs were significantly associated with several medical outcomes, including essential hypertension (z score = 3.6; P = 2.8 × 10⁻⁴), kidney failure (z score = 3.3; P = 1.1 × 10⁻³), and obstructive sleep apnea (z score = 3.4; P = 8.1 × 10⁻⁴) and, in another analysis, morbid obesity (z score = 3.8; P = 1.3 × 10⁻⁴). Further, NDD CNVs were associated with increased body mass index in a multiancestry analysis (β = 0.19; 95% CI, 0.10-0.31; P = .003). For 36 common serum tests, there was no association with NDD CNVs. CONCLUSIONS AND RELEVANCE Clinical features of individuals harboring NDD CNVs were elucidated in a large-scale, multiancestry biobank, identifying enrichments for congenital disorders and major depressive disorder as well as associations with several medical outcomes, including hypertension, kidney failure, and obesity and obesity-related phenotypes, specifically obstructive sleep apnea and increased body mass index. The association between NDD CNVs and obesity outcomes indicates further potential pleiotropy of NDD CNVs beyond the neurodevelopmental outcomes previously reported. Future clinical genetic investigations may lead to insights into at-risk individuals and therapeutic strategies targeting specific genetic variants. The importance of diverse inclusion within biobanks and of considering the effect of rare genetic variants in a multiancestry context is evident.
Affiliation(s)
- Rebecca Birnbaum
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Behrang Mahjani
- Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York
- Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Ruth J. F. Loos
- Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York
- NovoNordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
| | - Andrew J. Sharp
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York
- The Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, New York
| |
|
19
|
Klann JG, Strasser ZH, Hutch MR, Kennedy CJ, Marwaha JS, Morris M, Samayamuthu MJ, Pfaff AC, Estiri H, South AM, Weber GM, Yuan W, Avillach P, Wagholikar KB, Luo Y, Omenn GS, Visweswaran S, Holmes JH, Xia Z, Brat GA, Murphy SN. Distinguishing Admissions Specifically for COVID-19 from Incidental SARS-CoV-2 Admissions: A National EHR Research Consortium Study. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2022:2022.02.10.22270728. [PMID: 35350202 PMCID: PMC8963684 DOI: 10.1101/2022.02.10.22270728] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 02/01/2023]
Abstract
Admissions are generally classified as COVID-19 hospitalizations if the patient has a positive SARS-CoV-2 polymerase chain reaction (PCR) test. However, because 35% of SARS-CoV-2 infections are asymptomatic, patients admitted for unrelated indications with an incidentally positive test could be misclassified as a COVID-19 hospitalization. EHR-based studies have been unable to distinguish between a hospitalization specifically for COVID-19 versus an incidental SARS-CoV-2 hospitalization. From a retrospective EHR-based cohort in four US healthcare systems, a random sample of 1,123 SARS-CoV-2 PCR-positive patients hospitalized between 3/2020–8/2021 was manually chart-reviewed and classified as admitted-with-COVID-19 (incidental) vs. specifically admitted for COVID-19 (for-COVID-19). EHR-based phenotyped feature sets filtered out incidental admissions, which occurred in 26%. The top site-specific feature sets had 79-99% specificity with 62-75% sensitivity, while the best performing across-site feature set had 71-94% specificity with 69-81% sensitivity. A large proportion of SARS-CoV-2 PCR-positive admissions were incidental. Straightforward EHR-based phenotypes differentiated admissions, which is important to assure accurate public health reporting and research.
|
20
|
Loebel A, Koblan KS, Tsai J, Deng L, Fava M, Kent J, Hopkins SC. A Randomized, Double-blind, Placebo-controlled Proof-of-Concept Trial to Evaluate the Efficacy and Safety of Non-racemic Amisulpride (SEP-4199) for the Treatment of Bipolar I Depression. J Affect Disord 2022; 296:549-558. [PMID: 34614447 DOI: 10.1016/j.jad.2021.09.109] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/24/2021] [Revised: 09/16/2021] [Accepted: 09/29/2021] [Indexed: 12/11/2022]
Abstract
BACKGROUND Non-racemic amisulpride (SEP-4199) is an 85:15 ratio of aramisulpride:esamisulpride with a 5-HT7 and D2 receptor binding profile optimized for the treatment of bipolar depression. The aim of this study was to evaluate the efficacy and safety of SEP-4199 for the treatment of bipolar depression. METHODS Patients meeting DSM-5 criteria for bipolar I depression were randomized to 6 weeks of double-blind, placebo-controlled treatment with SEP-4199 200 mg/d or 400 mg/d. The primary endpoint was change in the Montgomery-Asberg Depression Rating Scale (MADRS) at Week 6. The primary efficacy analysis population consisted of patients in Europe and US (n = 289); the secondary efficacy analysis population (ITT; n = 337) included patients in Japan. RESULTS Endpoint improvement in MADRS total score was observed on both the primary analysis for SEP-4199 200 mg/d (P = 0.054; effect size [ES], 0.31) and 400 mg/d (P = 0.054; ES, 0.29), and on the secondary (full ITT) analysis for SEP-4199 200 mg/d (P = 0.016; ES, 0.34) and 400 mg/d (P = 0.024; ES, 0.31). Study completion rates were 81% on SEP-4199 200 mg/d, 88% on 400 mg/d, and 86% on placebo. SEP-4199 had low rates of individual adverse events (<8%) and minimal effects on weight and lipids; median increases in prolactin were +83.6 μg/L on 200 mg/d, +95.2 μg/L on 400 mg/d compared with 0.0 μg/L on placebo. LIMITATIONS The study excluded patients with bipolar II depression and serious psychiatric or medical comorbidity. CONCLUSION Study results provide preliminary proof of concept, needing confirmation in subsequent randomized trials, for the efficacy of non-racemic amisulpride in bipolar depression.
Affiliation(s)
- Antony Loebel
- Sunovion Pharmaceuticals Inc., Marlborough, MA, United States of America
| | - Kenneth S Koblan
- Sunovion Pharmaceuticals Inc., Marlborough, MA, United States of America
| | - Joyce Tsai
- Sunovion Pharmaceuticals Inc., Marlborough, MA, United States of America
| | - Ling Deng
- Sunovion Pharmaceuticals Inc., Marlborough, MA, United States of America
| | - Maurizio Fava
- Department of Psychiatry, Massachusetts General Hospital, and Harvard Medical School, Boston, MA, United States of America
| | - Justine Kent
- Sunovion Pharmaceuticals Inc., Marlborough, MA, United States of America
| | - Seth C Hopkins
- Sunovion Pharmaceuticals Inc., Marlborough, MA, United States of America
| |
|
21
|
Crema C, Attardi G, Sartiano D, Redolfi A. Natural language processing in clinical neuroscience and psychiatry: A review. Front Psychiatry 2022; 13:946387. [PMID: 36186874 PMCID: PMC9515453 DOI: 10.3389/fpsyt.2022.946387] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/17/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022] Open
Abstract
Natural language processing (NLP) is rapidly becoming an important topic in the medical community. The ability to automatically analyze any type of medical document could be the key factor to fully exploit the data it contains. Cutting-edge artificial intelligence (AI) architectures, particularly machine learning and deep learning, have begun to be applied to this topic and have yielded promising results. We conducted a literature search for 1,024 papers that used NLP technology in neuroscience and psychiatry from 2010 to early 2022. After a selection process, 115 papers were evaluated. Each publication was classified into one of three categories: information extraction, classification, and data inference. Automated understanding of clinical reports in electronic health records has the potential to improve healthcare delivery. Overall, the performance of NLP applications is high, with an average F1-score and AUC above 85%. We also derived a composite measure in the form of Z-scores to better compare the performance of NLP models and their different classes as a whole. No statistical differences were found in the unbiased comparison. Strong asymmetry between English and non-English models, difficulty in obtaining high-quality annotated data, and train biases causing low generalizability are the main limitations. This review suggests that NLP could be an effective tool to help clinicians gain insights from medical reports, clinical research forms, and more, making NLP an effective tool to improve the quality of healthcare services.
Affiliation(s)
- Claudio Crema
- Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
| | | | - Daniele Sartiano
- Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
| | - Alberto Redolfi
- Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
| |
|
22
|
Guo A, Stephens KA, Khan YM, Langabeer JR, Foraker RE. Women and ethnoracial minorities with poor cardiovascular health measures associated with a higher risk of developing mood disorder. BMC Med Inform Decis Mak 2021; 21:361. [PMID: 34952584 PMCID: PMC8709948 DOI: 10.1186/s12911-021-01674-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2020] [Accepted: 10/29/2021] [Indexed: 11/30/2022] Open
Abstract
Background: Mood disorders (MDS) are a type of mental health illness that affects millions of people in the United States. Early prediction of MDS can give providers greater opportunity to treat these disorders. We hypothesized that longitudinal cardiovascular health (CVH) measurements would be informative for MDS prediction. Methods: To test this hypothesis, the American Heart Association’s Guideline Advantage (TGA) dataset was used, which contained longitudinal EHR data from 70 outpatient clinics. Statistical analysis and machine learning models were employed to identify associations between MDS and the longitudinal CVH metrics and other confounding factors. Results: Patients diagnosed with MDS consistently had a higher proportion of poor CVH compared to patients without MDS, with the largest differences between groups for body mass index (BMI) and smoking. Race and gender were associated with the status of CVH metrics. Approximately 46% of female patients with MDS had poor hemoglobin A1C compared to 44% of those without MDS; 62% of those with MDS had poor BMI compared to 47% of those without MDS; 59% of those with MDS had poor blood pressure (BP) compared to 43% of those without MDS; and 43% of those with MDS were current smokers compared to 17% of those without MDS. Conclusions: Women and ethnoracial minorities with poor cardiovascular health measures had a higher risk of developing MDS, which indicates the high utility of using routine medical record data collected in care to improve detection and treatment of MDS among patients with poor CVH. Supplementary Information: The online version contains supplementary material available at 10.1186/s12911-021-01674-9.
Affiliation(s)
- Aixia Guo
- Institute for Informatics (I2), Washington University School of Medicine, St. Louis, MO, USA
| | - Kari A Stephens
- Family Medicine, University of Washington School of Medicine, Seattle, WA, USA
| | - Yosef M Khan
- Health Informatics and Analytics, Centers for Health Metrics and Evaluation, American Heart Association, Dallas, TX, USA
| | - James R Langabeer
- School of Biomedical Informatics, Health Science Center at Houston, The University of Texas, Houston, TX, USA
| | - Randi E Foraker
- Institute for Informatics (I2), Washington University School of Medicine, St. Louis, MO, USA; Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO, USA
| |
|
23
|
An independently validated, portable algorithm for the rapid identification of COPD patients using electronic health records. Sci Rep 2021; 11:19959. [PMID: 34620889 PMCID: PMC8497529 DOI: 10.1038/s41598-021-98719-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/24/2021] [Accepted: 08/25/2021] [Indexed: 11/24/2022] Open
Abstract
Electronic health records (EHR) provide an unprecedented opportunity to conduct large, cost-efficient, population-based studies. However, the studies of heterogeneous diseases, such as chronic obstructive pulmonary disease (COPD), often require labor-intensive clinical review and testing, limiting widespread use of these important resources. To develop a generalizable and efficient method for accurate identification of large COPD cohorts in EHRs, a COPD datamart was developed from 3420 participants meeting inclusion criteria in the Mass General Brigham Biobank. Training and test sets were selected and labeled with gold-standard COPD classifications obtained from chart review by pulmonologists. Multiple classes of algorithms were built utilizing both structured (e.g. ICD codes) and unstructured (e.g. medical notes) data via elastic net regression. Models explicitly including and excluding spirometry features were compared. External validation of the final algorithm was conducted in an independent biobank with a different EHR system. The final COPD classification model demonstrated excellent positive predictive value (PPV; 91.7%), sensitivity (71.7%), and specificity (94.4%). This algorithm performed well not only within the MGBB, but also demonstrated similar or improved classification performance in an independent biobank (PPV 93.5%, sensitivity 61.4%, specificity 90%). Ancillary comparisons showed that the classification model built including a binary feature for FEV1/FVC produced substantially higher sensitivity than those excluding. This study fills a gap in COPD research involving population-based EHRs, providing an important resource for the rapid, automated classification of COPD cases that is both cost-efficient and requires minimal information from unstructured medical records.
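For readers less familiar with the three metrics quoted in this abstract, the following is a minimal sketch of how PPV, sensitivity, and specificity follow from chart-review counts. It is not the authors' code; the `ConfusionCounts` class, the `classification_metrics` function, and the counts themselves are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Algorithm output cross-tabulated against gold-standard chart review."""
    tp: int  # algorithm positive, chart review positive
    fp: int  # algorithm positive, chart review negative
    tn: int  # algorithm negative, chart review negative
    fn: int  # algorithm negative, chart review positive


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return PPV, sensitivity, and specificity as fractions in [0, 1]."""
    if min(c.tp + c.fp, c.tp + c.fn, c.tn + c.fp) == 0:
        raise ValueError("each metric needs a non-empty denominator")
    return {
        "ppv": c.tp / (c.tp + c.fp),          # of flagged patients, share truly positive
        "sensitivity": c.tp / (c.tp + c.fn),  # of true cases, share flagged
        "specificity": c.tn / (c.tn + c.fp),  # of non-cases, share not flagged
    }


# Hypothetical counts (an assumption for illustration, not study data).
counts = ConfusionCounts(tp=80, fp=20, tn=180, fn=20)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

The same three ratios apply whichever classifier produces the flags; only the gold-standard labels need to come from manual review.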
|
24
|
Chapman M, Mumtaz S, Rasmussen LV, Karwath A, Gkoutos GV, Gao C, Thayer D, Pacheco JA, Parkinson H, Richesson RL, Jefferson E, Denaxas S, Curcin V. Desiderata for the development of next-generation electronic health record phenotype libraries. Gigascience 2021; 10:giab059. [PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/24/2021] [Revised: 07/15/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. METHODS A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. RESULTS We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. CONCLUSIONS There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
Affiliation(s)
- Martin Chapman
- Department of Population Health Sciences, King's College London, London, SE1 1UL, UK
| | - Shahzad Mumtaz
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Luke V Rasmussen
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Andreas Karwath
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Georgios V Gkoutos
- Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
| | - Chuang Gao
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Dan Thayer
- SAIL Databank, Swansea University, Swansea, SA2 8PP, UK
| | - Jennifer A Pacheco
- Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Helen Parkinson
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
| | - Rachel L Richesson
- Department of Learning Health Sciences, University of Michigan Medical School, MI 48109, USA
| | - Emily Jefferson
- Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
| | - Spiros Denaxas
- Institute of Health Informatics, University College London, London, NW1 2DA, UK
| | - Vasa Curcin
- Department of Population Health Sciences, King's College London, London, SE1 1UL, UK
| |
|
25
|
Berchuck SI, Jammal AA, Mukherjee S, Somers TJ, Medeiros FA. Impact of anxiety and depression on progression to glaucoma among glaucoma suspects. Br J Ophthalmol 2021; 105:1244-1249. [PMID: 32862132 PMCID: PMC9924953 DOI: 10.1136/bjophthalmol-2020-316617] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 04/20/2020] [Revised: 07/24/2020] [Accepted: 08/01/2020] [Indexed: 01/12/2023]
Abstract
AIMS To assess the impact of anxiety and depression on the risk of converting to glaucoma in a cohort of glaucoma suspects followed over time. METHODS The study included a retrospective cohort of subjects with a diagnosis of glaucoma suspect at baseline, extracted from the Duke Glaucoma Registry. The presence of anxiety and depression was defined based on electronic health records billing codes, medical history and problem list. Univariable and multivariable Cox proportional hazards models were used to obtain HRs for the risk of converting to glaucoma over time. Multivariable models were adjusted for age, gender, race, intraocular pressure measurements over time and disease severity at baseline. RESULTS A total of 3259 glaucoma suspects followed for an average of 3.60 (2.05) years were included in our cohort, of whom 911 (28%) were diagnosed with glaucoma during follow-up. The prevalence of anxiety and depression was 32% and 33%, respectively. Diagnoses of anxiety, or of concomitant anxiety and depression, were significantly associated with risk of converting to glaucoma over time, with adjusted HRs (95% CI) of 1.16 (1.01, 1.33) and 1.27 (1.07, 1.50), respectively. CONCLUSION A history of anxiety or both anxiety and depression in glaucoma suspects was associated with developing glaucoma during follow-up.
Affiliation(s)
- Samuel I. Berchuck
- Department of Statistical Science and Forge, Duke University, Durham, North Carolina, USA; Duke Eye Center and Department of Ophthalmology, Duke University, Durham, North Carolina, USA
| | - Alessandro A. Jammal
- Duke Eye Center and Department of Ophthalmology, Duke University, Durham, North Carolina, USA
| | - Sayan Mukherjee
- Departments of Statistical Science, Mathematics, Computer Science, Biostatistics & Bioinformatics, Duke University, Durham, North Carolina, USA
| | - Tamara J. Somers
- Department of Psychiatry and Behavioral Sciences, Duke University, Durham, North Carolina, USA
| | - Felipe A. Medeiros
- Duke Eye Center and Department of Ophthalmology, Duke University, Durham, North Carolina, USA
| |
|
26
|
Estiri H, Strasser ZH, Murphy SN. High-throughput phenotyping with temporal sequences. J Am Med Inform Assoc 2021; 28:772-781. [PMID: 33313899 DOI: 10.1093/jamia/ocaa288] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 07/01/2020] [Accepted: 11/04/2020] [Indexed: 12/15/2022] Open
Abstract
OBJECTIVE High-throughput electronic phenotyping algorithms can accelerate translational research using data from electronic health record (EHR) systems. The temporal information buried in EHRs is often underutilized in developing computational phenotypic definitions. This study aims to develop a high-throughput phenotyping method, leveraging temporal sequential patterns from EHRs. MATERIALS AND METHODS We develop a representation mining algorithm to extract 5 classes of representations from EHR diagnosis and medication records: the aggregated vector of the records (aggregated vector representation), the standard sequential patterns (sequential pattern mining), the transitive sequential patterns (transitive sequential pattern mining), and 2 hybrid classes. Using EHR data on 10 phenotypes from the Mass General Brigham Biobank, we train and validate phenotyping algorithms. RESULTS Phenotyping with temporal sequences resulted in a superior classification performance across all 10 phenotypes compared with the standard representations in electronic phenotyping. The high-throughput algorithm's classification performance was superior or similar to the performance of previously published electronic phenotyping algorithms. We characterize and evaluate the top transitive sequences of diagnosis records paired with the records of risk factors, symptoms, complications, medications, or vaccinations. DISCUSSION The proposed high-throughput phenotyping approach enables seamless discovery of sequential record combinations that may be difficult to assume from raw EHR data. Transitive sequences offer more accurate characterization of the phenotype, compared with its individual components, and reflect the actual lived experiences of the patients with that particular disease. CONCLUSION Sequential data representations provide a precise mechanism for incorporating raw EHR records into downstream machine learning. 
Our approach starts with user interpretability and works backward to the technology.
Affiliation(s)
- Hossein Estiri
- Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
| | - Zachary H Strasser
- Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
| | - Shawn N Murphy
- Harvard Medical School, Boston, Massachusetts, USA; Massachusetts General Hospital, Boston, Massachusetts, USA; Mass General Brigham, Boston, Massachusetts, USA
|
27
|
Liu L, Bustamante R, Earles A, Demb J, Messer K, Gupta S. A strategy for validation of variables derived from large-scale electronic health record data. J Biomed Inform 2021; 121:103879. [PMID: 34329789 PMCID: PMC9615095 DOI: 10.1016/j.jbi.2021.103879] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/14/2021] [Revised: 07/21/2021] [Accepted: 07/24/2021] [Indexed: 11/16/2022]
Abstract
Purpose: Standardized approaches for rigorous validation of phenotyping from large-scale electronic health record (EHR) data have not been widely reported. We proposed a methodologically rigorous and efficient approach to guide such validation, including strategies for sampling cases and controls, determining sample sizes, estimating algorithm performance, and terminating the validation process, hereafter referred to as the San Diego Approach to Variable Validation (SDAVV). Methods: We propose sample size formulae which should be used prior to chart review, based on pre-specified critical lower bounds for positive predictive value (PPV) and negative predictive value (NPV). We also propose a stepwise strategy for iterative algorithm development/validation cycles, updating sample sizes for data abstraction until both PPV and NPV achieve target performance. Results: We applied the SDAVV to a Department of Veterans Affairs study in which we created two phenotyping algorithms, one for distinguishing normal colonoscopy cases from abnormal colonoscopy controls and one for identifying aspirin exposure. Estimated PPV and NPV both reached 0.970 with a 95% confidence lower bound of 0.915, estimated sensitivity was 0.963 and specificity was 0.975 for identifying normal colonoscopy cases. The phenotyping algorithm for identifying aspirin exposure reached a PPV of 0.990 (a 95% lower bound of 0.950), an NPV of 0.980 (a 95% lower bound of 0.930), and sensitivity and specificity were 0.960 and 1.000. Conclusions: A structured approach for prospectively developing and validating phenotyping algorithms from large-scale EHR data can be successfully implemented, and should be considered to improve the quality of “big data” research.
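The four performance measures reported above follow directly from a 2x2 chart-review table, and the stopping rule amounts to checking both lower bounds against their pre-specified critical values. The sketch below is only a minimal illustration: the function names, the example counts, and the critical value of 0.90 are hypothetical, and the SDAVV sample size formulae themselves are not reproduced because the abstract does not state them.

```python
def validation_metrics(tp, fp, fn, tn):
    """Compute PPV, NPV, sensitivity and specificity from 2x2 counts."""
    return {
        "ppv": tp / (tp + fp),          # algorithm-positive charts that are true cases
        "npv": tn / (tn + fn),          # algorithm-negative charts that are true controls
        "sensitivity": tp / (tp + fn),  # true cases the algorithm flags
        "specificity": tn / (tn + fp),  # true controls the algorithm clears
    }


def targets_met(lower_bounds, critical):
    """Stop validating once both PPV and NPV lower bounds reach their targets."""
    return all(lower_bounds[k] >= critical[k] for k in ("ppv", "npv"))


if __name__ == "__main__":
    # Hypothetical chart-review counts, not taken from the study.
    print(validation_metrics(tp=90, fp=10, fn=5, tn=95))
    # 0.915 is the lower bound reported in the abstract; 0.90 is an assumed target.
    print(targets_met({"ppv": 0.915, "npv": 0.915}, {"ppv": 0.90, "npv": 0.90}))
```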
Affiliation(s)
- Lin Liu
- VA San Diego Healthcare System, 3500 La Jolla Village Dr, San Diego, CA 92161, USA; University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA.
| | - Ranier Bustamante
- University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
| | - Ashley Earles
- Veterans Medical Research Foundation, 3350 La Jolla Village Dr, San Diego, CA 92161, USA
| | - Joshua Demb
- University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
| | - Karen Messer
- University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA
| | - Samir Gupta
- VA San Diego Healthcare System, 3500 La Jolla Village Dr, San Diego, CA 92161, USA; University of California San Diego, 9500 Gilman Dr, La Jolla, CA 92093, USA.
|
28
|
Abstract
Electronic health records (EHRs) are becoming a vital source of data for healthcare quality improvement, research, and operations. However, much of the most valuable information contained in EHRs remains buried in unstructured text. The field of clinical text mining has advanced rapidly in recent years, transitioning from rule-based approaches to machine learning and, more recently, deep learning. With new methods come new challenges, however, especially for those new to the field. This review provides an overview of clinical text mining for those who are encountering it for the first time (e.g., physician researchers, operational analytics teams, machine learning scientists from other domains). While not a comprehensive survey, this review describes the state of the art, with a particular focus on new tasks and methods developed over the past few years. It also identifies key barriers between these remarkable technical advances and the practical realities of implementation in health systems and in industry.
Affiliation(s)
- Bethany Percha
- Department of Medicine and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10025, USA;
|
29
|
Teneralli RE, Kern DM, Cepeda MS, Gilbert JP, Drevets WC. Exploring real-world evidence to uncover unknown drug benefits and support the discovery of new treatment targets for depressive and bipolar disorders. J Affect Disord 2021; 290:324-333. [PMID: 34020207 DOI: 10.1016/j.jad.2021.04.096] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 10/15/2020] [Revised: 02/19/2021] [Accepted: 04/25/2021] [Indexed: 12/28/2022]
Abstract
BACKGROUND Major depressive and bipolar disorders are associated with impaired quality of life and high economic burden. Although progress has been made in our understanding of the underlying pathophysiology and the development of novel pharmacological treatments, a large unmet need remains for finding effective treatment options. The purpose of this study was to identify potential new mechanisms of actions or treatment targets that could inform future research and development opportunities for major depressive and bipolar disorders. METHODS A self-controlled cohort study was conducted to examine associations between 1933 medications and incidence of major depressive and bipolar disorders across four US insurance claims databases. Presence of incident depressive or bipolar disorders were captured for each patient prior to or after drug exposure and incident rate ratios were calculated. Medications that demonstrated ≥50% reduction in risk for both depressive and bipolar disorders within two or more databases were evaluated as potential treatment targets. RESULTS Eight medications met our inclusion criteria, which fell into three treatment groups: drugs used in substance use disorders; drugs that affect the cholinergic system; and drugs used for the management of cardiovascular-related conditions. LIMITATIONS This study was not designed to confirm a causal association nor inform current clinical practice. Instead, this research and the methods employed intended to be hypothesis generating and help uncover potential treatment pathways that could warrant further investigation. CONCLUSIONS Several potential drug targets that could aid further research and discovery into novel treatments for depressive and bipolar disorders were identified.
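The screening rule in the methods can be sketched as a before/after incidence rate ratio followed by a threshold check across databases. This is one possible reading of the abstract, not the authors' code: the function names, the dictionary layout, and the example values are hypothetical, and "≥50% reduction" is interpreted here as a rate ratio of 0.5 or lower for both outcomes within the same database.

```python
def incidence_rate_ratio(events_after, time_after, events_before, time_before):
    """Rate of incident diagnoses after drug exposure divided by the rate before."""
    return (events_after / time_after) / (events_before / time_before)


def is_candidate(irr_by_database, max_irr=0.5, min_databases=2):
    """Flag a medication whose rate ratio is at most `max_irr` for both
    outcomes in at least `min_databases` databases.

    `irr_by_database` maps a database name to a dict such as
    {"depressive": 0.4, "bipolar": 0.45}; this layout is hypothetical.
    """
    hits = [
        db
        for db, irr in irr_by_database.items()
        if irr["depressive"] <= max_irr and irr["bipolar"] <= max_irr
    ]
    return len(hits) >= min_databases


if __name__ == "__main__":
    # Illustrative numbers only; none come from the study.
    print(incidence_rate_ratio(5, 1000.0, 10, 1000.0))
    example = {
        "db1": {"depressive": 0.40, "bipolar": 0.45},
        "db2": {"depressive": 0.30, "bipolar": 0.48},
        "db3": {"depressive": 0.70, "bipolar": 0.40},
        "db4": {"depressive": 0.90, "bipolar": 0.95},
    }
    print(is_candidate(example))
```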
Affiliation(s)
- Rachel E Teneralli
- Janssen Research & Development, LLC., Epidemiology, Titusville, NJ, USA.
| | - David M Kern
- Janssen Research & Development, LLC., Epidemiology, Titusville, NJ, USA
| | - M Soledad Cepeda
- Janssen Research & Development, LLC., Epidemiology, Titusville, NJ, USA
| | - James P Gilbert
- Janssen Research & Development, LLC., Observational Health and Data Analytics, Raritan, NJ, USA
| | - Wayne C Drevets
- Janssen Research & Development, LLC., Neuroscience, San Diego, CA, USA
|
30
|
Le Glaz A, Haralambous Y, Kim-Dufor DH, Lenca P, Billot R, Ryan TC, Marsh J, DeVylder J, Walter M, Berrouiguet S, Lemey C. Machine Learning and Natural Language Processing in Mental Health: Systematic Review. J Med Internet Res 2021; 23:e15708. [PMID: 33944788 PMCID: PMC8132982 DOI: 10.2196/15708] [Citation(s) in RCA: 94] [Impact Index Per Article: 31.3] [Received: 07/31/2019] [Revised: 04/18/2020] [Accepted: 10/02/2020] [Indexed: 01/22/2023] Open
Abstract
BACKGROUND Machine learning systems are part of the field of artificial intelligence that automatically learn models from data to make better decisions. Natural language processing (NLP), by using corpora and learning approaches, provides good performance in statistical tasks, such as text classification or sentiment mining. OBJECTIVE The primary aim of this systematic review was to summarize and characterize, in methodological and technical terms, studies that used machine learning and NLP techniques for mental health. The secondary aim was to consider the potential use of these methods in mental health clinical practice. METHODS This systematic review follows the PRISMA (Preferred Reporting Items for Systematic Review and Meta-analysis) guidelines and is registered with PROSPERO (Prospective Register of Systematic Reviews; number CRD42019107376). The search was conducted using 4 medical databases (PubMed, Scopus, ScienceDirect, and PsycINFO) with the following keywords: machine learning, data mining, psychiatry, mental health, and mental disorder. The exclusion criteria were as follows: languages other than English, anonymization process, case studies, conference papers, and reviews. No limitations on publication dates were imposed. RESULTS A total of 327 articles were identified, of which 269 (82.3%) were excluded and 58 (17.7%) were included in the review. The results were organized through a qualitative perspective. Although studies had heterogeneous topics and methods, some themes emerged. Population studies could be grouped into 3 categories: patients included in medical databases, patients who came to the emergency room, and social media users. The main objectives were to extract symptoms, classify severity of illness, compare therapy effectiveness, provide psychopathological clues, and challenge the current nosography. Medical records and social media were the 2 major data sources. 
With regard to the methods used, preprocessing used the standard methods of NLP and unique identifier extraction dedicated to medical texts. Efficient classifiers were preferred rather than transparent functioning classifiers. Python was the most frequently used platform. CONCLUSIONS Machine learning and NLP models have been highly topical issues in medicine in recent years and may be considered a new paradigm in medical research. However, these processes tend to confirm clinical hypotheses rather than developing entirely new information, and only one major category of the population (ie, social media users) is an imprecise cohort. Moreover, some language-specific features can improve the performance of NLP methods, and their extension to other languages should be more closely investigated. However, machine learning and NLP techniques provide useful information from unexplored data (ie, patients' daily habits that are usually inaccessible to care providers). Before considering it as an additional tool of mental health care, ethical issues remain and should be discussed in a timely manner. Machine learning and NLP methods may offer multiple perspectives in mental health research but should also be considered as tools to support clinical practice.
Affiliation(s)
- Aziliz Le Glaz
- URCI Mental Health Department, Brest Medical University Hospital, Brest, France
| | | | - Deok-Hee Kim-Dufor
- URCI Mental Health Department, Brest Medical University Hospital, Brest, France
| | - Philippe Lenca
- IMT Atlantique, Lab-STICC, UMR CNRS 6285, F-29238, Brest, France
| | - Romain Billot
- IMT Atlantique, Lab-STICC, UMR CNRS 6285, F-29238, Brest, France
| | - Taylor C Ryan
- Department of Mental Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| | - Jonathan Marsh
- Fordham University Graduate School of Social Service, New York, NY, United States
| | - Jordan DeVylder
- Fordham University Graduate School of Social Service, New York, NY, United States
| | - Michel Walter
- URCI Mental Health Department, Brest Medical University Hospital, Brest, France
- EA 7479 SPURBO, Université de Bretagne Occidentale, Brest, France
| | - Sofian Berrouiguet
- URCI Mental Health Department, Brest Medical University Hospital, Brest, France
- IMT Atlantique, Lab-STICC, UMR CNRS 6285, F-29238, Brest, France
- EA 7479 SPURBO, Université de Bretagne Occidentale, Brest, France
- LaTIM, INSERM, UMR 1101, Brest, France
| | - Christophe Lemey
- URCI Mental Health Department, Brest Medical University Hospital, Brest, France
- IMT Atlantique, Lab-STICC, UMR CNRS 6285, F-29238, Brest, France
- EA 7479 SPURBO, Université de Bretagne Occidentale, Brest, France
|
31
|
Zhao Y, Fu S, Bielinski SJ, Decker PA, Chamberlain AM, Roger VL, Liu H, Larson NB. Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation. J Med Internet Res 2021; 23:e22951. [PMID: 33683212 PMCID: PMC7985804 DOI: 10.2196/22951] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/27/2020] [Revised: 08/25/2020] [Accepted: 01/20/2021] [Indexed: 11/29/2022] Open
Abstract
Background Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events. Objective The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. Methods The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) stratified sampled from a population in Olmsted County, Minnesota (N=74,314). Results Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample. Conclusions We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining type of stroke. 
The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.
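The reported positive predictive value (43/50) and its 95% CI can be checked with a short calculation. The abstract does not say which interval method was used; the Wilson score interval is shown here as one common choice, and for 43/50 it gives roughly 0.74 to 0.93. The function name `wilson_interval` is an illustrative choice, not something from the paper.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # PPV: 43 confirmed strokes among 50 algorithm-positive patients.
    low, high = wilson_interval(43, 50)
    print(f"PPV = {43 / 50:.2f}, 95% CI {low:.2f}-{high:.2f}")
    # NPV: 96 confirmed non-strokes among 100 algorithm-negative patients.
    low, high = wilson_interval(96, 100)
    print(f"NPV = {96 / 100:.2f}, 95% CI {low:.2f}-{high:.2f}")
```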
Affiliation(s)
- Yiqing Zhao
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Suzette J Bielinski
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Paul A Decker
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Alanna M Chamberlain
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Veronique L Roger
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
| | - Nicholas B Larson
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States
|
32
|
Liao KP, Sun J, Cai TA, Link N, Hong C, Huang J, Huffman JE, Gronsbell J, Zhang Y, Ho YL, Castro V, Gainer V, Murphy SN, O'Donnell CJ, Gaziano JM, Cho K, Szolovits P, Kohane IS, Yu S, Cai T. High-throughput multimodal automated phenotyping (MAP) with application to PheWAS. J Am Med Inform Assoc 2019; 26:1255-1262. [PMID: 31613361 DOI: 10.1093/jamia/ocz066] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 12/11/2018] [Revised: 04/08/2019] [Accepted: 04/26/2019] [Indexed: 01/01/2023] Open
Abstract
OBJECTIVE Electronic health records linked with biorepositories are a powerful platform for translational studies. A major bottleneck exists in the ability to phenotype patients accurately and efficiently. The objective of this study was to develop an automated high-throughput phenotyping method integrating International Classification of Diseases (ICD) codes and narrative data extracted using natural language processing (NLP). MATERIALS AND METHODS We developed a mapping method for automatically identifying relevant ICD and NLP concepts for a specific phenotype leveraging the Unified Medical Language System. Along with health care utilization, aggregated ICD and NLP counts were jointly analyzed by fitting an ensemble of latent mixture models. The multimodal automated phenotyping (MAP) algorithm yields a predicted probability of phenotype for each patient and a threshold for classifying participants with phenotype yes/no. The algorithm was validated using labeled data for 16 phenotypes from a biorepository and further tested in an independent cohort phenome-wide association studies (PheWAS) for 2 single nucleotide polymorphisms with known associations. RESULTS The MAP algorithm achieved higher or similar AUC and F-scores compared to the ICD code across all 16 phenotypes. The features assembled via the automated approach had comparable accuracy to those assembled via manual curation (AUCMAP 0.943, AUCmanual 0.941). The PheWAS results suggest that the MAP approach detected previously validated associations with higher power when compared to the standard PheWAS method based on ICD codes. CONCLUSION The MAP approach increased the accuracy of phenotype definition while maintaining scalability, thereby facilitating use in studies requiring large-scale phenotyping, such as PheWAS.
Affiliation(s)
- Katherine P Liao
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.,Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
| | - Jiehuan Sun
- Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Tianrun A Cai
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.,Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
| | - Nicholas Link
- Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
| | - Chuan Hong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.,Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Jie Huang
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | | | | | - Yichi Zhang
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.,University of Rhode Island, Kingston, RI, USA
| | - Yuk-Lam Ho
- Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
| | | | | | - Shawn N Murphy
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.,Partners Healthcare Systems, Summerville, MA, USA.,Massachusetts General Hospital, Boston, MA, USA
| | - Christopher J O'Donnell
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.,Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
| | - J Michael Gaziano
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.,Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
| | - Kelly Cho
- Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital, Boston, MA, USA.,Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.,Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA
| | - Peter Szolovits
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
| | - Isaac S Kohane
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Sheng Yu
- Center for Statistical Science, Tsinghua University, Beijing, China.,Department of Industrial Engineering, Tsinghua University, Beijing, China.,Institute for Data Science, Tsinghua University, Beijing, China
| | - Tianxi Cai
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.,Division of Data Sciences, VA Boston Healthcare System, Boston, MA, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
|
33
|
Guo Z, Rakshit P, Herman DS, Chen J. Inference for the Case Probability in High-dimensional Logistic Regression. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2021; 22:254. [PMID: 35935001 PMCID: PMC9354733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Labeling patients in electronic health records with respect to their statuses of having a disease or condition, i.e. case or control statuses, has increasingly relied on prediction models using high-dimensional variables derived from structured and unstructured electronic health record data. A major hurdle currently is a lack of valid statistical inference methods for the case probability. In this paper, considering high-dimensional sparse logistic regression models for prediction, we propose a novel bias-corrected estimator for the case probability through the development of linearization and variance enhancement techniques. We establish asymptotic normality of the proposed estimator for any loading vector in high dimensions. We construct a confidence interval for the case probability and propose a hypothesis testing procedure for patient case-control labelling. We demonstrate the proposed method via extensive simulation studies and application to real-world electronic health record data.
Affiliation(s)
- Zijian Guo
- Department of Statistics, Rutgers University, Piscataway, New Jersey, USA
| | - Prabrisha Rakshit
- Department of Statistics, Rutgers University, Piscataway, New Jersey, USA
| | - Daniel S Herman
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
| | - Jinbo Chen
- Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
34
|
Hart KL, Pellegrini AM, Forester BP, Berretta S, Murphy SN, Perlis RH, McCoy TH. Distribution of agitation and related symptoms among hospitalized patients using a scalable natural language processing method. Gen Hosp Psychiatry 2021; 68:46-51. [PMID: 33310013 PMCID: PMC7855889 DOI: 10.1016/j.genhosppsych.2020.11.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/09/2020] [Revised: 11/03/2020] [Accepted: 11/04/2020] [Indexed: 01/29/2023]
Abstract
BACKGROUND Agitation is a common feature of many neuropsychiatric disorders. OBJECTIVE Understanding the prevalence, implications, and characteristics of agitation among hospitalized populations can facilitate more precise recognition of disability arising from neuropsychiatric diseases. METHODS We developed two agitation phenotypes using an expansion of expert curated term lists. These phenotypes were used to characterize five years of psychiatric admissions. The relationship of agitation symptoms and length of stay was examined. RESULTS Among 4548 psychiatric admissions, 1134 (24.9%) included documentation of agitation based on the primary agitation phenotype. These symptoms were greater among individuals with public insurance, and those with mania and psychosis compared to major depressive disorder. Greater symptoms were associated with longer hospital stay, with ~0.9 day increase in stay for every 10% increase in agitation phenotype. CONCLUSION Agitation was common at hospital admission and associated with diagnosis and longer length of stay. Characterizing agitation-related symptoms through natural language processing may provide new tools for understanding agitated behaviors and their relationship to delirium.
Affiliation(s)
- Kamber L. Hart
- Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
| | | | - Brent P. Forester
- Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA,McLean Hospital, 115 Mill St, Belmont, MA 02478, USA
| | - Sabina Berretta
- Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA; McLean Hospital, 115 Mill St, Belmont, MA 02478, USA.
| | - Shawn N. Murphy
- Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA,Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
| | - Roy H. Perlis
- Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA,Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
| | - Thomas H. McCoy
- Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA,Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA,Corresponding author at: Massachusetts General Hospital, 185 Cambridge Street, 6th Floor, Boston, MA 02114, USA. (T.H. McCoy)
|
35
|
Atuegwu NC, Oncken C, Laubenbacher RC, Perez MF, Mortensen EM. Factors Associated with E-Cigarette Use in U.S. Young Adult Never Smokers of Conventional Cigarettes: A Machine Learning Approach. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2020; 17:ijerph17197271. [PMID: 33027932 PMCID: PMC7579019 DOI: 10.3390/ijerph17197271] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/17/2020] [Revised: 09/24/2020] [Accepted: 09/28/2020] [Indexed: 02/08/2023]
Abstract
E-cigarette use is increasing among young adult never smokers of conventional cigarettes, but the awareness of the factors associated with e-cigarette use in this population is limited. The goal of this work was to use machine learning (ML) algorithms to determine the factors associated with current e-cigarette use among US young adult never cigarette smokers. Young adult (18-34 years) never cigarette smokers from the 2016 and 2017 Behavioral Risk Factor Surveillance System (BRFSS) who reported current or never e-cigarette use were used for the analysis (n = 79,539). Variables associated with current e-cigarette use were selected by two ML algorithms (Boruta and Least absolute shrinkage and selection operator (LASSO)). Odds ratios were calculated to determine the association between e-cigarette use and the variables selected by the ML algorithms, after adjusting for age, gender and race/ethnicity and incorporating the BRFSS complex design. The prevalence of e-cigarette use varied across states. Factors previously reported in the literature, such as age, race/ethnicity, alcohol use, depression, as well as novel factors associated with e-cigarette use, such as disabilities, obesity, history of diabetes and history of arthritis were identified. These results can be used to generate further hypotheses for research, increase public awareness and help provide targeted e-cigarette education.
Affiliation(s)
- Nkiruka C. Atuegwu
- Department of Medicine, University of Connecticut School of Medicine, Farmington, CT 06030, USA; (C.O.); (M.F.P.); (E.M.M.)
- Correspondence: ; Tel.: +1-860-0679-2372; Fax: +1-860-0679-8087
| | - Cheryl Oncken
- Department of Medicine, University of Connecticut School of Medicine, Farmington, CT 06030, USA; (C.O.); (M.F.P.); (E.M.M.)
| | | | - Mario F. Perez
- Department of Medicine, University of Connecticut School of Medicine, Farmington, CT 06030, USA; (C.O.); (M.F.P.); (E.M.M.)
| | - Eric M. Mortensen
- Department of Medicine, University of Connecticut School of Medicine, Farmington, CT 06030, USA; (C.O.); (M.F.P.); (E.M.M.)
|
36
|
Palumbo SA, Adamson KM, Krishnamurthy S, Manoharan S, Beiler D, Seiwell A, Young C, Metpally R, Crist RC, Doyle GA, Ferraro TN, Li M, Berrettini WH, Robishaw JD, Troiani V. Assessment of Probable Opioid Use Disorder Using Electronic Health Record Documentation. JAMA Netw Open 2020; 3:e2015909. [PMID: 32886123 PMCID: PMC7489858 DOI: 10.1001/jamanetworkopen.2020.15909] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Indexed: 12/19/2022] Open
Abstract
IMPORTANCE Electronic health records are a potentially valuable source of information for identifying patients with opioid use disorder (OUD). OBJECTIVE To evaluate whether proxy measures from electronic health record data can be used reliably to identify patients with probable OUD based on Diagnostic and Statistical Manual of Mental Disorders (Fifth Edition) (DSM-5) criteria. DESIGN, SETTING, AND PARTICIPANTS This retrospective cross-sectional study analyzed individuals within the Geisinger health system who were prescribed opioids between December 31, 2000, and May 31, 2017, using a mixed-methods approach. The cohort was identified from 16 253 patients enrolled in a contract-based, Geisinger-specific medication monitoring program (GMMP) for opioid use, including patients who maintained or violated contract terms, as well as a demographically matched control group of 16 253 patients who were prescribed opioids but not enrolled in the GMMP. Substance use diagnoses and psychiatric comorbidities were assessed using automated electronic health record summaries. A manual medical record review procedure using DSM-5 criteria for OUD was completed for a subset of patients. The analysis was conducted beginning from June 5, 2017, until May 29, 2020. MAIN OUTCOMES AND MEASURES The primary outcome was the prevalence of OUD as defined by proxy measures for DSM-5 criteria for OUD as well as the prevalence of comorbidities among patients prescribed opioids within an integrated health system. RESULTS Among the 16 253 patients enrolled in the GMMP (9309 women [57%]; mean [SD] age, 52 [14] years), OUD diagnoses as defined by diagnostic codes were present at a much lower rate than expected (291 [2%]), indicating the necessity for alternative diagnostic strategies. 
The DSM-5 criteria for OUD can be assessed using manual medical record review; a manual review of 200 patients in the GMMP and 200 control patients identified a larger percentage of patients with probable moderate to severe OUD (GMMP, 145 of 200 [73%]; and control, 27 of 200 [14%]) compared with the prevalence of OUD assessed using diagnostic codes. CONCLUSIONS AND RELEVANCE These results suggest that patients with OUD may be identified using information available in the electronic health record, even when diagnostic codes do not reflect this diagnosis. Furthermore, the study demonstrates the utility of coding for DSM-5 criteria from medical records to generate a quantitative DSM-5 score that is associated with OUD severity.
Affiliation(s)
- Sarah A. Palumbo
- Department of Biomedical Science, Schmidt College of Medicine of Florida Atlantic University, Boca Raton
| | - Colt Young
- Geisinger Clinic, Geisinger, Danville, Pennsylvania
| | - Raghu Metpally
- Department of Molecular and Functional Genomics, Geisinger, Danville, Pennsylvania
| | - Richard C. Crist
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia
| | - Glenn A. Doyle
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia
| | - Thomas N. Ferraro
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia
- Department of Biomedical Sciences, Cooper Medical School of Rowan University, Camden, New Jersey
| | - Mingyao Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia
| | - Wade H. Berrettini
- Geisinger Clinic, Geisinger, Danville, Pennsylvania
- Center for Neurobiology and Behavior, Department of Psychiatry, University of Pennsylvania Perelman School of Medicine, Philadelphia
| | - Janet D. Robishaw
- Department of Biomedical Science, Schmidt College of Medicine of Florida Atlantic University, Boca Raton
| | - Vanessa Troiani
- Geisinger Clinic, Geisinger, Danville, Pennsylvania
- Department of Imaging Science and Innovation, Geisinger, Danville, Pennsylvania
- Neuroscience Institute, Geisinger, Danville, Pennsylvania
- Department of Basic Sciences, Geisinger Commonwealth School of Medicine, Scranton, Pennsylvania
|
37
|
Vuijk PJ, Martin J, Braaten EB, Genovese G, Capawana MR, O’Keefe SM, Lee BA, Lind HS, Smoller JW, Faraone SV, Perlis RH, Doyle AE. Translating Discoveries in Attention-Deficit/Hyperactivity Disorder Genomics to an Outpatient Child and Adolescent Psychiatric Cohort. J Am Acad Child Adolesc Psychiatry 2020; 59:964-977. [PMID: 31421235 PMCID: PMC7408479 DOI: 10.1016/j.jaac.2019.08.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 09/27/2018] [Revised: 05/29/2019] [Accepted: 08/08/2019] [Indexed: 01/10/2023]
Abstract
OBJECTIVE Genomic discoveries should be investigated in generalizable child psychiatric samples in order to justify and inform studies that will evaluate their use for specific clinical purposes. In youth consecutively referred for neuropsychiatric evaluation, we examined 1) the convergent and discriminant validity of attention-deficit/hyperactivity disorder (ADHD) polygenic risk scores (PRSs) in relation to DSM-based ADHD phenotypes; 2) the association of ADHD PRSs with phenotypes beyond ADHD that share its liability and have implications for outcome; and 3) the extent to which youth with high ADHD PRSs manifest a distinctive clinical profile. METHOD Participants were 433 youth, ages 7-18 years, from the Longitudinal Study of Genetic Influences on Cognition. We used logistic/linear regression and mixed effects models to examine associations with ADHD-related polygenic variation from the largest ADHD genome-wide association study to date. We replicated key findings in 5,140 adult patients from a local health system biobank. RESULTS Among referred youth, ADHD PRSs were associated with ADHD diagnoses, cross-diagnostic ADHD symptoms and academic impairment (odds ratios ∼1.4; R2 values ∼2%-3%), as well as cross-diagnostic variation in aggression and working memory. In adults, ADHD PRSs were associated with ADHD and phenotypes beyond the condition that have public health implications. Finally, youth with a high ADHD polygenic burden showed a more severe clinical profile than youth with a low burden (β coefficients ∼.2). CONCLUSION Among child and adolescent outpatients, ADHD polygenic risk was associated with ADHD and related phenotypes as well as clinical severity. These results extend the scientific foundation for studies of ADHD polygenic risk in the clinical setting and highlight directions for further research.
Affiliation(s)
- Pieter J. Vuijk
- Center for Genomic Medicine, Massachusetts General Hospital, Boston
| | - Joanna Martin
- MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, UK; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA
| | - Ellen B. Braaten
- Massachusetts General Hospital and Harvard Medical School, Massachusetts General Hospital, Boston
| | - Giulio Genovese
- Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA
| | - Michael R. Capawana
- Massachusetts General Hospital and Harvard Medical School, Massachusetts General Hospital, Boston
| | - Sheila M. O’Keefe
- Massachusetts General Hospital and Harvard Medical School, Massachusetts General Hospital, Boston
| | - B. Andi Lee
- Center for Genomic Medicine, Massachusetts General Hospital, Boston
| | - Hannah S. Lind
- Center for Genomic Medicine, Massachusetts General Hospital, Boston
| | - Jordan W. Smoller
- Center for Genomic Medicine, Massachusetts General Hospital, Boston; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA; Massachusetts General Hospital and Harvard Medical School, Massachusetts General Hospital, Boston
| | - Roy H. Perlis
- Center for Genomic Medicine, Massachusetts General Hospital, Boston; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA; Massachusetts General Hospital and Harvard Medical School, Massachusetts General Hospital, Boston; Center for Experimental Drugs and Diagnostics, Massachusetts General Hospital, Boston
| | - Alysa E. Doyle
- Center for Genomic Medicine, Massachusetts General Hospital, Boston; Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA; Massachusetts General Hospital and Harvard Medical School, Massachusetts General Hospital, Boston; Correspondence to Alysa E. Doyle, PhD, Center for Genomic Medicine, Massachusetts General Hospital, 185 Cambridge Street, CPZN 6240, Boston, MA 02114
|
38
|
Beesley LJ, Fritsche LG, Mukherjee B. An analytic framework for exploring sampling and observation process biases in genome and phenome-wide association studies using electronic health records. Stat Med 2020; 39:1965-1979. [PMID: 32198773 DOI: 10.1002/sim.8524] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2019] [Revised: 02/14/2020] [Accepted: 02/14/2020] [Indexed: 12/17/2022]
Abstract
Large-scale association analyses based on observational health care databases such as electronic health records have been a topic of increasing interest in the scientific community. However, challenges due to nonprobability sampling and phenotype misclassification associated with the use of these data sources are often ignored in standard analyses. The extent of the bias introduced by ignoring these factors is not well-characterized. In this paper, we develop an analytic framework for characterizing the bias expected in disease-gene association studies based on electronic health records when disease status misclassification and the sampling mechanism are ignored. Through a sensitivity analysis approach, this framework can be used to obtain plausible values for parameters of interest given summary results from standard analysis. We develop an online tool for performing this sensitivity analysis. Simulations demonstrate promising properties of the proposed method. We apply our approach to study bias in disease-gene association studies using electronic health record data from the Michigan Genomics Initiative, a longitudinal biorepository effort within the University of Michigan health system.
Affiliation(s)
- Lauren J Beesley
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Lars G Fritsche
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Bhramar Mukherjee
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
|
39
|
Beesley LJ, Salvatore M, Fritsche LG, Pandit A, Rao A, Brummett C, Willer CJ, Lisabeth LD, Mukherjee B. The emerging landscape of health research based on biobanks linked to electronic health records: Existing resources, statistical challenges, and potential opportunities. Stat Med 2020; 39:773-800. [PMID: 31859414 PMCID: PMC7983809 DOI: 10.1002/sim.8445] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Revised: 09/10/2019] [Accepted: 11/16/2019] [Indexed: 01/03/2023]
Abstract
Biobanks linked to electronic health records provide rich resources for health-related research. With improvements in administrative and informatics infrastructure, the availability and utility of data from biobanks have dramatically increased. In this paper, we first aim to characterize the current landscape of available biobanks and to describe specific biobanks, including their place of origin, size, and data types. The development and accessibility of large-scale biorepositories provide the opportunity to accelerate agnostic searches, expedite discoveries, and conduct hypothesis-generating studies of disease-treatment, disease-exposure, and disease-gene associations. Rather than designing and implementing a single study focused on a few targeted hypotheses, researchers can potentially use biobanks' existing resources to answer an expanded selection of exploratory questions as quickly as they can analyze them. However, there are many obvious and subtle challenges with the design and analysis of biobank-based studies. Our second aim is to discuss statistical issues related to biobank research such as study design, sampling strategy, phenotype identification, and missing data. We focus our discussion on biobanks that are linked to electronic health records. Some of the analytic issues are illustrated using data from the Michigan Genomics Initiative and UK Biobank, two biobanks with two different recruitment mechanisms. We summarize the current body of literature for addressing these challenges and discuss some standing open problems. This work complements and extends recent reviews about biobank-based research and serves as a resource catalog with analytical and practical guidance for statisticians, epidemiologists, and other medical researchers pursuing research using biobanks.
Affiliation(s)
- Anita Pandit
- University of Michigan, Department of Biostatistics
| | - Arvind Rao
- University of Michigan, Department of Computational Medicine and Bioinformatics
| | - Chad Brummett
- University of Michigan, Department of Anesthesiology
| | - Cristen J. Willer
- University of Michigan, Department of Computational Medicine and Bioinformatics
|
40
|
Barak-Corren Y, Castro VM, Nock MK, Mandl KD, Madsen EM, Seiger A, Adams WG, Applegate RJ, Bernstam EV, Klann JG, McCarthy EP, Murphy SN, Natter M, Ostasiewski B, Patibandla N, Rosenthal GE, Silva GS, Wei K, Weber GM, Weiler SR, Reis BY, Smoller JW. Validation of an Electronic Health Record-Based Suicide Risk Prediction Modeling Approach Across Multiple Health Care Systems. JAMA Netw Open 2020; 3:e201262. [PMID: 32211868 PMCID: PMC11136522 DOI: 10.1001/jamanetworkopen.2020.1262] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Importance Suicide is a leading cause of mortality, with suicide-related deaths increasing in recent years. Automated methods for individualized risk prediction have great potential to address this growing public health threat. To facilitate their adoption, they must first be validated across diverse health care settings. Objective To evaluate the generalizability and cross-site performance of a risk prediction method using readily available structured data from electronic health records in predicting incident suicide attempts across multiple, independent, US health care systems. Design, Setting, and Participants For this prognostic study, data were extracted from longitudinal electronic health record data comprising International Classification of Diseases, Ninth Revision diagnoses, laboratory test results, procedures codes, and medications for more than 3.7 million patients from 5 independent health care systems participating in the Accessible Research Commons for Health network. Across sites, 6 to 17 years' worth of data were available, up to 2018. Outcomes were defined by International Classification of Diseases, Ninth Revision codes reflecting incident suicide attempts (with positive predictive value >0.70 according to expert clinician medical record review). Models were trained using naive Bayes classifiers in each of the 5 systems. Models were cross-validated in independent data sets at each site, and performance metrics were calculated. Data analysis was performed from November 2017 to August 2019. Main Outcomes and Measures The primary outcome was suicide attempt as defined by a previously validated case definition using International Classification of Diseases, Ninth Revision codes. The accuracy and timeliness of the prediction were measured at each site. Results Across the 5 health care systems, of the 3 714 105 patients (2 130 454 female [57.2%]) included in the analysis, 39 162 cases (1.1%) were identified. 
Predictive features varied by site but, as expected, the most common predictors reflected mental health conditions (eg, borderline personality disorder, with odds ratios of 8.1-12.9, and bipolar disorder, with odds ratios of 0.9-9.1) and substance use disorders (eg, drug withdrawal syndrome, with odds ratios of 7.0-12.9). Despite variation in geographical location, demographic characteristics, and population health characteristics, model performance was similar across sites, with areas under the curve ranging from 0.71 (95% CI, 0.70-0.72) to 0.76 (95% CI, 0.75-0.77). Across sites, at a specificity of 90%, the models detected a mean of 38% of cases a mean of 2.1 years in advance. Conclusions and Relevance Across 5 diverse health care systems, a computationally efficient approach leveraging the full spectrum of structured electronic health record data was able to detect the risk of suicidal behavior in unselected patients. This approach could facilitate the development of clinical decision support tools that inform risk reduction interventions.
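The abstract above reports two things that are easy to lose in prose: a naive Bayes classifier trained on structured codes, and a detection rate read off at a fixed 90% specificity. The following is a minimal sketch of both ideas, not the study's implementation. It assumes binary presence/absence features and Laplace smoothing; the feature names, the toy patients, and every function name are hypothetical.

```python
import math

# Hypothetical toy data: each row is [mood_disorder_code, substance_use_code, routine_visit_code].
CASES = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 1]]
CONTROLS = [[0, 0, 1]] * 7 + [[1, 0, 1], [0, 1, 1], [0, 0, 0]]
ROWS = CASES + CONTROLS
LABELS = [1] * len(CASES) + [0] * len(CONTROLS)


def train_bernoulli_nb(rows, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing.

    Assumes both classes appear at least once in `labels`.
    """
    n_features = len(rows[0])
    model = {}
    for y in (0, 1):
        members = [x for x, label in zip(rows, labels) if label == y]
        log_prior = math.log(len(members) / len(rows))
        # Smoothed probability that each feature is present given class y.
        probs = [
            (sum(x[j] for x in members) + alpha) / (len(members) + 2 * alpha)
            for j in range(n_features)
        ]
        model[y] = (log_prior, probs)
    return model


def log_odds(model, x):
    """Log posterior odds of class 1 versus class 0 for one feature vector."""
    totals = {}
    for y in (0, 1):
        log_prior, probs = model[y]
        totals[y] = log_prior + sum(
            math.log(p if present else 1.0 - p) for present, p in zip(x, probs)
        )
    return totals[1] - totals[0]


def sensitivity_at_specificity(scores, labels, specificity=0.90):
    """Fraction of cases scoring above the cutoff that keeps `specificity` of controls below it."""
    negatives = sorted(s for s, y in zip(scores, labels) if y == 0)
    index = min(len(negatives) - 1, max(0, math.ceil(specificity * len(negatives)) - 1))
    cutoff = negatives[index]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    return sum(1 for s in positives if s > cutoff) / len(positives)


model = train_bernoulli_nb(ROWS, LABELS)
scores = [log_odds(model, x) for x in ROWS]
sensitivity = sensitivity_at_specificity(scores, LABELS, specificity=0.90)
print(f"Sensitivity at 90% specificity on the toy data: {sensitivity:.2f}")
```

The sketch scores the same rows it was trained on, which is only acceptable for illustration; the study evaluated its models on independent data sets at each site.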
Affiliation(s)
- Yuval Barak-Corren
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
| | - Victor M Castro
- Partners Research Information Science and Computing, Boston, Massachusetts
| | - Matthew K Nock
- Department of Psychology, Harvard University, Cambridge, Massachusetts
| | - Kenneth D Mandl
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
| | - Emily M Madsen
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
| | - Ashley Seiger
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
| | - William G Adams
- Department of Pediatrics, Boston Medical Center, Boston University School of Medicine, Boston, Massachusetts
| | - R Joseph Applegate
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston
| | - Elmer V Bernstam
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston
- McGovern Medical School, Division of General Internal Medicine, The University of Texas Health Science Center at Houston, Houston
| | - Jeffrey G Klann
- Partners Research Information Science and Computing, Boston, Massachusetts
| | - Ellen P McCarthy
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Shawn N Murphy
- Partners Research Information Science and Computing, Boston, Massachusetts
| | - Marc Natter
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
| | - Brian Ostasiewski
- Clinical and Translational Science Institute, Wake Forest School of Medicine, Winston-Salem, North Carolina
| | - Nandan Patibandla
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
| | - Gary E Rosenthal
- Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina
| | - George S Silva
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Kun Wei
- Clinical and Translational Science Institute, Wake Forest School of Medicine, Winston-Salem, North Carolina
| | - Griffin M Weber
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
- Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Sarah R Weiler
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts
| | - Ben Y Reis
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
| | - Jordan W Smoller
- Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, Massachusetts
|
41
|
Defining Major Depressive Disorder Cohorts Using the EHR: Multiple Phenotypes Based on ICD-9 Codes and Medication Orders. Neurol Psychiatry Brain Res 2020; 36:18-26. [PMID: 32218644 DOI: 10.1016/j.npbr.2020.02.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
Background Major Depressive Disorder (MDD) is one of the most common mental illnesses and a leading cause of disability worldwide. Electronic Health Records (EHR) allow researchers to conduct unprecedented large-scale observational studies investigating MDD, its disease development and its interaction with other health outcomes. While there exist methods to classify patients as clear cases or controls, given specific data requirements, there are presently no simple, generalizable, and validated methods to classify an entire patient population into varying groups of depression likelihood and severity. Methods We have tested a simple, pragmatic electronic phenotype algorithm that classifies patients into one of five mutually exclusive, ordinal groups, varying in depression phenotype. Using data from an integrated health system on 278,026 patients from a 10-year study period we have tested the convergent validity of these constructs using measures of external validation, including patterns of psychiatric prescriptions, symptom severity, indicators of suicidality, comorbidity, mortality, health care utilization, and polygenic risk scores for MDD. Results We found consistent patterns of increasing morbidity and/or adverse outcomes across the five groups, providing evidence for convergent validity. Limitations The study population is from a single rural integrated health system which is predominantly white, possibly limiting its generalizability. Conclusion Our study provides initial evidence that a simple algorithm, generalizable to most EHR data sets, provides categories with meaningful face and convergent validity that can be used for stratification of an entire patient population.
|
42
|
Wang J, Deng H, Liu B, Hu A, Liang J, Fan L, Zheng X, Wang T, Lei J. Systematic Evaluation of Research Progress on Natural Language Processing in Medicine Over the Past 20 Years: Bibliometric Study on PubMed. J Med Internet Res 2020; 22:e16816. [PMID: 32012074 PMCID: PMC7005695 DOI: 10.2196/16816] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2019] [Revised: 12/05/2019] [Accepted: 12/15/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Natural language processing (NLP) is an important traditional field in computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information globally and increasing importance of understanding and mining big data in the medical field, NLP is becoming more crucial. OBJECTIVE The goal of the research was to perform a systematic review on the use of NLP in medical research with the aim of understanding the global progress on NLP research outcomes, content, methods, and study groups involved. METHODS A systematic review was conducted using the PubMed database as a search platform. All published studies on the application of NLP in medicine (except biomedicine) during the 20 years between 1999 and 2018 were retrieved. The data obtained from these published studies were cleaned and structured. Excel (Microsoft Corp) and VOSviewer (Nees Jan van Eck and Ludo Waltman) were used to perform bibliometric analysis of publication trends, author orders, countries, institutions, collaboration relationships, research hot spots, diseases studied, and research methods. RESULTS A total of 3498 articles were obtained during initial screening, and 2336 articles were found to meet the study criteria after manual screening. The number of publications increased every year, with a significant growth after 2012 (number of publications ranged from 148 to a maximum of 302 annually). The United States has occupied the leading position since the inception of the field, with the largest number of articles published. The United States contributed to 63.01% (1472/2336) of all publications, followed by France (5.44%, 127/2336) and the United Kingdom (3.51%, 82/2336). The author with the largest number of articles published was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest number of articles as the first and corresponding authors. 
Among first-author affiliations, Columbia University accounted for the largest number of articles, 4.54% (106/2336) of the total. Approximately one-fifth (17.68%, 413/2336) of the articles involved research on specific diseases, and the subject areas primarily focused on mental illness (16.46%, 68/413), breast cancer (5.81%, 24/413), and pneumonia (4.12%, 17/413). CONCLUSIONS NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most used research materials, but social media such as Twitter have become important research materials since 2015. Cancer (24.94%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancers (23.30%, 24/103) and lung cancers (14.56%, 15/103) accounting for the highest proportions of studies. Columbia University and the researchers trained there were the most active and prolific research forces on NLP in the medical field.
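The percentages in this abstract are each paired with a numerator and denominator, so they can be recomputed directly. The sketch below does only that arithmetic; the labels are shorthand chosen here, and the counts are copied from the abstract text.

```python
# (label, numerator, denominator, percentage as reported in the abstract)
REPORTED = [
    ("United States share of publications", 1472, 2336, "63.01"),
    ("France share of publications", 127, 2336, "5.44"),
    ("United Kingdom share of publications", 82, 2336, "3.51"),
    ("Columbia University first-author share", 106, 2336, "4.54"),
    ("Articles on specific diseases", 413, 2336, "17.68"),
    ("Mental illness among disease articles", 68, 413, "16.46"),
    ("Breast cancer among disease articles", 24, 413, "5.81"),
    ("Pneumonia among disease articles", 17, 413, "4.12"),
    ("Cancer among disease articles", 103, 413, "24.94"),
    ("Breast cancer among cancer articles", 24, 103, "23.30"),
    ("Lung cancer among cancer articles", 15, 103, "14.56"),
]


def percentage(numerator, denominator):
    """Format a ratio as a percentage with two decimal places."""
    return f"{100 * numerator / denominator:.2f}"


mismatches = [
    label for label, num, den, reported in REPORTED if percentage(num, den) != reported
]

for label, num, den, reported in REPORTED:
    print(f"{label}: {num}/{den} = {percentage(num, den)}% (reported {reported}%)")
print("Mismatches:", mismatches)
```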
Affiliation(s)
- Jing Wang
- School of Medical Informatics and Engineering, Southwest Medical University, Luzhou, China
| | - Huan Deng
- School of Medical Informatics and Engineering, Southwest Medical University, Luzhou, China
| | - Bangtao Liu
- School of Medical Informatics and Engineering, Southwest Medical University, Luzhou, China
| | - Anbin Hu
- School of Medical Informatics and Engineering, Southwest Medical University, Luzhou, China
| | - Jun Liang
- IT Center, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China
| | - Lingye Fan
- Affiliated Hospital, Southwest Medical University, Luzhou, China
| | - Xu Zheng
- Center for Medical Informatics, Peking University, Beijing, China
| | - Tong Wang
- School of Public Health, Jilin University, Jilin, China
| | - Jianbo Lei
- School of Medical Informatics and Engineering, Southwest Medical University, Luzhou, China; Center for Medical Informatics, Peking University, Beijing, China; Institute of Medical Technology, Health Science Center, Peking University, Beijing, China
|
43
|
Walsh CG, Chaudhry B, Dua P, Goodman KW, Kaplan B, Kavuluru R, Solomonides A, Subbian V. Stigma, biomarkers, and algorithmic bias: recommendations for precision behavioral health with artificial intelligence. JAMIA Open 2020; 3:9-15. [PMID: 32607482 PMCID: PMC7309258 DOI: 10.1093/jamiaopen/ooz054] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2019] [Revised: 07/29/2019] [Accepted: 10/30/2019] [Indexed: 12/22/2022] Open
Abstract
Effective implementation of artificial intelligence in behavioral healthcare delivery depends on overcoming challenges that are pronounced in this domain. Self and social stigma contribute to under-reported symptoms, and under-coding worsens ascertainment. Health disparities contribute to algorithmic bias. Lack of reliable biological and clinical markers hinders model development, and model explainability challenges impede trust among users. In this perspective, we describe these challenges and discuss design and implementation recommendations to overcome them in intelligent systems for behavioral and mental health.
Affiliation(s)
- Colin G Walsh
- Biomedical Informatics, Medicine and Psychiatry, Vanderbilt University Medical Center, 2525 West End, Suite 1475, Nashville, TN, USA
| | - Beenish Chaudhry
- School of Computing and Informatics, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
| | - Prerna Dua
- Department of Health Informatics and Information Management, Louisiana Tech University, Ruston, Louisiana, USA
| | - Kenneth W Goodman
- Institute for Bioethics and Health Policy, University of Miami, Miller School of Medicine, Miami, Florida, USA
| | - Bonnie Kaplan
- Yale Center for Medical Informatics, Yale Bioethics Center, Yale Information Society, Yale Solomon Center for Health Law & Policy, Yale University, New Haven, Connecticut, USA
| | - Ramakanth Kavuluru
- Division of Biomedical Informatics, Department of Internal Medicine, University of Kentucky, Lexington, Kentucky, USA
| | - Anthony Solomonides
- Outcomes Research and Biomedical Informatics, NorthShore University HealthSystem, Research Institute, Evanston, Illinois, USA
| | - Vignesh Subbian
- Department of Biomedical Engineering, Department of Systems and Industrial Engineering, The University of Arizona, Tucson, Arizona, USA
|
44
|
High-throughput phenotyping with electronic medical record data using a common semi-supervised approach (PheCAP). Nat Protoc 2019; 14:3426-3444. [PMID: 31748751 DOI: 10.1038/s41596-019-0227-6] [Citation(s) in RCA: 83] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2018] [Accepted: 07/22/2019] [Indexed: 01/12/2023]
Abstract
Phenotypes are the foundation for clinical and genetic studies of disease risk and outcomes. The growth of biobanks linked to electronic medical record (EMR) data has both facilitated and increased the demand for efficient, accurate, and robust approaches for phenotyping millions of patients. Challenges to phenotyping with EMR data include variation in the accuracy of codes, as well as the high level of manual input required to identify features for the algorithm and to obtain gold standard labels. To address these challenges, we developed PheCAP, a high-throughput semi-supervised phenotyping pipeline. PheCAP begins with data from the EMR, including structured data and information extracted from the narrative notes using natural language processing (NLP). The standardized steps integrate automated procedures, which reduce the level of manual input, and machine learning approaches for algorithm training. PheCAP itself can be executed in 1-2 d if all data are available; however, the timing is largely dependent on the chart review stage, which typically requires at least 2 weeks. The final products of PheCAP include a phenotype algorithm, the probability of the phenotype for all patients, and a phenotype classification (yes or no).
|
45
|
Abstract
Electronic Health Records (EHR) are a rich repository of valuable clinical information that exist in primary and secondary care databases. In order to utilize EHRs for medical observational research, a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and offers a critical evaluation of the literature on the development of EHR phenotyping systems. It describes phenotyping systems and techniques based on structured and unstructured EHR data. Articles published on PubMed and Google Scholar between 2013 and 2017 have been reviewed, using search terms derived from Medical Subject Headings (MeSH). The popularity of using Natural Language Processing (NLP) techniques in extracting features from narrative text has increased. This increased attention is due to the availability of open source NLP algorithms, combined with accuracy improvement. In this review, concept extraction is the most popular NLP technique, since it has been used by more than 50% of the reviewed papers to extract features from EHR. High-throughput phenotyping systems using unsupervised machine learning techniques have gained more popularity due to their ability to efficiently and automatically extract a phenotype with minimal human effort.
|
46
|
Zheutlin AB, Dennis J, Karlsson Linnér R, Moscati A, Restrepo N, Straub P, Ruderfer D, Castro VM, Chen CY, Ge T, Huckins LM, Charney A, Kirchner HL, Stahl EA, Chabris CF, Davis LK, Smoller JW. Penetrance and Pleiotropy of Polygenic Risk Scores for Schizophrenia in 106,160 Patients Across Four Health Care Systems. Am J Psychiatry 2019; 176:846-855. [PMID: 31416338 PMCID: PMC6961974 DOI: 10.1176/appi.ajp.2019.18091085] [Citation(s) in RCA: 147] [Impact Index Per Article: 29.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/17/2022]
Abstract
OBJECTIVE Individuals at high risk for schizophrenia may benefit from early intervention, but few validated risk predictors are available. Genetic profiling is one approach to risk stratification that has been extensively validated in research cohorts. The authors sought to test the utility of this approach in clinical settings and to evaluate the broader health consequences of high genetic risk for schizophrenia. METHODS The authors used electronic health records for 106,160 patients from four health care systems to evaluate the penetrance and pleiotropy of genetic risk for schizophrenia. Polygenic risk scores (PRSs) for schizophrenia were calculated from summary statistics and tested for association with 1,359 disease categories, including schizophrenia and psychosis, in phenome-wide association studies. Effects were combined through meta-analysis across sites. RESULTS PRSs were robustly associated with schizophrenia (odds ratio per standard deviation increase in PRS, 1.55; 95% CI=1.4, 1.7), and patients in the highest risk decile of the PRS distribution had up to 4.6-fold higher odds of schizophrenia compared with those in the bottom decile (95% CI=2.9, 7.3). PRSs were also positively associated with other phenotypes, including anxiety, mood, substance use, neurological, and personality disorders, as well as suicidal behavior, memory loss, and urinary syndromes; they were inversely related to obesity. CONCLUSIONS The study demonstrates that an available measure of genetic risk for schizophrenia is robustly associated with schizophrenia in health care settings and has pleiotropic effects on related psychiatric disorders as well as other medical syndromes. The results provide an initial indication of the opportunities and limitations that may arise with the future application of PRS testing in health care systems.
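The abstract reports an odds ratio per standard deviation increase in the polygenic risk score. A common generic form of such a score is a weighted sum of risk-allele dosages, standardized across patients; the sketch below illustrates that form and how a per-SD odds ratio translates into relative odds at a given z-score. It is an assumption-laden illustration, not the study's pipeline: the variant weights and dosages are hypothetical, and the log-linear extrapolation in `relative_odds` is a modeling assumption. Only the value 1.55 comes from the abstract.

```python
import statistics

# Hypothetical per-allele effect sizes (log odds ratios) for three variants.
WEIGHTS = [0.12, -0.05, 0.08]

# Hypothetical risk-allele dosages (0, 1, or 2 copies) for five patients.
DOSAGES = [
    [0, 1, 2],
    [1, 0, 1],
    [2, 2, 0],
    [1, 1, 1],
    [0, 2, 2],
]

# Odds ratio per standard deviation increase in PRS, as reported in the abstract.
ODDS_RATIO_PER_SD = 1.55


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages for one patient."""
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(values):
    """Convert raw scores to z-scores (mean 0, population standard deviation 1)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def relative_odds(z, odds_ratio_per_sd):
    """Odds relative to a patient at the mean score, assuming a log-linear effect."""
    return odds_ratio_per_sd ** z


raw_scores = [polygenic_score(row, WEIGHTS) for row in DOSAGES]
z_scores = standardize(raw_scores)

for z in z_scores:
    odds = relative_odds(z, ODDS_RATIO_PER_SD)
    print(f"z = {z:+.2f}, odds relative to the mean = {odds:.2f}")
```

Under this assumption, a patient two standard deviations above the mean would have 1.55 squared, or about 2.4 times, the odds of a patient at the mean. The top-versus-bottom decile comparison in the abstract is a separate empirical estimate and is not derived from this formula.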
Affiliation(s)
- Amanda B Zheutlin
- Psychiatric and Neurodevelopmental Genetics Unit (Zheutlin, Chen, Ge, Smoller) and Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston (Chen); Stanley Center for Psychiatric Research, Broad Institute, Cambridge, Mass. (Zheutlin, Chen, Stahl, Smoller); Division of Genetic Medicine, Department of Medicine (Dennis, Straub, Ruderfer, Davis), Vanderbilt Genetics Institute (Dennis, Straub, Ruderfer, Davis), and Department of Biomedical Informatics (Ruderfer), Vanderbilt University Medical Center, Nashville; Department of Economics, School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam (Karlsson Linnér); Autism and Developmental Medicine Institute, Geisinger, Lewisburg, Pa. (Karlsson Linnér, Chabris); Charles Bronfman Institute for Personalized Medicine (Moscati), Pamela Sklar Division of Psychiatric Genomics (Huckins, Charney, Stahl), and Department of Genetics and Genomic Sciences (Huckins, Charney, Stahl), Icahn School of Medicine at Mount Sinai, New York; Department of Biomedical and Translational Informatics, Geisinger, Rockville, Md. (Restrepo, Kirchner); Research Information Science and Computing, Partners HealthCare, Somerville, Mass. (Castro)
| | - Jessica Dennis
| | - Richard Karlsson Linnér
| | - Arden Moscati
| | - Nicole Restrepo
| | - Peter Straub
| | - Douglas Ruderfer
| | - Victor M Castro
| | - Chia-Yen Chen
| | - Tian Ge
| | - Laura M Huckins
| | - Alexander Charney
| | - H Lester Kirchner
| | - Eli A Stahl
| | - Christopher F Chabris
| | - Lea K Davis
| | - Jordan W Smoller
| |
|
47
|
White JM, Mertz EA, Mullins JM, Even JB, Guy T, Blaga E, Kottek AM, Kumar SV, Bangar S, Vaderhobli R, Brandon R, Santo W, Jenson L, Gansky SA. Developing and Testing Electronic Health Record-Derived Caries Indices. Caries Res 2019; 53:650-658. [PMID: 31167186 DOI: 10.1159/000499700] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/27/2018] [Accepted: 03/18/2019] [Indexed: 12/15/2022] Open
Abstract
Caries indices, the basis of epidemiologic caries measures, are not easily obtained in clinical settings. This study's objective was to design, test, and validate an automated program (Valid Electronic Health Record Dental Caries Indices Calculator Tool [VERDICT]) to calculate caries indices from an electronic health record (EHR). Synthetic use case scenarios and actual patient cases of primary, mixed, and permanent dentition, including decayed, missing, and filled teeth (DMFT/dmft) and tooth surfaces (DMFS/dmfs), were entered into the EHR. VERDICT measures were compared to a previously validated clinical electronic data capture (EDC) system and statistical program to calculate caries indices. Four university clinician-researchers abstracted EHR caries exam data for 45 synthetic use cases into the EDC and post-processed them with SAS software, creating a gold standard against which to compare the VERDICT-derived caries indices. Then, 2 senior researchers abstracted EHR caries exam data and calculated caries indices for 24 patients, allowing further comparisons to VERDICT indices. Agreement statistics were computed among abstractors, and discrepancies were resolved by consensus. Agreement statistics between the 2 final-phase abstractors and the VERDICT measures showed extremely high concordance: Lin's concordance coefficients (LCCs) >0.99 for dmfs, dmft, DS, ds, DT, dt, ms, mt, FS, fs, FT, and ft; LCCs >0.95 for DMFS and DMFT; and LCCs of 0.92-0.93 for MS and MT. Caries indices, essential to developing primary health outcome measures for research, can be reliably derived from an EHR using VERDICT. Using these indices will enable population oral health management approaches and inform quality improvement efforts.
Affiliation(s)
- Joel M White
- Department of Preventive and Restorative Dental Sciences, University of California, San Francisco, San Francisco, California, USA.,Center to Address Disparities in Children's Oral Health, University of California, San Francisco, San Francisco, California, USA
| | - Elizabeth A Mertz
- Department of Preventive and Restorative Dental Sciences, University of California, San Francisco, San Francisco, California, USA.,Center to Address Disparities in Children's Oral Health, University of California, San Francisco, San Francisco, California, USA.,Philip R. Lee Institute for Health Policy Studies, University of California, San Francisco, San Francisco, California, USA
| | - Joanna M Mullins
- Willamette Dental Group and Skourtes Institute, Hillsboro, Oregon, USA
| | - Joshua B Even
- Willamette Dental Group and Skourtes Institute, Hillsboro, Oregon, USA
| | - Trey Guy
- Willamette Dental Group and Skourtes Institute, Hillsboro, Oregon, USA
| | - Elena Blaga
- Willamette Dental Group and Skourtes Institute, Hillsboro, Oregon, USA
| | - Aubri M Kottek
- Center to Address Disparities in Children's Oral Health, University of California, San Francisco, San Francisco, California, USA.,Philip R. Lee Institute for Health Policy Studies, University of California, San Francisco, San Francisco, California, USA
| | - Shwetha V Kumar
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Suhasini Bangar
- School of Dentistry, The University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Ram Vaderhobli
- Department of Preventive and Restorative Dental Sciences, University of California, San Francisco, San Francisco, California, USA
| | - Ryan Brandon
- Willamette Dental Group and Skourtes Institute, Hillsboro, Oregon, USA
| | - William Santo
- Department of Preventive and Restorative Dental Sciences, University of California, San Francisco, San Francisco, California, USA.,Center to Address Disparities in Children's Oral Health, University of California, San Francisco, San Francisco, California, USA
| | - Larry Jenson
- Department of Preventive and Restorative Dental Sciences, University of California, San Francisco, San Francisco, California, USA
| | - Stuart A Gansky
- Department of Preventive and Restorative Dental Sciences, University of California, San Francisco, San Francisco, California, USA.,Center to Address Disparities in Children's Oral Health, University of California, San Francisco, San Francisco, California, USA.,Philip R. Lee Institute for Health Policy Studies, University of California, San Francisco, San Francisco, California, USA.,Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA
| |
|
48
|
Williams K, Shorser-Gentile L, Sarvode Mothi S, Berman N, Pasternack M, Geller D, Walter J. Immunoglobulin A Dysgammaglobulinemia Is Associated with Pediatric-Onset Obsessive-Compulsive Disorder. J Child Adolesc Psychopharmacol 2019; 29:268-275. [PMID: 30892924 PMCID: PMC7227412 DOI: 10.1089/cap.2018.0043] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/04/2023]
Abstract
Background: Inflammation and immune dysregulation have been implicated in the pathogenesis of pediatric-onset obsessive-compulsive disorder (OCD) and tic disorders such as Tourette syndrome (TS). Though few replicated studies have identified markers of immune dysfunction in this population, preliminary studies suggest that serum immunoglobulin A (IgA) concentrations may be abnormal in children with these disorders. Methods: This observational retrospective cohort study, conducted using electronic health records (EHRs), identified 206 children with pediatric-onset OCD and 1024 adults diagnosed with OCD who also had testing for serum levels of IgA. IgA deficiency and serum IgA levels in pediatric OCD were compared with IgA levels from children diagnosed with autism spectrum disorders (ASD; n = 524), tic disorders (n = 157), attention-deficit/hyperactivity disorder (ADHD; n = 534), anxiety disorders (n = 1206), and celiac disease, a condition associated with IgA deficiency (n = 624). Results: Compared with ASD and anxiety disorder cohorts, the pediatric OCD cohort displayed a significantly higher likelihood of IgA deficiency (OR = 1.93; 95% CI = 1.18-3.16, and OR = 1.98; 95% CI = 1.28-3.06, respectively), though no difference was observed between pediatric OCD and TS cohorts. Furthermore, the pediatric OCD cohort displayed similar rates of IgA deficiency and serum IgA levels when compared with the celiac disease cohort. The pediatric OCD cohort also displayed the highest percentage of IgA deficiency (15%) when compared with TS (14%), celiac disease (14%), ADHD (13%), ASD (8%), and anxiety disorder (8%) cohorts. When segregated by sex, boys with OCD displayed a significantly higher likelihood of IgA deficiency when compared with all comparison cohorts except for celiac disease and tic disorders; no significant difference in IgA deficiency was observed between female cohorts. Pediatric OCD subjects also displayed significantly lower adjusted serum IgA levels than the ASD and anxiety disorder cohorts. Adults with OCD were also significantly less likely than children with OCD to display IgA deficiency (OR = 2.71; 95% CI = 1.71-4.28). When compared with children with celiac disease, no significant difference in IgA levels or rates of IgA deficiency was observed in the pediatric OCD cohort. Conclusions: We provide further evidence of IgA abnormalities in pediatric-onset OCD. These results require further investigation to determine if these abnormalities impact the clinical course of OCD in children.
Affiliation(s)
- Kyle Williams
- Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts.,Department of Psychiatry, Harvard Medical School, Boston, Massachusetts.,Address correspondence to: Kyle Williams, MD, PhD, Department of Psychiatry, Massachusetts General Hospital, Simches Research Building, Suite 2000, 185 Cambridge Street, Boston, MA 02114
| | | | - Suraj Sarvode Mothi
- Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts
| | - Noah Berman
- Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts
| | - Mark Pasternack
- Pediatric Infectious Disease Program, Massachusetts General Hospital, Boston, Massachusetts
| | - Daniel Geller
- Department of Psychiatry, Massachusetts General Hospital, Boston, Massachusetts.,Department of Psychiatry, Harvard Medical School, Boston, Massachusetts
| | - Jolan Walter
- Allergy, Immunology, and Infectious Disease Program, University of South Florida, St. Petersburg, Florida
| |
|
49
|
Edgcomb JB, Zima B. Machine Learning, Natural Language Processing, and the Electronic Health Record: Innovations in Mental Health Services Research. Psychiatr Serv 2019; 70:346-349. [PMID: 30784377 DOI: 10.1176/appi.ps.201800401] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
An unprecedented amount of clinical information is now available via electronic health records (EHRs). These massive data sets have stimulated opportunities to adapt computational approaches to track and identify target areas for quality improvement in mental health care. In this column, three key areas of EHR data science are described: EHR phenotyping, natural language processing, and predictive modeling. For each of these computational approaches, case examples are provided to illustrate their role in mental health services research. Together, adaptation of these methods underscores the need for standardization and transparency while recognizing the opportunities and challenges ahead.
Affiliation(s)
- Juliet Beni Edgcomb
- Department of Psychiatry and Behavioral Sciences (Edgcomb, Zima) and Center for Health Services and Society (Zima), University of California, Los Angeles, Los Angeles
| | - Bonnie Zima
- Department of Psychiatry and Behavioral Sciences (Edgcomb, Zima) and Center for Health Services and Society (Zima), University of California, Los Angeles, Los Angeles
| |
|
50
|
Dennis J, Yengo-Kahn AM, Kirby P, Solomon GS, Cox NJ, Zuckerman SL. Diagnostic Algorithms to Study Post-Concussion Syndrome Using Electronic Health Records: Validating a Method to Capture an Important Patient Population. J Neurotrauma 2019; 36:2167-2177. [PMID: 30773988 DOI: 10.1089/neu.2018.5916] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023] Open
Abstract
Post-concussion syndrome (PCS) is characterized by persistent cognitive, somatic, and emotional symptoms after a mild traumatic brain injury (mTBI). Genetic and other biological variables may contribute to PCS etiology, and the emergence of biobanks linked to electronic health records (EHRs) offers new opportunities for research on PCS. We sought to validate the EHR data of PCS patients by comparing two diagnostic algorithms deployed in the Vanderbilt University Medical Center de-identified database of 2.8 million patient EHRs. The algorithms identified individuals with PCS by: 1) natural language processing (NLP) of narrative text in the EHR combined with structured demographic, diagnostic, and encounter data; or 2) coded billing and procedure data. The predictive value of each algorithm was assessed, and cases and controls identified by each approach were compared on demographic and medical characteristics. The NLP algorithm identified 507 cases and 10,857 controls. The negative predictive value in controls was 78% and the positive predictive value (PPV) in cases was 82%. Conversely, the coded algorithm identified 1142 patients with two or more PCS billing codes and had a PPV of 76%. Comparisons of PCS controls to both case groups recovered known epidemiology of PCS: cases were more likely than controls to be female and to have pre-morbid diagnoses of anxiety, migraine, and post-traumatic stress disorder. In contrast, controls and cases were equally likely to have attention deficit hyperactive disorder and learning disabilities, in accordance with the findings of recent systematic reviews of PCS risk factors. We conclude that EHRs are a valuable research tool for PCS. Ascertainment based on coded data alone had a predictive value comparable to an NLP algorithm, recovered known PCS risk factors, and maximized the number of included patients.
Affiliation(s)
- Jessica Dennis
- 1 Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee; 2 Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Aaron M Yengo-Kahn
- 3 Vanderbilt Sports Concussion Center, Vanderbilt University School of Medicine, Nashville, Tennessee; 4 Department of Neurological Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee
- Paul Kirby
- 3 Vanderbilt Sports Concussion Center, Vanderbilt University School of Medicine, Nashville, Tennessee
- Gary S Solomon
- 3 Vanderbilt Sports Concussion Center, Vanderbilt University School of Medicine, Nashville, Tennessee; 4 Department of Neurological Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee
- Nancy J Cox
- 1 Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee; 2 Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee
- Scott L Zuckerman
- 3 Vanderbilt Sports Concussion Center, Vanderbilt University School of Medicine, Nashville, Tennessee; 4 Department of Neurological Surgery, Vanderbilt University School of Medicine, Nashville, Tennessee