1
Zhu T, Mu D, Hu Y, Cao Y, Yuan M, Xu J, Ye HQ, Zhang W. Association of clinical phenotypes of depression with comorbid conditions, treatment patterns and outcomes: a 10-year region-based cohort study. Transl Psychiatry 2024; 14:504. [PMID: 39719438 DOI: 10.1038/s41398-024-03213-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 12/05/2024] [Accepted: 12/16/2024] [Indexed: 12/26/2024] Open
Abstract
Depression is a heterogeneous and complex psychological syndrome with highly variable manifestations, which poses difficulties for treatment and prognosis. Patients with depression are prone to developing various comorbidities, which stem from different pathophysiological mechanisms that remain largely understudied. The current study focused on identifying comorbidity-specific phenotypes and on whether these clustered phenotypes are associated with different treatment patterns, clinical manifestations, physiological characteristics, and prognosis. We conducted a 10-year retrospective observational cohort study using electronic medical records (EMR) for 11,818 patients diagnosed with depression and hospitalized at a large academic medical center in Chengdu, China. K-means clustering and visualization methods were used to identify phenotypic categories. The association between phenotypic categories and clinical outcomes was evaluated using an adjusted Cox proportional hazards model. We classified patients with depression into five stable phenotypic categories, including 15 statistically driven clusters, in both the discovery cohort (n = 9925) and the validation cohort (n = 1893). The categories are: (Category A) the lowest incidence of comorbidity, with prominent suicide, psychotic, and somatic symptoms (n = 3493/9925); (Category B) moderate comorbidity rate, with prominent anhedonia and anxious symptoms (n = 1795/9925); (Category C) the highest incidence of comorbidity of endocrine/metabolic and digestive system diseases (n = 1702/9925); (Category D) the highest incidence of comorbidity of neurological, mental and behavioral diseases (n = 881/9925); (Category E) other diseases comorbid with depression (n = 2054/9925). Patients in Category E had the lowest risk of psychiatric rehospitalization within the 60-day follow-up, followed by Category C (HR, 1.57; 95% CI, 1.07-2.30), Category B (HR, 1.61; 95% CI, 1.10-2.40), Category A (HR, 1.82; 95% CI, 1.28-2.60), and Category D (HR, 2.38; 95% CI, 1.59-3.60), with P < 0.05, after adjustment for comorbidities, medications, and age. Over the longer observation windows (90-day, 180-day and 365-day), patients in Category D consistently showed the highest rehospitalization risk, while the rankings of Categories A, B and C shifted notably over time. The results indicate that, across the five phenotypic categories, greater severity of mental illness is associated with a greater risk of rehospitalization. These phenotypes are associated with various pathways, including the cardiometabolic system, chronic inflammation, digestive system, neurological system, and mental and behavioral disorders. These pathways play a crucial role in connecting depression with other psychiatric and somatic diseases. The identified phenotypes exhibit notable distinctions in terms of comorbidity patterns, symptomatology, biological characteristics, treatment approaches, and clinical outcomes.
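The K-means step named in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature columns, the six toy patients, and the hand-picked starting centroids are all assumptions made for illustration, and the abstract does not say how features were encoded or how the number of clusters was chosen.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centroids, max_iters=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each patient joins the nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical patients; the columns are made-up comorbidity flags:
# [endocrine/metabolic, digestive, neurological, mental/behavioral]
patients = [
    [1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 0, 1],
    [0, 0, 1, 1], [0, 0, 1, 0], [0, 1, 1, 1],
]
# Starting centroids are fixed by hand so the toy run is deterministic.
labels, centroids = kmeans(patients, init_centroids=[patients[0], patients[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing the starting centroids explicitly keeps the example reproducible; a real analysis would more likely use a library implementation with multiple random restarts and a stability check across cohorts.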
Affiliation(s)
- Ting Zhu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
- Di Mu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, VIC 3053, Australia
- Yao Hu
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
- Yang Cao
  - Melbourne School of Psychological Sciences, The University of Melbourne, Parkville, VIC 3010, Australia
- Minlan Yuan
  - Mental Health Center of West China Hospital, Sichuan University, Chengdu, 610041, China
- Jia Xu
  - The First Psychiatric Hospital of Harbin, Harbin, 150056, China
- Heng-Qing Ye
  - Faculty of Business, Hong Kong Polytechnic University, Hong Kong, 100872, China
- Wei Zhang
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, 610041, China
  - Med-X Center for Informatics, Sichuan University, Chengdu, 610041, China
  - Mental Health Center of West China Hospital, Sichuan University, Chengdu, 610041, China
2
Sharma A, Verhaak PF, McCoy TH, Perlis RH, Doshi-Velez F. Identifying data-driven subtypes of major depressive disorder with electronic health records. J Affect Disord 2024; 356:64-70. [PMID: 38565338 DOI: 10.1016/j.jad.2024.03.162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Revised: 03/25/2024] [Accepted: 03/26/2024] [Indexed: 04/04/2024]
Abstract
BACKGROUND: Efforts to reduce the heterogeneity of major depressive disorder (MDD) by identifying subtypes have not yet facilitated treatment personalization or investigation of biology, so novel approaches merit consideration. METHODS: We utilized electronic health records drawn from 2 academic medical centers and affiliated health systems in Massachusetts to identify data-driven subtypes of MDD, characterizing sociodemographic features, comorbid diagnoses, and treatment patterns. We applied Latent Dirichlet Allocation (LDA) to summarize diagnostic codes, followed by agglomerative clustering to define patient subgroups. RESULTS: Among 136,371 patients (95,034 women [70%]; 41,337 men [30%]; mean [SD] age, 47.0 [14.0] years), the 15 putative MDD subtypes were characterized by comorbidities and distinct patterns in medication use. There was substantial variation in rates of selective serotonin reuptake inhibitor (SSRI) use (from a low of 62% to a high of 78%) and serotonin-norepinephrine reuptake inhibitor (SNRI) use (from 4% to 21%). LIMITATIONS: Electronic health records lack reliable symptom-level data, so we cannot examine the extent to which subtypes might differ in clinical presentation or symptom dimensions. CONCLUSION: These data-driven subtypes, drawing on representative clinical cohorts, merit further investigation for their utility in identifying more homogeneous patient populations for basic as well as clinical investigation.
Affiliation(s)
- Abhishek Sharma
  - Harvard John A. Paulson School of Engineering and Applied Sciences, 29 Oxford Street, Cambridge, MA 02138, United States of America
- Pilar F Verhaak
  - Center for Quantitative Health, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114, United States of America
- Thomas H McCoy
  - Center for Quantitative Health, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114, United States of America; Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, United States of America
- Roy H Perlis
  - Center for Quantitative Health, Massachusetts General Hospital, 185 Cambridge Street, Boston, MA, 02114, United States of America; Harvard Medical School, 25 Shattuck Street, Boston, MA 02115, United States of America
- Finale Doshi-Velez
  - Harvard John A. Paulson School of Engineering and Applied Sciences, 29 Oxford Street, Cambridge, MA 02138, United States of America
3
Nickson D, Meyer C, Walasek L, Toro C. Prediction and diagnosis of depression using machine learning with electronic health records data: a systematic review. BMC Med Inform Decis Mak 2023; 23:271. [PMID: 38012655 PMCID: PMC10680172 DOI: 10.1186/s12911-023-02341-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 10/15/2023] [Indexed: 11/29/2023] Open
Abstract
BACKGROUND: Depression is one of the most significant health conditions in terms of personal, social, and economic impact. The aim of this review is to summarize existing literature in which machine learning methods have been used in combination with Electronic Health Records for prediction of depression. METHODS: Systematic literature searches were conducted within arXiv, PubMed, PsycINFO, Science Direct, SCOPUS and Web of Science electronic databases. Searches were restricted to information published after 2010 (from 1st January 2011 onwards) and were updated prior to the final synthesis of data (27th January 2022). RESULTS: Following the PRISMA process, the initial 744 studies were reduced to 19 eligible for detailed evaluation. Data extraction identified machine learning methods used, types of predictors used, the definition of depression, classification performance achieved, sample size, and benchmarks used. Area Under the Curve (AUC) values greater than 0.9 were claimed, though the average was around 0.8. Regression methods proved as effective as more developed machine learning techniques. LIMITATIONS: The categorization, definition, and identification of the numbers of predictors used within models were sometimes difficult to establish. Studies were largely Western, Educated, Industrialised, Rich, and Democratic (WEIRD) in demography. CONCLUSION: This review supports the potential use of machine learning techniques with Electronic Health Records for the prediction of depression. All the selected studies used clinically based, though sometimes broad, definitions of depression as their classification criteria. The reported performance of the studies was comparable to or even better than that found in primary care. There are concerns with generalizability and interpretability.
Affiliation(s)
- Caroline Meyer
  - Warwick Medical School, University of Warwick, Coventry, UK
- Lukasz Walasek
  - Department of Psychology, University of Warwick, Coventry, UK
- Carla Toro
  - Warwick Medical School, University of Warwick, Coventry, UK
4
Ferrara M, Gentili E, Belvederi Murri M, Zese R, Alberti M, Franchini G, Domenicano I, Folesani F, Sorio C, Benini L, Carozza P, Little J, Grassi L. Establishment of a Public Mental Health Database for Research Purposes in the Ferrara Province: Development and Preliminary Evaluation Study. JMIR Med Inform 2023; 11:e45523. [PMID: 37584563 PMCID: PMC10461404 DOI: 10.2196/45523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 05/04/2023] [Accepted: 06/01/2023] [Indexed: 08/17/2023] Open
Abstract
Background: The immediate use of data exported from electronic health records (EHRs) for research is often limited by the necessity to transform data elements into an actual data set. Objective: This paper describes the methodology for establishing a data set that originated from an EHR registry that included clinical, health service, and sociodemographic information. Methods: The Extract, Transform, Load process was applied to raw data collected at the Integrated Department of Mental Health and Pathological Addictions in Ferrara, Italy, from 1925 to February 18, 2021, to build the new, anonymized Ferrara-Psychiatry (FEPSY) database. Information collected before the first EHR was implemented (ie, in 1991) was excluded. An unsupervised cluster analysis was performed to identify patient subgroups to support the proof of concept. Results: The FEPSY database included 3,861,432 records on 46,222 patients. Since 1991, each year, a median of 1404 (IQR 1117.5-1757.7) patients had newly accessed care, and a median of 7300 (IQR 6109.5-9397.5) patients were actively receiving care. Among 38,022 patients with a mental disorder, 2 clusters were identified; the first predominantly included male patients who were aged 25 to 34 years at first presentation and were living with their parents, and the second predominantly included female patients who were aged 35 to 44 years and were living with their own families. Conclusions: The process for building the FEPSY database proved to be robust and replicable with similar health care data, even when they were not originally conceived for research purposes. The FEPSY database will enable future in-depth analyses regarding the epidemiology and social determinants of mental disorders, access to mental health care, and resource utilization.
Affiliation(s)
- Maria Ferrara
  - Institute of Psychiatry, Department of Neuroscience and Rehabilitation, University of Ferrara, Ferrara, Italy
  - Integrated Department of Mental Health and Pathological Addictions, Ferrara Local Health Trust, Ferrara, Italy
  - Department of Psychiatry, Yale School of Medicine, New Haven, CT, United States
- Martino Belvederi Murri
  - Institute of Psychiatry, Department of Neuroscience and Rehabilitation, University of Ferrara, Ferrara, Italy
  - Integrated Department of Mental Health and Pathological Addictions, Ferrara Local Health Trust, Ferrara, Italy
- Riccardo Zese
  - Department of Chemical, Pharmaceutical and Agricultural Sciences, University of Ferrara, Ferrara, Italy
- Marco Alberti
  - Department of Mathematics and Computer Science, University of Ferrara, Ferrara, Italy
- Giorgia Franchini
  - Department of Physics, Informatics and Mathematics, University of Modena and Reggio Emilia, Modena, Italy
- Ilaria Domenicano
  - Institute of Psychiatry, Department of Neuroscience and Rehabilitation, University of Ferrara, Ferrara, Italy
- Federica Folesani
  - Institute of Psychiatry, Department of Neuroscience and Rehabilitation, University of Ferrara, Ferrara, Italy
  - Integrated Department of Mental Health and Pathological Addictions, Ferrara Local Health Trust, Ferrara, Italy
- Cristina Sorio
  - Integrated Department of Mental Health and Pathological Addictions, Ferrara Local Health Trust, Ferrara, Italy
- Lorenzo Benini
  - Integrated Department of Mental Health and Pathological Addictions, Ferrara Local Health Trust, Ferrara, Italy
- Paola Carozza
  - Integrated Department of Mental Health and Pathological Addictions, Ferrara Local Health Trust, Ferrara, Italy
- Julian Little
  - School of Epidemiology and Public Health, University of Ottawa, Ottawa, ON, Canada
- Luigi Grassi
  - Institute of Psychiatry, Department of Neuroscience and Rehabilitation, University of Ferrara, Ferrara, Italy
  - Integrated Department of Mental Health and Pathological Addictions, Ferrara Local Health Trust, Ferrara, Italy
5
Elhussein A, Gürsoy G. Privacy-preserving patient clustering for personalized federated learning. Proceedings of Machine Learning Research 2023; 219:150-166. [PMID: 39239484 PMCID: PMC11376435] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/07/2024]
Abstract
Federated Learning (FL) is a machine learning framework that enables multiple organizations to train a model without sharing their data with a central server. However, it experiences significant performance degradation if the data are not independent and identically distributed (non-IID). This is a problem in medical settings, where variations in the patient population contribute significantly to distribution differences across hospitals. Personalized FL addresses this issue by accounting for site-specific distribution differences. Clustered FL, a Personalized FL variant, has been used to address this problem by clustering patients into groups across hospitals and training separate models on each group. However, privacy concerns remain a challenge because the clustering process requires the exchange of patient-level information. This was previously solved by forming clusters using aggregated data, which led to inaccurate groups and performance degradation. In this study, we propose Privacy-preserving Community-Based Federated machine Learning (PCBFL), a novel Clustered FL framework that can cluster patients using patient-level data while protecting privacy. PCBFL uses Secure Multiparty Computation, a cryptographic technique, to securely calculate patient-level similarity scores across hospitals. We then evaluate PCBFL by training a federated mortality prediction model using 20 sites from the eICU dataset. We compare the performance gain from PCBFL against traditional and existing Clustered FL frameworks. Our results show that PCBFL successfully forms clinically meaningful cohorts of low, medium, and high-risk patients. PCBFL outperforms traditional and existing Clustered FL frameworks with an average AUC improvement of 4.3% and AUPRC improvement of 7.8%.
Affiliation(s)
- Ahmed Elhussein
  - Department of Biomedical Informatics, Columbia University, New York Genome Center, New York City, NY, U.S.A
- Gamze Gürsoy
  - Department of Biomedical Informatics, Department of Computer Science, Columbia University, New York Genome Center, New York City, NY, U.S.A
6
Voss RW, Schmidt TD, Weiskopf N, Marino M, Dorr DA, Huguet N, Warren N, Valenzuela S, O’Malley J, Quiñones AR. Comparing ascertainment of chronic condition status with problem lists versus encounter diagnoses from electronic health records. J Am Med Inform Assoc 2022; 29:770-778. [PMID: 35165743 PMCID: PMC9006679 DOI: 10.1093/jamia/ocac016] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 11/11/2021] [Revised: 01/18/2022] [Accepted: 01/27/2022] [Indexed: 11/15/2022] Open
Abstract
OBJECTIVE: To assess and compare electronic health record (EHR) documentation of chronic disease in problem lists and encounter diagnosis records among Community Health Center (CHC) patients. MATERIALS AND METHODS: We assessed patient EHR data in a large clinical research network during 2012-2019. We included CHCs that provided outpatient, older adult primary care to patients aged ≥45 years, with ≥2 office visits during the study. Our study sample included 1 180 290 patients from 545 CHCs across 22 states. We used diagnosis codes from 39 Chronic Condition Warehouse algorithms to identify chronic conditions from encounter diagnoses only and compared against problem list records. We measured correspondence including agreement, kappa, prevalence index, bias index, and prevalence-adjusted bias-adjusted kappa. RESULTS: Overlap of encounter diagnosis and problem list ascertainment was 59.4% among chronic conditions identified, with 12.2% of conditions identified only in encounters and 28.4% identified only in problem lists. Rates of coidentification varied by condition from 7.1% to 84.4%. Greatest agreement was found in diabetes (84.4%), HIV (78.1%), and hypertension (74.7%). Sixteen conditions had <50% agreement, including cancers and substance use disorders. Overlap for mental health conditions ranged from 47.4% for anxiety to 59.8% for depression. DISCUSSION: Agreement between the 2 sources varied substantially. Conditions requiring regular management in primary care settings may have a higher agreement than those diagnosed and treated in specialty care. CONCLUSION: Relying on EHR encounter data to identify chronic conditions without reference to patient problem lists may under-capture conditions among CHC patients in the United States.
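The correspondence measures listed in this abstract can be sketched for a single condition from a 2x2 table of the two sources. The abstract does not give formulas, so the code below assumes the standard textbook definitions; the function names and all counts are hypothetical, with the second set of counts chosen only to reproduce the reported 59.4% / 12.2% / 28.4% split.

```python
def agreement_stats(both, encounter_only, problem_list_only, neither):
    """Agreement measures for one condition recorded in two EHR sources.

    Assumes a 2x2 table with at least one discordant or mixed cell, so that
    chance agreement is below 1 and kappa is defined.
    """
    a, b, c, d = both, encounter_only, problem_list_only, neither
    n = a + b + c + d
    observed = (a + d) / n  # share of patients on whom the sources agree
    # Chance agreement from the marginal totals of each source.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return {
        "agreement": observed,
        "kappa": (observed - expected) / (1 - expected),
        "prevalence_index": (a - d) / n,  # imbalance of yes/yes vs no/no
        "bias_index": (b - c) / n,        # imbalance between the two sources
        "pabak": 2 * observed - 1,        # prevalence- and bias-adjusted kappa
    }


def overlap_shares(both, encounter_only, problem_list_only):
    """Shares among conditions identified by at least one source."""
    identified = both + encounter_only + problem_list_only
    return (both / identified,
            encounter_only / identified,
            problem_list_only / identified)


# Hypothetical counts for one condition in 100 patients.
stats = agreement_stats(both=40, encounter_only=10, problem_list_only=5, neither=45)
# Hypothetical counts chosen to mirror the percentages in the abstract.
shares = overlap_shares(both=594, encounter_only=122, problem_list_only=284)
print(stats)
print(shares)
```

With these definitions, the overlap figure ignores patients for whom neither source records the condition, which is why it can differ sharply from simple agreement when a condition is rare.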
Affiliation(s)
- Nicole Weiskopf
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Miguel Marino
  - Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
- David A Dorr
  - Department of Medical Informatics and Clinical Epidemiology, Oregon Health & Science University, Portland, Oregon, USA
- Nathalie Huguet
  - Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Steele Valenzuela
  - Department of Family Medicine, Oregon Health & Science University, Portland, Oregon, USA
- Ana R Quiñones
  - Corresponding Author: Ana R. Quiñones, Department of Family Medicine, Oregon Health & Science University, 3181 S.W. Sam Jackson Park Rd., FM, Portland, OR 97239, USA
7
Chapman M, Mumtaz S, Rasmussen LV, Karwath A, Gkoutos GV, Gao C, Thayer D, Pacheco JA, Parkinson H, Richesson RL, Jefferson E, Denaxas S, Curcin V. Desiderata for the development of next-generation electronic health record phenotype libraries. Gigascience 2021; 10:giab059. [PMID: 34508578 PMCID: PMC8434766 DOI: 10.1093/gigascience/giab059] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/24/2021] [Revised: 07/15/2021] [Accepted: 08/18/2021] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. METHODS: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. RESULTS: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. CONCLUSIONS: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.
Affiliation(s)
- Martin Chapman
  - Department of Population Health Sciences, King's College London, London, SE1 1UL, UK
- Shahzad Mumtaz
  - Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
- Luke V Rasmussen
  - Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Andreas Karwath
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Georgios V Gkoutos
  - Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, B15 2TT, UK
- Chuang Gao
  - Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
- Dan Thayer
  - SAIL Databank, Swansea University, Swansea, SA2 8PP, UK
- Jennifer A Pacheco
  - Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
- Helen Parkinson
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
- Rachel L Richesson
  - Department of Learning Health Sciences, University of Michigan Medical School, MI 48109, USA
- Emily Jefferson
  - Health Informatics Centre (HIC), University of Dundee, Dundee, DD1 9SY, UK
- Spiros Denaxas
  - Institute of Health Informatics, University College London, London, NW1 2DA, UK
- Vasa Curcin
  - Department of Population Health Sciences, King's College London, London, SE1 1UL, UK