1
Zhang Y, Jiang X, Mentzer AJ, McVean G, Lunter G. Topic modeling identifies novel genetic loci associated with multimorbidities in UK Biobank. Cell Genomics 2023; 3:100371. [PMID: 37601973 PMCID: PMC10435382 DOI: 10.1016/j.xgen.2023.100371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2022] [Revised: 05/04/2023] [Accepted: 07/07/2023] [Indexed: 08/22/2023]
Abstract
Many diseases show patterns of co-occurrence, possibly driven by systemic dysregulation of underlying processes affecting multiple traits. We have developed a method (treeLFA) for identifying such multimorbidities from routine health-care data, which combines topic modeling with an informative prior derived from medical ontology. We apply treeLFA to UK Biobank data and identify a variety of topics representing multimorbidity clusters, including a healthy topic. We find that loci identified using topic weights as traits in a genome-wide association study (GWAS) analysis, which we validated with a range of approaches, only partially overlap with loci from GWASs on constituent single diseases. We also show that treeLFA improves upon existing methods like latent Dirichlet allocation in various ways. Overall, our findings indicate that topic models can characterize multimorbidity patterns and that genetic analysis of these patterns can provide insight into the etiology of complex traits that cannot be determined from the analysis of constituent traits alone.
Affiliation(s)
- Yidong Zhang
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
  - Chinese Academy of Medical Sciences Oxford Institute, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
  - Department of Radiation Oncology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100006, China
- Xilin Jiang
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
  - Department of Statistics, University of Oxford, Oxford OX1 3LB, UK
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Victor Phillip Dahdaleh Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0SR, UK
  - Heart and Lung Research Institute, University of Cambridge, Cambridge CB2 0BB, UK
- Alexander J. Mentzer
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
  - Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford OX3 7BN, UK
- Gil McVean
  - Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford OX3 7LF, UK
- Gerton Lunter
  - MRC Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, University of Oxford, Oxford OX3 9DS, UK
  - Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen 9700 RB, the Netherlands
2
Cramer EY, Bartlett J, Chan ER, Gaedigk A, Ratsimbasoa AC, Mehlotra RK, Williams SM, Zimmerman PA. Pharmacogenomic variation in the Malagasy population: implications for the antimalarial drug primaquine metabolism. Pharmacogenomics 2023; 24:583-597. [PMID: 37551613 PMCID: PMC10621762 DOI: 10.2217/pgs-2023-0091] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/23/2023] [Accepted: 07/11/2023] [Indexed: 08/09/2023]
Abstract
Aim: Antimalarial primaquine (PQ) eliminates liver hypnozoites of Plasmodium vivax. CYP2D6 gene variation contributes to PQ therapeutic failure. Additional gene variation may contribute to PQ efficacy. Information on pharmacogenomic variation in Madagascar, with vivax malaria and a unique population admixture, is scanty. Methods: The authors performed genome-wide genotyping of 55 Malagasy samples and analyzed data with a focus on a set of 28 pharmacogenes most relevant to PQ. Results: Mainly, the study identified 110 coding or splicing variants, including those that, based on previous studies in other populations, may be implicated in PQ response and copy number variation, specifically in chromosomal regions that contain pharmacogenes. Conclusion: With this pilot information, larger genome-wide association analyses with PQ metabolism and response are substantially more feasible.
Affiliation(s)
- Estee Y Cramer
  - Center for Global Health & Diseases, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
  - Department of Biostatistics & Epidemiology, School of Public Health & Health Sciences, University of Massachusetts Amherst, Amherst, MA 01003, USA
- Jacquelaine Bartlett
  - Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Ernest R Chan
  - Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
  - Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH 44106, USA
- Andrea Gaedigk
  - Division of Clinical Pharmacology, Toxicology & Therapeutic Innovation, Children's Mercy Research Institute (CMRI), Kansas City, MO 64108, USA
- Arsene C Ratsimbasoa
  - University of Fianarantsoa, Fianarantsoa, Madagascar
  - Centre National d'Application de Recherche Pharmaceutique (CNARP), Antananarivo, Madagascar
- Rajeev K Mehlotra
  - Center for Global Health & Diseases, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Scott M Williams
  - Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
- Peter A Zimmerman
  - Center for Global Health & Diseases, Case Western Reserve University School of Medicine, Cleveland, OH 44106, USA
3
Seligowski AV, Misganaw B, Duffy LA, Ressler KJ, Guffanti G. Leveraging Large-Scale Genetics of PTSD and Cardiovascular Disease to Demonstrate Robust Shared Risk and Improve Risk Prediction Accuracy. Am J Psychiatry 2022; 179:814-823. [PMID: 36069022 PMCID: PMC9633348 DOI: 10.1176/appi.ajp.21111113] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 12/20/2022]
Abstract
OBJECTIVE Individuals with posttraumatic stress disorder (PTSD) are significantly more likely to be diagnosed with cardiovascular disease (CVD) (e.g., myocardial infarction, stroke). The evidence for this link is so compelling that the National Institutes of Health convened a working group to determine gaps in the literature, including the need for large-scale genomic studies to identify shared genetic risk. The aim of the present study was to address some of these gaps by utilizing PTSD and CVD genome-wide association study (GWAS) summary statistics in a large biobank sample to determine the shared genetic risk of PTSD and CVD. METHODS A large health care biobank data set was used (N=36,412), combined with GWAS summary statistics from publicly available large-scale PTSD and CVD studies. Disease phenotypes (e.g., PTSD) were collected from electronic health records. De-identified genetic data from the biobank were genotyped using Illumina SNP array. Summary statistics data sets were processed with the following quality-control criteria: 1) SNP heritability h² > 0.05, 2) compute z-statistics (z=beta/SE or z=log(OR)/SE), 3) filter nonvariable SNPs (0 RESULTS Significant genetic correlations were found between PTSD and CVD (rG=0.24, SE=0.06), and Mendelian randomization analyses indicated a potential causal link from PTSD to hypertension (β=0.20, SE=0.04), but not the reverse. PTSD summary statistics significantly predicted PTSD diagnostic status (R²=0.27), and this was significantly improved by incorporating summary statistics from CVD and major depressive disorder (R²=1.30). Further, pathway enrichment analyses indicated that genetic variants involved in shared PTSD-CVD risk included those involved in postsynaptic structure, synapse organization, and interleukin-7-mediated signaling pathways. CONCLUSIONS The results from this study suggest that PTSD and CVD may share genetic risk. Further, these results implicate PTSD as a risk factor leading to the development of hypertension and coronary artery disease. Additional research is needed to determine the clinical utility of these findings.
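The z-statistic criterion named in the METHODS of this abstract (z=beta/SE or z=log(OR)/SE) can be sketched in a few lines of Python. This is only an illustration of the formula, not the authors' pipeline; the function name `z_statistic` and its keyword arguments are assumptions introduced here.

```python
import math


def z_statistic(se, beta=None, odds_ratio=None):
    """Return z = beta / SE, or z = log(OR) / SE when an odds ratio is given.

    Hypothetical helper: the name and argument layout are illustrative only.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    if beta is not None:
        return beta / se
    if odds_ratio is not None:
        if odds_ratio <= 0:
            raise ValueError("odds ratio must be positive")
        # Effects reported as odds ratios are moved to the log scale first.
        return math.log(odds_ratio) / se
    raise ValueError("provide either beta or odds_ratio")
```

Under these assumptions, a hypothetical SNP with beta = 0.20 and SE = 0.04 has a z-statistic of about 5, and an odds ratio of exactly 1 gives z = 0 for any standard error.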
Affiliation(s)
- Antonia V. Seligowski
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - McLean Hospital, Belmont, MA, USA
- Burook Misganaw
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - McLean Hospital, Belmont, MA, USA
- Kerry J. Ressler
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - McLean Hospital, Belmont, MA, USA
- Guia Guffanti
  - Department of Psychiatry, Harvard Medical School, Boston, MA, USA
  - McLean Hospital, Belmont, MA, USA
4
Wang Y, Benavides R, Diatchenko L, Grant AV, Li Y. A graph-embedded topic model enables characterization of diverse pain phenotypes among UK biobank individuals. iScience 2022; 25:104390. [PMID: 35637735 PMCID: PMC9142639 DOI: 10.1016/j.isci.2022.104390] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/25/2022] [Revised: 04/08/2022] [Accepted: 05/06/2022] [Indexed: 11/05/2022]
Abstract
Large biobank repositories of clinical conditions and medications data open opportunities to investigate the phenotypic disease network. We present a graph embedded topic model (GETM). We integrate existing biomedical knowledge graph information in the form of pre-trained graph embedding into the embedded topic model. Via a variational autoencoder framework, we infer patient phenotypic mixture by modeling multi-modal discrete patient medical records. We applied GETM to UK Biobank (UKB) self-reported clinical phenotype data, which contains 443 self-reported medical conditions and 802 medications for 457,461 individuals. Compared to existing methods, GETM demonstrates good imputation performance. With a more focused application on characterizing pain phenotypes, we observe that GETM-inferred phenotypes not only accurately predict the status of chronic musculoskeletal (CMK) pain but also reveal known pain-related topics. Intriguingly, medications and conditions in the cardiovascular category are enriched among the most predictive topics of chronic pain.
- Interpretable deep learning to integrate knowledge graphs and patient data
- Modeling phenotypes from self-reports of 457,461 individuals from the UK Biobank
- Predicting and characterizing chronic pain phenotypes using latent phenotypes
- Potential link between cardiovascular conditions or medications and chronic pain
Affiliation(s)
- Yuening Wang
  - School of Computer Science, McGill University, Canada
- Rodrigo Benavides
  - Department of Anesthesiology, Centro Nacional de Rehabilitación, San Jose, Costa Rica
- Luda Diatchenko
  - Department of Anesthesia, McGill University, Canada
  - Faculty of Dentistry, McGill University, Canada
  - Alan Edwards Centre for Research on Pain, McGill University, Canada
- Audrey V Grant
  - Department of Anesthesia, McGill University, Canada
  - Faculty of Dentistry, McGill University, Canada
  - Alan Edwards Centre for Research on Pain, McGill University, Canada
- Yue Li
  - School of Computer Science, McGill University, Canada
5
The Molecular Genetics of Dissociative Symptomatology: A Transdiagnostic Literature Review. Genes (Basel) 2022; 13:genes13050843. [PMID: 35627228 PMCID: PMC9141026 DOI: 10.3390/genes13050843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Revised: 05/05/2022] [Accepted: 05/06/2022] [Indexed: 12/04/2022]
Abstract
Dissociative disorders are a common and frequently undiagnosed group of psychiatric disorders, characterized by disruptions in the normal integration of awareness, personality, emotion and behavior. The available evidence suggests that these disorders arise from an interaction between genetic vulnerability and stress, particularly traumatic stress, but the attention paid to the underlying genetic diatheses has been sparse. In this paper, the existing literature on the molecular genetics of dissociative disorders, as well as of clinically significant dissociative symptoms not reaching the threshold of a disorder, is reviewed comprehensively across clinical and non-clinical samples. Association studies suggest a link between dissociative symptoms and genes related to serotonergic, dopaminergic and peptidergic transmission, neural plasticity and cortisol receptor sensitivity, particularly following exposure to childhood trauma. Genome-wide association studies have identified loci of interest related to second messenger signaling and synaptic integration. Though these findings are inconsistent, they suggest biologically plausible mechanisms through which traumatic stress can lead to pathological dissociation. However, methodological concerns related to phenotype definition, study power, and correction for the confounding factors limit the value of these findings, and they require replication and extension in studies with better design.
6
Clapp MA, McCoy TH. The potential of big data for obstetrics discovery. Curr Opin Endocrinol Diabetes Obes 2021; 28:553-557. [PMID: 34709211 DOI: 10.1097/med.0000000000000679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
PURPOSE OF REVIEW The purpose of this article is to introduce the concept of 'Big Data' and review its potential to advance scientific discovery in obstetrics. RECENT FINDINGS Big Data is now ubiquitous in medicine, being used in many specialties to understand the pathophysiology, risk factors, and treatment for many diseases. Big Data analyses often employ machine learning methods to understand the complex relationships that may exist within these sources. We review the basic principles of supervised and unsupervised machine learning methods, including deep learning. We highlight how these methods have been used to study genetic risk factors for preterm birth, interpreting electronic fetal heart rate tracings, and predict adverse maternal and neonatal outcomes during pregnancy and delivery. Despite its promise, there are challenges with using Big Data, including data integrity, generalizability (namely the concerns about perpetuating inequalities), and confidentiality. SUMMARY The combination of new data and enhanced methods present a synergistic opportunity to explore the complex relationships common to human illness and medical practice, including obstetrics. With prediction as a primary objective instead of the more familiar goals of hypothesis testing, these analytic methods can capture multifaceted, rare, and nuanced relationships between exposures and outcomes that exist within these large data sets.
Affiliation(s)
- Mark A Clapp
  - Department of Obstetrics and Gynecology
  - Center for Quantitative Health, Massachusetts General Hospital
  - Harvard Medical School, Boston, Massachusetts, USA
- Thomas H McCoy
  - Center for Quantitative Health, Massachusetts General Hospital
  - Harvard Medical School, Boston, Massachusetts, USA
7
McKee AM, Kirkup BM, Madgwick M, Fowler WJ, Price CA, Dreger SA, Ansorge R, Makin KA, Caim S, Le Gall G, Paveley J, Leclaire C, Dalby M, Alcon-Giner C, Andrusaite A, Feng TY, Di Modica M, Triulzi T, Tagliabue E, Milling SW, Weilbaecher KN, Rutkowski MR, Korcsmáros T, Hall LJ, Robinson SD. Antibiotic-induced disturbances of the gut microbiota result in accelerated breast tumor growth. iScience 2021; 24:103012. [PMID: 34522855 PMCID: PMC8426205 DOI: 10.1016/j.isci.2021.103012] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 07/13/2020] [Revised: 04/29/2021] [Accepted: 08/17/2021] [Indexed: 02/08/2023]
Abstract
The gut microbiota's function in regulating health has seen it linked to disease progression in several cancers. However, there is limited research detailing its influence in breast cancer (BrCa). This study found that antibiotic-induced perturbation of the gut microbiota significantly increases tumor progression in multiple BrCa mouse models. Metagenomics highlights the common loss of several bacterial species following antibiotic administration. One such bacteria, Faecalibaculum rodentium, rescued this increased tumor growth. Single-cell transcriptomics identified an increased number of cells with a stromal signature in tumors, and subsequent histology revealed an increased abundance of mast cells in the tumor stromal regions. We show that administration of a mast cell stabilizer, cromolyn, rescues increased tumor growth in antibiotic treated animals but has no influence on tumors from control cohorts. These findings highlight that BrCa-microbiota interactions are different from other cancers studied to date and suggest new research avenues for therapy development.
Affiliation(s)
- Alastair M. McKee
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Benjamin M. Kirkup
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Matthew Madgwick
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Wesley J. Fowler
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Christopher A. Price
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Sally A. Dreger
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Rebecca Ansorge
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Kate A. Makin
  - Faculty of Medicine and Health Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Shabhonam Caim
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Gwenaelle Le Gall
  - Faculty of Medicine and Health Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
- Jack Paveley
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Charlotte Leclaire
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Matthew Dalby
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Cristina Alcon-Giner
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
- Anna Andrusaite
  - Centre for Immunobiology, Institute of Infection, Immunity and Inflammation, College of Medicine, Veterinary Medicine and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK
- Tzu-Yu Feng
  - Department of Microbiology, Immunology, and Cancer Biology, University of Virginia, Charlottesville, VA 22908, USA
- Martina Di Modica
  - Molecular Targeting Unit, Department of Research, Fondazione IRCCS Instituto Nazionale di Tumori, Milan, 20133, Italy
- Tiziana Triulzi
  - Molecular Targeting Unit, Department of Research, Fondazione IRCCS Instituto Nazionale di Tumori, Milan, 20133, Italy
- Elda Tagliabue
  - Molecular Targeting Unit, Department of Research, Fondazione IRCCS Instituto Nazionale di Tumori, Milan, 20133, Italy
- Simon W.F. Milling
  - Centre for Immunobiology, Institute of Infection, Immunity and Inflammation, College of Medicine, Veterinary Medicine and Life Sciences, University of Glasgow, Glasgow G12 8TA, UK
- Katherine N. Weilbaecher
  - Department of Internal Medicine, Division of Molecular Oncology, Washington University in St Louis, St. Louis, MO, 63110, USA
- Melanie R. Rutkowski
  - Department of Microbiology, Immunology, and Cancer Biology, University of Virginia, Charlottesville, VA 22908, USA
- Tamás Korcsmáros
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
  - Earlham Institute, Norwich Research Park, Norwich, NR4 7UZ, UK
- Lindsay J. Hall
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
  - Faculty of Medicine and Health Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
  - Chair of Intestinal Microbiome, School of Life Sciences, ZIEL – Institute for Food & Health, Technical University of Munich, 85354 Freising, Germany
- Stephen D. Robinson
  - Gut Microbes and Health Programme, Quadram Institute Bioscience, Norwich Research Park, Norwich, NR4 7AU, UK
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TJ, UK
8
Hart KL, Perlis RH, McCoy TH. Mapping of Transdiagnostic Neuropsychiatric Phenotypes Across Patients in Two General Hospitals. J Acad Consult Liaison Psychiatry 2021; 62:430-439. [PMID: 34210402 DOI: 10.1016/j.jaclp.2021.01.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/30/2020] [Revised: 01/12/2021] [Accepted: 01/13/2021] [Indexed: 10/22/2022]
Abstract
BACKGROUND Multidimensional transdiagnostic phenotyping systems are increasingly important to neuropsychiatric phenotyping, particularly in translational research settings. The relationship between the National Institute of Mental Health's Research Domain Criteria multidimensional approach to psychopathology and nonpsychiatric diagnoses has not been studied at scale but is relevant to those caring for neuropsychiatric illness in medical and surgical settings. METHODS We applied the CQH Dimensional Phenotyper natural language processing tool to estimate National Institute of Mental Health's Research Domain Criteria domain-associated symptoms of individuals admitted to nonpsychiatric wards at each of 2 large academic general hospitals over an 8-year period. We compared patterns in individual domain symptom burden, as well as a new pooled unidimensional measure, by primary medical and surgical diagnosis. RESULTS Analysis included 227,243 patients from hospital 1 of whom 68,793 (30.3%) had a prior psychiatric history and 220,213 patients from hospital 2 of whom 50,818 (23.1%) had a prior psychiatric history. The distribution of Research Domain Criteria symptom burdens over primary diagnosis was similar across hospital sites and differed significantly across primary medical or surgical diagnosis. The effect of primary medical or surgical diagnosis was larger than that of prior psychiatric history on Research Domain Criteria symptom burden. CONCLUSION Research Domain Criteria-based neuropsychiatric symptom burden estimated from general hospital patients' clinical documentation is more strongly associated with the primary hospital medical or surgical diagnosis than it is with the presence of a previous psychiatric history. The bidirectional role of psychiatric and somatic illness warrants further study through the lens of transdiagnostic phenotyping.
Affiliation(s)
- Kamber L Hart
  - Center for Quantitative Health, Massachusetts General Hospital, Boston, MA
- Roy H Perlis
  - Center for Quantitative Health, Massachusetts General Hospital, Boston, MA
- Thomas H McCoy
  - Center for Quantitative Health, Massachusetts General Hospital, Boston, MA
9
Hart KL, Pellegrini AM, Forester BP, Berretta S, Murphy SN, Perlis RH, McCoy TH. Distribution of agitation and related symptoms among hospitalized patients using a scalable natural language processing method. Gen Hosp Psychiatry 2021; 68:46-51. [PMID: 33310013 PMCID: PMC7855889 DOI: 10.1016/j.genhosppsych.2020.11.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/09/2020] [Revised: 11/03/2020] [Accepted: 11/04/2020] [Indexed: 01/29/2023]
Abstract
BACKGROUND Agitation is a common feature of many neuropsychiatric disorders. OBJECTIVE Understanding the prevalence, implications, and characteristics of agitation among hospitalized populations can facilitate more precise recognition of disability arising from neuropsychiatric diseases. METHODS We developed two agitation phenotypes using an expansion of expert curated term lists. These phenotypes were used to characterize five years of psychiatric admissions. The relationship of agitation symptoms and length of stay was examined. RESULTS Among 4548 psychiatric admissions, 1134 (24.9%) included documentation of agitation based on the primary agitation phenotype. These symptoms were greater among individuals with public insurance, and those with mania and psychosis compared to major depressive disorder. Greater symptoms were associated with longer hospital stay, with ~0.9 day increase in stay for every 10% increase in agitation phenotype. CONCLUSION Agitation was common at hospital admission and associated with diagnosis and longer length of stay. Characterizing agitation-related symptoms through natural language processing may provide new tools for understanding agitated behaviors and their relationship to delirium.
Affiliation(s)
- Kamber L. Hart
  - Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
- Brent P. Forester
  - Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
  - McLean Hospital, 115 Mill St, Belmont, MA 02478, USA
- Sabina Berretta
  - Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
  - McLean Hospital, 115 Mill St, Belmont, MA 02478, USA
- Shawn N. Murphy
  - Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
  - Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
- Roy H. Perlis
  - Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
  - Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
- Thomas H. McCoy
  - Massachusetts General Hospital, 185 Cambridge Street, Boston, MA 02114, USA
  - Harvard Medical School, 25 Shattuck St, Boston, MA 02115, USA
  - Corresponding author at: Massachusetts General Hospital, 185 Cambridge Street, 6th Floor, Boston, MA 02114, USA. (T.H. McCoy)
10
Hughes MC, Pradier MF, Ross AS, McCoy TH, Perlis RH, Doshi-Velez F. Assessment of a Prediction Model for Antidepressant Treatment Stability Using Supervised Topic Models. JAMA Netw Open 2020; 3:e205308. [PMID: 32432711 PMCID: PMC7240354 DOI: 10.1001/jamanetworkopen.2020.5308] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/20/2019] [Accepted: 03/16/2020] [Indexed: 12/28/2022]
Abstract
Importance In the absence of readily assessed and clinically validated predictors of treatment response, pharmacologic management of major depressive disorder often relies on trial and error. Objective To assess a model using electronic health records to identify predictors of treatment response in patients with major depressive disorder. Design, Setting, and Participants This retrospective cohort study included data from 81 630 adults with a coded diagnosis of major depressive disorder from 2 academic medical centers in Boston, Massachusetts, including outpatient primary and specialty care clinics from December 1, 1997, to December 31, 2017. Data were analyzed from January 1, 2018, to March 15, 2020. Exposures Treatment with at least 1 of 11 standard antidepressants. Main Outcomes and Measures Stable treatment response, intended as a proxy for treatment effectiveness, defined as continued prescription of an antidepressant for 90 days. Supervised topic models were used to extract 10 interpretable covariates from coded clinical data for stability prediction. With use of data from 1 hospital system (site A), generalized linear models and ensembles of decision trees were trained to predict stability outcomes from topic features that summarize patient history. Held-out patients from site A and individuals from a second hospital system (site B) were evaluated. Results Among the 81 630 adults (56 340 women [69%]; mean [SD] age, 48.46 [14.75] years; range, 18.0-80.0 years), 55 303 reached a stable response to their treatment regimen during follow-up. For held-out patients from site A, the mean area under the receiver operating characteristic curve (AUC) for discrimination of the general stability outcome was 0.627 (95% CI, 0.615-0.639) for the supervised topic model with 10 covariates. In evaluation of site B, the AUC was 0.619 (95% CI, 0.610-0.627). Building models to predict stability specific to a particular drug did not improve prediction of general stability even when using a harder-to-interpret ensemble classifier and 9256 coded covariates (specific AUC, 0.647; 95% CI, 0.635-0.658; general AUC, 0.661; 95% CI, 0.648-0.672). Topics coherently captured clinical concepts associated with treatment response. Conclusions and Relevance The findings suggest that coded clinical data available in electronic health records may facilitate prediction of general treatment response but not response to specific medications. Although greater discrimination is likely required for clinical application, the results provide a transparent baseline for such studies.
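The AUC values reported in this abstract measure how often a model ranks a patient with the outcome above a patient without it. A minimal Python sketch of that pairwise definition follows; it is an illustration only, not the evaluation code used in the study, and the function name `roc_auc` and the toy labels and scores are assumptions introduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case receives the higher score; ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = stable treatment response, 0 = not stable.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

On this scale 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why values near 0.63 are described as needing greater discrimination for clinical use.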
Affiliation(s)
- Michael C. Hughes
  - Department of Computer Science, Tufts University, Medford, Massachusetts
- Melanie F. Pradier
  - John A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts
- Andrew Slavin Ross
  - John A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts
- Thomas H. McCoy
  - Center for Quantitative Health, Massachusetts General Hospital, Boston
  - Harvard Medical School, Boston, Massachusetts
- Roy H. Perlis
  - Center for Quantitative Health, Massachusetts General Hospital, Boston
  - Harvard Medical School, Boston, Massachusetts
- Finale Doshi-Velez
  - John A. Paulson School of Engineering and Applied Sciences, Cambridge, Massachusetts
11
Using topic modeling via non-negative matrix factorization to identify relationships between genetic variants and disease phenotypes: A case study of Lipoprotein(a) (LPA). PLoS One 2019; 14:e0212112. [PMID: 30759150 PMCID: PMC6374022 DOI: 10.1371/journal.pone.0212112] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 06/13/2018] [Accepted: 01/27/2019] [Indexed: 01/01/2023]
Abstract
Genome-wide and phenome-wide association studies are commonly used to identify important relationships between genetic variants and phenotypes. Most studies have treated diseases as independent variables and suffered from the burden of multiple adjustment due to the large number of genetic variants and disease phenotypes. In this study, we used topic modeling via non-negative matrix factorization (NMF) for identifying associations between disease phenotypes and genetic variants. Topic modeling is an unsupervised machine learning approach that can be used to learn patterns from electronic health record data. We chose the single nucleotide polymorphism (SNP) rs10455872 in LPA as the predictor since it has been shown to be associated with increased risk of hyperlipidemia and cardiovascular diseases (CVD). Using data of 12,759 individuals with electronic health records (EHR) and linked DNA samples at Vanderbilt University Medical Center, we trained a topic model using NMF from 1,853 distinct phenotypes and identified six topics. We tested their associations with rs10455872 in LPA. Topics enriched for CVD and hyperlipidemia had positive correlations with rs10455872 (P < 0.001), replicating a previous finding. We also identified a negative correlation between LPA and a topic enriched for lung cancer (P < 0.001) which was not previously identified via phenome-wide scanning. We were able to replicate the top finding in a separate dataset. Our results demonstrate the applicability of topic modeling in exploring the relationship between genetic variants and clinical diseases.
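The idea of learning topics from a patient-by-phenotype matrix with NMF can be sketched in pure Python. The sketch below uses Lee-Seung multiplicative updates, which is one common way to fit NMF; the abstract does not say which solver the authors used, and the function names and the tiny 6-by-4 matrix are hypothetical illustrations.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, n_topics, n_iter=200, seed=0, eps=1e-9):
    """Factor V (patients x phenotypes) into W (patients x topics) and
    H (topics x phenotypes) with non-negative entries."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(n_topics)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(n_topics)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(n_topics)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_topics)]
             for i in range(n)]
    return w, h


# Hypothetical data: 6 patients, 4 phenotype codes, two co-occurring pairs.
toy_v = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3
w0, h0 = nmf(toy_v, n_topics=2, n_iter=0)   # random starting point
w, h = nmf(toy_v, n_topics=2, n_iter=200)   # fitted factors
```

Each row of `w` then plays the role of a patient's topic weights, which could in turn be tested for association with a genotype such as rs10455872, as the abstract describes.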
12
Using phenome-wide association to investigate the function of a schizophrenia risk locus at SLC39A8. Transl Psychiatry 2019; 9:45. [PMID: 30696806 PMCID: PMC6351652 DOI: 10.1038/s41398-019-0386-9] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/22/2018] [Revised: 11/08/2018] [Accepted: 11/13/2018] [Indexed: 12/18/2022]
Abstract
While nearly all common genomic variants associated with schizophrenia have no known function, one corresponds to a missense variant associated with change in efficiency of a metal ion transporter, ZIP8, coded by SLC39A8. This variant has been linked to a range of phenotypes and is believed to be under recent selection pressure, but its impact on health is poorly understood. We sought to understand phenotypic implications of this variant in a large genomic biobank using an unbiased phenome-wide approach. Specifically, we generated 50 topics based on diagnostic codes using latent Dirichlet allocation, and examined them for association with the risk variant. Then, any significant topics were further characterized by examining association with individual diagnostic codes contributing to the topic. Among 50 topics, 1 was associated at an experiment-wide significance threshold (beta = 0.003, uncorrected p = 0.00049), comprising predominantly brain-related codes, including intracranial hemorrhage, cerebrovascular disease, and delirium/dementia. These results suggest that a functional variant previously associated with schizophrenia risk also increases liability to cerebrovascular disease. They further illustrate the utility of a topic-based approach to phenome-wide association.
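The abstract does not state how the experiment-wide threshold was set. Assuming a Bonferroni correction at a family-wise alpha of 0.05 across the 50 topics (an assumption made here purely for illustration), the arithmetic would look like this in Python; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed family-wise alpha of 0.05; the 50 topics come from the abstract.
threshold = bonferroni_threshold(0.05, 50)

# The uncorrected p-value reported in the abstract.
reported_p = 0.00049
is_significant = reported_p < threshold
```

Under that assumption the per-topic threshold is about 0.001, so the reported uncorrected p-value of 0.00049 would fall below it.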