1
Taylor B, Hobensack M, Niño de Rivera S, Zhao Y, Masterson Creber R, Cato K. Identifying Depression Through Machine Learning Analysis of Omics Data: Scoping Review. JMIR Nurs 2024; 7:e54810. [PMID: 39028994 PMCID: PMC11297379 DOI: 10.2196/54810] [Received: 11/22/2023] [Revised: 04/16/2024] [Accepted: 04/22/2024] [Indexed: 07/21/2024]
Abstract
BACKGROUND Depression is one of the most common mental disorders, affecting >300 million people worldwide. There is a shortage of providers trained in the provision of mental health care, and the nursing workforce is essential in filling this gap. The diagnosis of depression relies heavily on self-reported symptoms and clinical interviews, which are subject to implicit biases. The omics methods, including genomics, transcriptomics, epigenomics, and microbiomics, are novel methods for identifying the biological underpinnings of depression. Machine learning is used to analyze genomic data that include large, heterogeneous, and multidimensional data sets. OBJECTIVE This scoping review aims to review the existing literature on machine learning methods for omics data analysis to identify individuals with depression, with the goal of providing alternative, objective, data-driven insights into the diagnostic process for depression. METHODS This scoping review was reported following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines. Searches were conducted in 3 databases to identify relevant publications. A total of 3 independent researchers performed screening, and discrepancies were resolved by consensus. Critical appraisal was performed using the Joanna Briggs Institute Critical Appraisal Checklist for Analytical Cross-Sectional Studies. RESULTS The screening process identified 15 relevant papers. The omics methods included genomics, transcriptomics, epigenomics, multiomics, and microbiomics, and machine learning methods included random forest, support vector machine, k-nearest neighbor, and artificial neural network. CONCLUSIONS The findings of this scoping review indicate that the omics methods had similar performance in identifying omics variants associated with depression. All machine learning methods performed well based on their performance metrics. When variants in omics data are associated with an increased risk of depression, the important next step is for clinicians, especially nurses, to assess individuals for symptoms of depression and provide a diagnosis and any necessary treatment.
Affiliation(s)
- Brittany Taylor
- School of Nursing, Columbia University, New York, NY, United States
- Mollie Hobensack
- Brookdale Department of Geriatrics and Palliative Care, Icahn School of Medicine, Mount Sinai Health System, New York, NY, United States
- Yihong Zhao
- School of Nursing, Columbia University, New York, NY, United States
- Kenrick Cato
- School of Nursing, University of Pennsylvania, Philadelphia, PA, United States
2
Yusipov I, Kalyakulina A, Trukhanov A, Franceschi C, Ivanchenko M. Map of epigenetic age acceleration: A worldwide analysis. Ageing Res Rev 2024; 100:102418. [PMID: 39002646 DOI: 10.1016/j.arr.2024.102418] [Received: 04/17/2024] [Revised: 07/03/2024] [Accepted: 07/08/2024] [Indexed: 07/15/2024]
Abstract
We present a systematic analysis of epigenetic age acceleration based on by far the largest collection of publicly available DNA methylation data for healthy samples (93 datasets, 23K samples), focusing on the geographic (25 countries) and ethnic (31 ethnicities) aspects around the world. We employed the most popular epigenetic tools for assessing age acceleration and examined their quality metrics and their ability to extrapolate to epigenetic data from tissue types and age ranges that differ from the training data of these models. In most cases, the models proved to be inconsistent with each other and showed different signs of age acceleration, with the PhenoAge model tending to systematically underestimate and different versions of the GrimAge model tending to systematically overestimate the age prediction of healthy subjects. Regarding data availability and consistency, most countries and populations are still not represented in GEO; moreover, different datasets use different criteria for determining healthy controls. Because of this, it is difficult to fully isolate the contribution of "geography/environment", "ethnicity" and "healthiness" to epigenetic age acceleration. Among the explored metrics, only DunedinPACE, which measures aging rate, appears to adequately reflect the standard of living and socioeconomic indicators in countries, although its application is limited to blood methylation data. By epigenetic age acceleration, males age faster than females in most of the studied countries and populations.
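As a rough illustration of the quantity this abstract discusses: one common convention defines epigenetic age acceleration as the residual from regressing a clock's predicted age on chronological age. The abstract does not say which definition the authors used, so the following is only a minimal sketch under that assumption; the function name `age_acceleration` and the toy values are hypothetical.

```python
from statistics import mean


def age_acceleration(chronological, epigenetic):
    """Return residuals of epigenetic age regressed on chronological age.

    A positive residual means the sample looks epigenetically older than
    expected for its chronological age; a negative one means younger.
    This residual definition is an assumption, not taken from the abstract.
    """
    x_bar, y_bar = mean(chronological), mean(epigenetic)
    sxx = sum((x - x_bar) ** 2 for x in chronological)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(chronological, epigenetic))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x)
            for x, y in zip(chronological, epigenetic)]


# Hypothetical toy values, not data from the study.
chronological_age = [30.0, 40.0, 50.0, 60.0, 70.0]
clock_age = [33.0, 39.0, 52.0, 58.0, 73.0]
eaa = age_acceleration(chronological_age, clock_age)
```

Because the residuals come from a least-squares fit with an intercept, they sum to zero within one dataset. This is one reason comparisons across datasets or clocks need a shared reference.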
Affiliation(s)
- Igor Yusipov
- Artificial Intelligence Research Center, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Nizhny Novgorod 603022, Russia; Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia.
- Alena Kalyakulina
- Artificial Intelligence Research Center, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Nizhny Novgorod 603022, Russia; Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia.
- Arseniy Trukhanov
- Mriya Life Institute, National Academy of Active Longevity, Moscow 124489, Russia.
- Claudio Franceschi
- Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia.
- Mikhail Ivanchenko
- Artificial Intelligence Research Center, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Nizhny Novgorod 603022, Russia; Institute of Biogerontology, Lobachevsky State University, Nizhny Novgorod 603022, Russia.
3
Retallick-Townsley KG, Lee S, Cartwright S, Cohen S, Sen A, Jia M, Young H, Dobbyn L, Deans M, Fernandez-Garcia M, Huckins LM, Brennand KJ. Dynamic stress- and inflammatory-based regulation of psychiatric risk loci in human neurons. bioRxiv 2024:2024.07.09.602755. [PMID: 39026810 PMCID: PMC11257632 DOI: 10.1101/2024.07.09.602755] [Indexed: 07/20/2024]
Abstract
The prenatal environment can alter neurodevelopmental and clinical trajectories, markedly increasing risk for psychiatric disorders in childhood and adolescence. To understand if and how fetal exposures to stress and inflammation exacerbate manifestation of genetic risk for complex brain disorders, we report a large-scale context-dependent massively parallel reporter assay (MPRA) in human neurons designed to catalogue genotype x environment (GxE) interactions. Across 240 genome-wide association study (GWAS) loci linked to ten brain traits/disorders, the impact of hydrocortisone, interleukin 6 (IL-6), and interferon alpha on transcriptional activity is empirically evaluated in human induced pluripotent stem cell (hiPSC)-derived glutamatergic neurons. Of ~3,500 candidate regulatory risk elements (CREs), 11% of variants are active at baseline, whereas cue-specific CRE regulatory activity ranges from a high of 23% (hydrocortisone) to a low of 6% (IL-6). Cue-specific regulatory activity is driven, at least in part, by differences in transcription factor binding activity, the gene targets of which show unique enrichments for brain disorders as well as co-morbid metabolic and immune syndromes. The dynamic nature of genetic regulation informs the influence of environmental factors, reveals a mechanism underlying pleiotropy and variable penetrance, and identifies specific risk variants that confer greater disorder susceptibility after exposure to stress or inflammation. Understanding neurodevelopmental GxE interactions will inform mental health trajectories and uncover novel targets for therapeutic intervention.
Affiliation(s)
- Kayla G. Retallick-Townsley
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Seoyeon Lee
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Sam Cartwright
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Sophie Cohen
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Annabel Sen
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Meng Jia
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Hannah Young
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lee Dobbyn
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Icahn Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Pamela Sklar Division of Psychiatric Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Michael Deans
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Meilin Fernandez-Garcia
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
- Laura M. Huckins
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Kristen J. Brennand
- Department of Genetics and Genomics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nash Family Department of Neuroscience, Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029
- Department of Psychiatry, Division of Molecular Psychiatry, Yale University School of Medicine, New Haven, CT 06511
- Department of Genetics, Wu Tsai Institute, Yale University School of Medicine, New Haven, CT 06511
4
Kimura M, Takebe T. Cellotype-phenotype associations using 'organoid villages'. Trends Endocrinol Metab 2024; 35:462-465. [PMID: 38575442 DOI: 10.1016/j.tem.2024.03.001] [Received: 01/10/2024] [Revised: 03/04/2024] [Accepted: 03/07/2024] [Indexed: 04/06/2024]
Abstract
En masse phenotyping technology, using massively mosaic donor-derived cells and organoids, can offer enriched insights for cellotype-phenotype association in a cell-type-specific regulatory context. This emerging approach will help to discover biomarkers, inform genetic-epigenetic interactions and identify personalized therapeutic targets, offering hope for precision medicine against highly heterogeneous metabolic diseases.
Affiliation(s)
- Masaki Kimura
- Center for Stem Cell and Organoid Medicine (CuSTOM), Division of Gastroenterology, Hepatology and Nutrition, Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA
- Takanori Takebe
- Center for Stem Cell and Organoid Medicine (CuSTOM), Division of Gastroenterology, Hepatology and Nutrition, Division of Developmental Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH 45229, USA; Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH 45229, USA; Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Graduate School of Medicine, Osaka University, Osaka 565-0871, Japan; Institute of Research, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan.
5
Peng Q, Gilder DA, Bernert RA, Karriker-Jaffe KJ, Ehlers CL. Genetic factors associated with suicidal behaviors and alcohol use disorders in an American Indian population. Mol Psychiatry 2024; 29:902-913. [PMID: 38177348 PMCID: PMC11176067 DOI: 10.1038/s41380-023-02379-3] [Received: 05/18/2023] [Revised: 12/12/2023] [Accepted: 12/13/2023] [Indexed: 01/06/2024]
Abstract
American Indians (AI) demonstrate the highest rates of both suicidal behaviors (SB) and alcohol use disorders (AUD) among all ethnic groups in the US. Rates of suicide and AUD vary substantially between tribal groups and across different geographical regions, underscoring a need to delineate more specific risk and resilience factors. Using data from over 740 AI living within eight contiguous reservations, we assessed genetic risk factors for SB by investigating: (1) possible genetic overlap with AUD, and (2) impacts of rare and low-frequency genomic variants. Suicidal behaviors included lifetime history of suicidal thoughts and acts, including verified suicide deaths, scored using a ranking variable for the SB phenotype (range 0-4). We identified five loci significantly associated with SB and AUD, two of which are intergenic and three intronic on genes AACSP1, ANK1, and FBXO11. Nonsynonymous rare and low-frequency mutations in four genes including SERPINF1 (PEDF), ZNF30, CD34, and SLC5A9, and non-intronic rare and low-frequency mutations in genes OPRD1, HSD17B3 and one lincRNA were significantly associated with SB. One identified pathway was related to hypoxia-inducible factor (HIF) regulation; its 83 nonsynonymous rare and low-frequency variants on 10 genes were also significantly linked to SB. Four additional genes, and two pathways related to vasopressin-regulated water metabolism and cellular hexose transport, were also strongly associated with SB. This study represents the first investigation of genetic factors for SB in an American Indian population that has high risk for suicide. Our study suggests that bivariate association analysis between comorbid disorders can increase statistical power, and that rare and low-frequency variant analysis in a high-risk population enabled by whole-genome sequencing has the potential to identify novel genetic factors. Although such findings may be population specific, rare functional mutations relating to PEDF and HIF regulation align with past reports and suggest a biological mechanism for suicide risk and a potential therapeutic target for intervention.
Affiliation(s)
- Qian Peng
- Department of Neuroscience, The Scripps Research Institute, La Jolla, CA, USA.
- David A Gilder
- Department of Neuroscience, The Scripps Research Institute, La Jolla, CA, USA
- Rebecca A Bernert
- Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA
- Cindy L Ehlers
- Department of Neuroscience, The Scripps Research Institute, La Jolla, CA, USA
6
Zhang Z, Reynolds SR, Stolrow HG, Chen J, Christensen BC, Salas LA. Deciphering the role of immune cell composition in epigenetic age acceleration: Insights from cell-type deconvolution applied to human blood epigenetic clocks. Aging Cell 2024; 23:e14071. [PMID: 38146185 PMCID: PMC10928575 DOI: 10.1111/acel.14071] [Received: 08/17/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 12/27/2023]
Abstract
Aging is a significant risk factor for various human disorders, and DNA methylation clocks have emerged as powerful tools for estimating biological age and predicting health-related outcomes. Methylation data from blood DNA has been a focus of more recently developed DNA methylation clocks. However, the impact of immune cell composition on epigenetic age acceleration (EAA) remains unclear as only some clocks incorporate partial cell type composition information when analyzing EAA. We investigated associations of 12 immune cell types measured by cell-type deconvolution with EAA predicted by six widely-used DNA methylation clocks in data from >10,000 blood samples. We observed significant associations of immune cell composition with EAA for all six clocks tested. Across the clocks, nine or more of the 12 cell types tested exhibited significant associations with EAA. Higher memory lymphocyte subtype proportions were associated with increased EAA, and naïve lymphocyte subtypes were associated with decreased EAA. To demonstrate the potential confounding of EAA by immune cell composition, we applied EAA in rheumatoid arthritis. Our research maps immune cell type contributions to EAA in human blood and offers opportunities to adjust for immune cell composition in EAA studies to a significantly more granular level. Understanding associations of EAA with immune profiles has implications for the interpretation of epigenetic age and its relevance in aging and disease research. Our detailed map of immune cell type contributions serves as a resource for studies utilizing epigenetic clocks across diverse research fields, including aging-related diseases, precision medicine, and therapeutic interventions.
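To make the idea of adjusting EAA for immune cell composition concrete, the sketch below compares residuals from a regression of clock age on chronological age alone with residuals from a regression that also includes one cell-type proportion. This is an assumption-laden illustration, not the authors' pipeline: the residual definition of EAA, the helper names `solve` and `residuals`, the single "memory lymphocyte" covariate, and all numbers are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residuals(y, covariates):
    """Least-squares residuals of y on an intercept plus the given columns."""
    rows = [[1.0] + list(values) for values in zip(*covariates)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]


# Hypothetical toy data: clock age depends on age and a memory-cell proportion.
age = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
memory_prop = [0.10, 0.30, 0.15, 0.35, 0.20, 0.40]
clock_age = [27.0, 41.0, 48.0, 62.0, 69.0, 83.0]

unadjusted = residuals(clock_age, [age])               # EAA ignoring cells
adjusted = residuals(clock_age, [age, memory_prop])    # EAA adjusted for cells
```

In this toy data the clock age was constructed as an exact linear function of age and the memory-cell proportion, so the adjusted residuals vanish while the unadjusted ones do not. In real data a full set of cell proportions sums to one, so one cell type is usually left out to avoid collinearity with the intercept.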
Affiliation(s)
- Ze Zhang
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Dartmouth Cancer Center, Dartmouth‐Hitchcock Medical Center, Lebanon, New Hampshire, USA
- Quantitative Biomedical Sciences Program, Guarini School of Graduate and Advanced Studies, Hanover, New Hampshire, USA
- Samuel R. Reynolds
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Hannah G. Stolrow
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Dartmouth Cancer Center, Dartmouth‐Hitchcock Medical Center, Lebanon, New Hampshire, USA
- Ji‐Qing Chen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Molecular and Cellular Biology Program, Guarini School of Graduate and Advanced Studies, Hanover, New Hampshire, USA
- Brock C. Christensen
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Dartmouth Cancer Center, Dartmouth‐Hitchcock Medical Center, Lebanon, New Hampshire, USA
- Quantitative Biomedical Sciences Program, Guarini School of Graduate and Advanced Studies, Hanover, New Hampshire, USA
- Molecular and Cellular Biology Program, Guarini School of Graduate and Advanced Studies, Hanover, New Hampshire, USA
- Lucas A. Salas
- Department of Epidemiology, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, USA
- Dartmouth Cancer Center, Dartmouth‐Hitchcock Medical Center, Lebanon, New Hampshire, USA
- Quantitative Biomedical Sciences Program, Guarini School of Graduate and Advanced Studies, Hanover, New Hampshire, USA
- Molecular and Cellular Biology Program, Guarini School of Graduate and Advanced Studies, Hanover, New Hampshire, USA
7
Sigala RE, Lagou V, Shmeliov A, Atito S, Kouchaki S, Awais M, Prokopenko I, Mahdi A, Demirkan A. Machine Learning to Advance Human Genome-Wide Association Studies. Genes (Basel) 2023; 15:34. [PMID: 38254924 PMCID: PMC10815885 DOI: 10.3390/genes15010034] [Received: 11/16/2023] [Revised: 12/19/2023] [Accepted: 12/22/2023] [Indexed: 01/24/2024]
Abstract
Machine learning, including deep learning, reinforcement learning, and generative artificial intelligence, is revolutionising every area of our lives when data are made available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems in a more efficient way. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.
Affiliation(s)
- Rafaella E. Sigala
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Vasiliki Lagou
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Aleksey Shmeliov
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Sara Atito
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Samaneh Kouchaki
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Muhammad Awais
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Inga Prokopenko
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
- Adam Mahdi
- Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, Oxfordshire, UK
- Ayse Demirkan
- Section of Statistical Multi-Omics, Department of Clinical and Experimental Medicine, Guildford GU2 7XH, Surrey, UK
- Surrey Institute for People-Centred Artificial Intelligence, University of Surrey, Guildford GU2 7XH, Surrey, UK
8
Bettencourt C, Skene N, Bandres-Ciga S, Anderson E, Winchester LM, Foote IF, Schwartzentruber J, Botia JA, Nalls M, Singleton A, Schilder BM, Humphrey J, Marzi SJ, Toomey CE, Kleifat AA, Harshfield EL, Garfield V, Sandor C, Keat S, Tamburin S, Frigerio CS, Lourida I, Ranson JM, Llewellyn DJ. Artificial intelligence for dementia genetics and omics. Alzheimers Dement 2023; 19:5905-5921. [PMID: 37606627 PMCID: PMC10841325 DOI: 10.1002/alz.13427] [Received: 04/05/2023] [Revised: 07/14/2023] [Accepted: 07/18/2023] [Indexed: 08/23/2023]
Abstract
Genetics and omics studies of Alzheimer's disease and other dementia subtypes enhance our understanding of underlying mechanisms and pathways that can be targeted. We identified key remaining challenges: First, can we enhance genetic studies to address missing heritability? Can we identify reproducible omics signatures that differentiate between dementia subtypes? Can high-dimensional omics data identify improved biomarkers? How can genetics inform our understanding of causal status of dementia risk factors? And which biological processes are altered by dementia-related genetic variation? Artificial intelligence (AI) and machine learning approaches give us powerful new tools in helping us to tackle these challenges, and we review possible solutions and examples of best practice. However, their limitations also need to be considered, as well as the need for coordinated multidisciplinary research and diverse deeply phenotyped cohorts. Ultimately AI approaches improve our ability to interrogate genetics and omics data for precision dementia medicine. HIGHLIGHTS: We have identified five key challenges in dementia genetics and omics studies. AI can enable detection of undiscovered patterns in dementia genetics and omics data. Enhanced and more diverse genetics and omics datasets are still needed. Multidisciplinary collaborative efforts using AI can boost dementia research.
Affiliation(s)
- Conceicao Bettencourt
- Department of Neurodegenerative Disease, UCL Queen Square Institute of Neurology, London, UK
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Nathan Skene
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Sara Bandres-Ciga
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Emma Anderson
- Department of Mental Health of Older People, Division of Psychiatry, University College London, London, UK
- Isabelle F Foote
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, USA
- Jeremy Schwartzentruber
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
- Illumina Artificial Intelligence Laboratory, Illumina Inc, Foster City, California, USA
- Juan A Botia
- Departamento de Ingeniería de la Información y las Comunicaciones, Universidad de Murcia, Murcia, Spain
- Mike Nalls
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Data Tecnica International LLC, Washington, DC, USA
- Andrew Singleton
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland, USA
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, USA
- Brian M Schilder
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Jack Humphrey
- Nash Family Department of Neuroscience and Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, USA
- Sarah J Marzi
- UK Dementia Research Institute, Imperial College London, London, UK
- Department of Brain Sciences, Imperial College London, London, UK
- Christina E Toomey
- Queen Square Brain Bank for Neurological Disorders, UCL Queen Square Institute of Neurology, London, UK
- Department of Clinical and Movement Neuroscience, UCL Queen Square Institute of Neurology, London, UK
- The Francis Crick Institute, London, UK
- Ahmad Al Kleifat
- Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Eric L Harshfield
- Stroke Research Group, Department of Clinical Neurosciences, University of Cambridge, Cambridge, UK
- Victoria Garfield
- MRC Unit for Lifelong Health and Ageing, Institute of Cardiovascular Science, University College London, London, UK
- Cynthia Sandor
- UK Dementia Research Institute. School of Medicine, Cardiff University, Cardiff, UK
- Samuel Keat
- UK Dementia Research Institute. School of Medicine, Cardiff University, Cardiff, UK
- Stefano Tamburin
- Department of Neurosciences, Biomedicine and Movement Sciences, Neurology Section, University of Verona, Verona, Italy
- Carlo Sala Frigerio
- UK Dementia Research Institute, Queen Square Institute of Neurology, University College London, London, UK
- David J Llewellyn
- University of Exeter Medical School, Exeter, UK
- The Alan Turing Institute, London, UK
9
Chandrashekar PB, Alatkar S, Wang J, Hoffman GE, He C, Jin T, Khullar S, Bendl J, Fullard JF, Roussos P, Wang D. DeepGAMI: deep biologically guided auxiliary learning for multimodal integration and imputation to improve genotype-phenotype prediction. Genome Med 2023; 15:88. [PMID: 37904203 PMCID: PMC10617196 DOI: 10.1186/s13073-023-01248-6] [Received: 12/05/2022] [Accepted: 10/16/2023] [Indexed: 11/01/2023]
Abstract
BACKGROUND Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models. METHOD To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype-phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation, allowing the imputation of latent features of missing modalities and thus the prediction of phenotypes from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes. RESULTS We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for schizophrenia and 0.73 for cognitive impairment in Alzheimer's disease). CONCLUSION We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.
Affiliation(s)
- Pramod Bharadwaj Chandrashekar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Sayali Alatkar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Jiebiao Wang
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, 15261, USA
| | - Gabriel E Hoffman
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Chenfeng He
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Ting Jin
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Saniya Khullar
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA
| | - Jaroslav Bendl
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - John F Fullard
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
| | - Panos Roussos
- Center for Disease Neurogenomics, Department of Psychiatry and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA
- Mental Illness Research, Education and Clinical Centers, James J. Peters VA Medical Center, Bronx, NY, 10468, USA
- Center for Dementia Research, Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, 10962, USA
| | - Daifeng Wang
- Waisman Center, University of Wisconsin-Madison, Madison, WI, 53705, USA.
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, 53076, USA.
- Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI, 53076, USA.
|
10
|
Zhu C, Baumgarten N, Wu M, Wang Y, Das AP, Kaur J, Ardakani FB, Duong TT, Pham MD, Duda M, Dimmeler S, Yuan T, Schulz MH, Krishnan J. CVD-associated SNPs with regulatory potential reveal novel non-coding disease genes. Hum Genomics 2023; 17:69. [PMID: 37491351 PMCID: PMC10369730 DOI: 10.1186/s40246-023-00513-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 07/12/2023] [Indexed: 07/27/2023] Open
Abstract
BACKGROUND Cardiovascular diseases (CVDs) are the leading cause of death worldwide. Genome-wide association studies (GWAS) have identified many single nucleotide polymorphisms (SNPs) appearing in non-coding genomic regions in CVDs. The SNPs may alter gene expression by modifying transcription factor (TF) binding sites and lead to functional consequences in cardiovascular traits or diseases. To understand the underlying molecular mechanisms, it is crucial to identify which variations are involved and how they affect TF binding. METHODS The SNEEP (SNP exploration and analysis using epigenomics data) pipeline was used to identify regulatory SNPs, which alter the binding behavior of TFs and link GWAS SNPs to their potential target genes for six CVDs. Human induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs), monoculture cardiac organoids (MCOs) and self-organized cardiac organoids (SCOs) were used in the study. Gene expression, cardiomyocyte size and cardiac contractility were assessed. RESULTS By using our integrative computational pipeline, we identified 1905 regulatory SNPs in CVD GWAS data. These were associated with hundreds of genes, half of them non-coding RNAs (ncRNAs), suggesting novel CVD genes. We experimentally tested 40 CVD-associated non-coding RNAs, among them RP11-98F14.11, RPL23AP92, IGBP1P1, and CTD-2383I20.1, which were upregulated in hiPSC-CMs, MCOs and SCOs under hypoxic conditions. Further experiments showed that IGBP1P1 depletion rescued expression of hypertrophic marker genes, reduced hypoxia-induced cardiomyocyte size and improved hypoxia-reduced cardiac contractility in hiPSC-CMs and MCOs. CONCLUSIONS IGBP1P1 is a novel ncRNA with key regulatory functions in modulating cardiomyocyte size and cardiac function in our disease models. Our data suggest ncRNA IGBP1P1 as a potential therapeutic target to improve cardiac function in CVDs.
Affiliation(s)
- Chaonan Zhu
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Nina Baumgarten
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Meiqian Wu
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
| | - Yue Wang
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
| | - Arka Provo Das
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Jaskiran Kaur
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
| | - Fatemeh Behjati Ardakani
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Thanh Thuy Duong
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
| | - Minh Duc Pham
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
| | - Maria Duda
- Genome Biologics, Theodor-Stern-Kai 7, 60590, Frankfurt Am Main, Germany
| | - Stefanie Dimmeler
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany
| | - Ting Yuan
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany.
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany.
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany.
| | - Marcel H Schulz
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany.
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany.
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany.
| | - Jaya Krishnan
- Institute for Cardiovascular Regeneration, Goethe University, 60590, Frankfurt Am Main, Germany.
- German Center for Cardiovascular Research, Partner Site Rhein-Main, 60590, Frankfurt Am Main, Germany.
- Cardio-Pulmonary Institute, Goethe University Hospital, 60590, Frankfurt Am Main, Germany.
- Department of Medicine III, Cardiology/Angiology/Nephrology, Goethe University Hospital, Frankfurt, Germany.
|
11
|
Bhat JA, Feng X, Mir ZA, Raina A, Siddique KHM. Recent advances in artificial intelligence, mechanistic models, and speed breeding offer exciting opportunities for precise and accelerated genomics-assisted breeding. PHYSIOLOGIA PLANTARUM 2023; 175:e13969. [PMID: 37401892 DOI: 10.1111/ppl.13969] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 04/10/2023] [Revised: 06/11/2023] [Accepted: 06/27/2023] [Indexed: 07/05/2023]
Abstract
Given the challenges of population growth and climate change, there is an urgent need to expedite the development of high-yielding stress-tolerant crop cultivars. While traditional breeding methods have been instrumental in ensuring global food security, their efficiency, precision, and labour intensiveness have become increasingly inadequate to address present and future challenges. Fortunately, recent advances in high-throughput phenomics and genomics-assisted breeding (GAB) provide a promising platform for enhancing crop cultivars with greater efficiency. However, several obstacles must be overcome to optimize the use of these techniques in crop improvement, such as the complexity of phenotypic analysis of big image data. In addition, the prevalent use of linear models in genome-wide association studies (GWAS) and genomic selection (GS) fails to capture the nonlinear interactions of complex traits, limiting their applicability for GAB and impeding crop improvement. Recent advances in artificial intelligence (AI) techniques have opened doors to nonlinear modelling approaches in crop breeding, enabling the capture of nonlinear and epistatic interactions in GWAS and GS and thus making this variation available for GAB. While statistical and software challenges persist in AI-based models, they are expected to be resolved soon. Furthermore, recent advances in speed breeding have significantly reduced the time (3-5-fold) required for conventional breeding. Thus, integrating speed breeding with AI and GAB could improve crop cultivar development within a considerably shorter timeframe while ensuring greater accuracy and efficiency. In conclusion, this integrated approach could revolutionize crop breeding paradigms and safeguard food production in the face of population growth and climate change.
Affiliation(s)
| | - Xianzhong Feng
- Zhejiang Lab, Hangzhou, China
- Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
| | - Zahoor A Mir
- ICAR-National Bureau of Plant Genetic Resources, New Delhi, India
| | - Aamir Raina
- Department of Botany, Faculty of Life Sciences, Aligarh Muslim University, Aligarh, India
| | - Kadambot H M Siddique
- The UWA Institute of Agriculture and School of Agriculture & Environment, The University of Western Australia, Perth, Western Australia, Australia
|
12
|
Gilder D, Bernert R, Karriker-Jaffe K, Ehlers C, Peng Q. Genetic Factors Associated with Suicidal Behaviors and Alcohol Use Disorders in an American Indian Population. RESEARCH SQUARE 2023:rs.3.rs-2950284. [PMID: 37398076 PMCID: PMC10312956 DOI: 10.21203/rs.3.rs-2950284/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/04/2023]
Abstract
American Indians (AI) demonstrate the highest rates of both suicidal behaviors (SB) and alcohol use disorders (AUD) among all ethnic groups in the US. Rates of suicide and AUD vary substantially between tribal groups and across different geographical regions, underscoring a need to delineate more specific risk and resilience factors. Using data from over 740 AI living within eight contiguous reservations, we assessed genetic risk factors for SB by investigating: (1) possible genetic overlap with AUD, and (2) impacts of rare and low frequency genomic variants. Suicidal behaviors included lifetime history of suicidal thoughts and acts, including verified suicide deaths, scored using a ranking variable for the SB phenotype (range 0-4). We identified five loci significantly associated with SB and AUD, two of which are intergenic and three intronic on genes AACSP1, ANK1, and FBXO11. Nonsynonymous rare mutations in four genes including SERPINF1 (PEDF), ZNF30, CD34, and SLC5A9, and non-intronic rare mutations in genes OPRD1, HSD17B3 and one lincRNA were significantly associated with SB. One identified pathway related to hypoxia-inducible factor (HIF) regulation, whose 83 nonsynonymous rare variants on 10 genes were significantly linked to SB as well. Four additional genes, and two pathways related to vasopressin-regulated water metabolism and cellular hexose transport, also were strongly associated with SB. This study represents the first investigation of genetic factors for SB in an American Indian population that has high risk for suicide. Our study suggests that bivariate association analysis between comorbid disorders can increase statistical power; and rare variant analysis in a high-risk population enabled by whole-genome sequencing has the potential to identify novel genetic factors. 
Although such findings may be population specific, rare functional mutations relating to PEDF and HIF regulation align with past reports and suggest a biological mechanism for suicide risk and a potential therapeutic target for intervention.
|
13
|
Nandhini K, Tamilpavai G. An Optimal Stacked ResNet-BiLSTM-Based Accurate Detection and Classification of Genetic Disorders. Neural Process Lett 2023:1-22. [PMID: 37359129 PMCID: PMC10196306 DOI: 10.1007/s11063-023-11195-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 02/13/2023] [Indexed: 06/28/2023]
Abstract
Genes are located inside the nucleus, and genetic data are contained in deoxyribonucleic acid (DNA). A person's gene count ranges from 20,000 to 30,000. Even a minor alteration to the DNA sequence can be harmful if it affects the cell's fundamental functions. As a result, the gene begins to act abnormally. The sorts of genetic abnormalities brought on by mutation include chromosomal disorders, complex disorders, and single-gene disorders. Therefore, a detailed diagnosis method is required. Thus, we proposed an Elephant Herd Optimization-Whale Optimization Algorithm (EHO-WOA) optimized Stacked ResNet-Bidirectional Long Short-Term Memory (ResNet-BiLSTM) model for detecting genetic disorders. Here, a hybrid EHO-WOA algorithm is presented to assess the Stacked ResNet-BiLSTM architecture's fitness. The ResNet-BiLSTM design uses the genotype and gene expression phenotype as input data. Furthermore, the proposed method identifies rare genetic disorders such as Angelman Syndrome, Rett Syndrome, and Prader-Willi Syndrome. It demonstrates the effectiveness of the developed model with greater accuracy, recall, specificity, precision, and f1-score. Thus, a wide range of DNA deficiencies including Prader-Willi syndrome, Marfan syndrome, Early Onset Morbid Obesity, Rett syndrome, and Angelman syndrome are predicted accurately.
Affiliation(s)
- K. Nandhini
- Department of Computer Science and Engineering, Anna University, Chennai, India
| | - G. Tamilpavai
- Department of Computer Science and Engineering, Government College of Engineering, Tirunelveli, India
|
14
|
Tanifuji T, Okazaki S, Otsuka I, Mouri K, Horai T, Shindo R, Shirai T, Hishimoto A. Epigenetic clock analysis reveals increased plasma cystatin C levels based on DNA methylation in major depressive disorder. Psychiatry Res 2023; 322:115103. [PMID: 36803907 DOI: 10.1016/j.psychres.2023.115103] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 01/01/2023] [Accepted: 02/06/2023] [Indexed: 02/10/2023]
Abstract
Major depressive disorder (MDD) is a common mental illness and a major public health concern worldwide. Depression is associated with epigenetic changes that regulate gene expression, and analyzing these changes may help elucidate the pathophysiology of MDD. Genome-wide DNA methylation (DNAm) profiles can function as 'epigenetic clocks' that can help estimate biological aging. Here, we assessed biological aging in patients with MDD using various DNAm-based indicators of epigenetic aging. We used a publicly available dataset containing data obtained from the whole blood samples of MDD patients (n = 489) and controls (n = 210). We analyzed five epigenetic clocks (HorvathAge, HannumAge, SkinBloodAge, PhenoAge, and GrimAge) and DNAm-based telomere length (DNAmTL). We also investigated seven DNAm-based age-predictive plasma proteins (including cystatin C) and smoking status, which are components of GrimAge. Following adjustment for confounding factors such as age and sex, patients with MDD showed no significant difference in epigenetic clocks and DNAmTL. However, DNAm-based plasma cystatin C levels were significantly higher in patients with MDD than controls. Our findings revealed specific DNAm changes predicting plasma cystatin C levels in MDD. These findings may help elucidate the pathophysiology of MDD, leading to the development of new biomarkers and medications.
Affiliation(s)
- Takaki Tanifuji
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
| | - Satoshi Okazaki
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan.
| | - Ikuo Otsuka
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
| | - Kentaro Mouri
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
| | - Tadasu Horai
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
| | - Ryota Shindo
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
| | - Toshiyuki Shirai
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan
| | - Akitoyo Hishimoto
- Department of Psychiatry, Kobe University Graduate School of Medicine, 7-5-1 Kusunoki-cho, Chuo-ku, Kobe 650-0017, Japan; Department of Psychiatry, Yokohama City University Graduate School of Medicine, Yokohama, Japan
|
15
|
Liu Z, Dai W, Wang S, Yao Y, Zhang H. Deep learning identified genetic variants for COVID-19-related mortality among 28,097 affected cases in UK Biobank. Genet Epidemiol 2023; 47:215-230. [PMID: 36691909 PMCID: PMC10006374 DOI: 10.1002/gepi.22515] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/13/2022] [Revised: 10/19/2022] [Accepted: 01/11/2023] [Indexed: 01/25/2023]
Abstract
Analysis of host genetic components provides insights into the susceptibility and response to viral infection such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes coronavirus disease 2019 (COVID-19). To reveal genetic determinants of susceptibility to COVID-19 related mortality, we train a deep learning model to identify groups of genetic variants and their interactions that contribute to the COVID-19 related mortality risk using the UK Biobank data (28,097 affected cases and 1656 deaths). We refer to such groups of variants as super variants. We identify 15 super variants with various levels of significance as susceptibility loci for COVID-19 mortality. Specifically, we identify a super variant (odds ratio [OR] = 1.594, p = 5.47 × 10⁻⁹) on Chromosome 7 that consists of the minor allele of rs76398985, rs6943608, rs2052130, 7:150989011_CT_C, rs118033050, and rs12540488. We also discover a super variant (OR = 1.353, p = 2.87 × 10⁻⁸) on Chromosome 5 that contains rs12517344, rs72733036, rs190052994, rs34723029, rs72734818, 5:9305797_GTA_G, and rs180899355.
Affiliation(s)
- Zihuan Liu
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Wei Dai
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Shiying Wang
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Yisha Yao
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
| | - Heping Zhang
- Department of Biostatistics, Yale University, 300 George Street, Ste 523, New Haven, CT, 06511
|
16
|
Li Z, Gao E, Zhou J, Han W, Xu X, Gao X. Applications of deep learning in understanding gene regulation. CELL REPORTS METHODS 2023; 3:100384. [PMID: 36814848 PMCID: PMC9939384 DOI: 10.1016/j.crmeth.2022.100384] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Indexed: 05/22/2023]
Abstract
Gene regulation is a central topic in cell biology. Advances in omics technologies and the accumulation of omics data have provided better opportunities for gene regulation studies than ever before. For this reason deep learning, as a data-driven predictive modeling approach, has been successfully applied to this field during the past decade. In this article, we aim to give a brief yet comprehensive overview of representative deep-learning methods for gene regulation. Specifically, we discuss and compare the design principles and datasets used by each method, creating a reference for researchers who wish to replicate or improve existing methods. We also discuss the common problems of existing approaches and prospectively introduce the emerging deep-learning paradigms that will potentially alleviate them. We hope that this article will provide a rich and up-to-date resource and shed light on future research directions in this area.
Affiliation(s)
- Zhongxiao Li
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Elva Gao
- The KAUST School, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Juexiao Zhou
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Wenkai Han
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xiaopeng Xu
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
| | - Xin Gao
- Computer Science Program, Computer, Electrical and Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Kingdom of Saudi Arabia
|
17
|
Gurung RL, Burdon KP, McComish BJ. A Guide to Genome-Wide Association Study Design for Diabetic Retinopathy. Methods Mol Biol 2023; 2678:49-89. [PMID: 37326705 DOI: 10.1007/978-1-0716-3255-0_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Diabetic retinopathy (DR) is the most common microvascular complication related to diabetes. There is evidence that genetics play an important role in DR pathogenesis, but the complexity of the disease makes genetic studies a challenge. This chapter is a practical overview of the basic steps for genome-wide association studies with respect to DR and its associated traits. Also described are approaches that can be adopted in future DR studies. This is intended to serve as a guide for beginners and to provide a framework for further in-depth analysis.
Affiliation(s)
- Rajya L Gurung
- Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia.
| | - Kathryn P Burdon
- Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia.
| | - Bennet J McComish
- Menzies Institute for Medical Research, University of Tasmania, Hobart, TAS, Australia
|
18
|
Kuksa PP, Greenfest-Allen E, Cifello J, Ionita M, Wang H, Nicaretta H, Cheng PL, Lee WP, Wang LS, Leung YY. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Hum Mol Genet 2022; 31:R62-R72. [PMID: 35943817 PMCID: PMC9585666 DOI: 10.1093/hmg/ddac191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 11/23/2022] Open
Abstract
Non-coding genetic variants outside of protein-coding genome regions play an important role in genetic and epigenetic regulation. It has become increasingly important to understand their roles, as non-coding variants often make up the majority of top findings of genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the library of and offers unique opportunities for investigating both common and rare non-coding variants, which are typically not detected in more limited GWAS approaches. However, the sheer size and breadth of WGS data introduce additional challenges to predicting functional impacts in terms of data analysis and interpretation. This review focuses on the recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases and functional genomic resources for interpreting the variant findings from WGS based on both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions which will enhance the data and tools necessary for the effective functional analyses of variants identified by WGS to improve our understanding of disease etiology.
Affiliation(s)
- Pavel P Kuksa
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Emily Greenfest-Allen
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Jeffrey Cifello
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Matei Ionita
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Hui Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Heather Nicaretta
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Po-Liang Cheng
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
|
19
|
Cohort profile: the Food Chain Plus (FoCus) cohort. Eur J Epidemiol 2022; 37:1087-1105. [PMID: 36245062 PMCID: PMC9630232 DOI: 10.1007/s10654-022-00924-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/10/2022] [Accepted: 09/25/2022] [Indexed: 11/16/2022]
Abstract
The Food Chain Plus (FoCus) cohort was launched in 2011 for population-based research related to metabolic inflammation. To characterize this novel pathology in a comprehensive manner, data collection included multiple omics layers such as phenomics, microbiomics, metabolomics, genomics, and metagenomics as well as nutrition profiling, taste perception phenotyping and social network analysis. The cohort was set up to represent a Northern German population of the Kiel region. Two-step recruitment included the randomised enrolment of participants via residents’ registration offices and via the Obesity Outpatient Centre of the University Medical Center Schleswig–Holstein (UKSH). Hence, both a population-based and a metabolic inflammation-based cohort were created. In total, 1795 individuals were analysed at baseline. Baseline data collection took place between 2011 and 2014, including 63% females and 37% males with an age range of 18–83 years. The median age of all participants was 52.0 years [IQR: 42.5; 63.0 years] and the median baseline BMI in the study population was 27.7 kg/m² [IQR: 23.7; 35.9 kg/m²]. In the baseline cohort, 14.1% of participants had type 2 diabetes mellitus, which was more prevalent in the subjects of the metabolic inflammation group (MIG; 31.8%). Follow-up for the assessment of disease progression, as well as the onset of new diseases with changes in subjects’ phenotype, diet or lifestyle factors, is planned every 5 years. The first follow-up period was finished in 2020 and included 820 subjects.
|
20
|
Marttila S, Tamminen H, Rajić S, Mishra PP, Lehtimäki T, Raitakari O, Kähönen M, Kananen L, Jylhävä J, Hägg S, Delerue T, Peters A, Waldenberger M, Kleber ME, März W, Luoto R, Raitanen J, Sillanpää E, Laakkonen EK, Heikkinen A, Ollikainen M, Raitoharju E. Methylation status of VTRNA2-1/nc886 is stable across populations, monozygotic twin pairs and in majority of tissues. Epigenomics 2022; 14:1105-1124. [PMID: 36200237 DOI: 10.2217/epi-2022-0228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Aims & methods: The aim of this study was to characterize the methylation level of a polymorphically imprinted gene, VTRNA2-1/nc886, in human populations and somatic tissues. A total of 48 datasets, consisting of more than 30 tissues and >30,000 individuals, were used. Results: nc886 methylation status is associated with twin status and ethnic background, but the variation between populations is limited. Monozygotic twin pairs present concordant methylation, whereas ∼30% of dizygotic twin pairs present discordant methylation in the nc886 locus. The methylation levels of nc886 are uniform across somatic tissues, except in cerebellum and skeletal muscle. Conclusion: The nc886 imprint may be established in the oocyte, and, after implantation, the methylation status is stable, excluding a few specific tissues.
Affiliation(s)
- Saara Marttila
- Molecular Epidemiology, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Gerontology Research Center, Tampere University, Tampere, 33014, Finland
| | - Hely Tamminen
- Molecular Epidemiology, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland
| | - Sonja Rajić
- Molecular Epidemiology, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland
| | - Pashupati P Mishra
- Department of Clinical Chemistry, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Finnish Cardiovascular Research Center Tampere, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Fimlab Laboratories, Arvo Ylpön katu 4, Tampere, 33520, Finland
| | - Terho Lehtimäki
- Department of Clinical Chemistry, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Finnish Cardiovascular Research Center Tampere, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Fimlab Laboratories, Arvo Ylpön katu 4, Tampere, 33520, Finland
| | - Olli Raitakari
- Centre for Population Health Research, University of Turku & Turku University Hospital, Turku, 20014, Finland.,Research Centre of Applied & Preventive Cardiovascular Medicine, University of Turku, Turku, 20014, Finland.,Department of Clinical Physiology & Nuclear Medicine, Turku University Hospital, Turku, 20014, Finland
| | - Mika Kähönen
- Finnish Cardiovascular Research Center Tampere, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Department of Clinical Physiology, Tampere University Hospital, Tampere, 33521, Finland
| | - Laura Kananen
- Faculty of Medicine & Health Technology, & Gerontology Research Center, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, 171 77, Sweden.,Faculty of Social Sciences (Health Sciences), & Gerontology Research Center, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland
| | - Juulia Jylhävä
- Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, 171 77, Sweden.,Faculty of Social Sciences (Health Sciences), & Gerontology Research Center, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland
| | - Sara Hägg
- Department of Medical Epidemiology & Biostatistics, Karolinska Institutet, Stockholm, 171 77, Sweden
| | - Thomas Delerue
- Research Unit Molecular Epidemiology, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Bavaria, D-85764, Germany
| | - Annette Peters
- Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Bavaria, D-85764, Germany.,German Center for Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
| | - Melanie Waldenberger
- Research Unit Molecular Epidemiology, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Bavaria, D-85764, Germany.,German Center for Cardiovascular Research (DZHK), Partner Site Munich Heart Alliance, Munich, Germany
| | - Marcus E Kleber
- Vth Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, 68167, Germany.,SYNLAB MVZ Humangenetik Mannheim, Mannheim, Germany
| | - Winfried März
- Vth Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Mannheim, 68167, Germany.,Competence Cluster for Nutrition & Cardiovascular Health (nutriCARD) Halle-Jena-Leipzig, Jena, 07743, Germany.,SYNLAB Academy, SYNLAB Holding Deutschland GmbH, Augsburg, 86156, Germany.,Clinical Institute of Medical & Chemical Laboratory Diagnostics, Medical University of Graz, Graz, 8010, Austria
| | - Riitta Luoto
- The Social Insurance Institute of Finland (Kela), Helsinki, 00250, Finland.,The UKK Institute for Health Promotion Research, Kaupinpuistonkatu 1, Tampere, 33500, Finland
| | - Jani Raitanen
- The UKK Institute for Health Promotion Research, Kaupinpuistonkatu 1, Tampere, 33500, Finland.,Faculty of Social Sciences (Health Sciences), Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland
| | - Elina Sillanpää
- Gerontology Research Center & Faculty of Sport & Health Sciences, University of Jyväskylä, Jyväskylä, 40014, Finland.,Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, 00014, Finland
| | - Eija K Laakkonen
- Gerontology Research Center & Faculty of Sport & Health Sciences, University of Jyväskylä, Jyväskylä, 40014, Finland
| | - Aino Heikkinen
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, 00014, Finland
| | - Miina Ollikainen
- Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, 00014, Finland
| | - Emma Raitoharju
- Molecular Epidemiology, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland.,Finnish Cardiovascular Research Center Tampere, Faculty of Medicine & Health Technology, Tampere University, Arvo Ylpön katu 34, Tampere, 33520, Finland
| |
|
21
|
Identifying interpretable gene-biomarker associations with functionally informed kernel-based tests in 190,000 exomes. Nat Commun 2022; 13:5332. [PMID: 36088354 PMCID: PMC9464252 DOI: 10.1038/s41467-022-32864-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Accepted: 08/22/2022] [Indexed: 12/05/2022]
Abstract
Here we present an exome-wide rare genetic variant association study for 30 blood biomarkers in 191,971 individuals in the UK Biobank. We compare gene-based association tests for separate functional variant categories to increase interpretability and identify 193 significant gene-biomarker associations. Genes associated with biomarkers were ~ 4.5-fold enriched for conferring Mendelian disorders. In addition to performing weighted gene-based variant collapsing tests, we design and apply variant-category-specific kernel-based tests that integrate quantitative functional variant effect predictions for missense variants, splicing and the binding of RNA-binding proteins. For these tests, we present a computationally efficient combination of the likelihood-ratio and score tests that found 36% more associations than the score test alone while also controlling the type-1 error. Kernel-based tests identified 13% more associations than their gene-based collapsing counterparts and had advantages in the presence of gain of function missense variants. We introduce local collapsing by amino acid position for missense variants and use it to interpret associations and identify potential novel gain of function variants in PIEZO1. Our results show the benefits of investigating different functional mechanisms when performing rare-variant association tests, and demonstrate pervasive rare-variant contribution to biomarker variability.
|
22
|
Alharbi WS, Rashid M. A review of deep learning applications in human genomics using next-generation sequencing data. Hum Genomics 2022; 16:26. [PMID: 35879805 PMCID: PMC9317091 DOI: 10.1186/s40246-022-00396-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/24/2021] [Accepted: 07/12/2022] [Indexed: 12/02/2022]
Abstract
Genomics is advancing towards data-driven science. With the advent of high-throughput data-generating technologies in human genomics, we are overwhelmed by a heap of genomic data. Artificial intelligence, especially deep learning, has been instrumental in extracting knowledge and patterns from these data. In the current review, we address the development and application of deep learning methods/models in different subareas of human genomics. We assess which areas of genomics are over- and under-charted by deep learning techniques. The deep learning algorithms underlying the genomic tools are discussed briefly in the later part of this review. Finally, we briefly discuss the latest applications of deep learning tools in genomics. In conclusion, this review is timely for biotechnology and genomic scientists, guiding them on why, when and how to use deep learning methods to analyse human genomic data.
Affiliation(s)
- Wardah S Alharbi
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia
| | - Mamoon Rashid
- Department of AI and Bioinformatics, King Abdullah International Medical Research Center (KAIMRC), King Saud Bin Abdulaziz University for Health Sciences (KSAU-HS), King Abdulaziz Medical City, Ministry of National Guard Health Affairs, P.O. Box 22490, Riyadh, 11426, Saudi Arabia.
| |
|
23
|
Yang M, Huang L, Huang H, Tang H, Zhang N, Yang H, Wu J, Mu F. Integrating convolution and self-attention improves language model of human genome for interpreting non-coding regions at base-resolution. Nucleic Acids Res 2022; 50:e81. [PMID: 35536244 PMCID: PMC9371931 DOI: 10.1093/nar/gkac326] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/22/2021] [Revised: 02/22/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022]
Abstract
Interpretation of the non-coding genome remains an unsolved challenge in human genetics due to the impracticality of exhaustively annotating biochemically active elements in all conditions. Deep learning-based computational approaches have recently emerged to help interpret non-coding regions. Here, we present LOGO (Language of Genome), a self-attention-based contextualized pre-trained language model containing only two self-attention layers with 1 million parameters, a substantially light architecture that applies self-supervision techniques to learn bidirectional representations of the unlabelled human reference genome. LOGO is then fine-tuned for a sequence labelling task, and further extended to a variant prioritization task via a special input encoding scheme of alternative alleles followed by adding a convolutional module. Experiments show that LOGO achieves 15% absolute improvement for promoter identification and up to 4.5% absolute improvement for enhancer-promoter interaction prediction. LOGO exhibits state-of-the-art multi-task predictive power on thousands of chromatin features with only 3% parameterization benchmarking against the fully supervised model, DeepSEA, and 1% parameterization against a recent BERT-based DNA language model. For allelic-effect prediction, locality introduced by one-dimensional convolution shows improved sensitivity and specificity for prioritizing non-coding variants associated with human diseases. In addition, we apply LOGO to interpret type 2 diabetes (T2D) GWAS signals and infer underlying regulatory mechanisms. We make a conceptual analogy between natural language and the human genome and demonstrate that LOGO is an accurate, fast, scalable, and robust framework to interpret non-coding regions for global sequence labelling as well as for variant prioritization at base-resolution.
Affiliation(s)
- Meng Yang
- MGI, BGI-Shenzhen, Shenzhen 518083, China.,Department of Biology, University of Copenhagen, Copenhagen DK-2200, Denmark
| | | | | | - Hui Tang
- MGI, BGI-Shenzhen, Shenzhen 518083, China
| | - Nan Zhang
- MGI, BGI-Shenzhen, Shenzhen 518083, China
| | - Huanming Yang
- BGI-Shenzhen, Shenzhen 518083, China.,Guangdong Provincial Academician Workstation of BGI Synthetic Genomics, BGI-Shenzhen, Shenzhen, 518120, China
| | - Jihong Wu
- Department of Ophthalmology, Eye & ENT Hospital, Shanghai Medical College, Fudan University, Shanghai, China.,Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China.,Key Laboratory of Myopia (Fudan University), Chinese Academy of Medical Sciences, National Health Commission, Shanghai, China
| | - Feng Mu
- MGI, BGI-Shenzhen, Shenzhen 518083, China
| |
|
24
|
Pseudotime Analysis Reveals Exponential Trends in DNA Methylation Aging with Mortality Associated Timescales. Cells 2022; 11:cells11050767. [PMID: 35269389 PMCID: PMC8909670 DOI: 10.3390/cells11050767] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/28/2021] [Revised: 02/08/2022] [Accepted: 02/14/2022] [Indexed: 01/27/2023]
Abstract
The epigenetic trajectory of DNA methylation profiles has a nonlinear relationship with time, reflecting rapid changes in DNA methylation early in life that progressively slow with age. In this study, we use pseudotime analysis to determine the functional form of these trajectories. Unlike epigenetic clocks that constrain the functional form of methylation changes with time, pseudotime analysis orders samples along a path, based on similarities in a latent dimension, to provide an unbiased trajectory. We show that pseudotime analysis can be applied to DNA methylation in human blood and brain tissue and find that it is highly correlated with the epigenetic states described by the Epigenetic Pacemaker. Moreover, we show that the pseudotime trajectory can be modeled with respect to time, using a sum of two exponentials, with coefficients that are close to the timescales of human age-associated mortality. Thus, for the first time, we can identify age-associated molecular changes that appear to track the exponential dynamics of mortality risk.
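The "sum of two exponentials" form mentioned in this abstract can be written down in a few lines. The following is only a sketch of that functional form: the abstract does not give the fitted coefficients, their signs, or whether an offset term is used, so the parameter names and the numbers below are hypothetical.

```python
import math


def two_exponential(t, a1, rate1, a2, rate2, offset=0.0):
    """Evaluate offset + a1*exp(rate1*t) + a2*exp(rate2*t).

    All parameters are illustrative; none are taken from the cited study.
    """
    return offset + a1 * math.exp(rate1 * t) + a2 * math.exp(rate2 * t)


# Hypothetical parameters: a fast-decaying term (rapid early-life change)
# plus a slow term, so the curve changes quickly at first and then flattens.
ages = [0, 10, 20, 40, 80]
trajectory = [two_exponential(t, 1.0, -0.5, 2.0, -0.02) for t in ages]
```

With negative rates and positive amplitudes, as assumed here, the curve falls steeply at young ages and slowly later on, which matches the qualitative description of methylation changes slowing with age.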
|
25
|
Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models. Genes (Basel) 2021; 13:genes13010087. [PMID: 35052430 PMCID: PMC8774935 DOI: 10.3390/genes13010087] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2021] [Revised: 12/22/2021] [Accepted: 12/24/2021] [Indexed: 11/17/2022]
Abstract
Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever increasing sample sizes are required, while ignoring potentially valuable information that is readily available in existing datasets. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10⁻⁸) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.
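The idea of elastic net with stability selection by iterative subsampling can be illustrated on simulated data. This is a minimal sketch, not the study's pipeline: it assumes a quantitative phenotype and a linear elastic net fitted by coordinate descent (a case/control analysis would more naturally use a logistic model), and the genotypes, penalty values, and function names are all made up for the example.

```python
import random


def soft_threshold(z, t):
    """Shrink z towards zero by t (the L1 part of the elastic net update)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, alpha, sweeps=30):
    """Linear elastic net by coordinate descent on centred data.

    Minimises (1/2n)*||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).
    Assumes alpha < 1 or non-constant columns, so the denominator is nonzero.
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual for beta = 0
    beta = [0.0] * p
    z = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + z[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                beta[j] = new
    return beta


def stability_selection(X, y, lam, alpha, n_subsamples, rng):
    """Fraction of half-sized subsamples in which each feature is selected."""
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], lam, alpha)
        for j in range(p):
            if beta[j] != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


# Simulated genotypes (0/1/2 allele counts); only SNP 0 affects the trait.
rng = random.Random(0)
X = [[rng.choice((0, 1, 2)) for _ in range(6)] for _ in range(200)]
y = [row[0] + rng.gauss(0, 0.5) for row in X]
freq = stability_selection(X, y, lam=0.3, alpha=0.5, n_subsamples=40, rng=rng)
```

The selection frequency plays the role of the "% iterations" figure quoted in the abstract: a SNP that is picked up in nearly every subsample is treated as a stable signal, regardless of its single-SNP p-value.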
|
26
|
Musolf AM, Holzinger ER, Malley JD, Bailey-Wilson JE. What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics. Hum Genet 2021; 141:1515-1528. [PMID: 34862561 PMCID: PMC9360120 DOI: 10.1007/s00439-021-02402-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 05/13/2021] [Accepted: 11/08/2021] [Indexed: 01/26/2023]
Abstract
Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, but some methods evaluate which features (variables) are responsible for creating a good prediction; this is called feature importance. This is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) are responsible for a good prediction. This allows for the deeper analysis beyond simple prediction, including the determination of risk factors associated with a given phenotype. Feature importance further permits the researcher to peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms: k nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests are described. The review ends with a discussion of how to choose the best machine for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.
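To make the notion of feature importance concrete, here is a small sketch of permutation importance, one common model-agnostic way to score features. It is offered as an illustration only; the abstract does not say whether the review covers this particular metric, and the toy classifier, data, and function names are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the data with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy data: feature 0 is a SNP genotype (0/1/2) that determines the phenotype,
# feature 1 is an unrelated binary exposure.
X = [[i % 3, (i * 7) % 2] for i in range(60)]
y = [1 if row[0] >= 1 else 0 for row in X]


def toy_model(row):
    """Stand-in for a trained classifier; it uses only the genotype."""
    return 1 if row[0] >= 1 else 0


importances = permutation_importance(toy_model, X, y)
```

Shuffling the genotype column breaks its link to the phenotype and lowers accuracy, while shuffling the unused exposure column changes nothing, so the two scores separate the informative feature from the uninformative one.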
Affiliation(s)
- Anthony M Musolf
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
| | - Emily R Holzinger
- Target Sciences, Informatics and Predictive Sciences, Bristol Myers Squibb, Cambridge, MA, USA
| | - James D Malley
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
| | - Joan E Bailey-Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA.
| |
|
27
|
Brain Immunoinformatics: A Symmetrical Link between Informatics, Wet Lab and the Clinic. Symmetry (Basel) 2021. [DOI: 10.3390/sym13112168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Breakthrough advances in informatics over the last decade have thoroughly influenced the field of immunology. The intermingling of machine learning with wet lab applications and clinical results has hatched the newly defined immunoinformatics society. Immunoinformatics of the central neural system, referred to as neuroimmunoinformatics (NII), investigates symmetrical and asymmetrical interactions of the brain-immune interface. This interdisciplinary overview on NII is addressed to bioscientists and computer scientists. We delineate the dominating trajectories and field-shaping achievements and elaborate on future directions using bridging language and terminology. Computation, varying from linear modeling to complex deep learning approaches, fuels neuroimmunology through three core directions. Firstly, by providing big-data analysis software for high-throughput methods such as next-generation sequencing and genome-wide association studies. Secondly, by designing models for the prediction of protein morphology, functions, and symmetrical and asymmetrical protein–protein interactions. Finally, NII boosts the output of quantitative pathology by enabling the automatization of tedious processes such as cell counting, tracing, and arbor analysis. The new classification of microglia, the brain’s innate immune cells, was an NII achievement. Deep sequencing classifies microglia in “sensotypes” to accurately describe the versatility of immune responses to physiological and pathological challenges, as well as to experimental conditions such as xenografting and organoids. NII approaches complex tasks in the brain-immune interface, recognizes patterns and allows for hypothesis-free predictions with ultimate targeted individualized treatment strategies, and personalizes disease prognosis and treatment response.
|
28
|
Dalvie S, Chatzinakos C, Al Zoubi O, Georgiadis F, Lancashire L, Daskalakis NP. From genetics to systems biology of stress-related mental disorders. Neurobiol Stress 2021; 15:100393. [PMID: 34584908 PMCID: PMC8456113 DOI: 10.1016/j.ynstr.2021.100393] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/04/2021] [Revised: 07/22/2021] [Accepted: 09/08/2021] [Indexed: 01/20/2023]
Abstract
Many individuals will be exposed to some form of traumatic stress in their lifetime which, in turn, increases the likelihood of developing stress-related disorders such as post-traumatic stress disorder (PTSD), major depressive disorder (MDD) and anxiety disorders (ANX). The development of these disorders is also influenced by genetics, with heritability estimates ranging between ∼30% and 70%. In this review, we provide an overview of the findings of genome-wide association studies for PTSD, depression and ANX, and we observe a clear genetic overlap between these three diagnostic categories. We go on to highlight the results from transcriptomic and epigenomic studies, and, given the multifactorial nature of stress-related disorders, we provide an overview of the gene-environment studies that have been conducted to date. Finally, we discuss systems biology approaches that are now seeing wider utility in determining a more holistic view of these complex disorders.
Affiliation(s)
- Shareefa Dalvie
- South African Medical Research Council (SAMRC), Unit on Risk & Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- South African Medical Research Council (SAMRC), Unit on Child & Adolescent Health, Department of Paediatrics and Child Health, University of Cape Town, Cape Town, South Africa
| | - Chris Chatzinakos
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
| | - Obada Al Zoubi
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
| | - Foivos Georgiadis
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
| | | | - Lee Lancashire
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
- Department of Data Science, Cohen Veterans Bioscience, New York, USA
| | - Nikolaos P. Daskalakis
- Department of Psychiatry, McLean Hospital, Harvard Medical School, Belmont, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, USA
| |
|
29
|
Molecular Classification and Interpretation of Amyotrophic Lateral Sclerosis Using Deep Convolution Neural Networks and Shapley Values. Genes (Basel) 2021; 12:genes12111754. [PMID: 34828360 PMCID: PMC8626003 DOI: 10.3390/genes12111754] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/27/2021] [Revised: 10/23/2021] [Accepted: 10/23/2021] [Indexed: 11/17/2022]
Abstract
Amyotrophic lateral sclerosis (ALS) is a prototypical neurodegenerative disease characterized by progressive degeneration of motor neurons, which severely affects the ability to control voluntary muscle movement. Most of the genetic aberrations responsible for ALS are non-additive, which, along with limited sample size, the curse of dimensionality, class imbalance and noise in the data, makes its molecular classification very challenging. Deep learning methods have been successful in many other related areas but have low minority class accuracy and suffer from a lack of explainability when used directly with RNA expression features for ALS molecular classification. In this paper, we propose a deep-learning-based molecular ALS classification and interpretation framework. Our framework is based on training a convolution neural network (CNN) on images obtained by converting RNA expression values into pixels using the DeepInsight similarity technique. Then, we employed Shapley additive explanations (SHAP) to extract pixels with higher relevance to ALS classifications. These pixels were mapped back to the genes which made them up. This enabled us to classify ALS samples with high accuracy for a minority class along with identifying genes that might be playing an important role in ALS molecular classifications. Taken together with RNA expression images classified with CNN, our preliminary analysis of the genes identified by SHAP interpretation demonstrates the value of utilizing machine learning to perform molecular classification of ALS and uncover disease-associated genes.
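The Shapley attribution that SHAP approximates can be computed exactly when there are only a few features. The sketch below does this for a toy three-feature model; it is not the SHAP library and not the paper's CNN pipeline, and the model, baseline, and function names are assumptions made for illustration. "Absent" features are replaced by a baseline value, which is one common convention.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(v):
    """Hypothetical score with a non-additive (interaction) term."""
    return 2 * v[0] + v[1] + 4 * v[0] * v[2]


phi = shapley_values(toy_model, x=[1, 1, 1], baseline=[0, 0, 0])
```

For this toy model the interaction term is split equally between features 0 and 2, giving attributions of 4, 1 and 2, and the attributions sum to the difference between the model output at `x` and at the baseline. In the paper's setting the features would be image pixels, which are then mapped back to genes.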
|
30
|
Kendall KM, Van Assche E, Andlauer TFM, Choi KW, Luykx JJ, Schulte EC, Lu Y. The genetic basis of major depression. Psychol Med 2021; 51:2217-2230. [PMID: 33682643 DOI: 10.1017/s0033291721000441] [Citation(s) in RCA: 57] [Impact Index Per Article: 19.0] [Indexed: 12/12/2022]
Abstract
Major depressive disorder (MDD) is a common, debilitating, phenotypically heterogeneous disorder with heritability estimates ranging from 30% to 50%. Compared to other psychiatric disorders, its high prevalence, moderate heritability, and strong polygenicity have posed major challenges for gene-mapping in MDD. Studies of common genetic variation in MDD, driven by large international collaborations such as the Psychiatric Genomics Consortium, have confirmed the highly polygenic nature of the disorder and implicated over 100 genetic risk loci to date. Rare copy number variants associated with MDD risk were also recently identified. The goal of this review is to present a broad picture of our current understanding of the epidemiology, genetic epidemiology, molecular genetics, and gene-environment interplay in MDD. Insights into the impact of genetic factors on the aetiology of this complex disorder hold great promise for improving clinical care.
Affiliation(s)
- K M Kendall
  - MRC Centre for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- E Van Assche
  - Department of Psychiatry, University of Muenster, Muenster, Germany
- T F M Andlauer
  - Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, Munich, Germany
- K W Choi
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA 02115, USA
- J J Luykx
  - Department of Psychiatry, UMC Utrecht Brain Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Department of Translational Neuroscience, UMC Utrecht Brain Center, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
  - Outpatient Second Opinion Clinic, GGNet Mental Health, Warnsveld, The Netherlands
- E C Schulte
  - Institute of Psychiatric Phenomics and Genomics (IPPG), University Hospital, LMU Munich, Munich, Germany
  - Department of Psychiatry and Psychotherapy, University Hospital, LMU Munich, Munich, Germany
- Y Lu
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
31
Quantitative neurogenetics: applications in understanding disease. Biochem Soc Trans 2021; 49:1621-1631. [PMID: 34282824 DOI: 10.1042/bst20200732] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/20/2021] [Revised: 06/11/2021] [Accepted: 06/21/2021] [Indexed: 12/31/2022]
Abstract
Neurodevelopmental and neurodegenerative disorders (NNDs) are a group of conditions with a broad range of core and co-morbidities, associated with dysfunction of the central nervous system. Improvements in high-throughput sequencing have led to the detection of putative genetic risk loci for NNDs; however, quantitative neurogenetic approaches need to be developed further in order to establish causality and the underlying molecular genetic mechanisms of pathogenesis. Here, we discuss an approach for prioritizing the contribution of genetic risk loci to complex-NND pathogenesis by estimating the possible impacts of these loci on gene regulation. Furthermore, we highlight the use of a tissue-specificity gene expression index and the application of artificial intelligence (AI) to improve the interpretation of the role of genetic risk elements in NND pathogenesis. Given that NND symptoms are associated with brain dysfunction, risk loci with direct, causative actions would comprise genes with essential functions in neural cells that are highly expressed in the brain. Indeed, NND risk genes implicated in brain dysfunction are disproportionately enriched in the brain compared with other tissues, which we refer to as brain-specific expressed genes. In addition, the tissue-specificity gene expression index can be used as a handle to identify non-brain contexts that are involved in NND pathogenesis. Lastly, we discuss how using an AI approach provides the opportunity to integrate the biological impacts of risk loci to identify those putative combinations of causative relationships through which genetic factors contribute to NND pathogenesis.
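The abstract above does not say which tissue-specificity index it uses. Purely as an illustration, the sketch below computes tau, one widely used index of this kind: 0 means a gene is expressed uniformly across tissues and 1 means it is expressed in a single tissue. The function name `tau_index` and the toy expression profiles are hypothetical and are not taken from the paper.

```python
def tau_index(expression):
    """Tau tissue-specificity index for one gene.

    `expression` holds non-negative expression levels, one per tissue.
    Returns a value in [0, 1]: 0 = uniform, 1 = single-tissue expression.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        # Gene is not expressed anywhere; returning 0.0 is a convention
        # chosen for this sketch, not part of the index definition.
        return 0.0
    # Sum of (1 - normalized expression) over tissues, scaled by n - 1.
    return sum(1 - x / peak for x in expression) / (n - 1)


# Toy profiles; assume the first position is brain (hypothetical data).
profiles = {
    "housekeeping": [10.0, 10.0, 10.0, 10.0],
    "brain_specific": [50.0, 0.0, 0.0, 0.0],
    "brain_enriched": [40.0, 10.0, 10.0, 20.0],
}

for gene, values in profiles.items():
    print(gene, round(tau_index(values), 3))
```

Ranking genes by such an index, and noting which tissue carries the peak, is one way to separate brain-specific genes from genes whose peak expression points to a non-brain context.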
32
Song S, Shan N, Wang G, Yan X, Liu JS, Hou L. Openness Weighted Association Studies: Leveraging Personal Genome Information to Prioritize Noncoding Variants. Bioinformatics 2021; 37:4737-4743. [PMID: 34260700 PMCID: PMC8665759 DOI: 10.1093/bioinformatics/btab514] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/22/2021] [Revised: 06/16/2021] [Accepted: 07/07/2021] [Indexed: 11/15/2022] Open
Abstract
Motivation: Identification and interpretation of non-coding variations that affect disease risk remain a paramount challenge in genome-wide association studies (GWAS) of complex diseases. Experimental efforts have provided comprehensive annotations of functional elements in the human genome. On the other hand, advances in computational biology, especially machine learning approaches, have facilitated accurate predictions of cell-type-specific functional annotations. Integrating functional annotations with GWAS signals has advanced the understanding of disease mechanisms. In previous studies, functional annotations were treated as static properties of a genomic region, ignoring potential functional differences imposed by different genotypes across individuals. Results: We develop a computational approach, Openness Weighted Association Studies (OWAS), to leverage and aggregate predictions of chromosome accessibility in personal genomes for prioritizing GWAS signals. The approach relies on an analytical expression we derived for identifying disease-associated genomic segments whose effects in the etiology of complex diseases are evaluated. In extensive simulations and real data analysis, OWAS identifies genes/segments that explain more heritability than existing methods, and has a better replication rate in independent cohorts than GWAS. Moreover, the identified genes/segments show tissue-specific patterns and are enriched in disease-relevant pathways. We use rheumatic arthritis and asthma as examples to demonstrate how OWAS can be exploited to provide novel insights on complex diseases. Availability and implementation: The R package OWAS that implements our method is available at https://github.com/shuangsong0110/OWAS. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shuang Song
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
- Nayang Shan
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
- Geng Wang
  - University of Queensland Diamantina Institute, University of Queensland, Brisbane, Australia
- Xiting Yan
  - Section of Pulmonary, Critical Care, and Sleep Medicine, Yale School of Medicine, New Haven, Connecticut, USA
  - Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, USA
- Jun S Liu
  - Department of Statistics, Harvard University, Cambridge, MA, 02138, USA
- Lin Hou
  - Center for Statistical Science, Tsinghua University, Beijing, China
  - Department of Industrial Engineering, Tsinghua University, Beijing, China
  - MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
33
Lin E, Kuo PH, Lin WY, Liu YL, Yang AC, Tsai SJ. Prediction of Probable Major Depressive Disorder in the Taiwan Biobank: An Integrated Machine Learning and Genome-Wide Analysis Approach. J Pers Med 2021; 11:597. [PMID: 34202750 PMCID: PMC8308113 DOI: 10.3390/jpm11070597] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/04/2021] [Revised: 06/14/2021] [Accepted: 06/22/2021] [Indexed: 12/16/2022] Open
Abstract
In light of recent advancements in machine learning, personalized medicine using predictive algorithms serves as an essential paradigmatic methodology. Our goal was to explore an integrated machine learning and genome-wide analysis approach which targets the prediction of probable major depressive disorder (MDD) using 9828 individuals in the Taiwan Biobank. In our analysis, we reported a genome-wide significant association with probable MDD that has not been previously identified: FBN1 on chromosome 15. Furthermore, we pinpointed 17 single nucleotide polymorphisms (SNPs) which show evidence of both associations with probable MDD and potential roles as expression quantitative trait loci (eQTLs). To predict the status of probable MDD, we established prediction models with random undersampling and synthetic minority oversampling using 17 eQTL SNPs and eight clinical variables. We utilized five state-of-the-art models: logistic ridge regression, support vector machine, C4.5 decision tree, LogitBoost, and random forests. Our data revealed that random forests had the highest performance (area under curve = 0.8905 ± 0.0088; repeated 10-fold cross-validation) among the predictive algorithms to infer complex correlations between biomarkers and probable MDD. Our study suggests that an integrated machine learning and genome-wide analysis approach may offer an advantageous method to establish bioinformatics tools for discriminating MDD patients from healthy controls.
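The class-balancing step named in the abstract above, random undersampling, can be sketched in a few lines. This is a minimal standard-library illustration, not the authors' pipeline: the function name `random_undersample`, the seed, and the toy data are all hypothetical, and the sketch omits the synthetic minority oversampling and the classifiers that the study also used.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Balance classes by randomly discarding majority-class samples.

    Every class is reduced to the size of the smallest class.
    Returns a shuffled list of (sample, label) pairs.
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # The smallest class sets the per-class quota.
    target = min(len(group) for group in by_class.values())

    balanced = []
    for label, group in by_class.items():
        # Sample without replacement; the minority class is kept whole.
        for sample in rng.sample(group, target):
            balanced.append((sample, label))

    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 10 cases (label 1) and 90 controls (label 0).
samples = list(range(100))
labels = [1 if i < 10 else 0 for i in samples]

balanced = random_undersample(samples, labels)
print(len(balanced))
```

When resampling is combined with cross-validation, it is generally applied to the training folds only, so that each held-out fold keeps the original class distribution and the reported performance is not inflated.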
Affiliation(s)
- Eugene Lin
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
  - Department of Electrical & Computer Engineering, University of Washington, Seattle, WA 98195, USA
  - Graduate Institute of Biomedical Sciences, China Medical University, Taichung 40402, Taiwan
- Po-Hsiu Kuo
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10617, Taiwan
- Wan-Yu Lin
  - Department of Public Health, Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei 10617, Taiwan
- Yu-Li Liu
  - Center for Neuropsychiatric Research, National Health Research Institutes, Miaoli County 35053, Taiwan
- Albert C. Yang
  - Division of Interdisciplinary Medicine and Biotechnology, Beth Israel Deaconess Medical Center/Harvard Medical School, Boston, MA 02215, USA
  - Institute of Brain Science, National Yang Ming Chiao Tung University, Taipei 112304, Taiwan
- Shih-Jen Tsai
  - Department of Psychiatry, Taipei Veterans General Hospital, Taipei 11217, Taiwan
  - Division of Psychiatry, National Yang Ming Chiao Tung University, Taipei 112304, Taiwan
34
Asada K, Kaneko S, Takasawa K, Machino H, Takahashi S, Shinkai N, Shimoyama R, Komatsu M, Hamamoto R. Integrated Analysis of Whole Genome and Epigenome Data Using Machine Learning Technology: Toward the Establishment of Precision Oncology. Front Oncol 2021; 11:666937. [PMID: 34055633 PMCID: PMC8149908 DOI: 10.3389/fonc.2021.666937] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 02/11/2021] [Accepted: 04/26/2021] [Indexed: 12/17/2022] Open
Abstract
With the completion of the International Human Genome Project, we have entered what is known as the post-genome era, and efforts to apply genomic information to medicine have become more active. In particular, with the announcement of the Precision Medicine Initiative by U.S. President Barack Obama in his State of the Union address at the beginning of 2015, "precision medicine," which aims to divide patients and potential patients into subgroups with respect to disease susceptibility, has become the focus of worldwide attention. The field of oncology is also actively adopting the precision oncology approach, which uses molecular profiling, such as genomic information, to select the appropriate treatment. However, current precision oncology is dominated by the targeted-gene panel (TGP) method, which uses next-generation sequencing (NGS) to analyze a limited number of specific cancer-related genes and suggest optimal treatments, and only a limited number of patients benefit from it. To develop precision oncology steadily, it is necessary to integrate and analyze more detailed omics data, such as whole genome data and epigenome data. At the same time, with the advancement of analysis technologies such as NGS, the amount of data obtained by omics analysis has become enormous, and artificial intelligence (AI) technologies, mainly machine learning (ML) technologies, are being actively used to make more efficient and accurate predictions. In this review, we focus on whole genome sequencing (WGS) analysis and epigenome analysis, introduce the latest results of omics analysis using ML technologies for the development of precision oncology, and discuss future prospects.
Affiliation(s)
- Ken Asada
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Syuzo Kaneko
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Ken Takasawa
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Hidenori Machino
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Satoshi Takahashi
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Norio Shinkai
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
  - Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
- Ryo Shimoyama
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Masaaki Komatsu
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
- Ryuji Hamamoto
  - Cancer Translational Research Team, RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
  - Division of Medical AI Research and Development, National Cancer Center Research Institute, Tokyo, Japan
  - Department of NCC Cancer Science, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, Japan
35
Hartmann M, Fenton N, Dobson R. Current review and next steps for artificial intelligence in multiple sclerosis risk research. Comput Biol Med 2021; 132:104337. [PMID: 33773193 DOI: 10.1016/j.compbiomed.2021.104337] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/20/2021] [Revised: 03/09/2021] [Accepted: 03/10/2021] [Indexed: 12/30/2022]
Abstract
In the last few decades, the prevalence of multiple sclerosis (MS), a chronic inflammatory disease of the nervous system, has increased, particularly in Northern European countries, the United States, and the United Kingdom. The promise of artificial intelligence (AI) and machine learning (ML) as tools to address problems in MS research has attracted increasing interest in these methods. Bayesian networks offer a clear advantage since they can integrate data and causal knowledge, allowing for visualizing interactions between dependent variables and potential confounding factors. A review of AI/ML research methods applied to MS found 216 papers using the terms "Multiple Sclerosis", "machine learning", "artificial intelligence", "Bayes", and "Bayesian", of which 90 were relevant and recently published. More than half of these involve the detection and segmentation of MS lesions for quantitative analysis; however, clinical and lifestyle risk factor assessment and prediction have largely been ignored. Of those that address risk factors, most provide only association studies for some factors and often fail to include the potential impact of confounding factors and bias (especially where these have causal explanations) that could affect data interpretation, such as reporting quality and medical care access in various countries. To address these gaps in the literature, we propose a causal Bayesian network approach to assessing risk factors for MS, which can address deficiencies in current epidemiological methods of producing risk measurements and makes better use of observational data.
Affiliation(s)
- Morghan Hartmann
  - Risk and Information Management Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, E1 4NS, UK
- Norman Fenton
  - Risk and Information Management Research Group, School of Electronic Engineering and Computer Science, Queen Mary University of London, London, E1 4NS, UK
- Ruth Dobson
  - Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London, E1 4NS, UK
36
Bhattacharya A, Li Y, Love MI. MOSTWAS: Multi-Omic Strategies for Transcriptome-Wide Association Studies. PLoS Genet 2021; 17:e1009398. [PMID: 33684137 PMCID: PMC7971899 DOI: 10.1371/journal.pgen.1009398] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 06/27/2020] [Revised: 03/18/2021] [Accepted: 02/04/2021] [Indexed: 02/06/2023] Open
Abstract
Traditional predictive models for transcriptome-wide association studies (TWAS) consider only single nucleotide polymorphisms (SNPs) local to genes of interest and perform parameter shrinkage with a regularization process. These approaches ignore the effect of distal-SNPs or other molecular effects underlying the SNP-gene association. Here, we outline multi-omics strategies for transcriptome imputation from germline genetics to allow more powerful testing of gene-trait associations by prioritizing distal-SNPs to the gene of interest. In one extension, we identify mediating biomarkers (CpG sites, microRNAs, and transcription factors) highly associated with gene expression and train predictive models for these mediators using their local SNPs. Imputed values for mediators are then incorporated into the final predictive model of gene expression, along with local SNPs. In the second extension, we assess distal-eQTLs (SNPs associated with genes not in a local window around it) for their mediation effect through mediating biomarkers local to these distal-eSNPs. Distal-eSNPs with large indirect mediation effects are then included in the transcriptomic prediction model with the local SNPs around the gene of interest. Using simulations and real data from ROS/MAP brain tissue and TCGA breast tumors, we show considerable gains of percent variance explained (1-2% additive increase) of gene expression and TWAS power to detect gene-trait associations. This integrative approach to transcriptome-wide imputation and association studies aids in identifying the complex interactions underlying genetic regulation within a tissue and important risk genes for various traits and disorders.
Affiliation(s)
- Arjun Bhattacharya
  - Department of Pathology and Laboratory Medicine, University of California-Los Angeles, Los Angeles, California, United States of America
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
  - Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Michael I. Love
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
37
Cai N, Choi KW, Fried EI. Reviewing the genetics of heterogeneity in depression: operationalizations, manifestations and etiologies. Hum Mol Genet 2020; 29:R10-R18. [PMID: 32568380 PMCID: PMC7530517 DOI: 10.1093/hmg/ddaa115] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Received: 06/05/2020] [Revised: 06/05/2020] [Accepted: 06/08/2020] [Indexed: 02/06/2023] Open
Abstract
With progress in genome-wide association studies of depression, from identifying zero hits in ~16 000 individuals in 2013 to 223 hits in more than a million individuals in 2020, understanding the genetic architecture of this debilitating condition no longer appears to be an impossible task. The pressing question now is whether recently discovered variants describe the etiology of a single disease entity. There are a myriad of ways to measure and operationalize depression severity, and major depressive disorder as defined in the Diagnostic and Statistical Manual of Mental Disorders-5 can manifest in more than 10 000 ways based on symptom profiles alone. Variations in developmental timing, comorbidity and environmental contexts across individuals and samples further add to the heterogeneity. With big data increasingly enabling genomic discovery in psychiatry, it is more timely than ever to explicitly disentangle genetic contributions to what is likely 'depressions' rather than depression. Here, we introduce three sources of heterogeneity: operationalization, manifestation and etiology. We review recent efforts to identify depression subtypes using clinical and data-driven approaches, examine differences in genetic architecture of depression across contexts, and argue that heterogeneity in operationalizations of depression is likely a considerable source of inconsistency. Finally, we offer recommendations and considerations for the field going forward.
Affiliation(s)
- Na Cai
  - Helmholtz Pioneer Campus, Helmholtz Zentrum München, Neuherberg 85764, Germany
- Karmel W Choi
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114, USA
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
  - Stanley Center for Psychiatric Research, Broad Institute, Boston, MA 02142, USA
- Eiko I Fried
  - Department of Psychology, Leiden University, Leiden 2333 AK, Netherlands
38
Sealfon RSG, Mariani LH, Kretzler M, Troyanskaya OG. Machine learning, the kidney, and genotype-phenotype analysis. Kidney Int 2020; 97:1141-1149. [PMID: 32359808 PMCID: PMC8048707 DOI: 10.1016/j.kint.2020.02.028] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/15/2019] [Revised: 01/13/2020] [Accepted: 02/06/2020] [Indexed: 01/23/2023]
Abstract
With biomedical research transitioning into data-rich science, machine learning provides a powerful toolkit for extracting knowledge from large-scale biological data sets. The increasing availability of comprehensive kidney omics compendia (transcriptomics, proteomics, metabolomics, and genome sequencing), as well as other data modalities such as electronic health records, digital nephropathology repositories, and radiology renal images, makes machine learning approaches increasingly essential for analyzing human kidney data sets. Here, we discuss how machine learning approaches can be applied to the study of kidney disease, with a particular focus on how they can be used for understanding the relationship between genotype and phenotype.
Affiliation(s)
- Rachel S G Sealfon
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, New York, USA
- Laura H Mariani
  - Division of Nephrology, University of Michigan, Ann Arbor, Michigan, USA
- Matthias Kretzler
  - Division of Nephrology, University of Michigan, Ann Arbor, Michigan, USA
- Olga G Troyanskaya
  - Center for Computational Biology, Flatiron Institute, Simons Foundation, New York, New York, USA
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
  - Department of Computer Science, Princeton University, Princeton, New Jersey, USA