1
|
Pospiech M, Beckford J, Kumar AMS, Tamizharasan M, Brito J, Liang G, Mangul S, Alachkar H. The DNA methylation landscape across the TCR loci in patients with acute myeloid leukemia. Int Immunopharmacol 2024; 138:112376. [PMID: 38917523 DOI: 10.1016/j.intimp.2024.112376] [Received: 02/06/2024] [Revised: 05/09/2024] [Accepted: 05/28/2024] [Indexed: 06/27/2024]
Abstract
The capacity of T cells to initiate anti-leukemia immune responses is determined by the ability of their receptors (TCRs) to recognize leukemia neoantigens. Epigenetic mechanisms, including DNA methylation, contribute to shaping the TCR repertoire composition and diversity. DNA hypomethylating agents (HMAs) have been widely used in the treatment of acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Whether DNA HMAs directly influence TCR gene loci methylation patterns remains unknown. By analyzing public datasets, we compared methylation patterns across TCR loci in AML patients and healthy controls. We also explored how HMAs influence TCR loci DNA methylation in patients with AML. While methylation patterns are largely conserved across the TCR loci, certain V genes exhibit high interindividual variability. Although overall methylation levels within the TCR loci did not show significant differences, specific sites, including 32 TRAV and 12 TRBV sites, exhibited distinct methylation patterns when comparing T cells from healthy donors to those from patients with AML. In leukemic cells, decitabine treatment demethylates sites across the TRAV and TRBV genes. While not as significant, a similar pattern of demethylation is observed in T cells. Pretreatment AML samples exhibit higher methylation beta values in differentially methylated positions (DMPs) compared with non-DMPs. Methylation levels of certain TRAV and TRBV genes in leukemic cells are associated with patients' risk status. The presence of disease-specific TCR loci methylation signatures that are associated with clinical outcome presents an opportunity for therapeutic intervention. HMAs can modulate the TCR loci methylation patterns, yet whether they could reprogram the TCR repertoire composition remains to be explored.
Affiliation(s)
- Mateusz Pospiech
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- John Beckford
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- Advaith Maya Sanjeev Kumar
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America; Department of Computer Science, University of Southern California, Los Angeles, CA, the United States of America
- Mukund Tamizharasan
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America; Department of Computer Science, University of Southern California, Los Angeles, CA, the United States of America
- Jaqueline Brito
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- Gangning Liang
- Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, the United States of America
- Serghei Mangul
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- Houda Alachkar
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
|
2
|
Bunyavanich S, Becker PM, Altman MC, Lasky-Su J, Ober C, Zengler K, Berdyshev E, Bonneau R, Chatila T, Chatterjee N, Chung KF, Cutcliffe C, Davidson W, Dong G, Fang G, Fulkerson P, Himes BE, Liang L, Mathias RA, Ogino S, Petrosino J, Price ND, Schadt E, Schofield J, Seibold MA, Steen H, Wheatley L, Zhang H, Togias A, Hasegawa K. Analytical challenges in omics research on asthma and allergy: A National Institute of Allergy and Infectious Diseases workshop. J Allergy Clin Immunol 2024; 153:954-968. [PMID: 38295882 PMCID: PMC10999353 DOI: 10.1016/j.jaci.2024.01.014] [Received: 12/13/2023] [Revised: 01/19/2024] [Accepted: 01/24/2024] [Indexed: 02/29/2024]
Abstract
Studies of asthma and allergy are generating increasing volumes of omics data for analysis and interpretation. The National Institute of Allergy and Infectious Diseases (NIAID) assembled a workshop comprising investigators studying asthma and allergic diseases using omics approaches, omics investigators from outside the field, and NIAID medical and scientific officers to discuss the following areas in asthma and allergy research: genomics, epigenomics, transcriptomics, microbiomics, metabolomics, proteomics, lipidomics, integrative omics, systems biology, and causal inference. Current states of the art, present challenges, novel and emerging strategies, and priorities for progress were presented and discussed for each area. This workshop report summarizes the major points and conclusions from this NIAID workshop. As a group, the investigators underscored the imperatives for rigorous analytic frameworks, integration of different omics data types, cross-disciplinary interaction, strategies for overcoming current limitations, and the overarching goal to improve scientific understanding and care of asthma and allergic diseases.
Affiliation(s)
- Patrice M Becker
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Jessica Lasky-Su
- Brigham & Women's Hospital and Harvard Medical School, Boston, Mass
- Talal Chatila
- Boston Children's Hospital and Harvard Medical School, Boston, Mass
- Wendy Davidson
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Gang Dong
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Gang Fang
- Icahn School of Medicine at Mount Sinai, New York, NY
- Patricia Fulkerson
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Liming Liang
- Harvard T. H. Chan School of Public Health, Boston, Mass
- Shuji Ogino
- Brigham & Women's Hospital and Harvard Medical School, Boston, Mass; Harvard T. H. Chan School of Public Health, Boston, Mass; Broad Institute of MIT and Harvard, Boston, Mass
- Eric Schadt
- Icahn School of Medicine at Mount Sinai, New York, NY
- Max A Seibold
- National Jewish Health, Denver, Colo; University of Colorado School of Medicine, Aurora, Colo
- Hanno Steen
- Boston Children's Hospital and Harvard Medical School, Boston, Mass
- Lisa Wheatley
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Hongmei Zhang
- School of Public Health, University of Memphis, Memphis, Tenn
- Alkis Togias
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Kohei Hasegawa
- Massachusetts General Hospital and Harvard Medical School, Boston, Mass
|
3
|
Kim ME, Gao C, Cai LY, Yang Q, Newlin NR, Ramadass K, Jefferson A, Archer D, Shashikumar N, Pechman KR, Gifford KA, Hohman TJ, Beason-Held LL, Resnick SM, Winzeck S, Schilling KG, Zhang P, Moyer D, Landman BA. Empirical assessment of the assumptions of ComBat with diffusion tensor imaging. J Med Imaging (Bellingham) 2024; 11:024011. [PMID: 38655188 PMCID: PMC11034156 DOI: 10.1117/1.jmi.11.2.024011] [Received: 10/05/2023] [Revised: 02/28/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024]
Abstract
Purpose: Diffusion tensor imaging (DTI) is a magnetic resonance imaging technique that provides unique information about white matter microstructure in the brain but is susceptible to confounding effects introduced by scanner or acquisition differences. ComBat is a leading approach for addressing these site biases. However, despite its frequent use for harmonization, ComBat's robustness toward site dissimilarities and overall cohort size has not yet been evaluated in terms of DTI. Approach: As a baseline, we match N = 358 participants from two sites to create a "silver standard" that simulates a cohort for multi-site harmonization. Across sites, we harmonize mean fractional anisotropy (FA) and mean diffusivity, calculated using participant DTI data, for the regions of interest defined by the JHU EVE-Type III atlas. We bootstrap 10 iterations at 19 levels of total sample size, 10 levels of sample size imbalance between sites, and 6 levels of mean age difference between sites to quantify (i) $\beta_{\mathrm{AGE}}$, the linear regression coefficient of the relationship between FA and age; (ii) $\hat{\gamma}^{*}_{sf}$, the ComBat-estimated site-shift; and (iii) $\hat{\delta}^{*}_{sf}$, the ComBat-estimated site-scaling. We characterize the reliability of ComBat by evaluating the root mean squared error in these three metrics and examine whether there is a correlation between the reliability of ComBat and a violation of assumptions. Results: ComBat remains well behaved for $\beta_{\mathrm{AGE}}$ when N > 162 and when the mean age difference is less than 4 years. The assumptions of the ComBat model regarding the normality of residual distributions are not violated as the model becomes unstable. Conclusion: Prior to harmonization of DTI data with ComBat, the input cohort should be examined for size and covariate distributions of each site. Direct assessment of residual distributions is less informative on stability than bootstrap analysis. We caution against the use of ComBat in situations that do not conform to the above thresholds.
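The bootstrap idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline and it does not run ComBat: it only shows how the root mean squared error of an age slope (the role played by β AGE above) could be estimated from repeated subsamples of different sizes. The synthetic cohort, the function names `age_slope` and `bootstrap_rmse`, sampling with replacement, and the use of the full-cohort slope as the reference are all assumptions made for illustration.

```python
import numpy as np


def age_slope(age, fa):
    """Least-squares slope of FA regressed on age (model includes an intercept)."""
    slope, _intercept = np.polyfit(age, fa, deg=1)
    return slope


def bootstrap_rmse(age, fa, sample_size, n_iter=10, seed=0):
    """RMSE of subsample slopes measured against the full-cohort slope."""
    rng = np.random.default_rng(seed)
    reference = age_slope(age, fa)
    slopes = []
    for _ in range(n_iter):
        # Assumption: participants are resampled with replacement.
        idx = rng.choice(len(age), size=sample_size, replace=True)
        slopes.append(age_slope(age[idx], fa[idx]))
    slopes = np.asarray(slopes)
    return float(np.sqrt(np.mean((slopes - reference) ** 2)))


# Synthetic cohort with hypothetical values; this is NOT the study data.
rng = np.random.default_rng(42)
age = rng.uniform(50, 90, size=358)
fa = 0.55 - 0.002 * age + rng.normal(0, 0.02, size=358)

for n in (20, 80, 320):
    print(n, bootstrap_rmse(age, fa, n))
```

In a real analysis the slope would be computed on harmonized values, and the same loop would also record the estimated site-shift and site-scaling terms.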
Affiliation(s)
- Michael E. Kim
- Vanderbilt University, Department of Computer Science, Nashville, Tennessee, United States
- Chenyu Gao
- Vanderbilt University, Department of Electrical Engineering, Nashville, Tennessee, United States
- Leon Y. Cai
- Vanderbilt University, Department of Biomedical Engineering, Nashville, Tennessee, United States
- Vanderbilt University, Medical Scientist Training Program, Nashville, Tennessee, United States
- Qi Yang
- Vanderbilt University, Department of Computer Science, Nashville, Tennessee, United States
- Nancy R. Newlin
- Vanderbilt University, Department of Computer Science, Nashville, Tennessee, United States
- Karthik Ramadass
- Vanderbilt University, Department of Computer Science, Nashville, Tennessee, United States
- Vanderbilt University, Department of Electrical Engineering, Nashville, Tennessee, United States
- Angela Jefferson
- Vanderbilt University Medical Center, Vanderbilt Memory and Alzheimer’s Center, Nashville, Tennessee, United States
- Vanderbilt University Medical Center, Department of Medicine, Nashville, Tennessee, United States
- Vanderbilt University Medical Center, Department of Neurology, Nashville, Tennessee, United States
- Derek Archer
- Vanderbilt University Medical Center, Vanderbilt Memory and Alzheimer’s Center, Nashville, Tennessee, United States
- Vanderbilt University Medical Center, Vanderbilt Genetics Institute, Nashville, Tennessee, United States
- Niranjana Shashikumar
- Vanderbilt University Medical Center, Vanderbilt Memory and Alzheimer’s Center, Nashville, Tennessee, United States
- Kimberly R. Pechman
- Vanderbilt University Medical Center, Vanderbilt Memory and Alzheimer’s Center, Nashville, Tennessee, United States
- Katherine A. Gifford
- Vanderbilt University Medical Center, Vanderbilt Memory and Alzheimer’s Center, Nashville, Tennessee, United States
- Timothy J. Hohman
- Vanderbilt University Medical Center, Vanderbilt Memory and Alzheimer’s Center, Nashville, Tennessee, United States
- Vanderbilt University Medical Center, Vanderbilt Genetics Institute, Nashville, Tennessee, United States
- Lori L. Beason-Held
- National Institutes of Health, National Institute on Aging, Laboratory of Behavioral Neuroscience, Baltimore, Maryland, United States
- Susan M. Resnick
- National Institutes of Health, National Institute on Aging, Laboratory of Behavioral Neuroscience, Baltimore, Maryland, United States
- Stefan Winzeck
- Imperial College London, Department of Computing, BioMedIA Group, London, United Kingdom
- Kurt G. Schilling
- Vanderbilt University Medical Center, Department of Radiology, Nashville, Tennessee, United States
- Panpan Zhang
- Vanderbilt University Medical Center, Vanderbilt Memory and Alzheimer’s Center, Nashville, Tennessee, United States
- Vanderbilt University Medical Center, Department of Biostatistics, Nashville, Tennessee, United States
- Daniel Moyer
- Vanderbilt University, Department of Computer Science, Nashville, Tennessee, United States
- Bennett A. Landman
- Vanderbilt University, Department of Computer Science, Nashville, Tennessee, United States
- Vanderbilt University, Department of Electrical Engineering, Nashville, Tennessee, United States
- Vanderbilt University, Department of Biomedical Engineering, Nashville, Tennessee, United States
- Vanderbilt University Medical Center, Department of Biostatistics, Nashville, Tennessee, United States
- Vanderbilt University Institute of Imaging Science, Nashville, Tennessee, United States
|
4
|
Belov V, Erwin-Grabner T, Aghajani M, Aleman A, Amod AR, Basgoze Z, Benedetti F, Besteher B, Bülow R, Ching CRK, Connolly CG, Cullen K, Davey CG, Dima D, Dols A, Evans JW, Fu CHY, Gonul AS, Gotlib IH, Grabe HJ, Groenewold N, Hamilton JP, Harrison BJ, Ho TC, Mwangi B, Jaworska N, Jahanshad N, Klimes-Dougan B, Koopowitz SM, Lancaster T, Li M, Linden DEJ, MacMaster FP, Mehler DMA, Melloni E, Mueller BA, Ojha A, Oudega ML, Penninx BWJH, Poletti S, Pomarol-Clotet E, Portella MJ, Pozzi E, Reneman L, Sacchet MD, Sämann PG, Schrantee A, Sim K, Soares JC, Stein DJ, Thomopoulos SI, Uyar-Demir A, van der Wee NJA, van der Werff SJA, Völzke H, Whittle S, Wittfeld K, Wright MJ, Wu MJ, Yang TT, Zarate C, Veltman DJ, Schmaal L, Thompson PM, Goya-Maldonado R. Multi-site benchmark classification of major depressive disorder using machine learning on cortical and subcortical measures. Sci Rep 2024; 14:1084. [PMID: 38212349 PMCID: PMC10784593 DOI: 10.1038/s41598-023-47934-8] [Received: 01/23/2023] [Accepted: 11/19/2023] [Indexed: 01/13/2024]
Abstract
Machine learning (ML) techniques have gained popularity in the neuroimaging field due to their potential for classifying neuropsychiatric disorders. However, the diagnostic predictive power of the existing algorithms has been limited by small sample sizes, lack of representativeness, data leakage, and/or overfitting. Here, we overcome these limitations with the largest multi-site sample size to date (N = 5365) to provide a generalizable ML classification benchmark of major depressive disorder (MDD) using shallow linear and non-linear models. Leveraging brain measures from standardized ENIGMA analysis pipelines in FreeSurfer, we were able to classify MDD versus healthy controls (HC) with a balanced accuracy of around 62%. But after harmonizing the data, e.g., using ComBat, the balanced accuracy dropped to approximately 52%. Accuracy results close to random chance levels were also observed in stratified groups according to age of onset, antidepressant use, number of episodes and sex. Future studies incorporating higher dimensional brain imaging/phenotype features, and/or using more advanced machine and deep learning methods may yield more encouraging prospects.
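Balanced accuracy, the metric reported in this abstract, is the mean of sensitivity and specificity, so a classifier with no discriminative power scores about 50% regardless of class imbalance. The sketch below uses hypothetical toy labels (not data from the study) and an illustrative function name to show why a value near 52% reads as close to chance.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall for binary labels (1 = MDD, 0 = healthy control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    sensitivity = tp / pos  # recall on cases
    specificity = tn / neg  # recall on controls
    return (sensitivity + specificity) / 2


# Hypothetical imbalanced toy sample: 8 cases and 2 controls.
y_true = [1] * 8 + [0] * 2
always_case = [1] * 10  # a trivial classifier that labels everyone as a case

plain_accuracy = sum(t == p for t, p in zip(y_true, always_case)) / len(y_true)
print(plain_accuracy)                          # 0.8
print(balanced_accuracy(y_true, always_case))  # 0.5
```

The trivial classifier reaches 80% plain accuracy on this sample but only 50% balanced accuracy, which is why balanced accuracy is the fairer benchmark when class sizes differ across sites.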
Affiliation(s)
- Vladimir Belov
- Laboratory of Systems Neuroscience and Imaging in Psychiatry (SNIP-Lab), Department of Psychiatry and Psychotherapy, University Medical Center Göttingen (UMG), Georg-August University, Von-Siebold-Str. 5, 37075, Göttingen, Germany
- Tracy Erwin-Grabner
- Laboratory of Systems Neuroscience and Imaging in Psychiatry (SNIP-Lab), Department of Psychiatry and Psychotherapy, University Medical Center Göttingen (UMG), Georg-August University, Von-Siebold-Str. 5, 37075, Göttingen, Germany
- Moji Aghajani
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Institute of Education and Child Studies, Section Forensic Family and Youth Care, Leiden University, Leiden, The Netherlands
- Andre Aleman
- Department of Biomedical Sciences of Cells and Systems, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Alyssa R Amod
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
- Zeynep Basgoze
- Department of Psychiatry and Behavioral Science, University of Minnesota Medical School, Minneapolis, MN, USA
- Francesco Benedetti
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Bianca Besteher
- Department of Psychiatry and Psychotherapy, Jena University Hospital, Jena, Germany
- Robin Bülow
- Institute for Radiology and Neuroradiology, University Medicine Greifswald, Greifswald, Germany
- Christopher R K Ching
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Colm G Connolly
- Department of Biomedical Sciences, Florida State University, Tallahassee, FL, USA
- Kathryn Cullen
- Department of Psychiatry and Behavioral Science, University of Minnesota Medical School, Minneapolis, MN, USA
- Christopher G Davey
- Melbourne Neuropsychiatry Centre, Department of Psychiatry, The University of Melbourne, Parkville, VIC, Australia
- Danai Dima
- Department of Psychology, School of Arts and Social Sciences, City, University of London, London, UK
- Department of Neuroimaging, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Annemiek Dols
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Jennifer W Evans
- Experimental Therapeutics and Pathophysiology Branch, National Institute for Mental Health, National Institutes of Health, Bethesda, MD, USA
- Cynthia H Y Fu
- School of Psychology, University of East London, London, UK
- Centre for Affective Disorders, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, UK
- Ali Saffet Gonul
- SoCAT Lab, Department of Psychiatry, School of Medicine, Ege University, Izmir, Turkey
- Ian H Gotlib
- Department of Psychology, Stanford University, Stanford, CA, USA
- Hans J Grabe
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
- Nynke Groenewold
- Department of Psychiatry and Mental Health, University of Cape Town, Cape Town, South Africa
- J Paul Hamilton
- Center for Social and Affective Neuroscience, Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Center for Medical Imaging and Visualization, Linköping University, Linköping, Sweden
- Ben J Harrison
- Melbourne Neuropsychiatry Centre, Department of Psychiatry, The University of Melbourne, Parkville, VIC, Australia
- Tiffany C Ho
- Department of Psychiatry and Behavioral Sciences, Division of Child and Adolescent Psychiatry, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Department of Psychology, University of California, Los Angeles, Los Angeles, CA, USA
- Benson Mwangi
- Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center Of Excellence On Mood Disorders, Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Natalia Jaworska
- Department of Psychiatry, McGill University, Montreal, QC, Canada
- Neda Jahanshad
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Thomas Lancaster
- Cardiff University Brain Research Imaging Center, Cardiff University, Cardiff, UK
- MRC Center for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- Meng Li
- Department of Psychiatry and Psychotherapy, Jena University Hospital, Jena, Germany
- David E J Linden
- Cardiff University Brain Research Imaging Center, Cardiff University, Cardiff, UK
- MRC Center for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- Division of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK
- School of Mental Health and Neuroscience, Faculty of Health, Medicine and Life Sciences, Maastricht University, Maastricht, The Netherlands
- Frank P MacMaster
- Departments of Psychiatry and Pediatrics, University of Calgary, Calgary, AB, Canada
- David M A Mehler
- Cardiff University Brain Research Imaging Center, Cardiff University, Cardiff, UK
- MRC Center for Neuropsychiatric Genetics and Genomics, Cardiff University, Cardiff, UK
- Department of Psychiatry, Psychotherapy and Psychosomatics, Medical School, RWTH Aachen University, Aachen, Germany
- Elisa Melloni
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Bryon A Mueller
- Department of Psychiatry and Behavioral Science, University of Minnesota Medical School, Minneapolis, MN, USA
- Amar Ojha
- Center for Neuroscience, University of Pittsburgh, Pittsburgh, PA, USA
- Center for Neural Basis of Cognition, University of Pittsburgh, Pittsburgh, PA, USA
- Mardien L Oudega
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Brenda W J H Penninx
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Sara Poletti
- Division of Neuroscience, IRCCS San Raffaele Scientific Institute, Milan, Italy
- Edith Pomarol-Clotet
- FIDMAG Germanes Hospitalàries Research Foundation, Centro de Investigación Biomédica en Red de Salud Mental (CIBERSAM), Barcelona, Catalonia, Spain
- Maria J Portella
- Sant Pau Mental Health Research Group, Institut de Recerca de L'Hospital de La Santa Creu I Sant Pau, Barcelona, Catalonia, Spain
- Elena Pozzi
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
- Orygen, Parkville, VIC, Australia
- Liesbeth Reneman
- Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Matthew D Sacchet
- Meditation Research Program, Department of Psychiatry, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Anouk Schrantee
- Department of Radiology and Nuclear Medicine, Amsterdam University Medical Centers, Amsterdam, The Netherlands
- Kang Sim
- West Region, Institute of Mental Health, Singapore, Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- Jair C Soares
- Center Of Excellence On Mood Disorders, Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Dan J Stein
- SA MRC Research Unit on Risk and Resilience in Mental Disorders, Department of Psychiatry and Neuroscience Institute, University of Cape Town, Cape Town, South Africa
- Sophia I Thomopoulos
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Aslihan Uyar-Demir
- SoCAT Lab, Department of Psychiatry, School of Medicine, Ege University, Izmir, Turkey
- Nic J A van der Wee
- Leiden Institute for Brain and Cognition, Leiden University Medical Center, Leiden, The Netherlands
- Steven J A van der Werff
- Leiden Institute for Brain and Cognition, Leiden University Medical Center, Leiden, The Netherlands
- Department of Psychiatry, Leiden University Medical Center, Leiden, The Netherlands
- Henry Völzke
- Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
- Sarah Whittle
- Melbourne Neuropsychiatry Centre, Department of Psychiatry, The University of Melbourne and Melbourne Health, Melbourne, VIC, Australia
- Katharina Wittfeld
- Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany
- German Center for Neurodegenerative Diseases (DZNE), Site Rostock/ Greifswald, Greifswald, Germany
- Margaret J Wright
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia
- Centre for Advanced Imaging, The University of Queensland, Brisbane, QLD, Australia
- Mon-Ju Wu
- Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center Of Excellence On Mood Disorders, Louis A. Faillace, MD, Department of Psychiatry and Behavioral Sciences at McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Tony T Yang
- Department of Psychiatry and Behavioral Sciences, Division of Child and Adolescent Psychiatry, Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA
- Carlos Zarate
- Section on the Neurobiology and Treatment of Mood Disorders, National Institute of Mental Health, Bethesda, MD, USA
- Dick J Veltman
- Department of Psychiatry, Amsterdam UMC, Amsterdam Neuroscience, Amsterdam Public Health Research Institute, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
- Lianne Schmaal
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
- Orygen, Parkville, VIC, Australia
- Paul M Thompson
- Imaging Genetics Center, Mark and Mary Stevens Neuroimaging and Informatics Institute, Keck School of Medicine, University of Southern California, Marina del Rey, CA, USA
- Roberto Goya-Maldonado
- Laboratory of Systems Neuroscience and Imaging in Psychiatry (SNIP-Lab), Department of Psychiatry and Psychotherapy, University Medical Center Göttingen (UMG), Georg-August University, Von-Siebold-Str. 5, 37075, Göttingen, Germany
|
5
|
Yu Y, Zhang J, Zhan Y, Luo G. A novel method for detecting nine hotspot mutations of deafness genes in one tube. Sci Rep 2024; 14:454. [PMID: 38172427 PMCID: PMC10764868 DOI: 10.1038/s41598-023-50928-1] [Received: 07/06/2023] [Accepted: 12/28/2023] [Indexed: 01/05/2024]
Abstract
Deafness is a common sensory disorder. In China, approximately 70% of hereditary deafness originates from four common deafness-causing genes: GJB2, SLC26A4, GJB3, and MT-RNR1. A single-tube rapid detection method based on 2D-PCR technology was established for nine mutation sites in the aforementioned genes, and Sanger sequencing was used to verify its reliability and accuracy. The frequency of hotspot mutations in deafness genes was analysed in 116 deaf students. 2D-PCR identified 27 genotypes of nine loci according to the melting curves of the FAM, HEX, and Alexa568 fluorescence channels. Of the 116 deaf patients, 12.9% (15/116) carried SLC26A4 mutations, including c.919-2A > G and c.2168A > G (allele frequencies, 7.3% and 2.2%, respectively). The positivity rate (29.3%; 34/116) was highest for GJB2 (allele frequency, 15.9% for c.235delC, 6.0% for c.299_300delAT, and 2.6% for c.176-191del16). Sanger sequencing confirmed the consistency of results between the detection methods based on 2D-PCR and DNA sequencing. Common pathogenic mutations in patients with non-syndromic deafness in Changzhou were concentrated in GJB2 (c.235delC, c.299_300delAT, and c.176-191del16) and SLC26A4 (c.919-2A > G and c.2168A > G). 2D-PCR is an effective method for accurately and rapidly identifying deafness-related genotypes using a single-tube reaction, and is superior to DNA sequencing, which has a high cost and long turnaround time.
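The percentages in this abstract mix two denominators: positivity rates are per patient (116), while allele frequencies for autosomal genes are per allele (2 × 116 = 232). The sketch below reproduces that arithmetic. The carrier counts 34 and 15 come from the abstract; the mutant-allele count for c.235delC is not reported there and is back-calculated from the stated 15.9%, so it is an assumption, as are the function names.

```python
N_PATIENTS = 116  # cohort size stated in the abstract


def positivity_rate(carriers, n_patients=N_PATIENTS):
    """Percentage of patients carrying at least one mutation in a gene."""
    return 100 * carriers / n_patients


def allele_frequency(mutant_alleles, n_patients=N_PATIENTS):
    """Percentage of mutant alleles, assuming a diploid autosomal locus."""
    return 100 * mutant_alleles / (2 * n_patients)


print(round(positivity_rate(34), 1))  # GJB2 carriers: 29.3
print(round(positivity_rate(15), 1))  # SLC26A4 carriers: 12.9

# Assumption: allele count implied by the reported 15.9% for c.235delC.
implied_alleles = round(0.159 * 2 * N_PATIENTS)
print(implied_alleles)                               # 37
print(round(allele_frequency(implied_alleles), 1))   # 15.9
```

The diploid assumption would not apply to MT-RNR1, which is a mitochondrial gene.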
Affiliation(s)
- Yang Yu
- Comprehensive Laboratory, The Third Affiliated Hospital of Soochow University, Changzhou, 213003, People's Republic of China
- Jun Zhang
- Comprehensive Laboratory, The Third Affiliated Hospital of Soochow University, Changzhou, 213003, People's Republic of China
- Yuxia Zhan
- Comprehensive Laboratory, The Third Affiliated Hospital of Soochow University, Changzhou, 213003, People's Republic of China
- Guanghua Luo
- Comprehensive Laboratory, The Third Affiliated Hospital of Soochow University, Changzhou, 213003, People's Republic of China
|
6
|
Cavinato L, Massi MC, Sollini M, Kirienko M, Ieva F. Dual adversarial deconfounding autoencoder for joint batch-effects removal from multi-center and multi-scanner radiomics data. Sci Rep 2023; 13:18857. [PMID: 37914758 PMCID: PMC10620174 DOI: 10.1038/s41598-023-45983-7] [Received: 03/08/2023] [Accepted: 10/26/2023] [Indexed: 11/03/2023]
Abstract
Medical imaging represents the primary tool for investigating and monitoring several diseases, including cancer. Advances in quantitative image analysis have moved towards the extraction of biomarkers able to support clinical decisions. To produce robust results, multi-center studies are often set up. However, the imaging information must be denoised from confounding factors, known as batch effects, such as scanner-specific and center-specific influences. Moreover, in non-solid cancers, like lymphomas, effective biomarkers require an imaging-based representation of the disease that accounts for its multi-site spreading over the patient's body. In this work, we address the dual-factor deconfusion problem and propose a deconfusion algorithm to harmonize the imaging information of patients affected by Hodgkin Lymphoma in a multi-center setting. We show that the proposed model successfully denoises data from domain-specific variability (p-value < 0.001) while it coherently preserves the spatial relationship between imaging descriptions of peer lesions (p-value = 0), which is a strong prognostic biomarker for tumor heterogeneity assessment. This harmonization step significantly improves the performance of prognostic models with respect to state-of-the-art methods, enabling the construction of exhaustive patient representations and more accurate analyses (p-values < 0.001 in training, p-values < 0.05 in testing). This work lays the groundwork for performing large-scale and reproducible analyses on multi-center data, which are urgently needed to translate imaging-based biomarkers into clinical practice as effective prognostic tools. The code is available on GitHub at https://github.com/LaraCavinato/Dual-ADAE.
Affiliation(s)
- Lara Cavinato
- MOX, Department of Mathematics, Politecnico di Milano, Piazza Leonardo da Vinci, 32, Milan, 20133, Italy.
| | - Michela Carlotta Massi
- Health Data Science Centre, Human Technopole, Viale Rita Levi-Montalcini, 1, Milan, 20157, Italy
| | - Martina Sollini
- Department of Biomedical Sciences, Humanitas University, Via Rita Levi Montalcini, 4, Pieve Emanuele, 20090, Italy
- Department of Nuclear Medicine, IRCCS Humanitas Research Hospital, Via Alessandro Manzoni, 56, Rozzano, 20089, Italy
| | - Margarita Kirienko
- Fondazione IRCCS Istituto Nazionale dei Tumori, Via Giacomo Venezian, 1, Milan, 20133, Italy
| | - Francesca Ieva
- MOX, Department of Mathematics, Politecnico di Milano, Piazza Leonardo da Vinci, 32, Milan, 20133, Italy
- Health Data Science Centre, Human Technopole, Viale Rita Levi-Montalcini, 1, Milan, 20157, Italy
| |
|
7
|
Khan A, Inkster AM, Peñaherrera MS, King S, Kildea S, Oberlander TF, Olson DM, Vaillancourt C, Brain U, Beraldo EO, Beristain AG, Clifton VL, Del Gobbo GF, Lam WL, Metz GAS, Ng JWY, Price EM, Schuetz JM, Yuan V, Portales-Casamar É, Robinson WP. The application of epiphenotyping approaches to DNA methylation array studies of the human placenta. Epigenetics Chromatin 2023; 16:37. [PMID: 37794499 PMCID: PMC10548571 DOI: 10.1186/s13072-023-00507-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Accepted: 09/15/2023] [Indexed: 10/06/2023]
Abstract
BACKGROUND Genome-wide DNA methylation (DNAme) profiling of the placenta with Illumina Infinium Methylation bead arrays is often used to explore the connections between in utero exposures, placental pathology, and fetal development. However, many technical and biological factors can lead to signals of DNAme variation between samples and between cohorts, and understanding and accounting for these factors is essential to ensure meaningful and replicable data analysis. Recently, "epiphenotyping" approaches have been developed whereby DNAme data can be used to impute information about phenotypic variables such as gestational age, sex, cell composition, and ancestry. These epiphenotypes offer avenues to compare phenotypic data across cohorts, and to understand how phenotypic variables relate to DNAme variability. However, the relationships between placental epiphenotyping variables and other technical and biological variables, and their application to downstream epigenome analyses, have not been well studied. RESULTS Using DNAme data from 204 placentas across three cohorts, we applied the PlaNET R package to estimate epiphenotypes gestational age, ancestry, and cell composition in these samples. PlaNET ancestry estimates were highly correlated with independent polymorphic ancestry-informative markers, and epigenetic gestational age, on average, was estimated within 4 days of reported gestational age, underscoring the accuracy of these tools. Cell composition estimates varied both within and between cohorts, as well as over very long placental processing times. Interestingly, the ratio of cytotrophoblast to syncytiotrophoblast proportion decreased with increasing gestational age, and differed slightly by both maternal ethnicity (lower in white vs. non-white) and genetic ancestry (lower in higher probability European ancestry). 
The cohort of origin and cytotrophoblast proportion were the largest drivers of DNAme variation in this dataset, based on their associations with the first principal component. CONCLUSIONS This work confirms that cohort, array (technical) batch, cell type proportion, self-reported ethnicity, genetic ancestry, and biological sex are important variables to consider in any analyses of Illumina DNAme data. We further demonstrate the specific utility of epiphenotyping tools developed for use with placental DNAme data, and show that these variables (i) provide an independent check of clinically obtained data and (ii) provide a robust approach to compare variables across different datasets. Finally, we present a general framework for the processing and analysis of placental DNAme data, integrating the epiphenotype variables discussed here.
Affiliation(s)
- A Khan
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Department of Medical Biophysics, University of Toronto, Toronto, ON, M5G 1L7, Canada
- Princess Margaret Cancer Center, Toronto, ON, M5G 2C4, Canada
| | - A M Inkster
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - M S Peñaherrera
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - S King
- Department of Psychiatry, McGill University, Montreal, QC, H3A 1A1, Canada
- Psychosocial Research Division, Douglas Hospital Research Centre, Montreal, QC, H4H 1R3, Canada
| | - S Kildea
- Mater Research Institute, University of Queensland, Brisbane, QLD, 4101, Australia
- Molly Wardaguga Research Centre, Charles Darwin University, Brisbane, QLD, 4000, Australia
| | - T F Oberlander
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Department of Pediatrics, University of British Columbia, Vancouver, BC, V6H 3V4, Canada
| | - D M Olson
- Department of Obstetrics and Gynecology, University of Alberta, 220 HMRC, Edmonton, AB, T6G 2S2, Canada
| | - C Vaillancourt
- Centre Armand Frappier Santé Biotechnologie - INRS and University of Quebec Intersectorial Health Research Network, Laval, QC, H7V 1B7, Canada
| | - U Brain
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- School of Population and Public Health, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Department of Pediatrics, University of British Columbia, Vancouver, BC, V6H 3V4, Canada
| | - E O Beraldo
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - A G Beristain
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Obstetrics & Gynecology, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
| | - V L Clifton
- Mater Research Institute, University of Queensland, Brisbane, QLD, 4101, Australia
- Faculty of Medicine, The University of Queensland, Herston, QLD, 4006, Australia
| | - G F Del Gobbo
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, K1H 5B2, Canada
| | - W L Lam
- British Columbia Cancer Research Centre, Vancouver, BC, V5Z 1L3, Canada
| | - G A S Metz
- Canadian Centre for Behavioural Neuroscience, Department of Neuroscience, University of Lethbridge, Lethbridge, AB, T1K 3M4, Canada
| | - J W Y Ng
- Faculty of Medicine, University of Calgary, Calgary, AB, T2N 4N1, Canada
| | - E M Price
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
- Children's Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, K1H 5B2, Canada
| | - J M Schuetz
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - V Yuan
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada
| | - É Portales-Casamar
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada.
- Centre de Recherche du CHU Sainte-Justine, 3175 Côte-Sainte-Catherine Road, Montréal, QC, H3T 1C5, Canada.
| | - W P Robinson
- BC Children's Hospital Research Institute (BCCHR), 950 W 28th Ave, Vancouver, BC, V5Z 4H4, Canada.
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, V6H 3N1, Canada.
| |
|
8
|
Hu F, Chen AA, Horng H, Bashyam V, Davatzikos C, Alexander-Bloch A, Li M, Shou H, Satterthwaite TD, Yu M, Shinohara RT. Image harmonization: A review of statistical and deep learning methods for removing batch effects and evaluation metrics for effective harmonization. Neuroimage 2023; 274:120125. [PMID: 37084926 PMCID: PMC10257347 DOI: 10.1016/j.neuroimage.2023.120125] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 02/14/2023] [Revised: 04/12/2023] [Accepted: 04/19/2023] [Indexed: 04/23/2023]
Abstract
Magnetic resonance imaging and computed tomography from multiple batches (e.g. sites, scanners, datasets, etc.) are increasingly used alongside complex downstream analyses to obtain new insights into the human brain. However, significant confounding due to batch-related technical variation, called batch effects, is present in this data; direct application of downstream analyses to the data may lead to biased results. Image harmonization methods seek to remove these batch effects and enable increased generalizability and reproducibility of downstream results. In this review, we describe and categorize current approaches in statistical and deep learning harmonization methods. We also describe current evaluation metrics used to assess harmonization methods and provide a standardized framework to evaluate newly-proposed methods for effective harmonization and preservation of biological information. Finally, we provide recommendations to end-users to advocate for more effective use of current methods and to methodologists to direct future efforts and accelerate development of the field.
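To make the notion of a batch effect concrete, the following is a minimal sketch of per-batch location and scale adjustment. It is not any specific method from the review (it is not ComBat or a deep learning harmonizer), and the function name `center_scale_by_batch` and its arguments are hypothetical names chosen for illustration. It assumes features are stored as a samples-by-features array with one batch label per sample.

```python
import numpy as np

def center_scale_by_batch(features, batches):
    """Align each batch's per-feature mean and standard deviation to the pooled values.

    Illustrative only: unlike covariate-aware methods, this also removes
    biological differences that happen to be confounded with batch.
    """
    features = np.asarray(features, dtype=float)
    batches = np.asarray(batches)
    harmonized = np.empty_like(features)

    # Pooled reference location and scale, computed over all samples.
    pooled_mean = features.mean(axis=0)
    pooled_std = features.std(axis=0, ddof=1)

    for batch in np.unique(batches):
        mask = batches == batch
        batch_mean = features[mask].mean(axis=0)
        batch_std = features[mask].std(axis=0, ddof=1)
        # Guard against constant features within a batch.
        batch_std = np.where(batch_std > 0, batch_std, 1.0)
        # Standardize within the batch, then map onto the pooled scale.
        harmonized[mask] = (features[mask] - batch_mean) / batch_std * pooled_std + pooled_mean

    return harmonized
```

After this adjustment every batch shares the same per-feature mean and standard deviation, which is why such a naive approach should be used with care: when batch and biology are confounded, the biological signal is removed along with the technical one.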
Affiliation(s)
- Fengling Hu
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States.
| | - Andrew A Chen
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States
| | - Hannah Horng
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States
| | - Vishnu Bashyam
- Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
| | - Christos Davatzikos
- Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
| | - Aaron Alexander-Bloch
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, United States; Penn-CHOP Lifespan Brain Institute, United States; Department of Child and Adolescent Psychiatry and Behavioral Science, Children's Hospital of Philadelphia, United States
| | - Mingyao Li
- Statistical Center for Single-Cell and Spatial Genomics, Perelman School of Medicine, University of Pennsylvania, United States
| | - Haochang Shou
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States; Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
| | - Theodore D Satterthwaite
- Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, United States; Penn-CHOP Lifespan Brain Institute, United States; The Penn Lifespan Informatics and Neuroimaging Center, Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, United States
| | - Meichen Yu
- Indiana Alzheimer's Disease Research Center, Indiana University School of Medicine, United States
| | - Russell T Shinohara
- Penn Statistics in Imaging and Visualization Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104, United States; Center for Biomedical Image Computing and Analytics (CBICA), Perelman School of Medicine, United States
| |
|
9
|
Stokes T, Cen HH, Kapranov P, Gallagher IJ, Pitsillides AA, Volmar C, Kraus WE, Johnson JD, Phillips SM, Wahlestedt C, Timmons JA. Transcriptomics for Clinical and Experimental Biology Research: Hang on a Seq. Adv Genet (Hoboken) 2023; 4:2200024. [PMID: 37288167 PMCID: PMC10242409 DOI: 10.1002/ggn2.202200024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Indexed: 06/09/2023]
Abstract
Sequencing the human genome empowers translational medicine, facilitating transcriptome-wide molecular diagnosis, pathway biology, and drug repositioning. Initially, microarrays were used to study the bulk transcriptome, but now short-read RNA sequencing (RNA-seq) predominates. Although RNA-seq is positioned as a superior technology that makes the discovery of novel transcripts routine, most RNA-seq analyses are in fact modeled on the known transcriptome. Limitations of the RNA-seq methodology have emerged, while the design of, and the analysis strategies applied to, arrays have matured. An equitable comparison between these technologies is provided, highlighting advantages that modern arrays hold over RNA-seq. Array protocols more accurately quantify constitutively expressed protein-coding genes across tissue replicates, and are more reliable for studying lower expressed genes. Arrays reveal that long noncoding RNAs (lncRNAs) are neither sparsely expressed nor expressed at lower levels than protein-coding genes. Heterogeneous coverage of constitutively expressed genes observed with RNA-seq undermines the validity and reproducibility of pathway analyses. The factors driving these observations, many of which are relevant to long-read or single-cell sequencing, are discussed. As proposed herein, a reappreciation of bulk transcriptomic methods is required, including wider use of modern high-density array data, to urgently revise existing anatomical RNA reference atlases and assist with more accurate study of lncRNAs.
Affiliation(s)
- Tanner Stokes
- Faculty of Science, McMaster University, Hamilton, L8S 4L8, Canada
| | - Haoning Howard Cen
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
| | | | - Iain J Gallagher
- School of Applied Sciences, Edinburgh Napier University, Edinburgh, EH11 4BN, UK
| | | | | | | | - James D. Johnson
- Life Sciences Institute, University of British Columbia, Vancouver, V6T 1Z3, Canada
| | | | | | - James A. Timmons
- Miller School of Medicine, University of Miami, Miami, FL 33136, USA
- William Harvey Research Institute, Queen Mary University London, London, EC1M 6BQ, UK
- Augur Precision Medicine LTD, Stirling, FK9 5NF, UK
| |
|
10
|
Krieger N, Chen JT, Testa C, Diez Roux A, Tilling K, Watkins S, Simpkin AJ, Suderman M, Davey Smith G, De Vivo I, Waterman PD, Relton C. Use of Correct and Incorrect Methods of Accounting for Age in Studies of Epigenetic Accelerated Aging: Implications and Recommendations for Best Practices. Am J Epidemiol 2023; 192:800-811. [PMID: 36721372 PMCID: PMC10160768 DOI: 10.1093/aje/kwad025] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 03/31/2022] [Revised: 11/28/2022] [Accepted: 01/27/2023] [Indexed: 02/02/2023]
Abstract
Motivated by our conduct of a literature review on social exposures and accelerated aging as measured by a growing number of epigenetic "clocks" (which estimate age via DNA methylation (DNAm) patterns), we report on 3 different approaches in the epidemiologic literature (1 incorrect and 2 correct) to the treatment of age in these and other studies using other common exposures (i.e., body mass index and alcohol consumption). Among the 50 empirical articles reviewed, the majority (n = 29; 58%) used the incorrect method of analyzing accelerated aging detrended for age as the outcome and did not control for age as a covariate. By contrast, only 42% used correct methods, which are either to analyze accelerated aging detrended for age as the outcome and control for age as a covariate (n = 16; 32%) or to analyze raw DNAm age as the outcome and control for age as a covariate (n = 5; 10%). In accord with prior demonstrations of bias introduced by use of the incorrect approach, we provide simulation analyses and additional empirical analyses to illustrate how the incorrect method can lead to bias towards the null, and we discuss implications for extant research and recommendations for best practices.
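The three approaches described in the abstract can be illustrated with a small simulation. This is a sketch under assumed parameters, not the authors' simulation: the function name `compare_age_adjustment`, the sample size, the effect sizes, and the noise levels are all hypothetical choices made for illustration. The exposure is simulated to be correlated with chronological age, which is the condition under which the incorrect approach is biased.

```python
import numpy as np

def ols(y, *columns):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def compare_age_adjustment(n=20000, true_effect=1.0, seed=0):
    """Return exposure coefficients from the incorrect and the two correct analyses."""
    rng = np.random.default_rng(seed)
    age = rng.normal(50, 10, n)                    # chronological age (assumed distribution)
    exposure = 0.05 * age + rng.normal(0, 1, n)    # exposure correlated with age
    dnam_age = age + true_effect * exposure + rng.normal(0, 3, n)

    # "Accelerated aging detrended for age": residuals of DNAm age regressed on age.
    intercept, slope = ols(dnam_age, age)
    acceleration = dnam_age - (intercept + slope * age)

    incorrect = ols(acceleration, exposure)[1]           # age omitted as a covariate
    correct_residual = ols(acceleration, exposure, age)[1]  # detrended outcome, age included
    correct_raw = ols(dnam_age, exposure, age)[1]        # raw DNAm age, age included
    return incorrect, correct_residual, correct_raw

incorrect, correct_residual, correct_raw = compare_age_adjustment()
print(f"incorrect: {incorrect:.3f}")
print(f"correct (detrended outcome + age): {correct_residual:.3f}")
print(f"correct (raw DNAm age + age): {correct_raw:.3f}")
```

Under these assumed parameters the exposure has variance 1.25, of which 1 is independent of age, so the incorrect estimate is expected to be attenuated towards the null by a factor of about 1/1.25 = 0.8, while the two correct analyses give the same exposure coefficient as each other and recover the simulated effect up to sampling error.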
Affiliation(s)
- Nancy Krieger
- Correspondence to Dr. Nancy Krieger, Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Kresge 717, Boston, MA 02115
| | | | | | | | | | | | | | | | | | | | | | | |
|
11
|
Wang YW, Chen X, Yan CG. Comprehensive evaluation of harmonization on functional brain imaging for multisite data-fusion. Neuroimage 2023; 274:120089. [PMID: 37086875 DOI: 10.1016/j.neuroimage.2023.120089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 03/05/2023] [Accepted: 04/03/2023] [Indexed: 04/24/2023]
Abstract
To embrace big-data neuroimaging, harmonizing the site effect in resting-state functional magnetic resonance imaging (R-fMRI) data fusion is a fundamental challenge. A comprehensive evaluation of potentially effective harmonization strategies, particularly with specifically collected data, has been scarce, especially for R-fMRI metrics. Here, we comprehensively assess harmonization strategies from multiple perspectives, including tests on residual site effect, individual identification, test-retest reliability, and replicability of group-level statistical results, on widely used R-fMRI metrics across various datasets, including data obtained from participants with repetitive measures at different scanners. For individual identifiability (i.e., whether the same subject could be identified across R-fMRI data scanned across different sites), we found that, while most methods decreased site effects, the Subsampling Maximum-mean-distance based distribution shift correction Algorithm (SMA) and parametric unadjusted CovBat outperformed linear regression models, linear mixed models, ComBat series and invariant conditional variational auto-encoder in clustering accuracy. Test-retest reliability was better for SMA and parametric adjusted CovBat than unadjusted ComBat series and parametric unadjusted CovBat in the number of overlapped voxels. At the same time, SMA was superior to the latter in replicability in terms of the Dice coefficient and the scale of brain areas showing sex differences reproducibly observed across datasets. Furthermore, SMA better detected reproducible sex differences of ALFF under the site-sex confounded situation. Moreover, we designed experiments to identify the best target site features to optimize SMA identifiability, test-retest reliability, and stability. We noted both sample size and distribution of the target site matter and introduced a heuristic formula for selecting the target site. 
In addition to providing practical guidelines, this work can inform continuing improvements and innovations in harmonizing methodologies for big R-fMRI data.
Affiliation(s)
- Yu-Wei Wang
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China; International Big-Data Center for Depression Research, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiao Chen
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China; International Big-Data Center for Depression Research, Chinese Academy of Sciences, Beijing 100101, China; Magnetic Resonance Imaging Research Center, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China
| | - Chao-Gan Yan
- CAS Key Laboratory of Behavioral Science, Institute of Psychology, Beijing 100101, China; Department of Psychology, University of Chinese Academy of Sciences, Beijing 100049, China; International Big-Data Center for Depression Research, Chinese Academy of Sciences, Beijing 100101, China; Magnetic Resonance Imaging Research Center, Institute of Psychology, Chinese Academy of Sciences, Beijing 100101, China.
| |
|
12
|
Yosef A, Shnaider E, Schneider M, Gurevich M. Heuristic normalization procedure for batch effect correction. Soft Comput 2023. [DOI: 10.1007/s00500-023-08049-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
|
13
|
Silveira PP, Meaney MJ. Examining the biological mechanisms of human mental disorders resulting from gene-environment interdependence using novel functional genomic approaches. Neurobiol Dis 2023; 178:106008. [PMID: 36690304 DOI: 10.1016/j.nbd.2023.106008] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/03/2022] [Revised: 12/30/2022] [Accepted: 01/18/2023] [Indexed: 01/21/2023]
Abstract
We explore how functional genomics approaches that integrate datasets from human and non-human model systems can improve our understanding of the effect of gene-environment (G-E) interplay on the risk for mental disorders. We start by briefly defining the G-E paradigm and its challenges and then discuss the different levels of regulation of gene expression and the corresponding data existing in humans (genome-wide genotyping, transcriptomics, DNA methylation, chromatin modifications, chromosome conformational changes, non-coding RNAs, proteomics and metabolomics), discussing novel approaches to the application of these data in the study of the origins of mental health. Finally, we discuss the multilevel integration of diverse types of data. Advances in the use of functional genomics in the context of a G-E perspective improve the detection of vulnerabilities, informing the development of preventive and therapeutic interventions.
Affiliation(s)
- Patrícia Pelufo Silveira
- Department of Psychiatry, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada; Ludmer Centre for Neuroinformatics and Mental Health, Douglas Mental Health University Institute, McGill University, Montreal, QC, Canada.
| | - Michael J Meaney
- Department of Psychiatry, Faculty of Medicine and Health Sciences, McGill University, Montreal, QC, Canada; Translational Neuroscience Program, Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (ASTAR), Singapore; Brain - Body Initiative, Agency for Science, Technology and Research (ASTAR), Singapore.
| |
|
14
|
Fernández-Carrión R, Sorlí JV, Asensio EM, Pascual EC, Portolés O, Alvarez-Sala A, Francès F, Ramírez-Sabio JB, Pérez-Fidalgo A, Villamil LV, Tinahones FJ, Estruch R, Ordovas JM, Coltell O, Corella D. DNA-Methylation Signatures of Tobacco Smoking in a High Cardiovascular Risk Population: Modulation by the Mediterranean Diet. Int J Environ Res Public Health 2023; 20:3635. [PMID: 36834337 PMCID: PMC9964856 DOI: 10.3390/ijerph20043635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2022] [Revised: 02/15/2023] [Accepted: 02/16/2023] [Indexed: 06/18/2023]
Abstract
Biomarkers based on DNA methylation are relevant in the field of environmental health for precision health. Although tobacco smoking is one of the factors with a strong and consistent impact on DNA methylation, there are very few studies analyzing its methylation signature in southern European populations and none examining its modulation by the Mediterranean diet at the epigenome-wide level. We examined blood methylation smoking signatures on the EPIC 850K array in this population (n = 414 high cardiovascular risk subjects). Epigenome-wide methylation studies (EWASs) were performed analyzing differentially methylated CpG sites by smoking status (never, former, and current smokers), and the modulation by adherence to a Mediterranean diet score was explored. Gene-set enrichment analysis was performed for biological and functional interpretation. The predictive value of the top differentially methylated CpGs was analyzed using receiver operating characteristic curves. We characterized the DNA methylation signature of smoking in this Mediterranean population by identifying 46 differentially methylated CpGs at the EWAS level in the whole population. The strongest association was observed at cg21566642 (p = 2.2 × 10⁻³²) in the 2q37.1 region. We also detected other CpGs that have been consistently reported in prior research and discovered some novel differentially methylated CpG sites in subgroup analyses. In addition, we found distinct methylation profiles based on adherence to the Mediterranean diet. In particular, we obtained a significant interaction between smoking and diet modulating cg05575921 methylation in the AHRR gene. In conclusion, we have characterized biomarkers of the methylation signature of tobacco smoking in this population, and suggest that the Mediterranean diet can increase methylation of certain hypomethylated sites.
Affiliation(s)
- Rebeca Fernández-Carrión
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
| | - José V. Sorlí
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
| | - Eva M. Asensio
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
| | - Eva C. Pascual
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
| | - Olga Portolés
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
| | - Andrea Alvarez-Sala
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
| | - Francesc Francès
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
| | | | - Alejandro Pérez-Fidalgo
- Department of Medical Oncology, University Clinic Hospital of Valencia, 46010 Valencia, Spain
- Biomedical Research Networking Centre on Cancer (CIBERONC), Health Institute Carlos III, 28029 Madrid, Spain
- INCLIVA Biomedical Research Institute, 46010 Valencia, Spain
| | - Laura V. Villamil
- Department of Physiology, School of Medicine, University Antonio Nariño, Bogotá 111511, Colombia
| | - Francisco J. Tinahones
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
- Department of Endocrinology and Nutrition, Virgen de la Victoria University Hospital, Instituto de Investigación Biomédica de Málaga (IBIMA), University of Málaga, 29590 Málaga, Spain
| | - Ramon Estruch
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
- Department of Internal Medicine, Institut d’Investigacions Biomèdiques August Pi Sunyer (IDIBAPS), Hospital Clinic, University of Barcelona, 08036 Barcelona, Spain
| | - Jose M. Ordovas
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
- Nutrition and Genomics Laboratory, JM-USDA Human Nutrition Research Center on Aging, Tufts University, Boston, MA 02111, USA
- Nutritional Control of the Epigenome Group, Precision Nutrition and Obesity Program, IMDEA Food, UAM + CSIC, 28049 Madrid, Spain
| | - Oscar Coltell
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
- Department of Computer Languages and Systems, Universitat Jaume I, 12071 Castellón, Spain
| | - Dolores Corella
- Department of Preventive Medicine and Public Health, School of Medicine, University of Valencia, 46010 Valencia, Spain
- CIBER Fisiopatología de la Obesidad y Nutrición, Instituto de Salud Carlos III, 28029 Madrid, Spain
| |
|
15
|
D’Rozario R, Raychaudhuri D, Bandopadhyay P, Sarif J, Mehta P, Liu CSC, Sinha BP, Roy J, Bhaduri R, Das M, Bandyopadhyay S, Paul SR, Chatterjee S, Pandey R, Ray Y, Ganguly D. Circulating Interleukin-8 Dynamics Parallels Disease Course and Is Linked to Clinical Outcomes in Severe COVID-19. Viruses 2023; 15:549. [PMID: 36851762 PMCID: PMC9960352 DOI: 10.3390/v15020549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 02/06/2023] [Accepted: 02/08/2023] [Indexed: 02/19/2023]
Abstract
Severe COVID-19 frequently features a systemic deluge of cytokines. Circulating cytokines that can stratify risks are useful for more effective triage and management. Here, we ran a machine-learning algorithm on a dataset of 36 plasma cytokines in a cohort of patients with severe COVID-19 to identify cytokines useful for describing the dynamic clinical state in multiple regression analysis. We performed RNA-sequencing of circulating blood cells collected at different time-points. From a Bayesian Information Criterion analysis, a combination of interleukin-8 (IL-8), Eotaxin, and Interferon-γ (IFNγ) was found to be significantly linked to blood oxygenation over seven days. Individually testing the cytokines in receiver operating characteristic analyses identified IL-8 as a strong stratifier for clinical outcomes. Circulating IL-8 dynamics paralleled disease course. We also revealed key transitions in the immune transcriptome in patients stratified for circulating IL-8 at three time-points. The study identifies plasma IL-8 as a key pathogenic cytokine linking systemic hyper-inflammation to the clinical outcomes in COVID-19.
Affiliation(s)
- Ranit D’Rozario
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Deblina Raychaudhuri
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Purbita Bandopadhyay
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Jafar Sarif
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Priyanka Mehta
- INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology, New Delhi 110007, India
- Chinky Shiu Chen Liu
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Bishnu Prasad Sinha
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Jayasree Roy
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Monidipa Das
- Indian Statistical Institute, Kolkata 700108, India
- Shilpak Chatterjee
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Rajesh Pandey
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- INtegrative GENomics of HOst-PathogEn (INGEN-HOPE) Laboratory, CSIR-Institute of Genomics and Integrative Biology, New Delhi 110007, India
- Yogiraj Ray
- Department of Medicine, ID & BG Hospital, Kolkata 700010, India
- Department of Infectious Diseases, Institute of Postgraduate Medical Education and Research, Kolkata 700020, India
- Dipyaman Ganguly
- IICB-Translational Research Unit of Excellence, CSIR-Indian Institute of Chemical Biology, Kolkata 700091, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad 201002, India
- Correspondence:
16
A new blood based epigenetic age predictor for adolescents and young adults. Sci Rep 2023; 13:2303. [PMID: 36759656 PMCID: PMC9911637 DOI: 10.1038/s41598-023-29381-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/28/2022] [Accepted: 02/03/2023] [Indexed: 02/11/2023]
Abstract
Children have special rights for protection compared to adults in our society. However, more than 1/4 of children globally have no documentation of their date of birth. Hence, there is a pressing need to develop biological methods for chronological age prediction, robust to differences in genetics, psychosocial events and physical living conditions. At present, DNA methylation is the most promising biomarker applied for age assessment. The human genome contains around 28 million DNA methylation sites, many of which change with age. Several epigenetic clocks accurately predict chronological age using methylation levels at age-associated CpG sites. However, variation in DNA methylation increases with age, and there is no epigenetic clock specifically designed for adolescents and young adults. Here we present a novel age Predictor for Adolescents and Young Adults (PAYA), using 267 CpG methylation sites to assess the chronological age of adolescents and young adults. We compared different preprocessing approaches and investigated the effect on prediction performance of the epigenetic clock. We evaluated performance using an independent validation data set consisting of 18-year-old individuals, where we obtained a median absolute deviation of just below 0.7 years. This tool may be helpful in age assessment of adolescents and young adults. However, there is a need to investigate the robustness of the age predictor across geographical and disease populations as well as environmental effects.
17
Louise J, Deussen AR, Dodd JM. Data processing choices can affect findings in differential methylation analyses: an investigation using data from the LIMIT RCT. PeerJ 2023; 11:e14786. [PMID: 36755865 PMCID: PMC9901304 DOI: 10.7717/peerj.14786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 01/03/2023] [Indexed: 02/05/2023]
Abstract
Objective A wide array of methods exist for processing and analysing DNA methylation data. We aimed to perform a systematic comparison of the behaviour of these methods, using cord blood DNAm from the LIMIT RCT, in relation to detecting hypothesised effects of interest (intervention and pre-pregnancy maternal BMI) as well as effects known to be spurious, and known to be present. Methods DNAm data, from 645 cord blood samples analysed using Illumina 450K BeadChip arrays, were normalised using three different methods (with probe filtering undertaken pre- or post- normalisation). Batch effects were handled with a supervised algorithm, an unsupervised algorithm, or adjustment in the analysis model. Analysis was undertaken with and without adjustment for estimated cell type proportions. The effects estimated included intervention and BMI (effects of interest in the original study), infant sex and randomly assigned groups. Data processing and analysis methods were compared in relation to number and identity of differentially methylated probes, rankings of probes by p value and log-fold-change, and distributions of p values and log-fold-change estimates. Results There were differences corresponding to each of the processing and analysis choices. Importantly, some combinations of data processing choices resulted in a substantial number of spurious 'significant' findings. We recommend greater emphasis on replication and greater use of sensitivity analyses.
Affiliation(s)
- Jennie Louise
- Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia; Adelaide Health Technology Assessment, The University of Adelaide, Adelaide, Australia
- Andrea R. Deussen
- Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
- Jodie M. Dodd
- Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia; Department of Perinatal Medicine, Women’s and Babies Division, The Women’s and Children’s Hospital, Adelaide, South Australia, Australia
18
Inkster AM, Wong MT, Matthews AM, Brown CJ, Robinson WP. Who's afraid of the X? Incorporating the X and Y chromosomes into the analysis of DNA methylation array data. Epigenetics Chromatin 2023; 16:1. [PMID: 36609459 PMCID: PMC9825011 DOI: 10.1186/s13072-022-00477-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/02/2022] [Accepted: 12/27/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND Many human disease phenotypes manifest differently by sex, making the development of methods for incorporating X and Y-chromosome data into analyses vital. Unfortunately, X and Y chromosome data are frequently excluded from large-scale analyses of the human genome and epigenome due to analytical complexity associated with sex chromosome dosage differences between XX and XY individuals, and the impact of X-chromosome inactivation (XCI) on the epigenome. As such, little attention has been given to considering the methods by which sex chromosome data may be included in analyses of DNA methylation (DNAme) array data. RESULTS With Illumina Infinium HumanMethylation450 DNAme array data from 634 placental samples, we investigated the effects of probe filtering, normalization, and batch correction on DNAme data from the X and Y chromosomes. Processing steps were evaluated in both mixed-sex and sex-stratified subsets of the analysis cohort to identify whether including both sexes impacted processing results. We found that identification of probes that have a high detection p-value, or that are non-variable, should be performed in sex-stratified data subsets to avoid over- and under-estimation of the quantity of probes eligible for removal, respectively. All normalization techniques investigated returned X and Y DNAme data that were highly correlated with the raw data from the same samples. We found no difference in batch correction results after application to mixed-sex or sex-stratified cohorts. Additionally, we identify two analytical methods suitable for XY chromosome data, the choice between which should be guided by the research question of interest, and we performed a proof-of-concept analysis studying differential DNAme on the X and Y chromosome in the context of placental acute chorioamnionitis. 
Finally, we provide an annotation of probe types that may be desirable to filter in X and Y chromosome analyses, including probes in repetitive elements, the X-transposed region, and cancer-testis gene promoters. CONCLUSION While there may be no single "best" approach for analyzing DNAme array data from the X and Y chromosome, analysts must consider key factors during processing and analysis of sex chromosome data to accommodate the underlying biology of these chromosomes, and the technical limitations of DNA methylation arrays.
Affiliation(s)
- Amy M Inkster
- BC Children's Hospital Research Institute, 950 W 28th Ave, Vancouver, BC, V6H 3N1, Canada.
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada.
- Martin T Wong
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada
- Allison M Matthews
- BC Children's Hospital Research Institute, 950 W 28th Ave, Vancouver, BC, V6H 3N1, Canada
- Department of Pathology & Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, V6T 1Z7, Canada
- Carolyn J Brown
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada
- Wendy P Robinson
- BC Children's Hospital Research Institute, 950 W 28th Ave, Vancouver, BC, V6H 3N1, Canada
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada
19
Yosef A, Shnaider E, Schneider M, Gurevich M. Normalization of Large-Scale Transcriptome Data Using Heuristic Methods. Bioinform Biol Insights 2023; 17:11779322231160397. [PMID: 37020503 PMCID: PMC10068970 DOI: 10.1177/11779322231160397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 02/09/2023] [Indexed: 04/03/2023]
Abstract
In this study, we introduce an artificial intelligence method for addressing the batch effect in transcriptome data. The method has several clear advantages in comparison with the alternative methods presently in use. Batch effect refers to the discrepancy in gene expression data series measured under different conditions. While the data from the same batch (measurements performed under the same conditions) are compatible, combining various batches into 1 data set is problematic because of incompatible measurements. Therefore, it is necessary to correct the combined data (normalization) before performing biological analysis. There are numerous methods attempting to correct a data set for batch effect. These methods rely on various assumptions regarding the distribution of the measurements. Forcing the data elements into a pre-supposed distribution can severely distort biological signals, thus leading to incorrect results and conclusions. The wider the discrepancy between the assumptions regarding the data distribution and the actual distribution, the greater the biases introduced by such “correction methods”. We introduce a heuristic method to reduce batch effect. The method does not rely on any assumptions regarding the distribution and the behavior of data elements. Hence, it does not introduce any new biases in the process of correcting the batch effect. It strictly maintains the integrity of measurements within the original batches.
20
Environmental neuroscience linking exposome to brain structure and function underlying cognition and behavior. Mol Psychiatry 2023; 28:17-27. [PMID: 35790874 DOI: 10.1038/s41380-022-01669-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/07/2021] [Revised: 06/02/2022] [Accepted: 06/09/2022] [Indexed: 01/07/2023]
Abstract
Individual differences in human brain structure, function, and behavior can be attributed to genetic variations, environmental exposures, and their interactions. Although genome-wide association studies have identified many genetic variants associated with brain imaging phenotypes, environmental exposures associated with these phenotypes remain largely unknown. Here, we propose that environmental neuroscience should pay more attention to exploring the associations between lifetime environmental exposures (exposome) and brain imaging phenotypes and identifying both cumulative environmental effects and their vulnerable age windows during the life course. Exposome-neuroimaging association studies face several challenges, including the accurate measurement of the totality of environmental exposures that vary in space and time, the highly correlated structure of the exposome, and the lack of standardized approaches for exposome-wide association studies. By agnostically scanning the effects of environmental exposures on brain imaging phenotypes and their interactions with genomic variations, exposome-neuroimaging association analyses will improve our understanding of causal factors associated with individual differences in brain structure and function as well as their relations with cognitive abilities and neuropsychiatric disorders.
21
Hattaway ME, Black GP, Young TM. Batch correction methods for nontarget chemical analysis data: application to a municipal wastewater collection system. Anal Bioanal Chem 2023; 415:1321-1331. [PMID: 36627378 PMCID: PMC9928919 DOI: 10.1007/s00216-023-04511-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 12/08/2022] [Accepted: 01/02/2023] [Indexed: 01/12/2023]
Abstract
Nontarget chemical analysis using high-resolution mass spectrometry has increasingly been used to discern spatial patterns and temporal trends in anthropogenic chemical abundance in natural and engineered systems. A critical experimental design consideration in such applications, especially those monitoring complex matrices over long time periods, is a choice between analyzing samples in multiple batches as they are collected, or in one batch after all samples have been processed. While datasets acquired in multiple analytical batches can include the effects of instrumental variability over time, datasets acquired in a single batch risk compound degradation during sample storage. To assess the influence of batch effects on the analysis and interpretation of nontarget data, this study examined a set of 56 samples collected from a municipal wastewater system over 7 months. Each month's samples included 6 from sites within the collection system, one combined influent, and one treated effluent sample. Samples were analyzed using liquid chromatography high-resolution mass spectrometry in positive electrospray ionization mode in multiple batches as the samples were collected and in a single batch at the conclusion of the study. Data were aligned and normalized using internal standard scaling and ComBat, an empirical Bayes method developed for estimating and removing batch effects in microarrays. As judged by multiple lines of evidence, including comparing principal variance component analysis between single and multi-batch datasets and through patterns in principal components and hierarchical clustering analyses, ComBat appeared to significantly reduce the influence of batch effects. For this reason, we recommend the use of more, small batches with an appropriate batch correction step rather than acquisition in one large batch.
Affiliation(s)
- Madison E. Hattaway
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA 95616 USA
- Gabrielle P. Black
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA 95616 USA
- Thomas M. Young
- Department of Civil and Environmental Engineering, University of California, Davis, Davis, CA 95616 USA
22
Kalyakulina A, Yusipov I, Bacalini MG, Franceschi C, Vedunova M, Ivanchenko M. Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI. Gigascience 2022; 11:giac097. [PMID: 36259657 PMCID: PMC9718659 DOI: 10.1093/gigascience/giac097] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Revised: 08/01/2022] [Accepted: 09/15/2022] [Indexed: 07/25/2023]
Abstract
BACKGROUND DNA methylation has a significant effect on gene expression and can be associated with various diseases. Meta-analysis of available DNA methylation datasets requires development of a specific workflow for joint data processing. RESULTS We propose a comprehensive approach for combining DNA methylation datasets to classify controls and patients. The solution includes data harmonization, construction of machine learning classification models, dimensionality reduction of models, imputation of missing values, and explanation of model predictions by explainable artificial intelligence (XAI) algorithms. We show that harmonization can improve classification accuracy by up to 20% when preprocessing methods of the training and test datasets are different. The best accuracy results were obtained with tree ensembles, reaching above 95% for Parkinson's disease. Dimensionality reduction can substantially decrease the number of features without detriment to the classification accuracy. The best imputation methods achieve almost the same classification accuracy for data with missing values as for the original data. XAI approaches have allowed us to explain model predictions from both populational and individual perspectives. CONCLUSIONS We propose a methodologically valid and comprehensive approach to the classification of healthy individuals and patients with various diseases based on whole-blood DNA methylation data, using Parkinson's disease and schizophrenia as examples. The proposed algorithm works better for the former pathology, characterized by a complex set of symptoms. It makes it possible to solve data harmonization problems for meta-analysis of many different datasets, impute missing values, and build classification models of small dimensionality.
Affiliation(s)
- Alena Kalyakulina
- Correspondence author. Alena Kalyakulina, Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Gagarin avenue 22, Nizhny Novgorod 603022, Russia. E-mail:
- Claudio Franceschi
- Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
- Maria Vedunova
- Institute of Biology and Biomedicine, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
- Mikhail Ivanchenko
- Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
23
Adamer MF, Brüningk SC, Tejada-Arranz A, Estermann F, Basler M, Borgwardt K. reComBat: batch-effect removal in large-scale multi-source gene-expression data integration. Bioinformatics Advances 2022; 2:vbac071. [PMID: 36699372 PMCID: PMC9710604 DOI: 10.1093/bioadv/vbac071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/01/2022] [Accepted: 09/26/2022] [Indexed: 01/28/2023]
Abstract
Motivation With the steadily increasing abundance of omics data produced all over the world under vastly different experimental conditions residing in public databases, a crucial step in many data-driven bioinformatics applications is that of data integration. The challenge of batch-effect removal for entire databases lies in the large number of batches and biological variation, which can result in design matrix singularity. This problem cannot currently be solved satisfactorily by any common batch-correction algorithm. Results We present reComBat, a regularized version of the empirical Bayes method, to overcome this limitation and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study. Availability and implementation The code is available at https://github.com/BorgwardtLab/reComBat; all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Marek Basler
- Biozentrum, University of Basel, Basel 4056, Switzerland
- Karsten Borgwardt
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland; Swiss Institute for Bioinformatics (SIB), Lausanne 1015, Switzerland
24
Zhou Q, Zhu X, Li Y, Yang P, Wang S, Ning K, Chen S. Intestinal microbiome-mediated resistance against vibriosis for Cynoglossus semilaevis. Microbiome 2022; 10:153. [PMID: 36138436 PMCID: PMC9503257 DOI: 10.1186/s40168-022-01346-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 05/01/2022] [Accepted: 08/10/2022] [Indexed: 06/06/2023]
Abstract
BACKGROUND Infectious diseases have caused huge economic loss and food security issues in fish aquaculture. Current management and breeding strategies heavily rely on the knowledge of regulative mechanisms underlying disease resistance. Though the intestinal microbial community was linked with disease infection, there is little knowledge about the roles of intestinal microbes in fish disease resistance. Cynoglossus semilaevis is an economically important and widely cultivated flatfish species in China. However, it suffers from outbreaks of vibriosis, which results in huge mortalities and economic loss. RESULTS Here, we used C. semilaevis as a research model to investigate the host-microbiome interactions in regulating vibriosis resistance. The resistance to vibriosis was reflected in intestinal microbiome on both taxonomic and functional levels. Such differences also influenced the host gene expressions in the resistant family. Moreover, the intestinal microbiome might control the host immunological homeostasis and inflammation to enhance vibriosis resistance through the microbe-intestine-immunity axis. For example, Phaeobacter regulated its hdhA gene and host cyp27a1 gene up-expressed in bile acid biosynthesis pathways, but regulated its trxA gene and host akt gene down-expressed in proinflammatory cytokines biosynthesis pathways, to reduce inflammation and resist disease infection in the resistant family. Furthermore, the combination of intestinal microbes and host genes as biomarkers could accurately differentiate resistant family from susceptible family. CONCLUSION Our study uncovered the regulatory patterns of the microbe-intestine-immunity axis that may contribute to vibriosis resistance in C. semilaevis. These findings could facilitate the disease control and selective breeding of superior germplasm with high disease resistance in fish aquaculture.
Affiliation(s)
- Qian Zhou
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences/Key Laboratory for Sustainable Development of Marine Fisheries, Ministry of Agriculture; Shandong Key Laboratory for Marine Fishery Biotechnology and Genetic Breeding; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266071, Shandong, China
- Xue Zhu
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Yangzhen Li
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences/Key Laboratory for Sustainable Development of Marine Fisheries, Ministry of Agriculture; Shandong Key Laboratory for Marine Fishery Biotechnology and Genetic Breeding; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266071, Shandong, China
- Pengshuo Yang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China
- Shengpeng Wang
- Dezhou Key Laboratory for Applied Bile Acid Research, Shandong Longchang Animal Health Product Co., Ltd., Qihe, Shandong Lachance Co., Ltd., Jinan, 251100, Shandong, China
- Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, Hubei, China.
- Songlin Chen
- Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences/Key Laboratory for Sustainable Development of Marine Fisheries, Ministry of Agriculture; Shandong Key Laboratory for Marine Fishery Biotechnology and Genetic Breeding; Laboratory for Marine Fisheries Science and Food Production Processes, Pilot National Laboratory for Marine Science and Technology (Qingdao), Qingdao, 266071, Shandong, China.
25
Arencibia A, Salazar LA. Microarray meta-analysis reveals IL6 and p38β/MAPK11 as potential targets of hsa-miR-124 in endothelial progenitor cells: Implications for stent re-endothelization in diabetic patients. Front Cardiovasc Med 2022; 9:964721. [PMID: 36176980 PMCID: PMC9513120 DOI: 10.3389/fcvm.2022.964721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 08/25/2022] [Indexed: 11/13/2022]
Abstract
Circulating endothelial progenitor cells (EPCs) play an important role in the repair processes of damaged vessels, favoring re-endothelization of stented vessels to minimize restenosis. EPC number and function are diminished in patients with type 2 diabetes, a known risk factor for restenosis. Considering the impact of EPCs in vascular injury repair, we conducted a meta-analysis of microarray data to assess the transcriptomic profile and determine target genes during the differentiation process of EPCs into mature ECs. Five microarray datasets, including 13 EPC and 12 EC samples, were analyzed using the online tool ExpressAnalyst. Differentially expressed gene (DEG) analysis was done by the Limma method, with |log2FC| > 1 and FDR < 0.05. Combined p-value by Fisher exact method was computed for the intersection of datasets. There were 3,267 DEGs, 1,539 up-regulated and 1,728 down-regulated in EPCs, with 407 common DEGs in at least four datasets. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed enrichment for terms related to “AGE-RAGE signaling pathway in diabetic complications.” Intersection of common DEGs, KEGG pathway genes and genes in the protein-protein interaction (PPI) network identified four key genes, two up-regulated (IL1B and STAT5A) and two down-regulated (IL6 and MAPK11). MicroRNA enrichment analysis of common DEGs depicted five hub microRNAs targeting 175 DEGs, including STAT5A, IL6 and MAPK11, with hsa-miR-124 as common regulator. This group of genes and microRNAs could serve as biomarkers of EPC differentiation during coronary stenting as well as potential therapeutic targets to improve stent re-endothelization, especially in diabetic patients.
26
An L, Chen J, Chen P, Zhang C, He T, Chen C, Zhou JH, Yeo BTT. Goal-specific brain MRI harmonization. Neuroimage 2022; 263:119570. [PMID: 35987490 DOI: 10.1016/j.neuroimage.2022.119570] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/02/2022] [Revised: 08/05/2022] [Accepted: 08/15/2022] [Indexed: 11/19/2022]
Abstract
There is significant interest in pooling magnetic resonance imaging (MRI) data from multiple datasets to enable mega-analysis. Harmonization is typically performed to reduce heterogeneity when pooling MRI data across datasets. Most MRI harmonization algorithms do not explicitly consider downstream application performance during harmonization. However, the choice of downstream application might influence what might be considered as study-specific confounds. Therefore, ignoring downstream applications during harmonization might potentially limit downstream performance. Here we propose a goal-specific harmonization framework that utilizes downstream application performance to regularize the harmonization procedure. Our framework can be integrated with a wide variety of harmonization models based on deep neural networks, such as the recently proposed conditional variational autoencoder (cVAE) harmonization model. Three datasets from three different continents with a total of 2787 participants and 10,085 anatomical T1 scans were used for evaluation. We found that cVAE removed more dataset differences than the widely used ComBat model, but at the expense of removing desirable biological information as measured by downstream prediction of mini mental state examination (MMSE) scores and clinical diagnoses. On the other hand, our goal-specific cVAE (gcVAE) was able to remove as many dataset differences as cVAE, while improving downstream cross-sectional prediction of MMSE scores and clinical diagnoses.
Affiliation(s)
- Lijun An
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Jianzhong Chen
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Pansheng Chen
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Chen Zhang
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Tong He
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore
- Christopher Chen
- Department of Pharmacology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Juan Helen Zhou
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore
- B T Thomas Yeo
- Centre for Sleep and Cognition (CSC) & Centre for Translational Magnetic Resonance Research (TMR), Yong Loo Lin School of Medicine, National University of Singapore, Singapore; Department of Electrical and Computer Engineering, National University of Singapore, Singapore; N.1 Institute for Health and Institute for Digital Medicine (WisDM), National University of Singapore, Singapore; NUS Graduate School for Integrative Sciences and Engineering, National University of Singapore, Singapore; Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, MA, USA.
27
Ross JP, van Dijk S, Phang M, Skilton MR, Molloy PL, Oytam Y. Batch-effect detection, correction and characterisation in Illumina HumanMethylation450 and MethylationEPIC BeadChip array data. Clin Epigenetics 2022; 14:58. [PMID: 35488315 PMCID: PMC9055778 DOI: 10.1186/s13148-022-01277-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 04/10/2022] [Indexed: 11/20/2022]
Abstract
Background: Genomic technologies can be subject to significant batch-effects, which are known to reduce experimental power and to potentially create false positive results. The Illumina Infinium Methylation BeadChip is a popular technology choice for epigenome-wide association studies (EWAS), but presently, little is known about the nature of batch-effects on these designs. Given the subtlety of biological phenotypes in many EWAS, control for batch-effects should be a consideration.
Results: Using the batch-effect removal approaches in the ComBat and Harman software, we examined two in-house datasets and compared results with three large publicly available datasets (1214 HumanMethylation450 and 1094 MethylationEPIC BeadChips in total), and found that despite various forms of preprocessing, some batch-effects persist. This residual batch-effect is associated with the day of processing, the individual glass slide and the position of the array on the slide. Consistently across all datasets, 4649 probes required high amounts of correction. To understand the impact of this set on EWAS, we explored the literature and found three instances where persistently batch-effect prone probes have been reported in abstracts as key sites of differential methylation. As well as batch-effect susceptible probes, we also discovered a set of probes which are erroneously corrected. We provide batch-effect workflows for Infinium Methylation data and provide reference matrices of batch-effect prone and erroneously corrected features across the five datasets spanning regionally diverse populations and three commonly collected biosamples (blood, buccal and saliva).
Conclusions: Batch-effects are ever present, even in high-quality data, and a strategy to deal with them should be part of experimental design, particularly for EWAS. Batch-effect removal tools are useful to reduce technical variance in Infinium Methylation data, but they need to be applied with care and make use of post hoc diagnostic measures.
Supplementary Information: The online version contains supplementary material available at 10.1186/s13148-022-01277-9.
Affiliation(s)
- Jason P Ross
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia.
- Susan van Dijk
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia
- Melinda Phang
- Charles Perkins Centre, The University of Sydney, Sydney, Australia
- Michael R Skilton
- Charles Perkins Centre, The University of Sydney, Sydney, Australia; Sydney Medical School, The University of Sydney, Sydney, Australia; Sydney Institute for Women, Children and Their Families, Sydney Local Health District, Sydney, Australia
- Peter L Molloy
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia
- Yalchin Oytam
- Clinical Insights and Analytics Unit, South Eastern Sydney Local Health District, Sydney, Australia
28
Raffington L, Belsky DW. Integrating DNA Methylation Measures of Biological Aging into Social Determinants of Health Research. Curr Environ Health Rep 2022; 9:196-210. [PMID: 35181865 DOI: 10.1007/s40572-022-00338-8] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Accepted: 01/10/2022] [Indexed: 12/13/2022]
Abstract
PURPOSE OF REVIEW: Acceleration of biological processes of aging is hypothesized to drive excess morbidity and mortality in socially disadvantaged populations. DNA methylation measures of biological aging provide tools for testing this hypothesis.
RECENT FINDINGS: Next-generation DNA methylation measures of biological aging developed to predict mortality risk and physiological decline are more predictive of morbidity and mortality than the original epigenetic clocks developed to predict chronological age. These new measures show consistent evidence of more advanced and faster biological aging in people exposed to socioeconomic disadvantage and may be able to record the emergence of socially determined health inequalities as early as childhood. Next-generation DNA methylation measures of biological aging also indicate race/ethnic disparities in biological aging. More research is needed on these measures in samples of non-Western and non-White populations. New DNA methylation measures of biological aging open opportunities for refining inference about the causes of social disparities in health and devising policies to eliminate them. Further refining measures of biological aging by including more diversity in samples used for measurement development is a critical priority for the field.
Affiliation(s)
- Laurel Raffington
- Department of Psychology, University of Texas at Austin, Austin, TX, USA
- Population Research Center, The University of Texas at Austin, Austin, TX, USA
- Daniel W Belsky
- Department of Epidemiology, Columbia University Mailman School of Public Health, 722 W 168th St. Rm 413, New York, NY, 10032, USA.
- Robert N Butler Columbia Aging Center, Columbia University Mailman School of Public Health, New York, NY, USA.
29
Vandenbon A. Evaluation of critical data processing steps for reliable prediction of gene co-expression from large collections of RNA-seq data. PLoS One 2022; 17:e0263344. [PMID: 35089979 PMCID: PMC8797241 DOI: 10.1371/journal.pone.0263344] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/22/2021] [Accepted: 01/16/2022] [Indexed: 11/19/2022]
Abstract
Motivation: Gene co-expression analysis is an attractive tool for leveraging the enormous number of public RNA-seq datasets for the prediction of gene functions and regulatory mechanisms. However, the optimal data processing steps for the accurate prediction of gene co-expression from such large datasets remain unclear. In particular, the importance of batch effect correction is understudied.
Results: We processed RNA-seq data of 68 human and 76 mouse cell types and tissues using 50 different workflows into 7,200 genome-wide gene co-expression networks. We then conducted a systematic analysis of the factors that result in high-quality co-expression predictions, focusing on normalization, batch effect correction, and measure of correlation. We confirmed the key importance of high sample counts for high-quality predictions. However, choosing a suitable normalization approach and applying batch effect correction can further improve the quality of co-expression estimates, equivalent to a >80% and >40% increase in samples. In larger datasets, batch effect removal was equivalent to more than doubling the sample size. Finally, Pearson correlation appears more suitable than Spearman correlation, except for smaller datasets.
Conclusion: A key point for accurate prediction of gene co-expression is the collection of many samples. However, paying attention to data normalization, batch effects, and the measure of correlation can significantly improve the quality of co-expression estimates.
Affiliation(s)
- Alexis Vandenbon
- Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Institute for Liberal Arts and Sciences, Kyoto University, Kyoto, Japan
30
Halder A, Verma A, Biswas D, Srivastava S. Recent advances in mass-spectrometry based proteomics software, tools and databases. Drug Discovery Today. Technologies 2021; 39:69-79. [PMID: 34906327 DOI: 10.1016/j.ddtec.2021.06.007] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/10/2021] [Revised: 05/08/2021] [Accepted: 06/21/2021] [Indexed: 01/12/2023]
Abstract
The field of proteomics depends heavily on data generation and data analysis, both of which are supported by software and databases. Mass spectrometry-based proteomics has advanced massively over the last 10 years, compelling the scientific community to upgrade or develop algorithms, tools, and repository databases. Several standalone software packages and comprehensive databases have aided the establishment of integrated omics pipelines and meta-analysis workflows, which have contributed to understanding disease pathobiology, discovering biomarkers, and predicting new therapeutic modalities. For shotgun proteomics, where Data Dependent Acquisition is performed, several user-friendly software packages have been developed that can analyse the pre-processed data to provide mechanistic insights into the disease. Likewise, for Data Independent Acquisition, pipelines have emerged that can accomplish every task from building the spectral library to identifying therapeutic targets. Furthermore, in the age of big data analysis, machine learning and cloud computing are adding robustness, speed, and depth to proteomics data analysis. This review discusses recent advances in, and the development of, software, tools, and databases in the field of mass spectrometry-based proteomics.
Affiliation(s)
- Ankit Halder
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Ayushi Verma
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Deeptarup Biswas
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
- Sanjeeva Srivastava
- Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India.
31
Xia Q, Thompson JA, Koestler DC. Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE). Stat Appl Genet Mol Biol 2021; 20:101-119. [PMID: 34905304 PMCID: PMC9617207 DOI: 10.1515/sagmb-2021-0020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/26/2021] [Accepted: 10/29/2021] [Indexed: 11/15/2022]
Abstract
Batch-effects present challenges in the analysis of high-throughput molecular data and are particularly problematic in longitudinal studies when interest lies in identifying genes/features whose expression changes over time, but time is confounded with batch. While many methods to correct for batch-effects exist, most assume independence across samples, an assumption that is unlikely to hold in longitudinal microarray studies. We propose Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE), a three-step parametric empirical Bayes approach that leverages technical replicate samples profiled at multiple timepoints/batches, so-called "bridge samples", to inform batch-effect reduction/attenuation in longitudinal microarray studies. Extensive simulation studies and an analysis of a real biological data set were conducted to benchmark the performance of BRIDGE against both ComBat and longitudinal ComBat. Our results demonstrate that while all methods perform well in facilitating accurate estimates of time effects, BRIDGE outperforms both ComBat and longitudinal ComBat in the removal of batch-effects in data sets with bridging samples, and perhaps as a result, was observed to have improved statistical power for detecting genes with a time effect. BRIDGE demonstrated competitive performance in batch effect reduction of confounded longitudinal microarray studies, in both simulated and real data sets, and may serve as a useful preprocessing method for researchers conducting longitudinal microarray studies that include bridging samples.
Affiliation(s)
- Qing Xia
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160
- Jeffrey A. Thompson
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160
- Devin C. Koestler
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160
32
Chen AA, Beer JC, Tustison NJ, Cook PA, Shinohara RT, Shou H. Mitigating site effects in covariance for machine learning in neuroimaging data. Hum Brain Mapp 2021; 43:1179-1195. [PMID: 34904312 PMCID: PMC8837590 DOI: 10.1002/hbm.25688] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 04/09/2021] [Revised: 09/16/2021] [Accepted: 10/03/2021] [Indexed: 12/29/2022]
Abstract
To acquire larger samples for answering complex questions in neuroscience, researchers have increasingly turned to multi‐site neuroimaging studies. However, these studies are hindered by differences in images acquired across multiple sites. These effects have been shown to bias comparison between sites, mask biologically meaningful associations, and even introduce spurious associations. To address this, the field has focused on harmonizing data by removing site‐related effects in the mean and variance of measurements. Contemporaneously with the increase in popularity of multi‐center imaging, the use of machine learning (ML) in neuroimaging has also become commonplace. These approaches have been shown to provide improved sensitivity, specificity, and power due to their modeling the joint relationship across measurements in the brain. In this work, we demonstrate that methods for removing site effects in mean and variance may not be sufficient for ML. This stems from the fact that such methods fail to address how correlations between measurements can vary across sites. Data from the Alzheimer's Disease Neuroimaging Initiative is used to show that considerable differences in covariance exist across sites and that popular harmonization techniques do not address this issue. We then propose a novel harmonization method called Correcting Covariance Batch Effects (CovBat) that removes site effects in mean, variance, and covariance. We apply CovBat and show that within‐site correlation matrices are successfully harmonized. Furthermore, we find that ML methods are unable to distinguish scanner manufacturer after our proposed harmonization is applied, and that the CovBat‐harmonized data retain accurate prediction of disease group.
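The abstract's central point, that matching site means and variances can leave site-specific correlations untouched, can be illustrated with a toy sketch. This is not ComBat or CovBat: it is a plain per-site location-scale rescaling written for illustration, and the function names (`harmonize_mean_variance`, `pearson`) and the data are assumptions, not taken from the paper.

```python
from statistics import mean, pstdev

def harmonize_mean_variance(values, sites):
    """Rescale one feature so every site shares the pooled mean and SD.

    Illustrative only: a crude location-scale adjustment, not ComBat.
    """
    pooled_mean, pooled_sd = mean(values), pstdev(values)
    by_site = {}
    for v, s in zip(values, sites):
        by_site.setdefault(s, []).append(v)
    stats = {s: (mean(vs), pstdev(vs)) for s, vs in by_site.items()}
    out = []
    for v, s in zip(values, sites):
        m, sd = stats[s]
        # Standardize within site, then map onto the pooled scale.
        out.append(pooled_mean + pooled_sd * (v - m) / sd if sd > 0 else pooled_mean)
    return out

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: two features, positively related at site A,
# negatively related at site B.
sites = ["A"] * 4 + ["B"] * 4
f1 = [1.0, 2.0, 3.0, 4.0, 11.0, 12.0, 13.0, 14.0]
f2 = [1.0, 2.0, 3.0, 4.0, 24.0, 23.0, 22.0, 21.0]

h1 = harmonize_mean_variance(f1, sites)
h2 = harmonize_mean_variance(f2, sites)

# Site means and SDs now agree, but the within-site correlations do not:
# r_a stays at +1 and r_b stays at -1.
r_a = pearson(h1[:4], h2[:4])
r_b = pearson(h1[4:], h2[4:])
```

Because each site is transformed by a positive linear map, the within-site correlation structure survives unchanged, which is the gap a covariance-aware method is meant to close.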
Affiliation(s)
- Andrew A Chen
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Joanne C Beer
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Nicholas J Tustison
- Department of Radiology and Medical Imaging, University of Virginia, Charlottesville, Virginia, USA
- Philip A Cook
- Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Russell T Shinohara
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Haochang Shou
- Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
33
Campagna MP, Xavier A, Lechner-Scott J, Maltby V, Scott RJ, Butzkueven H, Jokubaitis VG, Lea RA. Epigenome-wide association studies: current knowledge, strategies and recommendations. Clin Epigenetics 2021; 13:214. [PMID: 34863305 PMCID: PMC8645110 DOI: 10.1186/s13148-021-01200-8] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 05/25/2021] [Accepted: 11/19/2021] [Indexed: 02/06/2023]
Abstract
The aetiology and pathophysiology of complex diseases are driven by the interaction between genetic and environmental factors. The variability in risk and outcomes in these diseases is incompletely explained by genetics or environmental risk factors individually. Therefore, researchers are now exploring the epigenome, a biological interface at which genetics and the environment can interact. There is a growing body of evidence supporting the role of epigenetic mechanisms in complex disease pathophysiology. Epigenome-wide association studies (EWASes) investigate the association between a phenotype and epigenetic variants, most commonly DNA methylation. The decreasing cost of measuring epigenome-wide methylation and the increasing accessibility of bioinformatic pipelines have contributed to the rise in EWASes published in recent years. Here, we review the current literature on these EWASes and provide further recommendations and strategies for successfully conducting them. We have constrained our review to studies using methylation data, as this is the most studied epigenetic mechanism; to microarray-based data, as whole-genome bisulphite sequencing remains prohibitively expensive for most laboratories; and to blood-based studies, due to the non-invasiveness of peripheral blood collection and availability of archived DNA, as well as the accessibility of publicly available blood-cell-based methylation data. Further, we address multiple novel areas of EWAS analysis that have not been covered in previous reviews: (1) longitudinal study designs, (2) the chip analysis methylation pipeline (ChAMP), (3) differentially methylated region (DMR) identification paradigms, (4) methylation quantitative trait loci (methQTL) analysis, (5) methylation age analysis and (6) identifying cell-specific differential methylation from mixed cell data using statistical deconvolution.
Affiliation(s)
- Maria Pia Campagna
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia
- Alexandre Xavier
- Centre for Information Based Medicine, Hunter Medical Research Institute, Newcastle, Australia
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
- Jeannette Lechner-Scott
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
- Department of Neurology, Division of Medicine, John Hunter Hospital, Newcastle, Australia
- Vicky Maltby
- Centre for Information Based Medicine, Hunter Medical Research Institute, Newcastle, Australia
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
- Rodney J Scott
- Centre for Information Based Medicine, Hunter Medical Research Institute, Newcastle, Australia
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
- Division of Molecular Medicine, New South Wales Health Pathology North, Newcastle, Australia
- Helmut Butzkueven
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia
- Department of Neurology, Alfred Health, Melbourne, Australia
- Vilija G Jokubaitis
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia
- Department of Neurology, Alfred Health, Melbourne, Australia
- Rodney A Lea
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia.
- Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane, Australia.
34
Qin Y, Yi D, Chen X, Guan Y. Deep learning identifies erroneous microarray-based, gene-level conclusions in literature. NAR Genom Bioinform 2021; 3:lqab089. [PMID: 34617014 PMCID: PMC8489595 DOI: 10.1093/nargab/lqab089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/10/2021] [Revised: 08/25/2021] [Accepted: 09/17/2021] [Indexed: 11/14/2022]
Abstract
More than 110 000 publications have used microarrays to decipher phenotype-associated genes, clinical biomarkers and gene functions. Microarrays rely on digitally assaying the fluorescence signals of arrays. In this study, we retrospectively constructed raw images for 37 724 published microarray data, and developed deep learning algorithms to automatically detect systematic defects. We report that an alarming 26.73% of the microarray-based studies are affected by serious imaging defects. By literature mining, we found that publications associated with these affected microarrays have reported disproportionately more biological discoveries on the genes in the contaminated areas compared to other genes. In total, 28.82% of the gene-level conclusions reported in these publications were based on measurements falling into the contaminated area, indicating severe, systematic problems caused by such contamination. We provide the identified problematic published datasets, the affected genes and the imputed arrays, as well as software tools for scanning for such contamination, which will become essential for future studies that scrutinize and critically analyze microarray data.
Affiliation(s)
- Yanan Qin
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Daiyao Yi
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Xianghao Chen
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
35
Wani AH, Armstrong D, Dahrendorff J, Uddin M. RANDOMIZE: A Web Server for Data Randomization. Archives of Proteomics and Bioinformatics 2020; 1:31-37. [PMID: 33554223 PMCID: PMC7861512] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022]
Abstract
The microarray-based Illumina Infinium MethylationEPIC BeadChip (EPIC 850k) has become a useful and standard tool for epigenome-wide deoxyribonucleic acid (DNA) methylation profiling. Data from this technology may suffer from batch effects due to improper handling of the samples during the plating process. Batch effects are a significant issue and can give rise to spurious and inaccurate results and a reduction in power to detect real biological differences. Careful study design, such as randomizing the samples to distribute them uniformly across the factors responsible for batch effects, is crucial to address batch effects and other technical artifacts. Randomization helps to reduce the likelihood of bias and the impact of differences among groups. Without a user-friendly and efficient tool, randomizing the samples can be a tedious, error-prone, and time-consuming task. We present RANDOMIZE, a web-based application designed to perform randomization of relevant metadata to evenly distribute samples across the factors typically responsible for batch effects in DNA methylation microarrays, such as rows, chips and plates. We demonstrate that the tool is efficient, fast and easy to use. The tool is freely available online at https://coph-usf.shinyapps.io/RANDOMIZE/ and can be accessed using any web browser. Sample data and a tutorial are also available with the tool.
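The randomization idea described in this abstract can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the RANDOMIZE implementation: the function name `randomize_layout`, the default of 8 rows per chip, and the single `group` covariate are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict

def randomize_layout(samples, chip_size=8, seed=0):
    """Assign (sample_id, group) pairs to chips and row positions.

    Each group is shuffled and dealt round-robin across chips, so every
    chip receives a near-equal share of each group. Row order within a
    chip is then shuffled. chip_size=8 is an assumed default.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    n_chips = -(-len(samples) // chip_size)  # ceiling division
    chips = [[] for _ in range(n_chips)]
    i = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)
        for sample_id in ids:
            chips[i % n_chips].append((sample_id, group))
            i += 1

    layout = {}
    for chip_no, members in enumerate(chips, start=1):
        rng.shuffle(members)  # randomize row position within the chip
        for row, (sample_id, group) in enumerate(members, start=1):
            layout[sample_id] = {"chip": chip_no, "row": row, "group": group}
    return layout

# Hypothetical study: 8 cases and 8 controls spread over two chips.
example = [(f"S{i:02d}", "case" if i < 8 else "control") for i in range(16)]
for sample_id, slot in sorted(randomize_layout(example, seed=42).items()):
    print(sample_id, slot)
```

Dealing each group round-robin keeps case/control status from being confounded with chip, and a fixed seed makes the layout reproducible for lab records.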
Affiliation(s)
- Agaz H. Wani
- Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Don Armstrong
- University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Jan Dahrendorff
- Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA
- Monica Uddin
- Genomics Program, College of Public Health, University of South Florida, Tampa, FL, USA