1
Pospiech M, Beckford J, Kumar AMS, Tamizharasan M, Brito J, Liang G, Mangul S, Alachkar H. The DNA methylation landscape across the TCR loci in patients with acute myeloid leukemia. Int Immunopharmacol 2024; 138:112376. [PMID: 38917523 DOI: 10.1016/j.intimp.2024.112376] [Received: 02/06/2024] [Revised: 05/09/2024] [Accepted: 05/28/2024] [Indexed: 06/27/2024]
Abstract
The capacity of T cells to initiate anti-leukemia immune responses is determined by the ability of their receptors (TCRs) to recognize leukemia neoantigens. Epigenetic mechanisms including DNA methylation contribute to shaping the TCR repertoire composition and diversity. The DNA hypomethylating agents (HMAs) have been widely used in the treatment of acute myeloid leukemia (AML) and myelodysplastic syndrome (MDS). Whether DNA HMAs directly influence TCR gene loci methylation patterns remains unknown. By analyzing public datasets, we compared methylation patterns across TCR loci in AML patients and healthy controls. We also explored how HMAs influence TCR loci DNA methylation in patients with AML. While methylation patterns are largely conserved across the TCR loci, certain V genes exhibit high interindividual variability. Although overall methylation levels within the TCR loci did not show significant differences, specific sites, including 32 TRAV and 12 TRBV sites, exhibited distinct methylation patterns when comparing T cells from healthy donors to those from patients with AML. In leukemic cells, decitabine treatment demethylates sites across the TRAV and TRBV genes. While not as significant, a similar pattern of demethylation is observed in T cells. Pretreatment AML samples exhibit higher methylation beta values in differentially methylated positions (DMPs) compared with non-DMPs. Methylation levels of certain TRAV and TRBV genes in leukemic cells are associated with patients' risk status. The presence of disease-specific TCR loci methylation signatures that are associated with clinical outcome presents an opportunity for therapeutic intervention. HMAs can modulate the TCR loci methylation patterns, yet whether they could reprogram the TCR repertoire composition remains to be explored.
MESH Headings
- Humans
- DNA Methylation
- Leukemia, Myeloid, Acute/genetics
- Leukemia, Myeloid, Acute/drug therapy
- Leukemia, Myeloid, Acute/immunology
- Decitabine/pharmacology
- Decitabine/therapeutic use
- Receptors, Antigen, T-Cell/genetics
- T-Lymphocytes/immunology
- Epigenesis, Genetic
- Antimetabolites, Antineoplastic/therapeutic use
- Antimetabolites, Antineoplastic/pharmacology
Affiliation(s)
- Mateusz Pospiech
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- John Beckford
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- Advaith Maya Sanjeev Kumar
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America; Department of Computer Science, University of Southern California, Los Angeles, CA, the United States of America
- Mukund Tamizharasan
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America; Department of Computer Science, University of Southern California, Los Angeles, CA, the United States of America
- Jaqueline Brito
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- Gangning Liang
- Department of Urology, Keck School of Medicine, University of Southern California, Los Angeles, CA, the United States of America
- Serghei Mangul
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America
- Houda Alachkar
- Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, University of Southern California, Los Angeles, CA, the United States of America.
2
Rahman ML, Breeze CE, Shu XO, Wong JYY, Blechter B, Cardenas A, Wang X, Ji BT, Hu W, Cai Q, Hosgood HD, Yang G, Shi J, Long J, Gao YT, Bell DA, Zheng W, Rothman N, Lan Q. Epigenome-wide association study of lung cancer among never smokers in two prospective cohorts in Shanghai, China. Thorax 2024; 79:735-744. [PMID: 38702190 PMCID: PMC11251856 DOI: 10.1136/thorax-2023-220352] [Received: 04/19/2023] [Accepted: 02/17/2024] [Indexed: 05/06/2024]
Abstract
BACKGROUND The aetiology of lung cancer among individuals who never smoked remains elusive, despite 15% of lung cancer cases in men and 53% in women worldwide being unrelated to smoking. Epigenetic alterations, particularly DNA methylation (DNAm) changes, have emerged as potential drivers. Yet, few prospective epigenome-wide association studies (EWAS), primarily focusing on peripheral blood DNAm with limited representation of never smokers, have been conducted. METHODS We conducted a nested case-control study of 80 never-smoking incident lung cancer cases and 83 never-smoking controls within the Shanghai Women's Health Study and Shanghai Men's Health Study. DNAm was measured in prediagnostic oral rinse samples using Illumina MethylationEPIC array. Initially, we conducted an EWAS to identify differentially methylated positions (DMPs) associated with lung cancer in the discovery sample of 101 subjects. The top 50 DMPs were further evaluated in a replication sample of 62 subjects, and results were pooled using fixed-effect meta-analysis. RESULTS Our study identified three DMPs significantly associated with lung cancer at the epigenome-wide significance level of p<8.22×10⁻⁸. These DMPs were identified as cg09198866 (MYH9; TXN2), cg01411366 (SLC9A10) and cg12787323. Furthermore, examination of the top 1000 DMPs indicated significant enrichment in epithelial regulatory regions and their involvement in small GTPase-mediated signal transduction pathways. Additionally, GrimAge acceleration was identified as a risk factor for lung cancer (OR=1.19 per year; 95% CI 1.06 to 1.34). CONCLUSIONS While replication in a larger sample size is necessary, our findings suggest that DNAm patterns in prediagnostic oral rinse samples could provide novel insights into the underlying mechanisms of lung cancer in never smokers.
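Illustrative note: the abstract above pools discovery and replication results by fixed-effect meta-analysis. A minimal sketch of that kind of pooling follows, assuming the common inverse-variance weighting scheme; the abstract does not state which implementation the authors used, and the function name `fixed_effect_pool` and the numeric inputs are hypothetical, not values from the study.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Pool per-study effect estimates with inverse-variance weights.

    Assumption: each study contributes one estimate and its standard error.
    Returns the pooled estimate and its standard error.
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical effect sizes for one CpG site (discovery, replication).
pooled, pooled_se = fixed_effect_pool([0.30, 0.20], [0.10, 0.10])
print(pooled, pooled_se)
```

With equal standard errors the pooled estimate is the simple mean of the two inputs, and the pooled standard error is smaller than either input, which is the expected behaviour of a fixed-effect model.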
Affiliation(s)
- Mohammad L Rahman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Charles E Breeze
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Xiao-Ou Shu
- Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jason Y Y Wong
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Batel Blechter
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Andres Cardenas
- Department of Epidemiology and Population Health, Stanford University, Stanford, California, USA
- Xuting Wang
- Immunity, Inflammation and Diseases Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Bu-Tian Ji
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Wei Hu
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Qiuyin Cai
- Vanderbilt University, Nashville, Tennessee, USA
- H Dean Hosgood
- Albert Einstein College of Medicine, Bronx, New York, USA
- Gong Yang
- Department of Medicine, Vanderbilt-Ingram Cancer Center, Nashville, Tennessee, USA
- Jianxin Shi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Jirong Long
- Department of Medicine, Vanderbilt-Ingram Cancer Center, Nashville, Tennessee, USA
- Douglas A Bell
- Immunity, Inflammation and Diseases Laboratory, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA
- Wei Zheng
- Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Nathaniel Rothman
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
- Qing Lan
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, USA
3
Hannon ER, Marsit CJ, Dent AE, Embury P, Ogolla S, Midem D, Williams SM, Kazura JW. Transcriptome- and DNA methylation-based cell-type deconvolutions produce similar estimates of differential gene expression and differential methylation. BioData Min 2024; 17:21. [PMID: 38992677 PMCID: PMC11241886 DOI: 10.1186/s13040-024-00374-0] [Received: 02/26/2024] [Accepted: 07/01/2024] [Indexed: 07/13/2024]
Abstract
BACKGROUND Changing cell-type proportions can confound studies of differential gene expression or DNA methylation (DNAm) from peripheral blood mononuclear cells (PBMCs). We examined how cell-type proportions derived from the transcriptome versus the methylome (DNAm) influence estimates of differentially expressed genes (DEGs) and differentially methylated positions (DMPs). METHODS Transcriptome and DNAm data were obtained from PBMC RNA and DNA of Kenyan children (n = 8) before, during, and 6 weeks following uncomplicated malaria. DEGs and DMPs between time points were detected using cell-type adjusted modeling with Cibersortx or IDOL, respectively. RESULTS Most major cell types and principal components had moderate to high correlation between the two deconvolution methods (r = 0.60-0.96). Estimates of cell-type proportions and DEGs or DMPs were largely unaffected by the method, with the greatest discrepancy in the estimation of neutrophils. CONCLUSION Variation in cell-type proportions is captured similarly by both transcriptomic and methylome deconvolution methods for most major cell types.
Affiliation(s)
- Emily R Hannon
- Center for Global Health and Diseases, Case Western Reserve University, 10900 Euclid Avenue LC:4983, Cleveland, OH, 44106, USA.
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, 44106, USA.
- Carmen J Marsit
- Gangarosa Department of Environmental Health, Rollins School of Public Health, Emory University, Atlanta, GA, 30322, USA
- Arlene E Dent
- Center for Global Health and Diseases, Case Western Reserve University, 10900 Euclid Avenue LC:4983, Cleveland, OH, 44106, USA
- Division of Pediatric Infectious Diseases, Rainbow Babies and Children's Hospital, Cleveland, OH, 44106, USA
- Paula Embury
- Center for Global Health and Diseases, Case Western Reserve University, 10900 Euclid Avenue LC:4983, Cleveland, OH, 44106, USA
- David Midem
- Chulaimbo Sub-county Hospital, Kisumu County, Kenya
- Scott M Williams
- Department of Population and Quantitative Health Sciences, Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, 44106, USA
- James W Kazura
- Center for Global Health and Diseases, Case Western Reserve University, 10900 Euclid Avenue LC:4983, Cleveland, OH, 44106, USA
4
Zhuang BC, Jude MS, Konwar C, Ryan CP, Whitehead J, Engelbrecht HR, MacIsaac JL, Dever K, Toan TK, Korinek K, Zimmer Z, Huffman KM, Lee NR, McDade TW, Kuzawa CW, Belsky DW, Kobor MS. Comparison of Infinium MethylationEPIC v2.0 to v1.0 for human population epigenetics: considerations for addressing EPIC version differences in DNA methylation-based tools. bioRxiv 2024:2024.07.02.600461. [PMID: 39005299 PMCID: PMC11245009 DOI: 10.1101/2024.07.02.600461] [Indexed: 07/16/2024]
Abstract
Background The recently launched DNA methylation profiling platform, Illumina MethylationEPIC BeadChip Infinium microarray v2.0 (EPICv2), is highly correlated with measurements obtained from its predecessor MethylationEPIC BeadChip Infinium microarray v1.0 (EPICv1). However, the concordance between the two versions in the context of DNA methylation-based tools, including cell type deconvolution algorithms, epigenetic clocks, and inflammation and lifestyle biomarkers has not yet been investigated. Findings We profiled DNA methylation on both EPIC versions using matched venous blood samples from individuals spanning early to late adulthood across three cohorts. On combining the DNA methylomes of the cohorts, we observed that samples primarily clustered by the EPIC version they were measured on. Within each cohort, when we calculated cell type proportions, epigenetic age acceleration (EAA), rate of aging estimates, and biomarker scores for the matched samples on each version, we noted significant differences between EPICv1 and EPICv2 in the majority of these estimates. These differences were not significant, however, when estimates were adjusted for EPIC version or when EAAs were calculated separately for each EPIC version. Conclusions Our findings indicate that EPIC version differences predominantly explain DNA methylation variation and influence estimates of DNA methylation-based tools, and therefore we recommend caution when combining cohorts run on different versions. We demonstrate the importance of calculating DNA methylation-based estimates separately for each EPIC version or accounting for EPIC version either as a covariate in statistical models or by using version correction algorithms.
Affiliation(s)
- Beryl C Zhuang
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Marcia Smiti Jude
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Chaini Konwar
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Calen P Ryan
- Robert N. Butler Columbia Aging Center, Mailman School of Public Health, Columbia University, New York, NY 10032, USA
- Joanne Whitehead
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Hannah-Ruth Engelbrecht
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Julia L MacIsaac
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Kristy Dever
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Tran Khanh Toan
- Family Medicine Department, Hanoi Medical University, Hanoi, Vietnam
- Kim Korinek
- Department of Sociology, University of Utah, Salt Lake City, USA
- Zachary Zimmer
- Department of Family Studies and Gerontology, Mount Saint Vincent University, Halifax, Canada
- Canada Research Chair, Global Aging and Community Initiative
- Kim M Huffman
- Duke University School of Medicine, Durham, NC, 27701, USA
- Nanette R Lee
- USC-Office of Population Studies Foundation, Inc., University of San Carlos, Cebu City, Philippines
- Thomas W McDade
- Department of Anthropology, Northwestern University, Evanston, Illinois, USA
- Program in Child and Brain Development, CIFAR, Toronto, Ontario, Canada
- Christopher W Kuzawa
- Department of Anthropology and Institute for Policy Research, Northwestern University, Evanston, IL 60208, USA
- Daniel W Belsky
- Butler Columbia Aging Center, Columbia University Mailman School of Public Health, New York, NY, USA
- Department of Epidemiology, Columbia University Mailman School of Public Health, New York, NY, USA
- Michael S Kobor
- BC Children's Hospital Research Institute, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada
- Department of Medical Genetics, Faculty of Medicine, University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
- Program in Child and Brain Development, CIFAR, Toronto, Ontario, Canada
- The Edwin S.H. Leong UBC Healthy Aging Chair-A UBC President's Excellence Chair, University of British Columbia, Canada
5
Deng WQ, Pigeyre M, Azab SM, Wilson SL, Campbell N, Cawte N, Morrison KM, Atkinson SA, Subbarao P, Turvey SE, Moraes TJ, Mandhane P, Azad MB, Simons E, Pare G, Anand SS. Consistent cord blood DNA methylation signatures of gestational age between South Asian and white European cohorts. Clin Epigenetics 2024; 16:74. [PMID: 38840168 PMCID: PMC11155053 DOI: 10.1186/s13148-024-01684-0] [Received: 02/28/2024] [Accepted: 05/23/2024] [Indexed: 06/07/2024]
Abstract
BACKGROUND Epigenetic modifications, particularly DNA methylation (DNAm) in cord blood, are an important biological marker of how external exposures during gestation can influence the in-utero environment and subsequent offspring development. Despite the recognized importance of DNAm during gestation, comparative studies to determine the consistency of these epigenetic signals across different ethnic groups are largely absent. To address this gap, we first performed epigenome-wide association studies (EWAS) of gestational age (GA) using newborn cord blood DNAm comparatively in a white European (n = 342) and a South Asian (n = 490) birth cohort living in Canada. Then, we capitalized on established cord blood epigenetic GA clocks to examine the associations between maternal exposures, offspring characteristics and epigenetic GA, as well as GA acceleration, defined as the residual difference between epigenetic and chronological GA at birth. RESULTS Individual EWASs confirmed 1,211 and 1,543 differentially methylated CpGs previously reported to be associated with GA, in white European and South Asian cohorts, respectively, with a similar distribution of effects. We confirmed that Bohlin's cord blood GA clock was robustly correlated with GA in white Europeans (r = 0.71; p = 6.0 × 10⁻⁵⁴) and South Asians (r = 0.66; p = 6.9 × 10⁻⁶⁴). In both cohorts, Bohlin's clock was positively associated with newborn weight and length and negatively associated with parity, newborn female sex, and gestational diabetes. Exclusive to South Asians, the GA clock was positively associated with the newborn ponderal index, while pre-pregnancy weight and gestational weight gain were strongly predictive of increased epigenetic GA in white Europeans. Important predictors of GA acceleration included gestational diabetes mellitus, newborn sex, and parity in both cohorts.
CONCLUSIONS These results demonstrate the consistent DNAm signatures of GA and the utility of Bohlin's GA clock across the two populations. Although the overall pattern of DNAm is similar, its connections with the mother's environment and the baby's anthropometrics can differ between the two groups. Further research is needed to understand these unique relationships.
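Illustrative note: the abstract above defines GA acceleration as the residual difference between epigenetic and chronological GA at birth. One common reading, assumed here and not confirmed by the abstract, is the residual from a simple linear regression of epigenetic GA on chronological GA. The sketch below uses ordinary least squares; the function name `ga_acceleration` and the sample values (in weeks) are hypothetical.

```python
def ga_acceleration(epigenetic_ga, chronological_ga):
    """Return residuals of epigenetic GA regressed on chronological GA.

    Assumption: a one-predictor ordinary least squares fit; a positive
    residual means the epigenetic estimate is ahead of the fitted line.
    """
    n = len(chronological_ga)
    mean_x = sum(chronological_ga) / n
    mean_y = sum(epigenetic_ga) / n
    # Closed-form OLS slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in chronological_ga)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological_ga, epigenetic_ga))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_ga, epigenetic_ga)]


# Hypothetical newborns: chronological and clock-predicted GA in weeks.
chron = [37.0, 38.0, 39.0, 40.0, 41.0]
epi = [37.5, 37.8, 39.4, 39.9, 41.2]
accel = ga_acceleration(epi, chron)
print(accel)
```

By construction these residuals sum to zero and are uncorrelated with chronological GA, which is why a regression residual is often preferred over a raw difference when testing associations with other exposures.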
Affiliation(s)
- Wei Q Deng
- Peter Boris Centre for Addictions Research, St. Joseph's Healthcare Hamilton, Hamilton, Canada.
- Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, Canada.
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada.
- Marie Pigeyre
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, ON, Canada
- Sandi M Azab
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Samantha L Wilson
- Department of Obstetrics and Gynecology, McMaster University, Hamilton, Canada
- Natalie Campbell
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Nathan Cawte
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Canada
- Padmaja Subbarao
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Hospital for Sick Children, Department of Pediatrics, University of Toronto, Toronto, Canada
- Program in Translational Medicine, SickKids Research Institute, Toronto, Canada
- Stuart E Turvey
- Department of Pediatrics, BC Children's Hospital, The University of British Columbia, Vancouver, Canada
- Theo J Moraes
- Hospital for Sick Children, Department of Pediatrics, University of Toronto, Toronto, Canada
- Program in Translational Medicine, SickKids Research Institute, Toronto, Canada
- Piush Mandhane
- Department of Pediatrics, University of Alberta, Edmonton, Canada
- Meghan B Azad
- Department of Pediatrics and Child Health, Children's Hospital Research Institute of Manitoba, University of Manitoba, Winnipeg, Canada
- Elinor Simons
- Section of Allergy and Immunology, Department of Pediatrics and Child Health, University of Manitoba, Winnipeg, Canada
- Guillaume Pare
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Canada
- Thrombosis and Atherosclerosis Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, ON, Canada
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada
- Department of Pathology and Molecular Medicine, Michael G. DeGroote School of Medicine, McMaster University, Hamilton, Canada
- Sonia S Anand
- Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada.
- Population Health Research Institute, David Braley Cardiac, Vascular and Stroke Research Institute, Hamilton, Canada.
- Department of Health Research Methods, Evidence, and Impact, McMaster University, Hamilton, Canada.
6
Hannon ER, Marsit CJ, Dent AE, Embury P, Ogolla S, Midem D, Williams SM, Kazura JW. Transcriptome- and DNA methylation-based cell-type deconvolutions produce similar estimates of differential gene expression and differential methylation. Research Square 2024:rs.3.rs-3992113. [PMID: 38645047 PMCID: PMC11030537 DOI: 10.21203/rs.3.rs-3992113/v1] [Indexed: 04/23/2024]
Abstract
Background Changing cell-type proportions can confound studies of differential gene expression or DNA methylation (DNAm) from peripheral blood mononuclear cells (PBMCs). We examined how cell-type proportions derived from the transcriptome versus the methylome (DNAm) influence estimates of differentially expressed genes (DEGs) and differentially methylated positions (DMPs). Methods Transcriptome and DNAm data were obtained from PBMC RNA and DNA of Kenyan children (n = 8) before, during, and 6 weeks following uncomplicated malaria. DEGs and DMPs between time points were detected using cell-type adjusted modeling with Cibersortx or IDOL, respectively. Results Most major cell types and principal components had moderate to high correlation between the two deconvolution methods (r = 0.60-0.96). Estimates of cell-type proportions and DEGs or DMPs were largely unaffected by the method, with the greatest discrepancy in the estimation of neutrophils. Conclusion Variation in cell-type proportions is captured similarly by both transcriptomic and methylome deconvolution methods for most major cell types.
7
Bunyavanich S, Becker PM, Altman MC, Lasky-Su J, Ober C, Zengler K, Berdyshev E, Bonneau R, Chatila T, Chatterjee N, Chung KF, Cutcliffe C, Davidson W, Dong G, Fang G, Fulkerson P, Himes BE, Liang L, Mathias RA, Ogino S, Petrosino J, Price ND, Schadt E, Schofield J, Seibold MA, Steen H, Wheatley L, Zhang H, Togias A, Hasegawa K. Analytical challenges in omics research on asthma and allergy: A National Institute of Allergy and Infectious Diseases workshop. J Allergy Clin Immunol 2024; 153:954-968. [PMID: 38295882 PMCID: PMC10999353 DOI: 10.1016/j.jaci.2024.01.014] [Received: 12/13/2023] [Revised: 01/19/2024] [Accepted: 01/24/2024] [Indexed: 02/29/2024]
Abstract
Studies of asthma and allergy are generating increasing volumes of omics data for analysis and interpretation. The National Institute of Allergy and Infectious Diseases (NIAID) assembled a workshop comprising investigators studying asthma and allergic diseases using omics approaches, omics investigators from outside the field, and NIAID medical and scientific officers to discuss the following areas in asthma and allergy research: genomics, epigenomics, transcriptomics, microbiomics, metabolomics, proteomics, lipidomics, integrative omics, systems biology, and causal inference. Current states of the art, present challenges, novel and emerging strategies, and priorities for progress were presented and discussed for each area. This workshop report summarizes the major points and conclusions from this NIAID workshop. As a group, the investigators underscored the imperatives for rigorous analytic frameworks, integration of different omics data types, cross-disciplinary interaction, strategies for overcoming current limitations, and the overarching goal to improve scientific understanding and care of asthma and allergic diseases.
Affiliation(s)
- Patrice M Becker
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Jessica Lasky-Su
- Brigham & Women's Hospital and Harvard Medical School, Boston, Mass
- Talal Chatila
- Boston Children's Hospital and Harvard Medical School, Boston, Mass
- Wendy Davidson
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Gang Dong
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Gang Fang
- Icahn School of Medicine at Mount Sinai, New York, NY
- Patricia Fulkerson
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Liming Liang
- Harvard T. H. Chan School of Public Health, Boston, Mass
- Shuji Ogino
- Brigham & Women's Hospital and Harvard Medical School, Boston, Mass; Harvard T. H. Chan School of Public Health, Boston, Mass; Broad Institute of MIT and Harvard, Boston, Mass
- Eric Schadt
- Icahn School of Medicine at Mount Sinai, New York, NY
- Max A Seibold
- National Jewish Health, Denver, Colo; University of Colorado School of Medicine, Aurora, Colo
- Hanno Steen
- Boston Children's Hospital and Harvard Medical School, Boston, Mass
- Lisa Wheatley
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Hongmei Zhang
- School of Public Health, University of Memphis, Memphis, Tenn
- Alkis Togias
- National Institute of Allergy and Infectious Diseases, National Institutes of Health, Rockville, Md
- Kohei Hasegawa
- Massachusetts General Hospital and Harvard Medical School, Boston, Mass
8
Kreitmaier P, Park YC, Swift D, Gilly A, Wilkinson JM, Zeggini E. Epigenomic profiling of the infrapatellar fat pad in osteoarthritis. Hum Mol Genet 2024; 33:501-509. [PMID: 37975894 PMCID: PMC10939427 DOI: 10.1093/hmg/ddad198] [Received: 07/19/2023] [Revised: 10/13/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023]
Abstract
Osteoarthritis is a prevalent, complex disease of the joints, and affects multiple intra-articular tissues. Here, we have examined genome-wide DNA methylation profiles of primary infrapatellar fat pad and matched blood samples from 70 osteoarthritis patients undergoing total knee replacement surgery. Comparing the DNA methylation profiles between these tissues reveals widespread epigenetic differences. We produce the first genome-wide methylation quantitative trait locus (mQTL) map of fat pad, and make the resource available to the wider community. Using two-sample Mendelian randomization and colocalization analyses, we resolve osteoarthritis GWAS signals and provide insights into the molecular mechanisms underpinning disease aetiopathology. Our findings provide the first view of the epigenetic landscape of infrapatellar fat pad primary tissue in osteoarthritis.
Affiliation(s)
- Peter Kreitmaier
- Technical University of Munich (TUM) and Klinikum Rechts der Isar, TUM School of Medicine and Health, Ismaninger Str. 22, Munich 81675, Germany
- Graduate School of Experimental Medicine, TUM School of Medicine and Health, Technical University of Munich, Ismaninger Str. 22, Munich 81675, Germany
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstaedter Landstr. 1, Neuherberg 85764, Germany
- Young-Chan Park
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstaedter Landstr. 1, Neuherberg 85764, Germany
- Diane Swift
- Department of Oncology and Metabolism, The University of Sheffield, Beech Hill Rd, Sheffield S10 2RX, United Kingdom
- Arthur Gilly
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstaedter Landstr. 1, Neuherberg 85764, Germany
- J Mark Wilkinson
- Department of Oncology and Metabolism, The University of Sheffield, Beech Hill Rd, Sheffield S10 2RX, United Kingdom
| | - Eleftheria Zeggini
- Technical University of Munich (TUM) and Klinikum Rechts der Isar, TUM School of Medicine and Health, Ismaninger Str. 22, Munich 81675, Germany
- Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, Ingolstaedter Landstr. 1, Neuherberg 85764, Germany
| |
Collapse
|
9
|
Casazza W, Inkster AM, Del Gobbo GF, Yuan V, Delahaye F, Marsit C, Park YP, Robinson WP, Mostafavi S, Dennis JK. Sex-dependent placental methylation quantitative trait loci provide insight into the prenatal origins of childhood onset traits and conditions. iScience 2024; 27:109047. [PMID: 38357671 PMCID: PMC10865402 DOI: 10.1016/j.isci.2024.109047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 06/19/2023] [Accepted: 01/23/2024] [Indexed: 02/16/2024] Open
Abstract
Molecular quantitative trait loci (QTLs) allow us to understand the biology captured in genome-wide association studies (GWASs). The placenta regulates fetal development and shows sex differences in DNA methylation. We therefore hypothesized that placental methylation QTL (mQTL) explain variation in genetic risk for childhood onset traits, and that effects differ by sex. We analyzed 411 term placentas from two studies and found 49,252 methylation (CpG) sites with mQTL and 2,489 CpG sites with sex-dependent mQTL. All mQTL were enriched in regions that typically affect gene expression in prenatal tissues. All mQTL were also enriched in GWAS results for growth- and immune-related traits, but male- and female-specific mQTL were more enriched than cross-sex mQTL. mQTL colocalized with trait loci at 777 CpG sites, with 216 (28%) specific to males or females. Overall, mQTL specific to male and female placenta capture otherwise overlooked variation in childhood traits.
Affiliation(s)
- William Casazza
  - Centre for Molecular Medicine and Therapeutics, BC Children’s Hospital, Vancouver, BC, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
  - BC Children’s Hospital Research Institute, Vancouver, BC, Canada
- Amy M. Inkster
  - BC Children’s Hospital Research Institute, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Giulia F. Del Gobbo
  - BC Children’s Hospital Research Institute, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  - Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, Canada
- Victor Yuan
  - BC Children’s Hospital Research Institute, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Carmen Marsit
  - Rollins School of Public Health, Emory University, Atlanta, GA, USA
- Yongjin P. Park
  - Department of Statistics, University of British Columbia, Vancouver, BC, Canada
  - Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
- Wendy P. Robinson
  - BC Children’s Hospital Research Institute, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Sara Mostafavi
  - Centre for Molecular Medicine and Therapeutics, BC Children’s Hospital, Vancouver, BC, Canada
  - Paul Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Jessica K. Dennis
  - Centre for Molecular Medicine and Therapeutics, BC Children’s Hospital, Vancouver, BC, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
  - BC Children’s Hospital Research Institute, Vancouver, BC, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
10
|
Stone TC, Ward V, Hogan A, Alexander Ho KM, Wilson A, McBain H, Duku M, Wolfson P, Cheung S, Rosenfeld A, Lovat LB. Using saliva epigenetic data to develop and validate a multivariable predictor of esophageal cancer status. Epigenomics 2024; 16:109-125. [PMID: 38226541 PMCID: PMC10825730 DOI: 10.2217/epi-2023-0248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Accepted: 01/04/2024] [Indexed: 01/17/2024] Open
Abstract
Background: Salivary epigenetic biomarkers may detect esophageal cancer. Methods: A total of 256 saliva samples from esophageal adenocarcinoma patients and matched volunteers were analyzed with Illumina EPIC methylation arrays. Three datasets were created, using 64% for discovery, 16% for testing and 20% for validation. Modules of gene-based methylation probes were created using weighted gene coexpression network analysis. Module significance to disease and gene importance to module were determined and a random forest classifier generated using best-scoring gene-related epigenetic probes. A cost-sensitive wrapper algorithm maximized cancer diagnosis. Results: Using age, sex and seven probes, esophageal adenocarcinoma was detected with area under the curve of 0.72 in discovery, 0.73 in testing and 0.75 in validation datasets. Cancer sensitivity was 88% with specificity of 31%. Conclusion: We have demonstrated a potentially clinically viable classifier of esophageal cancer based on saliva methylation.
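The 64% / 16% / 20% partition described in this abstract is what results from holding out 20% of the samples and then splitting the remainder 80/20 again (0.8 × 0.8 = 0.64 and 0.8 × 0.2 = 0.16). The sketch below illustrates that arithmetic on 256 sample indices and shows how sensitivity and specificity are computed from binary labels. The nested 80/20 procedure, the function names and the random seed are assumptions made for illustration; the abstract does not state how the authors performed the split.

```python
import random


def three_way_split(sample_ids, holdout_frac=0.20, seed=0):
    """Hold out `holdout_frac` for validation, then split the remainder
    again by the same fraction into testing and discovery sets.
    (Hypothetical helper; not taken from the cited study.)"""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_val = round(holdout_frac * len(ids))
    validation, rest = ids[:n_val], ids[n_val:]
    n_test = round(holdout_frac * len(rest))
    testing, discovery = rest[:n_test], rest[n_test:]
    return discovery, testing, validation


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels are 1 for cancer and 0 for control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


discovery, testing, validation = three_way_split(range(256))
print(len(discovery), len(testing), len(validation))  # 164 41 51
```

With 256 samples this yields 164, 41 and 51 samples, which is 64%, 16% and 20% after rounding. A classifier tuned for high sensitivity at the cost of specificity, as reported here (88% versus 31%), flags most cancers but also many controls.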
Affiliation(s)
- Timothy C Stone
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Vanessa Ward
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Aine Hogan
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Kai Man Alexander Ho
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
  - Wellcome/EPSRC Centre for Interventional & Surgical Sciences (WEISS), University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Ash Wilson
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Hazel McBain
  - Wellcome/EPSRC Centre for Interventional & Surgical Sciences (WEISS), University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Margaret Duku
  - Wellcome/EPSRC Centre for Interventional & Surgical Sciences (WEISS), University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Paul Wolfson
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Sharon Cheung
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
- Avi Rosenfeld
  - Department of Computer Science, Jerusalem College of Technology, Havaad Haleumi 21, Givat Mordechai, 91160, Jerusalem, Israel
- Laurence B Lovat
  - Division of Surgery & Interventional Science, University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
  - Wellcome/EPSRC Centre for Interventional & Surgical Sciences (WEISS), University College London, Charles Bell House, 43-45 Foley Street, London, W1W 7TY, UK
  - Department of Gastrointestinal Services, University College London Hospital, 235 Euston Road, London, NW1 2BU, UK
11
|
Al-Chalabi N, Qian J, Gerretsen P, Chaudhary Z, Fischer C, Graff A, Remington G, De Luca V. Dynamic change in genome-wide methylation in response to increased suicidal ideation in schizophrenia spectrum disorders. J Neural Transm (Vienna) 2023; 130:1303-1313. [PMID: 37584690 DOI: 10.1007/s00702-023-02661-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2023] [Accepted: 06/01/2023] [Indexed: 08/17/2023]
Abstract
Suicide is a significant public health crisis, with 800,000 people dying annually. Most people completing suicide have previous psychiatric conditions, and those with psychotic and mood disorders are particularly vulnerable. Unfortunately, there are currently no biomarkers available for accurately detecting suicidal ideation. Given the genetic and environmental factors that play a role in suicidal ideation, we attempted to determine epigenetic modifications, specifically DNA methylation, in response to changes in suicidal ideation. Using a longitudinal study design, 31 participants with schizophrenia spectrum disorders were interviewed at a baseline visit and again at a follow-up visit 3-12 months later. Current suicidal ideation was recorded at both visits with the Columbia Suicide Severity Rating Scale and the Beck Scale for Suicide Ideation, and whole blood was collected for methylation analysis. Our analysis shows a significant negative correlation between cg26910920 methylation and increasing Columbia Suicide Severity Rating Scale scores and a positive correlation between cg13673029 methylation and increasing Beck Scale for Suicide Ideation scores. This pilot study indicates that there is the possibility that DNA methylation can respond to changes in suicidal ideation over time and potentially be used as a biomarker of suicidal ideation in the future.
Affiliation(s)
- Ariel Graff
  - CAMH, 250 College St, Toronto, M5T1R8, Canada
- Vincenzo De Luca
  - CAMH, 250 College St, Toronto, M5T1R8, Canada
  - St. Michael's Hospital, Toronto, Canada
12
|
Landen S, Jacques M, Hiam D, Alvarez-Romero J, Schittenhelm RB, Shah AD, Huang C, Steele JR, Harvey NR, Haupt LM, Griffiths LR, Ashton KJ, Lamon S, Voisin S, Eynon N. Sex differences in muscle protein expression and DNA methylation in response to exercise training. Biol Sex Differ 2023; 14:56. [PMID: 37670389 PMCID: PMC10478435 DOI: 10.1186/s13293-023-00539-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 01/05/2023] [Accepted: 08/18/2023] [Indexed: 09/07/2023] Open
Abstract
BACKGROUND Exercise training elicits changes in muscle physiology, epigenomics, transcriptomics, and proteomics, with males and females exhibiting differing physiological responses to exercise training. However, the molecular mechanisms contributing to the differing adaptations between the sexes are poorly understood. METHODS We performed a meta-analysis for sex differences in skeletal muscle DNA methylation following an endurance training intervention (Gene SMART cohort and E-MTAB-11282 cohort). We investigated sex differences in the skeletal muscle proteome following an endurance training intervention (Gene SMART cohort). Lastly, we investigated whether the methylome and proteome are associated with baseline cardiorespiratory fitness (maximal oxygen consumption; VO2max) in a sex-specific manner. RESULTS Here, we investigated, for the first time, DNA methylome and proteome sex differences in response to exercise training in human skeletal muscle (n = 78; 50 males, 28 females). We identified 92 DNA methylation sites (CpGs) associated with exercise training; however, no CpGs changed in a sex-dependent manner. In contrast, we identified 189 proteins that are differentially expressed between the sexes following training, with 82 proteins differentially expressed between the sexes at baseline. Proteins showing the most robust sex-specific response to exercise include SIRT3, MRPL41, and MBP. Irrespective of sex, cardiorespiratory fitness was associated with robust methylome changes (19,257 CpGs) and no proteomic changes. We did not observe sex differences in the association between cardiorespiratory fitness and the DNA methylome. Integrative multi-omic analysis identified sex-specific mitochondrial metabolism pathways associated with exercise responses. Lastly, exercise training and cardiorespiratory fitness shifted the DNA methylomes to be more similar between the sexes.
CONCLUSIONS We identified sex differences in protein expression changes, but not DNA methylation changes, following an endurance exercise training intervention; whereas we identified no sex differences in the DNA methylome or proteome response to lifelong training. Given the delicate interaction between sex and training as well as the limitations of the current study, more studies are required to elucidate whether there is a sex-specific training effect on the DNA methylome. We found that genes involved in mitochondrial metabolism pathways are differentially modulated between the sexes following endurance exercise training. These results shed light on sex differences in molecular adaptations to exercise training in skeletal muscle.
Affiliation(s)
- Shanie Landen
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
  - Centre for Endocrinology and Metabolism, Hudson Institute of Medical Research, Melbourne, VIC, Australia
- Macsue Jacques
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
- Danielle Hiam
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
  - Institute for Physical Activity and Nutrition, School of Exercise and Nutrition Sciences, Deakin University, Geelong, Australia
- Ralf B Schittenhelm
  - Monash Proteomics and Metabolomics Facility, Monash University, Melbourne, Australia
- Anup D Shah
  - Monash Proteomics and Metabolomics Facility, Monash University, Melbourne, Australia
- Cheng Huang
  - Monash Proteomics and Metabolomics Facility, Monash University, Melbourne, Australia
- Joel R Steele
  - Monash Proteomics and Metabolomics Facility, Monash University, Melbourne, Australia
- Nicholas R Harvey
  - Faculty of Health Sciences and Medicine, Bond University, Gold Coast, QLD, 4226, Australia
  - Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology (QUT), 60 Musk Ave., Kelvin Grove, QLD, 4059, Australia
- Larisa M Haupt
  - Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology (QUT), 60 Musk Ave., Kelvin Grove, QLD, 4059, Australia
- Lyn R Griffiths
  - Centre for Genomics and Personalised Health, Genomics Research Centre, School of Biomedical Sciences, Queensland University of Technology (QUT), 60 Musk Ave., Kelvin Grove, QLD, 4059, Australia
- Kevin J Ashton
  - Faculty of Health Sciences and Medicine, Bond University, Gold Coast, QLD, 4226, Australia
- Séverine Lamon
  - Institute for Physical Activity and Nutrition, School of Exercise and Nutrition Sciences, Deakin University, Geelong, Australia
- Sarah Voisin
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
  - Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Nir Eynon
  - Institute for Health and Sport (iHeS), Victoria University, Melbourne, Australia
  - Australian Regenerative Medicine Institute (ARMI), Monash University, Clayton, VIC, 3800, Australia
13
|
Fransquet PD, Macdonald JA, Ryan J, Greenwood CJ, Olsson CA. Exploring perinatal biopsychosocial factors and epigenetic age in 1-year-old offspring. Epigenomics 2023; 15:927-939. [PMID: 37905426 DOI: 10.2217/epi-2023-0284] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023] Open
Abstract
Background: Little is known about the determinants of epigenetic aging in pediatric populations. Methods: Epigenetic age was estimated from 258 1-year-olds, using pediatric buccal epigenetic and Horvath clocks. We explored associations between epigenetic age and maternal indicators of mental and relational health, substance use and general physical health assessed during trimester three. Results: Higher anxiety and stress, BMI and higher parent-parent relationship quality were associated with pediatric buccal epigenetic clock differences. High blood pressure during pregnancy was associated with Horvath age acceleration. Third-trimester smoking and pre-pregnancy weight were associated with acceleration and deceleration respectively, and concordant across clocks. Conclusion: A broad range of maternal factors may shape epigenetic age in infancy; further research is needed to explore the possible effects on health and development.
Affiliation(s)
- Peter D Fransquet
  - Deakin University, Centre for Social & Early Emotional Development, School of Psychology, Faculty of Health, Geelong, Victoria, Australia
- Jacqui A Macdonald
  - Deakin University, Centre for Social & Early Emotional Development, School of Psychology, Faculty of Health, Geelong, Victoria, Australia
  - Murdoch Children's Research Institute, Population Studies of Adolescents, The Royal Children's Hospital Melbourne, Parkville, Victoria, Australia
  - The University of Melbourne, Department of Paediatrics, The Royal Children's Hospital Melbourne, Parkville, Victoria, Australia
- Joanne Ryan
  - School of Public Health & Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Christopher J Greenwood
  - Deakin University, Centre for Social & Early Emotional Development, School of Psychology, Faculty of Health, Geelong, Victoria, Australia
  - Murdoch Children's Research Institute, Population Studies of Adolescents, The Royal Children's Hospital Melbourne, Parkville, Victoria, Australia
  - The University of Melbourne, Department of Paediatrics, The Royal Children's Hospital Melbourne, Parkville, Victoria, Australia
- Craig A Olsson
  - Deakin University, Centre for Social & Early Emotional Development, School of Psychology, Faculty of Health, Geelong, Victoria, Australia
  - Murdoch Children's Research Institute, Population Studies of Adolescents, The Royal Children's Hospital Melbourne, Parkville, Victoria, Australia
  - The University of Melbourne, Department of Paediatrics, The Royal Children's Hospital Melbourne, Parkville, Victoria, Australia
14
|
Nishitani S, Fujisawa TX, Yao A, Takiguchi S, Tomoda A. Evaluation of the pooled sample method in Infinium MethylationEPIC BeadChip array by comparison with individual samples. Clin Epigenetics 2023; 15:138. [PMID: 37641110 PMCID: PMC10463626 DOI: 10.1186/s13148-023-01544-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 07/29/2023] [Indexed: 08/31/2023] Open
Abstract
BACKGROUND The pooled sample method is used in epigenomic research and expression analysis and is a cost-effective screening approach for small amounts of DNA. Evaluation of the pooled sample method in epigenomic studies is performed using the Illumina Infinium Methylation 450K BeadChip array; however, subsequent reports on the updated 850K array are lacking. A previous study demonstrated that the methylation levels obtained from individual samples were accurately replicated using pooled samples but did not address epigenome-wide association study (EWAS) statistics. The DNA quantification method, which is important for the homogeneous mixing of DNA in the pooled sample method, has since become fluorescence-based, and additional factors need to be considered including the resolution of batch effects of microarray chips and the heterogeneity of the cellular proportions from which the DNA samples are derived. In this study, four pooled samples were created from 44 individual samples, and EWAS statistics for differentially methylated positions (DMPs) and regions (DMRs) were conducted for individual samples and compared with the statistics obtained from the pooled samples. RESULTS The methylation levels could be reproduced fairly well in the pooled samples. This was the case for the entire dataset and when limited to the top 100 CpG sites, consistent with a previous study using the 450K BeadChip array. However, the statistical results of the EWAS for the DMP by individual samples were not replicated in pooled samples. Qualitative analyses highlighting methylation within an arbitrary candidate gene were replicable. Focusing on chr 20, the statistical results of EWAS for DMR from individual samples showed replicability in the pooled samples as long as they were limited to regions with a sufficient effect size. CONCLUSIONS The pooled sample method replicated the methylation values well and can be used for EWAS in DMR. 
This method is sample amount-effective and cost-effective and can be utilized for screening by carefully understanding the effective features and disadvantages of the pooled sample method and combining it with candidate gene analyses.
Affiliation(s)
- Shota Nishitani
  - Research Center for Child Mental Development, University of Fukui, Fukui, Japan
  - Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, University of Fukui, Osaka, Japan
  - Life Science Innovation Center, University of Fukui, Fukui, Japan
- Takashi X Fujisawa
  - Research Center for Child Mental Development, University of Fukui, Fukui, Japan
  - Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, University of Fukui, Osaka, Japan
  - Life Science Innovation Center, University of Fukui, Fukui, Japan
- Akiko Yao
  - Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Shinichiro Takiguchi
  - Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, University of Fukui, Osaka, Japan
  - Life Science Innovation Center, University of Fukui, Fukui, Japan
  - Department of Child and Adolescent Psychological Medicine, University of Fukui Hospital, Fukui, Japan
- Akemi Tomoda
  - Research Center for Child Mental Development, University of Fukui, Fukui, Japan
  - Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, University of Fukui, Osaka, Japan
  - Life Science Innovation Center, University of Fukui, Fukui, Japan
  - Department of Child and Adolescent Psychological Medicine, University of Fukui Hospital, Fukui, Japan
15
|
Ye H, Zhang X, Wang C, Goode EL, Chen J. Batch-effect correction with sample remeasurement in highly confounded case-control studies. Nat Comput Sci 2023; 3:709-719. [PMID: 38177326 PMCID: PMC10993308 DOI: 10.1038/s43588-023-00500-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/06/2022] [Accepted: 07/11/2023] [Indexed: 01/06/2024]
Abstract
Batch effects are pervasive in biomedical studies. One approach to addressing batch effects is to repeatedly measure a subset of samples in each batch. These remeasured samples are used to estimate and correct the batch effects. However, rigorous statistical methods for batch-effect correction with remeasured samples are severely underdeveloped. Here we developed a framework for batch-effect correction using remeasured samples in highly confounded case-control studies. We provided theoretical analyses of the proposed procedure, evaluated its power characteristics and provided a power calculation tool to aid in the study design. We found that the number of samples that need to be remeasured depends strongly on the between-batch correlation. When the correlation is high, remeasuring a small subset of samples can rescue most of the power.
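The idea of using remeasured samples to estimate a batch effect can be illustrated with a deliberately simplified location-shift model: if the same samples are measured in both batches, the mean paired difference estimates the additive batch offset, which can then be subtracted from all batch-2 measurements. This is only a toy sketch under that assumption, with hypothetical function names; it is not the statistical framework or the power calculation tool developed in the cited paper.

```python
from statistics import mean


def estimate_batch_shift(batch1_remeasured, batch2_remeasured):
    """Estimate an additive batch offset from samples measured in both
    batches. The two lists must be paired: element i in each list is the
    same biological sample. (Toy additive model, assumed for illustration.)"""
    if len(batch1_remeasured) != len(batch2_remeasured):
        raise ValueError("remeasured samples must be paired")
    diffs = [b2 - b1 for b1, b2 in zip(batch1_remeasured, batch2_remeasured)]
    return mean(diffs)


def correct_batch2(batch2_values, shift):
    """Remove the estimated offset from every batch-2 measurement."""
    return [v - shift for v in batch2_values]


# Three samples remeasured in both batches; batch 2 reads about 0.1 higher.
shift = estimate_batch_shift([0.20, 0.40, 0.60], [0.30, 0.50, 0.70])
corrected = correct_batch2([0.35, 0.55, 0.75], shift)
print(round(shift, 3), [round(v, 3) for v in corrected])
```

In this toy model the estimate becomes more precise as the paired measurements agree more closely, which is in line with the abstract's point that fewer remeasured samples are needed when the between-batch correlation is high.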
Affiliation(s)
- Hanxuan Ye
  - Department of Statistics, Texas A&M University, College Station, TX, USA
- Xianyang Zhang
  - Department of Statistics, Texas A&M University, College Station, TX, USA
- Chen Wang
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Ellen L Goode
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
- Jun Chen
  - Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, USA
16
|
Fajarda O, Almeida JR, Duarte-Pereira S, Silva RM, Oliveira JL. Methodology to identify a gene expression signature by merging microarray datasets. Comput Biol Med 2023; 159:106867. [PMID: 37060770 DOI: 10.1016/j.compbiomed.2023.106867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 03/01/2023] [Accepted: 03/30/2023] [Indexed: 04/17/2023]
Abstract
A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES.
Affiliation(s)
- Olga Fajarda
  - DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal
- João Rafael Almeida
  - DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Computation, University of A Coruña, A Coruña, Spain
- Sara Duarte-Pereira
  - DETI/IEETA, LASI, University of Aveiro, Aveiro, Portugal; Department of Medical Sciences and iBiMED-Institute of Biomedicine, University of Aveiro, Aveiro, Portugal
- Raquel M Silva
  - Universidade Católica Portuguesa, Faculty of Dental Medicine (FMD), Center for Interdisciplinary Research in Health (CIIS), Viseu, Portugal
17
|
Perini S, Filosi M, Domenici E. Candidate biomarkers from the integration of methylation and gene expression in discordant autistic sibling pairs. Transl Psychiatry 2023; 13:109. [PMID: 37012247 PMCID: PMC10070641 DOI: 10.1038/s41398-023-02407-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2022] [Revised: 03/18/2023] [Accepted: 03/21/2023] [Indexed: 04/05/2023] Open
Abstract
While the genetics of autism spectrum disorders (ASD) has been intensively studied, resulting in the identification of over 100 putative risk genes, the epigenetics of ASD has received less attention, and results have been inconsistent across studies. We aimed to investigate the contribution of DNA methylation (DNAm) to the risk of ASD and identify candidate biomarkers arising from the interaction of epigenetic mechanisms with genotype, gene expression, and cellular proportions. We performed DNAm differential analysis using whole blood samples from 75 discordant sibling pairs of the Italian Autism Network collection and estimated their cellular composition. We studied the correlation between DNAm and gene expression accounting for the potential effects of different genotypes on DNAm. We showed that the proportion of NK cells was significantly reduced in ASD siblings suggesting an imbalance in their immune system. We identified differentially methylated regions (DMRs) involved in neurogenesis and synaptic organization. Among candidate loci for ASD, we detected a DMR mapping to CLEC11A (neighboring SHANK1) where DNAm and gene expression were significantly and negatively correlated, independently from genotype effects. As reported in previous studies, we confirmed the involvement of immune functions in the pathophysiology of ASD. Notwithstanding the complexity of the disorder, suitable biomarkers such as CLEC11A and its neighbor SHANK1 can be discovered using integrative analyses even with peripheral tissues.
Affiliation(s)
- Samuel Perini
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento (TN), Italy
- Michele Filosi
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento (TN), Italy
  - EURAC Research, Bolzano, Italy
- Enrico Domenici
  - Department of Cellular, Computational and Integrative Biology (CIBIO), University of Trento, Trento (TN), Italy
  - Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI), Rovereto (TN), Italy
18
|
Yosef A, Shnaider E, Schneider M, Gurevich M. Heuristic normalization procedure for batch effect correction. Soft Comput 2023. [DOI: 10.1007/s00500-023-08049-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/31/2023]
|
19
|
Brockway HM, Wilson SL, Kallapur SG, Buhimschi CS, Muglia LJ, Jones HN. Characterization of methylation profiles in spontaneous preterm birth placental villous tissue. PLoS One 2023; 18:e0279991. [PMID: 36952446 PMCID: PMC10035933 DOI: 10.1371/journal.pone.0279991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2022] [Indexed: 03/25/2023] Open
Abstract
Preterm birth is a global public health crisis which results in significant neonatal and maternal mortality. Yet little is known regarding the molecular mechanisms of idiopathic spontaneous preterm birth, and we have few diagnostic markers for adequate assessment of placental development and function. Previous studies of placental pathology and our transcriptomics studies suggest a role for placental maturity in idiopathic spontaneous preterm birth. It is known that placental DNA methylation changes over gestation. We hypothesized that if placental hypermaturity is present in our samples, we would observe a unique idiopathic spontaneous preterm birth DNA methylation profile potentially driving the gene expression differences we previously identified in our placental samples. Our results indicate the idiopathic spontaneous preterm birth DNA methylation pattern mimics the term birth methylation pattern, suggesting hypermaturity. Only seven significant differentially methylated regions fitting the idiopathic spontaneous preterm birth specific (relative to the controls) profile were identified, indicating unusually high similarity in DNA methylation between idiopathic spontaneous preterm birth and term birth samples. We identified an additional 1,718 significantly methylated regions in our gestational age matched controls where the idiopathic spontaneous preterm birth DNA methylation pattern mimics the term birth methylation pattern, again indicating a striking level of similarity between the idiopathic spontaneous preterm birth and term birth samples. Pathway analysis of these regions revealed differences in genes within the WNT and Cadherin signaling pathways, both of which are essential in placental development and maturation. Taken together, these data demonstrate that the idiopathic spontaneous preterm birth samples display a more mature methylation signature than expected given their respective gestational ages, which likely impacts birth timing.
Affiliation(s)
- Heather M. Brockway
  - Department of Physiology and Functional Genomics, College of Medicine at the University of Florida, Gainesville, Florida, United States of America
- Samantha L. Wilson
  - Princess Margaret Cancer Centre, University Health Network, University of Toronto, Toronto, Ontario, Canada
- Suhas G. Kallapur
  - Divisions of Neonatology and Developmental Biology, David Geffen School of Medicine at the University of California, UCLA Mattel Children’s Hospital, Los Angeles, California, United States of America
- Catalin S. Buhimschi
  - Department of Obstetrics and Gynecology, The University of Illinois College of Medicine, Chicago, Illinois, United States of America
- Louis J. Muglia
  - Burroughs Wellcome Fund, Research Triangle Park, North Carolina, United States of America
- Helen N. Jones
  - Department of Physiology and Functional Genomics, College of Medicine at the University of Florida, Gainesville, Florida, United States of America
|
20
|
Gim JA. Integrative Approaches of DNA Methylation Patterns According to Age, Sex and Longitudinal Changes. Curr Genomics 2023; 23:385-399. [PMID: 37920553 PMCID: PMC10173416 DOI: 10.2174/1389202924666221207100513] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 10/04/2022] [Accepted: 11/04/2022] [Indexed: 12/12/2022] Open
Abstract
Background: In humans, age-related DNA methylation has been studied in blood, tissues, buccal swabs, and fibroblasts, and changes in DNA methylation patterns according to age and sex have been detected. To date, approximately 137,000 samples have been analyzed from 14,000 studies, and the information has been uploaded to the NCBI GEO database. Methods: A correlation between age and methylation level and longitudinal changes in methylation levels was revealed in both sexes. Here, 20 public datasets derived from whole blood were analyzed using the Illumina BeadChip. Batch effects with respect to the time differences were correlated. The overall change in the pattern was provided as the inverse of the coefficient of variation (COV). Results: Of the 20 datasets, nine were from a longitudinal study. All data had age and sex as common variables. Comprehensive details of age-, sex-, and longitudinal change-based DNA methylation levels in the whole blood sample were elucidated in this study. ELOVL2 and FHL2 showed the maximum correlation between age and DNA methylation. The methylation patterns of genes related to mental health differed according to age. Age-correlated genes have been associated with malformations (anteverted nostril, craniofacial abnormalities, and depressed nasal bridge) and drug addiction (drug habituation and smoking). Conclusion: Based on 20 public DNA methylation datasets, methylation levels according to age and longitudinal changes by sex were identified and visualized using an integrated approach. The results highlight the molecular mechanisms underlying the association of sex and biological age with changes in DNA methylation, and the importance of optimal genomic information management.
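As an illustration of the summary statistic named in this abstract, the inverse of the coefficient of variation can be sketched as mean divided by standard deviation per CpG. This is a minimal sketch only: the function name, the rows-as-CpGs layout, the use of the sample standard deviation, and the example beta values are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def inverse_cov(beta: np.ndarray) -> np.ndarray:
    """Return mean/std per CpG (rows = CpGs, columns = samples).

    Hypothetical helper: the paper's exact definition may differ.
    """
    mean = beta.mean(axis=1)
    std = beta.std(axis=1, ddof=1)  # sample standard deviation (assumption)
    # CpGs with zero variation get an infinite score instead of a division error.
    safe_std = np.where(std > 0, std, 1.0)
    return np.where(std > 0, mean / safe_std, np.inf)

# Hypothetical beta values for three CpGs across four samples.
beta = np.array([
    [0.50, 0.50, 0.50, 0.50],  # constant across samples
    [0.20, 0.40, 0.60, 0.80],  # highly variable
    [0.79, 0.80, 0.81, 0.80],  # stable
])
scores = inverse_cov(beta)
```

Under this definition a higher score means a CpG is more stable relative to its mean methylation level.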
Affiliation(s)
- Jeong-An Gim
  - Medical Science Research Center, College of Medicine, Korea University Guro Hospital, Seoul 08308, Republic of Korea
|
21
|
Louise J, Deussen AR, Dodd JM. Data processing choices can affect findings in differential methylation analyses: an investigation using data from the LIMIT RCT. PeerJ 2023; 11:e14786. [PMID: 36755865 PMCID: PMC9901304 DOI: 10.7717/peerj.14786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2022] [Accepted: 01/03/2023] [Indexed: 02/05/2023] Open
Abstract
Objective: A wide array of methods exist for processing and analysing DNA methylation data. We aimed to perform a systematic comparison of the behaviour of these methods, using cord blood DNAm from the LIMIT RCT, in relation to detecting hypothesised effects of interest (intervention and pre-pregnancy maternal BMI) as well as effects known to be spurious, and known to be present. Methods: DNAm data, from 645 cord blood samples analysed using Illumina 450K BeadChip arrays, were normalised using three different methods (with probe filtering undertaken pre- or post-normalisation). Batch effects were handled with a supervised algorithm, an unsupervised algorithm, or adjustment in the analysis model. Analysis was undertaken with and without adjustment for estimated cell type proportions. The effects estimated included intervention and BMI (effects of interest in the original study), infant sex and randomly assigned groups. Data processing and analysis methods were compared in relation to number and identity of differentially methylated probes, rankings of probes by p value and log-fold-change, and distributions of p values and log-fold-change estimates. Results: There were differences corresponding to each of the processing and analysis choices. Importantly, some combinations of data processing choices resulted in a substantial number of spurious 'significant' findings. We recommend greater emphasis on replication and greater use of sensitivity analyses.
Affiliation(s)
- Jennie Louise
  - Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
  - Adelaide Health Technology Assessment, The University of Adelaide, Adelaide, Australia
- Andrea R. Deussen
  - Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
- Jodie M. Dodd
  - Discipline of Obstetrics & Gynaecology and The Robinson Research Institute, The University of Adelaide, Adelaide, Australia
  - Department of Perinatal Medicine, Women’s and Babies Division, The Women’s and Children’s Hospital, Adelaide, South Australia, Australia
|
22
|
Inkster AM, Wong MT, Matthews AM, Brown CJ, Robinson WP. Who's afraid of the X? Incorporating the X and Y chromosomes into the analysis of DNA methylation array data. Epigenetics Chromatin 2023; 16:1. [PMID: 36609459 PMCID: PMC9825011 DOI: 10.1186/s13072-022-00477-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/02/2022] [Accepted: 12/27/2022] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND: Many human disease phenotypes manifest differently by sex, making the development of methods for incorporating X and Y-chromosome data into analyses vital. Unfortunately, X and Y chromosome data are frequently excluded from large-scale analyses of the human genome and epigenome due to analytical complexity associated with sex chromosome dosage differences between XX and XY individuals, and the impact of X-chromosome inactivation (XCI) on the epigenome. As such, little attention has been given to considering the methods by which sex chromosome data may be included in analyses of DNA methylation (DNAme) array data. RESULTS: With Illumina Infinium HumanMethylation450 DNAme array data from 634 placental samples, we investigated the effects of probe filtering, normalization, and batch correction on DNAme data from the X and Y chromosomes. Processing steps were evaluated in both mixed-sex and sex-stratified subsets of the analysis cohort to identify whether including both sexes impacted processing results. We found that identification of probes that have a high detection p-value, or that are non-variable, should be performed in sex-stratified data subsets to avoid over- and under-estimation of the quantity of probes eligible for removal, respectively. All normalization techniques investigated returned X and Y DNAme data that were highly correlated with the raw data from the same samples. We found no difference in batch correction results after application to mixed-sex or sex-stratified cohorts. Additionally, we identify two analytical methods suitable for XY chromosome data, the choice between which should be guided by the research question of interest, and we performed a proof-of-concept analysis studying differential DNAme on the X and Y chromosome in the context of placental acute chorioamnionitis. Finally, we provide an annotation of probe types that may be desirable to filter in X and Y chromosome analyses, including probes in repetitive elements, the X-transposed region, and cancer-testis gene promoters. CONCLUSION: While there may be no single "best" approach for analyzing DNAme array data from the X and Y chromosome, analysts must consider key factors during processing and analysis of sex chromosome data to accommodate the underlying biology of these chromosomes, and the technical limitations of DNA methylation arrays.
Affiliation(s)
- Amy M Inkster
  - BC Children's Hospital Research Institute, 950 W 28th Ave, Vancouver, BC, V6H 3N1, Canada
  - Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada
- Martin T Wong
  - Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada
- Allison M Matthews
  - BC Children's Hospital Research Institute, 950 W 28th Ave, Vancouver, BC, V6H 3N1, Canada
  - Department of Pathology & Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, V6T 1Z7, Canada
- Carolyn J Brown
  - Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada
- Wendy P Robinson
  - BC Children's Hospital Research Institute, 950 W 28th Ave, Vancouver, BC, V6H 3N1, Canada
  - Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1, Canada
|
23
|
Yosef A, Shnaider E, Schneider M, Gurevich M. Normalization of Large-Scale Transcriptome Data Using Heuristic Methods. Bioinform Biol Insights 2023; 17:11779322231160397. [PMID: 37020503 PMCID: PMC10068970 DOI: 10.1177/11779322231160397] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 02/09/2023] [Indexed: 04/03/2023] Open
Abstract
In this study, we introduce an artificial intelligence method for addressing batch effects in transcriptome data. The method has several clear advantages over the alternative methods presently in use. Batch effect refers to the discrepancy between gene expression data series measured under different conditions. While data from the same batch (measurements performed under the same conditions) are compatible, combining various batches into 1 data set is problematic because of incompatible measurements. Therefore, it is necessary to correct the combined data (normalization) before performing biological analysis. There are numerous methods that attempt to correct a data set for batch effect. These methods rely on various assumptions regarding the distribution of the measurements. Forcing the data elements into a pre-supposed distribution can severely distort biological signals, thus leading to incorrect results and conclusions. The wider the discrepancy between the assumed and the actual data distribution, the greater the biases introduced by such “correction methods”. We introduce a heuristic method to reduce batch effect. The method does not rely on any assumptions regarding the distribution and the behavior of data elements. Hence, it does not introduce any new biases in the process of correcting the batch effect. It strictly maintains the integrity of measurements within the original batches.
|
24
|
Abstract
Applying computational statistics or machine learning methods to data is a key component of many scientific studies, in any field, but alone might not be sufficient to generate robust and reliable outcomes and results. Before applying any discovery method, preprocessing steps are necessary to prepare the data for the computational analysis. In this framework, data cleaning and feature engineering are key pillars of any scientific study involving data analysis, and they should be adequately designed and performed from the first phases of the project. We call "feature" a variable describing a particular trait of a person or an observation, usually recorded as a column in a dataset. Even if pivotal, these data cleaning and feature engineering steps are sometimes done poorly or inefficiently, especially by beginners and inexperienced researchers. For this reason, we propose here our quick tips for data cleaning and feature engineering on how to carry out these important preprocessing steps correctly, avoiding common mistakes and pitfalls. Although we designed these guidelines with bioinformatics and health informatics scenarios in mind, we believe they can be applied more generally to any scientific area. We therefore target these guidelines to any researchers or practitioners wanting to perform data cleaning or feature engineering. We believe our simple recommendations can help researchers and scholars perform better computational analyses that can lead, in turn, to more solid outcomes and more reliable discoveries.
Affiliation(s)
- Davide Chicco
  - Institute of Health Policy Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
- Luca Oneto
  - Dipartimento di Informatica Bioingegneria Robotica e Ingegneria dei Sistemi, Università di Genova, Genoa, Italy
  - ZenaByte S.r.l., Genoa, Italy
- Erica Tavazzi
  - Dipartimento di Ingegneria dell’Informazione, Università di Padova, Padua, Italy
|
25
|
Kalyakulina A, Yusipov I, Bacalini MG, Franceschi C, Vedunova M, Ivanchenko M. Disease classification for whole-blood DNA methylation: Meta-analysis, missing values imputation, and XAI. Gigascience 2022; 11:giac097. [PMID: 36259657 PMCID: PMC9718659 DOI: 10.1093/gigascience/giac097] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/23/2022] [Revised: 08/01/2022] [Accepted: 09/15/2022] [Indexed: 07/25/2023] Open
Abstract
BACKGROUND: DNA methylation has a significant effect on gene expression and can be associated with various diseases. Meta-analysis of available DNA methylation datasets requires development of a specific workflow for joint data processing. RESULTS: We propose a comprehensive approach of combined DNA methylation datasets to classify controls and patients. The solution includes data harmonization, construction of machine learning classification models, dimensionality reduction of models, imputation of missing values, and explanation of model predictions by explainable artificial intelligence (XAI) algorithms. We show that harmonization can improve classification accuracy by up to 20% when preprocessing methods of the training and test datasets are different. The best accuracy results were obtained with tree ensembles, reaching above 95% for Parkinson's disease. Dimensionality reduction can substantially decrease the number of features, without detriment to the classification accuracy. The best imputation methods achieve almost the same classification accuracy for data with missing values as for the original data. XAI approaches have allowed us to explain model predictions from both populational and individual perspectives. CONCLUSIONS: We propose a methodologically valid and comprehensive approach to the classification of healthy individuals and patients with various diseases based on whole-blood DNA methylation data using Parkinson's disease and schizophrenia as examples. The proposed algorithm works better for the former pathology, characterized by a complex set of symptoms. It makes it possible to solve data harmonization problems for meta-analysis of many different datasets, impute missing values, and build classification models of small dimensionality.
Affiliation(s)
- Alena Kalyakulina
  - Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
- Igor Yusipov
  - Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
- Claudio Franceschi
  - Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
- Maria Vedunova
  - Institute of Biology and Biomedicine, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
- Mikhail Ivanchenko
  - Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, 603022 Nizhny Novgorod, Russia
|
26
|
Keshawarz A, Joehanes R, Guan W, Huan T, DeMeo DL, Grove ML, Fornage M, Levy D, O’Connor G. Longitudinal change in blood DNA epigenetic signature after smoking cessation. Epigenetics 2022; 17:1098-1109. [PMID: 34570667 PMCID: PMC9542417 DOI: 10.1080/15592294.2021.1985301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2021] [Revised: 08/20/2021] [Accepted: 09/21/2021] [Indexed: 12/14/2022] Open
Abstract
Cigarette smoking is associated with epigenetic changes that may be reversible following smoking cessation. Whole blood DNA methylation was evaluated in Framingham Heart Study Offspring (n = 169) and Third Generation (n = 30) cohort participants at two study visits 6 years apart and in Atherosclerosis Risk in Communities (ARIC) study (n = 222) participants at two study visits 20 years apart. Changes in DNA methylation (delta β values) at 483,565 cytosine-phosphate-guanine (CpG) sites and differentially methylated regions (DMRs) were compared between participants who were current, former, or never smokers at both visits (current-current, former-former, never-never, respectively), versus those who quit in the interim (current-former). Interim quitters had more hypermethylation at four CpGs annotated to AHRR, one CpG annotated to F2RL3, and one intergenic CpG (cg21566642) compared with current-current smokers (FDR < 0.02 for all), and two significant DMRs were identified. While there were no significant differentially methylated CpGs in the comparison of interim quitters and former-former smokers, 106 DMRs overlapping with small nucleolar RNA were identified. As compared with all non-smokers, current-current smokers additionally had more hypermethylation at two CpG sites annotated to HIVEP3 and TMEM126A, respectively, and another intergenic CpG (cg14339116). Gene transcripts associated with smoking cessation were implicated in immune responses, cell homoeostasis, and apoptosis. Smoking cessation is associated with early reversion of blood DNA methylation changes at CpG sites annotated to AHRR and F2RL3 towards those of never smokers. Associated gene expression suggests a role of longitudinal smoking-related DNA methylation changes in immune response processes.
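The delta β comparison described in this abstract can be illustrated with a small sketch: the change in methylation per CpG is the second-visit beta value minus the first-visit value, and groups are then compared on that change. The function names, array layout, and example values below are hypothetical, and the sketch deliberately omits the study's actual statistical modelling (covariate adjustment, FDR control, and DMR detection).

```python
import numpy as np

def delta_beta(visit1: np.ndarray, visit2: np.ndarray) -> np.ndarray:
    """Change in beta value per CpG (rows) and participant (columns)."""
    return np.asarray(visit2) - np.asarray(visit1)

def group_mean_difference(delta: np.ndarray, is_quitter) -> np.ndarray:
    """Mean delta beta in interim quitters minus mean delta beta in the rest."""
    is_quitter = np.asarray(is_quitter, dtype=bool)
    return delta[:, is_quitter].mean(axis=1) - delta[:, ~is_quitter].mean(axis=1)

# Hypothetical data: 2 CpGs x 4 participants.
visit1 = np.array([[0.60, 0.62, 0.61, 0.59],
                   [0.30, 0.31, 0.29, 0.30]])
visit2 = np.array([[0.70, 0.72, 0.61, 0.60],
                   [0.30, 0.31, 0.29, 0.30]])
is_quitter = [True, True, False, False]  # first two quit between visits

delta = delta_beta(visit1, visit2)
difference = group_mean_difference(delta, is_quitter)
```

A positive difference at a CpG indicates that methylation rose more in the quitters than in the comparison group over the interval.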
Affiliation(s)
- Amena Keshawarz
  - Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
  - Framingham Heart Study, Framingham, MA, USA
- Roby Joehanes
  - Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
  - Framingham Heart Study, Framingham, MA, USA
- Weihua Guan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Tianxiao Huan
  - Framingham Heart Study, Framingham, MA, USA
  - Department of Ophthalmology and Visual Sciences, University of Massachusetts Medical School, Worcester, MA, USA
- Dawn L. DeMeo
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Megan L. Grove
  - Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Myriam Fornage
  - McGovern Medical School and Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Brown Foundation Institute of Molecular Medicine, Houston, TX, USA
- Daniel Levy
  - Population Sciences Branch, Division of Intramural Research, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, USA
- George O’Connor
  - Pulmonary Center, Boston University School of Medicine, Boston, MA, USA
|
27
|
Chu S, Avery A, Yoshimoto J, Bryan JN. Genome wide exploration of the methylome in aggressive B-cell lymphoma in Golden Retrievers reveals a conserved hypermethylome. Epigenetics 2022; 17:2022-2038. [PMID: 35912844 PMCID: PMC9665123 DOI: 10.1080/15592294.2022.2105033] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/13/2022] Open
Abstract
Few recurrent DNA mutations are seen in aggressive canine B cell lymphomas (cBCL), suggesting other frequent drivers. The methylated island recovery assay (MIRA-seq) or methylated CpG-binding domain sequencing (MBD-seq) was used to define the genome-wide methylation profiles in aggressive cBCL in Golden Retrievers to determine if cBCL can be better defined by epigenetic changes than by DNA mutations. DNA hypermethylation patterns were relatively homogenous within cBCL samples in Golden Retrievers, in different breeds and in geographical regions. Aberrant hypermethylation is thus suspected to be a central and early event in cBCL lymphomagenesis. Distinct subgroups within cBCL in Golden Retrievers were not identified with DNA methylation profiles. In comparison, the methylome profile of human DLBCL (hDLBCL) is relatively heterogeneous. Only moderate similarity between hDLBCL and cBCL was seen and cBCL likely cannot be accurately classified into the subtypes seen in hDLBCL. Genes with hypermethylated regions in the promoter-TSS-first exon of cBCL compared to normal B cells often also had additional hyper- and hypomethylated regions distributed throughout the gene suggesting non-randomized repeat targeting of key genes by epigenetic mechanisms. The prevalence of hypermethylation in transcription factor families in aggressive cBCL may represent a fundamental step in lymphomagenesis.
Affiliation(s)
- Shirley Chu
  - Department of Veterinary Medicine and Surgery, University of Missouri, 900 E. Campus Drive, Columbia, MO, USA
- Anne Avery
  - Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO, USA
- Janna Yoshimoto
  - Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO, USA
- Jeffrey N Bryan
  - Department of Veterinary Medicine and Surgery, University of Missouri, 900 E. Campus Drive, Columbia, MO, USA
|
28
|
Derakhshan M, Kessler NJ, Ishida M, Demetriou C, Brucato N, Moore G, Fall CHD, Chandak GR, Ricaut FX, Prentice A, Hellenthal G, Silver M. Tissue- and ethnicity-independent hypervariable DNA methylation states show evidence of establishment in the early human embryo. Nucleic Acids Res 2022; 50:6735-6752. [PMID: 35713545 PMCID: PMC9749461 DOI: 10.1093/nar/gkac503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2021] [Revised: 05/06/2022] [Accepted: 05/27/2022] [Indexed: 12/24/2022] Open
Abstract
We analysed DNA methylation data from 30 datasets comprising 3474 individuals, 19 tissues and 8 ethnicities at CpGs covered by the Illumina450K array. We identified 4143 hypervariable CpGs ('hvCpGs') with methylation in the top 5% most variable sites across multiple tissues and ethnicities. hvCpG methylation was influenced but not determined by genetic variation, and was not linked to probe reliability, epigenetic drift, age, sex or cell heterogeneity effects. hvCpG methylation tended to covary across tissues derived from different germ-layers and hvCpGs were enriched for proximity to ERV1 and ERVK retrovirus elements. hvCpGs were also enriched for loci previously associated with periconceptional environment, parent-of-origin-specific methylation, and distinctive methylation signatures in monozygotic twins. Together, these properties position hvCpGs as strong candidates for studying how stochastic and/or environmentally influenced DNA methylation states which are established in the early embryo and maintained stably thereafter can influence life-long health and disease.
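The selection rule summarised in this abstract (CpGs whose methylation falls in the top 5% most variable sites across multiple datasets) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes every dataset covers the same CpGs in the same row order, ranks CpGs by sample variance, and uses a hypothetical `min_fraction_of_datasets` threshold to decide how many datasets a CpG must be flagged in.

```python
import numpy as np

def top_variable_mask(beta: np.ndarray, top_fraction: float = 0.05) -> np.ndarray:
    """Flag CpGs (rows) whose variance across samples is in the top fraction."""
    variances = beta.var(axis=1, ddof=1)
    cutoff = np.quantile(variances, 1.0 - top_fraction)
    return variances >= cutoff

def hypervariable_cpgs(datasets, top_fraction=0.05, min_fraction_of_datasets=0.65):
    """Flag CpGs that are top-variable in at least the given share of datasets.

    min_fraction_of_datasets is a hypothetical parameter chosen for illustration.
    """
    masks = np.stack([top_variable_mask(d, top_fraction) for d in datasets])
    return masks.mean(axis=0) >= min_fraction_of_datasets

# Simulated example: 3 datasets, 100 CpGs x 20 samples, with CpGs 0-4 made variable.
rng = np.random.default_rng(0)
datasets = []
for _ in range(3):
    beta = rng.normal(0.5, 0.01, size=(100, 20))          # low-variance background
    beta[:5] = rng.uniform(0.05, 0.95, size=(5, 20))      # planted variable CpGs
    datasets.append(beta)

hv = hypervariable_cpgs(datasets)
hv_indices = np.flatnonzero(hv).tolist()
```

Requiring agreement across datasets, rather than pooling all samples, keeps a single unusually noisy dataset from dominating the selection.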
Affiliation(s)
- Noah J Kessler
  - Department of Genetics, University of Cambridge, Cambridge CB2 3EH, UK
- Miho Ishida
  - UCL Great Ormond Street Institute of Child Health, UK
- Nicolas Brucato
  - Laboratoire Évolution and Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS, Toulouse, France
- Caroline H D Fall
  - MRC Lifecourse Epidemiology Unit, University of Southampton, Southampton, UK
- Giriraj R Chandak
  - Genomic Research on Complex Diseases (GRC Group), CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India
- Francois-Xavier Ricaut
  - Laboratoire Évolution and Diversité Biologique (EDB UMR 5174), Université de Toulouse Midi-Pyrénées, CNRS, IRD, UPS, Toulouse, France
- Andrew M Prentice
  - Medical Research Council Unit The Gambia at the London School of Hygiene and Tropical Medicine, The Gambia
- Garrett Hellenthal
  - UCL Genetics Institute, University College London, Gower Street, London WC1E 6BT, UK
- Matt J Silver
  - London School of Hygiene and Tropical Medicine, UK
  - Medical Research Council Unit The Gambia at the London School of Hygiene and Tropical Medicine, The Gambia
|
29
|
HarmonizR enables data harmonization across independent proteomic datasets with appropriate handling of missing values. Nat Commun 2022; 13:3523. [PMID: 35725563 PMCID: PMC9209422 DOI: 10.1038/s41467-022-31007-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 08/24/2021] [Accepted: 05/25/2022] [Indexed: 01/01/2023] Open
Abstract
Dataset integration is common practice to overcome limitations in statistically underpowered omics datasets. Proteome datasets display high technical variability and frequent missing values. Sophisticated strategies for batch effect reduction are lacking or rely on error-prone data imputation. Here we introduce HarmonizR, a data harmonization tool with appropriate missing value handling. The method exploits the structure of available data and matrix dissection for minimal data loss, without data imputation. This strategy implements two common batch effect reduction methods: ComBat and limma (removeBatchEffect()). The HarmonizR strategy, evaluated on four exemplarily analyzed datasets with up to 23 batches, demonstrated successful data harmonization for different tissue preservation techniques, LC-MS/MS instrumentation setups, and quantification approaches. Compared to data imputation methods, HarmonizR was more efficient and was superior in detecting significant proteins. HarmonizR is an efficient tool for missing data tolerant experimental variance reduction and is easily adjustable for individual dataset properties and user preferences. Here the authors present “HarmonizR”, a tool for missing data tolerant experimental variance reduction in large, integrated but independently generated datasets without data imputation, adjustable for individual dataset modalities, correction algorithm, and user preferences.
|
30
|
Ross JP, van Dijk S, Phang M, Skilton MR, Molloy PL, Oytam Y. Batch-effect detection, correction and characterisation in Illumina HumanMethylation450 and MethylationEPIC BeadChip array data. Clin Epigenetics 2022; 14:58. [PMID: 35488315 PMCID: PMC9055778 DOI: 10.1186/s13148-022-01277-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2022] [Accepted: 04/10/2022] [Indexed: 11/20/2022] Open
Abstract
Background: Genomic technologies can be subject to significant batch-effects which are known to reduce experimental power and to potentially create false positive results. The Illumina Infinium Methylation BeadChip is a popular technology choice for epigenome-wide association studies (EWAS), but presently, little is known about the nature of batch-effects on these designs. Given the subtlety of biological phenotypes in many EWAS, control for batch-effects should be a consideration.
Results: Using the batch-effect removal approaches in the ComBat and Harman software, we examined two in-house datasets and compared results with three large publicly available datasets (1214 HumanMethylation450 and 1094 MethylationEPIC BeadChips in total), and found that despite various forms of preprocessing, some batch-effects persist. This residual batch-effect is associated with the day of processing, the individual glass slide and the position of the array on the slide. Consistently across all datasets, 4649 probes required high amounts of correction. To understand the impact of this set on EWAS, we explored the literature and found three instances where persistently batch-effect prone probes have been reported in abstracts as key sites of differential methylation. As well as batch-effect susceptible probes, we also discovered a set of probes which are erroneously corrected. We provide batch-effect workflows for Infinium Methylation data and provide reference matrices of batch-effect prone and erroneously corrected features across the five datasets spanning regionally diverse populations and three commonly collected biosamples (blood, buccal and saliva). Conclusions: Batch-effects are ever present, even in high-quality data, and a strategy to deal with them should be part of experimental design, particularly for EWAS. Batch-effect removal tools are useful to reduce technical variance in Infinium Methylation data, but they need to be applied with care and make use of post hoc diagnostic measures.
Collapse
Affiliation(s)
- Jason P Ross
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia.
| | - Susan van Dijk
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia
| | - Melinda Phang
- Charles Perkins Centre, The University of Sydney, Sydney, Australia
| | - Michael R Skilton
- Charles Perkins Centre, The University of Sydney, Sydney, Australia.,Sydney Medical School, The University of Sydney, Sydney, Australia.,Sydney Institute for Women, Children and Their Families, Sydney Local Health District, Sydney, Australia
| | - Peter L Molloy
- Human Health Program, Health and Biosecurity, CSIRO, Sydney, Australia
| | - Yalchin Oytam
- Clinical Insights and Analytics Unit, South Eastern Sydney Local Health District, Sydney, Australia
31
Dayon L, Cominetti O, Affolter M. Proteomics of Human Biological Fluids for Biomarker Discoveries: Technical Advances and Recent Applications. Expert Rev Proteomics 2022; 19:131-151. [PMID: 35466824 DOI: 10.1080/14789450.2022.2070477] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Indexed: 12/12/2022]
Abstract
INTRODUCTION Biological fluids are routine samples for diagnostic testing and monitoring. Blood samples are typically measured because of their moderate collection invasiveness and high information content on health and disease. Several other body fluids, such as cerebrospinal fluid (CSF), are also studied and are suited to specific pathologies. Over the last two decades, proteomics has sought to identify protein biomarkers, but with limited success. Recent technologies and refined pipelines have accelerated the profiling of human biological fluids. AREAS COVERED We review proteomic technologies for the identification of biomarkers. These are based on antibody/aptamer arrays or mass spectrometry (MS), but new ones are emerging. Advances in scalability and throughput have made it possible to design studies better and to overcome the limited sample sizes that had until now prevailed due to technological constraints. With these enablers, plasma/serum, CSF, saliva, tears, urine, and milk proteomes have been further profiled; we provide a non-exhaustive picture of some recent highlights (mainly covering literature from the last five years in the Scopus database) using MS-based proteomics. EXPERT OPINION While proteomics has been in the shadow of genomics for years, proteomic tools and methodologies have reached a certain maturity. They are now better suited to discovering innovative and robust biofluid biomarkers.
Affiliation(s)
- Loïc Dayon
- Proteomics, Nestlé Institute of Food Safety & Analytical Sciences, Nestlé Research, CH-1015 Lausanne, Switzerland.,Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
| | - Ornella Cominetti
- Proteomics, Nestlé Institute of Food Safety & Analytical Sciences, Nestlé Research, CH-1015 Lausanne, Switzerland
| | - Michael Affolter
- Proteomics, Nestlé Institute of Food Safety & Analytical Sciences, Nestlé Research, CH-1015 Lausanne, Switzerland
32
Cao X, Li W, Wang T, Ran D, Davalos V, Planas-Serra L, Pujol A, Esteller M, Wang X, Yu H. Accelerated biological aging in COVID-19 patients. Nat Commun 2022; 13:2135. [PMID: 35440567 PMCID: PMC9018863 DOI: 10.1038/s41467-022-29801-8] [Citation(s) in RCA: 82] [Impact Index Per Article: 41.0] [Received: 07/27/2021] [Accepted: 03/30/2022] [Indexed: 01/01/2023] Open
Abstract
Chronological age is a risk factor for SARS-CoV-2 infection and severe COVID-19. Previous findings indicate that epigenetic age could be altered in viral infection. However, epigenetic aging in COVID-19 has not been well studied. In this study, DNA methylation of blood samples from 232 healthy individuals and 413 COVID-19 patients is profiled using the EPIC methylation array. The epigenetic age of each individual is determined by applying epigenetic clocks and a telomere length estimator to the individual's methylation profile. Epigenetic age acceleration is calculated and compared between groups. We observe strong correlations between the epigenetic clocks and individuals' chronological age (r > 0.8, p < 0.0001). We also find increasing acceleration of epigenetic aging and telomere attrition in the sequential blood samples from healthy individuals and infected patients developing non-severe and severe COVID-19. In addition, the longitudinal DNA methylation profiling analysis finds that the accumulation of epigenetic aging from COVID-19 syndrome could be partly reversed at late clinical phases in some patients. In conclusion, accelerated epigenetic aging is associated with the risk of SARS-CoV-2 infection and of developing severe COVID-19. In addition, the accumulation of epigenetic aging from COVID-19 may contribute to the post-COVID-19 syndrome among survivors.
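As an illustration of the age-acceleration step mentioned in this abstract, the following minimal sketch assumes one common definition: the residual from a least-squares regression of epigenetic age on chronological age. The abstract does not state which definition the authors used, and the function name `age_acceleration` and the toy values are hypothetical.

```python
def age_acceleration(chron_age, epi_age):
    """Return residuals of epi_age regressed on chron_age (ordinary least squares).

    Assumption: acceleration = observed epigenetic age minus the value
    predicted from chronological age. Positive means "older than expected".
    """
    n = len(chron_age)
    if n != len(epi_age) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 samples")
    mean_x = sum(chron_age) / n
    mean_y = sum(epi_age) / n
    sxx = sum((x - mean_x) ** 2 for x in chron_age)
    if sxx == 0:
        raise ValueError("chronological ages must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chron_age, epi_age))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x) for x, y in zip(chron_age, epi_age)]


# Toy, made-up values (not data from the study).
chron = [30, 40, 50, 60, 70]
epi = [31, 39, 55, 59, 71]
accel = age_acceleration(chron, epi)
```

By construction the residuals sum to zero, so group comparisons are made on how they are distributed across groups rather than on their overall mean.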
Affiliation(s)
- Xue Cao
- Department of Oncology, Guizhou Provincial People's Hospital, Guiyang, Guizhou, China.,Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China.,Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Wenjuan Li
- Department of Pulmonary and Critical Care Medicine, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Ting Wang
- Research & Development, Thermo Fisher Scientific Inc., Los Angeles, CA, USA
| | - Dongzhi Ran
- Department of Pharmacology, College of Medicine, University of Arizona, Tucson, AZ, USA.,Key Laboratory of Biochemistry and Molecular Pharmacology, Department of Pharmacology, Chongqing Medical University, Chongqing, China
| | - Veronica Davalos
- Josep Carreras Leukaemia Research Institute (IJC), Barcelona, Catalonia, Spain
| | - Laura Planas-Serra
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Catalonia, Spain.,Center for Biomedical Research on Rare Diseases (CIBERER), ISCIII, Madrid, Spain
| | - Aurora Pujol
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Barcelona, Catalonia, Spain.,Center for Biomedical Research on Rare Diseases (CIBERER), ISCIII, Madrid, Spain.,Institucio Catalana de Recerca i Estudis Avancats (ICREA), Barcelona, Catalonia, Spain
| | - Manel Esteller
- Josep Carreras Leukaemia Research Institute (IJC), Barcelona, Catalonia, Spain.,Institucio Catalana de Recerca i Estudis Avancats (ICREA), Barcelona, Catalonia, Spain.,Centro de Investigación Biomédica en Red de Cancer (CIBERONC), Madrid, Spain.,Physiological Sciences Department, School of Medicine and Health Sciences, University of Barcelona (UB), Barcelona, Catalonia, Spain
| | - Xiaolin Wang
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
| | - Huichuan Yu
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China.,Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
33
Noble AJ, Purcell RV, Adams AT, Lam YK, Ring PM, Anderson JR, Osborne AJ. A Final Frontier in Environment-Genome Interactions? Integrated, Multi-Omic Approaches to Predictions of Non-Communicable Disease Risk. Front Genet 2022; 13:831866. [PMID: 35211161 PMCID: PMC8861380 DOI: 10.3389/fgene.2022.831866] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/09/2021] [Accepted: 01/19/2022] [Indexed: 12/26/2022] Open
Abstract
Epidemiological and associative research from humans and animals identifies correlations between the environment and health impacts. The environment-health inter-relationship is effected through an individual's underlying genetic variation and mediated by mechanisms that include the changes to gene regulation that are associated with the diversity of phenotypes we exhibit. However, the causal relationships have yet to be established, in part because the associations are reduced to individual interactions and the combinatorial effects are rarely studied. This problem is exacerbated by the fact that our genomes are highly dynamic; they integrate information across multiple levels (from linear sequence, to structural organisation, to temporal variation) each of which is open to and responds to environmental influence. To unravel the complexities of the genomic basis of human disease, and in particular non-communicable diseases that are also influenced by the environment (e.g., obesity, type II diabetes, cancer, multiple sclerosis, some neurodegenerative diseases, inflammatory bowel disease, rheumatoid arthritis) it is imperative that we fully integrate multiple layers of genomic data. Here we review current progress in integrated genomic data analysis, and discuss cases where data integration would lead to significant advances in our ability to predict how the environment may impact on our health. We also outline limitations which should form the basis of future research questions. In so doing, this review will lay the foundations for future research into the impact of the environment on our health.
Affiliation(s)
- Alexandra J. Noble
- Translational Gastroenterology Unit, Nuffield Department of Experimental Medicine, University of Oxford, Oxford, United Kingdom
| | - Rachel V. Purcell
- Department of Surgery, University of Otago Christchurch, Christchurch, New Zealand
| | - Alex T. Adams
- Translational Gastroenterology Unit, Nuffield Department of Experimental Medicine, University of Oxford, Oxford, United Kingdom
| | - Ying K. Lam
- Translational Gastroenterology Unit, Nuffield Department of Experimental Medicine, University of Oxford, Oxford, United Kingdom
| | - Paulina M. Ring
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Jessica R. Anderson
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
| | - Amy J. Osborne
- School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
34
Vandenbon A. Evaluation of critical data processing steps for reliable prediction of gene co-expression from large collections of RNA-seq data. PLoS One 2022; 17:e0263344. [PMID: 35089979 PMCID: PMC8797241 DOI: 10.1371/journal.pone.0263344] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/22/2021] [Accepted: 01/16/2022] [Indexed: 11/19/2022] Open
Abstract
Motivation Gene co-expression analysis is an attractive tool for leveraging the enormous number of public RNA-seq datasets for the prediction of gene functions and regulatory mechanisms. However, the optimal data processing steps for the accurate prediction of gene co-expression from such large datasets remain unclear. In particular, the importance of batch effect correction is understudied. Results We processed RNA-seq data of 68 human and 76 mouse cell types and tissues using 50 different workflows into 7,200 genome-wide gene co-expression networks. We then conducted a systematic analysis of the factors that result in high-quality co-expression predictions, focusing on normalization, batch effect correction, and measure of correlation. We confirmed the key importance of high sample counts for high-quality predictions. However, choosing a suitable normalization approach and applying batch effect correction can further improve the quality of co-expression estimates, equivalent to a >80% and >40% increase in samples. In larger datasets, batch effect removal was equivalent to more than doubling the sample size. Finally, Pearson correlation appears more suitable than Spearman correlation, except for smaller datasets. Conclusion A key point for accurate prediction of gene co-expression is the collection of many samples. However, paying attention to data normalization, batch effects, and the measure of correlation can significantly improve the quality of co-expression estimates.
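To make the Pearson versus Spearman comparison in this abstract concrete, here is a minimal, standard-library sketch of both measures for a pair of expression vectors. The function names and the toy expression values are illustrative assumptions and are not taken from the paper's workflows.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up expression values: gene_b rises monotonically but non-linearly.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.0, 4.0, 9.0, 16.0, 25.0]
r_pearson = pearson(gene_a, gene_b)
r_spearman = spearman(gene_a, gene_b)
```

In this toy case Spearman sees a perfect monotonic relationship while Pearson is pulled below 1 by the curvature, which is the kind of difference the choice of correlation measure turns on.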
Affiliation(s)
- Alexis Vandenbon
- Institute for Frontier Life and Medical Sciences, Kyoto University, Kyoto, Japan
- Institute for Liberal Arts and Sciences, Kyoto University, Kyoto, Japan
35
Fujisawa TX, Nishitani S, Makita K, Yao A, Takiguchi S, Hamamura S, Shimada K, Okazawa H, Matsuzaki H, Tomoda A. Association of Epigenetic Differences Screened in a Few Cases of Monozygotic Twins Discordant for Attention-Deficit Hyperactivity Disorder With Brain Structures. Front Neurosci 2022; 15:799761. [PMID: 35145374 PMCID: PMC8823258 DOI: 10.3389/fnins.2021.799761] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2021] [Accepted: 12/16/2021] [Indexed: 11/13/2022] Open
Abstract
The present study examined the relationship between DNA methylation differences and variations in brain structures involved in the development of attention-deficit hyperactivity disorder (ADHD). First, we used monozygotic (MZ) twins discordant for ADHD (2 pairs, 4 individuals; 2 boys; mean age 12.5 years) to identify candidate DNA methylation sites involved in the development of ADHD. Next, we tried to replicate these candidates in a case-control study (ADHD: N = 18, 15 boys, mean age 10.0 years; Controls: N = 62, 40 boys, mean age 13.9 years). Finally, we examined how methylation rates at those sites relate to the degree of local structural alterations in regions where significant differences were observed between cases and controls. We identified 61 candidate DNA methylation sites involved in ADHD development in the two pairs of discordant MZ twins, among which elevated methylation at a site in the sortilin-related Vps10p domain containing receptor 2 (SorCS2) gene was replicated in the case-control study. We also observed that the ADHD group had significantly reduced gray matter volume (GMV) in the precentral and posterior orbital gyri compared to the control group and that this volume reduction was positively associated with SorCS2 methylation. Furthermore, the reduced GMV regions in children with ADHD are involved in language processing and emotional control, while SorCS2 methylation is also negatively associated with emotional behavioral problems in children. These results indicate that SorCS2 methylation might mediate a reduced GMV in the precentral and posterior orbital gyri and therefore influence the pathology of children with ADHD.
Affiliation(s)
- Takashi X. Fujisawa
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
- *Correspondence: Takashi X. Fujisawa,
| | - Shota Nishitani
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
| | - Kai Makita
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
| | - Akiko Yao
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
| | - Shinichiro Takiguchi
- Department of Child and Adolescent Psychological Medicine, University of Fukui Hospital, Fukui, Japan
| | - Shoko Hamamura
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
- Department of Child and Adolescent Psychological Medicine, University of Fukui Hospital, Fukui, Japan
| | - Koji Shimada
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
- Biomedical Imaging Research Center, University of Fukui, Fukui, Japan
| | - Hidehiko Okazawa
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
- Biomedical Imaging Research Center, University of Fukui, Fukui, Japan
| | - Hideo Matsuzaki
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
- Department of Child and Adolescent Psychological Medicine, University of Fukui Hospital, Fukui, Japan
| | - Akemi Tomoda
- Research Center for Child Mental Development, University of Fukui, Fukui, Japan
- Division of Developmental Higher Brain Functions, United Graduate School of Child Development, Osaka University, Kanazawa University, Hamamatsu University School of Medicine, Chiba University, and University of Fukui, Osaka, Japan
- Department of Child and Adolescent Psychological Medicine, University of Fukui Hospital, Fukui, Japan
36
Xia Q, Thompson JA, Koestler DC. Batch effect reduction of microarray data with dependent samples using an empirical Bayes approach (BRIDGE). Stat Appl Genet Mol Biol 2021; 20:101-119. [PMID: 34905304 PMCID: PMC9617207 DOI: 10.1515/sagmb-2021-0020] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/26/2021] [Accepted: 10/29/2021] [Indexed: 11/15/2022]
Abstract
Batch-effects present challenges in the analysis of high-throughput molecular data and are particularly problematic in longitudinal studies when interest lies in identifying genes/features whose expression changes over time, but time is confounded with batch. While many methods to correct for batch-effects exist, most assume independence across samples, an assumption that is unlikely to hold in longitudinal microarray studies. We propose Batch effect Reduction of mIcroarray data with Dependent samples usinG Empirical Bayes (BRIDGE), a three-step parametric empirical Bayes approach that leverages technical replicate samples profiled at multiple timepoints/batches, so-called "bridge samples", to inform batch-effect reduction/attenuation in longitudinal microarray studies. Extensive simulation studies and an analysis of a real biological data set were conducted to benchmark the performance of BRIDGE against both ComBat and longitudinal ComBat. Our results demonstrate that while all methods perform well in facilitating accurate estimates of time effects, BRIDGE outperforms both ComBat and longitudinal ComBat in the removal of batch-effects in data sets with bridging samples, and perhaps as a result, was observed to have improved statistical power for detecting genes with a time effect. BRIDGE demonstrated competitive performance in batch effect reduction of confounded longitudinal microarray studies, in both simulated and real data sets, and may serve as a useful preprocessing method for researchers conducting longitudinal microarray studies that include bridging samples.
Affiliation(s)
- Qing Xia
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160
| | - Jeffrey A. Thompson
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160
| | - Devin C. Koestler
- Department of Biostatistics & Data Science, University of Kansas Medical Center, Kansas City, KS 66160
37
Campagna MP, Xavier A, Lechner-Scott J, Maltby V, Scott RJ, Butzkueven H, Jokubaitis VG, Lea RA. Epigenome-wide association studies: current knowledge, strategies and recommendations. Clin Epigenetics 2021; 13:214. [PMID: 34863305 PMCID: PMC8645110 DOI: 10.1186/s13148-021-01200-8] [Citation(s) in RCA: 60] [Impact Index Per Article: 20.0] [Received: 05/25/2021] [Accepted: 11/19/2021] [Indexed: 02/06/2023] Open
Abstract
The aetiology and pathophysiology of complex diseases are driven by the interaction between genetic and environmental factors. The variability in risk and outcomes in these diseases is incompletely explained by genetic or environmental risk factors individually. Therefore, researchers are now exploring the epigenome, a biological interface at which genetics and the environment can interact. There is a growing body of evidence supporting the role of epigenetic mechanisms in complex disease pathophysiology. Epigenome-wide association studies (EWASes) investigate the association between a phenotype and epigenetic variants, most commonly DNA methylation. The decreasing cost of measuring epigenome-wide methylation and the increasing accessibility of bioinformatic pipelines have contributed to the rise in EWASes published in recent years. Here, we review the current literature on these EWASes and provide further recommendations and strategies for successfully conducting them. We have constrained our review to studies using methylation data, as this is the most studied epigenetic mechanism; microarray-based data, as whole-genome bisulphite sequencing remains prohibitively expensive for most laboratories; and blood-based studies, due to the non-invasiveness of peripheral blood collection and availability of archived DNA, as well as the accessibility of publicly available blood-cell-based methylation data. Further, we address multiple novel areas of EWAS analysis that have not been covered in previous reviews: (1) longitudinal study designs, (2) the chip analysis methylation pipeline (ChAMP), (3) differentially methylated region (DMR) identification paradigms, (4) methylation quantitative trait loci (methQTL) analysis, (5) methylation age analysis and (6) identifying cell-specific differential methylation from mixed cell data using statistical deconvolution.
Affiliation(s)
- Maria Pia Campagna
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia
| | - Alexandre Xavier
- Centre for Information Based Medicine, Hunter Medical Research Institute, Newcastle, Australia
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
| | - Jeannette Lechner-Scott
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
- Department of Neurology, Division of Medicine, John Hunter Hospital, Newcastle, Australia
| | - Vicky Maltby
- Centre for Information Based Medicine, Hunter Medical Research Institute, Newcastle, Australia
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
| | - Rodney J Scott
- Centre for Information Based Medicine, Hunter Medical Research Institute, Newcastle, Australia
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia
- Division of Molecular Medicine, New South Wales Health Pathology North, Newcastle, Australia
| | - Helmut Butzkueven
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia
- Department of Neurology, Alfred Health, Melbourne, Australia
| | - Vilija G Jokubaitis
- Department of Neuroscience, Central Clinical School, Monash University, Melbourne, Australia
- Department of Neurology, Alfred Health, Melbourne, Australia
| | - Rodney A Lea
- School of Biomedical Sciences and Pharmacy, University of Newcastle, Newcastle, Australia.
- Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane, Australia.
38
Zou Q, Wang X, Ren D, Hu B, Tang G, Zhang Y, Huang M, Pai RK, Buchanan DD, Win AK, Newcomb PA, Grady WM, Yu H, Luo Y. DNA methylation-based signature of CD8+ tumor-infiltrating lymphocytes enables evaluation of immune response and prognosis in colorectal cancer. J Immunother Cancer 2021; 9:jitc-2021-002671. [PMID: 34548385 PMCID: PMC8458312 DOI: 10.1136/jitc-2021-002671] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Accepted: 08/11/2021] [Indexed: 01/12/2023] Open
Abstract
Background Tumor-infiltrating lymphocytes (TILs), especially CD8+ TILs, can be used for predicting immunotherapy responsiveness and survival outcome. However, the evaluation of CD8+ TILs currently relies on histopathological methodology with high variability. We therefore aimed to develop a DNA methylation signature for CD8+ TILs (CD8+ MeTIL) that could evaluate immune response and prognosis in colorectal cancer (CRC). Methods A CD8+ MeTIL signature score was constructed by using CD8+ T cell-specific differentially methylated positions (DMPs) that were identified from Illumina EPIC methylation arrays. Immune cells, colon epithelial cells, and two CRC cohorts (n=282 and 335) were used to develop a PCR-based assay for quantitative analysis of DNA methylation at single-base resolution (QASM) to determine the CD8+ MeTIL signature score. Results Three CD8+ T cell-specific DMPs were identified to construct the CD8+ MeTIL signature score, which discriminated strongly between CD8+ T cells and other cells. The QASM assay we developed for CD8+ MeTIL markers could measure CD8+ TIL distributions in a fully quantitative, accurate, and simple manner. The CD8+ MeTIL score determined by the QASM assay showed a strong association with histopathology-based CD8+ TIL counts and a gene expression-based immune marker. Furthermore, a low CD8+ MeTIL score (enriched CD8+ TILs) was associated with MSI-H tumors and predicted better survival in CRC cohorts. Conclusions This study developed a quantitative DNA methylation-based signature that reliably evaluated CD8+ TILs and prognosis in CRC. This approach has the potential to be a tool for investigations on CD8+ TILs and a biomarker for therapeutic approaches, including immunotherapy.
Affiliation(s)
- Qi Zou
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.,Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.,Department of Colorectal and Anal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Xiaolin Wang
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Donglin Ren
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.,Department of Colorectal and Anal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Bang Hu
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.,Department of Colorectal and Anal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Guannan Tang
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Yu Zhang
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Meijin Huang
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.,Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Rish K Pai
- Department of Laboratory Medicine and Pathology, Mayo Clinic Arizona, Scottsdale, Arizona, USA
| | - Daniel D Buchanan
- Colorectal Oncogenomics Group, Department of Clinical Pathology, The University of Melbourne, Parkville, Victoria, Australia.,University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, Parkville, Victoria, Australia.,Genomic Medicine and Familial Cancer Centre, The Royal Melbourne Hospital, Parkville, Victoria, Australia
| | - Aung Ko Win
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Victoria, Australia
| | - Polly A Newcomb
- Department of Epidemiology, University of Washington School of Public Health, Seattle, Washington, USA.,Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
| | - William M Grady
- Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA.,Department of Medicine, University of Washington School of Medicine, Seattle, Washington, USA
| | - Huichuan Yu
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
| | - Yanxin Luo
- Guangdong Institute of Gastroenterology, Guangdong Provincial Key Laboratory of Colorectal and Pelvic Floor Disease, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China.,Department of Colorectal Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
39
Vanderlinden LA, Johnson RK, Carry PM, Dong F, DeMeo DL, Yang IV, Norris JM, Kechris K. An effective processing pipeline for harmonizing DNA methylation data from Illumina's 450K and EPIC platforms for epidemiological studies. BMC Res Notes 2021; 14:352. [PMID: 34496950 PMCID: PMC8424820 DOI: 10.1186/s13104-021-05741-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/25/2020] [Accepted: 08/16/2021] [Indexed: 02/06/2023] Open
Abstract
OBJECTIVE Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Updates in technology over time create challenges for data harmonization within and between studies, many of which obtained data from both the older 450K and the newer EPIC platforms. The pre-processing pipeline for DNA methylation is not trivial, and it influences the downstream analyses. Incorporating different platforms adds a new level of technical variability that has not yet been taken into account by recommended pipelines. Our study evaluated how well various tools harmonize data from different platform versions at each step of the pre-processing pipeline, including quality control (QC), normalization, batch effect adjustment, and genomic inflation. We illustrate our novel approach using 450K and EPIC data from the Diabetes Autoimmunity Study in the Young (DAISY) prospective cohort. RESULTS We found that normalization and probe filtering had the biggest effect on data harmonization. Employing a meta-analysis was an effective and easily executable method for accounting for platform variability. Correcting for genomic inflation also helped with harmonization. We present guidelines for studies seeking to harmonize data from the 450K and EPIC platforms, which include the use of technical replicates for evaluating numerous pre-processing steps and employing a meta-analysis.
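The genomic inflation check mentioned in this abstract is often summarized by the inflation factor lambda: the median of the observed 1-degree-of-freedom chi-square statistics divided by the expected median under the null (about 0.4549). The sketch below assumes the association results are available as z-scores; the function name and input values are hypothetical and are not taken from the DAISY analysis, and the abstract does not say how the authors computed or corrected inflation.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores):
    """Lambda = median(z^2) / expected null median; about 1 means no inflation."""
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


# Made-up z-scores for illustration only.
z = [-1.2, 0.3, 0.9, -0.5, 2.1, 0.1, -0.7]
lam = genomic_inflation(z)
```

A lambda well above 1 suggests systematic inflation of test statistics, for example from unmodeled batch or platform effects.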
Affiliation(s)
- Lauren A Vanderlinden
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Randi K Johnson
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Patrick M Carry
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Fran Dong
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Dawn L DeMeo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Ivana V Yang
- School of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
| | - Jill M Norris
- Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Katerina Kechris
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.
| |
|
40
|
Fang ZY, Lin CX, Xu YP, Li HD, Xu QS. REBET: a method to determine the number of cell clusters based on batch effect removal. Brief Bioinform 2021; 22:6299206. [PMID: 34131702 DOI: 10.1093/bib/bbab204] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/02/2021] [Revised: 04/20/2021] [Accepted: 03/12/2021] [Indexed: 01/01/2023] Open
Abstract
In single-cell RNA-seq (scRNA-seq) data analysis, a fundamental problem is to determine the number of cell clusters based on the gene expression profiles. However, the performance of current methods is still far from satisfactory, presumably due to their limitations in capturing the expression variability among cell clusters. Batch effects represent the undesired variability between data measured in different batches, and they occur when data are obtained from different labs or protocols. Motivated by the practice of batch effect removal, we considered cell clusters as batches. We hypothesized that the number of cell clusters (i.e. batches) could be correctly determined if the variances among clusters (i.e. batch effects) were removed. We developed a new method, namely, removal of batch effect and testing (REBET), for determining the number of cell clusters. First, cells are partitioned into k clusters. Second, the batch effects among these k clusters are removed. Third, the quality of batch effect removal is evaluated with the average range of normalized mutual information (ARNMI), which measures how uniformly the cells are mixed after batch effect removal. By testing a range of k values, the k value that corresponds to the lowest ARNMI is determined to be the optimal number of clusters. We compared REBET with state-of-the-art methods on 32 simulated datasets and 14 published scRNA-seq datasets. The results show that REBET can accurately and robustly estimate the number of cell clusters and outperforms existing methods. Contact: H.D.L. (hongdong@csu.edu.cn) or Q.S.X. (qsxu@csu.edu.cn).
Affiliation(s)
- Zhao-Yu Fang
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, P.R. China
| | - Cui-Xiang Lin
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China.,School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Yun-Pei Xu
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China.,School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Hong-Dong Li
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China.,School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
| | - Qing-Song Xu
- School of Mathematics and Statistics, Central South University, Changsha, Hunan 410083, P.R. China
| |
|
41
|
Inkster AM, Yuan V, Konwar C, Matthews AM, Brown CJ, Robinson WP. A cross-cohort analysis of autosomal DNA methylation sex differences in the term placenta. Biol Sex Differ 2021; 12:38. [PMID: 34044884 PMCID: PMC8162041 DOI: 10.1186/s13293-021-00381-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 03/05/2021] [Accepted: 05/17/2021] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Human placental DNA methylation (DNAme) data is a valuable resource for studying sex differences during gestation, as DNAme profiles after delivery reflect the cumulative effects of gene expression patterns and exposures across gestation. Here, we present an analysis of sex differences in autosomal DNAme in the uncomplicated term placenta (n = 343) using the Illumina 450K array. RESULTS At a false discovery rate < 0.05 and a mean sex difference in DNAme beta value of > 0.10, we identified 162 autosomal CpG sites that were differentially methylated by sex and replicated in an independent cohort of samples (n = 293). Several of these differentially methylated CpG sites were part of larger correlated regions of sex differential DNAme. Although global DNAme levels did not differ by sex, the majority of significantly differentially methylated CpGs were more highly methylated in male placentae, the opposite of what is seen in differential methylation analyses of somatic tissues. Patterns of autosomal DNAme at these 162 CpGs were significantly associated with maternal age (in males) and newborn birthweight standard deviation (in females). CONCLUSIONS Our results provide a comprehensive analysis of sex differences in autosomal DNAme in the term human placenta. We report a list of high-confidence autosomal sex-associated differentially methylated CpGs and identify several key features of these loci that suggest their relevance to sex differences observed in normative and complicated pregnancies.
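The two selection criteria named in the results (a false discovery rate below 0.05 and a mean beta-value difference above 0.10) can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg procedure for FDR control and an absolute difference for the effect-size cutoff; the abstract specifies neither. The site identifiers and values are invented for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_sites(site_ids, p_values, delta_betas, fdr=0.05, min_delta=0.10):
    """Keep sites passing both the FDR and the effect-size threshold."""
    q_values = benjamini_hochberg(p_values)
    return [
        site
        for site, q, delta in zip(site_ids, q_values, delta_betas)
        if q < fdr and abs(delta) > min_delta
    ]


# Hypothetical CpG sites with raw p-values and mean beta differences.
sites = ["cg_a", "cg_b", "cg_c", "cg_d"]
p_values = [0.001, 0.02, 0.03, 0.5]
delta_betas = [0.15, 0.05, -0.12, 0.30]

hits = select_sites(sites, p_values, delta_betas)
print(hits)
```

In this toy input, `cg_b` passes the FDR cutoff but fails the effect-size cutoff, while `cg_d` has a large difference but no statistical support, so only two sites are retained.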
Affiliation(s)
- Amy M. Inkster
- BC Children’s Hospital Research Institute, 950 W 28th Ave, Vancouver, V6H 3N1 Canada
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1 Canada
| | - Victor Yuan
- BC Children’s Hospital Research Institute, 950 W 28th Ave, Vancouver, V6H 3N1 Canada
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1 Canada
| | - Chaini Konwar
- BC Children’s Hospital Research Institute, 950 W 28th Ave, Vancouver, V6H 3N1 Canada
- Centre for Molecular Medicine and Therapeutics, 950 W 28th Ave, Vancouver, V6H 3N1 Canada
| | - Allison M. Matthews
- BC Children’s Hospital Research Institute, 950 W 28th Ave, Vancouver, V6H 3N1 Canada
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1 Canada
- Centre for Molecular Medicine and Therapeutics, 950 W 28th Ave, Vancouver, V6H 3N1 Canada
- Department of Pathology & Laboratory Medicine, University of British Columbia, 2211 Wesbrook Mall, Vancouver, V6T 1Z7 Canada
| | - Carolyn J. Brown
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1 Canada
| | - Wendy P. Robinson
- BC Children’s Hospital Research Institute, 950 W 28th Ave, Vancouver, V6H 3N1 Canada
- Department of Medical Genetics, University of British Columbia, 4500 Oak St, Vancouver, V6H 3N1 Canada
| |
|
42
|
Abstract
Aim: Social scientists have placed particularly high expectations on the study of epigenomics to explain how exposure to adverse social factors like poverty, child maltreatment and racism - particularly early in childhood - might contribute to complex diseases. However, progress has stalled, reflecting many of the same challenges faced in genomics, including overhype, lack of diversity in samples, limited replication and difficulty interpreting significance of findings. Materials & methods: This review focuses on the future of social epigenomics by discussing progress made, ongoing methodological and analytical challenges and suggestions for improvement. Results & conclusion: Recommendations include more diverse sample types, cross-cultural, longitudinal and multi-generational studies. True integration of social and epigenomic data will require increased access to both data types in publicly available databases, enhanced data integration frameworks, and more collaborative efforts between social scientists and geneticists.
Affiliation(s)
- Amy L Non
- Department of Anthropology at the University of California, San Diego, 92093 CA, USA
| |
|
43
|
Hettegger P, Vierlinger K, Weinhaeusel A. Random rotation for identifying differentially expressed genes with linear models following batch effect correction. Bioinformatics 2021; 37:2142-2149. [PMID: 33523104 DOI: 10.1093/bioinformatics/btab063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2020] [Revised: 01/11/2021] [Accepted: 01/27/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Data generated from high-throughput technologies such as sequencing, microarray and bead-chip technologies are unavoidably affected by batch effects. Considerable effort has been put into developing methods for correcting these effects. Often, batch effect correction and hypothesis testing cannot be done with one single model, but are done successively with separate models in data analysis pipelines. This potentially leads to biased p-values or false discovery rates due to the influence of batch effect correction on the data. RESULTS We present a novel approach for estimating null distributions of test statistics in data analysis pipelines where batch effect correction is followed by linear model analysis. The approach is based on generating simulated datasets by random rotation and thereby adequately retains the dependence structure of genes. This allows estimating null distributions of dependent test statistics and thus the calculation of resampling-based p-values and false discovery rates following batch effect correction while maintaining the alpha level. AVAILABILITY The described methods are implemented as the randRotation package on Bioconductor: https://bioconductor.org/packages/randRotation/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Peter Hettegger
- Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology, Vienna, 1220, Austria
| | - Klemens Vierlinger
- Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology, Vienna, 1220, Austria
| | - Andreas Weinhaeusel
- Competence Unit Molecular Diagnostics, Health and Environment Department, Austrian Institute of Technology, Vienna, 1220, Austria
| |
|
44
|
Transcriptome Profiling Analyses in Psoriasis: A Dynamic Contribution of Keratinocytes to the Pathogenesis. Genes (Basel) 2020; 11:genes11101155. [PMID: 33007857 PMCID: PMC7600703 DOI: 10.3390/genes11101155] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 09/09/2020] [Revised: 09/28/2020] [Accepted: 09/29/2020] [Indexed: 02/08/2023] Open
Abstract
Psoriasis is an immune-mediated inflammatory skin disease with a complex etiology involving environmental and genetic factors. Better insight into the related genomic alterations helps in designing precise therapies that lead to better treatment outcomes. Gene expression in psoriasis can provide relevant information about the altered expression of mRNA transcripts, thus giving new insights into the disease onset. Techniques for transcriptome analyses, such as microarray and RNA sequencing (RNA-seq), are relevant tools for the discovery of new biomarkers as well as new therapeutic targets. This review summarizes the findings related to the contribution of keratinocytes to the pathogenesis of psoriasis through an in-depth review of studies that have examined psoriatic transcriptomes in recent years. It also provides valuable information on reconstructed 3D psoriatic skin models using cells isolated from psoriatic patients for transcriptomic studies.
|
45
|
Microarray Normalization Revisited for Reproducible Breast Cancer Biomarkers. BIOMED RESEARCH INTERNATIONAL 2020; 2020:1363827. [PMID: 32832541 PMCID: PMC7428878 DOI: 10.1155/2020/1363827] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/04/2019] [Revised: 03/30/2020] [Accepted: 05/11/2020] [Indexed: 11/21/2022]
Abstract
Precision medicine for breast cancer relies on biomarkers to select therapies. However, the reliability of biomarkers drawn from gene expression arrays has been questioned and calls for reassessment, in particular for large datasets. We revisit widely used data-normalization procedures and evaluate differences in outcome in order to pinpoint the most reliable reprocessing methods on which biomarkers can be based. We generated a database of 3753 breast cancer patients from 38 studies by downloading and curating patient samples from NCBI-GEO. As gene-expression biomarkers, we selected the assessment of receptor status and breast cancer subtype classification. Each normalization procedure is applied separately, and biomarkers are then evaluated for each patient. Differences between normalization pipelines are quantified as the percentage of patients whose outcomes differ between pipelines. Some normalization procedures lead to quite consistent biomarkers, differing in only 1-2% of patients. Other normalization procedures, some of which have been used in many clinical studies, end up with troubling discrepancies (10% and more). A good deal of the doubt regarding the reliability of microarrays may be rooted in the haphazard application of inadequate preprocessing pipelines. Several modes of batch correction are evaluated regarding a possible improvement of receptor prediction from gene expression versus the gold standard of immunohistochemistry. Finally, we nominate those normalization methods yielding consistent and trustworthy results. Adequate bioinformatics data preprocessing is crucial for any subsequent statistics to arrive at trustworthy results. We conclude with a suggestion for future bioinformatics development to further increase the reliability of cancer biomarkers.
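The comparison metric described here, the percentage of patients whose biomarker outcome differs between two normalization pipelines, reduces to a simple discordance count. The Python sketch below assumes one categorical call per patient per pipeline, listed in the same patient order; the function name and the receptor calls are hypothetical.

```python
def discordance_percent(calls_a, calls_b):
    """Percentage of patients whose biomarker call differs between pipelines."""
    if len(calls_a) != len(calls_b):
        raise ValueError("both pipelines must cover the same patients")
    if not calls_a:
        raise ValueError("at least one patient is required")
    differing = sum(1 for a, b in zip(calls_a, calls_b) if a != b)
    return 100.0 * differing / len(calls_a)


# Hypothetical receptor-status calls for five patients under two pipelines.
pipeline_a = ["ER+", "ER+", "ER-", "ER+", "ER-"]
pipeline_b = ["ER+", "ER-", "ER-", "ER+", "ER-"]

pct = discordance_percent(pipeline_a, pipeline_b)
print(f"{pct:.1f}% of patients change status between pipelines")
```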
|
46
|
Zindler T, Frieling H, Neyazi A, Bleich S, Friedel E. Simulating ComBat: how batch correction can lead to the systematic introduction of false positive results in DNA methylation microarray studies. BMC Bioinformatics 2020; 21:271. [PMID: 32605541 PMCID: PMC7328269 DOI: 10.1186/s12859-020-03559-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 01/24/2020] [Accepted: 05/26/2020] [Indexed: 12/04/2022] Open
Abstract
Background Systematic technical effects, also called batch effects, are a considerable challenge when analyzing DNA methylation (DNAm) microarray data, because they can lead to false results when confounded with the variable of interest. Methods to correct these batch effects are error-prone, as previous findings have shown. Results Here, we demonstrate how using the R function ComBat to correct simulated Infinium HumanMethylation450 BeadChip (450 K) and Infinium MethylationEPIC BeadChip Kit (EPIC) DNAm data can lead to a large number of false positive results under certain conditions. We further provide a detailed assessment of the consequences for the highly relevant problem of p-value inflation with subsequent false positive findings after application of the frequently used ComBat method. Using ComBat to correct for batch effects in randomly generated samples produced alarming numbers of false discovery rate (FDR)-corrected and Bonferroni-corrected (BF) false positive results in unbalanced as well as in balanced sample distributions in terms of the relation between the outcome variable of interest and the technical position of the sample during the probe measurement. Both sample size and number of batch factors (e.g. number of chips) were systematically simulated to assess the probability of false positive findings. The effect of sample size was simulated using n = 48 up to n = 768 randomly generated samples. Increasing the number of corrected factors led to an exponential increase in the number of false positive signals. Increasing the number of samples reduced, but did not completely prevent, this effect. Conclusions Using the approach described, we demonstrate that using ComBat for batch correction in DNAm data can lead to false positive results under certain conditions and sample distributions. Our results are thus contrary to previous publications that consider a balanced sample distribution unproblematic when using ComBat.
We do not claim completeness in terms of reporting all technical conditions and possible solutions of the occurring problems as we approach the problem from a clinician’s perspective and not from that of a computer scientist. With our approach of simulating data, we provide readers with a simple method to assess the probability of false positive findings in DNAm microarray data analysis pipelines.
Affiliation(s)
- Tristan Zindler
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hannover, Germany.
| | - Helge Frieling
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hannover, Germany
| | - Alexandra Neyazi
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hannover, Germany
| | - Stefan Bleich
- Department of Psychiatry, Social Psychiatry and Psychotherapy, Hannover Medical School, Hannover, Germany
| | - Eva Friedel
- Department of Psychiatry and Psychotherapy, Charité Campus Mitte (CCM), Charité-Universitätsmedizin Berlin, Berlin, Germany.,Berlin Institute of Health (BIH), 10178, Berlin, Germany
| |
|
47
|
Judge M, Parker E, Naniche D, Le Souëf P. Gene Expression: the Key to Understanding HIV-1 Infection? Microbiol Mol Biol Rev 2020; 84:e00080-19. [PMID: 32404327 PMCID: PMC7233484 DOI: 10.1128/mmbr.00080-19] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022] Open
Abstract
Gene expression profiling of the host response to HIV infection has promised to fill the gaps in our knowledge and provide new insights toward a vaccine and a cure. However, despite 20 years of research, the biggest questions remain unanswered. A literature review identified 62 studies examining gene expression dysregulation in samples from individuals living with HIV. Changes in gene expression were dependent on cell/tissue type, stage of infection, viremia, and treatment status. Some cell types, notably CD4+ T cells, exhibit upregulation of cell cycle, interferon-related, and apoptosis genes consistent with depletion. Others, including CD8+ T cells and natural killer cells, exhibit perturbed function in the absence of direct infection with HIV. Dysregulation is greatest during acute infection. Differences in study design and data reporting limit comparability of existing research and do not as yet provide a coherent overview of gene expression in HIV. This review outlines the extraordinarily complex host response to HIV and offers recommendations to realize the full potential of HIV host transcriptomics.
Affiliation(s)
- Melinda Judge
- Faculty of Health and Medical Sciences, University of Western Australia, Perth, Australia
| | - Erica Parker
- Faculty of Health and Medical Sciences, University of Western Australia, Perth, Australia
| | - Denise Naniche
- Barcelona Institute for Global Health (ISGlobal), Barcelona, Spain
- Centro de Investigação de Saúde de Manhiça (CISM), Manhiça, Mozambique
| | - Peter Le Souëf
- Faculty of Health and Medical Sciences, University of Western Australia, Perth, Australia
| |
|
48
|
Salgado C, Gruis N, Heijmans BT, Oosting J, van Doorn R. Genome-wide analysis of constitutional DNA methylation in familial melanoma. Clin Epigenetics 2020; 12:43. [PMID: 32143689 PMCID: PMC7060565 DOI: 10.1186/s13148-020-00831-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/31/2019] [Accepted: 02/20/2020] [Indexed: 12/26/2022] Open
Abstract
Background Heritable epigenetic alterations have been proposed as an explanation for familial clustering of melanoma. Here we performed genome-wide DNA methylation analysis on affected family members not carrying pathogenic variants in established melanoma susceptibility genes, compared with healthy volunteers. Results All melanoma susceptibility genes showed the absence of epimutations in familial melanoma patients, and no loss of imprinting was detected. Unbiased genome-wide DNA methylation analysis revealed significantly different levels of methylation in single CpG sites. The methylation level differences were small and did not affect reported tumour predisposition genes. Conclusion Our results provide no support for heritable epimutations as a cause of familial melanoma.
Affiliation(s)
- Catarina Salgado
- Department of Dermatology, Leiden University Medical Center, Leiden, PO Box 9600, 2300 RC, Leiden, The Netherlands
| | - Nelleke Gruis
- Department of Dermatology, Leiden University Medical Center, Leiden, PO Box 9600, 2300 RC, Leiden, The Netherlands
| | - Bastiaan T Heijmans
- Molecular Epidemiology, Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, The Netherlands
| | - Jan Oosting
- Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands
| | - Remco van Doorn
- Department of Dermatology, Leiden University Medical Center, Leiden, PO Box 9600, 2300 RC, Leiden, The Netherlands.
| |
|
49
|
Yu F, Qiu C, Xu C, Tian Q, Zhao LJ, Wu L, Deng HW, Shen H. Mendelian Randomization Identifies CpG Methylation Sites With Mediation Effects for Genetic Influences on BMD in Peripheral Blood Monocytes. Front Genet 2020; 11:60. [PMID: 32180791 PMCID: PMC7059767 DOI: 10.3389/fgene.2020.00060] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/14/2019] [Accepted: 01/17/2020] [Indexed: 12/18/2022] Open
Abstract
Osteoporosis is mainly characterized by low bone mineral density (BMD) and is an increasingly serious public health concern. DNA methylation is a major epigenetic mechanism that may contribute to the variation in BMD and may mediate the effects of genetic and environmental factors of osteoporosis. In this study, we performed an epigenome-wide DNA methylation analysis in peripheral blood monocytes of 118 Caucasian women with extreme BMD values. Further, we developed and implemented a novel analytical framework that integrates Mendelian randomization with genetic fine mapping and colocalization to evaluate the causal relationships between DNA methylation and BMD phenotype. We identified 2,188 differentially methylated CpGs (DMCs) between the low and high BMD groups and distinguished 30 DMCs that may mediate the genetic effects on BMD. The causal relationship was further confirmed by eliminating the possibility of horizontal pleiotropy, linkage effect and reverse causality. The fine-mapping analysis determined 25 causal variants that are most likely to affect the methylation levels at these mediator DMCs. The majority of the causal methylation quantitative loci and DMCs reside within cell type-specific histone mark peaks, enhancers, promoters, promoter flanking regions and CTCF binding sites, supporting the regulatory potentials of these loci. The established causal pathways from genetic variant to BMD phenotype mediated by DNA methylation provide a gene list to aid in designing future functional studies and lead to a better understanding of the genetic and epigenetic mechanisms underlying the variation of BMD.
Affiliation(s)
- Fangtang Yu
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States
| | - Chuan Qiu
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States
| | - Chao Xu
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States
| | - Qing Tian
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States
| | - Lan-Juan Zhao
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States
| | - Li Wu
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States
| | - Hong-Wen Deng
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States.,School of Basic Medical Science, Central South University, Changsha, China
| | - Hui Shen
- Center for Bioinformatics and Genomics, Department of Global Biostatistics and Data Science, School of Public Health and Tropical Medicine, Tulane University, New Orleans, LA, United States
| |
|
50
|
Machine learning workflows to estimate class probabilities for precision cancer diagnostics on DNA methylation microarray data. Nat Protoc 2020; 15:479-512. [PMID: 31932775 DOI: 10.1038/s41596-019-0251-6] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 02/05/2019] [Accepted: 10/04/2019] [Indexed: 01/01/2023]
Abstract
DNA methylation data-based precision cancer diagnostics is emerging as the state of the art for molecular tumor classification. Standards for choosing statistical methods with regard to well-calibrated probability estimates for these typically highly multiclass classification tasks are still lacking. To support this choice, we evaluated well-established machine learning (ML) classifiers including random forests (RFs), elastic net (ELNET), support vector machines (SVMs) and boosted trees in combination with post-processing algorithms and developed ML workflows that allow for unbiased class probability (CP) estimation. Calibrators included ridge-penalized multinomial logistic regression (MR) and Platt scaling by fitting logistic regression (LR) and Firth's penalized LR. We compared these workflows on a recently published brain tumor 450k DNA methylation cohort of 2,801 samples with 91 diagnostic categories using a 5 × 5-fold nested cross-validation scheme and demonstrated their generalizability on external data from The Cancer Genome Atlas. ELNET was the top stand-alone classifier with the best calibration profiles. The best overall two-stage workflow was MR-calibrated SVM with linear kernels closely followed by ridge-calibrated tuned RF. For calibration, MR was the most effective regardless of the primary classifier. The protocols developed as a result of these comparisons provide valuable guidance on choosing ML workflows and their tuning to generate well-calibrated CP estimates for precision diagnostics using DNA methylation data. Computation times vary depending on the ML algorithm from <15 min to 5 d using multi-core desktop PCs. Detailed scripts in the open-source R language are freely available on GitHub, targeting users with intermediate experience in bioinformatics and statistics and using R with Bioconductor extensions.
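Of the calibrators listed, Platt scaling is the simplest to illustrate: a logistic regression is fitted to the raw scores of a primary classifier so that they map to probabilities. The Python sketch below is a simplified binary version fitted by plain gradient descent, without the target smoothing of Platt's original formulation and without the multiclass handling the protocol requires. It is not the authors' R implementation, and the scores and labels are invented.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, learning_rate=0.1, epochs=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = 0.0
        grad_b = 0.0
        for score, label in zip(scores, labels):
            error = sigmoid(a * score + b) - label
            grad_a += error * score
            grad_b += error
        a -= learning_rate * grad_a / n
        b -= learning_rate * grad_b / n
    return a, b


# Hypothetical raw classifier scores (e.g. SVM decision values) and labels.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
labels = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
calibrated = [sigmoid(a * s + b) for s in scores]
for s, p in zip(scores, calibrated):
    print(f"score {s:+.1f} -> calibrated probability {p:.3f}")
```

In practice the calibrator would be fitted on held-out scores, which is the role the nested cross-validation scheme in the abstract appears to serve; fitting and evaluating on the same samples, as in this toy example, would overstate calibration quality.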
|