1
Li A, Mueller A, English B, Arena A, Vera D, Kane AE, Sinclair DA. Novel feature selection methods for construction of accurate epigenetic clocks. PLoS Comput Biol 2022; 18:e1009938. [PMID: 35984867 PMCID: PMC9432708 DOI: 10.1371/journal.pcbi.1009938] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/18/2022] [Revised: 08/31/2022] [Accepted: 07/11/2022] [Indexed: 11/22/2022]
Abstract
Epigenetic clocks allow us to accurately predict the age and future health of individuals based on the methylation status of specific CpG sites in the genome and are a powerful tool to measure the effectiveness of longevity interventions. There is a growing need for methods to efficiently construct epigenetic clocks. The most common approach is to create clocks using elastic net regression modelling of all measured CpG sites, without first identifying specific features or CpGs of interest. The addition of feature selection approaches provides the opportunity to optimise the identification of predictive CpG sites. Here, we apply novel feature selection methods and combinatorial approaches including newly adapted neural networks, genetic algorithms, and 'chained' combinations. Human whole blood methylation data of ~470,000 CpGs was used to develop clocks that predict age with R² correlation scores of greater than 0.73, the most predictive of which uses 35 CpG sites for an R² correlation score of 0.87. The five most frequent sites across all clocks were modelled to build a clock with an R² correlation score of 0.83. These two clocks are validated on two external datasets where they maintain excellent predictive accuracy. When compared with three published epigenetic clocks (Hannum, Horvath, Weidner) also applied to these validation datasets, our clocks outperformed all three models. We identified gene regulatory regions associated with selected CpGs as possible targets for future aging studies. Thus, our feature selection algorithms build accurate, generalizable clocks with a low number of CpG sites, providing important tools for the field.
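As a rough illustration of how a clock built on a handful of CpG sites can be scored, the sketch below applies a linear model to methylation beta values and computes R². It assumes the reported score is the standard coefficient of determination, which the abstract does not state explicitly. The function names, weights, and sample values are invented for illustration and are not taken from the paper.

```python
def predict_age(intercept, weights, betas):
    """Linear clock: intercept plus weighted sum of CpG beta values."""
    if len(weights) != len(betas):
        raise ValueError("one weight is needed per CpG site")
    return intercept + sum(w * b for w, b in zip(weights, betas))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical five-site clock and three made-up samples (not real data).
INTERCEPT = 20.0
WEIGHTS = [60.0, -35.0, 25.0, 40.0, -15.0]
SAMPLES = [
    [0.30, 0.70, 0.40, 0.20, 0.60],
    [0.50, 0.55, 0.50, 0.35, 0.50],
    [0.70, 0.40, 0.65, 0.55, 0.35],
]
KNOWN_AGES = [25.0, 45.0, 70.0]

predicted_ages = [predict_age(INTERCEPT, WEIGHTS, s) for s in SAMPLES]
score = r_squared(KNOWN_AGES, predicted_ages)
```

In practice the weights would come from a fitted regression (the abstract mentions elastic net as the common choice), and the score would be computed on held-out samples rather than the training data.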
Affiliation(s)
- Adam Li
- Blavatnik Institute, Dept. of Genetics, Paul F. Glenn Center for Biology of Aging Research at Harvard Medical School, Boston, Massachusetts, United States of America
- Amber Mueller
- Blavatnik Institute, Dept. of Genetics, Paul F. Glenn Center for Biology of Aging Research at Harvard Medical School, Boston, Massachusetts, United States of America
- Brad English
- Blavatnik Institute, Dept. of Genetics, Paul F. Glenn Center for Biology of Aging Research at Harvard Medical School, Boston, Massachusetts, United States of America
- Anthony Arena
- Blavatnik Institute, Dept. of Genetics, Paul F. Glenn Center for Biology of Aging Research at Harvard Medical School, Boston, Massachusetts, United States of America
- Daniel Vera
- Blavatnik Institute, Dept. of Genetics, Paul F. Glenn Center for Biology of Aging Research at Harvard Medical School, Boston, Massachusetts, United States of America
- Alice E. Kane
- Blavatnik Institute, Dept. of Genetics, Paul F. Glenn Center for Biology of Aging Research at Harvard Medical School, Boston, Massachusetts, United States of America
- David A. Sinclair
- Blavatnik Institute, Dept. of Genetics, Paul F. Glenn Center for Biology of Aging Research at Harvard Medical School, Boston, Massachusetts, United States of America
2
Bai L, Sun H, Jiang W, Yang L, Liu G, Zhao X, Hu H, Wang J, Gao S. DNA methylation and histone acetylation are involved in Wnt10b expression during the secondary hair follicle cycle in Angora rabbits. J Anim Physiol Anim Nutr (Berl) 2021; 105:599-609. [PMID: 33404138 DOI: 10.1111/jpn.13481] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/03/2020] [Revised: 09/03/2020] [Accepted: 11/02/2020] [Indexed: 12/25/2022]
Abstract
Secondary hair follicles (SHFs) in the Angora rabbit exhibit classic cyclic hair development, but the multiple molecular signals involved in hair cycling are yet to be explored in detail. In the present study, we investigated the expression pattern, methylation and histone H3 acetylation status of Wnt10b, as a molecular signal participating in hair cycling, during the SHF cycle in the Angora rabbit. Expression of Wnt10b at the anagen phase was significantly higher than that at both the telogen and catagen phases, suggesting that Wnt10b might serve as a critical activator during cyclic transition of SHFs. Methylation frequency of the fifth CpG site (CpG5-175 bp) in CpG islands at the anagen phase was lower than that at both the catagen and telogen phases. The methylation status of the CpG5 site was negatively correlated with Wnt10b expression. This indicated that the methylation of CpG5 might participate in Wnt10b transcriptional suppression in SHFs. Furthermore, histone H3 acetylation levels in the regions −256 to −11 bp and 98 to 361 bp were significantly lower at both the catagen and telogen phases than at the anagen phase. The histone H3 acetylation level was significantly positively correlated with Wnt10b expression. This confirmed that histone acetylation was likely involved in upregulating Wnt10b transcription in SHFs. Additionally, potential binding to the transcription factors ZF57 and HDBP was predicted within the CpG5 site. In conclusion, our findings reveal the epigenetic mechanism of Wnt10b transcription and provide a new insight into epigenetic regulation during the SHF cycle in the Angora rabbit.
Affiliation(s)
- Liya Bai
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Haitao Sun
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Wenxue Jiang
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Liping Yang
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Gongyan Liu
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Xueyan Zhao
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Hongmei Hu
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Jianying Wang
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Shuxia Gao
- Shandong Provincial Key Laboratory of Animal Disease Control & Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
3
Choi J, Chae H. methCancer-gen: a DNA methylome dataset generator for user-specified cancer type based on conditional variational autoencoder. BMC Bioinformatics 2020; 21:181. [PMID: 32393170 PMCID: PMC7216580 DOI: 10.1186/s12859-020-3516-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/10/2019] [Accepted: 04/29/2020] [Indexed: 12/31/2022]
Abstract
Background: Recently, DNA methylation has drawn great attention due to its strong correlation with abnormal gene activities and its informative representation of cancer status. As a number of studies focus on DNA methylation signatures in cancer, demand for publicly available methylome datasets has increased. To satisfy this demand, large-scale projects were launched to discover biological insights into cancer, providing collections of datasets. However, public cancer data, especially for certain cancer types, are still too limited for use in research. Several simulation tools for producing epigenetic datasets have been introduced to alleviate the issue; still, to date, generation of datasets for a user-specified cancer type has not been proposed. Results: In this paper, we present methCancer-gen, a tool for generating DNA methylome datasets for a specified cancer type. Employing a conditional variational autoencoder, a neural network-based generative model, it estimates the conditional distribution with latent variables and data, and generates samples for the specified cancer type. Conclusions: To evaluate the simulation performance of methCancer-gen for the user-specified cancer type, our proposed model was compared to a benchmark method; it successfully reproduced cancer type-wise data with high accuracy, helping to alleviate the lack of condition-specific data. methCancer-gen is publicly available at https://github.com/cbi-bioinfo/methCancer-gen.
Affiliation(s)
- Joungmin Choi
- Division of Computer Science, Sookmyung Women's University, Seoul, Republic of Korea
- Heejoon Chae
- Division of Computer Science, Sookmyung Women's University, Seoul, Republic of Korea
4
Prosocial Emotion, Adolescence, and Warfare. HUMAN NATURE-AN INTERDISCIPLINARY BIOSOCIAL PERSPECTIVE 2019; 30:192-216. [DOI: 10.1007/s12110-019-09344-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/30/2022]
5
Zhao X, Wang Y, Guo J, Wang J. Correlation analyses of CpG island methylation of cluster of differentiation 4 protein with gene expression and T lymphocyte subpopulation traits. ASIAN-AUSTRALASIAN JOURNAL OF ANIMAL SCIENCES 2018. [PMID: 29514434 PMCID: PMC6043439 DOI: 10.5713/ajas.17.0805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Objective: The cluster of differentiation 4 protein (CD4) gene is an important immune-related gene that plays a significant role in T cell development and host resistance during viral infection. Methods: To unravel the relationship of the CpG island methylation level of the CD4 gene with its gene expression and T lymphocyte subpopulation traits, we used one typical Chinese indigenous breed (Dapulian, DP) and one commercial breed (Landrace), predicted the CpG island of the CD4 gene, determined the methylation status of CpG sites by bisulfite sequencing polymerase chain reaction (BSP), and carried out correlation analyses of the methylation frequencies of CpG sites with mRNA expression and T lymphocyte subpopulation traits. Results: One CpG island was predicted in the upstream −2 kb region and exon one of the porcine CD4 gene; it was located 333 bp upstream from the start site of the gene and contained nine CpG sites. The correlation analysis results indicated that the methylation frequency of CpG_2 significantly correlated with CD4 mRNA expression in the DP and Landrace combined population, though it did not reach significance level in DP and Landrace separately. Additionally, 15 potential binding transcription factors (TFs) were predicted within the CpG island, and one of them (Jumonji) contained the CpG_2 site, suggesting that it may influence CD4 gene expression through the potential binding TFs. We also found that the methylation frequency of CpG_2 negatively correlated with the T lymphocyte subpopulation traits CD4+CD8−CD3−, CD4−CD8+CD3− and CD4+/CD8+, and positively correlated with CD4−CD8+CD3+ and CD4+CD8+CD3+ (for all correlations, p<0.01) in the DP and Landrace combined population. Thus, CpG_2 was a critical methylation site for porcine CD4 gene expression and T lymphocyte subpopulation traits.
Conclusion: We speculated that increased methylation frequency of CpG_2 may lead to decreased expression of CD4, which may have some influence on T lymphocyte subpopulation traits and the immunity of the DP population.
Affiliation(s)
- Xueyan Zhao
- Shandong Provincial Key Laboratory of Animal Disease Control and Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Yanping Wang
- Shandong Provincial Key Laboratory of Animal Disease Control and Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Jianfeng Guo
- Shandong Provincial Key Laboratory of Animal Disease Control and Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan 250100, China
- Jiying Wang
- Shandong Provincial Key Laboratory of Animal Disease Control and Breeding, Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan 250100, China
6
Alkuhlani A, Nassef M, Farag I. Multistage feature selection approach for high-dimensional cancer data. Soft comput 2016. [DOI: 10.1007/s00500-016-2439-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 01/20/2023]
7
Gomez-Rueda H, Palacios-Corona R, Gutiérrez-Hermosillo H, Trevino V. A robust biomarker of differential correlations improves the diagnosis of cytologically indeterminate thyroid cancers. Int J Mol Med 2016; 37:1355-62. [PMID: 27035928 DOI: 10.3892/ijmm.2016.2534] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 10/03/2015] [Accepted: 02/23/2016] [Indexed: 11/05/2022]
Abstract
The fine-needle aspiration of thyroid nodules and subsequent cytological analysis is unable to determine the diagnosis in 15 to 30% of thyroid cancer cases; patients with indeterminate cytological results undergo diagnostic surgery which is potentially unnecessary. Current gene expression biomarkers based on well-determined cytology are complex and their accuracy is inconsistent across public datasets. In the present study, we identified a robust biomarker using the differences in gene expression values specifically from cytologically indeterminate thyroid tumors and a powerful multivariate search tool coupled with a nearest centroid classifier. The biomarker is based on differences in the expression of the following genes: CCND1, CLDN16, CPE, LRP1B, MAGI3, MAPK6, MATN2, MPPED2, PFKFB2, PTPRE, PYGL, SEMA3D, SERGEF, SLC4A4 and TIMP1. This 15-gene biomarker exhibited superior accuracy independently of the cytology in six datasets, including The Cancer Genome Atlas (TCGA) thyroid dataset. In addition, this biomarker exhibited differences in the correlation coefficients between benign and malignant samples that indicate its discriminatory power, and these 15 genes have been previously related to cancer in the literature. Thus, this 15-gene biomarker provides advantages in clinical practice for the effective diagnosis of thyroid cancer.
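The nearest centroid classifier mentioned in this abstract can be sketched in a few lines: each class is summarised by the mean of its training samples, and a new sample is assigned to the class whose mean is closest. The version below is a generic illustration using Euclidean distance, not the authors' pipeline; the function names are hypothetical.

```python
import math


def fit_centroids(samples, labels):
    """Return {label: mean feature vector} for the training samples."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))
```

For a gene expression biomarker, each feature vector would hold the expression values of the selected genes for one sample, and the labels would be the known benign or malignant status of the training samples.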
Affiliation(s)
- Hugo Gomez-Rueda
- Bioinformatics Research Group, Department of Research and Innovation, Medical School, Tecnológico de Monterrey, Colonia Los Doctores, 64710 Monterrey, Nuevo León, Mexico
- Rebeca Palacios-Corona
- Northeastern Biomedical Research Center, Instituto Mexicano del Seguro Social, Colonia Independencia, 64720 Monterrey, Nuevo León, Mexico
- Hugo Gutiérrez-Hermosillo
- Department of Geriatrics, UMAE 1 CMN del Bajío, Instituto Mexicano del Seguro Social, Hospital Aranda de la Parra, Colonia Centro, 37000 León, Guanajuato, Mexico
- Victor Trevino
- Bioinformatics Research Group, Department of Research and Innovation, Medical School, Tecnológico de Monterrey, Colonia Los Doctores, 64710 Monterrey, Nuevo León, Mexico
8
Abstract
Epigenetic mechanisms control gene expression in a way that is stably propagated over multiple cell divisions, but which is also flexible enough to respond to environmental influences. This intermediate position between stability and plasticity renders epigenetic information highly useful for monitoring cellular states in the context of personalized medicine. Epigenetic alterations have also been identified as causal events for common diseases such as cancer and autoimmune disorders. The goal of epigenetic biomarker development is to design experimental assays that produce relevant information for diagnosis, prognosis and therapy optimization in routine clinical treatment and drug discovery. Here, I outline a systematic approach to epigenetic biomarker development and highlight key bioinformatic tools that facilitate discovery, optimization and validation of novel biomarkers.
Affiliation(s)
- Christoph Bock
- Max-Planck-Institut für Informatik, Saarbrücken, Germany
9
Laurila K, Oster B, Andersen CL, Lamy P, Orntoft T, Yli-Harja O, Wiuf C. A beta-mixture model for dimensionality reduction, sample classification and analysis. BMC Bioinformatics 2011; 12:215. [PMID: 21619656 PMCID: PMC3126746 DOI: 10.1186/1471-2105-12-215] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/08/2010] [Accepted: 05/27/2011] [Indexed: 01/15/2023]
Abstract
Background: Patterns of genome-wide methylation vary between tissue types. For example, cancer tissue shows markedly different patterns from those of normal tissue. In this paper we propose a beta-mixture model to describe genome-wide methylation patterns based on probe data from methylation microarrays. The model takes dependencies between neighbour probe pairs into account and assumes three broad categories of methylation: low, medium and high. The model is described by 37 parameters, which reduces the dimensionality of a typical methylation microarray significantly. We used methylation microarray data from 42 colon cancer samples to assess the model. Results: Based on data from colon cancer samples we show that our model captures genome-wide characteristics of methylation patterns. We estimate the parameters of the model and show that they vary between different tissue types. Further, for each methylation probe the posterior probability of a methylation state (low, medium or high) is calculated and the probability that the state is correctly predicted is assessed. We demonstrate that the model can be applied to classify cancer tissue types accurately and that the model provides accessible and easily interpretable data summaries. Conclusions: We have developed a beta-mixture model for methylation microarray data. The model substantially reduces the dimensionality of the data. It can be used for further analysis, such as sample classification or to detect changes in methylation status between different samples and tissues.
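The posterior probability of a methylation state can be illustrated with a deliberately simplified three-component beta mixture. Unlike the 37-parameter model described in the abstract, this sketch treats each probe independently and ignores dependencies between neighbouring probes. The component weights and shape parameters are made-up placeholders, not estimates from the paper.

```python
import math


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def state_posteriors(x, components):
    """Bayes' rule: P(state | x) is proportional to weight * density at x."""
    joint = {state: w * beta_pdf(x, a, b) for state, (w, a, b) in components.items()}
    total = sum(joint.values())
    return {state: value / total for state, value in joint.items()}


# Hypothetical mixture: state -> (weight, alpha, beta). Illustration only.
COMPONENTS = {
    "low": (0.5, 2.0, 20.0),
    "medium": (0.2, 5.0, 5.0),
    "high": (0.3, 20.0, 2.0),
}
```

Calling `state_posteriors(0.05, COMPONENTS)` returns a dictionary of three probabilities that sum to one; the most probable state for a probe is the key with the largest value.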
Affiliation(s)
- Kirsti Laurila
- Bioinformatics Research Centre, Aarhus University, C.F. Møllers Allé 8, DK-8000 Århus C, Denmark
10
An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS One 2009; 4:e8274. [PMID: 20019873 PMCID: PMC2793425 DOI: 10.1371/journal.pone.0008274] [Citation(s) in RCA: 256] [Impact Index Per Article: 17.1] [Received: 09/07/2009] [Accepted: 11/13/2009] [Indexed: 02/06/2023]
Abstract
Background: Recent studies have shown that DNA methylation (DNAm) markers in peripheral blood may hold promise as diagnostic or early detection/risk markers for epithelial cancers. However, to date no study has evaluated the diagnostic and predictive potential of such markers in a large case control cohort and on a genome-wide basis. Principal Findings: By performing genome-wide DNAm profiling of a large ovarian cancer case control cohort, we here demonstrate that active ovarian cancer has a significant impact on the DNAm pattern in peripheral blood. Specifically, by measuring the methylation levels of over 27,000 CpGs in blood cells from 148 healthy individuals and 113 age-matched pre-treatment ovarian cancer cases, we derive a DNAm signature that can predict the presence of active ovarian cancer in blind test sets with an AUC of 0.8 (95% CI 0.74–0.87). We further validate our findings in another independent set of 122 post-treatment cases (AUC = 0.76 (0.72–0.81)). In addition, we provide evidence for a significant number of candidate risk or early detection markers for ovarian cancer. Furthermore, by comparing the pattern of methylation with gene expression data from major blood cell types, we here demonstrate that age and cancer elicit common changes in the composition of peripheral blood, with a myeloid skewing that increases with age and which is further aggravated in the presence of ovarian cancer. Finally, we show that most cancer and age associated methylation variability is found at CpGs located outside of CpG islands. Significance: Our results underscore the potential of DNAm profiling in peripheral blood as a tool for detection or risk-prediction of epithelial cancers, and warrant further in-depth and higher CpG coverage studies to further elucidate this role.
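The AUC values quoted above have a simple interpretation: the probability that a randomly chosen case receives a higher signature score than a randomly chosen control. A minimal sketch of that pairwise definition follows; the function name and the toy scores in the usage note are illustrative and unrelated to the study's data.

```python
def auc(case_scores, control_scores):
    """Fraction of (case, control) pairs ranked correctly; ties count as half."""
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

For example, `auc([0.9, 0.4], [0.5, 0.1])` ranks three of the four case-control pairs correctly and returns 0.75. The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred samples; rank-based formulas give the same value more efficiently.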