1
Kim S, Qin Y, Park HJ, Yue M, Xu Z, Forno E, Chen W, Celedón JC. Methyl-TWAS: A powerful method for in silico transcriptome-wide association studies (TWAS) using long-range DNA methylation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.10.566586. [PMID: 38014125 PMCID: PMC10680683 DOI: 10.1101/2023.11.10.566586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
In silico transcriptome-wide association studies (TWAS) are commonly used to test whether expression of specific genes is linked to a complex trait. However, genotype-based in silico TWAS methods such as PrediXcan exhibit low prediction accuracy for a majority of genes, because genotypic data lack tissue- and disease-specificity and are not affected by the environment. Because methylation is tissue-specific and, like gene expression, can be modified by environment or disease status, methylation should predict gene expression more accurately than SNPs. Therefore, we propose Methyl-TWAS, the first approach that uses long-range methylation markers to impute gene expression for in silico TWAS through penalized regression. Methyl-TWAS 1) predicts epigenetically regulated/associated expression (eGReX), which incorporates tissue-specific expression and both genetically regulated (GReX) and environmentally regulated expression, to identify differentially expressed genes (DEGs) that could not be identified by genotype-based methods; and 2) incorporates both cis- and trans-CpGs, including various regulatory regions, to identify DEGs that would be missed using cis-methylation only. Methyl-TWAS outperforms PrediXcan and two other methods in imputing gene expression in the nasal epithelium, particularly for immunity-related genes and DEGs in atopic asthma. Methyl-TWAS identified 3,681 (85.2%) of the 4,316 DEGs identified in a previous TWAS of atopic asthma using measured expression, while PrediXcan could not identify any gene. Methyl-TWAS also outperforms PrediXcan for expression imputation as well as in silico TWAS in white blood cells. Methyl-TWAS is a valuable tool for in silico TWAS, leveraging a growing body of publicly available genome-wide DNA methylation data for a variety of human tissues.
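The abstract states that Methyl-TWAS imputes expression from cis- and trans-CpGs "through penalized regression" but does not give the penalty or solver. As a minimal sketch, assuming a plain lasso penalty fitted by cyclic coordinate descent on centered CpG values, a single gene's imputation model could look like this. The function names (`lasso_cd`, `impute_expression`, etc.) and the toy data are hypothetical and are not taken from the Methyl-TWAS software.

```python
def center_columns(X):
    """Center each column (CpG) of a row-major matrix; return (centered X, column means)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X], means


def soft_threshold(z, gamma):
    """Lasso shrinkage operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    X: list of samples, each a list of centered CpG methylation values.
    y: centered expression values of one gene (no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # y - X @ beta, with beta = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # a constant CpG carries no information
            # Correlation of CpG j with the partial residual that excludes CpG j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep the residual in sync
                beta[j] = new_bj
    return beta


def impute_expression(x_row, x_means, beta, y_mean):
    """Predict one sample's expression from its raw (uncentered) CpG values."""
    return y_mean + sum(b * (x - m) for b, x, m in zip(beta, x_row, x_means))


# Toy example (hypothetical data): 6 samples x 3 CpGs; expression depends on CpGs 0 and 2.
X_raw = [[1, 1, 1], [-1, 1, 1], [1, -1, 1], [-1, -1, 1], [1, 0, -2], [-1, 0, -2]]
y_raw = [2.5, -1.5, 2.5, -1.5, 1.0, -3.0]
X_c, x_means = center_columns(X_raw)
y_mean = sum(y_raw) / len(y_raw)
beta = lasso_cd(X_c, [v - y_mean for v in y_raw], lam=0.1)
print([round(b, 3) for b in beta])  # CpG 1 is shrunk to exactly zero
```

In a TWAS-style workflow, such a model would be fitted on a reference panel with measured expression, and the CpGs with nonzero coefficients would then be used to impute expression in a methylation-only cohort, where the imputed values are tested for association with the trait. The penalty `lam` would normally be chosen by cross-validation rather than fixed as here.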
Affiliation(s)
- Soyeon Kim
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Yidi Qin
  - Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Hyun Jung Park
  - Department of Human Genetics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Molin Yue
  - Department of Biostatistics, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Zhongli Xu
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
  - School of Medicine, Tsinghua University, Beijing, China
- Erick Forno
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Wei Chen
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
- Juan C. Celedón
  - Division of Pulmonary Medicine, Department of Pediatrics, UPMC Children’s Hospital of Pittsburgh, University of Pittsburgh, Pittsburgh, PA, USA
2
Yao S, Cao B, Li T, Kalos D, Yuan Y, Wang X. Prediction-oriented prognostic biomarker discovery with survival machine learning methods. NAR Genom Bioinform 2023; 5:lqad055. [PMID: 37332657 PMCID: PMC10273194 DOI: 10.1093/nargab/lqad055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 04/21/2023] [Accepted: 05/26/2023] [Indexed: 06/20/2023]
Abstract
Identifying novel and reliable prognostic biomarkers for predicting patient survival outcomes is essential for deciding personalized treatment strategies for diseases such as cancer. Numerous feature selection techniques have been proposed to address the high-dimensional problem in constructing prediction models. Not only does feature selection lower the data dimension, but it also improves the prediction accuracy of the resulting models by mitigating overfitting. However, the performance of these feature selection methods when applied to survival models deserves further investigation. In this paper, we construct and compare a series of prediction-oriented biomarker selection frameworks by leveraging recent machine learning algorithms, including random survival forests, extreme gradient boosting, light gradient boosting and deep learning-based survival models. Additionally, we adapt the recently proposed prediction-oriented marker selection (PROMISE) to a survival model (PROMISE-Cox) as a benchmark approach. Our simulation studies indicate that boosting-based approaches tend to provide superior accuracy, with a better true positive rate and false positive rate, in more complicated scenarios. For demonstration purposes, we applied the proposed biomarker selection strategies to identify prognostic biomarkers in different modalities of head and neck cancer data.
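The simulation benchmark described above scores each selection framework by its true and false positive rates against the markers that were simulated as truly prognostic. A minimal sketch of that bookkeeping follows, assuming the selector outputs one importance score per marker (for example, a boosting feature importance) and keeps the top k markers; the function names, marker names and scores are hypothetical.

```python
def select_top_k(importance, k):
    """Keep the k markers with the largest importance (ties broken by name)."""
    return sorted(importance, key=lambda m: (-importance[m], m))[:k]


def selection_rates(selected, true_markers, all_markers):
    """Return (TPR, FPR) of a selected marker set versus the simulated truth."""
    selected, truth, universe = set(selected), set(true_markers), set(all_markers)
    if not (selected <= universe and truth <= universe):
        raise ValueError("selected and true markers must come from all_markers")
    negatives = universe - truth
    tpr = len(selected & truth) / len(truth) if truth else float("nan")
    fpr = len(selected & negatives) / len(negatives) if negatives else float("nan")
    return tpr, fpr


# Hypothetical simulation: g1-g3 truly affect survival, g4-g5 are noise.
markers = ["g1", "g2", "g3", "g4", "g5"]
truth = ["g1", "g2", "g3"]
importance = {"g1": 0.90, "g2": 0.70, "g3": 0.05, "g4": 0.60, "g5": 0.01}
chosen = select_top_k(importance, k=3)
print(chosen, selection_rates(chosen, truth, markers))
```

Here one noise marker outranks a true one, so the selection recovers two of three true markers (TPR 2/3) and admits one of two noise markers (FPR 0.5). In a simulation study these rates would be averaged over many replicates.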
Affiliation(s)
- Sijie Yao
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
- Biwei Cao
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
- Tingyi Li
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
- Denise Kalos
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
- Yading Yuan
  - Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York City, NY 10029, USA
- Xuefeng Wang
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA
3
Yao S, Wang X. Statistical and Machine Learning Methods for Discovering Prognostic Biomarkers for Survival Outcomes. Methods Mol Biol 2023; 2629:11-21. [PMID: 36929071 DOI: 10.1007/978-1-0716-2986-4_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
Discovering molecular biomarkers for predicting patient survival outcomes is an essential step toward improving prognosis and therapeutic decision-making in the treatment of severe diseases such as cancer. Because of the high-dimensional nature of omics datasets, statistical methods such as the least absolute shrinkage and selection operator (Lasso) have been widely applied for cancer biomarker discovery. Owing to their scalability and demonstrated prediction performance, machine learning methods such as XGBoost and neural network models have also recently been gaining popularity in the community. However, compared to more traditional survival methods such as Kaplan-Meier and Cox regression, high-dimensional methods for survival outcomes are still less well known to biomedical researchers. In this chapter, we discuss the key analytical procedures in employing these methods for identifying biomarkers associated with survival data. We also identify important considerations that emerged from the analysis of actual omics data, and discuss some typical instances of misapplication and misinterpretation of machine learning methods. Using lung cancer and head and neck cancer datasets as demonstrations, we provide step-by-step instructions and sample R code for prioritizing prognostic biomarkers.
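Lasso for survival outcomes is commonly applied to the Cox model by adding an L1 penalty to the negative log partial likelihood. As a sketch of that objective only (it is not the chapter's R code, which is not reproduced here), the following computes the negative log partial likelihood, handling tied event times with the Breslow approximation, and its lasso-penalized version. The function names and toy survival data are hypothetical; some software also rescales the likelihood term by the sample size, which changes the scale of `lam`.

```python
import math


def cox_neg_log_partial_likelihood(times, events, X, beta):
    """Negative log Cox partial likelihood; tied times use the Breslow approximation.

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    X: one covariate row (e.g., biomarker values) per subject; beta: coefficients.
    """
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]  # linear predictors
    nll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set: every subject still under observation at time t_i.
        risk = [math.exp(eta[j]) for j, t_j in enumerate(times) if t_j >= t_i]
        nll -= eta[i] - math.log(sum(risk))
    return nll


def lasso_cox_objective(times, events, X, beta, lam):
    """L1-penalized Cox objective minimized in Lasso-based biomarker selection."""
    penalty = lam * sum(abs(b) for b in beta)
    return cox_neg_log_partial_likelihood(times, events, X, beta) + penalty


# Hypothetical data: 3 subjects, one biomarker; the second subject is censored.
times, events = [1.0, 2.0, 3.0], [1, 0, 1]
X = [[0.5], [1.2], [-0.3]]
print(lasso_cox_objective(times, events, X, [0.0], lam=0.1))  # equals log(3) at beta = 0
```

A solver (for example, coordinate descent) would minimize this objective over `beta`; markers whose coefficients remain nonzero at the chosen `lam` are the selected prognostic biomarkers.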
Affiliation(s)
- Sijie Yao
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
- Xuefeng Wang
  - Department of Biostatistics and Bioinformatics, H. Lee Moffitt Cancer Center & Research Institute, Tampa, FL, USA
4
Zhang X, de Leon J, Crespo-Facorro B, Diaz FJ. Measuring individual benefits of psychiatric treatment using longitudinal binary outcomes: Application to antipsychotic benefits in non-cannabis and cannabis users. J Biopharm Stat 2020; 30:916-940. [DOI: 10.1080/10543406.2020.1765371] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Affiliation(s)
- Xuan Zhang
  - Department of Biostatistics, The University of Kansas Medical Center, Kansas City, KS, United States
  - Boston Strategic Partners, Inc, Boston, MA, United States
- Jose de Leon
  - Mental Health Research Center at Eastern State Hospital, Lexington, KY, United States
- Benedicto Crespo-Facorro
  - University Hospital Virgen Del Rocío, Seville, Spain
  - CIBERSAM G26-IBiS, University of Seville, Seville, Spain
  - Department of Psychiatry, Marqués De Valdecilla University Hospital, IDIVAL, Santander, Spain
  - School of Medicine, University of Cantabria, Santander, Spain
- Francisco J. Diaz
  - Department of Biostatistics, The University of Kansas Medical Center, Kansas City, KS, United States
5
Kim S, Park HJ, Cui X, Zhi D. Collective effects of long-range DNA methylations predict gene expressions and estimate phenotypes in cancer. Sci Rep 2020; 10:3920. [PMID: 32127627 PMCID: PMC7054398 DOI: 10.1038/s41598-020-60845-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/10/2019] [Accepted: 02/07/2020] [Indexed: 01/12/2023]
Abstract
DNA methylation of various genomic regions has been found to be associated with gene expression in diverse biological contexts. However, most genome-wide studies have focused on the effect on gene expression of (1) methylation in cis, not in trans, and (2) a single CpG, not the collective effects of multiple CpGs. In this study, we developed a statistical machine learning model, geneEXPLORE (gene expression prediction by long-range epigenetics), that quantifies the collective effects of both cis- and trans-methylation on gene expression. By applying geneEXPLORE to The Cancer Genome Atlas (TCGA) breast cancer data and 10 other cancer types, we found that most genes are associated with methylation as far as 10 Mb or more from their promoters, and that long-range methylation explains 50% of the variation in gene expression on average, far more than cis-methylation does. geneEXPLORE outperforms competing methods such as BioMethyl and MethylXcan. Further, the predicted gene expression levels could predict clinical phenotypes such as breast tumor status and estrogen receptor status (AUC = 0.999 and 0.94, respectively) as accurately as the measured gene expression levels. These results suggest that geneEXPLORE provides a means for accurate imputation of gene expression, which can be further used to predict clinical phenotypes.
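The claim that long-range methylation "explains 50% of the variation in gene expression" refers to a variance-explained measure comparing predicted with measured expression. Assuming the usual coefficient of determination, R² = 1 − SS_res/SS_tot, computed on held-out samples (the abstract does not say whether geneEXPLORE reports this or a squared correlation), a minimal sketch is below; the function name and expression values are hypothetical.

```python
def variance_explained(measured, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean = sum(measured) / len(measured)
    ss_tot = sum((m - mean) ** 2 for m in measured)  # total variation around the mean
    if ss_tot == 0:
        raise ValueError("measured expression has zero variance")
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))  # unexplained part
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out samples for one gene: measured vs. imputed expression.
measured = [1.0, 2.0, 3.0, 4.0]
imputed = [1.0, 2.0, 3.0, 5.0]
print(variance_explained(measured, imputed))
```

On held-out data this quantity can be negative when the predictions are worse than simply using the mean; a squared Pearson correlation, by contrast, is always between 0 and 1, so the two measures are not interchangeable.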
Affiliation(s)
- Soyeon Kim
  - Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States
  - Division of Pediatric Pulmonary Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, Pennsylvania, United States
- Hyun Jung Park
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pennsylvania, United States
- Xiangqin Cui
  - Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States
- Degui Zhi
  - Center for Precision Health, School of Biomedical Informatics, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States
6
Germinal Immunogenetics predict treatment outcome for PD-1/PD-L1 checkpoint inhibitors. Invest New Drugs 2019; 38:160-171. [PMID: 31402427 DOI: 10.1007/s10637-019-00845-w] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/10/2019] [Accepted: 08/01/2019] [Indexed: 02/07/2023]
Abstract
Background: Checkpoint inhibitors bring marked benefits, but only in a minority of patients, and may also be associated with severe adverse events. Treatment outcome still cannot be reliably predicted. This study hypothesized that host genetics could serve as predictive biomarkers for checkpoint inhibitor response and immune-related adverse events. We conducted a study based on germinal polymorphisms in genes coding for proteins involved in immune regulation. Methods: Germinal DNA was obtained from patients with advanced cancer treated with anti-PD-1/PD-L1 checkpoint inhibitors. DNA was genotyped using a custom panel of 166 single nucleotide polymorphisms (SNPs) covering 86 preselected immunogenetics-related genes. Computational analysis using the GTEx portal was performed to identify potential expression quantitative trait loci (eQTLs) in tissues. Results: Ninety-four consecutive patients were included. Objective response rate (complete or partial response) was significantly correlated with tumor microenvironment-related SNPs in the CCL2, NOS3, IL1RN, IL12B, CXCR3 and IL6R genes. Toxicity was linked to SNPs in target-related genes, including UNG, IFNW1, CTLA4, PD-L1 and IFNL4. The area under the ROC curve (AUC) was 0.81 (95% CI: 0.72-0.90) for response and 0.89 (95% CI: 0.76-1.00) for toxicity. In silico functional exploration pointed to rs4845618 (IL6R), rs10964859 (IFNW1) and rs3087243 (CTLA4) as potentially impacting gene expression. Conclusion: These results strongly support a role for distinct immunogenetics-related gene SNPs in predicting the efficacy and safety of anti-PD-1/PD-L1 therapies. The results highlight the existence of patient-specific germinal biomarkers able to predict response to checkpoint inhibitors and, possibly, treatment-related adverse events.
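The reported AUCs (0.81 for response, 0.89 for toxicity) summarize how well a genotype-based score ranks patients with the outcome above patients without it. A minimal sketch of the empirical AUC follows, computed as the Mann-Whitney probability that a randomly chosen patient with the outcome scores higher than one without it, with ties counting one half. The function name, scores and labels are hypothetical, and the confidence interval (for example, by bootstrap or DeLong's method) is not shown.

```python
def empirical_auc(scores, labels):
    """Probability that a positive case outscores a negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:  # compare every positive-negative pair
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores built from genotypes; label 1 = objective response.
scores = [0.9, 0.4, 0.5, 0.1]
labels = [1, 1, 0, 0]
print(empirical_auc(scores, labels))  # 0.75
```

The pairwise loop is quadratic in sample size, which is negligible for a cohort of about a hundred patients; larger datasets would use a rank-based formula instead.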
7
Park HJ, Ji P, Kim S, Xia Z, Rodriguez B, Li L, Su J, Chen K, Masamha CP, Baillat D, Fontes-Garfias CR, Shyu AB, Neilson JR, Wagner EJ, Li W. 3' UTR shortening represses tumor-suppressor genes in trans by disrupting ceRNA crosstalk. Nat Genet 2018; 50:783-789. [PMID: 29785014 DOI: 10.1038/s41588-018-0118-8] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Received: 06/20/2016] [Accepted: 03/22/2018] [Indexed: 01/27/2023]
Abstract
Widespread mRNA 3' UTR shortening through alternative polyadenylation[1] promotes tumor growth in vivo[2]. A prevailing hypothesis is that it induces proto-oncogene expression in cis by escaping microRNA-mediated repression. Here we report a surprising enrichment of 3' UTR shortening among transcripts that are predicted to act as competing endogenous RNAs (ceRNAs) for tumor-suppressor genes. Our model-based analysis of the trans effect of 3' UTR shortening (MAT3UTR) reveals a significant role in altering ceRNA expression. MAT3UTR predicts many trans-targets of 3' UTR shortening, including PTEN, a crucial tumor-suppressor gene[3] involved in ceRNA crosstalk[4] with nine 3' UTR-shortening genes, including EPS15 and NFIA. Knockdown of NUDT21, a master 3' UTR-shortening regulator[2], represses tumor-suppressor genes such as PHF6 and LARP1 in trans in a miRNA-dependent manner. Together, the results of our analysis suggest a major role of 3' UTR shortening in repressing tumor-suppressor genes in trans by disrupting ceRNA crosstalk, rather than inducing proto-oncogenes in cis.
Affiliation(s)
- Hyun Jung Park
  - Division of Biostatistics, Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
  - Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Ping Ji
  - Department of Biochemistry & Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
- Soyeon Kim
  - Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Zheng Xia
  - Division of Biostatistics, Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Benjamin Rodriguez
  - Division of Biostatistics, Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Lei Li
  - Division of Biostatistics, Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Jianzhong Su
  - Division of Biostatistics, Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Kaifu Chen
  - Division of Biostatistics, Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA
- Chioniso P Masamha
  - Department of Pharmaceutical Sciences, College of Pharmacy and Health Sciences, Butler University, Indianapolis, IN, USA
- David Baillat
  - Department of Biochemistry & Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
- Camila R Fontes-Garfias
  - Department of Biochemistry & Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
- Ann-Bin Shyu
  - Department of Biochemistry and Molecular Biology, University of Texas, McGovern Medical School, Houston, TX, USA
- Joel R Neilson
  - Department of Molecular Physiology and Biophysics, Baylor College of Medicine, Houston, TX, USA
- Eric J Wagner
  - Department of Biochemistry & Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
- Wei Li
  - Division of Biostatistics, Dan L Duncan Cancer Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, TX, USA