1
Liu Y, Ren J, Ma S, Wu C. The spike-and-slab quantile LASSO for robust variable selection in cancer genomics studies. Stat Med 2024. [PMID: 39260448 DOI: 10.1002/sim.10196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 05/28/2024] [Accepted: 07/31/2024] [Indexed: 09/13/2024]
Abstract
Data irregularity in cancer genomics studies has been widely observed in the form of outliers and heavy-tailed distributions in complex traits. In the past decade, robust variable selection methods have emerged as powerful alternatives to nonrobust ones for identifying important genes associated with heterogeneous disease traits and building superior predictive models. In this study, to retain the remarkable features of the quantile LASSO and fully Bayesian regularized quantile regression while overcoming their disadvantages in the analysis of high-dimensional genomics data, we propose the spike-and-slab quantile LASSO through a fully Bayesian spike-and-slab formulation under the robust likelihood obtained by adopting the asymmetric Laplace distribution (ALD). The proposed robust method inherits the prominent properties of selective shrinkage and self-adaptivity to the sparsity pattern from the spike-and-slab LASSO (Ročková and George, J Am Stat Assoc, 2018, 113(521): 431-444). Furthermore, the spike-and-slab quantile LASSO has a computational advantage in locating the posterior modes via soft-thresholding rule guided Expectation-Maximization (EM) steps in the coordinate descent framework, a property rarely observed for robust regularization with nondifferentiable loss functions. We have conducted comprehensive simulation studies with a variety of heavy-tailed errors in both homogeneous and heterogeneous model settings to demonstrate the superiority of the spike-and-slab quantile LASSO over competing methods. The advantage of the proposed method has been further demonstrated in case studies of the lung adenocarcinoma (LUAD) and skin cutaneous melanoma (SKCM) data from The Cancer Genome Atlas (TCGA).
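The abstract builds on three standard ingredients: the quantile check loss (up to constants, the negative log-likelihood of the asymmetric Laplace distribution), the spike-and-slab LASSO adaptive penalty of Ročková and George, and the soft-thresholding operator. The Python sketch below shows each in isolation, assuming the usual textbook definitions and holding the prior inclusion weight `theta` fixed. It is not the authors' EM or coordinate-descent implementation, and the parameter values are illustrative only.

```python
import math


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}).

    Up to constants, this is the negative log-likelihood of an
    asymmetric Laplace distribution at quantile level tau.
    """
    return u * (tau - (1.0 if u < 0 else 0.0))


def laplace_density(beta: float, lam: float) -> float:
    """Laplace density (lam / 2) * exp(-lam * |beta|)."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def inclusion_prob(beta: float, theta: float, lam0: float, lam1: float) -> float:
    """Conditional probability that beta comes from the slab (rate lam1)
    rather than the spike (rate lam0), given a fixed prior weight theta."""
    slab = theta * laplace_density(beta, lam1)
    spike = (1.0 - theta) * laplace_density(beta, lam0)
    return slab / (slab + spike)


def adaptive_penalty(beta: float, theta: float, lam0: float, lam1: float) -> float:
    """Spike-and-slab LASSO penalty level: p * lam1 + (1 - p) * lam0."""
    p = inclusion_prob(beta, theta, lam0, lam1)
    return p * lam1 + (1.0 - p) * lam0


def soft_threshold(z: float, lam: float) -> float:
    """Soft-thresholding S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

For example, with `lam0 = 20`, `lam1 = 0.5` and `theta = 0.5`, a coefficient near 1.0 receives a penalty close to `lam1`, while a coefficient near 0.01 receives a penalty close to `lam0`. This coefficient-dependent penalty is the selective shrinkage the abstract refers to.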
Affiliation(s)
- Yuwen Liu
- Department of Statistics, Kansas State University, Manhattan, Kansas, USA
- Jie Ren
- Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, Connecticut, USA
- Cen Wu
- Department of Statistics, Kansas State University, Manhattan, Kansas, USA
2
Ma G, Kang J, Yu T. Bayesian functional analysis for untargeted metabolomics data with matching uncertainty and small sample sizes. Brief Bioinform 2024; 25:bbae141. [PMID: 38581417 PMCID: PMC10998539 DOI: 10.1093/bib/bbae141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/28/2024] [Accepted: 03/13/2024] [Indexed: 04/08/2024]
Abstract
Untargeted metabolomics based on liquid chromatography-mass spectrometry technology is quickly gaining widespread application, given its ability to depict the global metabolic pattern in biological samples. However, the data are noisy and plagued by the lack of clear identity of data features measured from samples. Multiple potential matchings exist between data features and known metabolites, whereas the true matching can only be one-to-one. Some existing methods attempt to reduce the matching uncertainty, but they are far from able to remove it for most features. This uncertainty causes major difficulty in downstream functional analysis. To address these issues, we develop a novel approach for Bayesian Analysis of Untargeted Metabolomics data (BAUM) that integrates previously separate tasks into a single framework, including matching uncertainty inference, metabolite selection and functional analysis. By incorporating the knowledge graph between variables and using relatively simple assumptions, BAUM can analyze datasets with small sample sizes. By allowing different confidence levels of feature-metabolite matching, the method is applicable to datasets in which feature identities are partially known. Simulation studies demonstrate that, compared with other existing methods, BAUM achieves better accuracy in selecting important metabolites that tend to be functionally consistent and in assigning confidence scores to feature-metabolite matches. We analyze a COVID-19 metabolomics dataset and a mouse brain metabolomics dataset using BAUM. Even with a very small sample size of 16 mice per group, BAUM is robust and stable. It finds pathways that conform to existing knowledge, as well as novel pathways that are biologically plausible.
Affiliation(s)
- Guoxuan Ma
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jian Kang
- Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Tianwei Yu
- Shenzhen Research Institute of Big Data, School of Data Science, The Chinese University of Hong Kong - Shenzhen (CUHK-Shenzhen), Shenzhen, Guangdong 518172, China
3
Sajedi S, Ebrahimi G, Roudi R, Mehta I, Heshmat A, Samimi H, Kazempour S, Zainulabadeen A, Docking TR, Arora SP, Cigarroa F, Seshadri S, Karsan A, Zare H. Integrating DNA methylation and gene expression data in a single gene network using the iNETgrate package. Sci Rep 2023; 13:21721. [PMID: 38066050 PMCID: PMC10709411 DOI: 10.1038/s41598-023-48237-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
Analyzing different omics data types independently is often too restrictive to allow for detection of subtle, but consistent, variations that are coherently supported by different assays. Integrating multi-omics data in one model can increase statistical power. However, designing such a model is challenging because different omics are measured at different levels. We developed the iNETgrate package (https://bioconductor.org/packages/iNETgrate/), which efficiently integrates transcriptome and DNA methylation data in a single gene network. Applying iNETgrate to five independent datasets improved prognostication compared with common clinical gold standards and a patient similarity network approach.
Affiliation(s)
- Sogand Sajedi
- Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
- Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA
- Ghazal Ebrahimi
- Bioinformatics Program, The University of British Columbia, Vancouver, BC, Canada
- Raheleh Roudi
- Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Isha Mehta
- Department of Immunology, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Amirreza Heshmat
- Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Hanie Samimi
- School of Architecture, University of Utah, Salt Lake City, UT, 84112, USA
- Shiva Kazempour
- Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
- Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA
- Aamir Zainulabadeen
- Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
- Thomas Roderick Docking
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, Vancouver, BC, V5Z 1L3, Canada
- Sukeshi Patel Arora
- Mays Cancer Center, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
- Francisco Cigarroa
- Malu and Carlos Alvarez Center for Transplantation, Hepatobiliary Surgery and Innovation, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
- Sudha Seshadri
- Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA
- Department of Neurology, University of Texas, San Antonio, TX, 78229, USA
- Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, 02139, USA
- Aly Karsan
- Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, Vancouver, BC, V5Z 1L3, Canada
- Habil Zare
- Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
- Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA
- Department of Cell Systems & Anatomy, 7703 Floyd Curl Drive, San Antonio, TX, 78229, USA
4
Deng Q, Du Y, Wang Z, Chen Y, Wang J, Liang H, Zhang D. Identification and validation of a DNA methylation-driven gene-based prognostic model for clear cell renal cell carcinoma. BMC Genomics 2023; 24:307. [PMID: 37286941 DOI: 10.1186/s12864-023-09416-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/30/2023] [Accepted: 05/30/2023] [Indexed: 06/09/2023]
Abstract
BACKGROUND Clear cell renal cell carcinoma (ccRCC) is a malignant tumor with heterogeneous morphology and poor prognosis. This study aimed to establish a DNA methylation (DNAm)-driven gene-based prognostic model for ccRCC. METHODS Reduced representation bisulfite sequencing (RRBS) was performed on DNA extracted from ccRCC patients. We analyzed the RRBS data from 10 pairs of patient samples to screen candidate CpG sites, then trained and validated an 18-CpG site model, and integrated clinical characteristics to establish a nomogram for the prognosis or risk evaluation of ccRCC. RESULTS We identified 2261 differentially methylated regions (DMRs) in promoter regions. After DMR selection, 578 candidates were screened, corresponding to 408 CpG dinucleotides on the 450K array. We collected the DNAm profiles of 478 ccRCC samples from the TCGA dataset. Using the training set of 319 samples, a prognostic panel of 18 CpGs was determined by univariate Cox regression, LASSO regression, and multivariate Cox proportional hazards regression analyses. We constructed a prognostic model by combining the clinical signatures. In the test set (159 samples) and the whole set (478 samples), the Kaplan-Meier plots showed significant differences, and the ROC curve and survival analyses showed an AUC greater than 0.7. The nomogram integrating clinicopathological characteristics and the methylation risk score had better performance, and decision curve analyses also showed a beneficial effect. CONCLUSIONS This work provides insight into the role of hypermethylation in ccRCC. The identified targets might serve as biomarkers for the early diagnosis and prognosis of ccRCC. We believe our findings have implications for better risk stratification and personalized management of this disease.
Affiliation(s)
- Qiong Deng
- Department of Urology, Affiliated Longhua People's Hospital, Southern Medical University, Shenzhen, 518109, China
- College of Basic Medicine, Southern Medical University, Guangzhou, 510515, China
- Ye Du
- Central Laboratory, Affiliated Longhua People's Hospital, Southern Medical University, Shenzhen, 518109, China
- Zhu Wang
- Department of Urology, Affiliated Longhua People's Hospital, Southern Medical University, Shenzhen, 518109, China
- Yeda Chen
- Department of Urology, Affiliated Longhua People's Hospital, Southern Medical University, Shenzhen, 518109, China
- Jieyan Wang
- Department of Urology, Affiliated Longhua People's Hospital, Southern Medical University, Shenzhen, 518109, China
- Hui Liang
- Department of Urology, Affiliated Longhua People's Hospital, Southern Medical University, Shenzhen, 518109, China
- Du Zhang
- Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, No 7, Pengfei Road, Dapeng New District, Shenzhen, 518120, China
5
Ren J, Zhou F, Li X, Ma S, Jiang Y, Wu C. Robust Bayesian variable selection for gene-environment interactions. Biometrics 2023; 79:684-694. [PMID: 35394058 PMCID: PMC11086965 DOI: 10.1111/biom.13670] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/15/2020] [Revised: 03/23/2022] [Accepted: 03/28/2022] [Indexed: 11/30/2022]
Abstract
Gene-environment (G×E) interactions have important implications for elucidating the etiology of complex diseases beyond the main genetic and environmental effects. Outliers and data contamination in disease phenotypes are commonly encountered in G×E studies, leading to the development of a broad spectrum of robust regularization methods. Nevertheless, within the Bayesian framework, the issue has not been addressed in existing studies. We develop a fully Bayesian robust variable selection method for G×E interaction studies. The proposed Bayesian method can effectively accommodate heavy-tailed errors and outliers in the response variable while conducting variable selection by accounting for structural sparsity. In particular, for robust sparse group selection, spike-and-slab priors have been imposed on both the individual and group levels to identify important main and interaction effects robustly. An efficient Gibbs sampler has been developed to facilitate fast computation. Extensive simulation studies, analysis of diabetes data with single-nucleotide polymorphism measurements from the Nurses' Health Study, and analysis of The Cancer Genome Atlas melanoma data with gene expression measurements demonstrate the superior performance of the proposed method over multiple competing alternatives.
Affiliation(s)
- Jie Ren
- Department of Biostatistics and Health Data Science, Indiana University School of Medicine, Indianapolis, Indiana, USA
- Fei Zhou
- Department of Statistics, Kansas State University, Manhattan, Kansas, USA
- Xiaoxi Li
- Department of Statistics, Kansas State University, Manhattan, Kansas, USA
- Shuangge Ma
- Department of Biostatistics, Yale University, New Haven, Connecticut, USA
- Yu Jiang
- Division of Epidemiology, Biostatistics and Environmental Health, School of Public Health, University of Memphis, Memphis, Tennessee, USA
- Cen Wu
- Department of Statistics, Kansas State University, Manhattan, Kansas, USA
6
Lin C, Chen Y, Pan J, Lu Q, Ji P, Lin S, Liu C, Lin S, Li M, Zong J. Identification of an individualized therapy prognostic signature for head and neck squamous cell carcinoma. BMC Genomics 2023; 24:221. [PMID: 37106442 PMCID: PMC10142243 DOI: 10.1186/s12864-023-09325-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/28/2022] [Accepted: 04/20/2023] [Indexed: 04/29/2023]
Abstract
BACKGROUND Head and neck squamous cell carcinoma (HNSCC) is the most common cancer of the head and neck. Therapeutic response-related genes (TRRGs) are closely associated with carcinogenesis and prognosis in HNSCC. However, the clinical value and prognostic significance of TRRGs remain unclear. We aimed to construct a prognostic risk model to predict therapy response and prognosis in TRRG-defined subgroups of HNSCC. METHODS The multiomics data and clinical information of HNSCC patients were downloaded from The Cancer Genome Atlas (TCGA). The GSE65858 and GSE67614 microarray profiles were downloaded from the Gene Expression Omnibus (GEO), a public functional genomics data repository. Based on the TCGA-HNSC database, patients were divided into a remission group and a non-remission group according to therapy response, and differentially expressed TRRGs between the two groups were screened. Using Cox regression analysis and least absolute shrinkage and selection operator (LASSO) analysis, candidate TRRGs that can predict the prognosis of HNSCC were identified and used to construct a TRRG-based signature and a prognostic nomogram. RESULTS A total of 1896 differentially expressed TRRGs were screened, including 1530 upregulated and 366 downregulated genes. Then, 206 differentially expressed TRRGs that were significantly associated with survival were chosen using univariate Cox regression analysis. Finally, 20 candidate TRRGs were identified by LASSO analysis to establish a signature for risk prediction, and the risk score of each patient was calculated. Patients were divided into a high-risk group (Risk-H) and a low-risk group (Risk-L) based on the risk score. Risk-L patients had better overall survival (OS) than Risk-H patients. Receiver operating characteristic (ROC) curve analysis revealed strong predictive performance for 1-, 3-, and 5-year OS in the TCGA-HNSC and GEO datasets.
Moreover, among patients treated with post-operative radiotherapy, Risk-L patients had longer OS and lower recurrence than Risk-H patients. The nomogram incorporating the risk score and other clinical factors performed well in predicting survival probability. CONCLUSIONS The proposed TRRG-based risk signature and nomogram are promising novel tools for predicting therapy response and overall survival in HNSCC patients.
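The per-patient risk score and Risk-H/Risk-L split described in this abstract can be sketched as follows. The abstract does not state the score formula or the cutoff; this Python sketch assumes the common construction of a linear score (sum of LASSO coefficient times gene expression) with a median split. The gene names, coefficients and expression values are hypothetical placeholders, not the published 20-gene signature.

```python
from statistics import median
from typing import Dict, List


def risk_scores(expr: List[Dict[str, float]], coefs: Dict[str, float]) -> List[float]:
    """Assumed linear risk score per patient: sum of coefficient * expression
    over the signature genes."""
    return [sum(c * sample[g] for g, c in coefs.items()) for sample in expr]


def stratify(scores: List[float]) -> List[str]:
    """Assumed median cutoff: scores above the median are labeled Risk-H."""
    cut = median(scores)
    return ["Risk-H" if s > cut else "Risk-L" for s in scores]


# Hypothetical two-gene signature and four patients (not data from the study).
coefs = {"GENE_A": 0.8, "GENE_B": -0.5}
expr = [
    {"GENE_A": 1.0, "GENE_B": 2.0},
    {"GENE_A": 3.0, "GENE_B": 0.5},
    {"GENE_A": 0.2, "GENE_B": 1.0},
    {"GENE_A": 2.0, "GENE_B": 0.0},
]
groups = stratify(risk_scores(expr, coefs))
print(groups)  # ['Risk-L', 'Risk-H', 'Risk-L', 'Risk-H']
```

A median split keeps the two groups balanced in size; studies sometimes choose an optimal cutoff instead, which would replace the `median` call.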
Affiliation(s)
- Cheng Lin
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
- Yuebing Chen
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
- Jianji Pan
- Department of Radiation Oncology, Fujian Medical University Xiamen Humanity Hospital, Xiamen, Fujian Province, China
- Qiongjiao Lu
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
- Pengjie Ji
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
- Shuiqin Lin
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
- Chunfeng Liu
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
- Shaojun Lin
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
- Meifang Li
- Department of Medical Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350300, Fujian Province, China
- Jingfeng Zong
- Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, 350014, Fujian Province, China
7
Zhou F, Liu Y, Ren J, Wang W, Wu C. Springer: An R package for bi-level variable selection of high-dimensional longitudinal data. Front Genet 2023; 14:1088223. [PMID: 37091810 PMCID: PMC10117642 DOI: 10.3389/fgene.2023.1088223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 02/28/2023] [Indexed: 04/09/2023]
Abstract
In high-dimensional data analysis, bi-level (or sparse group) variable selection conducts penalization simultaneously at the group level and within groups, and it has been developed for continuous, binary, and survival responses in the literature. Zhou et al. (2022) (PMID: 35766061) further extended it to longitudinal responses by proposing a quadratic inference function (QIF)-based penalization method for gene–environment interaction studies. This study introduces “springer,” an R package implementing the bi-level variable selection within the QIF framework developed in Zhou et al. (2022). The R package “springer” also implements a generalized estimating equation (GEE)-based sparse group penalization method, as well as alternative methods that focus only on the group level or only on the individual level. In this study, we systematically introduce the longitudinal penalization methods implemented in the “springer” package. We demonstrate the usage of the core and supporting functions, followed by numerical examples and discussions. R package “springer” is available at https://cran.r-project.org/package=springer.
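Bi-level selection can be illustrated with the sparse group LASSO penalty, an ℓ1 term within groups plus an unsquared ℓ2 norm on each group. The Python sketch below applies the proximal operator of that penalty to one group of coefficients: elementwise soft-thresholding followed by group-level shrinkage, which can zero out individual coefficients or the whole group. It is an illustration under stated assumptions only: it does not reproduce the QIF or GEE loss, the group-size weighting often applied to `lam2` is omitted, and it does not use or describe the springer package's functions or its exact penalty form.

```python
import math
from typing import List


def soft_threshold(z: float, lam: float) -> float:
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def sparse_group_prox(z: List[float], lam1: float, lam2: float) -> List[float]:
    """Proximal operator of lam1 * ||b||_1 + lam2 * ||b||_2 for one group.

    Step 1 (within-group level): soft-threshold each coefficient by lam1.
    Step 2 (group level): shrink the whole vector toward zero by lam2,
    setting the entire group to zero when its norm is at most lam2.
    """
    s = [soft_threshold(v, lam1) for v in z]
    norm = math.sqrt(sum(v * v for v in s))
    if norm <= lam2:
        return [0.0] * len(z)
    scale = 1.0 - lam2 / norm
    return [scale * v for v in s]


# One coefficient is zeroed within a group that survives overall.
print(sparse_group_prox([3.0, 0.5, -4.0], lam1=1.0, lam2=1.0))
# The whole group is removed.
print(sparse_group_prox([0.8, -1.2, 0.5], lam1=1.0, lam2=1.0))  # [0.0, 0.0, 0.0]
```

The two calls show the two levels of sparsity that distinguish bi-level selection from selecting only groups or only individual variables.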
Affiliation(s)
- Fei Zhou
- Department of Statistics, Kansas State University, Manhattan, KS, United States
- Yuwen Liu
- Department of Statistics, Kansas State University, Manhattan, KS, United States
- Jie Ren
- Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, IN, United States
- Weiqun Wang
- Department of Food, Nutrition, Dietetics and Health, Kansas State University, Manhattan, KS, United States
- Cen Wu
- Department of Statistics, Kansas State University, Manhattan, KS, United States
- *Correspondence: Cen Wu,
8
Ke C, Bandyopadhyay D, Acunzo M, Winn R. Gene Screening in High-Throughput Right-Censored Lung Cancer Data. ONCO 2022; 2:305-318. [PMID: 37066112 PMCID: PMC10100230 DOI: 10.3390/onco2040017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/18/2023]
Abstract
Background Advances in sequencing technologies have allowed collection of massive genome-wide information that substantially advances lung cancer diagnosis and prognosis. Identifying influential markers for clinical endpoints of interest has been an indispensable and critical component of the statistical analysis pipeline. However, classical variable selection methods are neither feasible nor reliable for high-throughput genetic data. Our objective is to propose a model-free gene screening procedure for high-throughput right-censored data, and to develop a predictive gene signature for lung squamous cell carcinoma (LUSC) with the proposed procedure. Methods A gene screening procedure was developed based on a recently proposed independence measure. The Cancer Genome Atlas (TCGA) data on LUSC were then studied. The screening procedure was conducted to narrow down the set of influential genes to 378 candidates. A penalized Cox model was then fitted to the reduced set, which further identified a 6-gene signature for LUSC prognosis. The 6-gene signature was validated on datasets from the Gene Expression Omnibus. Results Both model-fitting and validation results reveal that our method selected influential genes that lead to biologically sensible findings as well as better predictive performance, compared to existing alternatives. According to our multivariable Cox regression analysis, the 6-gene signature was indeed a significant prognostic factor (p-value < 0.001) while controlling for clinical covariates. Conclusions Gene screening as a fast dimension reduction technique plays an important role in analyzing high-throughput data. The main contribution of this paper is to introduce a fundamental yet pragmatic model-free gene screening approach that aids statistical analysis of right-censored cancer data, and to provide a side-by-side comparison with other available methods in the context of LUSC.
Affiliation(s)
- Chenlu Ke
- Department of Statistical Sciences and Operations Research, Virginia Commonwealth University, Richmond, VA 23284, USA
- Dipankar Bandyopadhyay
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23284, USA
- Correspondence: ; Tel.: +1-804-827-2058
- Mario Acunzo
- Department of Internal Medicine, Virginia Commonwealth University, Richmond, VA 23284, USA
- Robert Winn
- Massey Cancer Center, Virginia Commonwealth University, Richmond, VA 23284, USA
9
Hu R, Zhou XJ, Li W. Computational Analysis of High-Dimensional DNA Methylation Data for Cancer Prognosis. J Comput Biol 2022; 29:769-781. [PMID: 35671506 PMCID: PMC9419965 DOI: 10.1089/cmb.2022.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Developing cancer prognostic models using multiomics data is a major goal of precision oncology. DNA methylation provides promising prognostic biomarkers, which have been used to predict survival and treatment response in solid tumor or plasma samples. This review article presents an overview of recently published computational analyses on DNA methylation for cancer prognosis. To address the challenges of survival analysis with high-dimensional methylation data, various feature selection methods have been applied to screen a subset of informative markers. Using candidate markers associated with survival, prognostic models either predict risk scores or stratify patients into subtypes. The model's discriminatory power can be assessed by multiple evaluation metrics. Finally, we discuss the limitations of existing studies and present the prospects of applying machine learning algorithms to fully exploit the prognostic value of DNA methylation.
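As one example of the evaluation metrics this review mentions, Harrell's concordance index (C-index) is widely used to assess how well a risk score ranks right-censored survival times. The review abstract does not specify which metrics it covers; the minimal Python sketch below assumes the basic definition, counts ties in predicted risk as 1/2, skips pairs with tied survival times for simplicity, and runs in O(n²) time.

```python
from typing import Sequence


def harrell_c_index(time: Sequence[float], event: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's C: among comparable pairs, the fraction in which the
    patient who failed earlier has the higher predicted risk.

    A pair (i, j) is comparable when time[i] < time[j] and patient i had
    an observed event (event[i] == 1). Ties in risk count as 1/2.
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Higher risk for earlier events gives perfect concordance.
print(harrell_c_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1]))  # 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by survival time.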
Affiliation(s)
- Ran Hu
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California, USA
- Bioinformatics Interdepartmental Graduate Program, University of California at Los Angeles, Los Angeles, California, USA
- Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, California, USA
- Xianghong Jasmine Zhou
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California, USA
- Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, California, USA
- Wenyuan Li
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, California, USA
- Institute for Quantitative & Computational Biosciences, University of California at Los Angeles, Los Angeles, California, USA
10
Huang CH, Han W, Wu YZ, Shen GL. Identification of aberrantly methylated differentially expressed genes and pro-tumorigenic role of KIF2C in melanoma. Front Genet 2022; 13:817656. [PMID: 35991567 PMCID: PMC9387026 DOI: 10.3389/fgene.2022.817656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2021] [Accepted: 07/04/2022] [Indexed: 11/13/2022]
Abstract
Background: Skin cutaneous melanoma (SKCM) is an aggressive malignant cancer that can arise directly from melanocytic nevi. However, the molecular mechanisms underlying the malignant transformation of melanocytes and melanoma tumor progression remain unclear. A growing body of research has shown significant roles of epigenetic modifications, especially DNA methylation, in melanoma. This study focused on the identification and analysis of methylation-regulated differentially expressed genes (MeDEGs) between melanocytic nevus and malignant melanoma in genome-wide profiles. Methods: The gene expression profiling datasets (GSE3189 and GSE114445) and gene methylation profiling datasets (GSE86355 and GSE120878) were downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) and differentially methylated genes (DMGs) were identified via GEO2R. MeDEGs were obtained by integrating the DEGs and DMGs. Then, a functional enrichment analysis of MeDEGs was performed. STRING and Cytoscape were used to describe the protein-protein interaction (PPI) network. Furthermore, survival analysis was implemented to select the prognostic hub genes. Next, we conducted gene set enrichment analysis (GSEA) of the hub genes. For validation, SKCM cell culture and lentivirus infection were performed to reveal the expression and behavior pattern of KIF2C. Patient specimens were collected, and immunohistochemistry (IHC) staining was conducted. Results: We identified 237 hypomethylated, upregulated genes and 182 hypermethylated, downregulated genes. Hypomethylated-upregulated genes were enriched in the biological processes of the oxidation-reduction process, cell proliferation, cell division, phosphorylation, extracellular matrix disassembly and protein sumoylation. Pathway enrichment analysis highlighted selenocompound metabolism, small cell lung cancer and lysosome.
Hypermethylated-downregulated genes were enriched in the biological processes of positive regulation of transcription from RNA polymerase II promoter, cell adhesion, cell proliferation, positive regulation of transcription, DNA-templated and angiogenesis. The most significantly enriched pathways involved transcriptional misregulation in cancer, circadian rhythm, tight junction, protein digestion and absorption and the Hippo signaling pathway. After PPI network construction and survival analysis, seven prognostic hub genes were identified: CKS2, DTL, KIF2C, KPNA2, MYBL2, TPX2, and FBL. Moreover, the most enriched hallmarks obtained by GSEA were E2F targets, G2M checkpoint and mitotic spindle. Importantly, among the 7 hub genes, we found that downregulation of KIF2C expression significantly inhibited the proliferative ability and suppressed the metastatic capacity of SKCM cells. Conclusions: Our study identified potential aberrantly methylated differentially expressed genes participating in the malignant transformation from nevus to melanoma based on comprehensive genomic profiles. The transcription profiles of CKS2, DTL, KIF2C, KPNA2, MYBL2, TPX2, and FBL provide clues to aberrant methylation-based biomarkers, which might advance the development of precision medicine. KIF2C plays a pro-tumorigenic role in SKCM, and its downregulation inhibited SKCM cell proliferation.
Affiliation(s)
- Chun-Hui Huang
- Department of Burn and Plastic Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Department of Surgery, Soochow University, Suzhou, China
- Wei Han
- Institute of Regenerative Biology and Medicine, Helmholtz Zentrum München, Munich, Germany
- Yi-Zhu Wu
- Department of Burn and Plastic Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Department of Surgery, Soochow University, Suzhou, China
- Guo-Liang Shen
- Department of Burn and Plastic Surgery, The First Affiliated Hospital of Soochow University, Suzhou, China
- Department of Surgery, Soochow University, Suzhou, China
- *Correspondence: Guo-Liang Shen,
11
Exploitation of Emerging Technologies and Advanced Networks for a Smart Healthcare System. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12125859] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 02/04/2023]
Abstract
Current medical methods still face numerous limitations and barriers in detecting and fighting illnesses and disorders. The introduction of emerging technologies in the healthcare industry is anticipated to enable novel medical techniques for an efficient and effective smart healthcare system. The Internet of Things (IoT), Wireless Sensor Networks (WSN), Big Data Analytics (BDA), and Cloud Computing (CC) can play a vital role in the instant detection of illnesses, diseases, viruses, or disorders. Advanced techniques such as Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) could accelerate drug and antibiotic discovery. Moreover, the integration of visualization techniques such as Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR) with the Tactile Internet (TI) can be applied by medical staff to provide the most accurate diagnosis and treatment for patients. This paper proposes a novel system architecture that combines several future technologies. The objective is to describe the integration of a mixture of emerging technologies, with the assistance of advanced networks, to provide a smart healthcare system that may be established in hospitals or medical centers. Such a system will be able to deliver immediate and accurate data to medical staff to help them provide precise patient diagnosis and treatment.
12
Lin Z, Chen L, Wu T, Zhang Y, Huang X, Chen Y, Chen J, Xu Y. Prognostic Value of SPOCD1 in Esophageal Squamous Cell Carcinoma: A Comprehensive Study Based on Bioinformatics and Validation. Front Genet 2022; 13:872026. [PMID: 35646092 PMCID: PMC9130929 DOI: 10.3389/fgene.2022.872026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Accepted: 04/20/2022] [Indexed: 12/24/2022]
Abstract
In this study, we aimed to explore and analyze the potential function of SPOC Domain Containing 1 (SPOCD1) in esophageal squamous cell carcinoma (ESCC). We performed a comprehensive analysis of SPOCD1 gene expression and its corresponding clinicopathological features in ESCC. In particular, the correlation between SPOCD1 and ESCC was evaluated using a wide range of analysis tools and databases, including TCGA, GTEx, GenePattern, CellMiner, GDSC, and STRING. Different bioinformatics analyses, including differential expression analysis, mutation analysis, drug sensitivity analysis, function analysis, pathway analysis, co-expression network analysis, immune cell infiltration analysis, and survival analysis, were carried out to comprehensively explore the potential molecular mechanisms and functional effects of SPOCD1 on the initiation and progression of ESCC. The expression of SPOCD1 was upregulated in ESCC tissues compared with normal tissues. In the high SPOCD1 expression group, we found notable mutations in the TP53, TTN, and MUC16 genes, at rates of 92%, 36%, and 18%, respectively. GO and KEGG enrichment analysis of SPOCD1 and its co-expressed genes demonstrated that it may serve as an ESCC oncogene by regulating the expression of genes in essential functions and pathways of tumorigenesis, such as glycosaminoglycan binding, cytokine-cytokine receptor interaction, and the Ras signaling pathway. Besides, the immune cell infiltration results revealed that SPOCD1 expression was positively correlated with the infiltration of M0 macrophages and activated mast cells, and negatively correlated with the infiltration of plasma cells and follicular helper T cells. Finally, ESCC patients with high SPOCD1 expression showed poor overall survival. qRT-PCR demonstrated that SPOCD1 expression in ESCC tissues was significantly higher than in adjacent tissues (p < 0.001). Our study indicated that SPOCD1 was increased in ESCC tissues.
The current data support the oncogenic role of SPOCD1 in the occurrence and development of ESCC. Most importantly, SPOCD1 might be an independent prognostic factor for ESCC patients.
Affiliation(s)
- Zhizhong Lin
- Department of Radiation Oncology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Lin Chen
- Department of Radiation Oncology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Tingting Wu
- The School of Nursing, Fujian Medical University, Fuzhou, China
- Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Yiping Zhang
- Department of Radiation Oncology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Xinyi Huang
- Department of Radiation Oncology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Yuanmei Chen
- Department of Thoracic Surgery, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Junqiang Chen
- Department of Radiation Oncology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
- Yuanji Xu
- Department of Radiation Oncology, Fujian Medical University Cancer Hospital, Fujian Cancer Hospital, Fuzhou, China
13
Jin Z, Kang J, Yu T. Feature selection and classification over the network with missing node observations. Stat Med 2022; 41:1242-1262. [PMID: 34816464 PMCID: PMC9773124 DOI: 10.1002/sim.9267] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Revised: 09/14/2021] [Accepted: 10/29/2021] [Indexed: 12/25/2022]
Abstract
Jointly analyzing transcriptomic data and existing biological networks can yield more robust and informative feature selection results, as well as a better understanding of biological mechanisms. Selecting and classifying node features over genome-scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have two critical drawbacks. First, they do not allow flexible modeling of different subtypes of selected nodes. Second, they ignore nodes with missing values, which is likely to increase bias in estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. A new prior model is developed for the class indicators incorporating the network structure. For posterior computation, we resort to the Swendsen-Wang algorithm for efficiently updating class indicators. BNC can naturally handle missing values in the Bayesian modeling framework, which improves the node classification accuracy and reduces the bias in estimating gene effects. We demonstrate the advantages of our methods via extensive simulation studies and the analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.
Affiliation(s)
- Jian Kang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Tianwei Yu
- School of Data Science and Warshel Institute, The Chinese University of Hong Kong - Shenzhen, and Shenzhen Research Institute of Big Data, Shenzhen, China
14
Zhou F, Ren J, Liu Y, Li X, Wang W, Wu C. Interep: An R Package for High-Dimensional Interaction Analysis of the Repeated Measurement Data. Genes (Basel) 2022; 13:544. [PMID: 35328097 PMCID: PMC8950762 DOI: 10.3390/genes13030544] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/09/2022] [Revised: 03/12/2022] [Accepted: 03/13/2022] [Indexed: 02/05/2023]
Abstract
We introduce interep, an R package for interaction analysis of repeated measurement data with high-dimensional main and interaction effects. In G × E interaction studies, the forms of environmental factors play a critical role in determining how structured sparsity should be imposed in the high-dimensional scenario to identify important effects. Zhou et al. (2019) (PMID: 31816972) proposed a longitudinal penalization method to select main and interaction effects corresponding to the individual and group structure, respectively, which requires a mixture of individual- and group-level penalties. The R package interep implements generalized estimating equation (GEE)-based penalization methods under this sparsity assumption. Alternative methods have also been implemented in the package; these select effects only at the individual level and ignore the group-level interaction structure. In this software article, we first introduce the statistical methodology of the penalized GEE methods implemented in the package. Next, we present the usage of the core and supporting functions, followed by a simulation example with annotated R code. The R package interep is available on the Comprehensive R Archive Network (CRAN).
Affiliation(s)
- Fei Zhou
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Jie Ren
- Department of Biostatistics and Health Data Sciences, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Yuwen Liu
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Xiaoxi Li
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
- Weiqun Wang
- Department of Food, Nutrition, Dietetics and Health, Kansas State University, Manhattan, KS 66506, USA
- Cen Wu
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA
15
Huang H, Wu N, Liang Y, Peng X, Jun S. SLNL: A novel method for gene selection and phenotype classification. INT J INTELL SYST 2022. [DOI: 10.1002/int.22844] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/27/2023]
Affiliation(s)
- HaiHui Huang
- School of Information Engineering, Shaoguan University, Shaoguan, China
- NaiQi Wu
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
- Yong Liang
- The Peng Cheng Laboratory, Shenzhen, China
- XinDong Peng
- School of Information Engineering, Shaoguan University, Shaoguan, China
- Shu Jun
- School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an, China
16
Gao M, Wen C. Subset selection in network-linked data. J STAT COMPUT SIM 2022. [DOI: 10.1080/00949655.2022.2029444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Affiliation(s)
- Mingyu Gao
- School of Data Science, University of Science and Technology of China, Hefei, People's Republic of China
- Canhong Wen
- International Institute of Finance, School of Management, University of Science and Technology of China, Hefei, People's Republic of China
17
Xu Y, Hong M, Kong D, Deng J, Zhong Z, Liang J. Ferroptosis-associated DNA methylation signature predicts overall survival in patients with head and neck squamous cell carcinoma. BMC Genomics 2022; 23:63. [PMID: 35042463 PMCID: PMC8767683 DOI: 10.1186/s12864-022-08296-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 06/12/2021] [Accepted: 01/05/2022] [Indexed: 01/12/2023]
Abstract
Background: Head and neck squamous cell carcinoma (HNSCC) is a common cancer characterized by late diagnosis and poor prognosis. The aim of this study was to identify a novel ferroptosis-related DNA methylation signature as an alternative diagnostic index for patients with HNSCC. Methods: Methylome and transcriptome data of 499 HNSCC patients, including 275 oral squamous cell carcinoma (OSCC) samples, were obtained from The Cancer Genome Atlas (TCGA). An additional independent methylation dataset of 50 OSCC patients from the NCBI Gene Expression Omnibus (GEO) database was used for validation. As an index of ferroptosis activity, the ferroptosis score (FS) of each patient was inferred from the transcriptome data using single-sample gene set enrichment analysis. Univariate, multivariate, and LASSO Cox regression analyses were used to select CpG sites for the construction of a ferroptosis-related DNA methylation signature for patient diagnosis. Results: We initially inferred the FS of each TCGA HNSCC patient and divided the samples into high- and low-FS subgroups. The high-FS subgroup displayed poor overall survival. Moreover, 378 differentially methylated CpG sites (DMCs) were identified between the two HNSCC subgroups, 16 of which were selected to construct a 16-DNA methylation signature for risk prediction in HNSCC patients using the LASSO and multivariate Cox regression models. Receiver operating characteristic (ROC) curve analysis showed high predictive efficiency of the 16-DNA methylation signature for 1-, 3-, and 5-year HNSCC survival. Its predictive efficiency was also observed in OSCC patients from the TCGA and GEO databases. In addition, we found that the signature was associated with the fractions of immune cell types in the tumor immune microenvironment (TIME), suggesting potential interactions between ferroptosis and TIME in HNSCC progression.
Conclusions: We established a novel ferroptosis-related 16-DNA methylation signature that could be applied as an alternative tool to predict prognostic outcomes in patients with HNSCC, including OSCC. Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08296-z.
18
Zhao J, Liu Z, Zheng X, Gao H, Li L. Prognostic Model and Nomogram Construction Based on a Novel Ferroptosis-Related Gene Signature in Lower-Grade Glioma. Front Genet 2021; 12:753680. [PMID: 34819946 PMCID: PMC8606636 DOI: 10.3389/fgene.2021.753680] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/11/2021] [Accepted: 10/04/2021] [Indexed: 01/31/2023]
Abstract
Background: Low-grade glioma (LGG) is considered a fatal disease for young adults, with overall survival ranging widely from 1 to 15 years depending on histopathologic and molecular subtypes. Ferroptosis, a novel type of programmed cell death that has been intensively studied in recent years, has been reported to be involved in tumorigenesis and tumor development. Methods: For the discovery cohort, data from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) were used to identify differentially expressed and prognostic ferroptosis-related genes (FRGs). The least absolute shrinkage and selection operator (LASSO) and multivariate Cox regression were used to establish a prognostic signature from the selected FRGs. The signature was then developed and validated in the TCGA and Chinese Glioma Genome Atlas (CGGA) databases. By combining clinicopathological features and the FRG signature, a nomogram was established to predict individuals’ one-, three-, and five-year survival probability, and its predictive performance was evaluated by Harrell’s concordance index (C-index) and calibration curves. Enrichment analysis was performed to explore the signaling pathways regulated by the signature. Results: A novel risk signature containing seven FRGs was constructed and used to divide patients into two risk groups. Kaplan–Meier (K-M) survival curve and receiver operating characteristic (ROC) curve analyses confirmed the prognostic performance of the risk model, which was followed by external validation on data from the CGGA. The nomogram based on the risk signature and clinical traits was validated to perform well in predicting the survival of LGG patients. Finally, functional analysis revealed that immune status differed between the two risk groups, which might help explain the underlying mechanisms of ferroptosis in LGG.
Conclusion: This study constructed a novel and robust seven-FRG signature and established a prognostic nomogram for LGG survival prediction.
Affiliation(s)
- Junsheng Zhao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Zhengtao Liu
- Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Xiaoping Zheng
- Department of Pathology, Hangzhou Tongchuang Medical Laboratory, Hangzhou, China
- Hainv Gao
- Department of Infectious Diseases, ShuLan (Hangzhou) Hospital Affiliated to Zhejiang Shuren University, Shulan International Medical College, Hangzhou, China
- Lanjuan Li
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
19
Kang Z, Li W, Yu YH, Che M, Yang ML, Len JJ, Wu YR, Yang JF. Identification of Immune-Related Genes Associated With Bladder Cancer Based on Immunological Characteristics and Their Correlation With the Prognosis. Front Genet 2021; 12:763590. [PMID: 34899848 PMCID: PMC8664377 DOI: 10.3389/fgene.2021.763590] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/24/2021] [Accepted: 11/08/2021] [Indexed: 01/21/2023]
Abstract
BACKGROUND: To identify the immune-related genes of bladder cancer (BLCA) based on immunological characteristics and explore their correlation with prognosis. METHODS: We downloaded the gene and clinical data of BLCA from The Cancer Genome Atlas (TCGA) as the training group, and obtained immune-related genes from the ImmPort database. We downloaded GSE31684 and GSE39281 from the Gene Expression Omnibus (GEO) as the external validation group. R (version 4.0.5) and Perl were used to analyze all data. RESULT: Univariate Cox regression analysis and LASSO regression analysis revealed that 9 prognosis-related immunity genes (PIMGs) among the differentially expressed immune genes (DEIGs) were significantly associated with the survival of BLCA patients (p < 0.01). Of these, 5 genes (NPR2, PDGFRA, VIM, RBP1, and TNC) were associated with increased prognostic risk, while the remaining 4 (CD3D, GNLY, LCK, and ZAP70) were associated with decreased prognostic risk. We then used these genes to establish a prognostic model. We drew receiver operating characteristic (ROC) curves in the training group and estimated the area under the curve (AUC) of 1-, 3-, and 5-year survival for this model, which were 0.688, 0.719, and 0.706, respectively. The accuracy of the prognostic model was verified by the calibration chart. Combining clinical factors, we established a nomogram. The ROC curve in the external validation group showed that the nomogram had good predictive ability and high accuracy for survival, with AUC values of 0.744, 0.770, and 0.782 for 1-, 3-, and 5-year survival, respectively. The calibration chart indicated that the nomogram performed similarly to the ideal model. CONCLUSION: We identified nine genes, NPR2, PDGFRA, VIM, RBP1, TNC, CD3D, GNLY, LCK, and ZAP70, that played important roles in the occurrence and development of BLCA.
The prognostic model based on these genes had good accuracy in predicting the OS of patients, and these genes might be promising candidate therapeutic targets. This study may provide new insight into the diagnosis, treatment, and prognosis of BLCA from the perspective of immunology. However, further experimental studies are necessary to reveal the mechanisms by which these genes mediate the progression of BLCA.
Affiliation(s)
- Zhen Kang
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Department of Urology, The First People’s Hospital of Yunnan Province, Kunming, China
- Wei Li
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Department of Urology, The First People’s Hospital of Yunnan Province, Kunming, China
- Yan-Hong Yu
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Department of Urology, The First People’s Hospital of Yunnan Province, Kunming, China
- Meng Che
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Mao-Lin Yang
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Department of Urology, The First People’s Hospital of Yunnan Province, Kunming, China
- Jin-Jun Len
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Department of Urology, The First People’s Hospital of Yunnan Province, Kunming, China
- Yue-Rong Wu
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Jun-Feng Yang
- The Affiliated Hospital, Kunming University of Science and Technology, Kunming, China
- Department of Urology, The First People’s Hospital of Yunnan Province, Kunming, China
20
Steinauer N, Zhang K, Guo C, Zhang J. Computational Modeling of Gene-Specific Transcriptional Repression, Activation and Chromatin Interactions in Leukemogenesis by LASSO-Regularized Logistic Regression. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2109-2122. [PMID: 33961561 PMCID: PMC8572318 DOI: 10.1109/tcbb.2021.3078128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/14/2023]
Abstract
Many physiological and pathological pathways are dependent on gene-specific on/off regulation of transcription. Some genes are repressed, while others are activated. Although many previous studies have analyzed the mechanisms of gene-specific repression and activation, these studies are mainly based on candidate genes that are either repressed or activated, without simultaneously comparing and contrasting both groups of genes. There has also been insufficient consideration of gene locations. Here we describe an integrated machine learning approach, using LASSO-regularized logistic regression, to model gene-specific repression and activation and the underlying contribution of chromatin interactions. LASSO-regularized logistic regression accurately predicted gene-specific transcriptional events and robustly detected the rate-limiting factors that underlie the differences between gene activation and repression. An example is provided by the leukemogenic transcription factor AML1-ETO, which is responsible for 10-15 percent of all acute myeloid leukemia cases. The analysis of AML1-ETO also revealed novel networks of chromatin interactions and uncovered an unexpected role for E-proteins in AML1-ETO-p300 interactions and a role for the pre-existing gene state in governing the transcriptional response. Our results show that logistic regression-based probabilistic modeling is a promising tool for deciphering mechanisms that integrate gene regulation and chromatin interactions in regulated transcription.
21
Huang HH, Liang Y. A Novel Cox Proportional Hazards Model for High-Dimensional Genomic Data in Cancer Prognosis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1821-1830. [PMID: 31870990 DOI: 10.1109/tcbb.2019.2961667] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/10/2023]
Abstract
The Cox proportional hazards model is a popular method for studying the relationship between features and survival time. Because of the high dimensionality of genomic data, existing Cox models trained on a specific dataset often generalize poorly to other independent datasets. In this paper, we propose a novel strategy for the Cox model. This strategy includes a new learning technique, self-paced learning (SPL), and a new gene selection method, the SCAD-Net penalty. The SPL method is adopted to help build more accurate predictions through its built-in mechanism of learning from easy samples first and then adaptively learning from hard samples. The SCAD-Net penalty addresses the lack of an inherent mechanism in the SCAD method for incorporating prior graphical information. We combined SPL with the SCAD-Net penalty in the Cox model (SSNC). Simulations show that SSNC outperforms the benchmark in terms of prediction and gene selection. The analysis of a large-scale experiment across several cancer datasets shows that the SSNC method not only achieves higher prediction accuracy but also identifies markers with satisfactory stability across another validation dataset. Demo code for the proposed method is provided in the supplemental file.
22
Spirko-Burns L, Devarajan K. Supervised Dimension Reduction for Large-Scale "Omics" Data With Censored Survival Outcomes Under Possible Non-Proportional Hazards. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2032-2044. [PMID: 31940547 DOI: 10.1109/tcbb.2020.2965934] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/10/2023]
Abstract
The past two decades have witnessed significant advances in high-throughput "omics" technologies such as genomics, proteomics, metabolomics, transcriptomics and radiomics. These technologies have enabled simultaneous measurement of the expression levels of tens of thousands of features from individual patient samples and have generated enormous amounts of data that require analysis and interpretation. One specific area of interest has been the study of the relationship between these features and patient outcomes, such as overall and recurrence-free survival, with the goal of developing a predictive "omics" profile. Large-scale studies often suffer from a large fraction of censored observations and potential time-varying effects of features, and methods for handling both issues have been lacking. In this paper, we propose supervised methods for feature selection and survival prediction that simultaneously deal with both issues. Our approach utilizes continuum power regression (CPR), a framework that includes a variety of regression methods, in conjunction with the parametric or semi-parametric accelerated failure time (AFT) model. Both CPR and AFT fall within the linear models framework and, unlike black-box models, the proposed prognostic index has a simple yet useful interpretation. We demonstrate the utility of our methods using simulated and publicly available cancer genomics data.
23
Hu Z, Zhou Y, Tong T. Meta-Analyzing Multiple Omics Data With Robust Variable Selection. Front Genet 2021; 12:656826. [PMID: 34290735 PMCID: PMC8288516 DOI: 10.3389/fgene.2021.656826] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2021] [Accepted: 05/24/2021] [Indexed: 12/03/2022]
Abstract
High-throughput omics data are becoming increasingly common in various areas of science. Given that many publicly available datasets address the same questions, researchers have applied meta-analysis to synthesize multiple datasets to achieve more reliable results for model estimation and prediction. Due to the high dimensionality of omics data, it is also desirable to incorporate variable selection into meta-analysis. Existing meta-analytic variable selection methods are often sensitive to the presence of outliers and may miss relevant covariates, especially with lasso-type penalties. In this paper, we develop a robust variable selection algorithm for meta-analyzing high-dimensional datasets based on logistic regression. We first search for an outlier-free subset in each dataset by borrowing information across datasets, repeatedly using least trimmed squares estimates for the logistic model together with a hierarchical bi-level variable selection technique. After obtaining a reliable non-outlier subset, we then apply a reweighting step to further improve efficiency. Simulation studies and real data analysis show that our new method can provide more reliable results than existing meta-analysis methods in the presence of outliers.
Affiliation(s)
- Zongliang Hu
- College of Mathematics and Statistics, Shenzhen University, Shenzhen, China
- Yan Zhou
- College of Mathematics and Statistics, Shenzhen University, Shenzhen, China
- Tiejun Tong
- Department of Mathematics, Hong Kong Baptist University, Kowloon Tong, Hong Kong
24
Huang H, Peng X, Liang Y. SPLSN: An efficient tool for survival analysis and biomarker selection. INT J INTELL SYST 2021. [DOI: 10.1002/int.22532] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 01/12/2023]
Affiliation(s)
- Hai‐Hui Huang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
- Xin‐Dong Peng
- School of Information Engineering, Shaoguan University, Shaoguan, China
- Yong Liang
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China
25
Yoshihara T, Zaitsu M, Ito K, Chung E, Matsumoto M, Manabe J, Sakamoto T, Tsukikawa H, Nakagawa M, Shingu M, Matsuki S, Irie S. Statistical Analysis of the Axillary Temperatures Measured by a Predictive Electronic Thermometer in Healthy Japanese Adults. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2021; 18:ijerph18105096. [PMID: 34065809 PMCID: PMC8151447 DOI: 10.3390/ijerph18105096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/16/2021] [Revised: 04/30/2021] [Accepted: 05/05/2021] [Indexed: 11/16/2022]
Abstract
Body temperature is important for diagnosing illnesses. However, its assessment is often difficult, considering the large individual differences. Although 37 °C has been the gold standard of body temperature for over a century, the temperature of modern people is reportedly decreasing year by year. Nevertheless, a mean axillary temperature of 36.89 ± 0.34 °C reported in 1957 is still cited in Japan. To assess measured axillary temperature appropriately, it is important to understand its distribution in modern people. This study retrospectively analyzed 2454 axillary temperature measurements from healthy Japanese adults in 2019 (age range, 20–79 years; 2258 males). Their mean temperature was 36.47 ± 0.28 °C (36.48 ± 0.27 °C in males and 36.35 ± 0.31 °C in females). Approximately 5% of males aged 20–39 years had a body temperature ≥ 37 °C, whereas 8% had a temperature ≥ 37 °C in the afternoon. However, none of the subjects aged ≥ 50 years had a body temperature ≥ 37 °C. In multivariable regression analysis, age, blood pressure, pulse rate, and measurement time of day were associated with axillary temperature. Our data showed that the body temperature of modern Japanese adults was lower than previously reported. When assessing body temperature, age, blood pressure, pulse rate, and measurement time of day should be considered.
Affiliation(s)
- Tatsuya Yoshihara
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Correspondence: Tel.: +81-92-662-3608
- Masayoshi Zaitsu
- Department of Public Health, Dokkyo Medical University School of Medicine, 880 Kitakobayashi, Mibu-machi, Shimotsuga-gun, Tochigi 321-0293, Japan
- Kazuya Ito
- SOUSEIKAI Clinical Epidemiological Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- College of Healthcare Management, Takayanagi 960-4, Setaka-machi, Miyama 835-0018, Japan
- Eunhee Chung
- SOUSEIKAI Global Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Mayumi Matsumoto
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Junko Manabe
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Takashi Sakamoto
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Hiroshi Tsukikawa
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Misato Nakagawa
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Masami Shingu
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Shunji Matsuki
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
- Shin Irie
- SOUSEIKAI Fukuoka Mirai Hospital Clinical Research Center, Kashiiteriha 3-5-1, Higashi-ku, Fukuoka 813-0017, Japan
26
Du Y, Fan K, Lu X, Wu C. Integrating Multi-Omics Data for Gene-Environment Interactions. BioTech 2021; 10:biotech10010003. [PMID: 35822775 PMCID: PMC9245467 DOI: 10.3390/biotech10010003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/24/2020] [Revised: 01/22/2021] [Accepted: 01/22/2021] [Indexed: 01/05/2023]
Abstract
Gene-environment (G×E) interaction is critical for understanding the genetic basis of complex disease beyond genetic and environmental main effects. In addition to existing tools for interaction studies, penalized variable selection has emerged as a promising alternative for dissecting G×E interactions. Despite this success, variable selection is limited in its ability to account for multidimensional measurements: published variable selection methods cannot accommodate structured sparsity when integrating multi-omics data for disease outcomes. In this paper, we have developed a novel variable selection method to integrate multi-omics measurements in G×E interaction studies. Extensive studies have already revealed that analyzing omics data across multiple platforms is not only biologically sensible but also results in improved identification and prediction performance. Our integrative model can efficiently pinpoint important regulators of gene expression through sparse dimensionality reduction, and link disease outcomes to multiple effects in integrative G×E studies by accommodating a sparse bi-level structure. Simulation studies show that the integrative model leads to better identification of G×E interactions and regulators than alternative methods. In two G×E lung cancer studies with high-dimensional multi-omics data, the integrative model leads to improved prediction and to findings with important biological implications.
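The abstract does not state the penalty. One common way to encode a sparse bi-level structure, selecting whole groups and individual members within a group, is the sparse group lasso. The Python sketch below shows that penalty's proximal operator, a building block of many fitting algorithms.

The choice of sparse group lasso, the group labels and the function names are illustrative assumptions. They are not the authors' method.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_group_prox(z, groups, lam1, lam2):
    """Proximal operator of lam1*||b||_1 + lam2*sum_g ||b_g||_2 (assumed penalty).

    Assumes non-overlapping groups given as one integer label per coefficient.
    Individual level: soft-threshold every coefficient by lam1.
    Group level: shrink each group's Euclidean norm by lam2, zeroing the
    whole group when its norm is at most lam2.
    """
    z = np.asarray(z, dtype=float)
    groups = np.asarray(groups)
    out = soft_threshold(z, lam1)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(out[idx])
        scale = max(0.0, 1.0 - lam2 / norm) if norm > 0 else 0.0
        out[idx] *= scale
    return out

# Hypothetical example: two groups of two coefficients each.
b = sparse_group_prox([1.0, 1.0, 4.0, -3.0], groups=[0, 0, 1, 1], lam1=0.5, lam2=1.0)
print(b)  # group 0 is removed entirely; group 1 is kept but shrunk
```

The order of the two steps matters. Soft-thresholding first and then shrinking each group is what yields the exact proximal operator of the combined penalty.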
27
Mg2+ Transporters in Digestive Cancers. Nutrients 2021; 13:nu13010210. [PMID: 33450887 PMCID: PMC7828344 DOI: 10.3390/nu13010210] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/22/2020] [Revised: 01/07/2021] [Accepted: 01/08/2021] [Indexed: 02/08/2023]
Abstract
Although magnesium (Mg2+) is the second most abundant cation in the cell, its role in cellular physiology and pathology is far from being elucidated. Mg2+ homeostasis is regulated by Mg2+ transporters, including Mitochondrial RNA Splicing Protein 2 (MRS2), Transient Receptor Potential Cation Channel Subfamily M, Member 6/7 (TRPM6/7), Magnesium Transporter 1 (MAGT1), Solute Carrier Family 41 Member 1 (SLC41A1), and Cyclin and CBS Domain Divalent Metal Cation Transport Mediator (CNNM) proteins. Recent data show that Mg2+ transporters may regulate several cancer cell hallmarks. In this review, we describe the expression of Mg2+ transporters in digestive cancers, the most common and deadliest malignancies worldwide. Moreover, we analyze the expression of Mg2+ transporters, their correlations, and their impact on patients' overall and disease-free survival using the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) datasets. Finally, we discuss the role of these Mg2+ transporters in regulating cancer cell fate and oncogenic signaling pathways.
28
Zhou Z, Huang H, Liang Y. Cancer classification and biomarker selection via a penalized logsum network-based logistic regression model. Technol Health Care 2021; 29:287-295. [PMID: 33682765 PMCID: PMC8150479 DOI: 10.3233/thc-218026] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/02/2022]
Abstract
BACKGROUND In genome research, it is particularly important to identify molecular biomarkers or signaling pathways related to phenotypes. The logistic regression model is a powerful discrimination method that offers a clear statistical interpretation and yields the probability of each class label. However, on its own it cannot perform biomarker selection. OBJECTIVE The aim of this paper is to give the model an efficient gene selection capability. METHODS In this paper, we propose a new penalized logsum network-based regularized logistic regression model for gene selection and cancer classification. RESULTS Experimental results on simulated data sets show that our method is effective in the analysis of high-dimensional data. On a large data set, the proposed method achieved AUCs of 89.66% (training) and 90.02% (testing), on average 5.17% (training) and 4.49% (testing) higher than those of mainstream methods. CONCLUSIONS The proposed method can be considered a promising tool for gene selection and cancer classification of high-dimensional biological data.
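The abstract names the penalty but does not give its formula. As a hedged sketch, assume the objective has three parts:

- the mean logistic negative log-likelihood;
- a log-sum sparsity term $\lambda\sum_j \log(1 + |\beta_j|/\varepsilon)$;
- a network smoothness term $\eta\,\beta^\top L \beta$, where $L$ is the Laplacian of a gene-network adjacency matrix.

The exact form, the names `lam`, `eta` and `eps`, and the Laplacian construction are assumptions for illustration. The log-sum term grows only logarithmically, so it shrinks large coefficients less than the L1 norm does. The Laplacian term encourages connected genes to have similar coefficients.

```python
import numpy as np

def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric, non-negative adjacency matrix."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def logsum_network_objective(beta, X, y, L, lam=0.1, eta=0.1, eps=1.0):
    """Penalized logistic objective (assumed form, not the paper's exact model).

    beta: coefficients (p,), X: design (n, p), y: labels in {0, 1}, L: Laplacian (p, p).
    """
    z = X @ beta
    # Per-sample negative log-likelihood log(1 + exp(z)) - y*z, computed stably
    nll = np.mean(np.logaddexp(0.0, z) - y * z)
    # Log-sum sparsity penalty
    logsum = lam * np.sum(np.log1p(np.abs(beta) / eps))
    # beta^T L beta equals the sum over edges (i, j) of A_ij * (beta_i - beta_j)^2
    network = eta * beta @ L @ beta
    return nll + logsum + network

# Hypothetical toy data: three samples, two genes linked in the network.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])
L = graph_laplacian([[0, 1], [1, 0]])
print(logsum_network_objective(np.zeros(2), X, y, L))  # log(2) at beta = 0
```

Minimizing this nonconvex objective is commonly handled by iteratively reweighted L1 or coordinate-wise schemes. That fitting step is omitted here.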
Affiliation(s)
- Zhiming Zhou
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Haihui Huang
- Faculty of Information Technology, Macau University of Science and Technology, Macau, China
- Shaoguan University, Shaoguan, Guangdong, China
- Yong Liang
- Macau Institute of Systems Engineering and Collaborative Laboratory of Intelligent Science and Systems, Macau University of Science and Technology, Macau, China
29
Zhou F, Ren J, Lu X, Ma S, Wu C. Gene-Environment Interaction: A Variable Selection Perspective. Methods Mol Biol 2021; 2212:191-223. [PMID: 33733358 DOI: 10.1007/978-1-0716-0947-7_13] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 01/31/2023]
Abstract
Gene-environment (G × E) interactions have important implications for elucidating the genetic basis of complex diseases beyond the joint function of multiple genetic factors and their interactions (or epistasis). In the past, G × E interaction studies have mainly been conducted within the framework of genetic association studies. The high dimensionality of G × E interactions, due to the complicated form of environmental effects and the presence of a large number of genetic factors, including gene expressions and SNPs, has motivated the recent development of penalized variable selection methods for dissecting G × E interactions, a development that has been overlooked in the majority of published reviews on genetic interaction studies. In this article, we first survey existing studies on both gene-environment and gene-gene interactions. Then, after a brief introduction to variable selection methods, we review penalization and relevant variable selection methods in the marginal and joint paradigms, respectively, under a variety of conceptual models. We also discuss the strengths and limitations, as well as the computational aspects, of the variable selection methods tailored for G × E studies.
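As a minimal illustration of the joint penalized paradigm, the Python sketch below does two things:

- builds a design matrix with one environmental factor E, the genetic main effects G, and all G × E interaction columns;
- fits a plain lasso to that matrix by cyclic coordinate descent.

The single environmental variable, the unstructured lasso penalty and the simulated data are simplifying assumptions, not a method from the article. In particular, the penalty imposes no hierarchy between main and interaction effects. All function names are hypothetical.

```python
import numpy as np

def gxe_design(G, E):
    """Columns: [E, G_1..G_p, G_1*E..G_p*E] for one environmental factor E."""
    E = np.asarray(E, dtype=float).reshape(-1, 1)
    G = np.asarray(G, dtype=float)
    return np.hstack([E, G, G * E])

def soft_threshold(z, t):
    """Scalar soft-thresholding operator."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float)               # residual y - X b, starting from b = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (its own effect added back)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            r -= X[:, j] * (new - b[j])   # keep the residual in sync with b
            b[j] = new
    return b

# Simulated example: one genetic main effect (G_1) and one interaction (G_2 x E).
rng = np.random.default_rng(0)
n, p = 200, 5
G = rng.standard_normal((n, p))
E = rng.standard_normal(n)
X = gxe_design(G, E)
y = 2.0 * G[:, 0] + 1.5 * G[:, 1] * E + 0.1 * rng.standard_normal(n)
b = lasso_cd(X, y, lam=0.05)
print(np.round(b, 2))
```

Structured G × E penalties, such as those enforcing that an interaction enters only with its main effect, would replace the plain soft-threshold update with a group-aware one.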
Affiliation(s)
- Fei Zhou
- Department of Statistics, Kansas State University, Manhattan, KS, USA
- Jie Ren
- Department of Biostatistics, Indiana University School of Medicine, Indianapolis, IN, USA
- Xi Lu
- Department of Statistics, Kansas State University, Manhattan, KS, USA
- Shuangge Ma
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, USA
- Cen Wu
- Department of Statistics, Kansas State University, Manhattan, KS, USA
30
Cai L, Hu C, Yu S, Liu L, Zhao J, Zhao Y, Lin F, Du X, Yu Q, Xiao Q. Identification of EMT-Related Gene Signatures to Predict the Prognosis of Patients With Endometrial Cancer. Front Genet 2020; 11:582274. [PMID: 33343628 PMCID: PMC7738567 DOI: 10.3389/fgene.2020.582274] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/23/2020] [Accepted: 10/30/2020] [Indexed: 12/17/2022]
Abstract
Background Endometrial cancer (EC) is one of the most common gynecological cancers. Epithelial–mesenchymal transition (EMT) is believed to be significantly associated with the malignant progression of tumors. However, no study has examined the relationship between EMT-related gene (ERG) signatures and the prognosis of EC patients. Methods We extracted the mRNA expression profiles of 543 tumor and 23 normal tissues from The Cancer Genome Atlas database. Then, we selected differentially expressed ERGs (DEERGs) among these mRNAs. Next, univariate and multivariate Cox regression analyses were performed to select the ERGs with predictive ability for the prognosis of EC patients. In addition, risk score models were constructed based on the selected genes to predict patients’ overall survival (OS), progression-free survival (PFS), and disease-free survival (DFS). Finally, nomograms were constructed to estimate the OS and PFS of EC patients, and a pan-cancer analysis was performed to further analyze the functions of a gene of interest. Results Six OS-, ten PFS-, and five DFS-related ERGs were obtained. Using the prognostic risk score models, we found that the OS, PFS, and DFS of the high-risk group were notably poorer. Lastly, we found that AQP5 appeared in all three gene signatures, and the pan-cancer analysis showed that it also plays an important role in immunity in lower grade glioma (LGG), which may contribute to the poor prognosis of LGG patients. Conclusions We constructed ERG signatures to predict the prognosis of EC patients using bioinformatics methods. Our findings provide a thorough understanding of the effect of EMT in patients with EC and provide new targets and ideas for individualized treatment, which has important clinical significance.
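Prognostic risk scores of the kind described above are commonly computed as the linear predictor of the fitted multivariate Cox model, $\text{risk}_i = \sum_j \hat\beta_j x_{ij}$. Patients are then split into high- and low-risk groups at the median score.

The Python sketch below assumes exactly that recipe. The median cutoff, the function names and the example coefficients are hypothetical and not taken from the study.

```python
import numpy as np

def risk_scores(expr, coefs):
    """Linear predictor sum_j coef_j * expr_ij (rows: patients, columns: signature genes)."""
    return np.asarray(expr, dtype=float) @ np.asarray(coefs, dtype=float)

def split_by_median(scores):
    """Label a patient 'high' risk if the score exceeds the cohort median, else 'low'."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > np.median(scores), "high", "low")

# Hypothetical two-gene signature and four patients.
coefs = [0.5, -0.2]
expr = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [0.0, 0.0]]
scores = risk_scores(expr, coefs)   # [0.5, -0.2, 0.6, 0.0]
groups = split_by_median(scores)    # median 0.25 -> ['high', 'low', 'high', 'low']
```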
Affiliation(s)
- Luya Cai
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Chuan Hu
- Department of Orthopaedic Surgery, The Affiliated Hospital of Qingdao University, Qingdao, China
- Shanshan Yu
- Department of Chemoradiation Oncology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Lixiao Liu
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Jinduo Zhao
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Ye Zhao
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Fan Lin
- Department of Dermatology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Xuedan Du
- Department of Chemoradiation Oncology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Qiongjie Yu
- Department of Chemoradiation Oncology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
- Qinqin Xiao
- Department of Radiology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, China
31
Zhang Y, Ma S, Niu Q, Han Y, Liu X, Jiang J, Chen S, Lin H. Features of alternative splicing in stomach adenocarcinoma and their clinical implication: a research based on massive sequencing data. BMC Genomics 2020; 21:580. [PMID: 32831016 PMCID: PMC7443856 DOI: 10.1186/s12864-020-06997-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/24/2019] [Accepted: 08/17/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Alternative splicing (AS) is a major mechanism underlying protein polymorphism. A growing body of evidence indicates a correlation between splicing disorders and carcinoma. Nevertheless, a comprehensive analysis of AS signatures in stomach adenocarcinoma (STAD) is lacking and urgently needed. RESULTS A total of 2042 splicing events were confirmed as prognostic molecular events. Furthermore, the final prognostic signature, constructed from 10 AS events, performed well, with an area under the receiver operating characteristic (ROC) curve (AUC) of up to 0.902 at 5 years, showing high potency in predicting patient outcome. We built a splicing regulatory network to show the internal regulatory mechanism of splicing events in STAD. QKI may play a significant part in the prognosis associated with splicing events. CONCLUSIONS In our study, a high-efficiency prognostic prediction model was built for STAD patients, and the results showed that AS events could become potential prognostic biomarkers for STAD. Meanwhile, QKI may become an important target for drug design in the future.
Affiliation(s)
- Yuanyuan Zhang
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China.
- Shengling Ma
- Institute of Hematology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Qian Niu
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Yun Han
- Department of Ophthalmology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Xingyu Liu
- Department of Gynecology and Obstetrics, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Jie Jiang
- Department of Anesthesiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Simiao Chen
- Department of Neurology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
- Haolong Lin
- Department of Hematology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China
32
Vinga S. Structured sparsity regularization for analyzing high-dimensional omics data. Brief Bioinform 2020; 22:77-87. [PMID: 32597465 DOI: 10.1093/bib/bbaa122] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 12/15/2019] [Revised: 05/15/2020] [Accepted: 05/18/2020] [Indexed: 12/18/2022]
Abstract
The development of new molecular and cell technologies is having a significant impact on the quantity of data generated nowadays. The growth of omics databases is creating considerable potential for knowledge discovery and, concomitantly, is bringing new challenges to statistical learning and computational biology for health applications. Indeed, the high dimensionality of these data may hamper the use of traditional regression methods and parameter estimation algorithms owing to the intrinsic non-identifiability of the underlying optimization problem. Regularized optimization has emerged as a promising and useful strategy for solving these ill-posed problems by imposing additional constraints on the solution parameter space. In particular, the field of statistical learning with sparsity has contributed significantly to building accurate models that also bring interpretability to biological observations and phenomena. Beyond the now-classic elastic net, one of the best-known methods combining lasso and ridge penalties, we briefly review recent literature on structured regularizers and penalty functions that have been applied to biomedical data to build parsimonious models in a variety of contexts, from survival to generalized linear models. These methods include functions of $\ell_k$-norms and network-based penalties that take into account the inherent relationships between features. Their successful application to omics data illustrates the potential of sparse structured regularization for identifying molecular signatures of disease and for creating high-performance clinical decision support systems toward more personalized healthcare. Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
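To make the elastic net concrete: coordinate-wise and proximal algorithms for it repeatedly solve a one-dimensional problem. Its solution is soft-thresholding followed by a ridge rescaling, $\hat b = S(z, \lambda_1)/(1 + \lambda_2)$. The Python sketch below implements that closed form and checks it against a brute-force grid search.

The names `lam1` (lasso weight) and `lam2` (ridge weight) are illustrative. In glmnet's parameterization, $\lambda[\alpha\|\beta\|_1 + \tfrac{1-\alpha}{2}\|\beta\|_2^2]$, they correspond to $\lambda\alpha$ and $\lambda(1-\alpha)$.

```python
import numpy as np

def elastic_net_prox(z, lam1, lam2):
    """Elementwise argmin_b 0.5*(b - z)**2 + lam1*|b| + 0.5*lam2*b**2."""
    z = np.asarray(z, dtype=float)
    # Soft-threshold (lasso part), then rescale (ridge part)
    return np.sign(z) * np.maximum(np.abs(z) - lam1, 0.0) / (1.0 + lam2)

def brute_force(z, lam1, lam2):
    """Grid-search minimizer of the same scalar objective, used as a check."""
    grid = np.linspace(-5.0, 5.0, 100001)
    obj = 0.5 * (grid - z) ** 2 + lam1 * np.abs(grid) + 0.5 * lam2 * grid ** 2
    return grid[np.argmin(obj)]

for z in (2.0, -1.2, 0.3):
    print(z, elastic_net_prox(z, lam1=0.5, lam2=1.0), brute_force(z, 0.5, 1.0))
```

The closed form has two limiting cases. Setting `lam2 = 0` recovers the lasso soft-threshold, and setting `lam1 = 0` gives pure ridge shrinkage, which never produces exact zeros.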
Affiliation(s)
- Susana Vinga
- INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
33
Wu C, Zhou F, Ren J, Li X, Jiang Y, Ma S. A Selective Review of Multi-Level Omics Data Integration Using Variable Selection. High Throughput 2019; 8:E4. [PMID: 30669303 PMCID: PMC6473252 DOI: 10.3390/ht8010004] [Citation(s) in RCA: 114] [Impact Index Per Article: 22.8] [Received: 11/22/2018] [Revised: 12/24/2018] [Accepted: 01/10/2019] [Indexed: 01/02/2023]
Abstract
High-throughput technologies have been used to generate a large amount of omics data. In the past, single-level analyses, in which omics measurements at different levels (including mRNA, microRNA, CNV and DNA methylation) are analyzed separately, have been conducted extensively. As the molecular complexity of disease etiology exists at all levels, integrative analysis offers an effective way to borrow strength across multi-level omics data and can be more powerful than single-level analysis. In this article, we review existing multi-omics integration studies, paying special attention to variable selection methods. We first summarize published reviews on integrating multi-level omics data. Next, after a brief overview of variable selection methods, we review existing supervised, semi-supervised and unsupervised integrative analyses within parallel and hierarchical integration studies, respectively. The strengths and limitations of the methods are discussed in detail; no existing integration method dominates the others. The computational aspects are also investigated. The review concludes with possible limitations and future directions for multi-level omics data integration.
Affiliation(s)
- Cen Wu
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Fei Zhou
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Jie Ren
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Xiaoxi Li
- Department of Statistics, Kansas State University, Manhattan, KS 66506, USA.
- Yu Jiang
- Division of Epidemiology, Biostatistics and Environmental Health, School of Public Health, University of Memphis, Memphis, TN 38152, USA.
- Shuangge Ma
- Department of Biostatistics, School of Public Health, Yale University, New Haven, CT 06510, USA.