1. Anh NK, Phat NK, Thu NQ, Tien NTN, Eunsu C, Kim HS, Nguyen DN, Kim DH, Long NP, Oh JY. Discovery of urinary biosignatures for tuberculosis and nontuberculous mycobacteria classification using metabolomics and machine learning. Sci Rep 2024; 14:15312. [PMID: 38961191 PMCID: PMC11222504 DOI: 10.1038/s41598-024-66113-x] [Received: 04/16/2024] [Accepted: 06/27/2024] [Indexed: 07/05/2024] [Open access]
Abstract
Nontuberculous mycobacteria (NTM) infection diagnosis remains a challenge due to its overlapping clinical symptoms with tuberculosis (TB), leading to inappropriate treatment. Herein, we employed noninvasive metabolic phenotyping coupled with comprehensive statistical modeling to discover potential biomarkers for the differential diagnosis of NTM infection versus TB. Urine samples from 19 NTM and 35 TB patients were collected, and untargeted metabolomics was performed using rapid liquid chromatography-mass spectrometry. The urine metabolome was analyzed using a combination of univariate and multivariate statistical approaches, incorporating machine learning. Univariate analysis revealed significant alterations in amino acids, especially tryptophan metabolism, in NTM infection compared to TB. Specifically, NTM infection was associated with upregulated levels of methionine but downregulated levels of glutarate, valine, 3-hydroxyanthranilate, and tryptophan. Five machine learning models were used to classify NTM and TB. Notably, the random forest model demonstrated excellent performance [area under the receiver operating characteristic (ROC) curve greater than 0.8] in distinguishing NTM from TB. Six potential biomarkers for NTM infection diagnosis, including methionine, valine, glutarate, 3-hydroxyanthranilate, corticosterone, and indole-3-carboxyaldehyde, were revealed from univariate ROC analysis and machine learning models. Altogether, our study suggested new noninvasive biomarkers and laid a foundation for applying machine learning to NTM differential diagnosis.
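As a rough illustration of the "area under the ROC curve" figure quoted in this abstract, the sketch below computes AUC as the fraction of positive/negative sample pairs that a classifier ranks correctly (ties count one half). The labels and scores are invented for the example and are not data from the study; `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs: 1 = NTM, 0 = TB (illustrative only)
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

With so few samples per class, as in small cohorts generally, an AUC estimate of this kind carries wide uncertainty, so it is usually reported together with cross-validation or a confidence interval.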
Affiliation(s)
- Nguyen Ky Anh
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
  - Faculty of Pharmacy, Ton Duc Thang University, Ho Chi Minh City, Vietnam
- Nguyen Ky Phat
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
- Nguyen Quang Thu
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
- Nguyen Tran Nam Tien
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
- Cho Eunsu
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
- Ho-Sook Kim
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
- Duc Ninh Nguyen
  - Section for Comparative Pediatrics and Nutrition, Department of Veterinary and Animal Sciences, University of Copenhagen, 1870, Frederiksberg, Denmark
- Dong Hyun Kim
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
- Nguyen Phuoc Long
  - Department of Pharmacology and PharmacoGenomics Research Center, Inje University College of Medicine, Busan, 47392, Republic of Korea
- Jee Youn Oh
  - Division of Pulmonary, Allergy and Critical Care Medicine, Department of Internal Medicine, Korea University Guro Hospital, Seoul, 08308, Republic of Korea
2. Sokołowski H, Czajkowski M, Czajkowska A, Jurczuk K, Kretowski M. ITree: a user-driven tool for interactive decision-making with classification trees. Bioinformatics 2024; 40:btae273. [PMID: 38640482 DOI: 10.1093/bioinformatics/btae273] [Received: 11/28/2023] [Revised: 03/16/2024] [Accepted: 04/17/2024] [Indexed: 04/21/2024]
Abstract
MOTIVATION ITree is an intuitive web tool for the manual, semi-automatic, and automatic induction of decision trees. It enables interactive modifications of tree structures and incorporates Relative Expression Analysis for detecting complex patterns in high-throughput molecular data. This makes ITree a versatile tool for both research and education in biomedical data analysis. RESULTS The tool allows users to instantly see the effects of modifications on decision trees, with updates to predictions and statistics displayed in real time, facilitating a deeper understanding of data classification processes. AVAILABILITY AND IMPLEMENTATION Available online at https://itree.wi.pb.edu.pl. Source code and documentation are hosted on GitHub at https://github.com/hsokolowski/iTree and in supplement.
Affiliation(s)
- Hubert Sokołowski
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
- Marcin Czajkowski
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
- Anna Czajkowska
  - Department of Medical Biology, Medical University of Bialystok, Bialystok 15-089, Poland
- Krzysztof Jurczuk
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
- Marek Kretowski
  - Faculty of Computer Science, Bialystok University of Technology, Bialystok 15-351, Poland
3. Ma J, Wang X, Tang M, Zhang C. Preoperative prediction of pancreatic neuroendocrine tumor grade based on 68Ga-DOTATATE PET/CT. Endocrine 2024; 83:502-510. [PMID: 37715934 PMCID: PMC10850018 DOI: 10.1007/s12020-023-03515-3] [Received: 07/08/2023] [Accepted: 08/29/2023] [Indexed: 09/18/2023]
Abstract
OBJECTIVE To establish a model for preoperatively predicting grade 1 versus grade 2/3 tumors in patients with pancreatic neuroendocrine tumors (PNETs) based on 68Ga-DOTATATE PET/CT. METHODS Clinical data of 41 patients with PNETs were included in this study. According to the pathological results, they were divided into grade 1 and grade 2/3. 68Ga-DOTATATE PET/CT images were collected within one month before surgery. The clinical risk factors and significant radiological features were filtered, and a clinical predictive model based on these clinical and radiological features was established. 3D Slicer was used to extract 107 radiomic features from the region of interest (ROI) of the 68Ga-DOTATATE PET/CT images. The Pearson correlation coefficient (PCC) and recursive feature elimination (RFE) with five-fold cross-validation were adopted for radiomic feature selection, and a radiomic score (rad-score) was subsequently computed. A comprehensive model combining the clinical risk factors and the rad-score was established, together with a nomogram. The performance of the clinical model and the comprehensive model was evaluated and compared. RESULTS Adjacent organ invasion, N staging, and M staging were the risk factors for PNET grading (p < 0.05). Twelve optimal radiomic features (3 PET radiomic features, 9 CT radiomic features) were screened out. The clinical predictive model achieved an area under the curve (AUC) of 0.785. The comprehensive model had better predictive performance (AUC = 0.953). CONCLUSION We proposed a comprehensive nomogram model based on 68Ga-DOTATATE PET/CT to predict grade 1 and grade 2/3 of PNETs and assist personalized clinical diagnosis and treatment plans for patients with PNETs.
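The Pearson-correlation step of the feature selection described in this abstract can be sketched as a simple redundancy filter: a feature is kept only if it is not strongly correlated with any feature already kept. This is a minimal sketch under stated assumptions, not the authors' pipeline. The 0.9 threshold, the feature names, and the values are all hypothetical, and the recursive feature elimination step is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| < threshold against every feature kept so far."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical radiomic features measured on five lesions (illustrative only)
features = {
    "pet_suv_max": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pet_suv_mean": [1.1, 2.1, 2.9, 4.2, 5.0],  # nearly collinear with pet_suv_max
    "ct_entropy": [2.0, 1.0, 4.0, 3.0, 2.5],
}
selected = drop_correlated(features)
print(selected)
```

The result depends on the order in which features are visited, so a real pipeline would normally fix that order (for example by univariate relevance) before filtering.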
Affiliation(s)
- Jiao Ma
  - Department of Nuclear Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, PR China
- Xiaoyong Wang
  - Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, PR China
- Mingsong Tang
  - Department of Radiology, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, PR China
- Chunyin Zhang
  - Department of Nuclear Medicine, The Affiliated Hospital of Southwest Medical University, Luzhou, 646000, Sichuan, PR China
  - Nuclear Medicine and Molecular Imaging Key Laboratory of Sichuan Province, Luzhou, 646000, Sichuan, PR China
  - Academician (expert) Workstation of Sichuan Province, Luzhou, 646000, Sichuan, PR China
4. Wu T, Li N, Luo F, Chen Z, Ma L, Hu T, Hong G, Li H. Screening prognostic markers for hepatocellular carcinoma based on pyroptosis-related lncRNA pairs. BMC Bioinformatics 2023; 24:176. [PMID: 37120506 PMCID: PMC10148420 DOI: 10.1186/s12859-023-05299-9] [Received: 11/10/2022] [Accepted: 04/20/2023] [Indexed: 05/01/2023] [Open access]
Abstract
BACKGROUND Pyroptosis is closely related to cancer prognosis. In this study, we tried to construct an individualized prognostic risk model for hepatocellular carcinoma (HCC) based on within-sample relative expression orderings (REOs) of pyroptosis-related lncRNAs (PRlncRNAs). METHODS RNA-seq data of 343 HCC samples derived from The Cancer Genome Atlas (TCGA) database were analyzed. PRlncRNAs were detected based on differentially expressed lncRNAs between sample groups clustered by 40 reported pyroptosis-related genes (PRGs). Univariate Cox regression was used to screen out prognosis-related PRlncRNA pairs. Then, based on REOs of prognosis-related PRlncRNA pairs, a risk model for HCC was constructed by combining LASSO and stepwise multivariate Cox regression analysis. Finally, a prognosis-related competing endogenous RNA (ceRNA) network was built based on information about lncRNA-miRNA-mRNA interactions derived from the miRNet and TargetScan databases. RESULTS Hierarchical clustering of HCC patients according to the 40 PRGs identified two groups with a significant survival difference (Kaplan-Meier log-rank, p = 0.026). Between the two groups, 104 differentially expressed lncRNAs were identified (|log2(FC)| > 1 and FDR < 5%). Among them, 83 PRlncRNA pairs showed significant associations between their REOs within HCC samples and overall survival (univariate Cox regression, p < 0.005). An optimal 11-PRlncRNA-pair prognostic risk model was constructed for HCC. The areas under the curves (AUCs) of time-dependent receiver operating characteristic (ROC) curves of the risk model for 1-, 3-, and 5-year survival were 0.737, 0.705, and 0.797 in the validation set, respectively. Gene Set Enrichment Analysis showed that inflammation-related interleukin signaling pathways were upregulated in the predicted high-risk group (p < 0.05). Tumor immune infiltration analysis revealed a higher abundance of regulatory T cells (Tregs) and M2 macrophages and a lower abundance of CD8+ T cells in the high-risk group, indicating that excessive pyroptosis might occur in high-risk patients. Finally, eleven lncRNA-miRNA-mRNA regulatory axes associated with pyroptosis were established. CONCLUSION Our risk model allowed us to determine the robustness of the REO-based PRlncRNA prognostic biomarkers in the stratification of HCC patients at high and low risk. The model is also helpful for understanding the molecular mechanisms between pyroptosis and HCC prognosis. High-risk patients may have excessive pyroptosis and thus be less sensitive to immune therapy.
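The differential-expression cutoff quoted in this abstract (|log2(FC)| > 1 and FDR < 5%) can be sketched as follows. The abstract does not say how the FDR was computed, so the Benjamini-Hochberg adjustment used here is an assumption, and the lncRNA names, fold changes, and p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_de(names, log2fc, pvalues, fc_cut=1.0, fdr_cut=0.05):
    """Names passing both the fold-change and the FDR thresholds."""
    fdr = benjamini_hochberg(pvalues)
    return [n for n, fc, q in zip(names, log2fc, fdr)
            if abs(fc) > fc_cut and q < fdr_cut]


# Hypothetical lncRNAs with made-up statistics (illustrative only)
names = ["lnc_A", "lnc_B", "lnc_C", "lnc_D"]
log2fc = [2.3, -1.8, 0.4, 1.5]
pvalues = [0.001, 0.01, 0.02, 0.2]
de = select_de(names, log2fc, pvalues)
print(de)
```

Here `lnc_C` fails the fold-change cutoff and `lnc_D` fails the FDR cutoff, so only the first two names are returned.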
Affiliation(s)
- Tong Wu
  - School of Medical Information Engineering, Gannan Medical University, Ganzhou, 341000, China
- Na Li
  - School of Medical Information Engineering, Gannan Medical University, Ganzhou, 341000, China
- Fengyuan Luo
  - School of Medical Information Engineering, Gannan Medical University, Ganzhou, 341000, China
- Zhihong Chen
  - School of Medical Information Engineering, Gannan Medical University, Ganzhou, 341000, China
- Liyuan Ma
  - School of Public Health and Health Management, Gannan Medical University, Ganzhou, 341000, China
- Tao Hu
  - School of Medical Information Engineering, Gannan Medical University, Ganzhou, 341000, China
- Guini Hong
  - School of Medical Information Engineering, Gannan Medical University, Ganzhou, 341000, China
- Hongdong Li
  - School of Medical Information Engineering, Gannan Medical University, Ganzhou, 341000, China
5. Identification of Ubiquitin-Related Gene-Pair Signatures for Predicting Tumor Microenvironment Infiltration and Drug Sensitivity of Lung Adenocarcinoma. Cancers (Basel) 2022; 14:cancers14143478. [PMID: 35884544 PMCID: PMC9317993 DOI: 10.3390/cancers14143478] [Received: 06/07/2022] [Revised: 07/12/2022] [Accepted: 07/15/2022] [Indexed: 11/17/2022] [Open access]
Abstract
Simple Summary Lung adenocarcinoma (LUAD) has high mortality and incidence rates. The therapeutic efficacy of LUAD varies with the individual heterogeneity of the tumor microenvironment (TME). It is necessary to explore more biomarkers and targets to improve the prognosis of patients. Ubiquitination pathways are involved in the biological process of regulating the anti-tumor immunity of immune cells and immunosuppression of tumor cells in the TME of patients. In this study, we clarified the characteristics of ubiquitin-related gene pairs (UbRGPs) and identified the relationship between the status of the TME and UbRGPs of patients with LUAD. A prognostic signature based on six UbRGPs was established, which performed well in predicting the immune infiltration and tumor mutation burden (TMB) in the TME and the response of LUAD to immuno-, chemo-, and targeted therapy. In conclusion, the UbRGPs signature is an independent prognostic indicator and has great potential in assisting the clinical therapy for patients with LUAD.
Abstract Lung adenocarcinoma (LUAD) is a common pathological type of lung cancer worldwide, and new biomarkers are urgently required to guide more effective individualized therapy for patients. Ubiquitin-related genes (UbRGs) partially participate in the initiation and progression of lung cancer. In this study, we used ubiquitin-related gene pairs (UbRGPs) in tumor tissues to assess the function of UbRGs in overall survival, immunocyte infiltration, and tumor mutation burden (TMB) of patients with LUAD from The Cancer Genome Atlas (TCGA) database. In addition, we constructed a prognostic signature based on six UbRGPs and evaluated its performance in an internal (TCGA testing set) and an external validation set (GSE13213). The prognostic signature revealed that risk scores were negatively correlated with the overall survival, immunocyte infiltration, and expression of immune checkpoint inhibitor-related genes and positively correlated with the TMB. Patients in the high-risk group showed higher sensitivity to some targeted and chemotherapeutic drugs than those in the low-risk group. This study contributes to the understanding of the characteristics of UbRGPs in LUAD and provides guidance for effective immuno-, chemo-, and targeted therapy.
6. Yan J, Wu X, Yu J, Kong Y, Cang S. An Immune-Related Gene Pair Index Predicts Clinical Response and Survival Outcome of Immune Checkpoint Inhibitors in Melanoma. Front Immunol 2022; 13:839901. [PMID: 35280982 PMCID: PMC8907429 DOI: 10.3389/fimmu.2022.839901] [Received: 12/20/2021] [Accepted: 02/04/2022] [Indexed: 12/03/2022] [Open access]
Abstract
Durable responses and favorable long-term outcomes are limited to a proportion of advanced melanoma patients treated with immune checkpoint inhibitors (ICI). Considering the critical role of antitumor immunity status in the regulation of ICI therapy responsiveness, we focused on the immune-related gene profiles and aimed to develop an individualized immune signature for predicting the benefit of ICI therapy. During the discovery phase, we integrated three published datasets of metastatic melanoma treated with anti-PD-1 (n = 120) and established an immune-related gene pair index (IRGPI) for patient classification. The IRGPI was constructed based on 31 immune-related gene pairs (IRGPs) consisting of 51 immune-related genes (IRGs). ROC curve analysis was performed to evaluate the predictive accuracy of IRGPI, yielding AUC = 0.854. Then, we retrospectively collected one anti-PD-1 therapy dataset of metastatic melanoma (n = 55) from Peking University Cancer Hospital (PUCH) and performed whole-transcriptome RNA sequencing. Combined with another published dataset of metastatic melanoma patients who received anti-CTLA-4 therapy (VanAllen15; n = 42), we further validated the prediction accuracy of IRGPI for ICI therapy in two datasets (PUCH and VanAllen15) with AUCs of 0.737 and 0.767, respectively. Notably, the survival analyses revealed that higher IRGPI conferred poor survival outcomes in both the discovery and validation datasets. Moreover, correlation analyses of IRGPI with the immune cell infiltration and biological functions indicated that IRGPI may be an indicator of the immune status of the tumor microenvironment (TME). These findings demonstrated that IRGPI might serve as a novel marker for treating melanoma with ICI, which needs to be validated in prospective clinical trials.
Affiliation(s)
- Junya Yan
  - Department of Oncology, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Henan University People's Hospital, Zhengzhou, China
- Xiaowen Wu
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Melanoma and Sarcoma, Peking University Cancer Hospital & Institute, Beijing, China
- Jiayi Yu
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Radiation Oncology, Peking University Cancer Hospital & Institute, Beijing, China
- Yan Kong
  - Key Laboratory of Carcinogenesis and Translational Research (Ministry of Education/Beijing), Department of Melanoma and Sarcoma, Peking University Cancer Hospital & Institute, Beijing, China
- Shundong Cang
  - Department of Oncology, Henan Provincial People's Hospital, Zhengzhou University People's Hospital, Henan University People's Hospital, Zhengzhou, China
7. Krepel J, Kircher M, Kohls M, Jung K. Comparison of merging strategies for building machine learning models on multiple independent gene expression data sets. Stat Anal Data Min 2022. [DOI: 10.1002/sam.11549] [Indexed: 11/11/2022]
Affiliation(s)
- Jessica Krepel
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Hannover, Germany
- Magdalena Kircher
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Hannover, Germany
- Moritz Kohls
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Hannover, Germany
- Klaus Jung
  - Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Hannover, Germany
8. Yin Z, Zhou M, Liao T, Xu J, Fan J, Deng J, Jin Y. Immune-Related lncRNA Pairs as Prognostic Signature and Immune-Landscape Predictor in Lung Adenocarcinoma. Front Oncol 2022; 11:673567. [PMID: 35083132 PMCID: PMC8784752 DOI: 10.3389/fonc.2021.673567] [Received: 02/27/2021] [Accepted: 12/14/2021] [Indexed: 12/18/2022] [Open access]
Abstract
BACKGROUND A suppressive tumor microenvironment is closely related to the progression and poor prognosis of lung adenocarcinoma (LUAD). Novel individual and universal immune-related biomarkers to predict the prognosis and immune landscape of LUAD patients are urgently needed. Two-gene pairing patterns could integrate and utilize various gene expression data. METHODS The RNA-seq and relevant clinicopathological data of the LUAD project from the TCGA and a list of well-known immune-related genes from the ImmPort database were obtained. Co-expression analysis followed by an analysis of variance was performed to identify differentially expressed immune-related lncRNAs (irlncRNAs; DEirlncRNAs) between tumor and normal tissues. Two arbitrary DEirlncRNAs (a DEirlncRNA pair) in a tumor sample underwent pairwise comparison to generate a score (0 or 1). Next, univariate analysis, Lasso regression, and multivariate analysis were used to screen survival-related DEirlncRNA pairs and construct a prognostic model. The Akaike information criterion (AIC) values of the 3-year receiver operating characteristic (ROC) curve were calculated to determine the cut-off point between high- and low-risk scores. Finally, we evaluated the relationship between the risk score and overall survival, clinicopathological features, immune landscape, and chemotherapy efficacy. RESULTS Data from 54 normal and 497 tumor samples of LUAD were included. After a strict screening process, 15 DEirlncRNA pairs independently related to survival were integrated to construct a prognostic model. The AUC value of the 3-year ROC curve was 0.828. Kaplan-Meier analysis showed that patients with low risk lived longer than patients with high risk (p < 0.001). Univariate and multivariate Cox analysis suggested that the risk score was an independent factor of survival. The risk score was negatively associated with most tumor-infiltrating immune cells, immune score, and microenvironment scores. The low-risk group was correlated with increased expression of ICOS. The high-risk group was associated with a lower half-maximal inhibitory concentration (IC50) of most chemotherapy drugs (e.g., etoposide, paclitaxel, vinorelbine, gemcitabine, and docetaxel) and of the targeted drug erlotinib, but with a higher IC50 of methotrexate. CONCLUSION The established irlncRNA pairs-based model is a promising prognostic signature for LUAD patients. Furthermore, the prognostic signature has great potential in the evaluation of tumor immune landscape and guiding individualized treatment regimens.
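The pairwise 0-or-1 scoring mentioned in this abstract can be sketched as below. The abstract does not state which ordering maps to 1, so this sketch assumes a score of 1 when the first lncRNA of the pair is expressed more highly than the second within the same sample. The lncRNA names and expression values are hypothetical.

```python
from itertools import combinations


def pair_scores(sample, genes):
    """Score every unordered pair within one sample.

    Assumed convention: 1 if the first gene is expressed higher than the second, else 0.
    """
    return {f"{a}|{b}": int(sample[a] > sample[b])
            for a, b in combinations(genes, 2)}


# Hypothetical expression values for one tumor sample (illustrative only)
genes = ["lnc_A", "lnc_B", "lnc_C"]
sample = {"lnc_A": 5.2, "lnc_B": 1.3, "lnc_C": 7.9}
scores = pair_scores(sample, genes)
print(scores)

# Rescaling the whole sample leaves the scores unchanged, because only the
# within-sample ordering matters.
rescaled = pair_scores({g: 10 * v for g, v in sample.items()}, genes)
print(rescaled == scores)
```

Because each score depends only on the ordering inside one sample, such features do not require a specific expression scale, which is the property several of the pair-based signatures in this list rely on.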
Affiliation(s)
- Yang Jin
  - Department of Respiratory and Critical Care Medicine, NHC Key Laboratory of Pulmonary Diseases, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
9. Cao K, Liu M, Ma K, Jiang X, Ma J, Zhu J. Prediction of prognosis and immunotherapy response with a robust immune-related lncRNA pair signature in lung adenocarcinoma. Cancer Immunol Immunother 2021; 71:1295-1311. [PMID: 34652523 DOI: 10.1007/s00262-021-03069-1] [Received: 03/31/2021] [Accepted: 09/26/2021] [Indexed: 12/24/2022]
Abstract
The tumor immune microenvironment plays essential roles in regulating inflammation, angiogenesis, immune modulation, and sensitivity to therapies. Here, we developed a powerful prognostic signature with immune-related lncRNAs (irlncRNAs) in lung adenocarcinoma (LUAD). We obtained differentially expressed irlncRNAs by intersecting the transcriptome dataset for The Cancer Genome Atlas (TCGA)-LUAD cohort and the ImmLnc database. A rank-based algorithm was applied to select top-ranking altered irlncRNA pairs for the model construction. We built a prognostic signature of 33 irlncRNA pairs comprising 40 unique irlncRNAs in the TCGA-LUAD cohort (training set). The immune signature significantly dichotomized LUAD patients into high- and low-risk groups regarding overall survival and was likewise independently predictive of prognosis (hazard ratio = 3.580, 95% confidence interval = 2.451-5.229, P < 0.001). A nomogram with a C-index of 0.79 demonstrates the superior prognostic accuracy of the signature. The prognostic accuracy of the signature of 33 irlncRNA pairs was validated using the GSE31210 dataset (validation set) from the Gene Expression Omnibus database. Immune cell infiltration was calculated using ESTIMATE, CIBERSORT, and MCP-count methodologies. The low-risk group exhibited high immune cell infiltration, high mutation burden, high expression of CTLA4 and human leukocyte antigen genes, and low expression of mismatch repair genes, which predicted response to immunotherapy. Interestingly, pRRophetic analysis demonstrated that the high-risk group, which possessed the reverse characteristics, was sensitive to chemotherapy. The established immune signature shows marked clinical and translational potential for predicting prognosis, tumor immunogenicity, and therapeutic response in LUAD.
Affiliation(s)
- Kui Cao
  - Department of Clinical Laboratory, Biobank, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150040, Heilongjiang, China
  - Department of Clinical Oncology, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150040, Heilongjiang, China
- Mingdong Liu
  - Department of Clinical Oncology, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150040, Heilongjiang, China
- Keru Ma
  - Department of Thoracic Surgery, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150040, Heilongjiang, China
- Xiangyu Jiang
  - Department of Thoracic Surgery, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150040, Heilongjiang, China
- Jianqun Ma
  - Department of Thoracic Surgery, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150040, Heilongjiang, China
- Jinhong Zhu
  - Department of Clinical Laboratory, Biobank, Harbin Medical University Cancer Hospital, 150 Haping Road, Harbin, 150040, Heilongjiang, China
10. Yu Y, Zeng Y, Xia X, Zhou JG, Cao F. Establishment and Validation of a Prognostic Immune Signature in Neuroblastoma. Cancer Control 2021; 28:10732748211033751. [PMID: 34569303 PMCID: PMC8477712 DOI: 10.1177/10732748211033751] [Indexed: 11/16/2022] [Open access]
Abstract
BACKGROUND Neuroblastoma (NBL) is the most common extracranial solid tumor in childhood, and patients with high-risk neuroblastoma have a relatively poor prognosis despite multimodal treatment. To improve immunotherapy efficacy in neuroblastoma, systematic profiling of its immune landscape is urgently needed. METHODS RNA-seq data and the corresponding clinical information for neuroblastoma were downloaded from the TARGET database and the GEO database (GSE62564). With an immune-related gene set obtained from the ImmPort database, Immune-related Prognostic Gene Pairs for Neuroblastoma (IPGPN) for overall survival (OS) were established with the TARGET-NBL cohort and then verified with the GEO-NBL cohort. Immune cell infiltration was subsequently analyzed with the xCell algorithm. An integrated model was established with IPGPN and clinicopathological parameters. Functional enrichment analysis was performed with the clusterProfiler package in R. RESULTS IPGPN was successfully established with seven immune-related gene pairs (IGPs) involving 13 unique genes in the training cohort. In the training cohort, IPGPN stratified neuroblastoma patients into high and low immune-risk groups with different OS (HR = 3.92, P = 2 × 10^-8) and event-free survival (EFS; HR = 3.66, P = 2 × 10^-8). ROC curve analysis confirmed its predictive power. Consistently, high IPGPN also predicted worse OS (HR = 1.84, P = .002) and EFS (HR = 1.38, P = .06) in the validation cohort. Moreover, higher enrichment of activated dendritic cells, M1 macrophages, and Th1 and Th2 CD4+ T cells was evident in the low immune-risk group. Integrating IPGPN with age and stage further improved predictive performance over IPGPN alone. CONCLUSION Herein, we presented an immune landscape with IPGPN for prognosis prediction in neuroblastoma, which complements the present understanding of the immune signature in neuroblastoma.
Affiliation(s)
- Yunhu Yu
  - Department of Neurosurgery, the Third Affiliated Hospital of Zunyi Medical University, Zunyi, China
  - Clinical Research Center for Neurological Disease, the People's Hospital of HongHuaGang District of ZunYi, Zunyi, China
- Yu Zeng
  - Department of Cell Biology, School of Basic Medical Science, Southern Medical University, Guangzhou, China
- Xiangping Xia
  - Department of Cerebrovascular Disease, Affiliated Hospital of Zunyi Medical University, Zunyi, China
- Jian-Guo Zhou
  - Department of Oncology, Second Affiliated Hospital of Zunyi Medical University, Zunyi, China
- Fang Cao
  - Department of Cerebrovascular Disease, Affiliated Hospital of Zunyi Medical University, Zunyi, China
11. Ma C, Zhang X, Zhao X, Zhang N, Zhou S, Zhang Y, Li P. Predicting the Survival and Immune Landscape of Colorectal Cancer Patients Using an Immune-Related lncRNA Pair Model. Front Genet 2021; 12:690530. [PMID: 34552614 PMCID: PMC8451271 DOI: 10.3389/fgene.2021.690530] [Received: 04/23/2021] [Accepted: 06/29/2021] [Indexed: 12/12/2022] [Open access]
Abstract
Background: Accumulating evidence has demonstrated that immune-related long non-coding ribonucleic acids (irlncRNAs) can be used as prognostic indicators of overall survival (OS) in patients with colorectal cancer (CRC). Our aim in this research, therefore, was to construct a risk model using irlncRNA pairs with no requirement for a specific expression level, in the hope of reliably predicting the prognosis and immune landscape of CRC patients. Methods: Clinical and transcriptome profiling data of CRC patients downloaded from The Cancer Genome Atlas (TCGA) database were analyzed to identify differentially expressed (DE) irlncRNAs. The irlncRNA pairs significantly correlated with patient prognosis were screened by univariable Cox regression analysis, and a prognostic model was constructed by Lasso and multivariate Cox regression analyses. A receiver operating characteristic (ROC) curve was then plotted, with the area under the curve calculated to confirm the reliability of the model. Based on the optimal cutoff value, CRC patients were divided into high- and low-risk groups, laying the groundwork for evaluating the risk model from the following perspectives: survival, clinicopathological traits, tumor-infiltrating immune cells (TIICs), antitumor drug efficacy, kinase inhibitor efficacy, and molecules related to immune checkpoints. Results: A prognostic model consisting of 15 irlncRNA pairs was constructed, which was found to have a high correlation with patient prognosis in a cohort from the TCGA (p < 0.001, HR = 1.089, 95% CI [1.067-1.112]). According to both univariate and multivariate Cox analyses, this model could be used as an independent prognostic indicator in the TCGA cohort (p < 0.001). Effective differentiation between high- and low-risk patients was also accomplished on the basis of aggressive clinicopathological characteristics, sensitivity to antitumor drugs and kinase inhibitors, tumor immune infiltration status, and the expression levels of specific molecules related to immune checkpoints. Conclusion: The prognostic model established with irlncRNA pairs is a promising indicator for prognosis prediction in CRC patients.
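The pairing idea in this abstract can be illustrated with a minimal Python sketch. It assumes the common convention that a pair (A, B) is coded 1 when A is expressed above B in a sample and 0 otherwise, and that the risk score is a weighted sum of those indicators. The gene names, coefficients, and cutoff are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def pair_indicators(expr):
    """Code each unordered gene pair as 1 if the first gene exceeds the second, else 0."""
    genes = sorted(expr)
    return {f"{a}|{b}": int(expr[a] > expr[b]) for a, b in combinations(genes, 2)}

def risk_score(indicators, coefs):
    """Weighted sum of pair indicators (the linear predictor of a Cox-type model)."""
    return sum(coefs[pair] * indicators[pair] for pair in coefs)

# Hypothetical expression values for one sample
sample = {"lncA": 5.2, "lncB": 3.1, "lncC": 7.8}
# Hypothetical coefficients for the pairs retained by the model
coefs = {"lncA|lncB": 0.8, "lncB|lncC": -0.5}
# Hypothetical cutoff separating high- and low-risk groups
cutoff = 0.5

indicators = pair_indicators(sample)
score = risk_score(indicators, coefs)
group = "high" if score > cutoff else "low"
```

Because only the within-sample ordering of each pair matters, a score of this kind does not depend on the absolute expression scale, which is presumably what "no requirement for a specific expression level" refers to.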
Affiliation(s)
- Chao Ma: Medical School of Chinese PLA, Beijing, China; Department of General Surgery, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Xin Zhang: State Key Laboratory of Proteomics Beijing Proteome Research Center National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Xudong Zhao: Department of General Surgery, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Nan Zhang: Department of General Surgery, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Sixin Zhou: Department of General Surgery, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Yonghui Zhang: Medical School of Chinese PLA, Beijing, China; Department of General Surgery, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Peiyu Li: Department of General Surgery, The First Medical Center, Chinese PLA General Hospital, Beijing, China
|
12
|
Ye Z, Ke H, Chen S, Cruz-Cano R, He X, Zhang J, Dorgan J, Milton DK, Ma T. Biomarker Categorization in Transcriptomic Meta-Analysis by Concordant Patterns With Application to Pan-Cancer Studies. Front Genet 2021; 12:651546. [PMID: 34276766 PMCID: PMC8283696 DOI: 10.3389/fgene.2021.651546] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/10/2021] [Accepted: 05/28/2021] [Indexed: 01/21/2023] Open
Abstract
With the increasing availability and dropping cost of high-throughput technology in recent years, many omics datasets have accumulated in the public domain. Combining multiple transcriptomic studies on a related hypothesis via meta-analysis can improve statistical power and reproducibility over single studies. For differential expression (DE) analysis, biomarker categorization by DE pattern across studies is a natural but critical task following biomarker detection to help explain between-study heterogeneity and classify biomarkers into categories with potentially related functionality. In this paper, we propose a novel meta-analysis method to categorize biomarkers by simultaneously considering the concordant pattern and the biological and statistical significance across studies. Biomarkers with the same DE pattern can be analyzed together in downstream pathway enrichment analysis. In the presence of different types of transcripts (e.g., mRNA, miRNA, and lncRNA), integrative analysis including miRNA/lncRNA target enrichment analysis and miRNA-mRNA and lncRNA-mRNA causal regulatory network analysis can be conducted jointly on all the transcripts of the same category. We applied our method to two Pan-cancer transcriptomic study examples with single or multiple types of transcripts available. Targeted downstream analysis identified categories of biomarkers with unique functionality and regulatory relationships that motivate new hypotheses in Pan-cancer analysis.
Affiliation(s)
- Zhenyao Ye: Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, College Park, MD, United States
- Hongjie Ke: Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, College Park, MD, United States
- Shuo Chen: Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, Baltimore, MD, United States
- Raul Cruz-Cano: Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, College Park, MD, United States
- Xin He: Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, College Park, MD, United States
- Jing Zhang: Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, College Park, MD, United States
- Joanne Dorgan: Department of Epidemiology and Public Health, School of Medicine, University of Maryland, Baltimore, Baltimore, MD, United States
- Donald K Milton: Maryland Institute for Applied Environmental Health, School of Public Health, University of Maryland, College Park, College Park, MD, United States
- Tianzhou Ma: Department of Epidemiology and Biostatistics, School of Public Health, University of Maryland, College Park, College Park, MD, United States
|
13
|
Ning L, Huixin H. Topic Evolution Analysis for Omics Data Integration in Cancers. Front Cell Dev Biol 2021; 9:631011. [PMID: 33898421 PMCID: PMC8058380 DOI: 10.3389/fcell.2021.631011] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/19/2020] [Accepted: 02/04/2021] [Indexed: 12/02/2022] Open
Abstract
A vital challenge in cancer research is that efficient biomarkers for monitoring tumor formation and development are limited. Omics data integration plays a crucial role in the mining of biomarkers in the human condition. As the link between omics studies on biomarker discovery and cancer deepens, defining the principal technologies applied in the field is a must, not only for the current period but also for the future. We utilize topic modeling to extract topics (or themes) as a probabilistic distribution of latent topics from the dataset. To predict the future trend of related cases, we utilize the Prophet neural network to perform a prediction correction model for existing topics. A total of 2,318 pieces of literature (from 2006 to 2020) were retrieved from MEDLINE with the query on “omics” and “cancer.” Our study found 20 topics covering current research types. The topic extraction results indicate that, with the rapid development of omics data integration research, multi-omics analysis (Topic 11) and genomics of colorectal cancer (Topic 10) have had more studies reported in the last 15 years. From the topic prediction view, research in multi-omics data processing and novel biomarker discovery for cancer prediction (Topics 2, 3, 10, 11) will be a major focus in the future. From the topic visualization and evolution trends, metabolomics of breast cancer (Topic 9), pharmacogenomics (Topic 15), genome-guided therapy regimens (Topic 16), and microRNA target genes (Topic 17) could develop more rapidly in the study of cancer treatment effect and recurrence prediction.
Affiliation(s)
- Li Ning: Business School of Huaqiao University, Quan Zhou, China
- He Huixin: Management Science and Engineering Department, Management School, Xiamen University, Xiamen, China
|
14
|
Saegusa T, Zhao Z, Ke H, Ye Z, Xu Z, Chen S, Ma T. Detecting survival-associated biomarkers from heterogeneous populations. Sci Rep 2021; 11:3203. [PMID: 33547332 PMCID: PMC7865037 DOI: 10.1038/s41598-021-82332-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/27/2020] [Accepted: 01/11/2021] [Indexed: 01/30/2023] Open
Abstract
Detection of prognostic factors associated with patients' survival outcome helps gain insights into a disease and guide treatment decisions. The rapid advancement of high-throughput technologies has yielded plentiful genomic biomarkers as candidate prognostic factors, but most are of limited use in clinical application. As the price of the technology drops over time, many genomic studies are conducted to explore a common scientific question in different cohorts to identify more reproducible and credible biomarkers. However, new challenges arise from heterogeneity in study populations and designs when jointly analyzing the multiple studies. For example, patients from different cohorts show different demographic characteristics and risk profiles. Existing high-dimensional variable selection methods for survival analysis, however, are restricted to single study analysis. We propose a novel Cox model based two-stage variable selection method called "Cox-TOTEM" to detect survival-associated biomarkers common in multiple genomic studies. Simulations showed our method greatly improved the sensitivity of variable selection as compared to the separate applications of existing methods to each study, especially when the signals are weak or when the studies are heterogeneous. An application of our method to TCGA transcriptomic data identified essential survival associated genes related to the common disease mechanism of five Pan-Gynecologic cancers.
Affiliation(s)
- Takumi Saegusa: Department of Mathematics, University of Maryland, College Park, MD 20742 USA
- Zhiwei Zhao: Department of Mathematics, University of Maryland, College Park, MD 20742 USA
- Hongjie Ke: Department of Mathematics, University of Maryland, College Park, MD 20742 USA
- Zhenyao Ye: Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20740 USA
- Zhongying Xu: Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15213 USA
- Shuo Chen: Division of Biostatistics and Bioinformatics, Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore, MD 21201 USA
- Tianzhou Ma: Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20740 USA
|
15
|
Marzouka NAD, Eriksson P. multiclassPairs: an R package to train multiclass pair-based classifier. Bioinformatics 2021; 37:3043-3044. [PMID: 33543757 PMCID: PMC8479681 DOI: 10.1093/bioinformatics/btab088] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 11/17/2020] [Revised: 01/27/2021] [Accepted: 02/02/2021] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION: k-Top Scoring Pairs (kTSP) algorithms utilize in-sample gene expression feature pair rules for class prediction, and have demonstrated excellent performance and robustness. The available packages and tools primarily focus on binary prediction (i.e. two classes). However, many real-world classification problems, e.g. tumor subtype prediction, are multiclass tasks. RESULTS: Here, we present multiclassPairs, an R package to train pair-based single sample classifiers for multiclass problems. multiclassPairs offers two main methods to build multiclass prediction models, either using a one-versus-rest kTSP scheme or through a novel pair-based Random Forest approach. The package also provides options for dealing with class imbalances, multiplatform training, missing features in test data and visualization of training and test results. AVAILABILITY AND IMPLEMENTATION: The 'multiclassPairs' package is available on CRAN servers and GitHub: https://github.com/NourMarzouka/multiclassPairs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
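As a rough illustration of the pair-rule idea behind kTSP, the following Python sketch shows binary prediction by majority vote over k pair rules. It is not the multiclassPairs API (that package is written in R); the function name, gene names, and values are hypothetical, and the rules are assumed to have been selected beforehand.

```python
def ktsp_predict(sample, pairs):
    """Binary kTSP-style prediction by majority vote.

    Each rule (a, b) votes for class 1 when sample[a] > sample[b],
    otherwise for class 0. An odd number of rules avoids ties.
    """
    votes_for_one = sum(1 for a, b in pairs if sample[a] > sample[b])
    return 1 if 2 * votes_for_one > len(pairs) else 0

# Hypothetical pre-selected pair rules (k = 3)
rules = [("g1", "g2"), ("g3", "g4"), ("g5", "g6")]
# Hypothetical expression values for a single sample
sample = {"g1": 9.0, "g2": 4.0, "g3": 2.5, "g4": 6.0, "g5": 7.1, "g6": 3.3}

predicted = ktsp_predict(sample, rules)  # two of the three rules vote for class 1
```

Only comparisons within the same sample are used, which is why such classifiers can be applied to one sample at a time. A one-versus-rest scheme, as mentioned in the abstract, would repeat this kind of vote once per class.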
Affiliation(s)
- Nour-Al-Dain Marzouka: Department of Clinical Sciences, Division of Oncology, Lund University, 22381 Lund, Sweden (to whom correspondence should be addressed)
- Pontus Eriksson: Department of Clinical Sciences, Division of Oncology, Lund University, 22381 Lund, Sweden
|
16
|
He Y, Chen H, Sun H, Ji J, Shi Y, Zhang X, Liu L. High-dimensional integrative copula discriminant analysis for multiomics data. Stat Med 2020; 39:4869-4884. [PMID: 33617001 DOI: 10.1002/sim.8758] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 01/19/2020] [Revised: 08/30/2020] [Accepted: 09/04/2020] [Indexed: 11/08/2022]
Abstract
Multiomics or integrative omics data have been increasingly common in biomedical studies, holding a promise in better understanding human health and disease. In this article, we propose an integrative copula discrimination analysis classifier in the context of two-class classification, which relaxes the common Gaussian assumption and gains power by borrowing information from multiple omics data types in discriminant analysis. Numerical studies are conducted to assess the finite sample performance of the new classifier. We apply our model to the Religious Orders Study and Memory and Aging Project (ROSMAP) Study, integrating gene expression and DNA methylation data for better prediction.
Affiliation(s)
- Yong He: Shandong University, Jinan, China
- Hao Chen: School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Hao Sun: School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Yufeng Shi: Shandong University, Jinan, China; School of Statistics, Shandong University of Finance and Economics, Jinan, China
- Lei Liu: Division of Biostatistics, Washington University in St. Louis, St. Louis, Missouri, USA
|
17
|
Krzhizhanovskaya VV, Závodszky G, Lees MH, Dongarra JJ, Sloot PMA, Brissos S, Teixeira J. Tree Based Advanced Relative Expression Analysis. LECTURE NOTES IN COMPUTER SCIENCE 2020. [PMCID: PMC7304016 DOI: 10.1007/978-3-030-50420-5_37] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
This paper presents a new concept for biomarker discovery and gene expression data classification that arises from Relative Expression Analysis (RXA). The basic idea of RXA is to focus on simple ordering relationships between the expression of small sets of genes rather than their raw values. We propose a paradigm shift as we extend the RXA concept to tree-based Advanced Relative Expression Analysis (ARXA). The main contribution is a decision tree with splitting nodes that consider relative fraction comparisons between multiple gene pairs. In addition, to face the enormous computational complexity of RXA, the most time-consuming part, which is scoring all possible gene pairs in each splitting node, is parallelized using a GPU. This way the algorithm allows searching for more tailored interactions between sub-groups of genes in a reasonable time. Experiments carried out on 8 cancer-related datasets show not only significant improvement in accuracy and speed of our approach in comparison to various RXA solutions but also new interesting patterns between subgroups of genes.
|
18
|
Zhou JG, Liang B, Jin SH, Liao HL, Du GB, Cheng L, Ma H, Gaipl US. Development and Validation of an RNA-Seq-Based Prognostic Signature in Neuroblastoma. Front Oncol 2019; 9:1361. [PMID: 31867276 PMCID: PMC6904333 DOI: 10.3389/fonc.2019.01361] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/10/2019] [Accepted: 11/18/2019] [Indexed: 12/25/2022] Open
Abstract
Objective: The stratification of neuroblastoma (NBL) prognosis remains difficult. RNA-based signatures might be able to predict prognosis, but independent cross-platform validation is still rare. Methods: RNA-Seq-based profiles from NBL patients were acquired and then analyzed. The RNA-Seq prognostic index (RPI) and the clinically adjusted RPI (RCPI) were successively established in the training cohort (TARGET-NBL) and then verified in the validation cohort (GSE62564). Survival prediction was assessed using a time-dependent receiver operating characteristic (ROC) curve and area under the ROC curve (AUC). Functional enrichment analysis of the genes was conducted using bioinformatics methods. Results: In the training cohort, 10 gene pairs were eventually integrated into the RPI. In both cohorts, the high-risk group had poor overall survival (OS) (P < 0.001 and P < 0.001, respectively) and event-free survival (EFS) (P = 0.00032 and P = 0.06, respectively). ROC curve analysis also showed that the RPI predicted OS (60 month AUC values of 0.718 and 0.593, respectively) and EFS (60 month AUC values of 0.627 and 0.852, respectively) well in both the training and validation cohorts. Clinicopathological indicators associated with prognosis in the univariate and multivariate regression analyses were identified and added to the RPI to form the RCPI. The RCPI was also used to divide populations into different risk groups, and the high-risk group had poor OS (P < 0.001 and P < 0.001, respectively) and EFS (P < 0.05 and P < 0.05, respectively). Finally, the RCPI had higher accuracy than the RPI for the prediction of OS (60 month AUC values of 0.730 and 0.852, respectively) and EFS (60 month AUC values of 0.663 and 0.763, respectively) in both the training and validation cohorts. Moreover, these differentially expressed genes may be involved in certain NBL-related events. Conclusions: The RCPI could reliably categorize NBL patients based on different risks of death.
Affiliation(s)
- Jian-Guo Zhou: Department of Oncology, Affiliated Hospital of Zunyi Medical University, Zunyi, China; Department of Radiation Oncology, Universitätsklinikum Erlangen, Erlangen, Germany
- Bo Liang: Affiliated Nanjing Hospital of Chinese Medicine, Nanjing University of Chinese Medicine, Nanjing, China
- Su-Han Jin: Department of Orthodontics, Affiliated Stomatological Hospital of Zunyi Medical University, Zunyi, China
- Hui-Ling Liao: College of Integrated Traditional Chinese and Western Medicine, Southwest Medical University, Luzhou, China
- Guo-Bo Du: Department of Oncology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Long Cheng: Department of Oncology, Affiliated Hospital of North Sichuan Medical College, Nanchong, China
- Hu Ma: Department of Oncology, Affiliated Hospital of Zunyi Medical University, Zunyi, China
- Udo S Gaipl: Department of Radiation Oncology, Universitätsklinikum Erlangen, Erlangen, Germany
|
19
|
Integrative Deep Learning for Identifying Differentially Expressed (DE) Biomarkers. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2019; 2019:8418760. [PMID: 31915462 PMCID: PMC6935456 DOI: 10.1155/2019/8418760] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/05/2019] [Revised: 06/19/2019] [Accepted: 08/04/2019] [Indexed: 11/17/2022]
Abstract
As large amounts of genetic data accumulate, effective analytical methods and meaningful interpretation are required. Recently, various methods of machine learning have emerged to process genetic data. In addition, machine learning analysis tools using statistical models have been proposed. In this study, we propose adding an integrated layer to the deep learning structure, which would enable the effective analysis of genetic data and the discovery of significant biomarkers of diseases. We conducted a simulation study in order to compare the proposed method with metalogistic regression and meta-SVM methods. The objective function with lasso penalty is used for parameter estimation, and the Youden J index is used for model comparison. The simulation results indicate that the proposed method is more robust to the variance of the data than metalogistic regression and meta-SVM methods. We also conducted a real data analysis using breast cancer data from TCGA. Based on the results of gene set enrichment analysis, we found that the TCGA multi-omics data involve significantly enriched pathways which contain information related to breast cancer. Therefore, it is expected that the proposed method will be helpful for discovering biomarkers.
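The Youden J index named in this abstract is defined as sensitivity + specificity − 1. A minimal Python sketch follows; the labels and predictions are hypothetical illustration data, not results from the study.

```python
def youden_j(y_true, y_pred):
    """Youden's J = sensitivity + specificity - 1 for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

# Hypothetical true classes and model predictions
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

j = youden_j(y_true, y_pred)  # sensitivity 3/4, specificity 3/4, so J = 0.5
```

J ranges from −1 to 1; a value of 0 corresponds to a classifier no better than chance and 1 to perfect separation, which makes it a convenient single number for comparing models.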
|
20
|
Kumar A, Hosseinnia A, Gagarinova A, Phanse S, Kim S, Aly KA, Zilles S, Babu M. A Gaussian process-based definition reveals new and bona fide genetic interactions compared to a multiplicative model in the Gram-negative Escherichia coli. Bioinformatics 2019; 36:880-889. [PMID: 31504172 PMCID: PMC9883677 DOI: 10.1093/bioinformatics/btz673] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/22/2019] [Revised: 07/24/2019] [Accepted: 08/23/2019] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION: A digenic genetic interaction (GI) is observed when mutations in two genes within the same organism yield a phenotype that is different from the expected, given each mutation's individual effects. While multiplicative scoring is widely applied to define GIs, revealing underlying gene functions, it remains unclear if it is the most suitable choice for scoring GIs in Escherichia coli. Here, we assess many different definitions, including the multiplicative model, for mapping functional links between genes and pathways in E. coli. RESULTS: Using our published E. coli GI datasets, we show computationally that a machine learning Gaussian process (GP)-based definition better identifies functional associations among genes than a multiplicative model, which we have experimentally confirmed on a set of gene pairs. Overall, the GP definition improves the detection of GIs, biological reasoning of epistatic connectivity, as well as the quality of GI maps in E. coli, and, potentially, other microbes. AVAILABILITY AND IMPLEMENTATION: The source code and parameters used to generate the machine learning models in WEKA software were provided in the Supplementary information. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
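For orientation, the multiplicative model mentioned in this abstract is commonly written as the observed double-mutant fitness minus the product of the two single-mutant fitnesses. The Python sketch below assumes that standard form; the fitness values are hypothetical, and the paper's exact scoring pipeline and its GP-based alternative are not reproduced here.

```python
def multiplicative_gi_score(w_a, w_b, w_ab):
    """Genetic interaction score under the multiplicative model.

    w_a, w_b: single-mutant fitness relative to wild type (wild type = 1.0)
    w_ab:     observed double-mutant fitness
    Returns observed minus expected, where expected = w_a * w_b.
    """
    return w_ab - w_a * w_b

# Hypothetical relative fitness values
score = multiplicative_gi_score(w_a=0.8, w_b=0.5, w_ab=0.2)
# Expected double-mutant fitness is 0.4, so the score is about -0.2:
# the double mutant is sicker than expected (a negative interaction).
```

A score near zero indicates no interaction under this definition, while positive scores indicate a double mutant that is fitter than expected.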
Affiliation(s)
- Ali Hosseinnia: Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Alla Gagarinova: Department of Biochemistry, University of Saskatchewan, Saskatoon, SK S7N 5E5, Canada
- Sadhna Phanse: Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Sunyoung Kim: Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Khaled A Aly: Department of Biochemistry, University of Regina, Regina, SK S4S 0A2, Canada
- Mohan Babu (to whom correspondence should be addressed)
|
21
|
Kim S, Kang D, Huo Z, Park Y, Tseng GC. Meta-analytic principal component analysis in integrative omics application. Bioinformatics 2019; 34:1321-1328. [PMID: 29186328 DOI: 10.1093/bioinformatics/btx765] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/30/2016] [Accepted: 11/22/2017] [Indexed: 12/15/2022] Open
Abstract
Motivation: With the prevalent usage of microarray and massively parallel sequencing, numerous high-throughput omics datasets have become available in the public domain. Integrating abundant information among omics datasets is critical to elucidate biological mechanisms. Due to the high-dimensional nature of the data, methods such as principal component analysis (PCA) have been widely applied, aiming at effective dimension reduction and exploratory visualization. Results: In this article, we combine multiple omics datasets of identical or similar biological hypothesis and introduce two variations of a meta-analytic framework of PCA, namely MetaPCA. Regularization is further incorporated to facilitate sparse feature selection in MetaPCA. We apply MetaPCA and sparse MetaPCA to simulations, three transcriptomic meta-analysis studies in yeast cell cycle, prostate cancer, mouse metabolism and a TCGA pan-cancer methylation study. The result shows improved accuracy, robustness and exploratory visualization of the proposed framework. Availability and implementation: An R package MetaPCA is available online (http://tsenglab.biostat.pitt.edu/software.htm). Contact: ctseng@pitt.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- SungHwan Kim: Department of Statistics, Keimyung University, Daegu 42601, South Korea
- Dongwan Kang: Genomics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Zhiguang Huo: Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Yongseok Park: Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA
- George C Tseng: Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261, USA; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
|
22
|
Long NP, Park S, Anh NH, Nghi TD, Yoon SJ, Park JH, Lim J, Kwon SW. High-Throughput Omics and Statistical Learning Integration for the Discovery and Validation of Novel Diagnostic Signatures in Colorectal Cancer. Int J Mol Sci 2019; 20:E296. [PMID: 30642095 PMCID: PMC6358915 DOI: 10.3390/ijms20020296] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 11/30/2018] [Revised: 12/31/2018] [Accepted: 01/04/2019] [Indexed: 02/07/2023] Open
Abstract
The advancement of bioinformatics and machine learning has facilitated the discovery and validation of omics-based biomarkers. This study employed a novel approach combining multi-platform transcriptomics and cutting-edge algorithms to introduce novel signatures for accurate diagnosis of colorectal cancer (CRC). Different random forests (RF)-based feature selection methods including the area under the curve (AUC)-RF, Boruta, and Vita were used and the diagnostic performance of the proposed biosignatures was benchmarked using RF, logistic regression, naïve Bayes, and k-nearest neighbors models. All models showed satisfactory performance in which RF appeared to be the best. For instance, regarding the RF model, the following were observed: mean accuracy 0.998 (standard deviation (SD) < 0.003), mean specificity 0.999 (SD < 0.003), and mean sensitivity 0.998 (SD < 0.004). Moreover, proposed biomarker signatures were highly associated with multifaceted hallmarks in cancer. Some biomarkers were found to be enriched in epithelial cell signaling in Helicobacter pylori infection and inflammatory processes. The overexpression of TGFBI and S100A2 was associated with poor disease-free survival while the down-regulation of NR5A2, SLC4A4, and CD177 was linked to worse overall survival of the patients. In conclusion, novel transcriptome signatures to improve the diagnostic accuracy in CRC are introduced for further validations in various clinical settings.
Affiliation(s)
- Nguyen Phuoc Long: College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Seongoh Park: Department of Statistics, Seoul National University, Seoul 08826, Korea
- Nguyen Hoang Anh: College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Tran Diem Nghi: School of Medicine, Vietnam National University, Ho Chi Minh 70000, Vietnam
- Sang Jun Yoon: College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Jeong Hill Park: College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
- Johan Lim: Department of Statistics, Seoul National University, Seoul 08826, Korea
- Sung Won Kwon: College of Pharmacy and Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
|
23
|
Langgartner D, Füchsl AM, Kaiser LM, Meier T, Foertsch S, Buske C, Reber SO, Mulaw MA. Biomarkers for classification and class prediction of stress in a murine model of chronic subordination stress. PLoS One 2018; 13:e0202471. [PMID: 30183738 PMCID: PMC6124755 DOI: 10.1371/journal.pone.0202471] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/22/2017] [Accepted: 08/03/2018] [Indexed: 12/22/2022] Open
Abstract
Selye defined stress as the nonspecific response of the body to any demand and thus an inherent element of all diseases. He reported that rats show adrenal hypertrophy, thymicolymphatic atrophy, and gastrointestinal ulceration, referred to as the stress triad, upon repeated exposure to nocuous agents. However, Selye's stress triad as well as its extended version including reduced body weight gain, increased plasma glucocorticoid (GC) concentrations, and GC resistance of target cells do not represent reliable discriminatory biomarkers for chronic stress. To address this, we collected multivariate biological data from male mice exposed either to the preclinically validated chronic subordinate colony housing (CSC) paradigm or to single-housed control (SHC) condition. We then used principal component analysis (PCA), top scoring pairs (tsp) and support vector machines (SVM) analyses to identify markers that discriminate between chronically stressed and non-stressed mice. PCA segregated stressed and non-stressed mice, with high loading for some of Selye's stress triad parameters. The tsp analysis, a simple and highly interpretable statistical approach, identified left adrenal weight and relative thymus weight as the pair with the highest discrimination score and prediction accuracy validated by a blinded dataset (tsp: 92% accuracy, p-value < 0.0001; SVM model: 83% accuracy, p-value < 0.0001). This finding clearly shows that simultaneous consideration of these two parameters can be used as a reliable biomarker of chronic stress status. Furthermore, our analysis highlights that the tsp approach is a very powerful method whose application extends beyond what has previously been reported.
Affiliation(s)
- Dominik Langgartner: Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Andrea M. Füchsl: Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Lisa M. Kaiser: Institute for Experimental Cancer Research, Comprehensive Cancer Center Ulm, Ulm University, Ulm, Germany
- Tatjana Meier: Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Sandra Foertsch: Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Christian Buske: Institute for Experimental Cancer Research, Comprehensive Cancer Center Ulm, Ulm University, Ulm, Germany
- Stefan O. Reber: Laboratory for Molecular Psychosomatics, Clinic for Psychosomatic Medicine and Psychotherapy, Ulm University, Ulm, Germany
- Medhanie A. Mulaw: Institute for Experimental Cancer Research, Comprehensive Cancer Center Ulm, Ulm University, Ulm, Germany
|
24
|
Long NP, Jung KH, Yoon SJ, Anh NH, Nghi TD, Kang YP, Yan HH, Min JE, Hong SS, Kwon SW. Systematic assessment of cervical cancer initiation and progression uncovers genetic panels for deep learning-based early diagnosis and proposes novel diagnostic and prognostic biomarkers. Oncotarget 2017; 8:109436-109456. [PMID: 29312619 PMCID: PMC5752532 DOI: 10.18632/oncotarget.22689] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 09/15/2017] [Accepted: 10/27/2017] [Indexed: 12/18/2022] Open
Abstract
Although many outstanding achievements in the management of cervical cancer (CxCa) have been obtained, the disease still imposes a major burden, which has prompted scientists to discover and validate new CxCa biomarkers to improve the diagnostic and prognostic assessment of CxCa. In this study, eight different gene expression data sets containing 202 cancer, 115 cervical intraepithelial neoplasia (CIN), and 105 normal samples were utilized for an integrative systems biology assessment in a multi-stage carcinogenesis manner. Deep learning-based diagnostic models were established based on the genetic panels of intrinsic genes of cervical carcinogenesis as well as on the unbiased variable selection approach. Survival analysis was also conducted to explore the potential biomarker candidates for prognostic assessment. Our results showed that cell cycle, RNA transport, mRNA surveillance, and one carbon pool by folate were the key regulatory mechanisms involved in the initiation, progression, and metastasis of CxCa. Various genetic panels combined with machine learning algorithms successfully differentiated CxCa from CIN and normalcy in cross-study normalized data sets. In particular, the 168-gene deep learning model for the differentiation of cancer from normalcy achieved an externally validated accuracy of 97.96% (99.01% sensitivity and 95.65% specificity). Survival analysis revealed that ZNF281 and EPHB6 were the two most promising prognostic genetic markers for CxCa among others. Our findings open new opportunities to enhance current understanding of the characteristics of CxCa pathobiology. In addition, the combination of transcriptomics-based signatures and deep learning classification may become an important approach to improve CxCa diagnosis and management in clinical practice.
Affiliation(s)
- Kyung Hee Jung
  - Department of Drug Development, College of Medicine, Inha University, Incheon 22212, Korea
- Sang Jun Yoon
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
- Nguyen Hoang Anh
  - School of Medicine, Vietnam National University, Ho Chi Minh 70000, Vietnam
- Tran Diem Nghi
  - School of Medicine, Vietnam National University, Ho Chi Minh 70000, Vietnam
- Yun Pyo Kang
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
- Hong Hua Yan
  - Department of Drug Development, College of Medicine, Inha University, Incheon 22212, Korea
- Jung Eun Min
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
- Soon-Sun Hong
  - Department of Drug Development, College of Medicine, Inha University, Incheon 22212, Korea
- Sung Won Kwon
  - College of Pharmacy, Seoul National University, Seoul 08826, Korea
  - Research Institute of Pharmaceutical Sciences, Seoul National University, Seoul 08826, Korea
|
25
|
Li B, Cui Y, Diehn M, Li R. Development and Validation of an Individualized Immune Prognostic Signature in Early-Stage Nonsquamous Non-Small Cell Lung Cancer. JAMA Oncol 2017; 3:1529-1537. [PMID: 28687838 DOI: 10.1001/jamaoncol.2017.1609] [Citation(s) in RCA: 287] [Impact Index Per Article: 41.0] [Indexed: 12/19/2022]
Abstract
Importance: The prevalence of early-stage non-small cell lung cancer (NSCLC) is expected to increase with recent implementation of annual screening programs. Reliable prognostic biomarkers are needed to identify patients at a high risk for recurrence to guide adjuvant therapy.
Objective: To develop a robust, individualized immune signature that can estimate prognosis in patients with early-stage nonsquamous NSCLC.
Design, Setting, and Participants: This retrospective study analyzed the gene expression profiles of frozen tumor tissue samples from 19 public NSCLC cohorts, including 18 microarray data sets and 1 RNA-Seq data set for The Cancer Genome Atlas (TCGA) lung adenocarcinoma cohort. Only patients with nonsquamous NSCLC with clinical annotation were included. Samples were from 2414 patients with nonsquamous NSCLC, divided into a meta-training cohort (729 patients), meta-testing cohort (716 patients), and 3 independent validation cohorts (439, 323, and 207 patients). All patients underwent surgery with a negative surgical margin, received no adjuvant or neoadjuvant therapy, and had publicly available gene expression data and survival information. Data were collected from July 22 through September 8, 2016.
Main Outcomes and Measures: Overall survival.
Results: Of 2414 patients (1205 men [50%], 1111 women [46%], and 98 of unknown sex [4%]; median age [range], 64 [15-90] years), a prognostic immune signature of 25 gene pairs consisting of 40 unique genes was constructed using the meta-training data set. In the meta-testing and validation cohorts, the immune signature significantly stratified patients into high- vs low-risk groups in terms of overall survival across and within subpopulations with stage I, IA, IB, or II disease and remained as an independent prognostic factor in multivariate analyses (hazard ratio range, 1.72 [95% CI, 1.26-2.33; P < .001] to 2.36 [95% CI, 1.47-3.79; P < .001]) after adjusting for clinical and pathologic factors. Several biological processes, including chemotaxis, were enriched among genes in the immune signature. The percentage of neutrophil infiltration (5.6% vs 1.8%) and necrosis (4.6% vs 1.5%) was significantly higher in the high-risk immune group compared with the low-risk groups in TCGA data set (P < .003). The immune signature achieved a higher accuracy (mean concordance index [C-index], 0.64) than 2 commercialized multigene signatures (mean C-index, 0.53 and 0.61) for estimation of survival in comparable validation cohorts. When integrated with clinical characteristics such as age and stage, the composite clinical and immune signature showed improved prognostic accuracy in all validation data sets relative to molecular signatures alone (mean C-index, 0.70 vs 0.63) and another commercialized clinical-molecular signature (mean C-index, 0.68 vs 0.65).
Conclusions and Relevance: The proposed clinical-immune signature is a promising biomarker for estimating overall survival in nonsquamous NSCLC, including early-stage disease. Prospective studies are needed to test the clinical utility of the biomarker in individualized management of nonsquamous NSCLC.
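The concordance index (C-index) reported in this abstract can be made concrete with a small sketch. It assumes Harrell's definition for right-censored data and, for simplicity, skips pairs with tied survival times. The function name and the toy numbers are hypothetical and do not come from the study.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event (death) was observed, 0 if censored
    risks:  predicted risk scores (higher means worse expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported mean values between 0.53 and 0.70.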
Affiliation(s)
- Bailiang Li
  - Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California
- Yi Cui
  - Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California
  - Global Institution for Collaborative Research and Education, Hokkaido University, Sapporo, Japan
- Maximilian Diehn
  - Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California
  - Institute for Stem Cell Biology and Regenerative Medicine, Stanford University School of Medicine, Palo Alto, California
  - Stanford Cancer Institute, Stanford University School of Medicine, Palo Alto, California
- Ruijiang Li
  - Department of Radiation Oncology, Stanford University School of Medicine, Palo Alto, California
  - Stanford Cancer Institute, Stanford University School of Medicine, Palo Alto, California
|
26
|
Ma T, Song C, Tseng GC. Discussant paper on ‘Statistical contributions to bioinformatics: Design, modelling, structure learning and integration’. Stat Model 2017. [DOI: 10.1177/1471082x17705992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Affiliation(s)
- Tianzhou Ma
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Chi Song
  - Division of Biostatistics, College of Public Health, Ohio State University, Columbus, OH, USA
- George C. Tseng
  - Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
|
27
|
Rohart F, Eslami A, Matigian N, Bougeard S, Lê Cao KA. MINT: a multivariate integrative method to identify reproducible molecular signatures across independent experiments and platforms. BMC Bioinformatics 2017; 18:128. [PMID: 28241739 PMCID: PMC5327533 DOI: 10.1186/s12859-017-1553-8] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 09/23/2016] [Accepted: 02/16/2017] [Indexed: 12/12/2022] Open
Abstract
Background: Molecular signatures identified from high-throughput transcriptomic studies often have poor reliability and fail to reproduce across studies. One solution is to combine independent studies into a single integrative analysis, additionally increasing sample size. However, the different protocols and technological platforms across transcriptomic studies produce unwanted systematic variation that strongly confounds the integrative analysis results. When studies aim to discriminate an outcome of interest, the common approach is a sequential two-step procedure; unwanted systematic variation removal techniques are applied prior to classification methods. Results: To limit the risk of overfitting and over-optimistic results of a two-step procedure, we developed a novel multivariate integration method, MINT, that simultaneously accounts for unwanted systematic variation and identifies predictive gene signatures with greater reproducibility and accuracy. In two biological examples on the classification of three human cell types and four subtypes of breast cancer, we combined high-dimensional microarray and RNA-seq data sets and MINT identified highly reproducible and relevant gene signatures predictive of a given phenotype. MINT led to superior classification and prediction accuracy compared to the existing sequential two-step procedures. Conclusions: MINT is a powerful approach and the first of its kind to solve the integrative classification framework in a single step by combining multiple independent studies. MINT is computationally fast as part of the mixOmics R CRAN package, available at http://www.mixOmics.org/mixMINT/ and http://cran.r-project.org/web/packages/mixOmics/. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1553-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Florian Rohart
  - The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, 4102, QLD, Australia
- Aida Eslami
  - Centre for Heart Lung Innovation, University of British Columbia, Vancouver, BC V6Z 1Y6, Canada
- Nicholas Matigian
  - The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, 4102, QLD, Australia
- Stéphanie Bougeard
  - French agency for food, environmental and occupational health safety (Anses), Department of Epidemiology, Ploufragan, 22440, France
- Kim-Anh Lê Cao
  - The University of Queensland Diamantina Institute, The University of Queensland, Translational Research Institute, Brisbane, 4102, QLD, Australia
|
28
|
Kim S, Jhong JH, Lee J, Koo JY. Meta-analytic support vector machine for integrating multiple omics data. BioData Min 2017; 10:2. [PMID: 28149325 PMCID: PMC5270233 DOI: 10.1186/s13040-017-0126-8] [Citation(s) in RCA: 76] [Impact Index Per Article: 10.9] [Received: 08/01/2016] [Accepted: 01/11/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: High-throughput microarray and sequencing data have recently been used extensively to monitor biomarkers and biological processes related to many diseases. In this setting, the support vector machine (SVM) has been widely and successfully used for gene selection in many applications. Despite the benefits of SVMs, analysis of a single small- or mid-sized data set inevitably runs into the problems of low reproducibility and low statistical power. To address this problem, we propose a meta-analytic support vector machine (Meta-SVM) that can accommodate multiple omics data, making it possible to detect consensus genes associated with diseases across studies. RESULTS: Experimental studies show that the Meta-SVM is superior to the existing meta-analysis method in detecting true signal genes. In real data applications, diverse omics data of breast cancer (TCGA) and mRNA expression data of lung disease (idiopathic pulmonary fibrosis; IPF) were applied. As a result, we identified gene sets consistently associated with the diseases across studies. In particular, the ascertained gene set of TCGA omics data was found to be significantly enriched in the ABC transporters pathways, which are well known to be critical for the breast cancer mechanism. CONCLUSION: The Meta-SVM effectively achieves the purpose of meta-analysis by jointly leveraging multiple omics data, and facilitates identifying potential biomarkers and elucidating the disease process.
Affiliation(s)
- SungHwan Kim
  - Department of Statistics, Korea University, Anam-dong, Seoul, 136-701 South Korea
  - Department of Statistics, Keimyung University, Dalseoku, Daegu, 42601 South Korea
- Jae-Hwan Jhong
  - Department of Statistics, Korea University, Anam-dong, Seoul, 136-701 South Korea
- JungJun Lee
  - Department of Statistics, Korea University, Anam-dong, Seoul, 136-701 South Korea
- Ja-Yong Koo
  - Department of Statistics, Korea University, Anam-dong, Seoul, 136-701 South Korea
|
29
|
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. Annual Review of Statistics and Its Application 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
Affiliation(s)
- Sylvia Richardson
  - MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
- George C. Tseng
  - Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
- Wei Sun
  - Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
|