1
Saranya S, Chellapandi P, Velayutham P. Enhancement of anti-cancer compounds in fungal elicited-Oldenlandia umbellata culture. Naunyn Schmiedebergs Arch Pharmacol 2024. [PMID: 38916834 DOI: 10.1007/s00210-024-03239-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 06/11/2024] [Indexed: 06/26/2024]
Abstract
Our study focused on enhancing the production of anthraquinone derivatives in Oldenlandia umbellata using fungal elicitors. Aspergillus niger, Mucor prayagensis, and Trichoderma viride were used to elicit anthraquinone derivatives in root cultures. Elicitation increased the production of phytochemicals and secondary metabolites, with the highest total protein content observed in A. niger-elicited plants. We performed qualitative and quantitative phytochemical screening of the 80% methanol extract of the plants. Using reverse-phase ultra-fast liquid chromatography, we identified and quantified five anthraquinone compounds: aloe-emodin, rhein, emodin, chrysophanol, and alizarin. In vitro root samples elicited with A. niger and M. prayagensis contained four and three anthraquinone derivatives, respectively, whereas those elicited with T. viride contained only two. Chrysophanol content was highest in A. niger-elicited root samples. We constructed a systems pharmacology network of 40 nodes and 45 edges, including 34 interacting genes. We also identified human proteins that interact with these derivatives and inferred their roles in cancer-associated pathways. These anthraquinone derivatives interact with various proteins in multiple pathways, including apoptosis, human cytomegalovirus infection, proteoglycans in cancer, MAPK signaling, and hepatitis C, highlighting their potential therapeutic applications in cancer treatment.
Affiliation(s)
- S Saranya
- Industrial Systems Biology Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620024, Tamil Nadu, India
- P Chellapandi
- Industrial Systems Biology Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620024, Tamil Nadu, India
- P Velayutham
- Department of Botany, Government Arts College, Karur, 639005, Tamil Nadu, India
2
Wu H, Liu X, Peng L, Yang Y, Zhou Z, Du D, Xu H, Lv W, Lu L. Optimal batch determination for improved harmonization and prognostication of multi-center PET/CT radiomics feature in head and neck cancer. Phys Med Biol 2023; 68:225014. [PMID: 37844604 DOI: 10.1088/1361-6560/ad03d1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2023] [Accepted: 10/16/2023] [Indexed: 10/18/2023]
Abstract
Objective. To determine the optimal approach for identifying and mitigating batch effects in PET/CT radiomics features, and thereby improve prognostication for patients with head and neck cancer (HNC), this study investigated the performance of three batch harmonization methods. Approach. Unsupervised harmonization identified batch labels by K-means clustering. Supervised harmonization treated the image acquisition factors (center, manufacturer, scanner, filter kernel) as known batch labels. ComBat harmonization was then implemented both separately and sequentially based on these labels, i.e. harmonizing features among batches defined by each factor individually, or among batches defined by multiple factors in succession. Extensive experiments were conducted to predict overall survival (OS) on public PET/CT datasets containing 800 patients from 9 centers. Main results. In the external validation cohort, compared with the original models without harmonization, ComBat harmonization was beneficial for OS prediction, with a C-index of 0.687-0.740 versus 0.684-0.767. Supervised harmonization slightly outperformed unsupervised harmonization in all models (C-index: 0.692-0.767 versus 0.684-0.750). Separate harmonization outperformed sequential harmonization in the CT_m+clinic and CT_cm+clinic models, with C-indices of 0.752 and 0.722, respectively, whereas sequential harmonization involving clinical features in the PET_rs+clinic model further improved performance, achieving the highest C-index of 0.767. Significance. Optimal batch determination, especially sequential harmonization for ComBat, has the potential to improve the prognostic power of radiomics models in multi-center HNC datasets with PET/CT imaging.
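The batch adjustment compared in this study can be illustrated, in much-simplified form, as a per-batch location/scale correction. The Python sketch below assumes a samples-by-features NumPy array and one batch label per sample (labels could come from an acquisition factor such as scanner, or from K-means clusters). The function name is hypothetical, and the sketch deliberately omits ComBat's empirical Bayes shrinkage and covariate preservation, so it is not the ComBat implementation the authors used.

```python
import numpy as np


def location_scale_harmonize(features, batches):
    """Remove per-batch shifts in mean and spread, feature by feature.

    Hypothetical, simplified ComBat-style adjustment: no empirical Bayes
    shrinkage of batch parameters and no preservation of covariates.
    """
    y = np.asarray(features, dtype=float)   # shape (n_samples, n_features)
    labels = np.asarray(batches)
    grand_mean = y.mean(axis=0)             # common location per feature

    # Pooled within-batch SD: residuals after removing each batch's mean.
    resid = np.empty_like(y)
    for b in np.unique(labels):
        mask = labels == b
        resid[mask] = y[mask] - y[mask].mean(axis=0)
    pooled_sd = np.sqrt((resid ** 2).mean(axis=0))

    out = np.empty_like(y)
    for b in np.unique(labels):
        mask = labels == b
        mu = y[mask].mean(axis=0)
        sd = y[mask].std(axis=0)
        sd = np.where(sd > 0, sd, 1.0)      # guard against constant features
        # Standardize within the batch, then map onto the common scale.
        out[mask] = (y[mask] - mu) / sd * pooled_sd + grand_mean
    return out
```

Sequential harmonization in the study's sense would roughly correspond to calling such a function once per acquisition factor, each time on the previous output.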
Affiliation(s)
- Huiqin Wu
- Department of Medical Imaging, Guangdong Second Provincial General Hospital, Guangzhou, Guangdong, 518037, People's Republic of China
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Xiaohui Liu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Lihong Peng
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Yuling Yang
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Zidong Zhou
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Dongyang Du
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Hui Xu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Wenbing Lv
- School of Information and Yunnan Key Laboratory of Intelligent Systems and Computing, Yunnan University, Kunming, Yunnan, 650504, People's Republic of China
- Lijun Lu
- School of Biomedical Engineering and Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, Guangdong, 510515, People's Republic of China
- Pazhou Lab, Guangzhou 510330, People's Republic of China
3
Zhao H, Huang Y, Tong G, Wu W, Ren Y. Identification of a Novel Oxidative Stress- and Anoikis-Related Prognostic Signature and Its Immune Landscape Analysis in Non-Small Cell Lung Cancer. Int J Mol Sci 2023; 24:16188. [PMID: 38003378 PMCID: PMC10671784 DOI: 10.3390/ijms242216188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 11/06/2023] [Accepted: 11/07/2023] [Indexed: 11/26/2023]
Abstract
The objective of this study was to identify a prognostic signature based on oxidative stress- and anoikis-related genes (OARGs) for predicting the prognosis and immune landscape of non-small cell lung cancer (NSCLC). Initially, we identified 47 differentially expressed OARGs that primarily regulate oxidative stress and epithelial cell infiltration through the PI3K-Akt pathway. Subsequently, 10 prognosis-related OARGs defined two potential clusters. Cluster A was associated with shorter survival, lower immune infiltration, and a higher stemness index and tumor mutation burden. Next, the best risk score model constructed from the prognostic OARGs was the Random Survival Forest model, which included SLC2A1, LDHA and PLAU. The high-risk group was associated with cluster A and poor prognosis, with a higher tumor mutation burden, stemness index and proportion of M0 macrophages, and a lower immune checkpoint expression level, immune function score and immunophenoscore (IPS). Calibration and decision curves showed that the risk score combined with clinicopathological characteristics could be used to construct a nomogram to guide clinical treatment strategies. Finally, we found that all three hub genes were highly expressed in tumor tissues, and that LDHA expression was mainly regulated by hsa-miR-338-3p, hsa-miR-330-5p and hsa-miR-34c-5p. Altogether, we constructed an OARG-related prognostic signature that reveals potential relationships between the signature and clinical characteristics, the tumor microenvironment (TME), stemness, tumor mutational burden, drug sensitivity and immune landscape in NSCLC patients.
Affiliation(s)
- Wei Wu
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang 110122, China; (H.Z.); (Y.H.); (G.T.)
- Yangwu Ren
- Department of Epidemiology, School of Public Health, China Medical University, Shenyang 110122, China; (H.Z.); (Y.H.); (G.T.)
4
Westerlund AM, Hawe JS, Heinig M, Schunkert H. Risk Prediction of Cardiovascular Events by Exploration of Molecular Data with Explainable Artificial Intelligence. Int J Mol Sci 2021; 22:10291. [PMID: 34638627 PMCID: PMC8508897 DOI: 10.3390/ijms221910291] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/30/2021] [Revised: 09/17/2021] [Accepted: 09/18/2021] [Indexed: 12/11/2022]
Abstract
Cardiovascular diseases (CVD) claim almost 18 million lives worldwide each year. Most lethal events occur months or years after the initial presentation. Indeed, many patients experience repeated complications or require multiple interventions (recurrent events). Apart from affecting the individual, this leads to high medical costs for society. Personalized treatment strategies aiming at the prediction and prevention of recurrent events rely on early diagnosis and precise prognosis. Complementing the traditional environmental and clinical risk factors, multi-omics data provide a holistic view of the patient and disease progression, enabling studies to probe novel angles in risk stratification. Specifically, predictive molecular markers allow insights into the regulatory networks, pathways, and mechanisms underlying disease. Moreover, artificial intelligence (AI) represents a powerful, yet adaptive, framework able to recognize complex patterns in large-scale clinical and molecular data, with the potential to improve risk prediction. Here, we review the most recent advances in risk prediction of recurrent cardiovascular events and discuss the value of molecular data and biomarkers for understanding patient risk in a systems biology context. Finally, we introduce explainable AI, which may improve clinical decision systems by making predictions transparent to the medical practitioner.
Affiliation(s)
- Annie M. Westerlund
- Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany; (A.M.W.); (J.S.H.)
- Institute of Computational Biology, HelmholtzZentrum München, Ingolstädter Landstrasse 1, 85764 Munich, Germany
- Johann S. Hawe
- Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany; (A.M.W.); (J.S.H.)
- Matthias Heinig
- Institute of Computational Biology, HelmholtzZentrum München, Ingolstädter Landstrasse 1, 85764 Munich, Germany
- Department of Informatics, Technical University Munich, Boltzmannstrasse 3, 85748 Garching, Germany
- Heribert Schunkert
- Department of Cardiology, Deutsches Herzzentrum München, Technical University Munich, Lazarettstrasse 36, 80636 Munich, Germany; (A.M.W.); (J.S.H.)
- Deutsches Zentrum für Herz- und Kreislaufforschung (DZHK), Munich Heart Alliance, Biedersteiner Strasse 29, 80802 Munich, Germany
5
Lu M, Sha Y, Silva TC, Colaprico A, Sun X, Ban Y, Wang L, Lehmann BD, Chen XS. LR Hunting: A Random Forest Based Cell-Cell Interaction Discovery Method for Single-Cell Gene Expression Data. Front Genet 2021; 12:708835. [PMID: 34497635 PMCID: PMC8420858 DOI: 10.3389/fgene.2021.708835] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/12/2021] [Accepted: 07/14/2021] [Indexed: 12/26/2022]
Abstract
Cell–cell interactions (CCIs) and cell–cell communication (CCC) are critical for maintaining complex biological systems. The availability of single-cell RNA sequencing (scRNA-seq) data opens new avenues for deciphering CCIs and CCC by identifying ligand-receptor (LR) gene interactions between cells. However, most methods were developed to examine LR interactions one gene pair at a time. Here, we propose a novel approach named LR hunting, which first uses a random forest (RF)-based data imputation technique to link the data between different cell types. To ensure the robustness of the imputation, we repeat the procedure multiple times to generate an aggregated imputed minimal depth index (IMDI). Next, we identify significant LR interactions among all combinations of LR pairs simultaneously using unsupervised RFs. We demonstrate that LR hunting can recover biologically meaningful CCIs using a mouse cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) dataset and a triple-negative breast cancer scRNA-seq dataset.
Affiliation(s)
- Min Lu
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Yifan Sha
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Tiago C Silva
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Antonio Colaprico
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Xiaodian Sun
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States
- Yuguang Ban
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States
- Lily Wang
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States
- Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, FL, United States
- John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, United States
- Brian D Lehmann
- Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, United States
- X Steven Chen
- Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, United States
- Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL, United States
6
Naseem M, Cao S, Yang D, Millstein J, Puccini A, Loupakis F, Stintzing S, Cremolini C, Tokunaga R, Battaglin F, Soni S, Berger MD, Barzi A, Zhang W, Falcone A, Heinemann V, Lenz HJ. Random survival forests identify pathways with polymorphisms predictive of survival in KRAS mutant and KRAS wild-type metastatic colorectal cancer patients. Sci Rep 2021; 11:12191. [PMID: 34108518 PMCID: PMC8190302 DOI: 10.1038/s41598-021-91330-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/25/2020] [Accepted: 05/20/2021] [Indexed: 12/22/2022]
Abstract
KRAS status serves as a predictive biomarker of response to treatment in metastatic colorectal cancer (mCRC). We hypothesize that complex interactions between multiple pathways contribute to prognostic differences between KRAS wild-type and KRAS mutant patients with mCRC, and aim to identify polymorphisms predictive of clinical outcomes in this subpopulation. Most pathway association studies are limited in assessing gene–gene interactions and are restricted to an individual pathway. In this study, we use a random survival forests (RSF) method for identifying predictive markers of overall survival (OS) and progression-free survival (PFS) in mCRC patients treated with FOLFIRI/bevacizumab. A total of 486 mCRC patients treated with FOLFIRI/bevacizumab from two randomized phase III trials, TRIBE and FIRE-3, were included in the current study. Two RSF approaches were used, namely variable importance and minimal depth. We discovered that Wnt/β-catenin and tumor associated macrophage pathway SNPs are strong predictors of OS and PFS in mCRC patients treated with FOLFIRI/bevacizumab independent of KRAS status, whereas a SNP in the sex-differentiation pathway gene, DMRT1, is strongly predictive of OS and PFS in KRAS mutant mCRC patients. Our results highlight RSF as a useful method for identifying predictive SNPs in multiple pathways.
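Minimal depth, one of the two RSF variable-selection criteria named above, is the depth of the split on a variable that lies closest to the root of a tree; smaller values suggest a more predictive variable. The sketch below uses a hypothetical Node class and placeholder variable names. Assigning an unused variable the tree's height as a penalty is an assumption here, since implementations handle that case differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical binary tree node; terminal nodes have variable=None."""
    variable: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def minimal_depth(node, variable, depth=0):
    """Depth (root = 0) of the shallowest split on `variable`, or None if unused."""
    if node is None or node.variable is None:
        return None
    if node.variable == variable:
        return depth
    found = [d for d in (minimal_depth(node.left, variable, depth + 1),
                         minimal_depth(node.right, variable, depth + 1))
             if d is not None]
    return min(found) if found else None


def tree_height(node):
    """Number of edges on the longest root-to-leaf path."""
    if node is None or node.variable is None:
        return 0
    return 1 + max(tree_height(node.left), tree_height(node.right))


def forest_minimal_depth(trees, variable):
    """Mean minimal depth over trees; unused variables get the tree height (assumed penalty)."""
    depths = []
    for tree in trees:
        d = minimal_depth(tree, variable)
        depths.append(d if d is not None else tree_height(tree))
    return sum(depths) / len(depths)
```

For example, a variable used at the root of one tree and one level down in another has a forest minimal depth of 0.5.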
Affiliation(s)
- Madiha Naseem
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Shu Cao
- Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Dongyun Yang
- Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Joshua Millstein
- Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Alberto Puccini
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Fotios Loupakis
- Oncologia Medica 1, Istituto Oncologico Veneto, Istituto Di Ricovero E Cura a Carattere Scientifico, Via Gattamelata, Padua, Italy
- Sebastian Stintzing
- Medical Department, Division of Hematology, Oncology and Tumor Immunology (CCM), Charité-Universitätsmedizin, Berlin, Germany
- Chiara Cremolini
- Oncologia Medica, Azienda Ospedaliero-Universitaria Pisana, Istituto Toscano Tumori, Via Roma, Pisa, Italy
- Ryuma Tokunaga
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Francesca Battaglin
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Oncologia Medica 1, Istituto Oncologico Veneto, Istituto Di Ricovero E Cura a Carattere Scientifico, Via Gattamelata, Padua, Italy
- Shivani Soni
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Martin D Berger
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Afsaneh Barzi
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Wu Zhang
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
- Alfredo Falcone
- Oncologia Medica, Azienda Ospedaliero-Universitaria Pisana, Istituto Toscano Tumori, Via Roma, Pisa, Italy
- Volker Heinemann
- Department of Medicine and Comprehensive Cancer Center, Ludwig-Maximilians-University Munich, Munich, Germany
- Heinz-Josef Lenz
- Division of Medical Oncology, Sharon Carpenter Laboratory, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, 1441 Eastlake Avenue, Los Angeles, CA, 90033, USA
7
Lin H, Zeng L, Yang J, Hu W, Zhu Y. A Machine Learning-Based Model to Predict Survival After Transarterial Chemoembolization for BCLC Stage B Hepatocellular Carcinoma. Front Oncol 2021; 11:608260. [PMID: 33738252 PMCID: PMC7962602 DOI: 10.3389/fonc.2021.608260] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/19/2020] [Accepted: 01/06/2021] [Indexed: 12/23/2022]
Abstract
Objective We sought to develop and validate a novel prognostic model for predicting the survival of patients with Barcelona Clinic Liver Cancer (BCLC) stage B hepatocellular carcinoma (HCC) using a machine learning approach based on random survival forests (RSF). Methods We retrospectively analyzed the overall survival of patients with BCLC stage B HCC using training (n = 602), internal validation (n = 301), and external validation (n = 343) cohorts. We extracted twenty-one clinical and biochemical parameters, preprocessed them with established strategies, and then used RSF for variable selection and model development. We evaluated model performance using the concordance index (c-index) and the area under the receiver operating characteristic curve (AUROC). Results RSF revealed that five parameters, namely tumor size, BCLC-B sub-classification, AFP level, ALB level, and number of lesions, were strong predictors of survival, and these were used for model development. The established model had a c-index of 0.69, and its AUROCs for predicting survival at 1, 2, and 3 years were 0.72, 0.71, and 0.73, respectively. Additionally, the model performed better than eight other Cox proportional hazards models and showed excellent performance in the BCLC-B sub-classification B I and B II subgroups. Conclusion The RSF-based model established herein can effectively predict the survival of patients with BCLC stage B HCC, with better performance than previous Cox proportional hazards models.
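The c-index reported above measures how often a model ranks patients' risks in the same order as their observed survival times. A minimal pure-Python sketch of Harrell's version follows. It assumes higher scores mean higher risk, treats pairs with tied times as non-comparable, and counts tied risk scores as half-concordant; the study may have used a different variant or software.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter time than subject j. It is concordant when subject i also has the
    higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported c-index of 0.69 in context.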
Affiliation(s)
- Huapeng Lin
- Department of Intensive Care Unit, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Department of Hepatobiliary Surgery, The Second Affiliated Hospital of Chongqing Medical University, Chongqing, China
- Lingfeng Zeng
- Department of Nephrology, The Second Xiangya Hospital of Central South University, Changsha, China
- Jing Yang
- Department of Intensive Care Unit, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Wei Hu
- Department of Intensive Care Unit, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Ying Zhu
- Department of Intensive Care Unit, Affiliated Hangzhou First People's Hospital, Zhejiang University School of Medicine, Hangzhou, China
8
Seifert S, Gundlach S, Junge O, Szymczak S. Integrating biological knowledge and gene expression data using pathway-guided random forests: a benchmarking study. Bioinformatics 2020; 36:4301-4308. [PMID: 32399562 PMCID: PMC7520048 DOI: 10.1093/bioinformatics/btaa483] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/09/2019] [Revised: 03/13/2020] [Accepted: 05/05/2020] [Indexed: 12/12/2022]
Abstract
MOTIVATION High-throughput technologies allow comprehensive characterization of individuals on many molecular levels. However, training computational models to predict disease status based on omics data is challenging. A promising solution is the integration of external knowledge about structural and functional relationships into the modeling process. We compared four published random forest-based approaches using two simulation studies and nine experimental datasets. RESULTS The self-sufficient prediction error approach should be applied when large numbers of relevant pathways are expected. The competing methods, hunting and learner of functional enrichment, should be used when low numbers of relevant pathways are expected or when the most strongly associated pathways are of interest. The hybrid approach, synthetic features, is not recommended because of its high false discovery rate. AVAILABILITY AND IMPLEMENTATION An R package providing functions for data analysis and simulation is available at GitHub (https://github.com/szymczak-lab/PathwayGuidedRF). An accompanying R data package (https://github.com/szymczak-lab/DataPathwayGuidedRF) stores the processed and quality-controlled experimental datasets downloaded from Gene Expression Omnibus (GEO). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Stephan Seifert
- Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel 24105, Germany
- Sven Gundlach
- Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel 24105, Germany
- Olaf Junge
- Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel 24105, Germany
- Silke Szymczak
- Institute of Medical Informatics and Statistics, Kiel University, University Hospital Schleswig-Holstein, Kiel 24105, Germany
9
Song L, Wang XY, He XF. A 5-Gene Prognostic Combination for Predicting Survival of Patients with Gastric Cancer. Med Sci Monit 2019; 25:6313-6320. [PMID: 31422414 PMCID: PMC6713029 DOI: 10.12659/msm.914815] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/19/2022]
Abstract
Background The aim of the study was to identify a multigene prognostic factor in patients with gastric cancer (GC). Material/Methods Random survival forest (RSF) analysis was used to screen survival-related genes and develop a multigene combination based on the cumulative hazard function of each GC patient in TCGA-STAD and GSE15459. Kaplan-Meier curves and univariate and multivariable Cox proportional hazards regression models were applied to evaluate the prognostic performance of the 5-gene combination. The C-index was used to compare the prognostic performance of the 5-gene combination with that of an existing 9-gene signature for GC. Gene set enrichment analysis (GSEA) was also conducted. Results We obtained 19 survival-related genes through univariate Cox proportional hazards analysis in the training set, 5 of which were identified by RSF and used to develop the 5-gene combination. Patients in the 5-gene combination low-risk group had better overall survival (OS) than those in the high-risk group, and the 5-gene combination was shown to be an independent prognostic factor in patients with GC. The 5-gene combination outperformed the 9-gene signature in predicting the OS of GC patients, and it might affect the prognosis of GC patients through E2F signaling, MYC signaling, and the G2M checkpoint. Conclusions We introduce a 5-gene combination that can predict the survival of GC patients and might be an independent prognostic factor in GC.
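The abstract builds its risk combination on each patient's cumulative hazard function. In the commonly used RSF formulation, that function is estimated within each terminal node with the Nelson-Aalen estimator and then averaged across trees. The sketch below shows only the Nelson-Aalen step for a single group of patients; the function name and input format (parallel lists of follow-up times and 0/1 event indicators) are assumptions, not the authors' code.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard estimate.

    Returns (event_time, cumulative_hazard) pairs, one for each distinct time
    at which at least one event occurred. Subjects censored at time t are
    still counted as at risk at t.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    cumulative = 0.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group every subject sharing this time (events and censorings).
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            cumulative += deaths / at_risk  # hazard increment d_t / n_t
            curve.append((t, cumulative))
        at_risk -= removed
    return curve
```

For times [1, 2, 2, 3, 4] with events [1, 1, 0, 1, 0], the increments are 1/5, 1/4 and 1/2 at times 1, 2 and 3.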
Affiliation(s)
- Liang Song
- Endoscopy Room, Heping Hospital Affiliated to Changzhi Medical College, Changzhi, Shanxi, China (mainland)
- Xiao-Yan Wang
- Department of Epidemiology and Health Statistics, Basic Medical College of Zhejiang University of Traditional Chinese Medicine, Hangzhou, Zhejiang, China (mainland)
- Xiao-Feng He
- Department of Science and Education, Heping Hospital Affiliated to Changzhi Medical College, Changzhi, Shanxi, China (mainland)
10
A methodological comparison of risk scores versus decision trees for predicting drug-resistant infections: A case study using extended-spectrum beta-lactamase (ESBL) bacteremia. Infect Control Hosp Epidemiol 2019; 40:400-407. [PMID: 30827286 DOI: 10.1017/ice.2019.17] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 01/19/2023]
Abstract
BACKGROUND Timely identification of multidrug-resistant gram-negative infections remains an epidemiological challenge. Statistical models for predicting drug resistance can offer utility where rapid diagnostics are unavailable or resource-impractical. Logistic regression-derived risk scores are common in the healthcare epidemiology literature. Machine learning-derived decision trees are an alternative approach for developing decision support tools. Our group previously reported on a decision tree for predicting ESBL bloodstream infections. Our objective in the current study was to develop a risk score from the same ESBL dataset to compare these 2 methods and to offer general guiding principles for using each approach. METHODS Using a dataset of 1,288 patients with Escherichia coli or Klebsiella spp bacteremia, we generated a risk score to predict the likelihood that a bacteremic patient was infected with an ESBL-producer. We evaluated discrimination (original and cross-validated models) using receiver operating characteristic curves and C statistics. We compared risk score and decision tree performance, and we reviewed their practical and methodological attributes. RESULTS In total, 194 patients (15%) had bacteremia caused by an ESBL-producing organism. The clinical risk score included 14 variables, compared with 5 variables in the decision tree. The positive and negative predictive values of the risk score and decision tree were similar (>90%), but the C statistic of the risk score (0.87) was 10% higher. CONCLUSIONS A decision tree and a risk score performed similarly for predicting ESBL infection. The decision tree was more user-friendly, with fewer variables for the end user, whereas the risk score offered higher discrimination and greater flexibility for adjusting sensitivity and specificity.
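The contrast drawn above between discrimination (the C statistic) and predictive values can be made concrete. For a binary outcome, the C statistic equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, whereas PPV and NPV depend on the cut-off chosen. The sketch below is illustrative only; the function names and the "score >= threshold means positive" rule are assumptions, not the authors' scoring procedure.

```python
def c_statistic(scores, outcomes):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def predictive_values(scores, outcomes, threshold):
    """PPV and NPV when a score >= threshold is called positive (assumed rule)."""
    pairs = list(zip(scores, outcomes))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    tn = sum(1 for s, y in pairs if s < threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv
```

Because a risk score yields many distinct values, sweeping the threshold trades sensitivity against specificity; a small decision tree produces only a few distinct outputs, which limits that adjustment.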
11
Bae S, Choi YS, Ahn SS, Chang JH, Kang SG, Kim EH, Kim SH, Lee SK. Radiomic MRI Phenotyping of Glioblastoma: Improving Survival Prediction. Radiology 2018; 289:797-806. [PMID: 30277442 DOI: 10.1148/radiol.2018180200] [Citation(s) in RCA: 145] [Impact Index Per Article: 24.2] [Indexed: 12/20/2022]
Abstract
Purpose To investigate whether radiomic features at MRI improve survival prediction in patients with glioblastoma multiforme (GBM) when they are integrated with clinical and genetic profiles. Materials and Methods Data from patients diagnosed with GBM between December 2009 and January 2017 (217 patients) were retrospectively reviewed up to May 2017 and allocated to training and test sets (3:1 ratio). Radiomic features (n = 796) were extracted from multiparametric MRI. A random survival forest (RSF) model was trained with the radiomic features along with clinical and genetic profiles (O-6-methylguanine-DNA-methyltransferase promoter methylation and isocitrate dehydrogenase 1 mutation statuses) to predict overall survival (OS) and progression-free survival (PFS). The RSF models were validated on the test set. The incremental values of radiomic features were evaluated by using the integrated area under the receiver operating characteristic curve (iAUC). Results The 217 patients had a mean age of 57.9 years, and there were 87 female patients (age range, 22-81 years) and 130 male patients (age range, 17-85 years). The median OS and PFS of patients were 352 days (range, 20-1809 days) and 264 days (range, 21-1809 days), respectively. The RSF radiomics models were successfully validated on the test set (iAUC, 0.652 [95% confidence interval {CI}: 0.524, 0.769] and 0.590 [95% CI: 0.502, 0.689] for OS and PFS, respectively). The addition of a radiomics model to clinical and genetic profiles improved survival prediction when compared with models containing clinical and genetic profiles alone (P = .04 and .03 for OS and PFS, respectively). Conclusion Radiomic MRI phenotyping can improve survival prediction when integrated with clinical and genetic profiles and thus has potential as a practical imaging biomarker. © RSNA, 2018. Online supplemental material is available for this article. See also the editorial by Jain and Lui in this issue.
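The iAUC reported above summarizes a time-dependent AUC across follow-up. As a minimal sketch of the quantity being integrated, the Python below computes a cumulative/dynamic AUC at a single horizon. It is deliberately simplified (no censoring weights), the data are hypothetical, and it is not the estimator used in the study.

```python
def cumulative_dynamic_auc(times, events, risk, horizon):
    """Simplified cumulative/dynamic AUC at one time horizon.

    Cases: subjects with an observed event at or before `horizon`.
    Controls: subjects still under observation after `horizon`.
    Subjects censored at or before `horizon` are simply dropped; published
    estimators usually reweight by the inverse probability of censoring instead.
    """
    cases = [r for t, e, r in zip(times, events, risk) if t <= horizon and e == 1]
    controls = [r for t, r in zip(times, risk) if t > horizon]
    if not cases or not controls:
        raise ValueError("no case/control pairs at this horizon")
    score = 0.0
    for rc in cases:
        for rk in controls:
            if rc > rk:
                score += 1.0
            elif rc == rk:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical survival times (days), event flags (1 = death), and model risk scores.
times = [100, 200, 300, 400, 500, 600]
events = [1, 1, 0, 1, 0, 1]
risk = [0.9, 0.7, 0.6, 0.8, 0.3, 0.2]
print(cumulative_dynamic_auc(times, events, risk, horizon=350))
```

Repeating this over a grid of horizons and taking a weighted average is the general idea behind an integrated AUC; the weighting scheme is a choice this sketch does not make.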
Affiliation(s)
- Sohi Bae, Yoon Seong Choi, Sung Soo Ahn, Jong Hee Chang, Seok-Gu Kang, Eui Hyun Kim, Se Hoon Kim, Seung-Koo Lee
- From the Department of Radiology, Research Institute of Radiological Science (S.B., Y.S.C., S.S.A., S.K.L.), Department of Neurosurgery (J.H.C., S.G.K., E.H.K.), and Department of Pathology (S.H.K.), Yonsei University College of Medicine, 50 Yonsei-ro, Seodaemun-gu, Seoul 03722, Korea; and Department of Radiology, National Health Insurance Service Ilsan Hospital, Goyang, Korea (S.B.)
12
Wang W, Liu W. Integration of gene interaction information into a reweighted random survival forest approach for accurate survival prediction and survival biomarker discovery. Sci Rep 2018; 8:13202. [PMID: 30181543 PMCID: PMC6123437 DOI: 10.1038/s41598-018-31497-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/10/2017] [Accepted: 08/20/2018] [Indexed: 02/05/2023]
Abstract
Accurately predicting patient risk and identifying survival biomarkers are two important tasks in survival analysis. For emerging high-throughput gene expression data, random survival forest (RSF) is attracting increasing attention because it not only performs well on survival prediction problems with high-dimensional variables but can also identify important variables automatically, using the variable importance calculated within the algorithm. However, RSF still has limitations, such as limited predictive accuracy on independent datasets and limited biological interpretability of the survival biomarkers it identifies. In this study, we integrated gene interaction information into a reweighted RSF model (RRSF) to improve predictive accuracy and identify biologically meaningful survival markers. We applied RRSF to survival prediction in patients with glioblastoma multiforme (GBM) and esophageal squamous cell carcinoma (ESCC). Using a reconstructed global pathway network and an mRNA-lncRNA co-expression network as prior gene interaction information, RRSF showed better overall predictive performance than RSF on three GBM and two ESCC datasets. In addition, RRSF identified a two-gene signature for GBM and a three-lncRNA signature for ESCC, both of which showed robust prognostic value and high biological relevance to the development of the respective cancer.
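The abstract does not describe RRSF's exact reweighting scheme. The Python sketch below only illustrates the general idea of biasing a forest's per-node feature sampling toward well-connected genes; the degree-based weights, the pseudocount, the gene names, and both function names are hypothetical assumptions, not the authors' method.

```python
import random


def degree_weights(genes, edges, pseudocount=1.0):
    """Hypothetical weighting: a gene's weight grows with its number of
    interaction partners; the pseudocount keeps isolated genes selectable."""
    degree = {g: 0 for g in genes}
    for a, b in edges:
        if a in degree:
            degree[a] += 1
        if b in degree:
            degree[b] += 1
    return {g: degree[g] + pseudocount for g in genes}


def weighted_feature_subset(weights, k, rng):
    """Draw k distinct features, favouring heavier ones.

    Uses Efraimidis-Spirakis keys u ** (1 / w): each feature gets a random key
    and the k largest keys are kept (weighted sampling without replacement).
    """
    keys = {g: rng.random() ** (1.0 / w) for g, w in weights.items()}
    return sorted(keys, key=keys.get, reverse=True)[:k]


# Hypothetical genes g1..g5 and interaction edges (not the paper's network).
genes = ["g1", "g2", "g3", "g4", "g5"]
edges = [("g1", "g2"), ("g1", "g3"), ("g2", "g3")]
weights = degree_weights(genes, edges)
rng = random.Random(0)
print(weights)
print(weighted_feature_subset(weights, k=2, rng=rng))
```

In a forest, a draw like this would replace the uniform choice of candidate variables at each node; other weightings (for example, network propagation scores) would plug into the same sampler.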
Affiliation(s)
- Wei Wang
- Department of Mathematics, Heilongjiang Institute of Technology, Harbin, 150050, China
- Wei Liu
- Department of Mathematics, Heilongjiang Institute of Technology, Harbin, 150050, China.
- The Key Laboratory of Molecular Biology for High Cancer Incidence Coastal Chaoshan Area, Shantou University Medical College, Shantou, 515041, China.
13
Wang H, Shen L, Geng J, Wu Y, Xiao H, Zhang F, Si H. Prognostic value of cancer antigen -125 for lung adenocarcinoma patients with brain metastasis: A random survival forest prognostic model. Sci Rep 2018; 8:5670. [PMID: 29618796 PMCID: PMC5884842 DOI: 10.1038/s41598-018-23946-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 01/05/2018] [Accepted: 03/20/2018] [Indexed: 01/09/2023]
Abstract
Using random survival forest, this study evaluated the prognostic value of serum markers for lung adenocarcinoma patients with brain metastasis (BM) and attempted to integrate them into a prognostic model. Patients from the period 2010 to 2015 were retrieved from two medical centers. In addition to Cox proportional hazards regression, random survival forest (RSF) was used to develop prognostic models in group A (n = 142). In the RSF for group A, factors whose minimal depth was greater than the depth threshold or whose variable importance (VIMP) was negative were excluded first. The C-index and Akaike information criterion (AIC) were then used to identify models with higher prognostic ability and a lower likelihood of overfitting. These RSF models, together with the Cox model and the modified-RPA and lung-GPA indices, were validated and compared, particularly in group B (CAMS, n = 53). Our data indicated that the KSE125 model (KPS, smoking, EGFR-20 (exon 18, 19 and 21) and Ca125) was the best for survival prediction and performed well in both internal and external validation. In conclusion, for lung adenocarcinoma patients with brain metastasis, a validated prognostic nomogram (KPS, smoking, EGFR-20 and Ca125) can more accurately predict 1-year and 2-year survival.
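The C-index and AIC guide model selection in this study. A minimal Python sketch of both, with hypothetical data; the AIC shown is the generic definition 2k - 2 ln L, and how the authors obtained a likelihood for each candidate model is not described in the abstract.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the subject with the shorter follow-up time had
    an observed event; it is concordant when that subject also has the higher
    predicted risk. Tied risks count 1/2; pairs with tied times are skipped
    in this simplified version.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def aic(log_likelihood, n_params):
    """Akaike information criterion, 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical survival times (months), event flags (1 = death), and risk scores.
times = [5, 8, 12, 20]
events = [1, 0, 1, 1]
risk = [2.0, 1.5, 0.3, 0.5]
print(harrell_c_index(times, events, risk))  # 3 of 4 comparable pairs concordant
print(aic(log_likelihood=-100.0, n_params=4))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking; AIC trades fit against the number of parameters, which is why it is used here as a guard against overfitting.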
Affiliation(s)
- Hao Wang
- Department of Radiotherapy, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China
- Liuhai Shen
- Department of Nuclear Medicine, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China
- Jianhua Geng
- Department of Nuclear Medicine, National Cancer Center/ Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Yitian Wu
- Department of Nuclear Medicine, National Cancer Center/ Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100021, China
- Huan Xiao
- Department of Nuclear Medicine, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China
- Fan Zhang
- Department of Radiotherapy, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China
- Hongwei Si
- Department of Nuclear Medicine, The First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China.
14
Uppu S, Krishna A, Gopalan RP. A Review on Methods for Detecting SNP Interactions in High-Dimensional Genomic Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:599-612. [PMID: 28060710 DOI: 10.1109/tcbb.2016.2635125] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]
Abstract
In this era of genome-wide association studies (GWAS), interest in understanding the genetic architecture of complex diseases is growing faster than ever before. The development of high-throughput genotyping and next-generation sequencing technologies enables genetic epidemiological analysis of large-scale data. These advances have led to the identification of a number of single nucleotide polymorphisms (SNPs) responsible for disease susceptibility. Interactions between SNPs associated with complex diseases are increasingly being explored in the current literature. These interaction studies are mathematically challenging and computationally complex, and a number of data mining and machine learning approaches have been developed to address these challenges. This paper reviews current methods and related software packages for detecting the SNP interactions that contribute to disease. It also addresses the issues that need to be considered when developing these models, reviews advances in data simulation for evaluating their performance, and discusses the future of SNP interaction analysis.
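A back-of-the-envelope illustration of why exhaustive interaction searches are computationally complex: the number of candidate SNP combinations grows combinatorially with the interaction order, and a naive Bonferroni threshold tightens accordingly. The 500,000-SNP panel size is a hypothetical example, and Bonferroni is shown only as the simplest possible correction.

```python
from math import comb


def n_combinations(n_snps, order):
    """Number of distinct SNP sets of a given interaction order."""
    return comb(n_snps, order)


# Hypothetical panel of 500,000 SNPs.
for order in (1, 2, 3):
    n = n_combinations(500_000, order)
    print(f"order {order}: {n:,} tests, Bonferroni alpha = {0.05 / n:.2e}")
```

Going from single SNPs to pairs multiplies the search space by roughly a quarter of a million, which is the motivation for the heuristic and machine learning methods the review surveys.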
15
Moradian H, Larocque D, Bellavance F. L₁ splitting rules in survival forests. LIFETIME DATA ANALYSIS 2017; 23:671-691. [PMID: 27379423 DOI: 10.1007/s10985-016-9372-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 05/31/2015] [Accepted: 06/17/2016] [Indexed: 06/06/2023]
Abstract
The log-rank test is used as the split function in many commonly used survival tree and forest algorithms. However, the log-rank test may lose substantial power in some circumstances, especially when the hazard functions or the survival functions of the two compared groups cross. We investigate the use of the integrated absolute difference between the survival functions of the two child nodes as the splitting rule. Simulation studies and applications to real data sets show that forests built with this rule produce very good results in general and often outperform forests built with the log-rank splitting rule.
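A minimal Python sketch of the L1 rule described above, under stated assumptions: each child node's survival function is a Kaplan-Meier estimate, and the integral runs up to a user-chosen horizon `tau` (the abstract does not say how the integration limit is chosen). A forest using this rule would presumably evaluate every candidate split and keep the one with the largest score.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (event_time, S(t)) steps."""
    data = sorted(zip(times, events))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


def surv_at(steps, t):
    """Right-continuous KM value at time t (1.0 before the first event)."""
    value = 1.0
    for time, s in steps:
        if time > t:
            break
        value = s
    return value


def l1_split_score(left, right, tau):
    """Integral over [0, tau] of |S_left(t) - S_right(t)|.

    left and right are (times, events) for the two candidate child nodes.
    Both curves are step functions, so the integral is an exact sum of
    rectangle areas between consecutive jump times.
    """
    s_left, s_right = kaplan_meier(*left), kaplan_meier(*right)
    grid = sorted({0.0, tau} | {t for t, _ in s_left + s_right if t < tau})
    area = 0.0
    for a, b in zip(grid, grid[1:]):
        area += abs(surv_at(s_left, a) - surv_at(s_right, a)) * (b - a)
    return area


# Hypothetical child nodes: early failures on the left, later failures on the right.
left = ([1, 2, 3], [1, 1, 1])
right = ([4, 5, 6], [1, 1, 1])
print(l1_split_score(left, right, tau=6.0))
```

Unlike the log-rank statistic, this distance does not cancel out when the two curves cross, which is the situation the abstract highlights.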
Affiliation(s)
- Hoora Moradian
- Department of Decision Sciences, HEC Montréal, 3000 chemin de la Côte-Sainte-Catherine, Montreal, QC, H3T 2A7, Canada
- Denis Larocque
- Department of Decision Sciences, HEC Montréal, 3000 chemin de la Côte-Sainte-Catherine, Montreal, QC, H3T 2A7, Canada.
- François Bellavance
- Department of Decision Sciences, HEC Montréal, 3000 chemin de la Côte-Sainte-Catherine, Montreal, QC, H3T 2A7, Canada
16
Cao J, Lan S, Shen L, Si H, Xiao H, Yuan Q, Li X, Li H, Guo R. Hemoglobin level, a prognostic factor for nasal extranodal natural killer/T-cell lymphoma patients from stage I to IV: A validated prognostic nomogram. Sci Rep 2017; 7:10982. [PMID: 28887511 PMCID: PMC5591293 DOI: 10.1038/s41598-017-11137-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 04/26/2017] [Accepted: 08/14/2017] [Indexed: 01/08/2023]
Abstract
Although nasal extranodal natural killer/T-cell lymphoma (nasal ENKL) shares some prognostic factors with other lymphomas, few studies have explored the prognostic value of hemoglobin. ENKL cases in stage I–IV from 2000 to 2015 were collected from two medical centers (group A, n = 192) and randomly divided into group B (n = 155) and group C (n = 37). Although the significant factors identified by univariate analysis differed between groups A and B, multivariate Cox regression identified the same factors in both. The C-index of the model was only slightly better than that of Yang's model, but its integrated Brier score (IBS) was clearly lower than Yang's in both group A and group B. Additionally, the minimal depth from a random survival forest (RSF) classifier confirmed that the prognostic ability of hemoglobin was better than that of age in both group A and group B. In the calibration of the nomogram, the predicted 3-year and 5-year OS agreed well with the corresponding actual OS. In conclusion, hemoglobin is a prognostic factor for nasal ENKL patients in stage I–IV, and a validated prognostic nomogram that integrates it, whose generalization error is the smallest among the evaluated models, can be used to predict patient outcome.
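The integrated Brier score averages a time-specific Brier score over follow-up (lower is better). A minimal Python sketch of the time-specific piece, using the common inverse-probability-of-censoring weighting (IPCW); the predictions and data are hypothetical, and integrating over t (for example with the trapezoid rule) is left out.

```python
def km_steps(times, flags):
    """Kaplan-Meier steps (time, S) where flags mark the 'events' being counted."""
    data = sorted(zip(times, flags))
    at_risk, surv, steps, i = len(data), 1.0, [], 0
    while i < len(data):
        t, hits, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            hits += data[i][1]
            removed += 1
            i += 1
        if hits:
            surv *= 1.0 - hits / at_risk
            steps.append((t, surv))
        at_risk -= removed
    return steps


def step_value(steps, t, left_limit=False):
    """Value of a right-continuous step function at t, or just before t."""
    value = 1.0
    for time, s in steps:
        if time > t or (left_limit and time == t):
            break
        value = s
    return value


def ipcw_brier(times, events, surv_pred, t):
    """Brier score at time t with inverse-probability-of-censoring weights.

    surv_pred[i] is the predicted probability that subject i survives beyond t.
    G is the Kaplan-Meier estimate of the censoring distribution.
    """
    g = km_steps(times, [1 - e for e in events])
    total = 0.0
    for time, event, s in zip(times, events, surv_pred):
        if time <= t and event == 1:  # died by t: target survival is 0
            total += s ** 2 / step_value(g, time, left_limit=True)
        elif time > t:  # still alive after t: target survival is 1
            total += (1.0 - s) ** 2 / step_value(g, t)
        # censored at or before t: contributes 0; the weights above compensate
    return total / len(times)


# Hypothetical follow-up (months), event flags, and predicted survival past t = 2.5.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
preds = [0.1, 0.5, 0.8, 0.9]
print(ipcw_brier(times, events, preds, t=2.5))
```

With no censoring, G stays at 1 and the score reduces to the ordinary mean squared error between predicted survival and the observed 0/1 status at t.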
Affiliation(s)
- Jianzhong Cao
- Department of Radiotherapy, Shanxi Cancer Hospital and Institute, Affiliated Hospital of Shanxi Medical University, Shanxi, 030013, China
- Shengmin Lan
- Department of Radiotherapy, Shanxi Cancer Hospital and Institute, Affiliated Hospital of Shanxi Medical University, Shanxi, 030013, China
- Liuhai Shen
- Department of Nuclear Medicine, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China
- Hongwei Si
- Department of Nuclear Medicine, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China.
- Huan Xiao
- Department of Nuclear Medicine, the First Affiliated Hospital of Anhui Medical University, Hefei, Anhui Province, 230022, China
- Qiang Yuan
- Department of Radiotherapy, Shanxi Cancer Hospital and Institute, Affiliated Hospital of Shanxi Medical University, Shanxi, 030013, China
- Xue Li
- Department of Radiotherapy, Shanxi Cancer Hospital and Institute, Affiliated Hospital of Shanxi Medical University, Shanxi, 030013, China
- Hongwei Li
- Department of Radiotherapy, Shanxi Cancer Hospital and Institute, Affiliated Hospital of Shanxi Medical University, Shanxi, 030013, China
- Ruyuan Guo
- Department of Radiotherapy, Shanxi Cancer Hospital and Institute, Affiliated Hospital of Shanxi Medical University, Shanxi, 030013, China
17
Attallah O, Karthikesalingam A, Holt PJE, Thompson MM, Sayers R, Bown MJ, Choke EC, Ma X. Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention. BMC Med Inform Decis Mak 2017; 17:115. [PMID: 28774329 PMCID: PMC5543447 DOI: 10.1186/s12911-017-0508-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 07/05/2016] [Accepted: 07/24/2017] [Indexed: 12/25/2022]
Abstract
Background The feature selection (FS) process is essential in medicine because it reduces the time and effort physicians spend measuring unnecessary features. Choosing useful variables is difficult in the presence of censoring, a distinctive characteristic of survival analysis. Most survival FS methods depend on Cox's proportional hazards model; machine learning techniques (MLT) are preferred but not commonly used because of censoring. Techniques that have been proposed to adapt MLT to FS with survival data cannot be used when the level of censoring is high. The authors' previous publications proposed a technique to deal with a high level of censoring and used existing FS techniques to reduce dataset dimension. In this paper, however, a new FS technique is proposed and combined with feature transformation and the proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. Methods In this paper, an FS technique based on an artificial neural network (ANN) MLT is proposed to deal with highly censored endovascular aortic repair (EVAR) survival data. EVAR datasets were collected between 2004 and 2010 from two vascular centers in order to produce a final stable model; almost 91% of the patients in these datasets are censored. The proposed approach used a wrapper FS method with an ANN to select a reduced subset of features that predicts the risk of EVAR re-intervention after 5 years for patients from two different centers located in the United Kingdom, so that it could potentially be applied to cross-center predictions. The proposed model is compared with two popular FS techniques used with Cox's model: the Akaike and Bayesian information criteria (AIC, BIC). Results The final model outperforms the other methods in distinguishing high- and low-risk groups: both its concordance index and its estimated AUC are better than those of Cox models based on the AIC, BIC, Lasso, and SCAD approaches.
These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and those who would need re-intervention can be correctly predicted. Conclusion The proposed approach will save physicians the time and effort of collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR. This predictive model can help clinicians decide on patients' future observation plans. Electronic supplementary material The online version of this article (doi:10.1186/s12911-017-0508-3) contains supplementary material, which is available to authorized users.
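A wrapper method scores candidate feature subsets by actually fitting and validating the model (an ANN in the paper). The greedy forward search below is one common wrapper strategy; whether the authors searched forward, backward, or otherwise is not stated in the abstract, and `score_fn`, the feature names, and the toy scores are hypothetical stand-ins for a cross-validated model fit.

```python
def forward_selection(features, score_fn, max_features=None):
    """Greedy forward wrapper selection.

    score_fn(subset) must return a validation score where higher is better,
    e.g. a cross-validated concordance index of the wrapped model.
    Stops when adding any remaining feature no longer improves the score.
    """
    selected, remaining = [], list(features)
    best = float("-inf")
    while remaining and (max_features is None or len(selected) < max_features):
        top_score, top_feature = max((score_fn(selected + [f]), f) for f in remaining)
        if top_score <= best:
            break
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected, best


# Toy scorer (hypothetical): two informative features, small penalty per feature.
useful = {"age": 0.3, "neck_diameter": 0.2}


def toy_score(subset):
    return sum(useful.get(f, 0.0) for f in subset) - 0.05 * len(subset)


print(forward_selection(["age", "sex", "neck_diameter", "smoking"], toy_score))
```

Because every call to `score_fn` refits the model, wrapper methods are far more expensive than filter methods such as ranking by AIC, but they judge features by the predictive performance that actually matters.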
Affiliation(s)
- Omneya Attallah
- School of Engineering and Applied Science, Aston University, B4 7ET, Birmingham, UK; Department of Electronics and Communications, College of Engineering and Technology, Arab Academy for Science and Technology, Alexandria, Egypt
- Rob Sayers
- St George's Vascular Institute, St George's University Hospitals NHS Foundation Trust, Blackshaw Road, London, SW17 0QT, UK
- Matthew J Bown
- Vascular Surgery Group, University of Leicester, Leicester, UK
- Eddie C Choke
- Vascular Surgery Group, Robert Kilpatrick Clinical Sciences Building, Leicester Royal Infirmary, University of Leicester, Leicester, LE2 7LX, UK
- Xianghong Ma
- School of Engineering and Applied Science, Aston University, B4 7ET, Birmingham, UK.
18
Wang H, Li G. A Selective Review on Random Survival Forests for High Dimensional Data. QUANTITATIVE BIO-SCIENCE 2017; 36:85-96. [PMID: 30740388 PMCID: PMC6364686 DOI: 10.22283/qbs.2017.36.2.85] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 11/24/2022]
Abstract
Over the past decades, there has been considerable interest in applying statistical machine learning methods to survival analysis. Ensemble-based approaches, especially random survival forests, have been developed in a variety of contexts because of their high precision and nonparametric nature. This article aims to provide a timely review of recent developments and applications of random survival forests for time-to-event data with high-dimensional covariates. This selective review begins with an introduction to the random survival forest framework, followed by a survey of recent developments in splitting criteria, variable selection, and other advanced topics of random survival forests for time-to-event data in high-dimensional settings. We also discuss potential directions for future research.
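In the commonly used random survival forest formulation, each tree is grown on a bootstrap sample with a random subset of candidate variables tried at each node, each terminal node's survival experience is summarized by a Nelson-Aalen cumulative hazard, and the forest prediction averages these hazards across trees. The Python below sketches only the last two steps, assuming the terminal-node samples for a new subject are already known (tree growing is omitted); the data are hypothetical.

```python
def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard as (event_time, H) steps: H(t) = sum d_j / n_j."""
    data = sorted(zip(times, events))
    at_risk, cum, steps, i = len(data), 0.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            cum += deaths / at_risk
            steps.append((t, cum))
        at_risk -= removed
    return steps


def chf_at(steps, t):
    """Evaluate the right-continuous step function at t (0 before the first event)."""
    value = 0.0
    for time, h in steps:
        if time > t:
            break
        value = h
    return value


def ensemble_chf(terminal_nodes, t):
    """Average the terminal-node cumulative hazards reached in each tree."""
    return sum(chf_at(nelson_aalen(*node), t) for node in terminal_nodes) / len(terminal_nodes)


# Hypothetical (times, events) in the terminal node a new subject lands in, for 2 trees.
node_tree1 = ([2, 3, 3, 5], [1, 1, 0, 1])
node_tree2 = ([1, 4], [1, 1])
print(ensemble_chf([node_tree1, node_tree2], t=4))
```

Averaging over many decorrelated trees is what gives the ensemble its stability; a single tree's step-function hazard is much noisier.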
Affiliation(s)
- Hong Wang
- School of Mathematics and Statistics, Central South University, Hunan 410083, China
- Gang Li
- Department of Biostatistics and Biomathematics, School of Public Health, University of California at Los Angeles, CA 90095, USA
19
Goodman KE, Lessler J, Cosgrove SE, Harris AD, Lautenbach E, Han JH, Milstone AM, Massey CJ, Tamma PD. A Clinical Decision Tree to Predict Whether a Bacteremic Patient Is Infected With an Extended-Spectrum β-Lactamase-Producing Organism. Clin Infect Dis 2016; 63:896-903. [PMID: 27358356 DOI: 10.1093/cid/ciw425] [Citation(s) in RCA: 124] [Impact Index Per Article: 15.5] [Received: 04/07/2016] [Accepted: 06/20/2016] [Indexed: 02/06/2023]
Abstract
BACKGROUND Timely identification of extended-spectrum β-lactamase (ESBL) bacteremia can improve clinical outcomes while minimizing unnecessary use of broad-spectrum antibiotics, including carbapenems. However, most clinical microbiology laboratories currently require at least 24 additional hours from the time of microbial genus and species identification to confirm ESBL production. Our objective was to develop a user-friendly decision tree to predict which organisms are ESBL producing, to guide appropriate antibiotic therapy. METHODS We included patients ≥18 years of age with bacteremia due to Escherichia coli or Klebsiella species from October 2008 to March 2015 at Johns Hopkins Hospital. Isolates with ceftriaxone minimum inhibitory concentrations ≥2 µg/mL underwent ESBL confirmatory testing. Recursive partitioning was used to generate a decision tree to determine the likelihood that a bacteremic patient was infected with an ESBL producer. Discrimination of the original and cross-validated models was evaluated using receiver operating characteristic curves and by calculation of C-statistics. RESULTS A total of 1288 patients with bacteremia met eligibility criteria. For 194 patients (15%), bacteremia was due to a confirmed ESBL producer. The final classification tree for predicting ESBL-positive bacteremia included 5 predictors: history of ESBL colonization/infection, chronic indwelling vascular hardware, age ≥43 years, recent hospitalization in an ESBL high-burden region, and ≥6 days of antibiotic exposure in the prior 6 months. The decision tree's positive and negative predictive values were 90.8% and 91.9%, respectively. CONCLUSIONS Our findings suggest that a clinical decision tree can be used to estimate a bacteremic patient's likelihood of infection with ESBL-producing bacteria. Recursive partitioning offers a practical, user-friendly approach for addressing important diagnostic questions.
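Recursive partitioning searches every predictor and cutoff for the split that best separates the outcome, splits on the winner, and then repeats the search inside each child node. The abstract does not state which split criterion the authors used; the Python sketch below shows one split search with Gini impurity, a common choice, on hypothetical ages and ESBL labels.

```python
def gini(labels):
    """Gini impurity 2p(1 - p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_cutoff(values, labels):
    """Best binary split 'value >= cutoff' for one predictor, by Gini decrease.

    Recursive partitioning repeats this search over every predictor, splits on
    the winner, and then recurses into each child node.
    """
    parent = gini(labels)
    best_gain, best_cut = 0.0, None
    for cut in sorted(set(values))[1:]:
        left = [y for v, y in zip(values, labels) if v < cut]
        right = [y for v, y in zip(values, labels) if v >= cut]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if parent - weighted > best_gain:
            best_gain, best_cut = parent - weighted, cut
    return best_gain, best_cut


# Hypothetical ages and ESBL status (1 = ESBL producer); not the study's data.
ages = [25, 30, 38, 45, 50, 62, 70, 80]
esbl = [0, 0, 0, 0, 1, 0, 1, 1]
print(best_cutoff(ages, esbl))
```

Each such split yields a yes/no question (for example, an age threshold), which is why the resulting tree is easy for clinicians to follow at the bedside.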
Affiliation(s)
- Justin Lessler
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health
- Sara E Cosgrove
- Department of Medicine, Division of Infectious Diseases, Johns Hopkins University School of Medicine
- Anthony D Harris
- Department of Epidemiology and Public Health, University of Maryland School of Medicine, Baltimore
- Ebbing Lautenbach
- Department of Medicine, Division of Infectious Diseases, University of Pennsylvania School of Medicine, Philadelphia
- Jennifer H Han
- Department of Medicine, Division of Infectious Diseases, University of Pennsylvania School of Medicine, Philadelphia
- Colin J Massey
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland
20
Shi M, He J. ColoFinder: a prognostic 9-gene signature improves prognosis for 871 stage II and III colorectal cancer patients. PeerJ 2016; 4:e1804. [PMID: 26989635 PMCID: PMC4793313 DOI: 10.7717/peerj.1804] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 12/30/2015] [Accepted: 02/23/2016] [Indexed: 12/24/2022]
Abstract
Colorectal cancer (CRC) is a heterogeneous disease with a high mortality rate and still lacks an effective treatment. Our goal was to develop a robust prognostic model for CRC patients. In this study, 871 stage II and III CRC samples were collected from six gene expression profiling datasets. ColoFinder was developed as a Random Survival Forest (RSF) prognostic model based on a 9-gene signature. The 9-gene signature recurrence score was derived with 5-fold cross-validation to test its association with relapse-free survival, and it achieved an AUC of 0.87 (95% CI [0.83-0.91]) in GSE39582. The low-risk group had significantly better relapse-free survival than the high-risk group (HR, 14.8; 95% CI [8.17-26.8]; P < 0.001). We also found that, when applied to GSE17536, the 9-gene signature recurrence score contributed more information about recurrence than standard clinical and pathological variables in univariate and multivariate Cox analyses (p = 0.03 and p = 0.01, respectively). Furthermore, ColoFinder improved predictive ability and better stratified the risk subgroups when applied to the CRC gene expression datasets GSE14333, GSE17537, GSE12945 and GSE24551. In summary, ColoFinder significantly improves risk assessment in stage II and III CRC patients. The 9-gene prognostic classifier informs patient prognosis and treatment response.
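The recurrence score was derived with 5-fold cross-validation. A minimal Python sketch of how 871 samples could be partitioned into plain random folds; the study may have stratified or assigned folds differently, and the RSF fitting and scoring steps are not shown.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices 0..n-1 are shuffled once, then dealt into k folds whose sizes
    differ by at most one; each fold serves as the test set exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [sorted(idx[i::k]) for i in range(k)]
    for i, test in enumerate(folds):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


# 871 samples and 5 folds, as in the abstract; the model fitting step is omitted.
for train, test in k_fold_splits(871, 5):
    print(len(train), len(test))
```

Every sample is scored exactly once by a model that never saw it during training, which is what makes the cross-validated AUC an out-of-sample estimate.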
Affiliation(s)
- Mingguang Shi
- School of Electric Engineering and Automation, Hefei University of Technology, Hefei, Anhui, China
- Jianmin He
- School of Management, Hefei University of Technology, Hefei, Anhui, China
21
Jing GJ, Zhang Z, Wang HQ, Zheng HM. Mining gene link information for survival pathway hunting. IET Syst Biol 2015; 9:147-54. [PMID: 26243831 DOI: 10.1049/iet-syb.2014.0048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/20/2022]
Abstract
This study proposes a gene link-based method for survival time-related pathway hunting. In this method, the authors incorporate gene link information to estimate how a pathway is associated with cancer patients' survival time. Specifically, a gene link-based Cox proportional hazards model (Link-Cox) is established, in which two linked genes are considered together to represent a link variable and the association of the link with survival time is assessed using the Cox proportional hazards model. On the basis of the Link-Cox model, the authors formulate a new statistic for measuring the association of a pathway with survival time of cancer patients, referred to as the pathway survival score (PSS), by summarising survival significance over all the gene links in the pathway, and devise a permutation test to assess the significance of an observed PSS. To evaluate the proposed method, the authors applied it to simulation data and two publicly available real-world gene expression data sets. Extensive comparisons with previous methods show the effectiveness and efficiency of the proposed method for survival pathway hunting.
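The significance of an observed PSS is assessed with a permutation test. The generic Python sketch below shows only the mechanics: the statistic, the data, and the choice to shuffle outcomes relative to scores are hypothetical and do not reproduce the Link-Cox model or the PSS. With censored survival data, (time, event) pairs would be shuffled together.

```python
import random


def permutation_p_value(statistic, x, y, n_perm=999, seed=0):
    """One-sided permutation test: how often does shuffling y give a statistic
    at least as large as the observed one? Uses (b + 1) / (n_perm + 1)."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(x, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def inverse_association(x, y):
    """Hypothetical placeholder statistic (NOT the Link-Cox PSS):
    negative covariance, large when high x goes with short survival y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return -sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


# Hypothetical per-patient pathway scores and survival times (months).
scores = [0.1, 0.3, 0.4, 0.8, 0.9, 1.2, 1.5, 1.7]
survival = [60, 55, 50, 40, 35, 30, 20, 10]
obs, p = permutation_p_value(inverse_association, scores, survival)
print(obs, p)
```

The (b + 1) / (n_perm + 1) form counts the observed arrangement as one of the permutations, so the estimated p-value can never be exactly zero.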
Affiliation(s)
- Gao-Jian Jing
- School of Mechanical and Automotive Engineering, Hefei University of Technology, Hefei, People's Republic of China
- Zirui Zhang
- School of Mechanical and Automotive Engineering, Hefei University of Technology, Hefei, People's Republic of China
- Hong-Qiang Wang
- Machine Intelligence & Computational Biology Lab, Institute of Intelligent Machines, Chinese Academy of Sciences, P.O. Box 1130, Hefei, Anhui 230031, People's Republic of China
- Hong-Mei Zheng
- School of Mechanical and Automotive Engineering, Hefei University of Technology, Hefei, People's Republic of China.
22
Statistical and Computational Methods for Genetic Diseases: An Overview. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:954598. [PMID: 26106440 PMCID: PMC4464008 DOI: 10.1155/2015/954598] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/16/2014] [Accepted: 04/23/2015] [Indexed: 12/19/2022]
Abstract
The identification of the causes of genetic diseases has been carried out by several approaches of increasing complexity. Innovation in genetic methodologies leads to the production of large amounts of data that need the support of statistical and computational methods to be correctly processed. The aim of the paper is to provide an overview of statistical and computational methods, with particular attention to methods for sequence analysis and complex diseases.
23
Boulesteix AL, Janitza S, Hapfelmeier A, Van Steen K, Strobl C. Letter to the Editor: On the term 'interaction' and related phrases in the literature on Random Forests. Brief Bioinform 2014; 16:338-45. [PMID: 24723569 PMCID: PMC4364067 DOI: 10.1093/bib/bbu012] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 12/22/2022]
Abstract
In an interesting and quite exhaustive review on Random Forests (RF) methodology in bioinformatics Touw et al. address—among other topics—the problem of the detection of interactions between variables based on RF methodology. We feel that some important statistical concepts, such as ‘interaction’, ‘conditional dependence’ or ‘correlation’, are sometimes employed inconsistently in the bioinformatics literature in general and in the literature on RF in particular. In this letter to the Editor, we aim to clarify some of the central statistical concepts and point out some confusing interpretations concerning RF given by Touw et al. and other authors.
24
A review for detecting gene-gene interactions using machine learning methods in genetic epidemiology. Biomed Res Int 2013; 2013:432375. [PMID: 24228248 PMCID: PMC3818807 DOI: 10.1155/2013/432375] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Received: 07/17/2013] [Revised: 08/26/2013] [Accepted: 08/27/2013] [Indexed: 01/04/2023]
Abstract
Currently, the greatest statistical and computational challenge in genetic epidemiology is to identify and characterize the genes that interact with other genes and with environmental factors to affect complex multifactorial diseases. Such gene-gene interactions, also called epistasis, cannot be resolved by traditional statistical methods because of the high dimensionality of the data and the occurrence of multiple polymorphisms. Several machine learning methods, namely neural networks (NNs), support vector machines (SVMs), and random forests (RFs), have therefore been applied to identify susceptibility genes in common, multifactorial diseases. This paper gives an overview of these machine learning methods, describing the methodology of each and its application in detecting gene-gene and gene-environment interactions. Finally, it discusses the strengths and weaknesses of each method in detecting gene-gene interactions in complex human disease.
25
Müller SA, Mehrabi A, Rahbari NN, Warschkow R, Elbers H, Leowardi C, Fonouni H, Tarantino I, Schemmer P, Schmied BM, Büchler MW. Allogeneic Blood Transfusion Does Not Affect Outcome After Curative Resection for Advanced Cholangiocarcinoma. Ann Surg Oncol 2013; 21:155-64. [DOI: 10.1245/s10434-013-3226-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 01/04/2013] [Indexed: 01/04/2023]