1
Legault LM, Dupas T, Breton-Larrivée M, Filion-Bienvenue F, Lemieux A, Langford-Avelar A, McGraw S. Sex-specific DNA methylation and gene expression changes in mouse placentas after early preimplantation alcohol exposure. ENVIRONMENT INTERNATIONAL 2024; 192:109014. [PMID: 39321537 DOI: 10.1016/j.envint.2024.109014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2024] [Revised: 09/13/2024] [Accepted: 09/14/2024] [Indexed: 09/27/2024]
Abstract
During pregnancy, exposure to alcohol represents an environmental insult capable of negatively impacting embryonic development. This influence can stem from disruption of molecular profiles, ultimately leading to the manifestation of fetal alcohol spectrum disorder. Despite the central role of the placenta in proper embryonic development and successful pregnancy, studies of the placenta in the context of prenatal alcohol exposure and fetal alcohol spectrum disorder are markedly lacking. Here, we employed a well-established model of preimplantation alcohol exposure, specifically targeting embryonic day 2.5, corresponding to the 8-cell stage. The exposure was administered to pregnant C57BL/6 female mice through subcutaneous injection, involving two doses of either 2.5 g/kg 50 % ethanol or an equivalent volume of saline at 2-hour intervals. Morphology, DNA methylation and gene expression patterns were assessed in male and female late-gestation (E18.5) placentas. While overall placental morphology was not altered, we found a significant decrease in the weights of male ethanol-exposed embryos. When looking at molecular profiles, we uncovered numerous differentially methylated regions (DMRs; 991 in males; 1309 in females) and differentially expressed genes (DEGs; 1046 in males; 340 in females) in the placentas. Remarkably, only 21 DMRs and 54 DEGs were common to both sexes; these were enriched for genes involved in growth factor response pathways. Preimplantation alcohol exposure had a greater impact on imprinted gene expression in male placentas (imprinted DEGs: 18 in males; 1 in females). Finally, using a machine learning model (L1 regularization), we were able to precisely discriminate control and ethanol-exposed placentas based on their specific DNA methylation patterns. This is the first study demonstrating that preimplantation alcohol exposure alters the DNA methylation and transcriptomic profiles of late-gestation placentas in a sex-specific manner. Our findings highlight that the DNA methylation profiles of the placenta could serve as a potent predictive molecular signature for early preimplantation alcohol exposure.
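The abstract states that an L1-regularized model separated control from ethanol-exposed placentas using DNA methylation. As a minimal sketch of how such a classifier can be fitted, the Python below minimizes the mean logistic loss plus lam * ||w||_1 by proximal gradient descent (ISTA), where the soft-thresholding step sets weak coefficients exactly to zero. The solver, step size, `lam`, and the toy centered "methylation" matrix are assumptions for illustration; the study's actual features, software, and tuning are not described in the abstract.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t * |w|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_l1_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Minimize mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    X: list of samples (lists of floats); y: labels in {0, 1}.
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xij for wj, xij in zip(w, xi))
            r = (sigmoid(z) - yi) / n          # per-sample residual
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= step * grad_b                     # plain gradient step on the intercept
        # gradient step followed by soft-thresholding on the penalized weights
        w = [soft_threshold(w[j] - step * grad_w[j], step * lam) for j in range(p)]
    return w, b

def predict(w, b, x):
    """Return 1 (exposed) if the linear score is positive, else 0 (control)."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

# Hypothetical centered values for three CpG sites; rows 0-1 control, rows 2-3 exposed.
X = [[-1.0, 0.5, -0.1], [-1.0, -0.5, 0.0], [1.0, 0.5, 0.1], [1.0, -0.5, 0.0]]
y = [0, 0, 1, 1]
w, b = fit_l1_logistic(X, y, lam=0.05)
```

With lam = 0.05 the two weakly informative or uninformative columns keep a weight of exactly zero while the first column carries the class signal; for this symmetric toy set the first weight converges to ln(19), the point where the penalty balances the loss gradient.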
Affiliation(s)
- Lisa-Marie Legault
- CHU Ste-Justine Azrieli Research Center, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, 2900 Boulevard Edouard‑Montpetit, Montréal, QC H3T 1J4, Canada.
- Thomas Dupas
- CHU Ste-Justine Azrieli Research Center, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Department of Obstetrics and Gynecology, Université de Montréal, 2900 Boulevard Edouard‑Montpetit, Montréal, QC H3T 1J4, Canada.
- Mélanie Breton-Larrivée
- CHU Ste-Justine Azrieli Research Center, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, 2900 Boulevard Edouard‑Montpetit, Montréal, QC H3T 1J4, Canada.
- Fannie Filion-Bienvenue
- CHU Ste-Justine Azrieli Research Center, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, 2900 Boulevard Edouard‑Montpetit, Montréal, QC H3T 1J4, Canada.
- Anthony Lemieux
- CHU Ste-Justine Azrieli Research Center, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada.
- Alexandra Langford-Avelar
- CHU Ste-Justine Azrieli Research Center, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, 2900 Boulevard Edouard‑Montpetit, Montréal, QC H3T 1J4, Canada.
- Serge McGraw
- CHU Ste-Justine Azrieli Research Center, 3175 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5, Canada; Department of Biochemistry and Molecular Medicine, Université de Montréal, 2900 Boulevard Edouard‑Montpetit, Montréal, QC H3T 1J4, Canada; Department of Obstetrics and Gynecology, Université de Montréal, 2900 Boulevard Edouard‑Montpetit, Montréal, QC H3T 1J4, Canada.
2
Lu Y, Cao N, Zhao M, Zhang G, Zhang Q, Wang L. Importance of CD8 Tex cell-associated gene signatures in the prognosis and immunology of osteosarcoma. Sci Rep 2024; 14:9769. [PMID: 38684858 PMCID: PMC11058769 DOI: 10.1038/s41598-024-60539-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 04/24/2024] [Indexed: 05/02/2024]
Abstract
As a highly aggressive bone malignancy, osteosarcoma poses a significant therapeutic challenge, especially in the setting of metastasis or recurrence. This study aimed to investigate the potential of CD8 Tex cell-associated genes as prognostic biomarkers to reveal the immunogenomic profile of osteosarcoma and guide therapeutic decisions. mRNA expression data and clinical details of osteosarcoma patients were obtained from the TCGA database (TARGET-OS dataset). The GSE21257 dataset (from the GEO database) was used as an external validation set to provide additional information on osteosarcoma specimens. Eighty-four samples from the TARGET-OS dataset were used as the training set, and 53 samples from the GSE21257 dataset served as the external validation cohort. Univariate Cox regression analysis was used to identify CD8 Tex cell genes associated with prognosis. The LASSO algorithm was run for 1000 iterations to select the best subset of genes forming the CD8 Tex cell gene signature (TRS), and the final genes were identified using a LASSO multivariate Cox regression model. Risk scores were calculated to categorize patients into high- and low-risk groups, and clinical differences between the groups were explored by Kaplan-Meier survival analysis to assess model performance. Prediction maps combining the CD8 Tex cell gene signature risk score with clinicopathologic factors were constructed to estimate 1-, 3-, and 5-year survival rates for osteosarcoma patients. The ssGSEA algorithm was used to assess differences in immune function between the TRS-defined high- and low-risk groups. The tumor microenvironment (TME) and immune cell infiltration were further assessed using the ESTIMATE and CIBERSORT algorithms, and the relationship between immune checkpoint gene expression levels and the two risk groups was explored. A CD8 Tex cell-associated gene signature was extracted from the TISCH database, and a prognostic signature comprising two genes was developed. The high-risk group showed lower survival, and model performance was validated by ROC curves and the C-index. Predictive plots combining CD8 Tex cell gene markers and clinical factors were constructed to demonstrate survival estimates. This study provides valuable insights into the molecular and immune characteristics of osteosarcoma and offers potential avenues for advances in therapeutic approaches.
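The risk-score and Kaplan-Meier steps described above can be sketched in plain Python. The risk score is taken here as the usual Cox linear predictor (sum of coefficient times expression), patients are split at the median score, and survival is estimated with the product-limit formula S(t) = product over event times of (1 - d_i / n_i). The gene names GENE_A and GENE_B and their coefficients are hypothetical placeholders: the abstract does not name the two signature genes or report their weights, and the median cut-off is an assumption.

```python
from statistics import median

def risk_scores(expr, coefs):
    """Cox linear predictor: sum of coefficient * expression for each patient."""
    return [sum(coefs[g] * sample[g] for g in coefs) for sample in expr]

def split_by_median(scores):
    """Label each patient 'high' or 'low' risk relative to the median score."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(event_time, S(t))], one entry per distinct event time.

    events[i] is 1 for an observed death/event and 0 for a censored follow-up.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # group all subjects sharing this follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve

# Hypothetical two-gene signature and expression values.
coefs = {"GENE_A": 0.8, "GENE_B": -0.5}
expr = [{"GENE_A": 2.0, "GENE_B": 1.0},
        {"GENE_A": 0.5, "GENE_B": 2.0},
        {"GENE_A": 1.0, "GENE_B": 0.0}]
groups = split_by_median(risk_scores(expr, coefs))  # ['high', 'low', 'low']
```

In practice the curve would be computed separately for the high- and low-risk groups and the two curves compared, for example with a log-rank test.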
Affiliation(s)
- Yining Lu
- Department of Orthopedic Research Center, The Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China
- Department of Orthopedic Oncology, The Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China
- Nana Cao
- Blood Transfusion Department of the Fourth Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China
- Ming Zhao
- Department of Orthopedic Oncology, The Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China
- Guochuan Zhang
- Department of Orthopedic Oncology, The Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China
- Qi Zhang
- Department of Orthopedic Research Center, The Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China.
- Ling Wang
- Department of Orthopedic Research Center, The Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China.
- Department of Orthopedic Oncology, The Third Hospital of Hebei Medical University, Shijiazhuang, Hebei, People's Republic of China.
3
Yang Y, McMahan CS, Wang YB, Ouyang Y. Estimation of l0 Norm Penalized Models: A Statistical Treatment. Comput Stat Data Anal 2024; 192:107902. [PMID: 38222104 PMCID: PMC10785287 DOI: 10.1016/j.csda.2023.107902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2024]
Abstract
Fitting penalized models for the purpose of merging the estimation and model selection problems has become commonplace in statistical practice. Of the various regularization strategies that can be leveraged to this end, the use of the l0 norm to penalize parameter estimation poses the most daunting model fitting task. In fact, this particular strategy requires an end user to solve a non-convex NP-hard optimization problem regardless of the underlying data model. For this reason, the use of the l0 norm as a regularization strategy has been woefully underutilized. To obviate this difficulty, a strategy to solve such problems that is generally accessible to the statistical community is developed. The approach can be adopted to solve l0 norm penalized problems across a very broad class of models, can be implemented using existing software, and is computationally efficient. The performance of the method is demonstrated through in-depth numerical experiments and by using it to analyze several prototypical data sets.
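To make the l0-penalized objective concrete, the sketch below minimizes RSS(beta) + lam * ||beta||_0 for least squares by brute force, enumerating every subset of predictors. This is not the authors' strategy: exhaustive search is exactly the combinatorial cost the paper sets out to avoid, and it is only feasible for a handful of predictors. The no-intercept model and the toy data are assumptions for illustration.

```python
from itertools import combinations

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def subset_rss(X, y, subset):
    """Least-squares fit (no intercept) on the chosen columns; returns (RSS, coefficients)."""
    if not subset:
        return sum(v * v for v in y), []
    cols = [[row[j] for j in subset] for row in X]
    k = len(subset)
    xtx = [[sum(r[p] * r[q] for r in cols) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(cols, y)) for p in range(k)]
    beta = solve(xtx, xty)  # normal equations X'X beta = X'y
    resid = [yi - sum(bj * rj for bj, rj in zip(beta, r)) for r, yi in zip(cols, y)]
    return sum(e * e for e in resid), beta

def l0_best_subset(X, y, lam):
    """Minimize RSS(beta) + lam * ||beta||_0 by enumerating every subset of columns."""
    p = len(X[0])
    best = (float("inf"), (), [])
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            rss, beta = subset_rss(X, y, subset)
            objective = rss + lam * k  # the l0 norm counts the nonzero coefficients
            if objective < best[0]:
                best = (objective, subset, beta)
    return best

# Toy data: y depends only on the first column (y = 2 * x0).
X = [[1, 0, 1], [2, 1, 0], [3, 0, 1], [4, 1, 1]]
y = [2, 4, 6, 8]
objective, subset, beta = l0_best_subset(X, y, lam=1.0)  # subset == (0,)
```

The 2^p loop is why l0 problems are called NP-hard in general. A large enough lam drives the solution to the empty model, which is the sense in which the penalty merges estimation with model selection.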
Affiliation(s)
- Yuan Yang
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Christopher S McMahan
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Yu-Bo Wang
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
- Yuyuan Ouyang
- School of Mathematical and Statistical Sciences, Clemson University, Clemson, 29634, SC, U.S.A
4
Li P, Xu Q, Liu K, Ye J. CRYL1 is a Potential Prognostic Biomarker of Clear Cell Renal Cell Carcinoma Correlated with Immune Infiltration and Cuproptosis. Technol Cancer Res Treat 2024; 23:15330338241237439. [PMID: 38497139 PMCID: PMC10946081 DOI: 10.1177/15330338241237439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 01/18/2024] [Accepted: 02/06/2024] [Indexed: 03/19/2024]
Abstract
BACKGROUND Clear cell renal cell carcinoma (ccRCC) is a widespread urogenital neoplasm. However, the therapeutic efficacy of current treatments is unsatisfactory. In-depth screening of biomarkers could aid early diagnosis and therapy and help predict patient prognosis. METHODS GEO datasets were selected using specific criteria. Differentially expressed gene (DEG), weighted gene coexpression network analysis (WGCNA), protein-protein interaction, LASSO, random forest, and Cox regression analyses were applied to identify independent prognostic biomarkers. Survival analysis, correlation with clinical features, gene set enrichment analysis (GSEA), GO enrichment, immune infiltration analysis, and correlation with cuproptosis-related genes were carried out to determine the prognostic value and possible molecular mechanisms of the TSVR. Wound healing assays, transwell assays, cell colony formation experiments, flow cytometry, and immunohistochemistry (IHC) analysis were used to validate the functional attributes of CRYL1. RESULTS Four GEO datasets were included to screen for hub genes. DEG analysis combined with WGCNA identified a key module of 300 genes with the strongest correlation with "survival state" (R2 = -0.24 and P = 7e-8); six genes were identified by LASSO, random forest, and Cytoscape. Finally, CRYL1 (hazard ratio (HR) = 2.01, P < 0.001) was selected as an independent prognostic biomarker. The group with higher CRYL1 expression had better disease-free survival (DFS) and overall survival (OS). GSEA demonstrated that CRYL1-related DEGs were enriched mainly in the metabolism of sugar, fat, and amino acids. CRYL1 is positively correlated with FDX1 and the LIAS pathway, which are important molecules involved in cuproptosis. CRYL1 affects the infiltration abundance of four immune cell types and can predict favorable OS. Wound healing, transwell, cell colony formation, and flow cytometry assays demonstrated that CRYL1 silencing enhances migration and proliferation and decreases the apoptotic ratio. IHC analysis suggested that CRYL1 was highly expressed in adjacent tissues. CONCLUSIONS CRYL1 is a robust predictive marker for clinicopathological characteristics and survival status in ccRCC patients.
Affiliation(s)
- Peng Li
- The 6th Affiliated Hospital of Wenzhou Medical University, Lishui City People's Hospital, Lishui, Zhejiang, China
- Lishui City People's Hospital, The First Affiliated Hospital of Lishui University, Lishui, Zhejiang, China
- Qiangqiang Xu
- The 6th Affiliated Hospital of Wenzhou Medical University, Lishui City People's Hospital, Lishui, Zhejiang, China
- Lishui City People's Hospital, The First Affiliated Hospital of Lishui University, Lishui, Zhejiang, China
- Ken Liu
- The 6th Affiliated Hospital of Wenzhou Medical University, Lishui City People's Hospital, Lishui, Zhejiang, China
- Lishui City People's Hospital, The First Affiliated Hospital of Lishui University, Lishui, Zhejiang, China
- Junjie Ye
- The 6th Affiliated Hospital of Wenzhou Medical University, Lishui City People's Hospital, Lishui, Zhejiang, China
- Lishui City People's Hospital, The First Affiliated Hospital of Lishui University, Lishui, Zhejiang, China
5
Jin Y, Terhorst J. The solution surface of the Li-Stephens haplotype copying model. Algorithms Mol Biol 2023; 18:12. [PMID: 37559098 PMCID: PMC10410957 DOI: 10.1186/s13015-023-00237-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 07/30/2023] [Indexed: 08/11/2023]
Abstract
The Li-Stephens (LS) haplotype copying model forms the basis of a number of important statistical inference procedures in genetics. LS is a probabilistic generative model which supposes that a sampled chromosome is an imperfect mosaic of other chromosomes found in a population. In the frequentist setting, which is the focus of this paper, the output of LS is a "copying path" through chromosome space. The behavior of LS depends crucially on two user-specified parameters, θ and ρ, which are respectively interpreted as the rates of mutation and recombination. However, because LS is not based on a realistic model of ancestry, the precise connection between these parameters and the biological phenomena they represent is unclear. Here, we offer an alternative perspective, which considers θ and ρ as tuning parameters and seeks to understand their impact on the LS output. We derive an algorithm which, for a given dataset, efficiently partitions the (θ, ρ) plane into regions where the output of the algorithm is constant, thereby enumerating all possible solutions to the LS model in one go. We extend this approach to the "diploid LS" model commonly used for phasing. We demonstrate the usefulness of our method by studying the effects of changing θ and ρ when using LS for common bioinformatic tasks. Our findings indicate that using the conventional (i.e., population-scaled) values for θ and ρ produces near-optimal results for imputation, but may systematically inflate switch error in the case of phasing diploid genotypes.
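The "copying path" in the frequentist LS setting is the most likely hidden path, which can be found with the Viterbi algorithm. The sketch below is a textbook haploid Viterbi, not the authors' solution-surface algorithm. It assumes a simplified parametrization: a mismatch probability `theta` per site, and a jump probability `rho` per interval to a uniformly chosen haplotype. Li and Stephens' original emission and transition formulas depend on the panel size and genetic distances and differ in detail. The panel and query are toy data.

```python
import math

def ls_viterbi(panel, query, theta, rho):
    """Most likely copying path of `query` through the haplotypes in `panel`.

    panel: K reference haplotypes, each a list of M alleles (0/1).
    theta: per-site probability that the copied allele differs (mutation-like).
    rho: per-interval probability of a jump to a uniformly chosen haplotype
         (recombination-like; the jump may land on the current haplotype).
    Returns the index of the copied haplotype at each of the M sites.
    """
    K, M = len(panel), len(query)
    log_match, log_mismatch = math.log(1 - theta), math.log(theta)
    log_stay = math.log((1 - rho) + rho / K)  # no jump, or a jump back to itself
    log_switch = math.log(rho / K)            # jump to one specific other haplotype

    def emit(k, m):
        return log_match if panel[k][m] == query[m] else log_mismatch

    scores = [math.log(1 / K) + emit(k, 0) for k in range(K)]
    backpointers = []
    for m in range(1, M):
        # Every switch comes from the overall best state, so each site costs O(K)
        # rather than O(K^2).
        best = max(range(K), key=lambda k: scores[k])
        switch_score = scores[best] + log_switch
        pointers, new_scores = [], []
        for k in range(K):
            stay_score = scores[k] + log_stay
            if stay_score >= switch_score:
                pointers.append(k)
                new_scores.append(stay_score + emit(k, m))
            else:
                pointers.append(best)
                new_scores.append(switch_score + emit(k, m))
        backpointers.append(pointers)
        scores = new_scores
    # Trace back from the best final state.
    k = max(range(K), key=lambda j: scores[j])
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    return path[::-1]

panel = [[0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1],
         [0, 1, 0, 1, 0, 1]]
query = [0, 0, 0, 1, 1, 1]
path = ls_viterbi(panel, query, theta=0.01, rho=0.1)  # [0, 0, 0, 1, 1, 1]
```

Lowering `rho` to 1e-6 makes a switch costlier than two mismatches, so the path collapses to copying haplotype 2 throughout. This is a small-scale instance of the abstract's point that the LS output is piecewise constant in the tuning parameters and changes abruptly between regions.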
Affiliation(s)
- Yifan Jin
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, MI, 48103, USA
- Jonathan Terhorst
- Department of Statistics, University of Michigan, 1085 South University Avenue, Ann Arbor, MI, 48103, USA.
6
Duan M, Wang Y, Zhao D, Liu H, Zhang G, Li K, Zhang H, Huang L, Zhang R, Zhou F. Orchestrating information across tissues via a novel multitask GAT framework to improve quantitative gene regulation relation modeling for survival analysis. Brief Bioinform 2023; 24:bbad238. [PMID: 37427963 DOI: 10.1093/bib/bbad238] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 05/29/2023] [Accepted: 06/08/2023] [Indexed: 07/11/2023]
Abstract
Survival analysis is critical to cancer prognosis estimation. High-throughput technologies have greatly increased the dimensionality of gene-level features, but the number of clinical samples in cohorts remains relatively small for various reasons, including difficulties in participant recruitment and high data-generation costs. The transcriptome is one of the most abundantly available OMIC data types (OMIC referring to high-throughput data, including genomic, transcriptomic, proteomic and epigenomic data). This study introduced DQSurv, a multitask graph attention network (GAT) framework for the survival analysis task. We first used a large dataset of healthy tissue samples to pretrain the GAT-based HealthModel for the quantitative measurement of gene regulatory relations. The multitask survival analysis framework DQSurv used transfer learning to initialize the GAT model with the pretrained HealthModel and further fine-tuned this model on two tasks, i.e., the main task of survival analysis and the auxiliary task of gene expression prediction. This refined GAT was denoted as DiseaseModel. We fused the original transcriptomic features with the difference vector between the latent features encoded by the HealthModel and the DiseaseModel for the final task of survival analysis. The proposed DQSurv model consistently outperformed existing models for the survival analysis of 10 benchmark cancer types and an independent dataset. An ablation study also supported the necessity of the main modules. We released the code and the pretrained HealthModel to facilitate feature encoding and survival analysis in future transcriptome-based studies, especially on small datasets. The model and the code are available at http://www.healthinformaticslab.org/supp/.
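The building block of the framework above is a graph attention layer. The following sketch shows a single-head GAT update for one node, in the standard formulation: e_ij = LeakyReLU(a . [W h_i || W h_j]), followed by a softmax over the node's neighbourhood. DQSurv's actual head count, dimensions, gene-graph construction, and output nonlinearity are not given in the abstract. The weights, features, and the `gat_node_update` helper below are hypothetical, and the final activation of the original GAT layer is omitted for clarity.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_node_update(h, neighbors, W, a, i):
    """One single-head GAT update for node i.

    h: node -> feature vector; neighbors[i]: nodes attended to by i (include i
    itself for a self-loop); W: shared linear projection; a: attention vector of
    length 2 * output dimension.
    Returns (attention weights over neighbors[i], aggregated output vector).
    """
    nbrs = neighbors[i]
    z = {j: matvec(W, h[j]) for j in set(nbrs) | {i}}
    # Unnormalized scores e_ij = LeakyReLU(a . [W h_i || W h_j])
    scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j]))) for j in nbrs]
    # Softmax over the neighbourhood, shifted by the max for numerical stability
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Attention-weighted sum of projected neighbour features (no final nonlinearity)
    dim = len(z[i])
    out = [sum(al * z[j][d] for al, j in zip(alpha, nbrs)) for d in range(dim)]
    return alpha, out

# Toy graph: node 0 attends to itself and to nodes 1 and 2.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [0, 1, 2]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 0.3]
alpha, out = gat_node_update(h, neighbors, W, a, 0)
```

Because the attention weights are learned per edge, a pretrained layer of this kind can assign a data-driven strength to each regulatory relation. That is the sense in which the abstract describes the HealthModel as quantitatively measuring gene regulatory relations.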
Affiliation(s)
- Meiyu Duan
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Yueying Wang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Dong Zhao
- School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Hongmei Liu
- School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Gongyou Zhang
- School of Biology and Engineering, and Engineering Research Center of Medical Biotechnology, Guizhou Medical University, Guiyang, Guizhou 550025, China
- Kewei Li
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Haotian Zhang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Lan Huang
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
- Ruochi Zhang
- School of Artificial Intelligence, Jilin University, Changchun, China, 130012
- Fengfeng Zhou
- College of Computer Science and Technology, Jilin University, Changchun, Jilin, China, 130012
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, China, 130012
7
Improving the Post-Operative Prediction of BCR-Free Survival Time with mRNA Variables and Machine Learning. Cancers (Basel) 2023; 15:cancers15041276. [PMID: 36831619 PMCID: PMC9954694 DOI: 10.3390/cancers15041276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Revised: 02/11/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]
Abstract
Predicting the risk of, and time to, biochemical recurrence (BCR) in prostate cancer patients post-operatively is critical in treatment decision pathways following surgical intervention. This study aimed to investigate the potential of mRNA information to improve upon reference nomograms and clinical-only models, using a dataset of 187 patients that includes over 20,000 features. Several machine learning methodologies were implemented for the analysis of censored patient follow-up information with such high-dimensional genomic data. Our findings demonstrated the potential of including mRNA information for BCR-free survival prediction. A random survival forest pipeline was found to achieve high predictive performance with respect to discrimination, calibration, and net benefit. Two mRNA variables, namely ESM1 and DHAH8, were identified as consistently strong predictors in this dataset.
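Discrimination for censored outcomes such as BCR-free survival is commonly summarized with Harrell's concordance index. The abstract does not name the discrimination metric used, so the choice here is an assumption. The sketch counts comparable pairs, in which patient i had an observed event before patient j's follow-up time, and scores a pair as concordant when the model assigns i the higher risk. Tied risks count as half, and pairs with tied times are skipped, a simplification of some published variants.

```python
def harrell_c_index(times, events, risk):
    """Harrell's C: fraction of comparable pairs ordered correctly by predicted risk.

    times: follow-up times; events: 1 if the event (e.g. BCR) was observed, 0 if
    censored; risk: model risk scores (higher means earlier expected event).
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:  # j was still event-free when i had its event
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

times = [1, 2, 3, 4]
events = [1, 1, 1, 0]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # perfect ranking
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 to a perfectly reversed ranking.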
8
Chen J, Chen S, Li B, Zhou S, Lin H. A pyroptosis-related signature predicts prognosis and indicates immune microenvironment infiltration in glioma. Cancer Med 2023; 12:5071-5087. [PMID: 36161280 PMCID: PMC9972150 DOI: 10.1002/cam4.5247] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/13/2022] [Revised: 08/08/2022] [Accepted: 09/01/2022] [Indexed: 02/05/2023]
Abstract
BACKGROUND Glioma, the most common malignant brain tumor, leads to high recurrence rates and disabilities in patients. Pyroptosis is an inflammasome-induced form of programmed cell death that occurs in response to infection or chemotherapy. However, the role of pyroptosis in glioma has not yet been elucidated. METHODS RNA-seq data and clinical information for 660 and 847 glioma samples were downloaded from TCGA and CGGA, respectively. Data for 104 normal brain tissues were then retrieved from GTEx for differential expression analysis. Twelve pairs of peritumoral tissue and glioma samples were used for validation. The gene alteration status of differentially expressed pyroptosis-related regulators in gliomas was examined using cBioPortal. Consensus clustering was employed to classify gliomas based on the differentially expressed pyroptosis-related regulators. Subsequently, a PS-signature was constructed using LASSO-congressional analysis for clinical application. Immune infiltration of the glioma tumor microenvironment (TME) was explored using ESTIMATE, CIBERSORT, and other immune signatures. RESULTS cBioPortal analysis revealed that alteration of these regulators was correlated with better prognosis of gliomas. Our study further showed that pyroptosis-related regulators can be used to sort patients into two clusters with distinct prognostic outcomes and immune status. Moreover, a PS-signature for predicting the prognosis of glioma patients was developed based on the identified subtypes. The high PS-score group showed more abundant inflammatory cell infiltration and a stronger immune response, but a poorer prognosis. CONCLUSION The findings of this study provide a therapeutic basis for future research on pyroptosis and unravel the relationship between pyroptosis and glioma prognosis. The risk signature can be used as a prognostic biomarker for glioma.
Affiliation(s)
- Jia Chen
- The Fourth People's Hospital of Chengdu, Chengdu, China
- The Clinical Hospital of Chengdu Brain Science Institute, MOE Key Lab for Neuroinformation, University of Electronic Science and Technology of China, Chengdu, China
- Shanwei Chen
- Department of Neurosurgery, Guangdong Provincial People's Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Shantou University Medical College, Shantou, China
- Bingxian Li
- Department of Neurology, Shantou Central Hospital, Shantou, China
- Shaojiong Zhou
- Department of Neurology, Shantou Central Hospital, Shantou, China
- Han Lin
- Department of Neurosurgery, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
9
Hassan AM, Biaggi-Ondina A, Rajesh A, Asaad M, Nelson JA, Coert JH, Mehrara BJ, Butler CE. Predicting Patient-Reported Outcomes Following Surgery Using Machine Learning. Am Surg 2023; 89:31-35. [PMID: 35722685 PMCID: PMC9759616 DOI: 10.1177/00031348221109478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/24/2022]
Abstract
Patient-reported outcomes (PROs) enable providers to identify differences in treatment effectiveness, postoperative recovery, quality of life, and patient satisfaction. By allowing a shift from disease-specific factors to the patient perspective, PROs provide a tailored patient-centric approach to shared decision-making. Artificial intelligence (AI) and machine learning (ML) techniques can facilitate such shared decision-making and improve patient outcomes by accurate prediction of PROs. This article aims to provide a comprehensive review of the use of AI and ML models in predicting PROs following surgery through an overview of common predictive algorithms and modeling techniques, as well as current applications and limitations in the surgical field.
Affiliation(s)
- Abbas M. Hassan
- Department of Plastic and Reconstructive Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Andrea Biaggi-Ondina
- Department of Plastic and Reconstructive Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Aashish Rajesh
- Department of Surgery, University of Texas Health Science Center, San Antonio, TX, USA
- Malke Asaad
- Department of Plastic Surgery, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
- Jonas A. Nelson
- Department of Plastic & Reconstructive Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- J Henk Coert
- Department of Plastic and Reconstructive Surgery, University Medical Center Utrecht, Utrecht, Netherlands
- Babak J. Mehrara
- Department of Plastic & Reconstructive Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Charles E. Butler
- Department of Plastic and Reconstructive Surgery, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
10
Ai N, Yang Z, Yuan H, Ouyang D, Miao R, Ji Y, Liang Y. A distributed sparse logistic regression with L1/2 regularization for microarray biomarker discovery in cancer classification. Soft Comput 2022. [DOI: 10.1007/s00500-022-07551-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2022]
11
Yang S, Zhou X. PGS-server: accuracy, robustness and transferability of polygenic score methods for biobank scale studies. Brief Bioinform 2022; 23:6534383. [PMID: 35193147 DOI: 10.1093/bib/bbac039] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 10/02/2021] [Revised: 12/29/2021] [Accepted: 01/26/2022] [Indexed: 01/02/2023]
Abstract
Polygenic scores (PGS) are important tools for carrying out genetic prediction of common diseases and disease-related complex traits, facilitating the development of precision medicine. Unfortunately, despite the critical importance of PGS and the vast number of PGS methods recently developed, few comprehensive comparison studies have been performed to evaluate the effectiveness of PGS methods. To fill this critical knowledge gap, we performed a comprehensive comparison study of 12 different PGS methods through internal evaluations on 25 quantitative and 25 binary traits within the UK Biobank with sample sizes ranging from 147 408 to 336 573, and through external evaluations via 25 cross-study and 112 cross-ancestry analyses on summary statistics from multiple genome-wide association studies with sample sizes ranging from 1415 to 329 345. We evaluate the prediction accuracy, computational scalability, as well as robustness and transferability of different PGS methods across datasets and/or genetic ancestries, providing important guidelines for practitioners in choosing PGS methods. Besides method comparison, we present a simple aggregation strategy that combines multiple PGS from different methods to take advantage of their distinct benefits to achieve stable and superior prediction performance. To facilitate future applications of PGS, we also develop a PGS webserver (http://www.pgs-server.com/) that allows users to upload summary statistics and choose different PGS methods to fit the data directly. We hope that our results, method and webserver will facilitate the routine application of PGS across different research areas.
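A polygenic score is a weighted sum of allele dosages, PGS = sum over SNPs of (dosage times effect size). The abstract does not spell out how the scores from different methods are aggregated. The sketch below therefore assumes the simplest option, an unweighted mean of per-method scores after standardizing each to zero mean and unit variance. The paper's strategy may instead, for example, fit method weights in a validation set. Dosages, effect sizes, and helper names are hypothetical.

```python
from statistics import mean, pstdev

def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1 or 2) and per-SNP effect sizes."""
    return sum(d * w for d, w in zip(dosages, weights))

def standardize(xs):
    """Rescale scores to zero mean and unit (population) standard deviation."""
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]

def average_pgs(scores_by_method):
    """Unweighted mean of standardized scores; one inner list per PGS method."""
    z = [standardize(s) for s in scores_by_method]
    return [mean(col) for col in zip(*z)]

# Hypothetical effect sizes from two PGS methods for the same three SNPs,
# applied to three individuals' genotype dosages.
dosages = [[0, 1, 2], [2, 1, 0], [1, 1, 1]]
method_a = [0.10, 0.20, -0.05]
method_b = [0.12, 0.15, -0.02]
scores_a = [polygenic_score(g, method_a) for g in dosages]
scores_b = [polygenic_score(g, method_b) for g in dosages]
combined = average_pgs([scores_a, scores_b])
```

Standardizing before averaging keeps a method whose raw scores happen to have a larger scale from dominating the combined score.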
Affiliation(s)
- Sheng Yang
- Department of Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Xiang Zhou
- Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
12
Bertrand F, Maumy-Bertrand M. Fitting and Cross-Validating Cox Models to Censored Big Data With Missing Values Using Extensions of Partial Least Squares Regression Models. Front Big Data 2021; 4:684794. [PMID: 34790895 PMCID: PMC8591675 DOI: 10.3389/fdata.2021.684794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2021] [Accepted: 10/07/2021] [Indexed: 11/22/2022]
Abstract
Fitting Cox models in a big data context -on a massive scale in terms of volume, intensity, and complexity exceeding the capacity of usual analytic tools-is often challenging. If some data are missing, it is even more difficult. We proposed algorithms that were able to fit Cox models in high dimensional settings using extensions of partial least squares regression to the Cox models. Some of them were able to cope with missing data. We were recently able to extend our most recent algorithms to big data, thus allowing to fit Cox model for big data with missing values. When cross-validating standard or extended Cox models, the commonly used criterion is the cross-validated partial loglikelihood using a naive or a van Houwelingen scheme -to make efficient use of the death times of the left out data in relation to the death times of all the data. Quite astonishingly, we will show, using a strong simulation study involving three different data simulation algorithms, that these two cross-validation methods fail with the extensions, either straightforward or more involved ones, of partial least squares regression to the Cox model. This is quite an interesting result for at least two reasons. Firstly, several nice features of PLS based models, including regularization, interpretability of the components, missing data support, data visualization thanks to biplots of individuals and variables -and even parsimony or group parsimony for Sparse partial least squares or sparse group SPLS based models, account for a common use of these extensions by statisticians who usually select their hyperparameters using cross-validation. Secondly, they are almost always featured in benchmarking studies to assess the performance of a new estimation technique used in a high dimensional or big data context and often show poor statistical properties. We carried out a vast simulation study to evaluate more than a dozen of potential cross-validation criteria, either AUC or prediction error based. 
Several of them led to the selection of a reasonable number of components. Using these newly found cross-validation criteria to fit extensions of partial least squares regression to the Cox model, we performed a benchmark reanalysis that showed enhanced performance of these techniques. In addition, we proposed sparse group extensions of our algorithms and defined a new robust measure based on the Schmid score and the R coefficient of determination for least absolute deviation: the integrated R Schmid Score weighted. The R package used in this article is available on CRAN, http://cran.r-project.org/web/packages/plsRcox/index.html. The R package bigPLS will soon be available on CRAN and, until then, is available on GitHub https://github.com/fbertran/bigPLS.
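The two cross-validation criteria compared above can be sketched in a few lines. This is a minimal NumPy illustration under stated assumptions, not the plsRcox or bigPLS code: `fit(X, time, event) -> beta` is a hypothetical fitting hook (for example a PLS-Cox fit with a given number of components), tied event times use Breslow's convention, and the criteria follow their usual definitions. The naive scheme scores the held-out fold on its own risk sets. The van Houwelingen scheme takes the full-data partial log-likelihood minus the training-fold partial log-likelihood, both evaluated at the coefficients fitted without that fold.

```python
import numpy as np


def cox_partial_loglik(X, time, event, beta):
    """Cox partial log-likelihood, Breslow handling of tied event times."""
    eta = X @ beta
    ll = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]  # risk set at the i-th event time
        ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return ll


def cv_partial_loglik(X, time, event, folds, fit):
    """Return (naive, van Houwelingen) cross-validated partial log-likelihoods.

    `fit(X, time, event) -> beta` is a hypothetical fitting hook, not a library API.
    """
    naive = vh = 0.0
    for test_idx in folds:
        train = np.ones(len(time), dtype=bool)
        train[test_idx] = False
        beta = fit(X[train], time[train], event[train])
        # Naive scheme: the held-out fold is scored on its own risk sets only.
        naive += cox_partial_loglik(X[~train], time[~train], event[~train], beta)
        # van Houwelingen scheme: full-data minus training-data contribution, so
        # held-out deaths are compared with the risk sets of all the data.
        vh += (cox_partial_loglik(X, time, event, beta)
               - cox_partial_loglik(X[train], time[train], event[train], beta))
    return naive, vh
```

Because each held-out fold is usually small, its own risk sets are short. This is why the naive score can be unstable, and it is what the van Houwelingen correction is meant to address.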
Affiliation(s)
- Frédéric Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
- Myriam Maumy-Bertrand
- LIST3N, Université de Technologie de Troyes, Troyes, France
- IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, Strasbourg, France
13
Zhang Z, Shen Z, Wang H, Ng SK. A fast adaptive Lasso for the cox regression via safe screening rules. J STAT COMPUT SIM 2021. [DOI: 10.1080/00949655.2021.1914043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
Affiliation(s)
- Zhuan Zhang
- School of Mathematics and Statistics, Central South University, Changsha, People's Republic of China
- Zhenyuan Shen
- School of Mathematics and Statistics, Central South University, Changsha, People's Republic of China
- Hong Wang
- School of Mathematics and Statistics, Central South University, Changsha, People's Republic of China
- Shu Kay Ng
- School of Medicine, Menzies Health Institute Queensland, Griffith University, Nathan, Australia
14
Affiliation(s)
- Rahim Alhamzawi
- Department of Statistics, University of Al-Qadisiyah, Al Diwaniyah, Iraq
15
Rothschild CW, Richardson BA, Guthrie BL, Kithao P, Omurwa T, Mukabi J, Lokken EM, John-Stewart G, Unger JA, Kinuthia J, Drake AL. A risk scoring tool for predicting Kenyan women at high risk of contraceptive discontinuation. Contracept X 2020; 2:100045. [PMID: 33294838 PMCID: PMC7683324 DOI: 10.1016/j.conx.2020.100045] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/24/2020] [Revised: 10/12/2020] [Accepted: 10/13/2020] [Indexed: 11/16/2022]
Abstract
Objective: We developed and validated a pragmatic risk assessment tool for identifying contraceptive discontinuation among Kenyan women who do not desire pregnancy. Study design: Within a prospective cohort of contraceptive users, participants were randomly allocated to derivation (n = 558) and validation (n = 186) cohorts. Risk scores were developed by selecting the Cox proportional hazards model with the minimum Akaike information criterion. Predictive performance was evaluated using time-dependent receiver operating characteristic curves and area under the curve (AUC). Results: The overall contraceptive discontinuation rate was 36.9 per 100 woman-years (95% confidence interval [CI] 30.3–44.9). The predictors of discontinuation selected for the risk score included use of a short-term method or copper intrauterine device (vs. injectable or implant), method continuation or switch (vs. initiation), < 9 years of completed education, not having a child aged < 6 months, and having no spouse or a spouse supportive of family planning (vs. having a spouse who has unsupportive or uncertain attitudes towards family planning). AUC at 24 weeks was 0.76 (95% CI 0.64–0.87) with 70.0% sensitivity and 78.6% specificity at the optimal cut point in the derivation cohort. Discontinuation was 3.8-fold higher among high- vs. low-risk women (95% CI 2.33–6.30). AUC was 0.68 (95% CI 0.47–0.90) in the validation cohort. A simplified score comprising routinely collected variables demonstrated similar performance (derivation-AUC: 0.73 [95% CI 0.60–0.85]; validation-AUC: 0.73 [95% CI 0.51–0.94]). Positive predictive value in the derivation cohort was 31.4% for the full and 28.1% for the simplified score. Conclusions: The risk scores demonstrated moderate predictive ability but identified large proportions of women as high risk. Future research is needed to improve sensitivity and specificity of a clinical tool to identify women at high risk for experiencing method-related challenges.
Implications: Contraceptive discontinuation is a major driver of unmet contraceptive need globally. Few tools exist for identifying women who may benefit most from additional support in order to meet their contraceptive needs and preferences. This study developed and assessed the validity of a provider-focused risk prediction tool for contraceptive discontinuation among Kenyan women using modern contraception. High rates of early discontinuation observed in this study emphasize the necessity of investing in efforts to develop new contraceptive technologies and stronger delivery systems to better align with women's needs and preferences for voluntary family planning.
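The abstract reports sensitivity and specificity "at the optimal cut point" but does not say how that point was chosen. Maximizing Youden's J (sensitivity + specificity - 1) is one common convention, and it is assumed here. This minimal sketch also treats discontinuation by the landmark time as a plain binary label and ignores censoring, so it is a simplification of a proper time-dependent ROC analysis, not the authors' procedure.

```python
import numpy as np


def youden_cut_point(score, label):
    """Return (cut, J, sensitivity, specificity) maximising Youden's J.

    A subject is called high risk when score >= cut. `label` is 1 for subjects
    who discontinued by the landmark time and 0 otherwise (censoring ignored).
    """
    score = np.asarray(score, dtype=float)
    label = np.asarray(label, dtype=bool)
    best = None
    for cut in np.unique(score):  # candidate thresholds in ascending order
        high = score >= cut
        sens = np.mean(high[label])     # true-positive rate among events
        spec = np.mean(~high[~label])   # true-negative rate among non-events
        j = sens + spec - 1.0
        if best is None or j > best[1]:  # keep the first (lowest) cut on ties
            best = (cut, j, sens, spec)
    return best
```

Ties in J are broken in favour of the lowest threshold. This choice labels more women as high risk, which is the trade-off the conclusions above flag.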
Affiliation(s)
- Barbra A Richardson
- Departments of Biostatistics and Global Health, University of Washington, Seattle, USA; Division of Vaccine and Infectious Diseases, Fred Hutchinson Cancer Research Center, Seattle, USA
- Brandon L Guthrie
- Departments of Epidemiology and Global Health, University of Washington, Seattle, USA
- Erica M Lokken
- Department of Global Health, University of Washington, Seattle, USA
- Grace John-Stewart
- Departments of Global Health, Epidemiology, Medicine, and Pediatrics, University of Washington, Seattle, USA
- Jennifer A Unger
- Department of Obstetrics and Gynecology, University of Washington, Seattle, USA
- John Kinuthia
- Department of Research and Programs, Kenyatta National Hospital, Nairobi, Kenya
- Alison L Drake
- Department of Global Health, University of Washington, Seattle, USA
16
Chen W, Bi K, Zhang X, Jiang J, Diao H. In-depth characterization of the biomarkers based on tumor-infiltrated immune cells reveals implications for diagnosis and prognosis in hepatocellular carcinoma. J Transl Autoimmun 2020; 3:100067. [PMID: 33073226 PMCID: PMC7548299 DOI: 10.1016/j.jtauto.2020.100067] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/12/2020] [Revised: 09/10/2020] [Accepted: 09/27/2020] [Indexed: 12/13/2022]
Abstract
Hepatocellular carcinoma (HCC) is an immune-related tumor, and the type and number of tumor-infiltrating immune cells can serve as biomarkers for clinical application. In this study, we constructed immune models for diagnostic and prognostic prediction of HCC based on systematic bioinformatic analyses of immune cell composition in large-sample transcriptome data. CIBERSORT analysis showed that immune cell composition differed between 513 HCC and 473 adjacent normal tissues. M0 macrophages and regulatory T cells were mainly enriched in tumor tissues, whereas CD8+ T cells and activated CD4+ memory T cells were most abundant in normal tissues. Using random forest and LASSO analyses, eleven immune cell types were identified to construct the immune diagnostic model (IDG), which distinguished cancer from normal tissues with high efficiency in both the testing and validation groups. In addition, the immune prognostic model (IPG), consisting of five types of immune cells, was constructed using the LASSO-Cox algorithm. HCC patients in the high-risk group had significantly shorter survival than those in the low-risk group in the testing, validation, and entire cohorts. Moreover, nomograms and decision curve analyses revealed that the IPG was positively associated with the Barcelona Clinic Liver Cancer (BCLC) stage of HCC and predicted more accurately than BCLC stage alone. Related analyses found that IDG correlated positively with epithelial-mesenchymal transition (EMT) and cytotoxic factor-related genes and negatively with immune checkpoint regulator-related genes. GSEA of the biological functions of IPG-related genes showed that genes in the high-risk group were enriched in tumorigenesis-related pathways, such as DNA replication, cell cycle, and PPARA.
Therefore, this study identified IDG and IPG as efficient biomarkers for the diagnosis and prognosis of HCC. We comprehensively analyzed the infiltrating immune cells in HCC and adjacent tissues in big-data samples. The IDG and IPG models are potential biomarkers for the diagnosis and prognosis of HCC, respectively. The IPG model was associated with HCC clinical characteristics.
Affiliation(s)
- Wenbiao Chen
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Kefan Bi
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Xujun Zhang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Jingjing Jiang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
- Hongyan Diao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, 310003, China
17
Li R, Chang C, Justesen JM, Tanigawa Y, Qiang J, Hastie T, Rivas MA, Tibshirani R. Fast Lasso method for large-scale and ultrahigh-dimensional Cox model with applications to UK Biobank. Biostatistics 2020; 23:522-540. [PMID: 32989444 DOI: 10.1093/biostatistics/kxaa038] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/11/2020] [Revised: 08/15/2020] [Accepted: 08/18/2020] [Indexed: 11/13/2022]
Abstract
We develop a scalable and highly efficient algorithm to fit a Cox proportional hazards model by maximizing the $L^1$-regularized (Lasso) partial likelihood function, based on the Batch Screening Iterative Lasso (BASIL) method developed in Qian and others (2019). Our algorithm is particularly suitable for large-scale, high-dimensional data that do not fit in memory. Its output is the full Lasso path (the parameter estimates at all predefined regularization parameters) together with their validation accuracy, measured using the concordance index (C-index) or the validation deviance. To demonstrate the effectiveness of our algorithm, we analyze a large genotype-survival time dataset across 306 disease outcomes from the UK Biobank (Sudlow and others, 2015). We provide a publicly available implementation of the proposed approach for genetics data on top of the PLINK2 package and name it snpnet-Cox.
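To make the objective concrete, here is a small proximal-gradient (ISTA) sketch of the $L^1$-penalized Cox partial likelihood, together with Harrell's C-index used for validation. It is not BASIL or snpnet-Cox. It keeps all data in memory, fits a single hand-picked penalty `lam` rather than a path, uses a fixed step size, and the C-index skips pairs with tied survival times. The function names are illustrative only.

```python
import numpy as np


def neg_loglik_grad(X, time, event, beta):
    """Gradient of the negative mean Cox partial log-likelihood (Breslow ties)."""
    w = np.exp(X @ beta)
    grad = np.zeros_like(beta)
    for i in np.flatnonzero(event):
        r = time >= time[i]  # risk set at the i-th event time
        grad -= X[i] - (w[r] @ X[r]) / w[r].sum()
    return grad / len(time)


def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cox(X, time, event, lam, step=0.1, n_iter=500):
    """Minimise -l(beta)/n + lam * ||beta||_1 by proximal gradient descent."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = neg_loglik_grad(X, time, event, beta)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


def harrell_c_index(time, event, risk):
    """Share of usable pairs in which the earlier event has the higher risk.

    A pair is usable when subject i has an event and subject j is still
    event-free after time[i]; equal risks count one half.
    """
    concordant, usable = 0.0, 0
    for i in np.flatnonzero(event):
        for j in np.flatnonzero(time > time[i]):
            usable += 1
            if risk[i] > risk[j]:
                concordant += 1.0
            elif risk[i] == risk[j]:
                concordant += 0.5
    return concordant / usable
```

A large enough `lam` sets every coefficient to exactly zero. Evaluating a decreasing grid of `lam` values, each warm-started from the previous solution, is how such solvers typically trace out the full path.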
Affiliation(s)
- Ruilin Li
- Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
- Johanne M Justesen
- Department of Statistics and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Yosuke Tanigawa
- Department of Statistics and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Junyang Qiang
- Department of Statistics and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Trevor Hastie
- Department of Statistics and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Manuel A Rivas
- Department of Statistics and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
- Robert Tibshirani
- Department of Statistics and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA
18
Xu Z, Wu Z, Zhang J, Zhou R, Ye L, Yang P, Yu B. Development and validation of an oxidative phosphorylation-related gene signature in lung adenocarcinoma. Epigenomics 2020; 12:1333-1348. [PMID: 32787683 DOI: 10.2217/epi-2020-0217] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/05/2023]
Abstract
Aim: To develop an oxidative phosphorylation (OXPHOS)-related gene signature of lung adenocarcinoma (LUAD). Materials & methods: We split The Cancer Genome Atlas LUAD cohort into a training set and a test set; we used the least absolute shrinkage and selection operator (LASSO) Cox method to construct the OXPHOS-related prognostic signature in the training set and validated it in the test set and the GSE30219 dataset. In addition, a diagnostic model was constructed using the logistic Cox method. Results: The signature consisted of seven genes (LDHA, CFTR, HSPD1, SNHG3, MAP1LC3C, COX6B2, and TWIST1). LUAD patients were divided into high- and low-risk groups, and the signature demonstrated good diagnostic and prognostic capabilities. Conclusion: We developed the first-ever OXPHOS-related signature with both prognostic predictive power and diagnostic efficacy.
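Signatures of this kind are usually applied as a linear risk score: the sum of the LASSO-Cox coefficients multiplied by the gene expression values. The abstract names the seven genes but gives neither the coefficients nor the rule used to split patients into high- and low-risk groups. The sketch below therefore takes the coefficients as input and assumes a median split. Both the function and the default cut-off are illustrative assumptions, not the authors' method.

```python
import numpy as np


def risk_groups(expr, coef, cutoff=None):
    """Linear risk score and high/low-risk labels.

    expr: (n_patients, n_genes) expression matrix.
    coef: (n_genes,) Cox coefficients (not reported in the abstract).
    If no cutoff is given, the median score of this cohort is used (assumption).
    """
    score = np.asarray(expr, dtype=float) @ np.asarray(coef, dtype=float)
    if cutoff is None:
        cutoff = np.median(score)
    return score, np.where(score > cutoff, "high", "low")
```

When validating on a test set or an external cohort such as GSE30219, passing the training-set cut-off explicitly keeps the group definitions fixed, rather than recomputing a new median in each cohort.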
Affiliation(s)
- Zihao Xu
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, PR China; Jiangxi Medical College, Nanchang University, Nanchang, Jiangxi, 330031, PR China
- Zilong Wu
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, PR China
- Jingtao Zhang
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, PR China
- Ruihao Zhou
- Department of Pain Management, West China Hospital, Sichuan University, Chengdu, Sichuan Province, 610041, PR China
- Ling Ye
- Department of Pain Management, West China Hospital, Sichuan University, Chengdu, Sichuan Province, 610041, PR China
- Pingliang Yang
- Department of Anesthesiology, The First Affiliated Hospital of Chengdu Medical College, Xindu, Sichuan, 610500, PR China
- Bentong Yu
- Department of Thoracic Surgery, The First Affiliated Hospital of Nanchang University, Nanchang, Jiangxi, 330006, PR China
19
Wen P, Gao Y, Chen B, Qi X, Hu G, Xu A, Xia J, Wu L, Lu H, Zhao G. Pan-Cancer Analysis of Radiotherapy Benefits and Immune Infiltration in Multiple Human Cancers. Cancers (Basel) 2020; 12:cancers12040957. [PMID: 32294976 PMCID: PMC7226004 DOI: 10.3390/cancers12040957] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/04/2020] [Revised: 03/29/2020] [Accepted: 04/08/2020] [Indexed: 12/12/2022]
Abstract
Response to radiotherapy (RT) varies widely among cancer patients. Therefore, it is very important to predict who will benefit from RT before clinical treatment. Consideration of the immune tumor microenvironment (TME) could provide novel insight into tumor treatment options. In this study, we investigated the link between immune infiltration status and clinical RT outcome in order to identify leukocyte subsets that could influence the clinical benefit of RT across cancers. By integrally analyzing TCGA data across seven cancers, we identified complex associations between immune infiltration and patients' RT outcomes. In addition, immune cell populations differed widely across cancers, and the most abundant cells were resting memory CD4 T cells. The proportions of activated CD4 memory T cells and activated mast cells, albeit low, were closely related to the overall survival of patients receiving RT in multiple cancers. Furthermore, a prognostic model for RT outcomes was established with good performance based on the immune infiltration status. In summary, immune infiltration was found to be of significant clinical relevance to RT outcomes. These findings may help shed light on the impact of tumor-associated immune cell infiltration on cancer RT outcomes and help identify biomarkers and therapeutic targets.
Affiliation(s)
- Pengbo Wen
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- University of Science and Technology of China, Hefei 230026, China
- Yang Gao
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- University of Science and Technology of China, Hefei 230026, China
- Bin Chen
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- University of Science and Technology of China, Hefei 230026, China
- Xiaojing Qi
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- University of Science and Technology of China, Hefei 230026, China
- Guanshuo Hu
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- University of Science and Technology of China, Hefei 230026, China
- An Xu
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- Junfeng Xia
- Institute of Physical Science and Information Technology, School of Computer Science and Technology, Anhui University, Hefei 230039, China
- Lijun Wu
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- Huayi Lu
- Department of Ophthalmology & Visual Sciences, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei 230026, China
- Correspondence: (H.L.); (G.Z.)
- Guoping Zhao
- Key Laboratory of High Magnetic Field and Ion Beam Physical Biology, Hefei Institutes of Physical Science, Chinese Academy of Sciences; Anhui Province Key Laboratory of Environmental Toxicology and Pollution Control Technology, Hefei 230031, China
- Correspondence: (H.L.); (G.Z.)
20
Wit EC, Augugliaro L, Pazira H, González J, Abegaz F. Sparse relative risk regression models. Biostatistics 2020; 21:e131-e147. [PMID: 30380025 PMCID: PMC7868056 DOI: 10.1093/biostatistics/kxy060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 05/13/2017] [Revised: 09/20/2018] [Accepted: 09/24/2018] [Indexed: 11/15/2022]
Abstract
Clinical studies in which patients are routinely screened for many genomic features are becoming more common. In principle, this holds the promise of finding genomic signatures for a particular disease. In particular, cancer survival is thought to be closely linked to the genomic constitution of the tumor. Discovering such signatures will be useful in the diagnosis of the patient, may be used for treatment decisions and, perhaps, even the development of new treatments. However, genomic data are typically noisy and high-dimensional, frequently outstripping the number of patients included in the study. Regularized survival models have been proposed to deal with such scenarios. These methods typically induce sparsity by means of a coincidental match of the geometry of the convex likelihood and a (near) non-convex regularizer. The disadvantages of such methods are that they are typically not invariant to scale changes of the covariates, they struggle with highly correlated covariates, and they pose the practical problem of determining the amount of regularization. In this article, we propose an extension of the differential geometric least angle regression method for sparse inference in relative risk regression models. A software implementation of our method is available on GitHub (https://github.com/LuigiAugugliaro/dgcox).
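For reference, relative risk regression models generalize the Cox model by applying a relative risk function r to the linear predictor. The notation below is a standard textbook form and is not taken from the article. The exponential choice of r recovers the Cox model, and the baseline hazard cancels from the partial likelihood for any choice of r.

```latex
h(t \mid x) = h_0(t)\, r(x^\top \beta), \qquad
r(\eta) = e^{\eta} \ \text{(Cox model)}, \qquad
r(\eta) = 1 + \eta \ \text{(excess relative risk model)}

L(\beta) = \prod_{i:\, \delta_i = 1}
  \frac{r(x_i^\top \beta)}{\sum_{j \in R(t_i)} r(x_j^\top \beta)}
```

Here $\delta_i$ is the event indicator and $R(t_i)$ is the set of subjects still at risk at time $t_i$.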
Affiliation(s)
- Ernst C Wit
- Institute of Computational Science, USI, Via Buffi 13, Lugano, Switzerland
- Luigi Augugliaro
- Department of Economics, Business and Statistics, University of Palermo, Building 13, Viale delle Scienze, Palermo, Italy
- Hassan Pazira
- Bernoulli Institute, University of Groningen, Nijenborg 9, AG Groningen, The Netherlands
- Javier González
- Amazon Research Cambridge, Poseidon House, Castle Park, Cambridge, UK
- Fentaw Abegaz
- Bernoulli Institute, University of Groningen, Nijenborg 9, AG Groningen, The Netherlands
- Department of Pediatrics and Systems Biology Centre for Energy Metabolism and Ageing, University of Groningen, University Medical Center Groningen, AD Groningen, The Netherlands
21
Zou J, Wang E. Cancer Biomarker Discovery for Precision Medicine: New Progress. Curr Med Chem 2020; 26:7655-7671. [PMID: 30027846 DOI: 10.2174/0929867325666180718164712] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 04/11/2018] [Revised: 06/26/2018] [Accepted: 07/06/2018] [Indexed: 12/30/2022]
Abstract
BACKGROUND: Precision medicine puts forward customized healthcare for cancer patients. An important way to accomplish this task is to stratify patients into those who may respond to a treatment and those who may not. For this purpose, diagnostic and prognostic biomarkers have been pursued. OBJECTIVE: This review focuses on novel approaches and concepts in biomarker discovery as technologies develop and data accumulate for precision medicine. RESULTS: Traditional mechanism-driven functional biomarkers have the advantage of offering actionable insights, while data-driven computational biomarkers can fulfill more needs, especially given the tremendous amount of data on molecules at different layers (e.g., genetic mutations, mRNA, and proteins) generated by a wide range of technologies. In addition, technology-driven liquid biopsy biomarkers are very promising for improving patients' survival. Developments in these aspects of biomarker discovery are promoting the understanding of cancer, helping the stratification of patients, and improving patients' survival. CONCLUSION: Current developments in mechanism-, data- and technology-driven biomarker discovery are achieving the aim of precision medicine and promoting the clinical application of biomarkers. Meanwhile, the complexity of cancer requires more effective biomarkers, which could be accomplished by a comprehensive integration of multiple types of biomarkers together with a deep understanding of cancer.
Affiliation(s)
- Jinfeng Zou
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, ON, M5G 23C1, Canada
- Edwin Wang
- College of Life Science, Tianjin Normal University, Tianjin, China; Cumming School of Medicine, University of Calgary, Calgary, Alberta AB T2N 1N4, Canada
22
Yang ZY, Liu XY, Shu J, Zhang H, Ren YQ, Xu ZB, Liang Y. Multi-view based integrative analysis of gene expression data for identifying biomarkers. Sci Rep 2019; 9:13504. [PMID: 31534156 PMCID: PMC6751173 DOI: 10.1038/s41598-019-49967-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/08/2019] [Accepted: 08/30/2019] [Indexed: 01/05/2023]
Abstract
The widespread application of microarray technology has produced a vast quantity of publicly available gene expression datasets. However, analyzing gene expression data with biostatistics and machine learning approaches is challenging because of (1) high noise; (2) small sample size with high dimensionality; (3) batch effects; and (4) low reproducibility of significant biomarkers. These issues reflect the complexity of gene expression data and significantly hinder the clinical application of microarray technology. Integrative analysis offers an opportunity to address these issues and provides a more comprehensive understanding of biological systems, but current methods have several limitations. This work leverages state-of-the-art machine learning developments for the integration of multiple gene expression datasets, classification, and identification of significant biomarkers. We design a novel integrative framework, MVIAm (Multi-View based Integrative Analysis of microarray data for identifying biomarkers). It applies multiple cross-platform normalization methods to aggregate multiple datasets into a multi-view dataset and uses a robust learning mechanism, Multi-View Self-Paced Learning (MVSPL), for gene selection in cancer classification problems. We demonstrate the capabilities of MVIAm using simulated data and studies of breast cancer and lung cancer; it can be applied flexibly and is an effective tool for facing the four challenges of gene expression data analysis. Our proposed model makes microarray integrative analysis more systematic and expands its range of applications.
Affiliation(s)
- Zi-Yi Yang
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Xiao-Ying Liu
- Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, 519090, China
- Jun Shu
- School of Mathematics and Statistics & Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, 710049, China
- Hui Zhang
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Yan-Qiong Ren
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
- Zong-Ben Xu
- School of Mathematics and Statistics & Ministry of Education Key Lab of Intelligent Networks and Network Security, Xi'an Jiaotong University, Xi'an, 710049, China
- Yong Liang
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Taipa, 999078, Macau, China
23
Shokoohi F, Khalili A, Asgharian M, Lin S. Capturing heterogeneity of covariate effects in hidden subpopulations in the presence of censoring and large number of covariates. Ann Appl Stat 2019. [DOI: 10.1214/18-aoas1198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
24
Shen H, Chai H, Li M, Zhou Z, Liang Y, Yang Z, Huang H, Liu X, Zhang B. Robust sparse accelerated failure time model for survival analysis. Technol Health Care 2018; 26:55-63. [PMID: 29689755 PMCID: PMC6004954 DOI: 10.3233/thc-174141] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Abstract
To identify biomarker genes related to disease from high-dimensional, low-sample-size gene expression data, various regression approaches with different regularization methods have been proposed. Nevertheless, high noise in biological data significantly reduces the performance of these methods. The accelerated failure time (AFT) model was designed for gene selection and survival time estimation in cancer survival analysis. In this article, we propose a novel robust sparse accelerated failure time model (RS-AFT) that combines the least absolute deviation (LAD) loss and Lq regularization. An iterative weighted linear programming algorithm without regularization parameter tuning was proposed to solve this RS-AFT model. The experimental results show that our method performs better in both gene selection and survival time estimation than some widely used regularization methods such as lasso, elastic net and SCAD. Hence, the RS-AFT model may be a competitive regularization method in cancer survival analysis.
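A generic form of the estimator described above is sketched here. It assumes a log-linear AFT model with censoring handled through observation weights $w_i$ (for example Kaplan-Meier, or Stute, weights, with $w_i = 0$ for censored observations). The abstract states neither the weighting scheme nor the value of q. It also says the authors' algorithm needs no regularization-parameter tuning, so $\lambda$ appears here only for notation.

```latex
\log T_i = x_i^\top \beta + \varepsilon_i, \qquad
\hat{\beta} = \arg\min_{\beta}\;
  \sum_{i=1}^{n} w_i \,\bigl|\log t_i - x_i^\top \beta\bigr|
  \;+\; \lambda \sum_{j=1}^{p} |\beta_j|^{q}, \qquad 0 < q \le 1
```

The absolute-value loss makes the fit less sensitive to outlying survival times than squared error. With q below 1, the penalty is non-convex and sparser than the lasso.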
Affiliation(s)
- Yong Liang
- Corresponding author: Yong Liang, Faculty of Information Technology and State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macau 999078, China. Tel.: +853 63869506; Fax: +853 88972034.
25
Kim HK, Park KH, Kim Y, Park SE, Lee HS, Lim SW, Cho JH, Kim JY, Lee JE, Ahn JS, Im YH, Yu JH, Park YH. Discordance of the PAM50 Intrinsic Subtypes Compared with Immunohistochemistry-Based Surrogate in Breast Cancer Patients: Potential Implication of Genomic Alterations of Discordance. Cancer Res Treat 2018; 51:737-747. [PMID: 30189722 PMCID: PMC6473265 DOI: 10.4143/crt.2018.342] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 06/07/2018] [Accepted: 09/03/2018] [Indexed: 12/21/2022]
Abstract
Purpose We aimed to analyze the discordance between immunohistochemistry (IHC)-based surrogate subtyping and PAM50 intrinsic subtypes and to assess overall survival (OS) according to discordance. Materials and Methods A total of 607 patients were analyzed. Hormone receptor (HR) expression was evaluated by IHC, and human epidermal growth factor receptor 2 (HER2) expression was analyzed by IHC and/or fluorescence in situ hybridization. PAM50 intrinsic subtypes were determined from 50 cancer genes using the NanoString nCounter Analysis System. Tumors were considered concordant when luminal A was matched with HR+/HER2–, luminal B with HR+/HER2+, HER2-enriched with HR–/HER2+, and normal- or basal-like with triple-negative breast cancer (TNBC). The Ion AmpliSeq Cancer Panel v2 was used to identify genomic alterations related to discordance. The Kaplan-Meier method was used to estimate OS. Results In total, 233 patients (38.4%) were discordant between IHC-based subtype and PAM50 intrinsic subtype. Using targeted sequencing, we detected somatic mutations related to discordance. These included the VHL gene in the HR+/HER2– group (31% in the concordant group vs. 0% in the discordant group, p=0.03) and the IDH and RET genes in the TNBC group (7% vs. 12%, p=0.02 and 0% vs. 25%, p=0.02, respectively). Among luminal A/B patients, those with a discordant result had significantly worse OS (median OS, 73.6 months vs. not reached; p < 0.001). Among patients with HR positivity, the basal-like group as determined by PAM50 showed significantly inferior OS compared with other intrinsic subtypes (5-year OS rate, 92.2% vs. 75.6%; p=0.01). Conclusion A substantial portion of patients showed discrepancy between IHC subtype and PAM50 intrinsic subtype in our study. The survival analysis demonstrated that the current IHC-based classification could misguide treatment and result in poor outcomes. Current IHC guidelines may need to be updated accordingly.
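The concordance rule in Materials and Methods amounts to a lookup table. The sketch below encodes it in Python and assumes each patient is a pair of strings (IHC surrogate, PAM50 call). The label spellings, the `CONCORDANT` table and the helper names are illustrative, not taken from the study. The last line checks the reported proportion of 233 discordant patients out of 607.

```python
# Hypothetical encoding of the concordance rule: IHC surrogate -> PAM50 subtypes
# that count as a match. Label spellings are assumptions for this sketch.
CONCORDANT = {
    "HR+/HER2-": {"LumA"},
    "HR+/HER2+": {"LumB"},
    "HR-/HER2+": {"Her2"},
    "TNBC": {"Normal", "Basal"},
}


def is_concordant(ihc: str, pam50: str) -> bool:
    """Return True if the PAM50 call matches the expected IHC-based subtype."""
    return pam50 in CONCORDANT.get(ihc, set())


def discordance_rate(records):
    """Fraction of (ihc, pam50) pairs that are discordant."""
    discordant = sum(not is_concordant(ihc, pam50) for ihc, pam50 in records)
    return discordant / len(records)


# Reported figures: 233 discordant patients out of 607.
print(round(100 * 233 / 607, 1))  # 38.4
```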
Affiliation(s)
- Hee Kyung Kim
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Department of Internal Medicine, Chungbuk National University Hospital, Chungbuk National University College of Medicine, Cheongju, Korea
- Kyung Hee Park
- Samsung Genome Institute, Samsung Medical Center, Seoul, Korea
- Youjin Kim
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Song Ee Park
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Han Sang Lee
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Sung Won Lim
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jang Ho Cho
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Ji-Yeon Kim
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jeong Eon Lee
- Division of Breast Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jin Seok Ahn
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Young-Hyuck Im
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jong Han Yu
- Division of Breast Surgery, Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Yeon Hee Park
- Division of Hematology-Oncology, Department of Internal Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
26
Choi J, Park S, Yoon Y, Ahn J. Improved prediction of breast cancer outcome by identifying heterogeneous biomarkers. Bioinformatics 2018; 33:3619-3626. [PMID: 28961949 DOI: 10.1093/bioinformatics/btx487] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 04/21/2017] [Accepted: 07/27/2017] [Indexed: 02/07/2023]
Abstract
Motivation Identifying genes that can predict prognosis in patients with cancer is important because it can lead to improved therapy and can also advance our understanding of tumor progression at the molecular level. One common but fundamental problem that makes identifying prognostic genes and predicting cancer outcomes difficult is the heterogeneity of patient samples. Results To reduce the effect of sample heterogeneity, we clustered data samples using the K-means algorithm. We then applied a modified PageRank to functional interaction (FI) networks weighted by the gene expression values of the samples in each cluster. Hub genes among the resulting prioritized genes were selected as biomarkers to predict the prognosis of samples. When used with a Random Forest classifier, this process outperformed traditional feature selection methods as well as several network-based prognostic gene selection methods. We found many cluster-specific prognostic genes for each dataset. A functional study showed that distinct biological processes were enriched in each cluster, which seems to reflect different aspects of tumor progression or oncogenesis among distinct patient groups. Taken together, these results support the hypothesis that our approach can effectively identify heterogeneous prognostic genes that complement each other and improve prediction accuracy. Availability and implementation https://github.com/mathcom/CPR. Contact jgahn@inu.ac.kr. Supplementary information Supplementary data are available at Bioinformatics online.
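The network-ranking step can be sketched with a standard personalized PageRank computed by power iteration. The abstract does not specify the paper's "modified PageRank", so the following setup is an assumption. Genes are nodes of a weighted, undirected adjacency matrix `A`. The restart vector is proportional to each gene's mean expression within one K-means cluster; the clustering itself could be done with, for example, `sklearn.cluster.KMeans`. All names are hypothetical.

```python
import numpy as np


def personalized_pagerank(A, restart, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration for PageRank on a weighted, undirected gene network."""
    A = np.asarray(A, dtype=float)
    strength = A.sum(axis=0)
    # Column-normalize so each gene spreads its score over its neighbours.
    P = np.divide(A, strength, out=np.zeros_like(A), where=strength > 0)
    r = np.asarray(restart, dtype=float)
    r = r / r.sum()
    x = r.copy()
    for _ in range(max_iter):
        x_new = damping * P @ x + (1 - damping) * r
        # Isolated genes leak probability mass; give it back via the restart vector.
        x_new += (1 - x_new.sum()) * r
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x


# Star-shaped toy network: gene 0 is a hub connected to genes 1-4.
A = np.zeros((5, 5))
A[0, 1:] = 1.0
A[1:, 0] = 1.0
expr = np.ones(5)  # hypothetical mean expression of each gene in one cluster
scores = personalized_pagerank(A, expr)
print(int(np.argmax(scores)))  # 0
```

In this toy network the hub receives score from all four neighbours, so it ranks first. In a pipeline like the one described, the top-ranked hubs per cluster would become candidate biomarkers.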
Affiliation(s)
- Jonghwan Choi
- Department of Computer Science and Engineering, Incheon National University, Incheon, The Republic of Korea
- Sanghyun Park
- Department of Computer Science, Yonsei University, Seoul, The Republic of Korea
- Youngmi Yoon
- Department of Computer Engineering, Gachon University, Seongnam-si, Gyeonggi-do, The Republic of Korea
- Jaegyoon Ahn
- Department of Computer Science and Engineering, Incheon National University, Incheon, The Republic of Korea
27
Huh JW, Kim SC, Sohn I, Jung SH, Kim HC. Serum protein profiling using an aptamer array predicts clinical outcomes of stage IIA colon cancer: A leave-one-out cross-validation. Oncotarget 2017; 7:16338-48. [PMID: 26908450 PMCID: PMC4941318 DOI: 10.18632/oncotarget.7488] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/02/2015] [Accepted: 02/11/2016] [Indexed: 11/29/2022]
Abstract
Background In this study, we established and validated a model for predicting the prognosis of patients with stage IIA colon cancer based on the expression profiles of aptamers in serum. Methods Blood samples were collected from 227 consecutive patients with pathologic T3N0M0 (stage IIA) colon cancer. We incubated 1,149 clinically significant serum molecule-binding aptamer pools with patient serum to obtain aptamers bound to serum molecules, which were then amplified and labeled. Oligonucleotide arrays were constructed from the base sequences of the 1,149 aptamers, and the labeled products described above were reacted with these arrays to produce profiles of the aptamers bound to serum molecules. Using the clinical information for the serum samples, these profiles were used to classify patients into low- and high-risk groups. A Cox proportional hazards model and leave-one-out cross-validation (LOOCV) were used to evaluate predictive performance. Results During a median follow-up period of 5 years, 29 of the 227 patients (11.9%) experienced recurrence. There were 212 patients (93.4%) in the low-risk group and 15 patients (6.6%) in the high-risk group in our aptamer prognosis model. Postoperative recurrence significantly correlated with age and aptamer risk stratification (p = 0.046 and p = 0.001, respectively). In multivariate analysis, aptamer risk stratification (p < 0.001) was an independent predictor of recurrence. Disease-free survival curves calculated according to age and to the aptamer risk level predicted through the LOOCV procedure differed significantly (permutation p < 0.001). Conclusion Aptamer risk stratification can be a valuable prognostic factor in stage II colon cancer patients.
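In LOOCV, each patient's risk group comes from a model refitted without that patient. The generic sketch below illustrates only that mechanic. The "model" is a deliberately simple placeholder: a cutoff at the 90th percentile of a hypothetical one-dimensional risk score in the training set. It is not the aptamer-based Cox model of the study, and the function names are illustrative.

```python
import numpy as np


def fit_cutoff(train_scores, quantile=0.9):
    """Placeholder 'model': the high-risk cutoff is a quantile of training scores."""
    return np.quantile(train_scores, quantile)


def loocv_risk_groups(scores, quantile=0.9):
    """Label each patient high-risk (1) or low-risk (0) with a cutoff fitted
    on all *other* patients, as in leave-one-out cross-validation."""
    scores = np.asarray(scores, dtype=float)
    groups = np.zeros(len(scores), dtype=int)
    for i in range(len(scores)):
        train = np.delete(scores, i)          # hold patient i out
        cutoff = fit_cutoff(train, quantile)  # refit without patient i
        groups[i] = int(scores[i] > cutoff)
    return groups


print(loocv_risk_groups(np.arange(1, 11)))  # [0 0 0 0 0 0 0 0 1 1]
```

Because patient i never influences its own cutoff, the resulting groups can be fed into a survival comparison without the optimism of resubstitution. A permutation test of that comparison would repeat the whole loop on shuffled outcomes.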
Affiliation(s)
- Jung Wook Huh
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Insuk Sohn
- Biostatistics and Clinical Epidemiology Center, Samsung Medical Center, Seoul, Korea
- Sin-Ho Jung
- Biostatistics and Clinical Epidemiology Center, Samsung Medical Center, Seoul, Korea; Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- Hee Cheol Kim
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
28
Attallah O, Karthikesalingam A, Holt PJE, Thompson MM, Sayers R, Bown MJ, Choke EC, Ma X. Feature selection through validation and un-censoring of endovascular repair survival data for predicting the risk of re-intervention. BMC Med Inform Decis Mak 2017; 17:115. [PMID: 28774329 PMCID: PMC5543447 DOI: 10.1186/s12911-017-0508-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 07/05/2016] [Accepted: 07/24/2017] [Indexed: 12/25/2022]
Abstract
Background The feature selection (FS) process is essential in medicine because it reduces the effort and time physicians spend measuring unnecessary features. Choosing useful variables is difficult in the presence of censoring, which is a characteristic unique to survival analysis. Most survival FS methods depend on Cox's proportional hazards model; machine learning techniques (MLT) are preferable but not commonly used because of censoring. Techniques that have been proposed to adapt MLT to perform FS with survival data cannot be used when the level of censoring is high. The authors' previous publications proposed a technique to deal with a high level of censoring, and they used existing FS techniques to reduce dataset dimension. In this paper, a new FS technique is proposed. It is combined with feature transformation and the previously proposed uncensoring approaches to select a reduced set of features and produce a stable predictive model. Methods An FS technique based on an artificial neural network (ANN) is proposed to deal with highly censored Endovascular Aortic Repair (EVAR) survival data. EVAR datasets were collected from 2004 to 2010 at two vascular centers in order to produce a final stable model; almost 91% of the patients in these datasets are censored. The proposed approach uses a wrapper FS method with an ANN to select a reduced subset of features. This subset predicts the risk of EVAR re-intervention after 5 years for patients from two different centers in the United Kingdom, so the model could potentially be applied to cross-center predictions. The proposed model is compared with two popular FS techniques used with Cox's model: the Akaike and Bayesian information criteria (AIC, BIC). Results The final model outperforms the other methods in distinguishing high- and low-risk groups. Both its concordance index and its estimated AUC are better than those of the Cox models based on the AIC, BIC, Lasso, and SCAD approaches.
These models have p-values lower than 0.05, meaning that patients in different risk groups can be separated significantly and those who would need re-intervention can be correctly predicted. Conclusion The proposed approach can save physicians the time and effort of collecting unnecessary variables. The final reduced model was able to predict the long-term risk of aortic complications after EVAR. This predictive model can help clinicians decide on patients' future observation plans. Electronic supplementary material The online version of this article (doi:10.1186/s12911-017-0508-3) contains supplementary material, which is available to authorized users.
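The concordance index used to compare these models can be computed directly. Below is a minimal O(n²) sketch of Harrell's C for right-censored data. It is a generic implementation, not the authors' code, and for simplicity it skips pairs with tied survival times.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had an event and a strictly
    shorter time than j. It is concordant when i also has the higher predicted
    risk; tied risks count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Four patients; the third is censored. Higher risk should mean earlier event.
print(harrell_c_index([5, 8, 12, 20], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk.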
Affiliation(s)
- Omneya Attallah
- School of Engineering and Applied Science, Aston University, B4 7ET, Birmingham, UK; Department of Electronics and Communications, College of Engineering and Technology, Arab Academy for Science and Technology, Alexandria, Egypt
- Rob Sayers
- St George's Vascular Institute, St George's University Hospitals NHS Foundation Trust, Blackshaw Road, London, SW17 0QT, UK
- Matthew J Bown
- Vascular Surgery Group, University of Leicester, Leicester, UK
- Eddie C Choke
- Vascular Surgery Group, Robert Kilpatrick Clinical Sciences Building, Leicester Royal Infirmary, University of Leicester, Leicester, LE2 7LX, UK
- Xianghong Ma
- School of Engineering and Applied Science, Aston University, B4 7ET, Birmingham, UK
29
Lee JY, Park K, Lim SH, Kim HS, Yoo KH, Jung KS, Song HN, Hong M, Do IG, Ahn T, Lee SK, Bae SY, Kim SW, Lee JE, Nam SJ, Kim DH, Jung HH, Kim JY, Ahn JS, Im YH, Park YH. Mutational profiling of brain metastasis from breast cancer: matched pair analysis of targeted sequencing between brain metastasis and primary breast cancer. Oncotarget 2016; 6:43731-42. [PMID: 26527317 PMCID: PMC4791262 DOI: 10.18632/oncotarget.6192] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 08/27/2015] [Accepted: 10/06/2015] [Indexed: 12/13/2022]
Abstract
Although breast cancer is the second most common cause of brain metastasis and its incidence is notably increasing, the genes that mediate breast cancer brain metastasis (BCBM) are not fully understood. To study the molecular nature of brain metastasis, we performed mutational profiling of brain metastases and matched primary breast cancers (BC). We used the Ion AmpliSeq Cancer Panel v2, covering 2,855 mutations from 50 cancer genes, to analyze 18 primary BC and 42 BCBM samples, including 15 matched pairs. The most common BCBM subtypes were triple-negative (42.9%) and basal-like (36.6%). Of the 42 BCBM samples, 32 (76.2%) harbored at least one mutation (median 1, range 0–7 mutations). Frequently detected somatic mutations included TP53 (59.5%), MLH1 (14.3%), PIK3CA (14.3%), and KIT (7.1%). We compared BCBM with patient-matched primary BC specimens and found no significant differences in mutation profiles between the two groups. Notably, alterations detected in BCBM, such as those in TP53, PIK3CA, KIT, MLH1, and RB1, also seemed to be present in the primary breast cancers. The TP53 mutation frequency was higher in BCBM than in primary BC (59.5% vs. 38.9%, respectively). In conclusion, we found actionable gene alterations in BCBM that were also present in primary BC. Further studies with functional testing and a delineation of the role of these genes in specific steps of the metastatic process should lead to a better understanding of the biology of metastasis and its susceptibility to treatment.
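As a worked check of the TP53 comparison, counts can be back-calculated from the reported percentages: 25 of 42 BCBM and 7 of 18 primary BC. These counts are inferred, not stated in the abstract. They can then be compared with Fisher's exact test via `scipy.stats.fisher_exact`. The test treats the two groups as independent, which is only approximate here because 15 samples form matched pairs. A paired test on the matched subset would be more appropriate.

```python
from scipy.stats import fisher_exact

# Back-calculated counts (assumption): 25/42 = 59.5%, 7/18 = 38.9%.
bcbm_mut, bcbm_total = 25, 42
primary_mut, primary_total = 7, 18

# 2x2 table: rows = BCBM / primary BC, columns = TP53 mutant / wild type.
table = [
    [bcbm_mut, bcbm_total - bcbm_mut],
    [primary_mut, primary_total - primary_mut],
]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")

print(f"BCBM: {100 * bcbm_mut / bcbm_total:.1f}%")           # BCBM: 59.5%
print(f"Primary: {100 * primary_mut / primary_total:.1f}%")  # Primary: 38.9%
print(f"Sample odds ratio = {odds_ratio:.2f}, Fisher exact p = {p_value:.3f}")
```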
Affiliation(s)
- Ji Yun Lee
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Kyunghee Park
- Samsung Genomic Institute, Samsung Biological Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Sung Hee Lim
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Hae Su Kim
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Kwai Han Yoo
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Ki Sun Jung
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Haa-Na Song
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Mineui Hong
- Center of Companion Diagnostics, Innovative Cancer Medicine Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- In-Gu Do
- Center of Companion Diagnostics, Innovative Cancer Medicine Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- TaeJin Ahn
- Samsung Genomic Institute, Samsung Biological Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Se Kyung Lee
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Soo Youn Bae
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Seok Won Kim
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jeong Eon Lee
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Seok Jin Nam
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Duk-Hwan Kim
- Department of Molecular Cell Biology, Samsung Biomedical Research Institute, Sungkyunkwan University School of Medicine, Suwon, Korea
- Hae Hyun Jung
- Biomedical Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Ji-Yeon Kim
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jin Seok Ahn
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Young-Hyuck Im
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Yeon Hee Park
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
30
Molecular characterization of patients with pathologic complete response or early failure after neoadjuvant chemotherapy for locally advanced breast cancer using next generation sequencing and nCounter assay. Oncotarget 2016; 6:24499-510. [PMID: 26009992 PMCID: PMC4695201 DOI: 10.18632/oncotarget.4119] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/10/2015] [Accepted: 05/02/2015] [Indexed: 12/31/2022]
Abstract
Neoadjuvant chemotherapy (NAC) has the added advantage of increasing breast conservation rates, with survival outcomes equivalent to those of adjuvant chemotherapy. A subset of breast cancer patients who received NAC experienced early failure (EF) during the course of therapy or within a short period after curative breast surgery. In contrast, patients with a pathological complete response (pCR) have been reported to have markedly favorable outcomes. This study was performed to identify actionable mutations and to explain refractoriness and responsiveness to NAC. This analysis included 76 of 397 patients with locally advanced breast cancer for whom a preoperative fresh-frozen paraffin-embedded tumor block was available for next-generation sequencing using AmpliSeq. The incidence of missense mutations in KRAS was much higher in patients with EF than in the other groups (p < 0.01). In contrast, polymorphisms of the cMET gene were found exclusively in patients with pCR (p < 0.01).
31
Barrett JE, Coolen ACC. Covariate dimension reduction for survival data via the Gaussian process latent variable model. Stat Med 2016; 35:1340-53. [PMID: 26526057 DOI: 10.1002/sim.6784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/15/2014] [Revised: 10/07/2015] [Accepted: 10/09/2015] [Indexed: 11/09/2022]
Abstract
The analysis of high-dimensional survival data is challenging, primarily owing to the problem of overfitting, which occurs when spurious relationships inferred from the training data subsequently fail to hold in test data. Here, we propose a novel method of extracting a low-dimensional representation of covariates in survival data by combining the popular Gaussian process latent variable model with a Weibull proportional hazards model. The combined model offers a flexible non-linear probabilistic method of detecting and extracting any intrinsic low-dimensional structure from high-dimensional data. By reducing the covariate dimension, we aim to diminish the risk of overfitting and increase the robustness and accuracy with which we infer relationships between covariates and survival outcomes. In addition, we can simultaneously combine information from multiple data sources by expressing multiple datasets in terms of the same low-dimensional space. We present results from several simulation studies that illustrate a reduction in overfitting and an increase in predictive performance, as well as successful detection of intrinsic dimensionality. We provide evidence that it is advantageous to combine dimensionality reduction with survival outcomes rather than performing unsupervised dimensionality reduction on its own. Finally, we use our model to analyse experimental gene expression data and detect and extract a low-dimensional representation that allows us to distinguish high-risk and low-risk groups with superior accuracy compared with performing regression on the original high-dimensional data.
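The Weibull proportional hazards part of the model has a closed-form censored log-likelihood. Under one common parameterization, the hazard is h(t | z) = λρt^(ρ-1)exp(zᵀβ). Each observed event contributes log h(t | z), and every subject contributes -H(t | z) = -λt^ρ exp(zᵀβ). In the paper, z would be the low-dimensional latent coordinates from the Gaussian process latent variable model. The sketch takes z as given and does not implement the GPLVM, and the parameter names are assumptions.

```python
import numpy as np


def weibull_ph_loglik(times, events, Z, beta, lam, rho):
    """Censored log-likelihood of a Weibull proportional hazards model.

    hazard:            h(t|z) = lam * rho * t**(rho - 1) * exp(z @ beta)
    cumulative hazard: H(t|z) = lam * t**rho * exp(z @ beta)
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=float)
    eta = np.asarray(Z, dtype=float) @ np.asarray(beta, dtype=float)
    log_hazard = np.log(lam) + np.log(rho) + (rho - 1) * np.log(times) + eta
    cum_hazard = lam * times ** rho * np.exp(eta)
    # Events add log h(t); every subject (censored or not) adds -H(t).
    return float(np.sum(events * log_hazard - cum_hazard))


# With beta = 0 and rho = 1 this reduces to an exponential model:
# loglik = (#events) * log(lam) - lam * sum(times) = 2*log(0.5) - 3.
print(weibull_ph_loglik([1, 2, 3], [1, 0, 1], np.zeros((3, 2)), np.zeros(2), 0.5, 1.0))
```

Maximizing this quantity jointly over β, λ, ρ and the latent z is what couples the dimensionality reduction to the survival outcome in a model of this kind.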
Affiliation(s)
- James E Barrett
- Institute for Mathematical and Molecular Biomedicine, King's College London, London, U.K
- Anthony C C Coolen
- Institute for Mathematical and Molecular Biomedicine, King's College London, London, U.K
32
Avalos M, Pouyes H, Grandvalet Y, Orriols L, Lagarde E. Sparse conditional logistic regression for analyzing large-scale matched data from epidemiological studies: a simple algorithm. BMC Bioinformatics 2015; 16 Suppl 6:S1. [PMID: 25916593 PMCID: PMC4416185 DOI: 10.1186/1471-2105-16-s6-s1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
This paper considers the problem of estimation and variable selection for large high-dimensional data (a high number of predictors p and a large sample size N, without excluding the possibility that N < p) resulting from an individually matched case-control study. We develop a simple algorithm for adapting the Lasso and related methods to the conditional logistic regression model. Our proposal relies on simplifying the calculations involved in the likelihood function. The proposed algorithm then iteratively solves reweighted Lasso problems using cyclical coordinate descent, computed along a regularization path. This method can handle large problems and deal with sparse features efficiently. We discuss its benefits and drawbacks with respect to existing implementations. We also illustrate the value and use of these techniques in a pharmacoepidemiological study of medication use and traffic safety.
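In the simplest design, 1:1 matched pairs, the conditional likelihood reduces to a logistic model without an intercept on within-pair covariate differences. The probability that the observed case is the case is σ((x_case - x_control)ᵀβ). The sketch below fits that likelihood with an L1 penalty by proximal gradient descent (ISTA). This substitute technique was chosen for brevity. It is not the authors' reweighted-Lasso coordinate-descent algorithm, and it does not handle 1:M matching. `lam`, `n_iter` and the function names are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_clr_pairs(X_case, X_control, lam=0.05, n_iter=2000):
    """L1-penalized conditional logistic regression for 1:1 matched pairs (ISTA).

    Minimizes mean_i log(1 + exp(-d_i @ beta)) + lam * ||beta||_1,
    where d_i = x_case_i - x_control_i.
    """
    D = np.asarray(X_case, dtype=float) - np.asarray(X_control, dtype=float)
    n, p = D.shape
    # Step 1/L, where L bounds the Lipschitz constant of the smooth part's gradient.
    L = 0.25 * np.linalg.eigvalsh(D.T @ D / n).max()
    step = 1.0 / L
    beta = np.zeros(p)
    for _ in range(n_iter):
        margin = D @ beta
        # d/dbeta log(1 + exp(-m)) = -sigmoid(-m) * d, with sigmoid(-m) = 1 / (1 + exp(m)).
        weights = 1.0 / (1.0 + np.exp(margin))
        grad = -(D * weights[:, None]).mean(axis=0)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


# Simulated 1:1 matched data in which only the first covariate matters.
rng = np.random.default_rng(1)
n_pairs, p = 400, 5
true_beta = np.array([1.5, 0.0, 0.0, 0.0, 0.0])
A = rng.normal(size=(n_pairs, p))
B = rng.normal(size=(n_pairs, p))
# Within each pair, member A is the case with probability sigmoid((A - B) @ true_beta).
a_is_case = rng.random(n_pairs) < 1.0 / (1.0 + np.exp(-(A - B) @ true_beta))
X_case = np.where(a_is_case[:, None], A, B)
X_control = np.where(a_is_case[:, None], B, A)
beta_hat = lasso_clr_pairs(X_case, X_control)
print(np.round(beta_hat, 2))
```

Because the matching-set intercepts cancel in the conditional likelihood, no intercept is estimated. This cancellation is what makes the pairwise-difference form valid.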
33
Tian S, Wang C, An MW. Test on existence of histology subtype-specific prognostic signatures among early stage lung adenocarcinoma and squamous cell carcinoma patients using a Cox-model based filter. Biol Direct 2015; 10:15. [PMID: 25887039 PMCID: PMC4415297 DOI: 10.1186/s13062-015-0051-z] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 08/12/2014] [Accepted: 03/24/2015] [Indexed: 01/18/2023]
Abstract
Background Non-small cell lung cancer (NSCLC) is the predominant histological type of lung cancer, accounting for up to 85% of cases. Disease stage is commonly used to determine the adjuvant treatment eligibility of NSCLC patients; however, it is an imprecise predictor of an individual patient's prognosis. Currently, many researchers resort to microarray technology to identify relevant genetic prognostic markers, with particular attention to trimming or extending a Cox regression model. Adenocarcinoma (AC) and squamous cell carcinoma (SCC) are the two major histology subtypes of NSCLC. Fundamental differences have been demonstrated in their underlying mechanisms, which motivated us to postulate the existence of specific genes related to the prognosis of each histology subtype. Results In this article, we propose a simple filter feature selection algorithm with a Cox regression model as its base. Applying this method to real-world microarray data identifies a histology-specific prognostic gene signature. The resulting 32-gene (32/12 for AC/SCC) prognostic signature for early-stage AC and SCC samples has superior predictive ability relative to two relevant prognostic signatures. Its performance is also comparable to that of signatures obtained by applying two state-of-the-art algorithms separately to AC and SCC samples. Conclusions Our proposal is conceptually simple and straightforward to implement. Furthermore, it can be easily adapted and applied to a range of other research settings. Reviewers This article was reviewed by Leonid Hanin (nominated by Dr. Lev Klebanov), Limsoon Wong and Jun Yu. Electronic supplementary material The online version of this article (doi:10.1186/s13062-015-0051-z) contains supplementary material, which is available to authorized users.
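A univariate Cox screen can be computed cheaply with the partial-likelihood score test at β = 0, which needs no iterative fitting. The sketch ranks genes by that statistic. It is a generic Cox-based filter that assumes no tied event times. It is not the specific histology-aware filter the authors propose, although a filter like this could be run separately within AC and SCC samples. Function names and the simulated data are hypothetical.

```python
import numpy as np


def cox_score_statistics(times, events, X):
    """Univariate Cox score-test statistic U^2 / I for every column of X.

    At beta = 0, each event adds (x_i - risk-set mean) to U and the
    risk-set variance to I (no tied event times assumed).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    X = np.asarray(X, dtype=float)
    U = np.zeros(X.shape[1])     # score at beta = 0
    info = np.zeros(X.shape[1])  # information at beta = 0
    for i in np.flatnonzero(events):
        Xr = X[times >= times[i]]  # subjects still at risk at time t_i
        mean = Xr.mean(axis=0)
        U += X[i] - mean
        info += (Xr ** 2).mean(axis=0) - mean ** 2
    # Genes with zero variance in every risk set get a statistic of 0.
    return np.divide(U ** 2, info, out=np.zeros_like(U), where=info > 0)


def top_genes(times, events, X, k):
    """Indices of the k genes with the largest score statistics."""
    return np.argsort(cox_score_statistics(times, events, X))[::-1][:k]


# Simulated data: only gene 0 affects the hazard (rate = exp(x0)).
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 20))
times = rng.exponential(scale=np.exp(-X[:, 0]))
events = np.ones(120, dtype=bool)
print(top_genes(times, events, X, 3))
```

Under the null hypothesis each statistic is approximately chi-squared with one degree of freedom, so a threshold or a fixed top-k can be used as the filter.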
Affiliation(s)
- Suyan Tian
- Division of Clinical Epidemiology, First Hospital of Jilin University, 71 Xinmin Street, Changchun, Jilin, 130021, China.
- Chi Wang
- Department of Biostatistics and Markey Cancer Center, University of Kentucky, 800 Rose St., Lexington, KY, 40536, USA.
- Ming-Wen An
- Department of Mathematics, Vassar College, Poughkeepsie, NY, 12604, USA.
34
Park YH, Jung HH, Do IG, Cho EY, Sohn I, Jung SH, Kil WH, Kim SW, Lee JE, Nam SJ, Ahn JS, Im YH. A seven-gene signature can predict distant recurrence in patients with triple-negative breast cancers who receive adjuvant chemotherapy following surgery. Int J Cancer 2014; 136:1976-84. [PMID: 25537444 DOI: 10.1002/ijc.29233] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 07/14/2014] [Accepted: 09/03/2014] [Indexed: 01/21/2023]
Abstract
The aim of this study was to investigate candidate genes that might function as biomarkers to stratify patients with triple-negative breast cancer (TNBC) who received adjuvant chemotherapy after curative surgery. We used a NanoString expression assay targeting 250 prospectively selected genes, with mRNA extracted from formalin-fixed, paraffin-embedded tissue, and tested whether its results would predict distant recurrence in patients with TNBC. The expression levels of seven genes were used in a prospectively defined algorithm to allocate each patient to a risk group (low or high). NanoString expression profiles were obtained for 203 tumor tissue blocks. Increased expression of five genes (SMAD2, HRAS, KRT6A, TP63 and ETV6) and decreased expression of two genes (NFKB1 and MDM4) were associated with a favorable prognosis, and these associations were validated with cross-validation. The Kaplan-Meier estimates of the rates of distant recurrence at 10 years in the low- and high-risk groups according to the gene expression signature were 62% [95% confidence interval (CI), 48.6-78.9%] and 85% (95% CI, 79.2-90.7%), respectively. When adjusting for TNM stage, distant recurrence-free survival (DRFS) in the low-risk group was significantly longer than that in the high-risk group (p <0.001) for both early-stage (I and II) and advanced-stage (III) tumors. In a multivariate Cox regression model, the gene expression signature provided significant predictive power jointly with the TNM staging system. A seven-gene signature could be used as a prognostic model to predict DRFS in patients with TNBC who received curative surgery followed by adjuvant chemotherapy.
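The direction of each gene's association can be turned into a simple signed risk score. The abstract does not give the study's prospectively defined algorithm. The weights below (unweighted ±1 on standardized expression) and the median cutoff are therefore assumptions for illustration only. Higher scores mean higher predicted risk, so the five genes whose high expression is favorable enter with a negative sign.

```python
import numpy as np

# Directions from the abstract: high expression of these five genes is favorable...
FAVORABLE_HIGH = ["SMAD2", "HRAS", "KRT6A", "TP63", "ETV6"]
# ...while low expression of these two genes is favorable.
FAVORABLE_LOW = ["NFKB1", "MDM4"]


def risk_scores(expr, genes):
    """Hypothetical unweighted risk score on z-scored expression (samples x genes)."""
    expr = np.asarray(expr, dtype=float)
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0)
    col = {g: j for j, g in enumerate(genes)}
    score = np.zeros(expr.shape[0])
    for g in FAVORABLE_LOW:
        score += z[:, col[g]]  # higher expression -> higher risk
    for g in FAVORABLE_HIGH:
        score -= z[:, col[g]]  # higher expression -> lower risk
    return score


def risk_groups(expr, genes):
    """Split at the median score (an assumed cutoff): 1 = high risk, 0 = low risk."""
    s = risk_scores(expr, genes)
    return (s > np.median(s)).astype(int)


genes = FAVORABLE_HIGH + FAVORABLE_LOW
expr = np.array([[3] * 5 + [0] * 2,   # favorable pattern
                 [2] * 5 + [1] * 2,
                 [1] * 5 + [2] * 2,
                 [0] * 5 + [3] * 2])  # unfavorable pattern
print(risk_groups(expr, genes))  # [0 0 1 1]
```

In practice the cutoff and any gene weights would need to be fixed on training data and validated separately, as the abstract's cross-validation suggests.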
Affiliation(s)
- Yeon Hee Park
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea; Biomedical Research Institute, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
35
Jung SH, Sohn I. Statistical Issues in the Design and Analysis of nCounter Projects. Cancer Inform 2014; 13:35-43. [PMID: 25574131 PMCID: PMC4266201 DOI: 10.4137/cin.s16343] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 08/12/2014] [Revised: 09/29/2014] [Accepted: 10/04/2014] [Indexed: 11/05/2022]
Abstract
Numerous statistical methods have been published for designing and analyzing microarray projects. Traditional genome-wide microarray platforms (such as Affymetrix, Illumina, and DASL) measure the expression levels of tens of thousands of genes. Because the sets of genes included in these array chips are selected by the manufacturers, the number of genes associated with a specific disease outcome is limited and a large portion of the genes are not associated. nCounter is a new technology by NanoString that measures the expression of a selected number (up to 800) of genes. Customers can select the list of genes for nCounter chips. The number of genes is limited and the price increases with the number of selected genes. The genes for nCounter chips are therefore carefully selected from those discovered in previous studies, usually with traditional high-throughput platforms. Only a small number of definitely unassociated genes, called control genes, are included to standardize the overall expression level across chips. Furthermore, nCounter chips measure the expression level of each gene as a count, whereas traditional high-throughput platforms produce continuous observations. Because of these differences, some statistical methods developed for the design and analysis of high-throughput projects may need modification or may be inappropriate for nCounter projects. In this paper, we discuss statistical methods that can be used for designing and analyzing nCounter projects.
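One common way to use control (housekeeping) genes for standardization is to rescale each sample by the geometric mean of its control-gene counts and then log-transform. The sketch below illustrates that idea. It is an assumption, not necessarily the method recommended in the paper. It requires strictly positive control-gene counts, and the pseudo-count of 1 and the function name are hypothetical choices.

```python
import numpy as np


def normalize_counts(counts, control_idx, pseudo=1.0):
    """Scale count data (genes x samples) by control-gene geometric means.

    Each sample is multiplied by (mean of all samples' control geometric means)
    / (this sample's control geometric mean), then log2(count + pseudo) is taken.
    Control-gene counts must be strictly positive.
    """
    counts = np.asarray(counts, dtype=float)
    geo_means = np.exp(np.log(counts[control_idx]).mean(axis=0))  # one per sample
    scale = geo_means.mean() / geo_means
    return np.log2(counts * scale + pseudo)


# Sample 2 has exactly twice the counts of sample 1 (e.g. more input RNA);
# rows 0 and 1 are the control genes.
counts = np.array([[100, 200],
                   [50, 100],
                   [10, 20]])
print(normalize_counts(counts, [0, 1]))  # both columns become [7.24, 6.25, 4.0] (approx.)
```

After this scaling, differences that affect all genes equally (such as input amount) are removed, so the remaining differences between samples reflect relative expression.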
Affiliation(s)
- Sin-Ho Jung
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA; Biostatistics and Clinical Epidemiology Center, Samsung Medical Center, Seoul, Korea
- Insuk Sohn
- Biostatistics and Clinical Epidemiology Center, Samsung Medical Center, Seoul, Korea
36
Novel harmonic regularization approach for variable selection in Cox's proportional hazards model. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2014; 2014:857398. [PMID: 25506389 PMCID: PMC4259133 DOI: 10.1155/2014/857398] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/23/2014] [Revised: 07/13/2014] [Accepted: 07/25/2014] [Indexed: 11/18/2022]
Abstract
Variable selection is an important issue in regression, and a number of variable selection methods involving nonconvex penalty functions have been proposed. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model using microarray gene expression data. The harmonic regularization method can be solved efficiently using our proposed direct path seeking approach, which produces solutions that closely approximate those for the convex loss function with the nonconvex regularization. Simulation results on artificial datasets and four real microarray gene expression datasets, including diffuse large B-cell lymphoma (DLBCL), lung cancer, and AML datasets, show that the harmonic regularization method can be more accurate for variable selection than existing Lasso-series methods.
37
Bastien P, Bertrand F, Meyer N, Maumy-Bertrand M. Deviance residuals-based sparse PLS and sparse kernel PLS regression for censored data. Bioinformatics 2014; 31:397-404. [PMID: 25286920 DOI: 10.1093/bioinformatics/btu660] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION A vast literature from the past decade is devoted to relating gene profiles to subject survival or time to cancer recurrence. Biomarker discovery from high-dimensional data, such as transcriptomic or single nucleotide polymorphism profiles, is a major challenge in the search for more precise diagnoses. The proportional hazards regression model proposed by Cox (1972) to study the relationship between the time to event and a set of covariates in the presence of censoring is the most commonly used model for the analysis of survival data. However, like multivariate regression, it assumes that there are more observations than variables, that the data are complete, and that the variables are not strongly correlated. In practice, when dealing with high-dimensional data, these constraints are crippling. Collinearity gives rise to over-fitting and model misidentification. Variable selection can improve estimation accuracy by identifying the subset of relevant predictors, and can enhance model interpretability through a parsimonious representation. To deal with both collinearity and variable selection, many methods based on least absolute shrinkage and selection operator (lasso) penalized Cox proportional hazards models have been proposed since the reference paper of Tibshirani. Regularization can also be performed through dimension reduction, as is the case with partial least squares (PLS) regression. We propose two original algorithms, sPLSDR and its non-linear kernel counterpart DKsPLSDR, which use sparse PLS regression (sPLS) based on deviance residuals. We compared their predictive performance with state-of-the-art algorithms on both simulated and real reference benchmark datasets. RESULTS sPLSDR and DKsPLSDR compare favorably with other methods in computational time, prediction and selectivity, as indicated by results on benchmark datasets.
Moreover, in the framework of PLS regression, they offer other useful tools, including biplot representations and the ability to deal with missing data. Therefore, we view them as a useful addition to the toolbox of estimation and prediction methods for the widely used Cox model in high-dimensional, low-sample-size settings. AVAILABILITY AND IMPLEMENTATION The R package plsRcox is available on CRAN and is maintained by Frédéric Bertrand. http://cran.r-project.org/web/packages/plsRcox/index.html. CONTACT pbastien@rd.loreal.com or fbertran@math.unistra.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
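PLS-based Cox methods of this kind replace the censored outcome with residuals from a null Cox model and then regress those residuals on the predictors. The sketch below computes that response. It assumes the null model's cumulative hazard is estimated with the Nelson-Aalen estimator and applies the standard martingale-to-deviance transformation. The sparse PLS step is not shown, and the function names are illustrative, not taken from plsRcox.

```python
import math
from typing import List, Sequence


def nelson_aalen_at_subject_times(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Null-model cumulative hazard evaluated at each subject's own time."""
    increments = []
    for tk in sorted({t for t, e in zip(times, events) if e == 1}):
        deaths = sum(1 for t, e in zip(times, events) if t == tk and e == 1)
        at_risk = sum(1 for t in times if t >= tk)
        increments.append((tk, deaths / at_risk))
    return [sum(h for tk, h in increments if tk <= t) for t in times]


def deviance_residuals(times: Sequence[float], events: Sequence[int]) -> List[float]:
    """Deviance residuals of the null (beta = 0) Cox model."""
    residuals = []
    for delta, cumhaz in zip(events, nelson_aalen_at_subject_times(times, events)):
        martingale = delta - cumhaz
        # delta * log(delta - martingale) reduces to log(cumhaz) for events, 0 otherwise.
        inner = martingale + (math.log(cumhaz) if delta == 1 else 0.0)
        # max() guards against tiny positive round-off before the square root.
        magnitude = math.sqrt(max(0.0, -2.0 * inner))
        residuals.append(math.copysign(magnitude, martingale))
    return residuals


if __name__ == "__main__":
    print(deviance_residuals([1, 2, 3], [1, 1, 0]))
```

Subjects who fail earlier than the null model expects get positive residuals, and censored subjects get negative ones. These signed values then serve as the continuous response for the (sparse) PLS regression.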
Affiliation(s)
- Philippe Bastien
- L'Oréal Recherche & Innovation, 93601 Aulnay-sous-Bois, IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, 67084 Strasbourg Cedex, INSERM EA3430, Laboratoire de Biostatistique, Faculté de Médecine de Strasbourg, Labex IRMIA, Université de Strasbourg, 67085 Strasbourg Cedex, France
- Frédéric Bertrand
- L'Oréal Recherche & Innovation, 93601 Aulnay-sous-Bois, IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, 67084 Strasbourg Cedex, INSERM EA3430, Laboratoire de Biostatistique, Faculté de Médecine de Strasbourg, Labex IRMIA, Université de Strasbourg, 67085 Strasbourg Cedex, France
- Nicolas Meyer
- L'Oréal Recherche & Innovation, 93601 Aulnay-sous-Bois, IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, 67084 Strasbourg Cedex, INSERM EA3430, Laboratoire de Biostatistique, Faculté de Médecine de Strasbourg, Labex IRMIA, Université de Strasbourg, 67085 Strasbourg Cedex, France
- Myriam Maumy-Bertrand
- L'Oréal Recherche & Innovation, 93601 Aulnay-sous-Bois, IRMA, CNRS UMR 7501, Labex IRMIA, Université de Strasbourg, 67084 Strasbourg Cedex, INSERM EA3430, Laboratoire de Biostatistique, Faculté de Médecine de Strasbourg, Labex IRMIA, Université de Strasbourg, 67085 Strasbourg Cedex, France
38
Mittal S, Madigan D. High-dimensional, massive sample-size Cox proportional hazards regression for survival analysis. Biostatistics 2014; 15:207-21. [PMID: 24096388 PMCID: PMC3944969 DOI: 10.1093/biostatistics/kxt043] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 05/16/2013] [Revised: 07/23/2013] [Accepted: 09/03/2013] [Indexed: 11/14/2022]
Abstract
Survival analysis endures as an old, yet active research field with applications that spread across many domains. Continuing improvements in data acquisition techniques pose constant challenges in applying existing survival analysis methods to these emerging data sets. In this paper, we present tools for fitting regularized Cox survival analysis models on high-dimensional, massive sample-size (HDMSS) data using a variant of the cyclic coordinate descent optimization technique tailored for the sparsity that HDMSS data often present. Experiments on two real data examples demonstrate that efficient analyses of HDMSS data using these tools result in improved predictive performance and calibration.
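A cyclic coordinate descent for an L1-penalized Cox model can be sketched in a few dozen lines. This is a minimal, dense-data illustration, not the authors' HDMSS tool. It assumes the Breslow partial likelihood and the objective −ℓ(β)/n + λ‖β‖₁. As a simplifying assumption, it also uses a fixed per-coordinate curvature bound (a weighted variance never exceeds range²/4), which makes every coordinate update a guaranteed descent step. Unlike the paper's variant, it does not exploit sparsity in the design matrix.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def soft_threshold(z: float, gamma: float) -> float:
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def linear_predictor(X: Matrix, beta: Sequence[float]) -> List[float]:
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def penalized_objective(X: Matrix, times, events, beta, lam: float) -> float:
    """-(1/n) * Breslow log partial likelihood + lam * ||beta||_1."""
    n = len(times)
    eta = linear_predictor(X, beta)
    loglik = 0.0
    for i in range(n):
        if events[i] == 1:
            risk_sum = sum(math.exp(eta[k]) for k in range(n) if times[k] >= times[i])
            loglik += eta[i] - math.log(risk_sum)
    return -loglik / n + lam * sum(abs(b) for b in beta)


def gradient_j(X: Matrix, times, events, beta, j: int) -> float:
    """Partial derivative of -(1/n) * log partial likelihood w.r.t. beta_j."""
    n = len(times)
    w = [math.exp(e) for e in linear_predictor(X, beta)]
    grad = 0.0
    for i in range(n):
        if events[i] == 1:
            risk = [k for k in range(n) if times[k] >= times[i]]
            weighted_mean = sum(w[k] * X[k][j] for k in risk) / sum(w[k] for k in risk)
            grad -= X[i][j] - weighted_mean
    return grad / n


def lasso_cox_ccd(X: Matrix, times, events, lam: float, n_sweeps: int = 100) -> List[float]:
    n, p = len(X), len(X[0])
    n_events = sum(events)
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):  # one cyclic pass over the coordinates
            column = [row[j] for row in X]
            spread = max(column) - min(column)
            if spread == 0.0:
                continue  # a constant column has zero gradient; leave it at 0
            # Upper bound on the second derivative of the smooth part along coordinate j.
            curvature = n_events * spread ** 2 / (4.0 * n)
            g = gradient_j(X, times, events, beta, j)
            # Minimize the quadratic majorizer plus lam * |b| in closed form.
            beta[j] = soft_threshold(curvature * beta[j] - g, lam) / curvature
    return beta
```

Each pass recomputes risk-set sums from scratch, which is O(n²) per coordinate. A practical solver would sort by time and use cumulative sums instead.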
Affiliation(s)
- Sushil Mittal
- Department of Statistics, Columbia University, New York, NY 10027, USA
- David Madigan
- Department of Statistics, Columbia University, New York, NY 10027, USA
39
Lee J, Sohn I, Do IG, Kim KM, Park SH, Park JO, Park YS, Lim HY, Sohn TS, Bae JM, Choi MG, Lim DH, Min BH, Lee JH, Rhee PL, Kim JJ, Choi DI, Tan IB, Das K, Tan P, Jung SH, Kang WK, Kim S. Nanostring-based multigene assay to predict recurrence for gastric cancer patients after surgery. PLoS One 2014; 9:e90133. [PMID: 24598828 PMCID: PMC3943911 DOI: 10.1371/journal.pone.0090133] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Received: 09/23/2013] [Accepted: 01/26/2014] [Indexed: 12/13/2022]
Abstract
Despite the benefits of adjuvant chemotherapy or chemoradiotherapy, approximately one-third of patients with stage II gastric cancer (GC) develop recurrence. The aim of this study was to develop and validate a prognostic algorithm for gastric cancer (GCPS) that can robustly identify a group at high risk of recurrence among stage II patients. A multi-step gene expression profiling study was conducted. First, microarray gene expression profiling of archived paraffin-embedded tumor blocks was used to identify candidate prognostic genes (N=432). Second, a focused gene expression assay including the prognostic genes was used to develop a robust clinical assay (GCPS) in stage II patients from the same cohort (N=186). Third, a predefined cutoff for the GCPS was validated in an independent stage II cohort (N=216). The GCPS was further validated in another set of stage II GC patients who underwent surgery without adjuvant treatment (N=300). The GCPS was computed by summing the products of Cox regression coefficients and the normalized expression levels of 8 genes (LAMP5, CDC25B, CDK1, CLIP4, LTB4R2, MATN3, NOX4, TFDP1). A prospectively defined cut-point for the GCPS classified 22.7% of the validation cohort treated with chemoradiotherapy (N=216) as high risk, with a 5-year recurrence rate of 58.6% compared with 85.4% in the low-risk group (hazard ratio for recurrence=3.16, p=0.00004). The GCPS also identified a high-risk group among stage II patients treated with surgery only (hazard ratio=1.77, p=0.0053).
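The GCPS construction described above (a weighted sum of normalized expression values, dichotomized at a prespecified cut-point) can be sketched as follows. The eight gene symbols come from the abstract. The coefficients and cut-point in the example are hypothetical placeholders, not the published values, and treating scores above the cut-point as high risk is an assumption.

```python
from typing import Dict

GCPS_GENES = ["LAMP5", "CDC25B", "CDK1", "CLIP4", "LTB4R2", "MATN3", "NOX4", "TFDP1"]


def gcps_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Sum of Cox coefficient x normalized expression over the signature genes."""
    return sum(coefficients[gene] * expression[gene] for gene in GCPS_GENES)


def risk_group(score: float, cut_point: float) -> str:
    """Assign the risk group; scores above the cut-point are treated as high risk (assumption)."""
    return "high" if score > cut_point else "low"


if __name__ == "__main__":
    # Hypothetical placeholder coefficients and expression values, for illustration only.
    coefficients = {gene: 0.25 for gene in GCPS_GENES}
    patient = {gene: 1.0 for gene in GCPS_GENES}
    score = gcps_score(patient, coefficients)
    print(score, risk_group(score, cut_point=1.5))
```

Fixing the coefficients and cut-point before looking at the validation cohort is what makes the later validation prospective.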
Affiliation(s)
- Jeeyun Lee
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Insuk Sohn
- Samsung Cancer Research Institute, Samsung Medical Center, Seoul, Korea
- In-Gu Do
- Samsung Cancer Research Institute, Samsung Medical Center, Seoul, Korea
- Department of Pathology, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Kyoung-Mee Kim
- Department of Pathology, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Se Hoon Park
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Joon Oh Park
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Young Suk Park
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Ho Yeong Lim
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Tae Sung Sohn
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jae Moon Bae
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Min Gew Choi
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Do Hoon Lim
- Department of Radiation Oncology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Byung Hoon Min
- Department of Gastroenterology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Joon Haeng Lee
- Department of Gastroenterology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Poong Lyul Rhee
- Department of Gastroenterology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Jae J. Kim
- Department of Gastroenterology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Dong Il Choi
- Department of Radiology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
- Iain Beehuat Tan
- Cancer and Stem Cell Biology, Duke-NUS Graduate Medical School Singapore, Singapore
- Kakoli Das
- Genome Institute of Singapore, Singapore, Singapore
- Patrick Tan
- Cancer and Stem Cell Biology, Duke-NUS Graduate Medical School Singapore, Singapore
- Genome Institute of Singapore, Singapore, Singapore
- Cancer Science Institute of Singapore, Singapore, Singapore
- Sin Ho Jung
- Samsung Cancer Research Institute, Samsung Medical Center, Seoul, Korea
- Won Ki Kang
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center Sungkyunkwan University School of Medicine, Seoul, Korea
- Sung Kim
- Department of Surgery, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
40
Gong H, Wu TT, Clarke EM. Pathway-gene identification for pancreatic cancer survival via doubly regularized Cox regression. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 1:S3. [PMID: 24565114 PMCID: PMC4080266 DOI: 10.1186/1752-0509-8-s1-s3] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 12/16/2022]
Abstract
Background Recent global genomic analyses identified 69 gene sets and 12 core signaling pathways that are genetically altered in pancreatic cancer, a highly malignant disease. A comprehensive understanding of the genetic signatures and signaling pathways directly correlated with pancreatic cancer survival will help cancer researchers develop effective multi-gene targeted, personalized therapies for pancreatic cancer patients at different stages. A previous study that applied a LASSO-penalized regression method, which considered only individual genetic effects, identified 12 genes associated with pancreatic cancer survival. Results In this work, we integrate pathway information into pancreatic cancer survival analysis. We introduce and apply a doubly regularized Cox regression model to identify both genes and signaling pathways related to pancreatic cancer survival. Conclusions Four signaling pathways (ion transport, immune phagocytosis, TGFβ (spermatogenesis), and regulation of DNA-dependent transcription), together with 15 genes within these pathways, were identified and verified to be directly correlated with pancreatic cancer survival. Our findings can help cancer researchers design new strategies for the early detection and diagnosis of pancreatic cancer.
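One common way to penalize both individual genes and whole pathways is a sparse-group-lasso term, λ₁‖β‖₁ + λ₂Σ_g‖β_g‖₂. The sketch assumes the doubly regularized model uses a penalty of this form with non-overlapping pathways and no group-size weights; the paper's exact penalty may differ. A proximal-gradient Cox solver would apply this penalty's proximal step after each gradient step on the partial likelihood. That step is elementwise soft-thresholding followed by group-wise shrinkage. The function and argument names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def soft_threshold(z: float, gamma: float) -> float:
    return math.copysign(max(abs(z) - gamma, 0.0), z)


def sparse_group_prox(
    beta: Sequence[float],
    groups: Dict[str, List[int]],  # pathway name -> coefficient indices (non-overlapping)
    lam1: float,
    lam2: float,
    step: float,
) -> List[float]:
    """Proximal map of step * (lam1 * ||b||_1 + lam2 * sum_g ||b_g||_2)."""
    # Gene-level sparsity: shrink each coefficient toward zero.
    out = [soft_threshold(b, step * lam1) for b in beta]
    # Pathway-level sparsity: shrink each group's norm, zeroing weak groups entirely.
    for indices in groups.values():
        norm = math.sqrt(sum(out[i] ** 2 for i in indices))
        scale = max(0.0, 1.0 - step * lam2 / norm) if norm > 0 else 0.0
        for i in indices:
            out[i] *= scale
    return out


if __name__ == "__main__":
    print(sparse_group_prox([3.5, -4.5, 0.2, 0.1], {"A": [0, 1], "B": [2, 3]}, 0.5, 1.0, 1.0))
```

A pathway whose coefficients are all shrunk to zero drops out of the model entirely. Within a retained pathway, the L1 part can still zero out individual genes.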
41
42
Lee YY, Kim TJ, Kim JY, Choi CH, Do IG, Song SY, Sohn I, Jung SH, Bae DS, Lee JW, Kim BG. Genetic profiling to predict recurrence of early cervical cancer. Gynecol Oncol 2013; 131:650-4. [DOI: 10.1016/j.ygyno.2013.10.003] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.0] [Received: 07/24/2013] [Revised: 10/02/2013] [Accepted: 10/03/2013] [Indexed: 11/17/2022]
43
Lim HY, Sohn I, Deng S, Lee J, Jung SH, Mao M, Xu J, Wang K, Shi S, Joh JW, Choi YL, Park CK. Prediction of disease-free survival in hepatocellular carcinoma by gene expression profiling. Ann Surg Oncol 2013; 20:3747-53. [PMID: 23800896 DOI: 10.1245/s10434-013-3070-y] [Citation(s) in RCA: 102] [Impact Index Per Article: 8.5] [Received: 11/02/2012] [Indexed: 12/12/2022]
Abstract
BACKGROUND Progression of hepatocellular carcinoma (HCC) often leads to vascular invasion and intrahepatic metastasis, which correlate with recurrence after surgical treatment and poor prognosis. A molecular prognostic model applicable to the general HCC patient population is needed to predict disease-free survival (DFS) effectively. METHODS A cohort of 286 HCC patients from South Korea and a second cohort of 83 patients from Hong Kong, China, were used as training and validation sets, respectively. RNA extracted from both tumor and adjacent nontumor liver tissues was subjected to microarray gene expression profiling. DFS was the primary clinical end point. A gradient lasso algorithm was used to build prognostic signatures. RESULTS High-quality gene expression profiles were obtained from 240 tumors and 193 adjacent nontumor liver tissues in the training set. DFS signatures of 30 and 23 genes were developed from the gene expression profiles of tumor and adjacent nontumor liver, respectively. The tumor DFS gene signature was significantly associated with DFS in an independent validation set of 83 tumors (P = 0.002). The nontumor liver DFS gene signature was not significantly associated with DFS in the validation set (P = 0.827). Multivariate analysis in the validation set showed that the tumor DFS gene signature was an independent predictor of shorter DFS (P = 0.018). CONCLUSIONS We developed and validated a tumor gene signature that predicts the length of DFS in HCC patients after surgical resection.
Affiliation(s)
- Ho-Yeong Lim
- Division of Hematology-Oncology, Department of Medicine, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul, Korea
44
Liang Y, Liu C, Luan XZ, Leung KS, Chan TM, Xu ZB, Zhang H. Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification. BMC Bioinformatics 2013; 14:198. [PMID: 23777239 PMCID: PMC3718705 DOI: 10.1186/1471-2105-14-198] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Received: 07/04/2012] [Accepted: 05/30/2013] [Indexed: 11/21/2022]
Abstract
Background Microarray technology is widely used in cancer diagnosis. Successfully identifying gene biomarkers will significantly help to classify different cancer types and improve prediction accuracy. Regularization is an effective approach to gene selection in microarray data, which generally contain a large number of genes and a small number of samples. In recent years, various approaches have been developed for gene selection in microarray data. They are generally divided into three categories: filter, wrapper and embedded methods. Regularization methods are an important embedded technique and perform continuous shrinkage and automatic gene selection simultaneously. Recently, there has been growing interest in applying regularization techniques to gene selection. The most popular regularization technique is the Lasso (L1), and many L1-type regularization terms have been proposed in recent years. Theoretically, Lq-type regularization with a lower value of q leads to better, sparser solutions. Moreover, L1/2 regularization can be taken as representative of Lq (0 < q < 1) regularizations and has been shown to have many attractive properties. Results In this work, we investigate sparse logistic regression with the L1/2 penalty for gene selection in cancer classification problems, and propose a coordinate descent algorithm with a new univariate half thresholding operator to solve L1/2-penalized logistic regression. Experimental results on artificial and microarray data demonstrate the effectiveness of the proposed approach compared with other regularization methods. In particular, on 4 publicly available gene expression datasets, the L1/2 regularization method achieved its results using only about 2 to 14 predictors (genes), compared with about 6 to 38 genes for ordinary L1 and elastic net regularization approaches.
Conclusions Our evaluations show that sparse logistic regression with the L1/2 penalty achieves higher classification accuracy than ordinary L1 and elastic net regularization approaches while selecting fewer but informative genes. This is an important consideration for screening and diagnostic applications, where the goal is often to develop an accurate test using as few features as possible in order to control cost. Therefore, sparse logistic regression with the L1/2 penalty is an effective technique for gene selection in real classification problems.
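The univariate half thresholding operator has a closed form. The sketch below states it for the scalar problem of minimizing (x − y)² + λ|x|^{1/2} over x, using the standard trigonometric solution of the cubic stationarity condition. The paper rescales y and λ inside its coordinate-descent loop for the logistic loss, and that rescaling is not reproduced here, so treat the constants as an assumption.

```python
import math


def half_threshold(y: float, lam: float) -> float:
    """Closed-form minimizer of (x - y)**2 + lam * abs(x)**0.5 for lam > 0."""
    threshold = (54.0 ** (1.0 / 3.0) / 4.0) * lam ** (2.0 / 3.0)
    if abs(y) <= threshold:
        return 0.0  # at or below the threshold, x = 0 is a global minimizer
    phi = math.acos((lam / 8.0) * (abs(y) / 3.0) ** -1.5)
    return (2.0 / 3.0) * y * (1.0 + math.cos(2.0 * math.pi / 3.0 - 2.0 * phi / 3.0))


if __name__ == "__main__":
    for y in (0.5, 1.2, 2.0, -2.0):
        print(y, half_threshold(y, lam=1.0))
```

Unlike L1 soft thresholding, this operator sets small inputs exactly to zero but is discontinuous at the threshold. Just above it, the output jumps to two-thirds of the input, and for large inputs it approaches the input itself, so large coefficients are shrunk less than under L1.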
Affiliation(s)
- Yong Liang
- Faculty of Information Technology & State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macau, China.
45
Zhang W, Ota T, Shridhar V, Chien J, Wu B, Kuang R. Network-based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment. PLoS Comput Biol 2013; 9:e1002975. [PMID: 23555212 PMCID: PMC3605061 DOI: 10.1371/journal.pcbi.1002975] [Citation(s) in RCA: 123] [Impact Index Per Article: 10.3] [Received: 05/10/2012] [Accepted: 01/23/2013] [Indexed: 11/24/2022]
Abstract
Cox regression is commonly used to predict outcomes in terms of the time to an event of interest and, in addition, to identify relevant features for survival analysis in cancer genomics. Because of the high dimensionality of high-throughput genomic data, existing Cox models trained on any particular dataset usually generalize poorly to other independent datasets. In this paper, we propose a network-based Cox regression model called Net-Cox and apply it to a large-scale survival analysis across multiple ovarian cancer datasets. Net-Cox integrates gene network information into Cox's proportional hazards model to exploit the co-expression or functional relations among high-dimensional gene expression features in the gene network. Net-Cox was applied to three independent gene expression datasets, including the TCGA ovarian cancer dataset and two other public ovarian cancer datasets. Using network information from gene co-expression or functional relations, Net-Cox identified highly consistent signature genes across the three datasets and, because it generalized better across datasets, also consistently improved the accuracy of survival prediction over Cox models regularized by the L2 or L1 norm. This study focused on analyzing death and recurrence outcomes in the treatment of ovarian carcinoma to identify signature genes that can predict these events more reliably. The signature genes comprise dense protein-protein interaction subnetworks, enriched in extracellular matrix receptors and modulators or in nuclear signaling components downstream of extracellular signal-regulated kinases.
In the laboratory validation of the signature genes, a tumor array experiment using protein staining on an independent patient cohort from Mayo Clinic showed that protein expression of the signature gene FBN1 is a biomarker significantly associated with early recurrence after 12 months of treatment in ovarian cancer patients who are initially sensitive to chemotherapy. The Net-Cox toolbox is available at http://compbio.cs.umn.edu/Net-Cox/. Network-based computational models are attracting increasing attention in cancer genomics because molecular networks provide valuable information on the functional organization of molecules in cells. Survival analysis, mostly with the Cox proportional hazards model, is widely used to predict outcomes or to correlate gene expression with the time to an event of interest in cancer genomics. Surprisingly, network-based survival analysis has not received enough attention. In this paper, we studied resistance to chemotherapy in ovarian cancer with a network-based Cox model, called Net-Cox. The experiments confirm that networks representing gene co-expression or functional relations can be used to improve the accuracy and robustness of survival prediction in ovarian cancer treatment. The study also revealed subnetwork signatures enriched in extracellular matrix receptors and modulators and in the nuclear signaling components downstream of extracellular signal-regulated kinases, respectively. In particular, FBN1, which Net-Cox detected as a high-confidence signature gene using network information, was validated in the laboratory as a biomarker for predicting early recurrence in platinum-sensitive ovarian cancer patients.
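Network-regularized Cox models commonly add a quadratic penalty built from the gene network's graph Laplacian L = D − W. This penalty is small when genes joined by a heavily weighted edge receive similar coefficients. The sketch below builds an unnormalized Laplacian from a weighted edge list and shows that βᵀLβ equals Σ w_ij(β_i − β_j)² over the edges. Whether Net-Cox normalizes the Laplacian or combines it with a ridge term is not reproduced here, and all names are illustrative.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, float]  # (gene i, gene j, edge weight w_ij); assumes i != j


def graph_laplacian(n_genes: int, edges: Sequence[Edge]) -> List[List[float]]:
    """Unnormalized Laplacian L = D - W of an undirected weighted gene network."""
    lap = [[0.0] * n_genes for _ in range(n_genes)]
    for i, j, w in edges:
        lap[i][i] += w
        lap[j][j] += w
        lap[i][j] -= w
        lap[j][i] -= w
    return lap


def laplacian_penalty(lap: List[List[float]], beta: Sequence[float]) -> float:
    """Quadratic form beta^T L beta."""
    p = len(beta)
    return sum(beta[i] * lap[i][j] * beta[j] for i in range(p) for j in range(p))


def edgewise_penalty(edges: Sequence[Edge], beta: Sequence[float]) -> float:
    """Equivalent edge-by-edge form: sum of w_ij * (beta_i - beta_j)**2."""
    return sum(w * (beta[i] - beta[j]) ** 2 for i, j, w in edges)


if __name__ == "__main__":
    edges = [(0, 1, 1.0), (1, 2, 2.0)]
    beta = [1.0, 0.5, -1.0]
    print(laplacian_penalty(graph_laplacian(3, edges), beta), edgewise_penalty(edges, beta))
```

Adding λβᵀLβ to the negative partial log-likelihood pulls co-expressed or functionally linked genes toward similar effects. This is one way network information can stabilize signatures across datasets.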
Affiliation(s)
- Wei Zhang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
- Takayo Ota
- Department of Laboratory Medicine and Experimental Pathology, Mayo Clinic College of Medicine, Rochester, Minnesota, United States of America
- Viji Shridhar
- Department of Laboratory Medicine and Experimental Pathology, Mayo Clinic College of Medicine, Rochester, Minnesota, United States of America
- Jeremy Chien
- Department of Laboratory Medicine and Experimental Pathology, Mayo Clinic College of Medicine, Rochester, Minnesota, United States of America
- Baolin Wu
- Division of Biostatistics, School of Public Health, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
- Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota Twin Cities, Minneapolis, Minnesota, United States of America
46
Kim J, Sohn I, Son DS, Kim DH, Ahn T, Jung SH. Prediction of a time-to-event trait using genome wide SNP data. BMC Bioinformatics 2013; 14:58. [PMID: 23418752 PMCID: PMC3651372 DOI: 10.1186/1471-2105-14-58] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/30/2012] [Accepted: 02/12/2013] [Indexed: 02/07/2023]
Abstract
Background A common objective of many high-throughput genome projects is to discover genomic markers associated with traits and to develop statistical models that predict the traits of future patients from marker values. Results In this paper, we present a prediction method for time-to-event traits using genome-wide single-nucleotide polymorphisms (SNPs). We also propose a MaxTest for the association between a time-to-event trait and a SNP that accounts for the SNP's possible genetic models. The proposed MaxTest can help screen out nonprognostic SNPs and identify the genetic models of prognostic SNPs. The performance of the proposed method is evaluated through simulations. Conclusions In conjunction with the MaxTest, the proposed method yields more parsimonious prediction models that nevertheless include more prognostic SNPs than some naive prediction methods. The proposed method is demonstrated with real GWAS data.
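One natural reading of a MaxTest over genetic models works in three steps. Code each SNP under the additive, dominant and recessive models, compute a Cox score (log-rank-type) statistic for each coding, and keep the largest. The sketch below follows that reading. The coding table, the Breslow-style handling of ties and the function names are assumptions. Because the maximum of three correlated statistics is not chi-square distributed, its p-value would have to come from permutation or from the joint null distribution, which is not shown.

```python
from typing import Dict, Sequence, Tuple

# Genotype = number of minor alleles (0, 1 or 2), recoded under each genetic model.
CODINGS: Dict[str, Dict[int, float]] = {
    "additive": {0: 0.0, 1: 1.0, 2: 2.0},
    "dominant": {0: 0.0, 1: 1.0, 2: 1.0},
    "recessive": {0: 0.0, 1: 0.0, 2: 1.0},
}


def cox_score_chi2(x: Sequence[float], times: Sequence[float], events: Sequence[int]) -> float:
    """Score statistic U^2 / I for a single covariate in a Cox model at beta = 0."""
    n = len(x)
    score = information = 0.0
    for i in range(n):
        if events[i] != 1:
            continue
        risk = [k for k in range(n) if times[k] >= times[i]]
        mean = sum(x[k] for k in risk) / len(risk)
        second_moment = sum(x[k] ** 2 for k in risk) / len(risk)
        score += x[i] - mean
        information += second_moment - mean ** 2
    return score ** 2 / information if information > 0 else 0.0


def max_test(genotypes: Sequence[int], times: Sequence[float], events: Sequence[int]) -> Tuple[str, float]:
    """Return the best-fitting genetic model and its score statistic."""
    stats = {
        model: cox_score_chi2([coding[g] for g in genotypes], times, events)
        for model, coding in CODINGS.items()
    }
    best = max(stats, key=stats.get)
    return best, stats[best]


if __name__ == "__main__":
    print(max_test([2, 2, 1, 0, 1, 0, 0, 1], [1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 0, 1, 1, 0, 1]))
```

The selected model name doubles as the "identified genetic model" for a prognostic SNP. A SNP whose maximum statistic is small under every coding can be screened out.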
Affiliation(s)
- Jinseog Kim
- Department of Statistics and Information Science, Dongguk University, Gyeongju 780-714, Korea
47
48
Wu TT, Gong H, Clarke EM. A transcriptome analysis by lasso penalized Cox regression for pancreatic cancer survival. J Bioinform Comput Biol 2012; 9 Suppl 1:63-73. [PMID: 22144254 DOI: 10.1142/s0219720011005744] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 07/25/2011] [Revised: 09/05/2011] [Accepted: 09/15/2011] [Indexed: 11/18/2022]
Abstract
Pancreatic cancer is the fourth leading cause of cancer deaths in the United States, with five-year survival rates below 5% because it is rarely detected at an early stage. Identifying genes that are directly correlated with pancreatic cancer survival is crucial for its diagnosis and treatment. However, no existing GWAS or transcriptome studies address this problem. We apply lasso-penalized Cox regression to a transcriptome study to identify genes that are directly related to pancreatic cancer survival. This method can handle right-censored survival times and the ultrahigh dimensionality of genetic data. A cyclic coordinate descent algorithm is employed to rapidly select the most relevant genes and eliminate irrelevant ones. Twelve genes were identified and verified to be directly correlated with pancreatic cancer survival time and can be used to predict future patients' survival.
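Coordinate-descent lasso solvers are usually run along a decreasing grid of penalties, starting from the smallest λ at which every coefficient is still zero. For the objective −ℓ(β)/n + λ‖β‖₁, that value is the largest absolute entry of the partial-likelihood score at β = 0, divided by n. This scaling is an assumption, and the paper may scale the objective differently. The sketch computes λ_max and a log-spaced path from it; `n_lambda` and `ratio` are illustrative defaults, not values from the paper.

```python
from typing import List, Sequence


def cox_score_at_zero(
    X: Sequence[Sequence[float]], times: Sequence[float], events: Sequence[int]
) -> List[float]:
    """Score vector U(0): sum over events of x_i minus the (unweighted) risk-set mean."""
    n, p = len(X), len(X[0])
    score = [0.0] * p
    for i in range(n):
        if events[i] != 1:
            continue
        risk = [k for k in range(n) if times[k] >= times[i]]
        for j in range(p):
            score[j] += X[i][j] - sum(X[k][j] for k in risk) / len(risk)
    return score


def lambda_path(X, times, events, n_lambda: int = 10, ratio: float = 0.01) -> List[float]:
    """Log-spaced penalties from lambda_max down to ratio * lambda_max."""
    n = len(X)
    lambda_max = max(abs(u) for u in cox_score_at_zero(X, times, events)) / n
    return [lambda_max * ratio ** (k / (n_lambda - 1)) for k in range(n_lambda)]
```

At every λ ≥ λ_max the zero vector satisfies the lasso optimality conditions. Genes enter the model one by one as λ decreases, and warm-starting each fit from the previous solution keeps the coordinate sweeps short.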
Affiliation(s)
- Tong Tong Wu
- Department of Epidemiology and Biostatistics, University of Maryland, College Park, MD 20742, USA.
49
Liu Z, Magder LS, Hyslop T, Mao L. Survival associated pathway identification with group Lp penalized global AUC maximization. Algorithms Mol Biol 2010; 5:30. [PMID: 20712896 PMCID: PMC2930641 DOI: 10.1186/1748-7188-5-30] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 01/07/2010] [Accepted: 08/16/2010] [Indexed: 11/24/2022]
Abstract
It has been demonstrated that genes in a cell do not act independently; they interact with one another to complete certain biological processes or to implement certain molecular functions. How to incorporate biological pathways or functional groups into the model and identify survival-associated gene pathways is still a challenging problem. In this paper, we propose a novel iterative gradient-based method for survival analysis with group Lp-penalized global AUC summary maximization. Unlike LASSO, Lp (p < 1) (with its special implementation, adaptive LASSO) is asymptotically unbiased and has oracle properties [1]. We first extend Lp from individual gene identification to a group Lp penalty for pathway selection, and then develop a novel iterative gradient algorithm for penalized global AUC summary maximization (IGGAUCS). This method incorporates genetic pathways into global AUC summary maximization and identifies survival-associated pathways instead of individual genes. The tuning parameters are determined by 10-fold cross-validation on the training data only. Prediction performance is evaluated on test data. We apply the proposed method to survival outcome analysis with gene expression profiles and identify multiple pathways simultaneously. Experimental results on simulated and gene expression data demonstrate that the proposed procedures can identify important biological pathways related to the survival phenotype and build a parsimonious model for predicting survival times.
50
Abstract
MOTIVATION Variable selection is a typical approach for molecular-signature and biomarker discovery; however, its application to survival data is often complicated by censored samples. We propose a new algorithm for variable selection suited to the analysis of high-dimensional, right-censored data, called Survival Max-Min Parents and Children (SMMPC). The algorithm is conceptually simple and scalable, is based on the theory of Bayesian networks (BNs) and the Markov blanket, and extends the corresponding algorithm (MMPC) for classification tasks. The selected variables have a structural interpretation: if T is the survival time (in general, the time to event), SMMPC returns the variables adjacent to T in the BN representing the data distribution. The selected variables also have a causal interpretation, which we discuss. RESULTS We conduct an extensive empirical analysis of prototypical and state-of-the-art variable selection algorithms for survival data that are applicable to high-dimensional biological data. On average, SMMPC selects the smallest variable subsets (fewer than a dozen per dataset), while statistically significantly outperforming all of the methods in the study that return a manageable number of genes that could be inspected by a human expert. AVAILABILITY Matlab and R code are freely available from http://www.mensxmachina.org
Affiliation(s)
- Vincenzo Lagani
- Institute of Computer Science, Foundation for Research and Technology-Hellas (FORTH) and Computer Science Department, University of Crete, Heraklion, Greece.