1
Liu Y, Li G. Sure Joint Screening for High Dimensional Cox's Proportional Hazards Model Under the Case-Cohort Design. J Comput Biol 2023; 30:663-677. [PMID: 37140454 PMCID: PMC10282795 DOI: 10.1089/cmb.2022.0416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
This study develops a sure joint feature screening method for the case-cohort design with ultrahigh-dimensional covariates. Our method is based on a sparsity-restricted Cox proportional hazards model. An iterative reweighted hard thresholding algorithm is proposed to approximate the sparsity-restricted, pseudo-partial likelihood estimator for joint screening. We rigorously show that our method possesses the sure screening property, with the probability of retaining all relevant covariates tending to 1 as the sample size goes to infinity. Our simulation results demonstrate that the proposed procedure has substantially improved screening performance over some existing feature screening methods for the case-cohort design, especially when some covariates are jointly correlated, but marginally uncorrelated, with the event time outcome. A real data illustration is provided using breast cancer data with high-dimensional genomic covariates. We have implemented the proposed method using MATLAB and made it available to readers through GitHub.
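As a rough illustration of the hard-thresholding idea named in the abstract, the sketch below runs plain iterative hard thresholding on a least-squares loss. It is a deliberate simplification and not the authors' iterative reweighted algorithm: squared error stands in for the case-cohort pseudo-partial likelihood, and the names `hard_threshold`, `iht_least_squares`, `k`, and `step` are hypothetical.

```python
def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and zero out the rest."""
    order = sorted(range(len(beta)), key=lambda j: abs(beta[j]), reverse=True)
    keep = set(order[:k])
    return [b if j in keep else 0.0 for j, b in enumerate(beta)]


def iht_least_squares(X, y, k, step=1.0, n_iter=200):
    """Iterative hard thresholding for a sparsity-restricted least-squares fit.

    Assumption: squared-error loss replaces the Cox pseudo-partial likelihood
    used in the paper; only the "gradient step, then keep k terms" pattern
    is illustrated here.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residuals under the current sparse estimate.
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # Descent direction: X^T resid / n (the negative gradient of the loss).
        direction = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Gradient step followed by projection onto k-sparse vectors.
        beta = hard_threshold([beta[j] + step * direction[j] for j in range(p)], k)
    return beta


# Toy design with orthogonal columns; y depends on columns 0 and 2 only.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [2 * row[0] - 3 * row[2] for row in X]
beta_hat = iht_least_squares(X, y, k=2)
screened = [j for j, b in enumerate(beta_hat) if b != 0.0]
```

In this toy run `screened` holds the indices of the covariates retained by the screening step. A joint method of this kind can keep a covariate that is marginally uncorrelated with the outcome, which is the situation the abstract highlights.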
Affiliation(s)
- Yi Liu
  - Department of Mathematics, School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Gang Li
  - Department of Biostatistics, University of California at Los Angeles, Los Angeles, California, USA

2
Salerno S, Li Y. High-Dimensional Survival Analysis: Methods and Applications. Annu Rev Stat Appl 2023; 10:25-49. [PMID: 36968638 PMCID: PMC10038209 DOI: 10.1146/annurev-statistics-032921-022127] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/18/2023]
Abstract
In the era of precision medicine, time-to-event outcomes such as time to death or progression are routinely collected, along with high-throughput covariates. These high-dimensional data defy classical survival regression models, which are either infeasible to fit or likely to incur low predictability due to over-fitting. To overcome this, recent emphasis has been placed on developing novel approaches for feature selection and survival prognostication. We will review various cutting-edge methods that handle survival outcome data with high-dimensional predictors, highlighting recent innovations in machine learning approaches for survival prediction. We will cover the statistical intuitions and principles behind these methods and conclude with extensions to more complex settings, where competing events are observed. We exemplify these methods with applications to the Boston Lung Cancer Survival Cohort study, one of the largest cancer epidemiology cohorts investigating the complex mechanisms of lung cancer.
Affiliation(s)
- Stephen Salerno
  - Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109
- Yi Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, United States, 48109

3
Escorcia-Rodríguez JM, Gaytan-Nuñez E, Hernandez-Benitez EM, Zorro-Aranda A, Tello-Palencia MA, Freyre-González JA. Improving gene regulatory network inference and assessment: The importance of using network structure. Front Genet 2023; 14:1143382. [PMID: 36926589 PMCID: PMC10012345 DOI: 10.3389/fgene.2023.1143382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/03/2023]
Abstract
Gene regulatory networks are graph models representing cellular transcription events. Networks are far from complete due to time and resource consumption for experimental validation and curation of the interactions. Previous assessments have shown the modest performance of the available network inference methods based on gene expression data. Here, we study several caveats on the inference of regulatory networks and methods assessment through the quality of the input data and gold standard, and the assessment approach with a focus on the global structure of the network. We used synthetic and biological data for the predictions and experimentally-validated biological networks as the gold standard (ground truth). Standard performance metrics and graph structural properties suggest that methods inferring co-expression networks should no longer be assessed equally with those inferring regulatory interactions. While methods inferring regulatory interactions perform better in global regulatory network inference than co-expression-based methods, the latter is better suited to infer function-specific regulons and co-regulation networks. When merging expression data, the size increase should outweigh the noise inclusion and graph structure should be considered when integrating the inferences. We conclude with guidelines to take advantage of inference methods and their assessment based on the applications and available expression datasets.
Affiliation(s)
- Juan M Escorcia-Rodríguez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Estefani Gaytan-Nuñez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Ericka M Hernandez-Benitez
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Andrea Zorro-Aranda
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Department of Chemical Engineering, Universidad de Antioquia, Medellín, Colombia
- Marco A Tello-Palencia
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
  - Undergraduate Program in Genomic Sciences, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico
- Julio A Freyre-González
  - Regulatory Systems Biology Research Group, Program of Systems Biology, Center for Genomic Sciences, Universidad Nacional Autónoma de México, Cuernavaca, Mexico

4
Chu J, Sun N, Hu W, Chen X, Yi N, Shen Y. Bayesian hierarchical lasso Cox model: A 9-gene prognostic signature for overall survival in gastric cancer in an Asian population. PLoS One 2022; 17:e0266805. [PMID: 35421138 PMCID: PMC9009599 DOI: 10.1371/journal.pone.0266805] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/28/2021] [Accepted: 03/29/2022] [Indexed: 12/24/2022]
Abstract
Objective
Gastric cancer (GC) is one of the most common tumour diseases worldwide and has poor survival, especially in the Asian population. Exploration based on biomarkers would be efficient for better diagnosis, prediction, and targeted therapy.
Methods
Expression profiles were downloaded from the Gene Expression Omnibus (GEO) database. Survival-related genes were identified by gene set enrichment analysis (GSEA) and univariate Cox. Then, we applied a Bayesian hierarchical lasso Cox model for prognostic signature screening. Protein-protein interaction and Spearman analysis were performed. Kaplan–Meier and receiver operating characteristic (ROC) curve analysis were applied to evaluate the prediction performance. Multivariate Cox regression was used to identify prognostic factors, and a prognostic nomogram was constructed for clinical application.
Results
With the Bayesian lasso Cox model, a 9-gene signature comprising TNFRSF11A, NMNAT1, EIF5A, NOTCH3, TOR2A, E2F8, PSMA5, TPMT, and KIF11 was established to predict overall survival in GC. Protein-protein interaction analysis indicated that E2F8 was likely related to KIF11. Kaplan-Meier analysis showed a significant difference between the high-risk and low-risk groups (P<0.001). Multivariate analysis demonstrated that the 9-gene signature was an independent predictor (HR = 2.609, 95% CI 2.017–3.370), and the C-index of the integrative model reached 0.75. Functional enrichment analysis of the risk groups revealed the most significantly enriched pathways and terms, including pyrimidine metabolism and the respiratory electron transport chain.
Conclusion
Our findings suggest that a novel prognostic model based on a 9-gene signature can identify high-risk GC patients and improve prediction performance. We hope our model can provide a reference for risk classification and clinical decision-making.
Affiliation(s)
- Jiadong Chu
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Na Sun
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Wei Hu
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Xuanli Chen
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Nengjun Yi
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Yueping Shen
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China

5
Chu J, Sun NA, Hu W, Chen X, Yi N, Shen Y. The Application of Bayesian Methods in Cancer Prognosis and Prediction. Cancer Genomics Proteomics 2022; 19:1-11. [PMID: 34949654 PMCID: PMC8717957 DOI: 10.21873/cgp.20298] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/18/2021] [Revised: 11/24/2021] [Accepted: 11/30/2021] [Indexed: 11/10/2022]
Abstract
With the development of high-throughput biological techniques, high-dimensional omics data have emerged. These molecular data provide a solid foundation for precision medicine and prognostic prediction of cancer. Bayesian methods help construct prognostic models that capture complex relationships in omics data and improve performance by introducing different prior distributions, which makes them suitable for modelling the high-dimensional data involved. Using different omics, several Bayesian hierarchical approaches have been proposed for variable selection and model construction. In particular, Bayesian methods for multi-omics integration have also been consistently proposed in recent years. Compared with single-omics modelling, multi-omics integration modelling will contribute to improving predictive performance, gaining insights into the underlying mechanisms of tumour occurrence and development, and discovering more reliable biomarkers. In this work, we present a review of currently proposed Bayesian approaches to prognostic prediction modelling in cancer.
Affiliation(s)
- Jiadong Chu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- N A Sun
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Wei Hu
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Xuanli Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China
- Nengjun Yi
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, U.S.A.
- Yueping Shen
  - Department of Epidemiology and Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, P.R. China

6
Grist JT, Withey S, Bennett C, Rose HEL, MacPherson L, Oates A, Powell S, Novak J, Abernethy L, Pizer B, Bailey S, Clifford SC, Mitra D, Arvanitis TN, Auer DP, Avula S, Grundy R, Peet AC. Combining multi-site magnetic resonance imaging with machine learning predicts survival in pediatric brain tumors. Sci Rep 2021; 11:18897. [PMID: 34556677 PMCID: PMC8460620 DOI: 10.1038/s41598-021-96189-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/04/2021] [Accepted: 07/27/2021] [Indexed: 12/02/2022]
Abstract
Brain tumors represent the leading cause of mortality in the pediatric oncological population. Diagnosis is commonly performed with magnetic resonance imaging. Survival biomarkers are challenging to identify due to the relatively low numbers of individual tumor types. Sixty-nine children with biopsy-confirmed brain tumors were recruited into this study. All participants had perfusion and diffusion weighted imaging performed at diagnosis. Imaging data were processed using conventional methods, and a Bayesian survival analysis was performed. Unsupervised and supervised machine learning were performed with the survival features to determine novel sub-groups related to survival. Sub-group analysis was undertaken to understand differences in imaging features. Survival analysis showed that a combination of diffusion and perfusion imaging was able to determine two novel sub-groups of brain tumors with different survival characteristics (p < 0.01), which were subsequently classified with high accuracy (98%) by a neural network. Analysis of high-grade tumors showed a marked difference in survival (p = 0.029) between the two clusters with high-risk and low-risk imaging features. This study has developed a novel model of survival for pediatric brain tumors. Tumor perfusion plays a key role in determining survival and should be considered a high priority for future imaging protocols.
Affiliation(s)
- James T Grist
  - Institute of Cancer and Genomic Sciences, School of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Stephanie Withey
  - Institute of Cancer and Genomic Sciences, School of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - Oncology, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
  - RRPPS, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK
- Christopher Bennett
  - Institute of Cancer and Genomic Sciences, School of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Heather E L Rose
  - Institute of Cancer and Genomic Sciences, School of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - Oncology, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
- Lesley MacPherson
  - Radiology, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
- Adam Oates
  - Radiology, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
- Stephen Powell
  - Institute of Cancer and Genomic Sciences, School of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
- Jan Novak
  - Oncology, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
  - Psychology, College of Health and Life Sciences Aston University, Birmingham, UK
  - Aston Neuroscience Institute, Aston University, Birmingham, UK
- Barry Pizer
  - Oncology, Alder Hey Children's NHS Foundation Trust, Liverpool, UK
- Simon Bailey
  - Sir James Spence Institute of Child Health, Royal Victoria Infirmary, Newcastle upon Tyne, UK
- Steven C Clifford
  - Wolfson Childhood Cancer Research Centre, Newcastle University Centre for Cancer, University of Newcastle, Newcastle upon Tyne, UK
- Dipayan Mitra
  - Neuroradiology, Royal Victoria Infirmary, Newcastle Upon Tyne, UK
- Theodoros N Arvanitis
  - Institute of Cancer and Genomic Sciences, School of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - Oncology, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK
  - Institute of Digital Healthcare, WMG, University of Warwick, Coventry, UK
- Dorothee P Auer
  - Sir Peter Mansfield Imaging Centre, University of Nottingham Biomedical Research Centre, Nottingham, UK
  - NIHR Nottingham Biomedical Research Centre, Nottingham, UK
- Shivaram Avula
  - Radiology, Alder Hey Children's NHS Foundation Trust, Liverpool, UK
- Richard Grundy
  - The Children's Brain Tumor Research Centre, University of Nottingham, Nottingham, UK
- Andrew C Peet
  - Institute of Cancer and Genomic Sciences, School of Medical and Dental Sciences, University of Birmingham, Birmingham, UK
  - Oncology, Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK

7
Fukushima A, Sugimoto M, Hiwa S, Hiroyasu T. Bayesian approach for predicting responses to therapy from high-dimensional time-course gene expression profiles. BMC Bioinformatics 2021; 22:132. [PMID: 33736614 PMCID: PMC7977599 DOI: 10.1186/s12859-021-04052-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/16/2020] [Accepted: 02/28/2021] [Indexed: 12/14/2022]
Abstract
Background
Historical and updated information provided by time-course data collected during an entire treatment period proves to be more useful than information provided by single-point data. Accurate predictions made using time-course data on multiple biomarkers that indicate a patient’s response to therapy contribute positively to the decision-making process associated with designing effective treatment programs for various diseases. Therefore, the development of prediction methods incorporating time-course data on multiple markers is necessary.
Results
We proposed new methods that may be used for prediction and gene selection via time-course gene expression profiles. Our prediction method consolidated multiple probabilities calculated using gene expression profiles collected over a series of time points to predict therapy response. Using two data sets collected from patients with hepatitis C virus (HCV) infection and multiple sclerosis (MS), we performed numerical experiments that predicted response to therapy and evaluated their accuracies. Our methods were more accurate than conventional methods and successfully selected genes, the functions of which were associated with the pathology of HCV infection and MS.
Conclusions
The proposed method accurately predicted response to therapy using data at multiple time points. It showed higher accuracies at early time points compared to those of conventional methods. Furthermore, this method successfully selected genes that were directly associated with diseases.
Supplementary Information
The online version contains supplementary material available at 10.1186/s12859-021-04052-4.
Affiliation(s)
- Arika Fukushima
  - Graduate School of Life and Medical Sciences, Doshisha University, Kyotanabe-shi, Kyoto, 610-0321, Japan
- Masahiro Sugimoto
  - Research and Development Center for Minimally Invasive Therapies, Institute of Medical Science, Tokyo Medical University, Shinjuku, Tokyo, 160-8402, Japan
  - Institute for Advanced Biosciences, Keio University, Tsuruoka, Yamagata, 997-0052, Japan
- Satoru Hiwa
  - Faculty of Life and Medical Sciences, Doshisha University, Kyotanabe-shi, Kyoto, 610-0321, Japan
- Tomoyuki Hiroyasu
  - Faculty of Life and Medical Sciences, Doshisha University, Kyotanabe-shi, Kyoto, 610-0321, Japan

8
Liu Y, Chen X, Li G. A new joint screening method for right-censored time-to-event data with ultra-high dimensional covariates. Stat Methods Med Res 2020; 29:1499-1513. [PMID: 31359834 PMCID: PMC8285086 DOI: 10.1177/0962280219864710] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
Abstract
In an ultra-high dimensional setting with a huge number of covariates, variable screening is useful for dimension reduction before applying a more refined method for model selection and statistical analysis. This paper proposes a new sure joint screening procedure for right-censored time-to-event data based on a sparsity-restricted semiparametric accelerated failure time model. Our method, referred to as Buckley-James assisted sure screening (BJASS), consists of an initial screening step using a sparsity-restricted least-squares estimate based on a synthetic time variable and a refinement screening step using a sparsity-restricted least-squares estimate with the Buckley-James imputed event times. The refinement step may be repeated several times to obtain more stable results. We show that with any fixed number of refinement steps, the BJASS procedure retains all important variables with probability tending to 1. Simulation results are presented to illustrate its performance in comparison with some marginal screening methods. Real data examples are provided using a diffuse large B-cell lymphoma (DLBCL) data set and a breast cancer data set. We have implemented the BJASS method using MATLAB and made it available to readers through GitHub (https://github.com/yiucla/BJASS).
Affiliation(s)
- Yi Liu
  - School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Xiaolin Chen
  - School of Statistics, Qufu Normal University, Qufu, China
- Gang Li
  - Department of Biostatistics, University of California at Los Angeles, Los Angeles, CA, USA

9
Liu Y, Chen X. A new robust model-free feature screening method for ultra-high dimensional right censored data. Commun Stat Theory Methods 2020. [DOI: 10.1080/03610926.2020.1769672] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/24/2022]
Affiliation(s)
- Yi Liu
  - School of Mathematical Sciences, Ocean University of China, Qingdao, China
- Xiaolin Chen
  - School of Statistics, Qufu Normal University, Qufu, China

10
Vickers A. An Evaluation of Survival Curve Extrapolation Techniques Using Long-Term Observational Cancer Data. Med Decis Making 2019; 39:926-938. [PMID: 31631772 PMCID: PMC6900572 DOI: 10.1177/0272989x19875950] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
Objectives. Uncertainty in survival prediction beyond trial follow-up is highly influential in cost-effectiveness analyses of oncology products. This research provides an empirical evaluation of the accuracy of alternative methods and recommendations for their implementation.
Methods. Mature (15-year) survival data were reconstructed from a published database study for “no treatment,” radiotherapy, surgery plus radiotherapy, and surgery in early stage non–small cell lung cancer in an elderly patient population. Censored data sets were created from these data to simulate immature trial data (for 1- to 10-year follow-up). A second data set with mature (9-year) survival data for no treatment was used to extrapolate the predictions from models fitted to the first data set. Six methodological approaches were used to fit models to the simulated data and extrapolate beyond trial follow-up. Model performance was evaluated by comparing the relative difference in mean survival estimates and the absolute error in the difference in mean survival v. the control with those from the original mature survival data set.
Results. Model performance depended on the treatment comparison scenario. All models performed reasonably well when there was a small short-term treatment effect, with the Bayesian model coping better with shorter follow-up times. However, in other scenarios, the most flexible Bayesian model that could be estimated in practice appeared to fit the data less well than the models that used the external data separately. Where there was a large treatment effect (hazard ratio = 0.4), models that used external data separately performed best.
Conclusions. Models that directly use mature external data can improve the accuracy of survival predictions. Recommendations on modeling strategies are made for different treatment benefit scenarios.
Affiliation(s)
- Adrian Vickers
  - RTI Health Solutions, Manchester, Greater Manchester, UK

11
Joint feature screening for ultra-high-dimensional sparse additive hazards model by the sparsity-restricted pseudo-score estimator. Ann Inst Stat Math 2018. [DOI: 10.1007/s10463-018-0675-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 11/25/2022]

12
Hung LH, Shi K, Wu M, Young WC, Raftery AE, Yeung KY. fastBMA: scalable network inference and transitive reduction. Gigascience 2018; 6:1-10. [PMID: 29020744 PMCID: PMC5632288 DOI: 10.1093/gigascience/gix078] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/12/2017] [Accepted: 08/10/2017] [Indexed: 11/15/2022]
Abstract
Inferring genetic networks from genome-wide expression data is extremely demanding computationally. We have developed fastBMA, a distributed, parallel, and scalable implementation of Bayesian model averaging (BMA) for this purpose. fastBMA also includes a computationally efficient module for eliminating redundant indirect edges in the network by mapping the transitive reduction to an easily solved shortest-path problem. We evaluated the performance of fastBMA on synthetic data and experimental genome-wide time series yeast and human datasets. When using a single CPU core, fastBMA is up to 100 times faster than the next fastest method, LASSO, with increased accuracy. It is a memory-efficient, parallel, and distributed application that scales to human genome-wide expression data. A 10 000-gene regulation network can be obtained in a matter of hours using a 32-core cloud cluster (2 nodes of 16 cores). fastBMA is a significant improvement over its predecessor ScanBMA. It is more accurate and orders of magnitude faster than other fast network inference methods such as the one based on LASSO. The improved scalability allows it to calculate networks from genome scale data in a reasonable time frame. The transitive reduction method can improve accuracy in denser networks. fastBMA is available as code (MIT license) from GitHub (https://github.com/lhhunghimself/fastBMA), as part of the updated networkBMA Bioconductor package (https://www.bioconductor.org/packages/release/bioc/html/networkBMA.html) and as ready-to-deploy Docker images (https://hub.docker.com/r/biodepot/fastbma/).
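The shortest-path view of transitive reduction mentioned in the abstract can be sketched as follows. This is an illustrative reading, not fastBMA's actual code: it assumes each edge carries a probability-like weight, converts it to a cost of `-log(p)`, and drops a direct edge when some indirect path is cheaper (that is, has a higher product of probabilities). The function names and the pruning rule are assumptions made for the example.

```python
import heapq
import math


def shortest_cost(graph, src, dst, skip_edge):
    """Dijkstra cost from src to dst, ignoring the single edge skip_edge."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if (u, v) == skip_edge:
                continue
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def prune_indirect_edges(prob_edges):
    """Drop edges that are better explained by an indirect path.

    prob_edges maps (regulator, target) -> weight in (0, 1].
    Assumed rule: remove u->v if another path from u to v has a higher
    product of weights than the direct edge.
    """
    graph = {}
    for (u, v), p in prob_edges.items():
        graph.setdefault(u, {})[v] = -math.log(p)  # non-negative cost
    kept = {}
    for (u, v), p in prob_edges.items():
        if shortest_cost(graph, u, v, (u, v)) < -math.log(p):
            continue  # an indirect route is more probable; treat edge as redundant
        kept[(u, v)] = p
    return kept


edges = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5, ("A", "D"): 0.7}
reduced = prune_indirect_edges(edges)
```

Here the path A to B to C has product 0.81, which exceeds the direct A to C weight of 0.5, so that edge is pruned while the others are kept. Working with `-log(p)` keeps all costs non-negative, which is what lets a standard Dijkstra search apply.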
Affiliation(s)
- Ling-Hong Hung
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.
- Kaiyuan Shi
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.
- Migao Wu
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.
- William Chad Young
  - Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A.
- Adrian E. Raftery
  - Department of Statistics, University of Washington, Box 354322, Seattle, WA 98195-4322, U.S.A.
- Ka Yee Yeung
  - Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.
  - Correspondence address: Ka Yee Yeung, Institute of Technology, University of Washington, Tacoma Campus, Box 358426, 1900 Commerce Street, Tacoma, WA 98402-3100, U.S.A.; Tel: 253-692-4924; Fax: 253-692-5862

13
Chen X, Chen X, Wang H. Robust feature screening for ultra-high dimensional right censored data via distance correlation. Comput Stat Data Anal 2018. [DOI: 10.1016/j.csda.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 11/15/2022]

14
Fragoso TM, Bertoli W, Louzada F. Bayesian Model Averaging: A Systematic Review and Conceptual Classification. Int Stat Rev 2017. [DOI: 10.1111/insr.12243] [Citation(s) in RCA: 106] [Impact Index Per Article: 13.3] [Indexed: 11/29/2022]
Affiliation(s)
- Tiago M. Fragoso
  - Fundação CESGRANRIO; Rua Santa Alexandrina, 1011 Rio de Janeiro 20261-903 Brazil
- Wesley Bertoli
  - Departamento Acadêmico de Matemática - Universidade Tecnológica Federal do Paraná; Avenida Sete de Setembro, 3165 Curitiba 80230-901 Brazil
- Francisco Louzada
  - Instituto de Ciências Matemáticas e de Computação - Universidade de São Paulo; Avenida Trabalhador São-carlense, 400 São Carlos 13566-590 Brazil

15
Liu Y, Chen X. Quantile screening for ultra-high-dimensional heterogeneous data conditional on some variables. J Stat Comput Simul 2017. [DOI: 10.1080/00949655.2017.1389944] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 10/18/2022]
Affiliation(s)
- Yi Liu
  - College of Science, China University of Petroleum (East China), Qingdao, People's Republic of China
- Xiaolin Chen
  - School of Statistics, Qufu Normal University, Qufu, People's Republic of China

16
Biermann J, Nemes S, Parris TZ, Engqvist H, Rönnerman EW, Forssell-Aronsson E, Steineck G, Karlsson P, Helou K. A Novel 18-Marker Panel Predicting Clinical Outcome in Breast Cancer. Cancer Epidemiol Biomarkers Prev 2017; 26:1619-1628. [PMID: 28877888 DOI: 10.1158/1055-9965.epi-17-0606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2017] [Revised: 08/23/2017] [Accepted: 08/28/2017] [Indexed: 11/16/2022]
Abstract
Background: Gene expression profiling has made considerable contributions to our understanding of cancer biology and clinical care. This study describes a novel gene expression signature for breast cancer-specific survival that was validated using external datasets.
Methods: Gene expression signatures for invasive breast carcinomas (mainly luminal B subtype) corresponding to 136 patients were analyzed using Cox regression, and the effect of each gene on disease-specific survival (DSS) was estimated. Iterative Bayesian model averaging was applied on multivariable Cox regression models resulting in an 18-marker panel, which was validated using three external validation datasets. The 18 genes were analyzed for common pathways and functions using the Ingenuity Pathway Analysis software. This study complied with the REMARK criteria.
Results: The 18-gene multivariable model showed a high predictive power for DSS in the training and validation cohort and a clear stratification between high- and low-risk patients. The differentially expressed genes were predominantly involved in biological processes such as cell cycle, DNA replication, recombination, and repair. Furthermore, the majority of the 18 genes were found to play a pivotal role in cancer.
Conclusions: Our findings demonstrated that the 18 molecular markers were strong predictors of breast cancer-specific mortality. The stable time-dependent area under the ROC curve function (AUC(t)) and high C-indices in the training and validation cohorts were further improved by fitting a combined model consisting of the 18-marker panel and established clinical markers.
Impact: Our work supports the applicability of this 18-marker panel to improve clinical outcome prediction for breast cancer patients.
Cancer Epidemiol Biomarkers Prev; 26(11); 1619-28. ©2017 AACR.
Affiliation(s)
- Jana Biermann
- Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden.
| | - Szilárd Nemes
- Swedish Hip Arthroplasty Register, Gothenburg, Sweden
| | - Toshima Z Parris
- Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| | - Hanna Engqvist
- Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| | - Elisabeth Werner Rönnerman
- Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden; Department of Clinical Pathology and Genetics, Sahlgrenska University Hospital, Gothenburg, Sweden
| | - Eva Forssell-Aronsson
- Department of Radiation Physics, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| | - Gunnar Steineck
- Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| | - Per Karlsson
- Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| | - Khalil Helou
- Department of Oncology, Institute of Clinical Sciences, Sahlgrenska Cancer Center, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden
| |
|
17
|
Chen CCM, Keith JM, Mengersen KL. Accurate phenotyping: Reconciling approaches through Bayesian model averaging. PLoS One 2017; 12:e0176136. [PMID: 28423058 PMCID: PMC5396931 DOI: 10.1371/journal.pone.0176136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/01/2016] [Accepted: 04/05/2017] [Indexed: 11/18/2022]
Abstract
Genetic research into complex diseases is frequently hindered by a lack of clear biomarkers for phenotype ascertainment. Phenotypes for such diseases are often identified on the basis of clinically defined criteria; however, such criteria may not be suitable for understanding the genetic composition of the diseases. Various statistical approaches have been proposed for phenotype definition; however, our previous studies have shown that differences in phenotypes estimated using different approaches have a substantial impact on subsequent analyses. Instead of obtaining results based upon a single model, we propose a new method that uses Bayesian model averaging to overcome problems associated with phenotype definition. Although Bayesian model averaging has been used in other fields of research, this is the first study that uses Bayesian model averaging to reconcile phenotypes obtained using multiple models. We illustrate the new method by applying it to simulated genetic and phenotypic data for Kofendred personality disorder, an imaginary disease with several sub-types. Two separate statistical methods were used to identify clusters of individuals with distinct phenotypes: latent class analysis and grade of membership. Bayesian model averaging was then used to combine the two clusterings for the purpose of subsequent linkage analyses. We found that causative genetic loci for the disease produced higher LOD scores using model averaging than under either individual model separately. We attribute this improvement to consolidation of the cores of phenotype clusters identified using each individual method.
Affiliation(s)
- Carla Chia-Ming Chen
- Australian Institute of Marine Science, Cape Cleveland QLD, Australia
- ARC Centre of Excellence for Mathematical & Statistical Frontiers, Queensland University of Technology, Brisbane QLD, Australia
| | | | - Kerrie Lee Mengersen
- ARC Centre of Excellence for Mathematical & Statistical Frontiers, Queensland University of Technology, Brisbane QLD, Australia
| |
|
18
|
Wu C, Zhang D. Identification of early-stage lung adenocarcinoma prognostic signatures based on statistical modeling. Cancer Biomark 2017; 18:117-123. [PMID: 27935544 DOI: 10.3233/cbm-151368] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/28/2023]
Abstract
BACKGROUND Current staging methods lack precision in predicting the prognosis of early-stage lung adenocarcinomas. OBJECTIVE We aimed to develop a gene expression signature to identify high- and low-risk groups of patients. METHODS We used the Bayesian Model Averaging algorithm to analyze the DNA microarray data from 442 lung adenocarcinoma patients from three independent cohorts, one of which was used for training. RESULTS The patients were assigned to either high- or low-risk groups according to risk scores calculated from the identified 25-gene signature. The prognostic power was evaluated using Kaplan-Meier analysis and the log-rank test. The testing sets were divided into two distinct groups with log-rank test p-values of 0.00601 and 0.0274, respectively. CONCLUSIONS Our results show that the prognostic models could successfully predict patients' outcomes and serve as biomarkers for early-stage lung adenocarcinoma overall survival analysis.
Affiliation(s)
- Chunxiao Wu
- Department of Thoracic Surgery, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
| | - Donglei Zhang
- Department of Thoracic Surgery, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 201112, China; Department of Thoracic Surgery, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai 200032, China
| |
|
19
|
Moslemi A, Mahjub H, Saidijam M, Poorolajal J, Soltanian AR. Bayesian Survival Analysis of High-Dimensional Microarray Data for Mantle Cell Lymphoma Patients. Asian Pac J Cancer Prev 2016; 17:95-100. [PMID: 26838261 DOI: 10.7314/apjcp.2016.17.1.95] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
BACKGROUND Survival time of lymphoma patients can be estimated with the help of microarray technology. In this study, the iterative Bayesian Model Averaging (BMA) method was used to estimate the survival time of mantle cell lymphoma (MCL) patients, and on the basis of the findings, patients were divided into high-risk and low-risk groups. MATERIALS AND METHODS Gene expression data of MCL patients were used to select a subset of genes for survival analysis with microarray data, using the iterative BMA method. To evaluate the performance of the method, patients were divided into high-risk and low-risk groups based on their scores. Predictive performance was investigated using the log-rank test. The Bioconductor package "iterativeBMAsurv" was applied with R statistical software for classification and survival analysis. RESULTS In this study, 25 genes associated with survival for MCL patients were identified across 132 selected models. The maximum likelihood estimate coefficients of the selected genes and the posterior probabilities of the selected models were obtained from training data. Using this method, patients could be separated into high-risk and low-risk groups with high significance (p<0.001). CONCLUSIONS The iterative BMA algorithm has high precision and ability for survival analysis. This method is capable of identifying a few predictive variables associated with survival among the many variables in a set of microarray data. Therefore, it can be used as a low-cost diagnostic tool in clinical research.
Affiliation(s)
- Azam Moslemi
- Department of Biostatistics and Epidemiology, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
|
20
|
Lee KH, Chakraborty S, Sun J. Survival prediction and variable selection with simultaneous shrinkage and grouping priors. Stat Anal Data Min 2015. [DOI: 10.1002/sam.11266] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Affiliation(s)
- Kyu Ha Lee
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
| | - Sounak Chakraborty
- Department of Statistics, University of Missouri‐Columbia, Columbia, MO 65211, USA
| | - Jianguo Sun
- Department of Statistics, University of Missouri‐Columbia, Columbia, MO 65211, USA
| |
|
21
|
Gu JL, Lu Y, Liu C, Lu H. Multiclass classification of sarcomas using pathway based feature selection method. J Theor Biol 2014; 362:3-8. [DOI: 10.1016/j.jtbi.2014.06.038] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/01/2014] [Revised: 06/03/2014] [Accepted: 06/28/2014] [Indexed: 12/17/2022]
|
22
|
Zheng B, Liu J, Gu J, Lu Y, Zhang W, Li M, Lu H. A three-gene panel that distinguishes benign from malignant thyroid nodules. Int J Cancer 2014; 136:1646-54. [PMID: 25175491 DOI: 10.1002/ijc.29172] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 12/30/2013] [Revised: 06/27/2014] [Accepted: 07/03/2014] [Indexed: 12/26/2022]
Abstract
Reliable preoperative diagnosis of malignant thyroid tumors remains challenging because of the inconclusive cytological examination of fine-needle aspiration biopsies. Although numerous studies have successfully demonstrated the use of high-throughput molecular diagnostics in cancer prediction, the application of microarrays in routine clinical use remains limited. Our aim was, therefore, to identify a small subset of genes to develop a practical and inexpensive diagnostic tool for clinical use. We developed a two-step feature selection method composed of a linear models for microarray data (LIMMA) linear model and an iterative Bayesian model averaging model to identify a suitable gene set signature. Using one public dataset for training, we discovered a three-gene signature: dipeptidyl-peptidase 4 (DPP4), secretogranin V (SCG5) and carbonic anhydrase XII (CA12). We then evaluated the robustness of our gene set using three other independent public datasets. The gene signature accuracy was 85.7, 78.8 and 85.7%, respectively. For experimental validation, we collected 70 thyroid samples from surgery, and our three-gene signature method achieved an accuracy of 94.3% in a quantitative polymerase chain reaction (QPCR) experiment. Furthermore, immunohistochemistry in 29 samples showed that the proteins expressed by these three genes are also differentially expressed in thyroid samples. Our protocol discovered a robust three-gene signature that can distinguish benign from malignant thyroid tumors, which will have daily clinical application.
Affiliation(s)
- Bing Zheng
- Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of Molecular Embryology, Ministry of Health and Shanghai Key Laboratory of Embryo and Reproduction Engineering, Shanghai, China; Department of Laboratory Medicine, Renji Hospital, Shanghai Jiao Tong University, Shanghai, China
|
23
|
Thamrin SA, McGree JM, Mengersen KL. Modelling survival data to account for model uncertainty: a single model or model averaging? SPRINGERPLUS 2013; 2:665. [PMID: 24386617 PMCID: PMC3877415 DOI: 10.1186/2193-1801-2-665] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/02/2013] [Accepted: 11/18/2013] [Indexed: 11/10/2022]
Abstract
This study considered the problem of predicting survival, based on three alternative models: a single Weibull, a mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single "best" model, where "best" is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by the goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as "best", suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival.
Affiliation(s)
- Sri Astuti Thamrin
- Mathematics Department, Hasanuddin University, Jl. Perintis Kemerdekaan Km 10, 90245 Makassar, South Sulawesi, Indonesia
| | - James M McGree
- Mathematics Department, Hasanuddin University, Jl. Perintis Kemerdekaan Km 10, 90245 Makassar, South Sulawesi Indonesia
| | - Kerrie L Mengersen
- Mathematics Department, Hasanuddin University, Jl. Perintis Kemerdekaan Km 10, 90245 Makassar, South Sulawesi Indonesia
| |
|
24
|
Peng B, Zhu D, Ander BP, Zhang X, Xue F, Sharp FR, Yang X. An integrative framework for Bayesian variable selection with informative priors for identifying genes and pathways. PLoS One 2013; 8:e67672. [PMID: 23844055 PMCID: PMC3700986 DOI: 10.1371/journal.pone.0067672] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 11/14/2012] [Accepted: 05/21/2013] [Indexed: 12/27/2022]
Abstract
The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge exists when dealing with the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional structures amongst the molecular measures. For microarray gene expression data, we first summarize solutions for dealing with 'large p, small n' problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables users to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data for predicting stroke status are used to validate the performance of iBVS in a probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiency are also discussed.
Affiliation(s)
- Bin Peng
- Department of Health Statistics, Chongqing Medical University, Chongqing, China
- Division of Biostatistics, Bayessoft, Inc., Davis, California, United States of America
| | - Dianwen Zhu
- Hunter College–School of Public Health, City University of New York, New York, United States of America
| | - Bradley P. Ander
- Medical Investigation of Neurodevelopmental Disorders (MIND) Institute, University of California Davis, Sacramento, California, United States of America
| | - Xiaoshuai Zhang
- School of Public Health, Shandong University, Jinan, Shandong, China
| | - Fuzhong Xue
- School of Public Health, Shandong University, Jinan, Shandong, China
| | - Frank R. Sharp
- Medical Investigation of Neurodevelopmental Disorders (MIND) Institute, University of California Davis, Sacramento, California, United States of America
| | - Xiaowei Yang
- Division of Biostatistics, Bayessoft, Inc., Davis, California, United States of America
- Hunter College–School of Public Health, City University of New York, New York, United States of America
| |
|
25
|
Khoshhali M, Mahjub H, Saidijam M, Poorolajal J, Soltanian AR. Predicting the survival time for diffuse large B-cell lymphoma using microarray data. J Mol Genet Med 2012; 6:287-92. [PMID: 23173013 PMCID: PMC3410377 DOI: 10.4172/1747-0862.1000051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 09/08/2011] [Revised: 04/26/2012] [Accepted: 04/30/2012] [Indexed: 11/25/2022]
Abstract
The present study was conducted to predict survival time in patients with diffuse large B-cell lymphoma (DLBCL) based on microarray data, using the Cox regression model combined with seven dimension reduction methods. This historical cohort included 2042 gene expression measurements from 40 patients with DLBCL. To predict survival, the Cox regression model was combined with seven methods for dimension reduction or shrinkage: univariate selection, forward stepwise selection, principal component regression, supervised principal component regression, partial least squares regression, ridge regression and the lasso. Predictive capacity was examined by three different criteria: the log-rank test, prognostic index and deviance. MATLAB r2008a and RKWard software were used for data analysis. Based on our findings, ridge regression performed better than the other methods. Based on ridge regression coefficients and a given cut point value, 16 genes were selected. Using the forward stepwise selection method in the Cox regression model, it was indicated that the expression of genes GENE3555X and GENE3807X decreased the survival time (P=0.008 and P=0.003, respectively), whereas the genes GENE3228X and GENE1551X increased survival time (P=0.002 and P<0.001, respectively). This study indicated that the ridge regression method had higher capacity than the other dimension reduction methods for the prediction of survival time in patients with DLBCL. Furthermore, a combination of statistical methods and microarray data could help to detect genes that influence survival.
Affiliation(s)
- Mehri Khoshhali
- Department of Biostatistics & Epidemiology, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
|
26
|
Portnov BA, Reiser B, Karkabi K, Cohen-Kastel O, Dubnov J. High prevalence of childhood asthma in Northern Israel is linked to air pollution by particulate matter: evidence from GIS analysis and Bayesian Model Averaging. INTERNATIONAL JOURNAL OF ENVIRONMENTAL HEALTH RESEARCH 2011; 22:249-269. [PMID: 22077820 DOI: 10.1080/09603123.2011.634387] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 05/31/2023]
Abstract
The medical records of 3922 school children residing in the Greater Haifa Metropolitan Area in Northern Israel were analyzed. Individual exposure to ambient air pollution (SO₂ and PM₁₀) for each child was estimated using Geographic Information Systems tools. Factors affecting childhood asthma risk were then investigated using logistic regression and the more recently developed Bayesian Model Averaging (BMA) tools. The analysis reveals that childhood asthma in the study area appears to be significantly associated with particulate matter of less than 10 μm in aerodynamic diameter (PM₁₀) (Odds Ratio (OR) = .11; P<0.001). However, no significant association with asthma prevalence was found for SO₂ (P >0.2) when PM₁₀ and SO₂ were introduced into the models simultaneously. When considering a change in PM₁₀ between the least and the most polluted parts of the study area (9.4 μg/m³), the corresponding OR, calculated using the BMA analysis, is 2.58 (with 95% posterior probability limits of OR ranging from 1.52 to 4.41), controlled for gender, age, proximity to main roads, the town of a child's residence, and family's socio-economic status. Thus, it is concluded that exposure to airborne particulate matter, even at relatively low concentrations (40-50 μg/m³), generally below international air pollution standards (55-70 μg/m³), appears to be a considerable risk factor for childhood asthma in urban areas. This should be a cause of concern for public health authorities and environmental decision-makers.
Affiliation(s)
- Boris A Portnov
- Department of Natural Resources & Environmental Management, University of Haifa, Israel.
|
27
|
Construction of regulatory networks using expression time-series data of a genotyped population. Proc Natl Acad Sci U S A 2011; 108:19436-41. [PMID: 22084118 DOI: 10.1073/pnas.1116442108] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.6] [Indexed: 12/24/2022]
Abstract
The inference of regulatory and biochemical networks from large-scale genomics data is a basic problem in molecular biology. The goal is to generate testable hypotheses of gene-to-gene influences and subsequently to design bench experiments to confirm these network predictions. Coexpression of genes in large-scale gene-expression data implies coregulation and potential gene-gene interactions, but provides little information about the direction of influences. Here, we use both time-series data and genetics data to infer directionality of edges in regulatory networks: time-series data contain information about the chronological order of regulatory events, and genetics data allow us to map DNA variations to variations at the RNA level. We generate microarray data measuring time-dependent gene-expression levels in 95 genotyped yeast segregants subjected to a drug perturbation. We develop a Bayesian model averaging regression algorithm that incorporates external information from diverse data types to infer regulatory networks from the time-series and genetics data. Our algorithm is capable of generating feedback loops. We show that our inferred network recovers existing and novel regulatory relationships. Following network construction, we generate independent microarray data on selected deletion mutants to prospectively test network predictions. We demonstrate the potential of our network to discover de novo transcription-factor binding sites. Applying our construction method to previously published data demonstrates that our method is competitive with leading network construction algorithms in the literature.
|
28
|
Schifano ED, Strawderman RL, Wells MT. Majorization-Minimization algorithms for nonsmoothly penalized objective functions. Electron J Stat 2010. [DOI: 10.1214/10-ejs582] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Indexed: 11/19/2022]
|