1
|
Sun Y, Chiou SH, Wu CO, McGarry M, Huang CY. DYNAMIC RISK PREDICTION TRIGGERED BY INTERMEDIATE EVENTS USING SURVIVAL TREE ENSEMBLES. Ann Appl Stat 2023; 17:1375-1397. [PMID: 37284167 PMCID: PMC10241448 DOI: 10.1214/22-aoas1674] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
With the availability of massive amounts of data from electronic health records and registry databases, incorporating time-varying patient information to improve risk prediction has attracted great attention. To exploit the growing amount of predictor information over time, we develop a unified framework for landmark prediction using survival tree ensembles, where an updated prediction can be performed when new information becomes available. Compared to conventional landmark prediction with fixed landmark times, our methods allow the landmark times to be subject-specific and triggered by an intermediate clinical event. Moreover, the nonparametric approach circumvents the thorny issue of model incompatibility at different landmark times. In our framework, both the longitudinal predictors and the event time outcome are subject to right censoring, and thus existing tree-based approaches cannot be directly applied. To tackle the analytical challenges, we propose a risk-set-based ensemble procedure by averaging martingale estimating equations from individual trees. Extensive simulation studies are conducted to evaluate the performance of our methods. The methods are applied to the Cystic Fibrosis Foundation Patient Registry (CFFPR) data to perform dynamic prediction of lung disease in cystic fibrosis patients and to identify important prognosis factors.
Affiliation(s)
- Yifei Sun: Department of Biostatistics, Columbia University
- Sy Han Chiou: Department of Mathematical Sciences, University of Texas at Dallas
- Colin O Wu: National Heart, Lung, and Blood Institute, National Institutes of Health
- Meghan McGarry: Department of Pediatrics, University of California San Francisco
- Chiung-Yu Huang: Department of Epidemiology and Biostatistics, University of California San Francisco
|
2
|
Jia B, Zeng D, Liao JJZ, Liu GF, Tan X, Diao G, Ibrahim JG. Mixture survival trees for cancer risk classification. LIFETIME DATA ANALYSIS 2022; 28:356-379. [PMID: 35486260 PMCID: PMC10402927 DOI: 10.1007/s10985-022-09552-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2021] [Accepted: 03/04/2022] [Indexed: 06/14/2023]
Abstract
In oncology studies, it is important to understand and characterize disease heterogeneity among patients so that patients can be classified into different risk groups and one can identify high-risk patients at the right time. This information can then be used to identify a more homogeneous patient population for developing precision medicine. In this paper, we propose a mixture survival tree approach for direct risk classification. We assume that the patients can be classified into a pre-specified number of risk groups, where each group has a distinct survival profile. Our proposed tree-based methods are devised to estimate latent group membership using an EM algorithm. The observed data log-likelihood function is used as the splitting criterion in recursive partitioning. The finite sample performance is evaluated by extensive simulation studies and the proposed method is illustrated by a case study in breast cancer.
Affiliation(s)
- Beilin Jia: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Donglin Zeng: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Guanghan F Liu: Biostatistics and Research Decision Sciences, Merck & Co., Inc, North Wales, PA, USA
- Xianming Tan: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Guoqing Diao: Department of Biostatistics and Bioinformatics, The George Washington University, Washington, DC, USA
- Joseph G Ibrahim: Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
3
|
Abstract
Tree-based models are increasingly popular due to their ability to identify complex relationships that are beyond the scope of parametric models. Survival tree methods adapt these models to allow for the analysis of censored outcomes, which often appear in medical data. We present a new Optimal Survival Trees (OST) algorithm that leverages mixed-integer optimization (MIO) and local search techniques to generate globally optimized survival tree models. We demonstrate that the OST algorithm improves on the accuracy of existing survival tree methods, particularly in large datasets.
|
4
|
Zhang Q, Ahn H. Subgroup analysis of censored data on cancer treatment. COMMUN STAT-SIMUL C 2021. [DOI: 10.1080/03610918.2019.1636998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Affiliation(s)
- Qing Zhang: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
- Hongshik Ahn: Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York, USA
|
5
|
Gerber G, Faou YL, Lopez O, Trupin M. The Impact of Churn on Client Value in Health Insurance, Evaluation Using a Random Forest Under Various Censoring Mechanisms. J Am Stat Assoc 2021. [DOI: 10.1080/01621459.2020.1764364] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/24/2022]
Affiliation(s)
- Yohann Le Faou: Forsides & Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM, Paris, France
- Olivier Lopez: Sorbonne Université, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, LPSM, Paris, France
|
6
|
Survival trees based on heterogeneity in time‐to‐event and censoring distributions using parameter instability test. Stat Anal Data Min 2021. [DOI: 10.1002/sam.11539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
|
7
|
Shimokawa A, Miyaoka E. Construction of a survival tree for dependent censoring. J Biopharm Stat 2020; 31:63-78. [PMID: 32684123 DOI: 10.1080/10543406.2020.1792478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
Abstract
In this study, we examined the problem of constructing a model for time-to-event data considering dependent censoring. Our goal was to construct a set of subgroups of covariate space, wherein each element had the same failure model considering the dependency of failure and censoring times. As such, a model was constructed based on the parametric form from the identifiability problem of censoring. We used the copula to represent the dependency between failure and censoring times. Under the assumption of parametric models for failure and censoring times and a copula function, which have unknown parameters, we proposed a method for constructing the tree-structured model through the test statistics. We subsequently evaluated the performance of the splitting rule and tree obtained using the proposed method and compared it with the general method that assumes independent censoring through simulation studies. We also present the analysis results for AIDS clinical trial research to show the utility of the method.
Affiliation(s)
- Asanao Shimokawa: Department of Mathematics, Tokyo University of Science, Tokyo, Japan
- Etsuo Miyaoka: Department of Mathematics, Tokyo University of Science, Tokyo, Japan
|
8
|
Abstract
Tree-based methods have become one of the most flexible, intuitive, and powerful data analytic tools for exploring complex data structures. Tree-based methods provide a natural framework for creating patient subgroups for risk classification. In this article, we review methodological and practical aspects of tree-based methods, with a focus on diagnostic classification (binary outcome) and prognostication (censored survival outcome). Creating an ensemble of trees improves prediction accuracy and addresses instability in a single tree. Ensemble methods are described that rely on resampling from the original data. Finally, we present methods to identify a representative tree from the ensemble that can be used for clinical decision-making. The methods are illustrated using data on ischemic heart disease classification, and data from the SPRINT trial (Systolic Blood Pressure Intervention Trial) on adverse events in patients with high blood pressure.
Affiliation(s)
- Mousumi Banerjee: Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor (M.B., E.R.); Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor (M.B., H.B.A., B.K.N.)
- Evan Reynolds: Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor (M.B., E.R.)
- Hedvig B Andersson: Department of Internal Medicine, University of Michigan, Ann Arbor (H.B.A., B.K.N.); Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor (M.B., H.B.A., B.K.N.); Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark (H.B.A.)
- Brahmajee K Nallamothu: Department of Internal Medicine, University of Michigan, Ann Arbor (H.B.A., B.K.N.); Institute for Healthcare Policy and Innovation, University of Michigan, Ann Arbor (M.B., H.B.A., B.K.N.)
|
9
|
Sokol E, Desai AV, Applebaum MA, Valteau-Couanet D, Park JR, Pearson ADJ, Schleiermacher G, Irwin MS, Hogarty M, Naranjo A, Volchenboum S, Cohn SL, London WB. Age, Diagnostic Category, Tumor Grade, and Mitosis-Karyorrhexis Index Are Independently Prognostic in Neuroblastoma: An INRG Project. J Clin Oncol 2020; 38:1906-1918. [PMID: 32315273 DOI: 10.1200/jco.19.03285] [Citation(s) in RCA: 44] [Impact Index Per Article: 11.0] [Indexed: 12/19/2022]
Abstract
PURPOSE The Children's Oncology Group (COG) stratifies the treatment of patients with neuroblastoma on the basis of a combination of biomarkers that include age and tumor histology classified by age-linked International Neuroblastoma Pathology Classification (INPC) criteria. By definition, this leads to a duplication of the prognostic contribution of age. The individual histologic features underlying the INPC have prognostic strength and are incorporated in the International Neuroblastoma Risk Group classification schema. Here, we analyzed data in the International Neuroblastoma Risk Group Data Commons to validate the prognostic strength of the underlying INPC criteria and to determine whether a risk classification devoid of the confounding of age and INPC criteria will identify new prognostic subgroups. PATIENTS AND METHODS Event-free survival of patients diagnosed between 1990 and 2002 (cohort 1; n = 10,104) and between 2003 and 2016 (cohort 2; n = 8,761) was analyzed. Recursive partitioning with univariate Cox models of event-free survival ("survival tree regression") was performed using (1) individual INPC criteria (age at diagnosis, histologic category, mitosis-karyorrhexis index (MKI), grade of differentiation) and (2) factors in (1) plus other COG-risk biomarkers (International Neuroblastoma Staging System [INSS] stage, MYCN status, ploidy). RESULTS The independent prognostic ability of age, histologic category, MKI, and grade was validated. Four histologic prognostic groups were identified (< 18 months with low v high MKI, and ≥ 18 months with differentiating v undifferentiated/poorly differentiating tumors). Compared with survival trees generated with established COG risk criteria, an additional prognostic subgroup was identified and validated when individual histologic features were analyzed in lieu of INPC.
CONCLUSION Replacing INPC with individual histologic features in the COG risk classification will eliminate confounding, facilitate international harmonization of risk classification, and provide a schema for more precise prognostication and refined therapeutic approaches.
Affiliation(s)
- Elizabeth Sokol: Department of Pediatrics and Lurie Children's Hospital, Northwestern University, Chicago, IL
- Ami V Desai: Department of Pediatrics and Comer Children's Hospital, University of Chicago, Chicago, IL
- Mark A Applebaum: Department of Pediatrics and Comer Children's Hospital, University of Chicago, Chicago, IL
- Julie R Park: Department of Pediatrics, Seattle Children's Hospital, University of Washington, Seattle, Washington
- Andrew D J Pearson: Paediatric Drug Development, Children and Young People's Unit, Royal Marsden Hospital, London, United Kingdom
- Gudrun Schleiermacher: Department of Pediatric, Adolescents and Young Adults Oncology and INSERM U830, Institut Curie, Paris, France
- Meredith S Irwin: Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Ontario, Canada
- Michael Hogarty: Department of Pediatrics, University of Pennsylvania, Philadelphia, PA
- Arlene Naranjo: Department of Biostatistics, Children's Oncology Group Statistics and Data Center, University of Florida, Gainesville, FL
- Samuel Volchenboum: Department of Pediatrics and Comer Children's Hospital, University of Chicago, Chicago, IL
- Susan L Cohn: Department of Pediatrics and Comer Children's Hospital, University of Chicago, Chicago, IL
- Wendy B London: Dana-Farber/Boston Children's Cancer and Blood Disorders Center, Harvard Medical School, Boston, MA
|
10
|
Podhorska I, Vrbka J, Lazaroiu G, Kovacova M. Innovations in Financial Management: Recursive Prediction Model Based on Decision Trees. MARKETING AND MANAGEMENT OF INNOVATIONS 2020. [DOI: 10.21272/mmi.2020.3-20] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/24/2022]
Abstract
Enterprise financial distress is a topical, interdisciplinary issue for the economic community. Bankruptcy is one of the major externalities of modern economies and cannot be avoided even with every effort. Where there are investment opportunities, there are individuals and businesses willing to assume financial obligations and the resulting risks to maintain and develop their standard of living or their economic activities. The decision tree algorithm is one of the most intuitive data mining methods that can be used for financial distress prediction. A systematization of literature sources and approaches indicates that decision trees are part of the innovations in financial management. The main purpose of the research is to assess whether a decision tree algorithm can be applied to create a prediction model usable in economic practice. The paper's main aim is to create a comprehensive prediction model of enterprise financial distress based on decision trees under the conditions of emerging markets. The methods are based on decision trees, with emphasis on the CART algorithm. The emerging markets comprised 17 countries: Slovak Republic, Czech Republic, Poland, Hungary, Romania, Bulgaria, Lithuania, Latvia, Estonia, Slovenia, Croatia, Serbia, Russia, Ukraine, Belarus, Montenegro, and Macedonia. The research focuses on the feasibility of implementing a decision tree algorithm to create a prediction model under emerging-market conditions. The data covered 2,359,731 enterprises from emerging markets (30% of the total amount), divided into prosperous enterprises (1,802,027) and non-prosperous enterprises (557,704), and were obtained from the Amadeus database. The input variables were 24 financial indicators, 3 dummy variables, and the countries' GDP data for the years 2015 and 2016. For model creation, 80% of the enterprises formed the training sample and 20% the test sample. The model correctly classified 93.2% of enterprises in both the training and test samples. Correct classification of non-prosperous enterprises was 83.5% in both samples. The research yields a new model for identifying bankrupt enterprises. The resulting prediction model can be considered sufficiently suitable for classifying enterprises in emerging markets.
Keywords
prediction model, decision tree, emerging markets.
Affiliation(s)
- Jaromir Vrbka: Institute of Technology and Business in Ceske Budejovice (Czech Republic)
- George Lazaroiu: The Cognitive Labor Institute (USA); Spiru Haret University (Romania)
|
11
|
|
12
|
Utkin LV, Konstantinov AV, Chukanov VS, Kots MV, Ryabinin MA, Meldo AA. A weighted random survival forest. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.04.015] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 10/27/2022]
|
13
|
Abstract
One of the main objectives of survival analysis is to predict the failure time, which is usually treated as a continuous variable. In longitudinal studies, the data are often collected at regular intervals, for example, monthly, quarterly, or yearly. Such data require appropriate techniques to handle the discrete time values, which often carry incomplete information about the failure occurrence, so-called "censored cases." Tree-based models are common, assumption-free methods of survival prediction. In this paper, the author proposes three recursive partitioning techniques able to cope with discrete-time censored survival data, which, in contrast to already-existing models limited to univariate trees, allow splits to have the form of any hyperplane. The performance of the proposed methods, expressed as a mean absolute error, was examined on the basis of both synthetic and real data sets available in the literature and compared with existing tree-based models. To demonstrate the applicability of the methods in identifying subgroups of patients with a similar survival experience and to assess the influence of covariates on the risk of failure, a Veteran's Administration lung cancer data set was used. The results confirm the proposed models to be good prediction tools for discrete-time survival data.
|
14
|
Lee UJ, Tzeng S, Chen YC, Chen JJ. Prognostic and predictive signatures for treatment decisions. Biomark Med 2018; 12:849-859. [PMID: 30022678 DOI: 10.2217/bmm-2017-0320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
AIM We develop a subgroup selection procedure using both prognostic and predictive biomarkers to identify four patient subpopulations: low- and high-risk responders, and low- and high-risk nonresponders. METHODS We utilize three regression models to identify three sets of biomarkers: S, prognostic biomarkers; T, predictive biomarkers; and U, prognostic and predictive biomarkers. The prognostic signature C(S) combines with a predictive signature, either C(T) or C(U), to develop two procedures C(S,T) and C(S,U) for identification of four subgroups. RESULTS A simulation experiment showed that the proposed models for identifying the biomarker sets S and U performed well, as did the procedure C(S,U) for subgroup identification. CONCLUSION The proposed model provides a more comprehensive characterization of patient subpopulations, and better accuracy in patient treatment assignment.
Affiliation(s)
- Un Jung Lee: Division of Biochemical Toxicology, National Center for Toxicological Research, US FDA, 3900 NCTR Road, Jefferson, AR 72079, USA
- ShengLi Tzeng: Institute of Statistical Science, Academia Sinica, 128 Academia Road, Section 2, Nankang, Taipei 11529, Taiwan
- Yu-Chuan Chen: Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US FDA, 3900 NCTR Road, Jefferson, AR 72029, USA
- James J Chen: Department of Biostatistics, University of Arkansas for Medical Science, Little Rock, AR 72205, USA
|
15
|
Steingrimsson JA, Diao L, Strawderman RL. Censoring Unbiased Regression Trees and Ensembles. J Am Stat Assoc 2018; 114:370-383. [PMID: 31190691 DOI: 10.1080/01621459.2017.1407775] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 10/18/2022]
Abstract
This paper proposes a novel paradigm for building regression trees and ensemble learning in survival analysis. Generalizations of the CART and Random Forests algorithms for general loss functions, and in the latter case more general bootstrap procedures, are both introduced. These results, in combination with an extension of the theory of censoring unbiased transformations applicable to loss functions, underpin the development of two new classes of algorithms for constructing survival trees and survival forests: Censoring Unbiased Regression Trees and Censoring Unbiased Regression Ensembles. For a certain "doubly robust" censoring unbiased transformation of squared error loss, we further show how these new algorithms can be implemented using existing software (e.g., CART, random forests). Comparisons of these methods to existing ensemble procedures for predicting survival probabilities are provided in both simulated settings and through applications to four datasets. It is shown that these new methods either improve upon, or remain competitive with, existing implementations of random survival forests, conditional inference forests, and recursively imputed survival trees.
Affiliation(s)
- Liqun Diao: Department of Statistics and Actuarial Science, University of Waterloo, Waterloo ON, Canada
- Robert L Strawderman: Department of Biostatistics and Computational Biology, University of Rochester, Rochester NY, USA
|
16
|
Kretowska M. Piecewise-linear criterion functions in oblique survival tree induction. Artif Intell Med 2017; 75:32-39. [PMID: 28363454 DOI: 10.1016/j.artmed.2016.12.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/15/2016] [Revised: 11/07/2016] [Accepted: 12/28/2016] [Indexed: 11/17/2022]
Abstract
OBJECTIVE Recursive partitioning is a common, assumption-free method of survival data analysis. It focuses mainly on univariate trees, which use splits based on a single variable in each internal node. In this paper, I provide an extension of an oblique survival tree induction technique, in which axis-parallel splits are replaced by hyperplanes, dividing the feature space into areas with a homogeneous survival experience. METHOD AND MATERIALS The proposed tree induction algorithm consists of two steps. The first covers the induction of a large tree with internal nodes represented by hyperplanes, whose positions are calculated by the minimization of a piecewise-linear criterion function, the dipolar criterion. The other phase uses a split-complexity algorithm to prune unnecessary tree branches and a 10-fold cross-validation technique to choose the best tree. The terminal nodes of the final tree are characterised by Kaplan-Meier survival functions. A synthetic data set was used to test the performance, while seven real data sets were exploited to validate the proposed method. RESULTS The evaluation of the method was focused on two features: predictive ability and tree size. These were compared with two univariate tree models: the conditional inference tree and recursive partitioning for survival trees, respectively. The comparison of the predictive ability, expressed as an integrated Brier score, showed no statistically significant differences (p=0.486) among the three methods. Similar results were obtained for the tree size (p=0.11), which was calculated as a median value over 20 runs of a 10-fold cross-validation. CONCLUSIONS The predictive ability of trees generated using piecewise-linear criterion functions is comparable to that of univariate tree-based models. Although a similar conclusion may be drawn from the analysis of the tree size, in the majority of the studied cases, the number of nodes of the dipolar tree is one of the smallest among all the methods.
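The abstract above notes that each terminal node of the final tree is characterised by a Kaplan-Meier survival function. As a minimal sketch of that step only (not the author's implementation), the product-limit estimate for the subjects falling in one node could be computed as follows; the function name `kaplan_meier` and the toy data `node_times` and `node_events` are illustrative assumptions.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    times  -- observed follow-up time of every subject in the node
    events -- 1 if the event was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Hypothetical subjects assigned to one terminal node (toy data).
node_times = [2, 3, 3, 5, 8, 8, 9]
node_events = [1, 1, 0, 1, 0, 1, 0]
km_curve = kaplan_meier(node_times, node_events)
for t, s in km_curve:
    print(f"S({t}) = {s:.3f}")
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the estimate only changes at observed event times.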
Affiliation(s)
- Malgorzata Kretowska: Faculty of Computer Science, Bialystok University of Technology, Wiejska 45a, 15-351 Bialystok, Poland
|
17
|
|
18
|
Wright MN, Dankowski T, Ziegler A. Unbiased split variable selection for random survival forests using maximally selected rank statistics. Stat Med 2017; 36:1272-1284. [PMID: 28088842 DOI: 10.1002/sim.7212] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 02/19/2016] [Revised: 10/31/2016] [Accepted: 12/11/2016] [Indexed: 11/11/2022]
Abstract
The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
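The abstract above refers to the log-rank test statistic as the standard split criterion for random survival forests. A minimal sketch of the textbook two-sample log-rank statistic for one candidate split is given below; it illustrates the standard criterion only, not the maximally selected rank statistics or p-value approximations the paper proposes. The function name `logrank_statistic` and the toy data are assumptions made for illustration.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for a candidate split.

    times   -- observed follow-up times
    events  -- 1 if the event was observed, 0 if censored
    in_left -- True if the subject would go to the left child node
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        # Risk set sizes just before time t (overall and in the left child).
        n = sum(1 for u in times if u >= t)
        n_left = sum(1 for u, g in zip(times, in_left) if u >= t and g)
        # Events occurring exactly at time t (overall and in the left child).
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d_left = sum(
            1 for u, e, g in zip(times, events, in_left)
            if u == t and e == 1 and g
        )
        obs_minus_exp += d_left - d * n_left / n
        if n > 1:
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


# Toy data: the left child holds the three earliest failures.
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]
toy_left = [True, True, True, False, False, False]
split_stat = logrank_statistic(toy_times, toy_events, toy_left)
print(round(split_stat, 4))
```

A tree-growing routine would typically evaluate this statistic over every candidate split point of every candidate variable and keep the maximum, which is the source of the selection bias toward variables with many possible split points that the abstract describes.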
Affiliation(s)
- Marvin N Wright: Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
- Theresa Dankowski: Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
- Andreas Ziegler: Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany; Zentrum für Klinische Studien, Universität zu Lübeck, Lübeck, Germany; School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, South Africa; Deutsches Zentrum für Herz-Kreislauf-Forschung, Standort Hamburg/Kiel/Lübeck, Lübeck, Germany
|
19
|
Shimokawa A, Narita Y, Shibui S, Miyaoka E. Tree Based Method for Aggregate Survival Data Modeling. Int J Biostat 2016; 12. [PMID: 26882561 DOI: 10.1515/ijb-2015-0071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
In many scenarios, a patient in medical research is treated as a statistical unit. However, in some scenarios, we are interested in treating aggregate data as a statistical unit. In such situations, each set of aggregated data is considered to be a concept in a symbolic representation, and each concept has a hyperrectangle or multiple points in the variable space. To construct a tree-structured model from these aggregate survival data, we propose a new approach, where a datum can be included in several terminal nodes in a tree. By constructing a model under this condition, we expect to obtain a more flexible model while retaining the interpretive ease of a hierarchical structure. In this approach, the survival function of concepts that are partially included in a node is constructed using the Kaplan-Meier method, where the number of events and risks at each time point is replaced by the expectation value of the number of individual descriptions of concepts. We present an application of this proposed model using primary brain tumor patient data. As a result, we obtained a new interpretation of the data in comparison to the classical survival tree modeling methods.
|
20
|
Devlin SM, Ostrovnaya I, Gönen M. Boomerang: A method for recursive reclassification. Biometrics 2016; 72:995-1002. [PMID: 26754051 PMCID: PMC4940305 DOI: 10.1111/biom.12469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2015] [Revised: 11/01/2015] [Accepted: 11/01/2015] [Indexed: 11/29/2022]
Abstract
While there are many validated prognostic classifiers used in practice, often their accuracy is modest and heterogeneity in clinical outcomes exists in one or more risk subgroups. Newly available markers, such as genomic mutations, may be used to improve the accuracy of an existing classifier by reclassifying patients from a heterogenous group into a higher or lower risk category. The statistical tools typically applied to develop the initial classifiers are not easily adapted toward this reclassification goal. In this article, we develop a new method designed to refine an existing prognostic classifier by incorporating new markers. The two-stage algorithm called Boomerang first searches for modifications of the existing classifier that increase the overall predictive accuracy and then merges to a prespecified number of risk groups. Resampling techniques are proposed to assess the improvement in predictive accuracy when an independent validation data set is not available. The performance of the algorithm is assessed under various simulation scenarios where the marker frequency, degree of censoring, and total sample size are varied. The results suggest that the method selects few false positive markers and is able to improve the predictive accuracy of the classifier in many settings. Lastly, the method is illustrated on an acute myeloid leukemia data set where a new refined classifier incorporates four new mutations into the existing three category classifier and is validated on an independent data set.
Affiliation(s)
- Sean M Devlin
  - Memorial Sloan Kettering Cancer Center, New York, New York 10065, U.S.A.
- Irina Ostrovnaya
  - Memorial Sloan Kettering Cancer Center, New York, New York 10065, U.S.A.
- Mithat Gönen
  - Memorial Sloan Kettering Cancer Center, New York, New York 10065, U.S.A.
|
21
|
Mbogning C, Broët P. Bagging survival tree procedure for variable selection and prediction in the presence of nonsusceptible patients. BMC Bioinformatics 2016; 17:230. [PMID: 27266372 PMCID: PMC4895817 DOI: 10.1186/s12859-016-1090-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 12/22/2015] [Accepted: 05/21/2016] [Indexed: 12/15/2022] Open
Abstract
Background: For clinical genomic studies with high-dimensional datasets, tree-based ensemble methods offer a powerful solution for variable selection and prediction, taking into account the complex interrelationships between explanatory variables. One of the key components of the tree-building process is the splitting criterion. For survival data, the classical splitting criterion is the log-rank statistic. However, the presence of a fraction of nonsusceptible patients in the studied population calls for a criterion tailored to this particular situation. Results: We propose a bagging survival tree procedure for variable selection and prediction where the survival tree-building process relies on a splitting criterion that explicitly focuses on the time-to-event survival distribution among susceptible patients. A simulation study shows that our method achieves good performance for variable selection and prediction. Different criteria for evaluating the importance of the explanatory variables and the prediction performance are reported. Our procedure is illustrated on a genomic dataset with gene expression measurements from early breast cancer patients. Conclusions: In the presence of nonsusceptible patients among the studied population, our procedure represents an efficient way to select event-related explanatory covariates with potential higher-order interactions and to identify homogeneous groups of susceptible patients.
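For readers unfamiliar with the classical criterion mentioned in this abstract, the following is a minimal sketch of a two-sample log-rank statistic used to score one candidate split. It is not the susceptible-patient criterion the authors propose; the function name `logrank_statistic` and its arguments are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def logrank_statistic(time, event, group):
    """Two-sample log-rank chi-square statistic (1 df) for a candidate split.

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if right-censored
    group : boolean array, True for subjects sent to the left child node
    (All names are hypothetical; this is the classical criterion only.)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group, dtype=bool)

    observed_minus_expected = 0.0
    variance = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                    # total at risk just before t
        n1 = (at_risk & group).sum()         # at risk in the left child
        dying = (time == t) & (event == 1)
        d = dying.sum()                      # events at t, both children
        d1 = (dying & group).sum()           # events at t, left child
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance
```

In a tree-growing loop, a statistic of this kind would be evaluated for every candidate cut point and the split with the largest value retained.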
Affiliation(s)
- Cyprien Mbogning
  - Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM, 14-16 Avenue Paul-Vaillant Couturier, Villejuif, 94807, France; Abirisk consortium WP4, 14-16 Avenue Paul-Vaillant Couturier, Villejuif, 94807, France
- Philippe Broët
  - Université Paris-Saclay, Univ. Paris-Sud, UVSQ, CESP, INSERM, 14-16 Avenue Paul-Vaillant Couturier, Villejuif, 94807, France; Abirisk consortium WP4, 14-16 Avenue Paul-Vaillant Couturier, Villejuif, 94807, France; Faculty of Medicine, Univ. Paris-Sud, 63 Rue Gabriel Péri, Le Kremlin-Bicêtre, 94276, France; Assistance Publique - Hôpitaux de Paris, Hôpital Paul Brousse, 14-16 Avenue Paul-Vaillant Couturier, Villejuif, 94807, France
|
22
|
Chen Y, Chen JJ. Ensemble survival trees for identifying subpopulations in personalized medicine. Biom J 2016; 58:1151-63. [DOI: 10.1002/bimj.201500075] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/06/2015] [Revised: 12/03/2015] [Accepted: 01/18/2016] [Indexed: 11/08/2022]
Affiliation(s)
- Yu-Chuan Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
- James J. Chen
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, Jefferson, AR 72079, USA
|
23
|
Dazard JE, Choe M, LeBlanc M, Rao JS. Cross-validation and Peeling Strategies for Survival Bump Hunting using Recursive Peeling Methods. Stat Anal Data Min 2016; 9:12-42. [PMID: 27034730 DOI: 10.1002/sam.11301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
We introduce a framework to build a survival/risk bump hunting model with a censored time-to-event response. Our Survival Bump Hunting (SBH) method is based on a recursive peeling procedure that uses a specific survival peeling criterion derived from non/semi-parametric statistics such as the hazards-ratio, the log-rank test or the Nelson--Aalen estimator. To optimize the tuning parameter of the model and validate it, we introduce an objective function based on survival or prediction-error statistics, such as the log-rank test and the concordance error rate. We also describe two alternative cross-validation techniques adapted to the joint task of decision-rule making by recursive peeling and survival estimation. Numerical analyses show the importance of replicated cross-validation and the differences between criteria and techniques in both low and high-dimensional settings. Although several non-parametric survival models exist, none addresses the problem of directly identifying local extrema. We show how SBH efficiently estimates extreme survival/risk subgroups unlike other models. This provides an insight into the behavior of commonly used models and suggests alternatives to be adopted in practice. Finally, our SBH framework was applied to a clinical dataset. In it, we identified subsets of patients characterized by clinical and demographic covariates with a distinct extreme survival outcome, for which tailored medical interventions could be made. An R package PRIMsrc (Patient Rule Induction Method in Survival, Regression and Classification settings) is available on CRAN (Comprehensive R Archive Network) and GitHub.
Affiliation(s)
- Jean-Eudes Dazard
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
- Michael Choe
  - Center for Proteomics and Bioinformatics, Case Western Reserve University, Cleveland, OH 44106, USA
- Michael LeBlanc
  - Department of Biostatistics, School of Public Health, University of Washington, Seattle, WA 98195, USA; Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- J Sunil Rao
  - Division of Biostatistics, Department of Epidemiology and Public Health, The University of Miami, Miami, FL 33136, USA
|
24
|
Steingrimsson JA, Diao L, Molinaro AM, Strawderman RL. Doubly robust survival trees. Stat Med 2016; 35:3595-612. [PMID: 27037609 DOI: 10.1002/sim.6949] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 09/05/2015] [Revised: 02/06/2016] [Accepted: 03/01/2016] [Indexed: 11/09/2022]
Abstract
Estimating a patient's mortality risk is important in making treatment decisions. Survival trees are a useful tool and employ recursive partitioning to separate patients into different risk groups. Existing 'loss based' recursive partitioning procedures that would be used in the absence of censoring have previously been extended to the setting of right censored outcomes using inverse probability censoring weighted estimators of loss functions. In this paper, we propose new 'doubly robust' extensions of these loss estimators motivated by semiparametric efficiency theory for missing data that better utilize available data. Simulations and a data analysis demonstrate strong performance of the doubly robust survival trees compared with previously used methods.
Affiliation(s)
- Jon Arni Steingrimsson
  - Department of Biostatistics, Johns Hopkins University, Baltimore, 14853, MD, 21205 U.S.A
- Liqun Diao
  - Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, N2L 3G1, ON, CANADA
- Annette M Molinaro
  - Department of Neurological Surgery, University of California, San Francisco, 94143-0372, CA, U.S.A
- Robert L Strawderman
  - Department of Biostatistics and Computational Biology, University of Rochester, Rochester, 14642, NY, U.S.A
|
25
|
Abstract
We compare splitting methods for constructing survival trees that are used as a model of survival time based on covariates. A number of splitting criteria on the classification and regression tree (CART) have been proposed by various authors, and we compare nine criteria through simulations. Comparative studies have been restricted to criteria that suppose the survival model for each terminal node in the final tree as a non-parametric model. As the main results, the criteria using the exponential log-likelihood loss, log-rank test statistics, the deviance residual under the proportional hazard model, or square error of martingale residual are recommended when it appears that the data have constant hazard with the passage of time. On the other hand, when the data are thought to have decreasing hazard with passage of time, the criterion using the two-sample test statistic, or square error of deviance residual would be optimal. Moreover, when the data are thought to have increasing hazard with the passage of time, the criterion using the exponential log-likelihood loss, or impurity that combines observed times and the proportion of censored observations would be the best. We also present the results of an actual medical research to show the utility of survival trees.
|
26
|
Rancoita PM, Zaffalon M, Zucca E, Bertoni F, de Campos CP. Bayesian network data imputation with application to survival tree analysis. Comput Stat Data Anal 2016. [DOI: 10.1016/j.csda.2014.12.008] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 11/25/2022]
|
27
|
Chen JJ, Lu TP, Chen YC, Lin WJ. Predictive biomarkers for treatment selection: statistical considerations. Biomark Med 2015; 9:1121-35. [DOI: 10.2217/bmm.15.84] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/16/2022] Open
Abstract
Predictive biomarkers are developed for treatment selection to identify patients who are likely to benefit from a particular therapy. This review describes statistical methods and discusses issues in the development of predictive biomarkers to enhance study efficiency for detection of treatment effect on the selected responder patients in clinical studies. The statistical procedure for treatment selection consists of three components: biomarker identification, subgroup selection and clinical utility assessment. Major statistical issues discussed include biomarker designs, procedures to identify predictive biomarkers, classification models for subgroup selection, subgroup analysis and multiple testing for clinical utility assessment and evaluation.
Affiliation(s)
- James J Chen
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR 72079, USA
  - Graduate Institute of Biostatistics, China Medical University, Taichung, Taiwan
- Tzu-Pin Lu
  - Department of Public Health, Institute of Epidemiology & Preventive Medicine, National Taiwan University, Taipei, Taiwan
- Yu-Chuan Chen
  - Division of Bioinformatics & Biostatistics, National Center for Toxicological Research, US Food & Drug Administration, Jefferson, AR 72079, USA
- Wei-Jiun Lin
  - Department of Applied Mathematics, Feng Chia University, Taichung, Taiwan
|
28
|
Zhou Y, McArdle JJ. Rationale and Applications of Survival Tree and Survival Ensemble Methods. Psychometrika 2015; 80:811-833. [PMID: 25228495 PMCID: PMC4409541 DOI: 10.1007/s11336-014-9413-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 10/21/2013] [Indexed: 06/03/2023]
Abstract
Classification and Regression Trees (CART), and their successors, bagging and random forests, are statistical learning tools that are receiving increasing attention. However, due to characteristics of censored data collection, standard CART algorithms are not immediately transferable to the context of survival analysis. Questions about the occurrence and timing of events arise throughout psychological and behavioral sciences, especially in longitudinal studies. The prediction power and other key features of tree-based methods are promising in studies where an event occurrence is the outcome of interest. This article reviews existing tree algorithms designed specifically for censored responses as well as recently developed survival ensemble methods, and introduces available computer software. Through simulations and a practical example, merits and limitations of these methods are discussed. Suggestions are provided for practical use.
Affiliation(s)
- Yan Zhou
  - Mary S. Easton Center for Alzheimer's Disease Research, Department of Neurology, University of California, Los Angeles, 10911 Weyburn Avenue, Suite 200, Los Angeles, CA, 90095, USA
|
29
|
Shimokawa A, Kawasaki Y, Miyaoka E. A comparative study on splitting criteria of a survival tree based on the Cox proportional model. J Biopharm Stat 2015; 26:386-401. [PMID: 26043356 DOI: 10.1080/10543406.2015.1052485] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/23/2022]
Abstract
We consider situations in which the effect of covariates on the hazard differs across subgroups of patients. To handle such situations, we consider a hybrid of the Cox model and a tree-structured model. Through simulation studies, we compared several splitting criteria for constructing this hybrid model. The criterion based on the improvement in the negative maximum partial log-likelihood obtained by splitting performed well in many situations. We also present the results obtained by applying this tree model in an actual medical research study to show its utility.
Affiliation(s)
- Asanao Shimokawa
  - Graduate School of Science, Tokyo University of Science, Tokyo, Japan
- Yohei Kawasaki
  - Department of Mathematics, Tokyo University of Science, Tokyo, Japan
- Etsuo Miyaoka
  - Department of Mathematics, Tokyo University of Science, Tokyo, Japan
|
30
|
|
31
|
Wallace ML. Time-dependent tree-structured survival analysis with unbiased variable selection through permutation tests. Stat Med 2014; 33:4790-804. [PMID: 25043382 DOI: 10.1002/sim.6261] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 08/24/2013] [Revised: 06/01/2014] [Accepted: 06/13/2014] [Indexed: 11/06/2022]
Abstract
Incorporating time-dependent covariates into tree-structured survival analysis (TSSA) may result in more accurate prognostic models than if only baseline values are used. Available time-dependent TSSA methods exhaustively test every binary split on every covariate; however, this approach may result in selection bias toward covariates with more observed values. We present a method that uses unbiased significance levels from newly proposed permutation tests to select the time-dependent or baseline covariate with the strongest relationship with the survival outcome. The specific splitting value is identified using only the selected covariate. Simulation results show that the proposed time-dependent TSSA method produces tree models of equal or greater accuracy as compared to baseline TSSA models, even with high censoring rates and large within-subject variability in the time-dependent covariate. To illustrate, the proposed method is applied to data from a cohort of bipolar youths to identify subgroups at risk for self-injurious behavior.
Affiliation(s)
- M L Wallace
  - Departments of Psychiatry and Statistics, University of Pittsburgh, Pittsburgh, PA, U.S.A
|
32
|
Affiliation(s)
- Wei-Yin Loh
  - Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
|
33
|
Garg L, McClean S, Barton M, Meenan B, Fullerton K. An Extended Mixture Distribution Survival Tree for Patient Pathway Prognostication. Commun Stat Theory Methods 2013. [DOI: 10.1080/03610926.2012.725262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
34
|
Mogensen UB, Gerds TA. A random forest approach for competing risks based on pseudo-values. Stat Med 2013; 32:3102-14. [PMID: 23508720 DOI: 10.1002/sim.5775] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 11/30/2011] [Accepted: 01/30/2013] [Indexed: 11/08/2022]
Abstract
Random forest is a supervised learning method that combines many classification or regression trees for prediction. Here we describe an extension of the random forest method for building event risk prediction models in survival analysis with competing risks. In case of right-censored data, the event status at the prediction horizon is unknown for some subjects. We propose to replace the censored event status by a jackknife pseudo-value, and then to apply an implementation of random forests for uncensored data. Because the pseudo-responses take on values on a continuous scale, the node variance is chosen as split criterion for growing regression trees. In a simulation study, the pseudo split criterion is compared with the Gini split criterion when the latter is applied to the uncensored event status. To investigate the resulting pseudo random forest method for building risk prediction models, we analyze it in a simulation study of predictive performance where we compare it to Cox regression and random survival forest. The method is further illustrated in two real data sets.
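The jackknife pseudo-value idea in this abstract can be sketched as follows. This is a simplified illustration under a stated assumption: it handles a single event type with the Kaplan-Meier estimator, whereas the paper addresses competing risks. The function names `km_event_probability` and `jackknife_pseudo_values` are hypothetical.

```python
import numpy as np

def km_event_probability(time, event, horizon):
    """Kaplan-Meier estimate of P(T <= horizon) from right-censored data."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    surv = 1.0
    # Multiply the conditional survival factors at each event time <= horizon.
    for t in np.unique(time[(event == 1) & (time <= horizon)]):
        n_at_risk = (time >= t).sum()
        n_events = ((time == t) & (event == 1)).sum()
        surv *= 1.0 - n_events / n_at_risk
    return 1.0 - surv

def jackknife_pseudo_values(time, event, horizon):
    """Pseudo-value for subject i: n * theta_hat - (n - 1) * theta_hat_without_i.

    Assumption: a single event type (no competing risks), so theta_hat is
    one minus the Kaplan-Meier survival estimate at the horizon.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    full = km_event_probability(time, event, horizon)
    pseudo = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        leave_one_out = km_event_probability(time[keep], event[keep], horizon)
        pseudo[i] = n * full - (n - 1) * leave_one_out
    return pseudo
```

The resulting continuous pseudo-responses could then be passed to any regression forest for uncensored data, which is why a variance-based split criterion applies. Without censoring, each pseudo-value reduces to the subject's observed event indicator at the horizon.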
Affiliation(s)
- Ulla B Mogensen
  - Department of Biostatistics, University of Copenhagen, Denmark.
|
35
|
Satoh K, Tanaka M, Yano A, Ying J, Kakuma T. Treatment when prognostic factors do not match St. Gallen recommendations: profiling of prognostic factors among HR(+) and HER2(-) breast cancer patients. World J Surg 2012; 37:516-24. [PMID: 23229849 DOI: 10.1007/s00268-012-1881-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
BACKGROUND The St. Gallen consensus provides treatment recommendations for breast cancer based on prognostic factors. Although many patients' prognostic patterns are not easily matched with the prognostic patterns listed in the St. Gallen consensus, there has been no systematic investigation reporting the gap between treatment recommendations and actual postoperative treatment choices in clinical practice. METHODS Four hundred seventy-one patients with hormone receptor-positive [HR(+)] and human epidermal growth factor receptor type 2-negative [HER2(-)] breast cancer were analyzed. These patients were classified into either the "crisp treatment group" or "fuzzy treatment group" based on the definitiveness of postoperative treatment selection based on St. Gallen treatment recommendations. The patients in the fuzzy treatment group were further classified into strata in which patients within each stratum shared the same prognostic factor patterns with similar recurrence rates. RESULTS A total of 87.3% of HR(+)HER2(-) patients were designated to the fuzzy treatment group. Four prognostic strata were constructed according to the survival tree model, and revealed that patients with poor prognostic profiles tended to receive endocrine therapy with chemotherapy. This suggests that postoperative chemotherapy is useful, although there was no statistical significance. CONCLUSIONS We constructed prognostic profiles of patients in the fuzzy treatment group and examined the recurrence rates associated with two treatment regimens within each prognostic profile. These findings are exploratory, but they may be useful for planning prospective studies of the effectiveness of postoperative treatment regimens among patients with a heterogeneous combination of prognostic factors.
Affiliation(s)
- Kyoko Satoh
  - Graduate School of Medicine, Kurume University, 67 Asahi-machi, Kurume City, Fukuoka, 830-0011, Japan
|
36
|
Garg L, McClean SI, Barton M, Meenan BJ, Fullerton K. Intelligent Patient Management and Resource Planning for Complex, Heterogeneous, and Stochastic Healthcare Systems. IEEE Trans Syst Man Cybern A Syst Hum 2012. [DOI: 10.1109/tsmca.2012.2210211] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 11/08/2022]
|
37
|
Wang Y, Ziedins I, Holmes M, Challands N. Tree models for difference and change detection in a complex environment. Ann Appl Stat 2012. [DOI: 10.1214/12-aoas548] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/19/2022]
|
38
|
Rigat F, Mira A. Parallel hierarchical sampling: A general-purpose interacting Markov chains Monte Carlo algorithm. Comput Stat Data Anal 2012. [DOI: 10.1016/j.csda.2011.11.020] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/25/2022]
|
39
|
Nunn ME, Fan J, Su X, Levine RA, Lee HJ, McGuire MK. Development of prognostic indicators using classification and regression trees for survival. Periodontol 2000 2012; 58:134-42. [PMID: 22133372 DOI: 10.1111/j.1600-0757.2011.00421.x] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 01/06/2023]
|
40
|
Bou-Hamad I, Larocque D, Ben-Ameur H. Discrete-time survival trees and forests with time-varying covariates. Stat Modelling 2011. [DOI: 10.1177/1471082x1001100503] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/16/2022]
Abstract
The aim of this paper is to propose a new survival tree method for discrete-time survival data with time-varying covariates. This method can simultaneously accommodate time-varying covariates and time-varying effects. The method is then used for bankruptcy analysis of US firms that conducted an initial public offering between 1990 and 1999, using accounting and financial ratios.
Affiliation(s)
- Imad Bou-Hamad
  - Department of Business Information and Decision Systems, Olayan School of Business, American University of Beirut, Lebanon
|
41
|
London WB, Castel V, Monclair T, Ambros PF, Pearson ADJ, Cohn SL, Berthold F, Nakagawara A, Ladenstein RL, Iehara T, Matthay KK. Clinical and biologic features predictive of survival after relapse of neuroblastoma: a report from the International Neuroblastoma Risk Group project. J Clin Oncol 2011; 29:3286-92. [PMID: 21768459 DOI: 10.1200/jco.2010.34.3392] [Citation(s) in RCA: 222] [Impact Index Per Article: 17.1] [Indexed: 11/20/2022] Open
Abstract
PURPOSE Survival after neuroblastoma relapse is poor. Understanding the relationship between clinical and biologic features and outcome after relapse may help in selection of optimal therapy. Our aim was to determine which factors were significantly predictive of postrelapse overall survival (OS) in patients with recurrent neuroblastoma--particularly whether time from diagnosis to first relapse (TTFR) was a significant predictor of OS. PATIENTS AND METHODS Patients with first relapse/progression were identified in the International Neuroblastoma Risk Group (INRG) database. Time from study enrollment until first event and OS time starting from first event were calculated. Cox regression models were used to calculate the hazard ratio of increased death risk and perform survival tree regression. TTFR was tested in a multivariable Cox model with other factors. RESULTS In the INRG database (N = 8,800), 2,266 patients experienced first progression/relapse. Median time to relapse was 13.2 months (range, 1 day to 11.4 years). Five-year OS from time of first event was 20% (SE, ± 1%). TTFR was statistically significantly associated with OS time in a nonlinear relationship; patients with TTFR of 36 months or longer had the lowest risk of death, followed by patients who relapsed in the period of 0 to less than 6 months or 18 to 36 months. Patients who relapsed between 6 and 18 months after diagnosis had the highest risk of death. TTFR, age, International Neuroblastoma Staging System stage, and MYCN copy number status were independently predictive of postrelapse OS in multivariable analysis. CONCLUSION Age, stage, MYCN status, and TTFR are significant prognostic factors for postrelapse survival and may help in the design of clinical trials evaluating novel agents.
Affiliation(s)
- Wendy B London
  - Children's Oncology Group Statistics and Data Center and Dana-Farber Children's Hospital Cancer Center, Boston, MA, USA
|
42
|
Su X, Meneses K, McNees P, Johnson WO. Interaction trees: exploring the differential effects of an intervention programme for breast cancer survivors. J R Stat Soc Ser C Appl Stat 2011. [DOI: 10.1111/j.1467-9876.2010.00754.x] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
|
43
|
|
44
|
Wallace ML, Anderson SJ, Mazumdar S. A stochastic multiple imputation algorithm for missing covariate data in tree-structured survival analysis. Stat Med 2010; 29:3004-16. [PMID: 20963751 DOI: 10.1002/sim.4079] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/12/2009] [Accepted: 08/02/2010] [Indexed: 11/06/2022]
Abstract
Missing covariate data present a challenge to tree-structured methodology due to the fact that a single tree model, as opposed to an estimated parameter value, may be desired for use in a clinical setting. To address this problem, we suggest a multiple imputation algorithm that adds draws of stochastic error to a tree-based single imputation method presented by Conversano and Siciliano (Technical Report, University of Naples, 2003). Unlike previously proposed techniques for accommodating missing covariate data in tree-structured analyses, our methodology allows the modeling of complex and nonlinear covariate structures while still resulting in a single tree model. We perform a simulation study to evaluate our stochastic multiple imputation algorithm when covariate data are missing at random and compare it to other currently used methods. Our algorithm is advantageous for identifying the true underlying covariate structure when complex data and larger percentages of missing covariate observations are present. It is competitive with other current methods with respect to prediction accuracy. To illustrate our algorithm, we create a tree-structured survival model for predicting time to treatment response in older, depressed adults.
Affiliation(s)
- Meredith L Wallace
  - University of Pittsburgh School of Medicine, Department of Psychiatry, Western Psychiatric Institute and Clinic, Pittsburgh, PA, USA.
|
45
|
Bou-hamad I, Larocque D, Ben-Ameur H, Mâsse LC, Vitaro F, Tremblay RE. Discrete-time survival trees. Can J Stat 2009. [DOI: 10.1002/cjs.10007] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
|
46
|
Multivariate Exponential Survival Trees And Their Application to Tooth Prognosis. Comput Stat Data Anal 2009; 53:1110-1121. [PMID: 21709804 DOI: 10.1016/j.csda.2008.10.019] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
Abstract
This paper is concerned with developing rules for assignment of tooth prognosis based on actual tooth loss in the VA Dental Longitudinal Study. It is also of interest to rank the relative importance of various clinical factors for tooth loss. A multivariate survival tree procedure is proposed. The procedure is built on a parametric exponential frailty model, which leads to greater computational efficiency. We adopted the goodness-of-split pruning algorithm of LeBlanc and Crowley (1993) to determine the best tree size. In addition, the variable importance method is extended to trees grown by goodness-of-fit using an algorithm similar to the random forest procedure in Breiman (2001). Simulation studies for assessing the proposed tree and variable importance methods are presented. To limit the final number of meaningful prognostic groups, an amalgamation algorithm is employed to merge terminal nodes that are homogenous in tooth survival. The resulting prognosis rules and variable importance rankings seem to offer simple yet clear and insightful interpretations.
|
47
|
Cohn SL, Pearson ADJ, London WB, Monclair T, Ambros PF, Brodeur GM, Faldum A, Hero B, Iehara T, Machin D, Mosseri V, Simon T, Garaventa A, Castel V, Matthay KK. The International Neuroblastoma Risk Group (INRG) classification system: an INRG Task Force report. J Clin Oncol 2008; 27:289-97. [PMID: 19047291 DOI: 10.1200/jco.2008.16.6785] [Citation(s) in RCA: 1221] [Impact Index Per Article: 76.3] [Indexed: 01/16/2023] Open
Abstract
PURPOSE Because current approaches to risk classification and treatment stratification for children with neuroblastoma (NB) vary greatly throughout the world, it is difficult to directly compare risk-based clinical trials. The International Neuroblastoma Risk Group (INRG) classification system was developed to establish a consensus approach for pretreatment risk stratification. PATIENTS AND METHODS The statistical and clinical significance of 13 potential prognostic factors were analyzed in a cohort of 8,800 children diagnosed with NB between 1990 and 2002 from North America and Australia (Children's Oncology Group), Europe (International Society of Pediatric Oncology Europe Neuroblastoma Group and German Pediatric Oncology and Hematology Group), and Japan. Survival tree regression analyses using event-free survival (EFS) as the primary end point were performed to test the prognostic significance of the 13 factors. RESULTS Stage, age, histologic category, grade of tumor differentiation, the status of the MYCN oncogene, chromosome 11q status, and DNA ploidy were the most highly statistically significant and clinically relevant factors. A new staging system (INRG Staging System) based on clinical criteria and tumor imaging was developed for the INRG Classification System. The optimal age cutoff was determined to be between 15 and 19 months, and 18 months was selected for the classification system. Sixteen pretreatment groups were defined on the basis of clinical criteria and statistically significantly different EFS of the cohort stratified by the INRG criteria. Patients with 5-year EFS more than 85%, more than 75% to ≤ 85%, ≥ 50% to ≤ 75%, or less than 50% were classified as very low risk, low risk, intermediate risk, or high risk, respectively. CONCLUSION By defining homogeneous pretreatment patient cohorts, the INRG classification system will greatly facilitate the comparison of risk-based clinical trials conducted in different regions of the world and the development of international collaborative studies.
Affiliation(s)
- Susan L Cohn
- Department of Pediatrics, The University of Chicago, Chicago, IL 60637, USA.
|
48
|
Tsai CA, Chen DT, Chen JJ, Balch CM, Thompson JF, Soong SJ. An Integrated Tree-Based Classification Approach to Prognostic Grouping with Application to Localized Melanoma Patients. J Biopharm Stat 2007; 17:445-60. [PMID: 17479393 DOI: 10.1080/10543400701199585] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
Abstract
We propose an integrated tree-based approach for prognostic grouping of localized melanoma patients. This approach combines a survival tree model with agglomerative hierarchical clustering to merge terminal subgroups with similar prognoses. The Brier score is used to evaluate goodness of fit, and k-fold cross-validation is used to evaluate the reproducibility of the scheme for prediction. The proposed approach is applied to an American Joint Committee on Cancer (AJCC) localized melanoma data set and compared with the current AJCC staging system. This approach performs more efficiently than the standard tree methods and improves on the current AJCC melanoma staging system.
Affiliation(s)
- Chen-An Tsai
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
|
49
|
Gao F, Manatunga AK, Chen S. Non-parametric estimation for baseline hazards function and covariate effects with time-dependent covariates. Stat Med 2007; 26:857-68. [PMID: 16685705 DOI: 10.1002/sim.2574] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
In many biomedical and epidemiologic studies, estimating the hazard function is of interest. The Breslow estimator is commonly used to estimate the integrated baseline hazard, but it requires the functional form of covariate effects to be correctly specified. It is generally difficult to identify the true functional form of covariate effects in the presence of time-dependent covariates. To complement the traditional proportional hazards model, we propose a tree-type method that simultaneously estimates both the baseline hazard function and the effects of time-dependent covariates. Our interest is focused on exploring potential data structures rather than on formal hypothesis testing. The proposed method approximates the baseline hazards and covariate effects with step functions. The jump points in time and in covariate space are searched via an algorithm based on improvement of the full log-likelihood function. In contrast to most other estimation methods, the proposed method estimates the hazard function rather than the integrated hazard. The method is applied to model the risk of withdrawal in a clinical trial that evaluates the anti-depression treatment in preventing the development of clinical depression. Finally, the performance of the method is evaluated by several simulation studies.
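As a rough illustration of approximating a hazard with a step function, the sketch below computes the standard occurrence/exposure estimate (events divided by person-time) on fixed time intervals. It is not the authors' algorithm: the cut points are assumed to be given rather than searched via the log-likelihood, covariates are ignored, and the function name `piecewise_constant_hazard` is hypothetical.

```python
from typing import List, Sequence


def piecewise_constant_hazard(
    times: Sequence[float],
    events: Sequence[int],
    cut_points: Sequence[float],
) -> List[float]:
    """Step-function hazard estimate on intervals (c_j, c_{j+1}].

    times      -- observed follow-up time per subject (event or censoring)
    events     -- 1 if the subject had the event, 0 if right-censored
    cut_points -- increasing interval edges, assumed fixed in advance
    """
    edges = list(cut_points)
    hazards = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Person-time each subject spends at risk inside the interval.
        exposure = sum(max(0.0, min(t, hi) - lo) for t in times)
        # Events (not censorings) whose time falls inside the interval.
        n_events = sum(
            1 for t, d in zip(times, events) if d == 1 and lo < t <= hi
        )
        hazards.append(n_events / exposure if exposure > 0 else float("nan"))
    return hazards


if __name__ == "__main__":
    # Four subjects; the second is censored at time 2.
    print(piecewise_constant_hazard([1, 2, 3, 4], [1, 0, 1, 1], [0, 2, 4]))
```

With fixed cut points, each interval's ratio is the maximum-likelihood estimate under a piecewise-constant hazard; a tree-type search would additionally choose where the jumps go.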
Affiliation(s)
- Feng Gao
- Division of Biostatistics, Washington University School of Medicine, Campus Box 8067, 660 S. Euclid Ave., St Louis, MO 63110, USA
|
50
|
Berrar D, Sturgeon B, Bradbury I, Downes CS, Dubitzky W. Survival trees for analyzing clinical outcome in lung adenocarcinomas based on gene expression profiles: identification of neogenin and diacylglycerol kinase alpha expression as critical factors. J Comput Biol 2005; 12:534-44. [PMID: 15952876 DOI: 10.1089/cmb.2005.12.534] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
We present survival trees as an exploratory tool for revealing new insights into gene expression profiles in combination with clinical patient data. Survival trees partition the patient data studied into groups with similar survival outcomes and identify characteristic genetic profiles within these groups. We demonstrate the application of survival trees in a study involving the expression profiles of 3,588 genes in 211 lung adenocarcinoma patients. The survival tree identified a group of early-stage cancer patients with relatively low survival rates and another group of advanced-stage patients with remarkably good survival outcomes. For both groups, the tree identified characteristic expression profiles of genes that might play a role in carcinogenesis and disease progression, notably the genes for the netrin receptor neogenin and the Ras/Rho kinase modulator diacylglycerol kinase alpha.
Affiliation(s)
- Daniel Berrar
- Bioinformatics Research Group, School of Biomedical Sciences, Faculty of Life and Health Sciences, University of Ulster, Cromore Road, BT52 1SA, Coleraine, Northern Ireland.
|