1
|
Sun Z, Tao Y, Li S, Ferguson KK, Meeker JD, Park SK, Batterman SA, Mukherjee B. Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons. Environ Health 2013; 12:85. [PMID: 24093917 PMCID: PMC3857674 DOI: 10.1186/1476-069x-12-85] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Received: 03/15/2013] [Accepted: 10/02/2013] [Indexed: 05/19/2023]
Abstract
BACKGROUND As public awareness of the consequences of environmental exposures has grown, estimating the adverse health effects due to simultaneous exposure to multiple pollutants has become an important topic to explore. The challenges of evaluating the health impacts of environmental factors in a multipollutant model include, but are not limited to: identification of the most critical components of the pollutant mixture, examination of potential interaction effects, and attribution of health effects to individual pollutants in the presence of multicollinearity. METHODS In this paper, we reviewed five methods available in the statistical literature that are potentially helpful for constructing multipollutant models. We conducted a simulation study and presented two data examples to assess the performance of these methods on feature selection, effect estimation and interaction identification using both cross-sectional and time-series designs. We also proposed and evaluated a two-step strategy employing an initial screening by a tree-based method followed by further dimension reduction/variable selection by the aforementioned five approaches at the second step. RESULTS Among the five methods, least absolute shrinkage and selection operator regression performs well in general for identifying important exposures, but will yield biased estimates and slightly larger model dimension given many correlated candidate exposures and modest sample size. Bayesian model averaging and supervised principal component analysis are also useful in variable selection when there is a moderately strong exposure-response association. Substantial improvements in reducing model dimension and identifying important variables have been observed for all five statistical methods using the two-step modeling strategy when the number of candidate variables is large. CONCLUSIONS There is no uniform dominance of one method across all simulation scenarios and all criteria. The performances differ according to the nature of the response variable, the sample size, the number of pollutants involved, and the strength of exposure-response association/interaction. However, the two-step modeling strategy proposed here is potentially applicable under a multipollutant framework with many covariates by taking advantage of both the screening feature of an initial tree-based method and the dimension reduction/variable selection property of the subsequent method. The choice of the method should also depend on the goal of the study: risk prediction, effect estimation or screening for important predictors and their interactions.
|
Research Support, N.I.H., Extramural |
12 |
110 |
2
|
Henrard S, Speybroeck N, Hermans C. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia. Haemophilia 2015; 21:715-22. [PMID: 26248714 DOI: 10.1111/hae.12778] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Accepted: 06/27/2015] [Indexed: 11/29/2022]
Abstract
INTRODUCTION Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. AIMS The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. MATERIALS & METHODS The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups according to a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. RESULTS The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have been published to date. Two previously published examples using CART analysis in this field are didactically explained in detail. CONCLUSION There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable.
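The "repeated partitioning of a sample into subgroups" described in this abstract can be illustrated with a single partitioning step. The following is a minimal sketch, assuming Gini impurity as the splitting criterion (the abstract does not name one) and using hypothetical toy data; the function names `gini` and `best_split` are illustrative and not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split `x <= threshold`.

    If no split improves on the unsplit node, the threshold is None.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical toy data: one continuous predictor and a binary outcome.
predictor = [1, 2, 3, 10, 12, 15]
outcome = ["yes", "yes", "yes", "no", "no", "no"]
threshold, impurity = best_split(predictor, outcome)
print(threshold, impurity)
```

A full tree would apply `best_split` recursively to each resulting subgroup and across all predictors, then prune; that part is omitted here.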
|
Review |
10 |
66 |
3
|
Lin Z, Kahrilas PJ, Roman S, Boris L, Carlson D, Pandolfino JE. Refining the criterion for an abnormal Integrated Relaxation Pressure in esophageal pressure topography based on the pattern of esophageal contractility using a classification and regression tree model. Neurogastroenterol Motil 2012; 24:e356-63. [PMID: 22716041 PMCID: PMC3616504 DOI: 10.1111/j.1365-2982.2012.01952.x] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Indexed: 01/03/2023]
Abstract
BACKGROUND The Integrated Relaxation Pressure (IRP) is the esophageal pressure topography (EPT) metric used for assessing the adequacy of esophagogastric junction (EGJ) relaxation in the Chicago Classification of motility disorders. However, because the IRP value is also influenced by distal esophageal contractility, we hypothesized that its normal limits should vary with different patterns of contractility. METHODS Five hundred and twenty-two selected EPT studies were used to compare the accuracy of alternative analysis paradigms to that of a motility expert (the 'gold standard'). Chicago Classification metrics were scored manually and used as inputs for MATLAB™ programs that utilized either strict algorithm-based interpretation (fixed abnormal IRP threshold of 15 mmHg) or a classification and regression tree (CART) model that selected variable IRP thresholds depending on the associated esophageal contractility. KEY RESULTS The sensitivity of the CART model for achalasia (93%) was better than that of the algorithm-based approach (85%) on account of using variable IRP thresholds that ranged from a low value of >10 mmHg to distinguish type I achalasia from absent peristalsis to a high value of >17 mmHg to distinguish type III achalasia from distal esophageal spasm. Additionally, type II achalasia was diagnosed solely by panesophageal pressurization without the IRP entering the algorithm. CONCLUSIONS & INFERENCES Automated interpretation of EPT studies more closely mimics that of a motility expert when IRP thresholds for impaired EGJ relaxation are adjusted depending on the pattern of associated esophageal contractility. The IRP cutoffs suggested by the CART model ranged from 10 to 17 mmHg.
|
research-article |
13 |
64 |
4
|
Zhou J, Yao J, Deng J, Dewald JP. EEG-based classification for elbow versus shoulder torque intentions involving stroke subjects. Comput Biol Med 2009; 39:443-52. [PMID: 19380125 PMCID: PMC2865155 DOI: 10.1016/j.compbiomed.2009.02.004] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Received: 03/14/2007] [Revised: 02/17/2009] [Accepted: 02/26/2009] [Indexed: 11/26/2022]
Abstract
The ultimate aim of classifying elbow versus shoulder torque intentions is to develop robust brain-computer interface (BCI) devices for patients who suffer from movement disorders following brain injury such as stroke. In this paper, we investigate an advanced classification approach, the classifier-enhanced time-frequency synthesized spatial pattern algorithm (classifier-enhanced TFSP), in classifying a subject's intent of generating an isometric shoulder abduction (SABD) or elbow flexion (EF) torque using signals obtained from 163 scalp electroencephalographic (EEG) electrodes. Two classifiers, the support vector classifier (SVC) and the classification and regression tree (CART), are integrated in the TFSP algorithm that decomposes the signal into a weighted time, frequency and spatial feature space. The resulting high-performing methods (SVC-TFSP and CART-TFSP) are then applied to experimental data collected in four healthy subjects and two stroke subjects. Results are compared with the original TFSP, and significantly higher reliability is achieved in both healthy subjects (92% averaged over four healthy subjects) and stroke subjects (75% averaged over two subjects). The accuracies of classifier-enhanced TFSP methods are further improved after a rejection scheme is applied (approximately 100% in healthy subjects and >80% in stroke subjects). The results are among the highest reliability reported in the literature for tasks with spatial representations on the motor cortex as close as shoulder and elbow. The paper also discusses the impact of applying the rejection strategy in detail and reports the existence of an optimal rejection rate on a stroke subject. The results indicate that the proposed algorithms are promising for future use in rehabilitative BCI applications in neurologically impaired patients.
|
Research Support, N.I.H., Extramural |
16 |
38 |
5
|
Lu F, Petkova E. A comparative study of variable selection methods in the context of developing psychiatric screening instruments. Stat Med 2013; 33:401-21. [PMID: 23934941 DOI: 10.1002/sim.5937] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.8] [Received: 07/20/2012] [Accepted: 07/08/2013] [Indexed: 11/09/2022]
Abstract
The development of screening instruments for psychiatric disorders involves item selection from a pool of items in existing questionnaires assessing clinical and behavioral phenotypes. A screening instrument should consist of only a few items and have good accuracy in classifying cases and non-cases. Variable/item selection methods such as Least Absolute Shrinkage and Selection Operator (LASSO), Elastic Net, Classification and Regression Tree, Random Forest, and the two-sample t-test can be used in such context. Unlike situations where variable selection methods are most commonly applied (e.g., ultra high-dimensional genetic or imaging data), psychiatric data usually have lower dimensions and are characterized by the following factors: correlations and possible interactions among predictors, unobservability of important variables (i.e., true variables not measured by available questionnaires), amount and pattern of missing values in the predictors, and prevalence of cases in the training data. We investigate how these factors affect the performance of several variable selection methods and compare them with respect to selection performance and prediction error rate via simulations. Our results demonstrated that: (1) for complete data, LASSO and Elastic Net outperformed other methods with respect to variable selection and future data prediction, and (2) for certain types of incomplete data, Random Forest induced bias in imputation, leading to incorrect ranking of variable importance. We propose the Imputed-LASSO combining Random Forest imputation and LASSO; this approach offsets the bias in Random Forest and offers a simple yet efficient item selection approach for missing data. As an illustration, we apply the methods to items from the standard Autism Diagnostic Interview-Revised version.
|
Research Support, N.I.H., Extramural |
12 |
34 |
6
|
Carnevale V, Morano S, Fontana A, Annese MA, Fallarino M, Filardi T, Copetti M, Pellegrini F, Romagnoli E, Eller-Vainicher C, Zhukouskaya VV, Chiodini I, D'Amico G. Assessment of fracture risk by the FRAX algorithm in men and women with and without type 2 diabetes mellitus: a cross-sectional study. Diabetes Metab Res Rev 2014; 30:313-22. [PMID: 24420974 DOI: 10.1002/dmrr.2497] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 06/21/2013] [Revised: 10/15/2013] [Accepted: 11/08/2013] [Indexed: 12/19/2022]
Abstract
BACKGROUND The FRAX algorithm is a widely used tool to assess fracture risk, but it has not been clinically applied in European patients with diabetes. We investigated FRAX-estimated fracture risk in patients with type 2 diabetes mellitus (DM), compared with concomitantly enrolled control subjects. METHODS In our multicentric cross-sectional study, we assessed the FRAX scores of 974 DM and 777 control subjects from three Italian diabetes outpatient clinics, and in DM we tested the association between parameters and complications of the disease and FRAX scores. RESULTS DM had significantly lower FRAX-estimated probability of both major osteoporotic fracture (MOF) and hip fracture (HF) than control subjects (6.35 ± 5.07% versus 7.75 ± 6.93%, p < 0.001, and 2.17 ± 3.07% versus 2.91 ± 4.56%, p = 0.023, respectively). When grouping by gender, such differences were found only in men. In DM, the frequency of previous fracture was higher than in control subjects (29.88% versus 20.46%, p < 0.001). In diabetic patients, age, sex, body mass index, HbA1c and hypoglycaemia were significantly associated with FRAX scores; gender-specific regression models differed. Among DM, the tree-based regression (classification and regression tree (CART)) analysis identified groups of patients with different mean FRAX scores. In female DM aged > 65 years with or without obesity, MOF > 20% was found in 5.66% and 13.53% and HF > 3% in 40.57% and 63.91% of patients, respectively. CONCLUSIONS Patients with DM had mean FRAX scores lower than control subjects, despite the higher number of previous fractures. Some features and complications of DM were associated with FRAX scores. Among DM patients, the CART analysis identified subgroups with higher FRAX scores. However, despite its potential utility, concerns still remain about using FRAX in DM patients.
|
Multicenter Study |
11 |
28 |
7
|
Superolateral Hoffa fat-pad edema and patellofemoral maltracking: predictive modeling. AJR Am J Roentgenol 2014; 203:W207-12. [PMID: 25055295 DOI: 10.2214/ajr.13.11848] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 01/26/2023]
Abstract
OBJECTIVE Superolateral Hoffa fat-pad edema is a frequent finding with patellar maltracking and may precede clinically significant chondrosis. The purpose of this study was to clarify which patellofemoral measurements are most highly associated with this finding and to develop a prediction rule to guide clinical decision making. MATERIALS AND METHODS Twenty-three patellofemoral measurements were performed on 71 knees retrospectively identified as having superolateral Hoffa fat-pad edema at MRI (Hoffa group) and on 45 normal knees (normal group). Univariate analysis was performed to examine the association between these measurements and Hoffa fat-pad edema. Classification and regression tree analysis with 10-fold cross validation was used to generate a prediction model. RESULTS For 16 of the 23 patellofemoral measurements, there was a statistically significant difference (p < 0.05) between the Hoffa and normal groups. Classification and regression tree analysis identified a prediction model in which a patient is placed into the Hoffa group if one of three conditions is met: lateral patellar displacement greater than -3.6 mm and Insall-Salvati ratio greater than 0.99; lateral patellar displacement of -3.6 mm or less and Insall-Salvati ratio greater than 1.23; or lateral patellar displacement of -3.6 mm or less, Insall-Salvati ratio of 1.23 or less, and lateral trochlear inclination of 16.5° or less. In fitting of the original sample, this model had 91.6% sensitivity and 88.9% specificity for identifying the Hoffa group. When 10-fold cross validation was applied, the estimated generalizable sensitivity and specificity were 85.9% and 75.6%. CONCLUSION Superolateral Hoffa fat-pad edema is strongly associated with a number of measures of patellar maltracking. A prediction model based on these measurements is accurate for differentiating knees with superolateral Hoffa fat-pad edema from normal knees.
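The three-condition prediction rule stated in the RESULTS above can be written out as nested comparisons, which makes the tree structure explicit. This is a minimal sketch for illustration only, not a clinical tool; the function and parameter names are hypothetical, and only the thresholds (-3.6 mm, 0.99, 1.23, 16.5°) come from the abstract.

```python
def in_hoffa_group(lateral_displacement_mm, insall_salvati_ratio,
                   trochlear_inclination_deg):
    """Apply the abstract's three-condition rule; True means 'Hoffa group'."""
    if lateral_displacement_mm > -3.6:
        # Condition 1: displacement > -3.6 mm and Insall-Salvati ratio > 0.99.
        return insall_salvati_ratio > 0.99
    if insall_salvati_ratio > 1.23:
        # Condition 2: displacement <= -3.6 mm and Insall-Salvati ratio > 1.23.
        return True
    # Condition 3: displacement <= -3.6 mm, ratio <= 1.23,
    # and lateral trochlear inclination <= 16.5 degrees.
    return trochlear_inclination_deg <= 16.5


# Hypothetical measurements, chosen only to exercise each branch.
print(in_hoffa_group(0.0, 1.10, 20.0))   # condition 1 met
print(in_hoffa_group(-5.0, 1.30, 20.0))  # condition 2 met
print(in_hoffa_group(-5.0, 1.00, 15.0))  # condition 3 met
print(in_hoffa_group(-5.0, 1.00, 20.0))  # no condition met
```

Note that the trochlear inclination is only consulted on the last branch, which mirrors how a tree uses different variables in different subgroups.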
|
Journal Article |
11 |
24 |
8
|
Choi SK, Fram MS, Frongillo EA. Very Low Food Security in US Households Is Predicted by Complex Patterns of Health, Economics, and Service Participation. J Nutr 2017; 147:1992-2000. [PMID: 28855422 DOI: 10.3945/jn.117.253179] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 04/20/2017] [Revised: 06/02/2017] [Accepted: 08/09/2017] [Indexed: 11/14/2022]
Abstract
Background: Very low food security (VLFS) happens at the intersection of nuanced and complex patterns of risk characteristics across multiple domains. Little is known about the idiosyncratic situations that lead households to experience VLFS. Objective: We used classification and regression tree (CART) analysis, which can handle complex combinations of predictors, to identify patterns of characteristics that distinguish VLFS households in the United States from other households. Methods: Data came from 3 surveys, the 2011-2014 National Health Interview Survey (NHIS), the 2005-2012 NHANES, and the 2002-2012 Current Population Survey (CPS), with sample participants aged ≥18 y and households with income <300% of the federal poverty line. Survey participants were stratified into households with children, adult-only households, and older-adult households (NHIS, CPS) or individuals aged 18-64 y and individuals aged ≥65 y (NHANES). Household food security was measured with the use of the 10-item US Adult Food Security Scale. Variables from multiple domains, including sociodemographic characteristics, health, health care, and participation in social welfare and food assistance programs, were considered as predictors. The 3 data sources were analyzed separately with the use of CART analysis. Results: Household experiences of VLFS were associated with different predictors for different types of households and often occurred at the intersection of multiple characteristics spanning unmet medical needs, poor health, disability, limitation, depressive symptoms, low income, and food assistance program participation. These predictors built complex trees with various combinations in different types of households. Conclusions: This study showed that multiple characteristics across multiple domains distinguished VLFS households. Flexible and nonlinear methods focusing on a wide range of risk characteristics should be used to identify VLFS households and to inform policies and programs that can address VLFS households' various needs.
|
|
8 |
23 |
9
|
Lo-Ciganic W, Zgibor JC, Ruppert K, Arena VC, Stone RA. Identifying type 1 and type 2 diabetic cases using administrative data: a tree-structured model. J Diabetes Sci Technol 2011; 5:486-93. [PMID: 21722564 PMCID: PMC3192615 DOI: 10.1177/193229681100500303] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/15/2022]
Abstract
BACKGROUND To date, few administrative diabetes mellitus (DM) registries have distinguished type 1 diabetes mellitus (T1DM) from type 2 diabetes mellitus (T2DM). OBJECTIVE Using a classification tree model, a prediction rule was developed to distinguish T1DM from T2DM in a large administrative database. METHODS The Medical Archival Retrieval System at the University of Pittsburgh Medical Center included administrative and clinical data from January 1, 2000, through September 30, 2009, for 209,647 DM patients aged ≥18 years. Probable cases (8,173 T1DM and 125,111 T2DM) were identified by applying clinical criteria to administrative data. Nonparametric classification tree models were fit using TIBCO Spotfire S+ 8.1 (TIBCO Software), with model size based on 10-fold cross validation. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of T1DM were estimated. RESULTS The main predictors that distinguished T1DM from T2DM are age <40 years; International Classification of Disease, 9th revision, codes of T1DM or T2DM diagnosis; inpatient oral hypoglycemic agent use; inpatient insulin use; and episode(s) of diabetic ketoacidosis diagnosis. Compared with a complex clinical algorithm, the tree-structured model to predict T1DM had 92.8% sensitivity, 99.3% specificity, 89.5% PPV, and 99.5% NPV. CONCLUSION The preliminary predictive rule appears to be promising. Being able to distinguish between DM subtypes in administrative databases will allow large-scale subtype-specific analyses of medical care costs, morbidity, and mortality.
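The four accuracy measures reported in this abstract (sensitivity, specificity, PPV, NPV) all derive from the same 2x2 table of predicted versus reference classifications. The following is a minimal sketch of that arithmetic; the counts below are hypothetical and are not the study's data, and `diagnostic_metrics` is an illustrative name.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Compute standard diagnostic accuracy measures from 2x2 table counts.

    tp/fn: reference-positive cases predicted positive/negative.
    fp/tn: reference-negative cases predicted positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # share of positive predictions that are right
        "npv": tn / (tn + fn),          # share of negative predictions that are right
    }


# Hypothetical counts for a rare positive class (not taken from the paper).
metrics = diagnostic_metrics(tp=90, fp=10, fn=10, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on how common the positive class is, a rule can show very high specificity and NPV while its PPV stays noticeably lower when the positive class is rare, which is the pattern the abstract reports.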
|
research-article |
14 |
23 |
10
|
Liu L, Wu J, Zhong R, Wu C, Zou L, Yang B, Chen W, Zhu B, Duan S, Yu D, Tan W, Nie S, Lin D, Miao X. Multi-loci analysis reveals the importance of genetic variations in sensitivity of platinum-based chemotherapy in non-small-cell lung cancer. Mol Carcinog 2012; 52:923-31. [PMID: 22821704 DOI: 10.1002/mc.21942] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 01/19/2012] [Revised: 05/02/2012] [Accepted: 06/26/2012] [Indexed: 01/12/2023]
Abstract
Polymorphisms in DNA repair and apoptotic pathways may cause variations in chemosensitivity of non-small-cell lung cancer (NSCLC) through complex gene-gene and gene-environment interactions. A total of 200 advanced NSCLC patients who received platinum-based chemotherapies were recruited. The short-term clinical outcomes were classified as a chemosensitive group, including complete remission (CR) and partial remission (PR), and a chemoresistant group, namely stable disease (SD) and progressive disease (PD) at the end of treatment. We applied multifactor dimensionality reduction (MDR), classification and regression tree (CART) and traditional logistic regression (LR) to explore high-order gene-gene and gene-environment interactions among 11 functional single nucleotide polymorphisms (SNPs), smoking status, cancer stages and treatment regimens in the response to chemotherapy. Multi-loci analyses consistently indicated that interactions among XRCC1 Arg194Trp, XPC PAT, FAS G-1377A, and FASL T-844C were associated with sensitivity to platinum-based chemotherapy. In MDR analysis, the four-factor model yielded the highest test accuracy of 0.72 (permutation P = 0.001). In CART analysis, these four SNPs were the determinant nodes in the growth of the regression tree. Patients carrying XRCC1 Arg194Arg, FAS-1377GG, and the FASL-844T allele showed no response at all to platinum, whereas patients with the XRCC1 194Trp allele and XPC PAT +/+ had a 68.8% response rate to platinum. In LR analysis, a significant gene-dosage effect was detected with the increasing number of favorable genotypes of these four polymorphisms (P trend = 0.00002). Multi-loci analysis reveals the importance of genetic variations involved in DNA repair and apoptotic pathways in sensitivity of platinum-based chemotherapy in NSCLC.
|
Research Support, Non-U.S. Gov't |
13 |
22 |
11
|
Brom H, Carthon JMB, Ikeaba U, Chittams J. Leveraging Electronic Health Records and Machine Learning to Tailor Nursing Care for Patients at High Risk for Readmissions. J Nurs Care Qual 2020; 35:27-33. [PMID: 31136529 PMCID: PMC6874718 DOI: 10.1097/ncq.0000000000000412] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 11/26/2022]
Abstract
BACKGROUND Electronic health record-derived data and novel analytics, such as machine learning, offer promising approaches to identify high-risk patients and inform nursing practice. PURPOSE The aim was to identify patients at risk for readmissions by applying a machine-learning technique, Classification and Regression Tree, to electronic health record data from our 300-bed hospital. METHODS We conducted a retrospective analysis of 2165 clinical encounters from August to October 2017 using data from our health system's data store. Classification and Regression Tree was employed to determine patient profiles predicting 30-day readmission. RESULTS The 30-day readmission rate was 11.2% (n = 242). Classification and Regression Tree analysis revealed highest risk for readmission among patients who visited the emergency department, had 9 or more comorbidities, were insured through Medicaid, and were 65 years of age and older. CONCLUSIONS Leveraging information through the electronic health record and Classification and Regression Tree offers a useful way to identify high-risk patients. Findings from our algorithm may be used to improve the quality of nursing care delivery for patients at highest readmission risk.
|
Observational Study |
5 |
19 |
12
|
Ghosh J, Pradhan S, Mittal B. Multilocus analysis of hormonal, neurotransmitter, inflammatory pathways and genome-wide associated variants in migraine susceptibility. Eur J Neurol 2014; 21:1011-1020. [PMID: 24698360 DOI: 10.1111/ene.12427] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 10/15/2013] [Accepted: 02/24/2014] [Indexed: 02/03/2023]
Abstract
BACKGROUND AND PURPOSE Migraine pathophysiology involves a complex interplay of processes wherein the hormonal, neurotransmitter and inflammatory pathways interact to influence the migraine phenotype. However, all studies pertaining to the role of genetic variants in migraine have been restricted to a specific pathway and none of the studies has looked into inter-pathway genetic analysis. Our aim was to combine all the genetic variants from our previously reported studies to conduct higher order gene-gene interaction analysis using different multi-analytical approaches. METHODS The study group included 324 migraine patients and 134 healthy controls. The study included 20 polymorphisms from hormonal, neurotransmitter, inflammatory and genome-wide associated variants from our published reports. Univariate and multivariate analyses were carried out by logistic regression. Classification and regression tree (CART) analysis was performed to build a decision tree via recursive partitioning. The high order genetic interactions associated with migraine risk were analyzed using multifactor dimensionality reduction (MDR). RESULTS Univariate analysis revealed significant associations of polymorphisms in CYP19A1, ESR1, TNFA and PRDM16 genes with migraine susceptibility. Multiple regression analysis found significant results for four markers in CYP19A1, TNFA, ESR1 and LRP1 genes. In CART, the most prominent splitting variable was CYP19A1 polymorphism followed by TNFA, ESR1 and PRDM16 markers. The MDR analysis identified markers of CYP19A1, CYP19A1- TNFA, CYP19A1- ESR1- TNFA and CYP19A1- ESR1- TRPM8- PRDM16 as best models for one, two, three and four factors, respectively. CONCLUSIONS The present study suggests interactions amongst hormonal, inflammatory and genome-wide associated variants but not with neurotransmitter pathway variants in migraine susceptibility.
|
|
11 |
18 |
13
|
Albano TR, Rodrigues CAS, Melo AKP, de Paula PO, Almeida GPL. Clinical Decision Algorithm Associated With Return to Sport After Anterior Cruciate Ligament Reconstruction. J Athl Train 2020; 55:691-698. [PMID: 32396470 DOI: 10.4085/1062-6050-82-19] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 11/09/2022]
Abstract
CONTEXT Understanding the factors that predict return to sport (RTS) after anterior cruciate ligament reconstruction facilitates clinical decision making. OBJECTIVE To develop a clinical decision algorithm that could predict RTS and non-RTS based on the differences in the variables after anterior cruciate ligament reconstruction. DESIGN Cross-sectional study. SETTING University laboratory. PATIENTS OR OTHER PARTICIPANTS A total of 150 athletes in any sport involving deceleration, jumping, cutting, or turning enrolled in the study. All participants answered the International Knee Documentation Committee and Anterior Cruciate Ligament Return to Sport After Injury (ACL-RSI) questionnaires and performed balance and isokinetic tests. MAIN OUTCOME MEASURE(S) The classification and regression tree (CART) was used to determine the clinical decision algorithm associated with RTS at any level and RTS at the preinjury level. The diagnostic accuracy of the CART was verified. RESULTS Of the 150 participants, 57.3% (n = 86) returned to sport at any level and 12% (n = 18) returned to sport at the preinjury level. The interactions among the peak torque extension at 300°/s >93.55 Nm, ACL-RSI score >27.05 (P = .06), and postoperative time >7.50 months were identified by CART as associated with RTS at any level. An ACL-RSI score >72.85% was the main variable associated with RTS at the preinjury level. The interaction among an ACL-RSI score of 50.40% to 72.85%, agonist : antagonist ratio at 300°/s ≤63.6%, and anteroposterior stability index ≤2.4 in these participants was the second factor associated with RTS at the preinjury level. CONCLUSIONS Athletes who had more quadriceps strength tended to return to sport at any level more quickly, even with less-than-expected psychological readiness. Regarding a return at the preinjury level, psychological readiness was the most important factor in not returning, followed by a better agonist : antagonist ratio and better balance.
|
Journal Article |
5 |
17 |
14
|
Guilbault RWR, Ohlsson MA, Afonso AM, Ebell MH. External Validation of Two Classification and Regression Tree Models to Predict the Outcome of Inpatient Cardiopulmonary Resuscitation. J Intensive Care Med 2017; 32:333-338. [PMID: 28049389 DOI: 10.1177/0885066616686924] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 12/21/2022]
Abstract
OBJECTIVE To prospectively validate a previously developed classification and regression tree (CART) model that predicts the likelihood of a good outcome among patients undergoing inpatient cardiopulmonary resuscitation. DESIGN Prospective validation of a clinical decision rule. SETTING Skåne University Hospital in Malmö, Sweden. PATIENTS All adult patients (N = 287) experiencing in-hospital cardiopulmonary arrest and undergoing cardiopulmonary resuscitation between 2007 and 2010. INTERVENTIONS Patients from Skåne University Hospital who underwent CPR (N = 287) were classified using the CART models to predict their likelihood of survival neurologically intact or with minimal deficits, based on a cerebral performance category score of 1. Discrimination and classification accuracy of the score in the Swedish population was compared to that in the original (derivation and internal validation) populations. MEASUREMENTS AND MAIN RESULTS For model 1, the area under the receiver-operating characteristic curve (AUROCC) was 0.77, compared with 0.76 and 0.73 in the original derivation and validation populations, respectively. Model 1 classified 71 (2.8%) of 287 patients as being at a very low risk of a good neurologic outcome compared with 157 (26.1%) of 287 patients predicted to be at an above average risk of a good neurologic outcome. Model 2 had a similar AUROCC as the original validation population of 0.71 but lower than the original derivation population. Model 2 performed similarly to Model 1 with regard to its ability to correctly classify patients as very low or higher than average likelihood of a good neurologic outcome. CONCLUSION Two CART models validated well in a different population, displaying similar discrimination and classification accuracy compared to the original population. Although additional validation in larger populations is desirable before widespread adoption, these results are very encouraging.
|
Validation Study |
8 |
15 |
15
|
Krishna OB, Maiti J, Ray PK, Samanta B, Mandal S, Sarkar S. Measurement and Modeling of Job Stress of Electric Overhead Traveling Crane Operators. Saf Health Work 2016; 6:279-88. [PMID: 26929839 PMCID: PMC4682032 DOI: 10.1016/j.shaw.2015.06.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 10/30/2014] [Revised: 06/22/2015] [Accepted: 06/24/2015] [Indexed: 11/18/2022] Open
Abstract
Background In this study, the measurement of job stress of electric overhead traveling crane operators and quantification of the effects of operator and workplace characteristics on job stress were assessed. Methods Job stress was measured on five subscales: employee empowerment, role overload, role ambiguity, rule violation, and job hazard. The characteristics of the operators that were studied were age, experience, body weight, and body height. The workplace characteristics considered were hours of exposure, cabin type, cabin feature, and crane height. The proposed methodology included administration of a questionnaire survey to 76 electric overhead traveling crane operators followed by analysis using analysis of variance and a classification and regression tree. Results The key findings were: (1) the five subscales can be used to measure job stress; (2) employee empowerment was the most significant factor, followed by role overload; (3) workplace characteristics contributed more towards job stress than operator characteristics; and (4) of the workplace characteristics, crane height was the major contributor. Conclusion The issues related to crane height and cabin feature can be fixed by providing engineering or foolproof solutions rather than relying on interventions related to the demographic factors.
|
|
9 |
14 |
16
|
Du J, Liu J, Zhang X, Chen X, Yu R, Gu D, Zou J, Liu Y, Liu S. Pre-treatment neutrophil-to-lymphocyte ratio predicts survival in patients with laryngeal cancer. Oncol Lett 2017; 15:1664-1672. [PMID: 29399193 PMCID: PMC5774534 DOI: 10.3892/ol.2017.7501] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 09/01/2016] [Accepted: 10/13/2017] [Indexed: 02/05/2023] Open
Abstract
An increased neutrophil-to-lymphocyte ratio (NLR) is associated with poorer prognostic outcomes in numerous types of cancer. However, only a small number of studies have demonstrated the prognostic role of NLR in patients with laryngeal cancer. The present study evaluated the association between NLR and survival outcomes in patients with laryngeal squamous cancer. All patients were scheduled for follow-up visits. The levels of cytokines from tumor tissues were analyzed by ELISA. A classification and regression tree (CART) was used to determine the optimal cutoff values of NLR. Clinical features and NLR were analyzed using Kaplan-Meier analysis and Cox regression to assess survival outcomes and associated risks. Of the total 654 patients, 70 patients (70/654; 10.7%) failed to receive follow-up. Blood and biochemical parameters, including NLR, platelet-to-lymphocyte ratio and albumin-to-globulin ratio, were associated with clinical characteristics of the patients, with the exception of histologic grade. Only one node with NLR at 3.18 divided patients into different categories, according to CART analysis. Survival analysis demonstrated that NLR at cutoff values subdivided patients into different survival outcomes (P<0.001). Subsequent to adjustments for age and other clinical features, NLR was identified to be an independent prognostic factor for overall survival and progression-free survival (P<0.05). Increased levels of cytokines, including IL-6 and IL-8, in tumor tissues were associated with NLR values. In summary, pre-treatment NLR was associated with the prognostic outcomes for patients with laryngeal cancer, and may help establish prognostic factors for these patients.
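As an illustration of how a single CART node can yield a cutoff for a continuous marker such as NLR, the following is a minimal sketch, not the study's actual procedure. It assumes a binary outcome (the study itself analyzed survival), uses Gini impurity as the split criterion, and runs on hypothetical toy data; the names `gini`, `best_cutoff`, `nlr`, and `died` are invented for this example.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0.0 for an empty or pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_cutoff(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, weighted_impurity) for the best single binary split.

    Candidate cutoffs are midpoints between consecutive distinct sorted values,
    which is how a one-node tree on a single predictor chooses its threshold.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


# Hypothetical toy data: NLR values and a binary outcome (1 = event).
nlr = [1.2, 1.9, 2.4, 2.8, 3.0, 3.4, 3.9, 4.5, 5.1, 6.0]
died = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

cutoff, impurity = best_cutoff(nlr, died)
print(f"cutoff={cutoff:.2f}, weighted Gini={impurity:.3f}")
```

On this toy data the outcome is perfectly separated between 3.0 and 3.4, so the chosen cutoff is the midpoint of those two values. Real data are rarely this clean, and a survival-based tree would use a different split criterion (for example a log-rank statistic).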
|
Journal Article |
8 |
14 |
17
|
Shi KQ, Zhou YY, Yan HD, Li H, Wu FL, Xie YY, Braddock M, Lin XY, Zheng MH. Classification and regression tree analysis of acute-on-chronic hepatitis B liver failure: Seeing the forest for the trees. J Viral Hepat 2017; 24:132-140. [PMID: 27686368 DOI: 10.1111/jvh.12617] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 05/10/2016] [Accepted: 08/10/2016] [Indexed: 12/13/2022]
Abstract
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using the classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, which were recognized as derivation cohort and validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity by multivariate logistic regression analysis. The performances of the CART model (0.896), similar to the logistic regression model (0.914, P=.382), exceeded that of MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting three-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification.
|
|
8 |
14 |
18
|
Ius T, Somma T, Altieri R, Angileri FF, Barbagallo GM, Cappabianca P, Certo F, Cofano F, D'Elia A, Della Pepa GM, Esposito V, Fontanella MM, Germanò A, Garbossa D, Isola M, La Rocca G, Maiuri F, Olivi A, Panciani PP, Pignotti F, Skrap M, Spena G, Sabatino G. Is age an additional factor in the treatment of elderly patients with glioblastoma? A new stratification model: an Italian Multicenter Study. Neurosurg Focus 2020; 49:E13. [PMID: 33002864 DOI: 10.3171/2020.7.focus20420] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/25/2020] [Accepted: 07/23/2020] [Indexed: 11/06/2022]
Abstract
OBJECTIVE Approximately half of glioblastoma (GBM) cases develop in geriatric patients, and this trend is destined to increase with the aging of the population. The optimal strategy for management of GBM in elderly patients remains controversial. The aim of this study was to assess the role of surgery in the elderly (≥ 65 years old) based on clinical, molecular, and imaging data routinely available in neurosurgical departments and to assess a prognostic survival score that could be helpful in stratifying the prognosis for elderly GBM patients. METHODS Clinical, radiological, surgical, and molecular data were retrospectively analyzed in 322 patients with GBM from 9 neurosurgical centers. Univariate and multivariate analyses were performed to identify predictors of survival. A random forest approach (classification and regression tree [CART] analysis) was utilized to create the prognostic survival score. RESULTS Survival analysis showed that overall survival (OS) was influenced by age as a continuous variable (p = 0.018), MGMT (p = 0.012), extent of resection (EOR; p = 0.002), and preoperative tumor growth pattern (evaluated with the preoperative T1/T2 MRI index; p = 0.002). CART analysis was used to create the prognostic survival score, forming six different survival groups on the basis of tumor volumetric, surgical, and molecular features. Terminal nodes with similar hazard ratios were grouped together to form a final diagram composed of five classes with different OSs (p < 0.0001). EOR was the most robust influencing factor in the algorithm hierarchy, while age appeared at the third node of the CART algorithm. The ability of the prognostic survival score to predict death was determined by a Harrell's c-index of 0.75 (95% CI 0.76-0.81). CONCLUSIONS The CART algorithm provided a promising, thorough, and new clinical prognostic survival score for elderly surgical patients with GBM. 
The prognostic survival score can be useful to stratify survival risk in elderly GBM patients with different surgical, radiological, and molecular profiles, thus assisting physicians in daily clinical management. The preliminary model, however, requires validation with future prospective investigations. Practical recommendations for clinicians/surgeons would strengthen the quality of the study; e.g., surgery can be considered as a first therapeutic option in the workflow of elderly patients with GBM, especially when the preoperative estimated EOR is greater than 80%.
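For readers unfamiliar with the Harrell's c-index reported above, the following minimal sketch shows how the statistic is computed from follow-up times, event indicators, and a risk score. It is an illustration on hypothetical toy data, not the study's code; the function name `harrell_c` and all values are assumptions, and ties in follow-up time are simply skipped for brevity.

```python
from typing import Sequence


def harrell_c(times: Sequence[float], events: Sequence[int],
              risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event
    (events[i] == 1) strictly before subject j's follow-up time. The pair is
    concordant when the subject who failed earlier has the higher risk score;
    tied risk scores count as half. Tied times are ignored in this sketch.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; c-index is undefined")
    return concordant / comparable


# Hypothetical toy data: months of follow-up, event flag (1 = death), risk score.
times = [5, 8, 12, 20, 30]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]

c_index = harrell_c(times, events, risk)
print(f"Harrell's c = {c_index:.3f}")  # 7 of 8 comparable pairs are concordant
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking of survival times by the score.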
|
Journal Article |
5 |
14 |
19
|
A threshold analysis of dengue transmission in terms of weather variables and imported dengue cases in Australia. Emerg Microbes Infect 2013; 2:e87. [PMID: 26038449 PMCID: PMC3880872 DOI: 10.1038/emi.2013.85] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 08/12/2013] [Revised: 11/11/2013] [Accepted: 11/13/2013] [Indexed: 12/02/2022]
Abstract
Dengue virus (DENV) transmission in Australia is driven by weather factors and imported dengue fever (DF) cases. However, uncertainty remains regarding the threshold effects of high-order interactions among weather factors and imported DF cases and the impact of these factors on autochthonous DF. A time-series regression tree model was used to assess the threshold effects of natural temporal variations of weekly weather factors and weekly imported DF cases in relation to incidence of weekly autochthonous DF from 1 January 2000 to 31 December 2009 in Townsville and Cairns, Australia. In Cairns, mean weekly autochthonous DF incidence increased 16.3-fold when the 3-week lagged moving average maximum temperature was <32 °C, the 4-week lagged moving average minimum temperature was ≥24 °C and the sum of imported DF cases in the previous 2 weeks was >0. When the 3-week lagged moving average maximum temperature was ≥32 °C and the other two conditions mentioned above remained the same, mean weekly autochthonous DF incidence only increased 4.6-fold. In Townsville, the mean weekly incidence of autochthonous DF increased 10-fold when 3-week lagged moving average rainfall was ≥27 mm, but it only increased 1.8-fold when rainfall was <27 mm during January to June. Thus, we found different responses of autochthonous DF incidence to weather factors and imported DF cases in Townsville and Cairns. Imported DF cases may also trigger and enhance local outbreaks under favorable climate conditions.
|
Journal Article |
12 |
14 |
20
|
Wang N, Cong S, Fan J, Bao H, Wang B, Yang T, Feng Y, Liu Y, Wang L, Wang C, Hu W, Fang L. Geographical Disparity and Associated Factors of COPD Prevalence in China: A Spatial Analysis of National Cross-Sectional Study. Int J Chron Obstruct Pulmon Dis 2020; 15:367-377. [PMID: 32103935 PMCID: PMC7025678 DOI: 10.2147/copd.s234042] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 10/09/2019] [Accepted: 01/16/2020] [Indexed: 01/30/2023] Open
Abstract
Purpose COPD prevalence has rapidly increased in China, but the geographical disparities in COPD prevalence remain largely unknown. This study aimed to assess city-level disparities in COPD prevalence and identify the relative importance of COPD-related risk factors in mainland China. Patients and Methods A nationwide cross-sectional study of COPD recruited 66,752 adults across mainland China between 2014 and 2015. Patients with COPD were ascertained by a post-bronchodilator pulmonary function test. We estimated the city-specific prevalence of COPD by the spatial kriging interpolation method. We detected spatial clusters with a significantly higher prevalence of COPD by spatial scan statistics. We determined the relative importance of COPD-associated risk factors by a nonparametric and nonlinear classification and regression tree (CART) model. Results The three spatial clusters with the highest prevalence of COPD were located in parts of Sichuan, Gansu, and Shaanxi, etc. (relative risks (RRs) ranging from 1.55 (95% CI 1.55-1.56) to 1.33 (95% CI 1.33-1.33)). CART showed that advanced age (≥60 years) was the most important factor associated with COPD in the overall population, followed by smoking. We estimated that there were about 28.5 million potentially avoidable cases of COPD among people aged 40 or older if they never smoked. PM2.5 was an important associated risk factor for COPD in the north, northeast, and southwest of China. After adjusting for age and smoking, the spatial cluster with the highest prevalence shifted to most of Sichuan, Gansu, Qinghai, and Ningxia, etc. (RR 1.65 (95% CI 1.63-1.67)). Conclusion The spatial clusters of COPD at the city level and regionally varied important risk factors for COPD would help develop tailored interventions for COPD in China. 
After adjusting for the main risk factors, the spatial clusters of COPD shifted, indicating that there would be other potential risk factors for the remaining clusters which call for further studies.
|
research-article |
5 |
13 |
21
|
Influence Factors on Injury Severity of Traffic Accidents and Differences in Urban Functional Zones: The Empirical Analysis of Beijing. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2018; 15:ijerph15122722. [PMID: 30513896 PMCID: PMC6313644 DOI: 10.3390/ijerph15122722] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 10/29/2018] [Revised: 11/22/2018] [Accepted: 11/28/2018] [Indexed: 11/20/2022]
Abstract
The objective of this study was to identify the factors influencing injury severity in traffic accidents and to discuss how they differ across urban functional zones in Beijing. A total of 3982 accident records from Beijing were analyzed for the whole city and for each urban functional zone. A set of 17 influence factors on injury severity was established, covering accident attributes, occurrence time, infrastructure, management status, and environmental conditions. Factors were preselected using Pearson's chi-squared test. Binary logistic regression was used to calibrate the effect of each factor's value on injury severity, and classification and regression tree analysis was used to analyze the impact of the factors. Pearson's chi-squared test and binary logistic regression showed both similarities and differences among the urban functional zones: two common influence factors (accident type and cross-section position), six zone-specific influence factors (lighting conditions, visibility, signal control, road physical isolation facility, occurrence period, and road type), and nine weak influence factors. The results of the binary logistic regression and the classification and regression tree analyses were broadly consistent. By combining the two methods, the study determined which factors, and which factor values, require special attention in each urban functional zone.
|
Research Support, Non-U.S. Gov't |
7 |
12 |
22
|
Speiser JL, Wolf BJ, Chung D, Karvellas CJ, Koch DG, Durkalski VL. BiMM tree: A decision tree method for modeling clustered and longitudinal binary outcomes. COMMUN STAT-SIMUL C 2018; 49:1004-1023. [PMID: 32377032 PMCID: PMC7202553 DOI: 10.1080/03610918.2018.1490429] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/18/2018] [Revised: 06/04/2018] [Accepted: 06/13/2018] [Indexed: 10/28/2022]
Abstract
Clustered binary outcomes are frequently encountered in clinical research (e.g. longitudinal studies). Generalized linear mixed models (GLMMs) for clustered endpoints have challenges for some scenarios (e.g. data with multi-way interactions and nonlinear predictors unknown a priori). We develop an alternative, data-driven method called Binary Mixed Model (BiMM) tree, which combines decision tree and GLMM within a unified framework. Simulation studies show that BiMM tree achieves slightly higher or similar accuracy compared to standard methods. The method is applied to a real dataset from the Acute Liver Failure Study Group.
|
research-article |
7 |
10 |
23
|
Iiames JS, Salls WB, Mehaffey MH, Nash MS, Christensen JR, Schaeffer BA. Modeling Anthropogenic and Environmental Influences on Freshwater Harmful Algal Bloom Development Detected by MERIS Over the Central United States. WATER RESOURCES RESEARCH 2021; 57:e2020WR028946. [PMID: 35860362 PMCID: PMC9285409 DOI: 10.1029/2020wr028946] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/29/2020] [Revised: 06/21/2021] [Accepted: 09/06/2021] [Indexed: 05/31/2023]
Abstract
Human and ecological health have been threatened by the increase of cyanobacteria harmful algal blooms (cyanoHABs) in freshwater systems. Successful mitigation of this risk requires understanding the factors driving cyanoHABs at a broad scale. To inform management priorities and decisions, we employed random forest modeling to identify major cyanoHAB drivers in 369 freshwater lakes distributed across 15 upper Midwest states during the 2011 bloom season (July-October). We used the Cyanobacteria Index (CI_cyano), a remotely sensed product derived from the MEdium Resolution Imaging Spectrometer (MERIS) aboard the European Space Agency's Envisat satellite, as the response variable to obtain variable importance metrics for 75 landscape and lake physiographic predictor variables. Lakes were stratified into high and low elevation categories to further focus CI_cyano variable importance identification by anthropogenic and natural influences. "High elevation" watershed land cover (LC) was primarily forest or natural vegetation, compared with "low elevation" watershed LC dominated by anthropogenic landscapes (e.g., agriculture and municipalities). We used the top ranked 25 Random Forest variables to create a classification and regression tree (CART) for both low and high elevation lake designations to identify variable thresholds for possible management mitigation. Mean CI_cyano was 3 times larger for "low elevation" lakes than for "high elevation" lakes, with both mean values exceeding the "High" World Health Organization recreational guidance/action level threshold for cyanobacteria (100,000 cells/mL). Agrarian-related variables were prominent across all 369 lakes and low elevation lakes. High elevation lakes showed more influence of lakeside LC than the low elevation lakes did.
|
research-article |
4 |
9 |
24
|
Xu Y, Park YS, Park JD. Measuring the Response Performance of U.S. States against COVID-19 Using an Integrated DEA, CART, and Logistic Regression Approach. Healthcare (Basel) 2021; 9:healthcare9030268. [PMID: 33802276 PMCID: PMC7998215 DOI: 10.3390/healthcare9030268] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/27/2020] [Revised: 02/22/2021] [Accepted: 02/25/2021] [Indexed: 12/20/2022] Open
Abstract
Measuring the U.S.'s COVID-19 response performance is an extremely important challenge for health care policymakers. This study integrates Data Envelopment Analysis (DEA) with four different machine learning (ML) techniques to assess the efficiency and evaluate the U.S.'s COVID-19 response performance. First, DEA is applied to measure the efficiency of fifty U.S. states considering four inputs: number of tested, public funding, number of health care employees, and number of hospital beds. Then, number of recovered from COVID-19 as a desirable output and number of confirmed COVID-19 cases as an undesirable output are considered. In the second stage, Classification and Regression Tree (CART), Boosted Tree (BT), Random Forest (RF), and Logistic Regression (LR) were applied to predict the COVID-19 response performance based on fifteen environmental factors, which were classified into social distancing, health policy, and socioeconomic measures. The results showed that 23 states were efficient with an average efficiency score of 0.97. Furthermore, BT and RF models produced the best prediction results and CART performed better than LR. Lastly, urban, physical inactivity, number of tested per population, population density, and total hospital beds per population were the most influential factors on efficiency.
|
Journal Article |
4 |
8 |
25
|
Yang CC, Su YC, Lin YW, Huang CI, Lee CC. Differential impact of age on survival in head and neck cancer according to classic Cox regression and decision tree analysis. Clin Otolaryngol 2019; 44:244-253. [PMID: 30578588 DOI: 10.1111/coa.13274] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/13/2018] [Revised: 11/01/2018] [Accepted: 12/17/2018] [Indexed: 01/01/2023]
Abstract
OBJECTIVES To assess the impact of age on the survival of patients with head and neck squamous cell carcinoma (HNSCC) using different statistical methods. DESIGN A retrospective population-based study. SETTING Surveillance, Epidemiology, and End Results database. SUBJECTS AND METHODS A total of 28 639 patients with newly diagnosed HNSCC were enrolled between 1 January 2007 and 31 December 2013. The effect of age on 5-year disease-specific survival was calculated using a Kaplan-Meier method and compared using log-rank tests. A Cox proportional hazards model was used for a multivariate analysis. A classification and regression tree (CART) analysis that partitioned patients with significantly different Kaplan-Meier curves was introduced to identify the important cancer-related parameters influencing survival. RESULTS Uni- and multivariate analyses indicated that patients who were older than 60 years had poorer 5-year disease-specific survival regardless of tumour subsite and tumor-node-metastasis (TNM) stage. However, the CART analysis determined that age played only a minor role in survival compared with other prognosticators. The relative importance of age using the Gini index was as follows: 3.21% for oral cancer, 8.32% for oropharyngeal cancer, 2.56% for hypopharyngeal cancer and 16.51% for laryngeal cancer. CONCLUSIONS In contrast to traditional statistical methods, the CART analysis, which was used to identify homogeneous populations, revealed that the impact of age varied for different patient groups according to the presence or absence of other prognosticators. This important information could help to guide our clinical decisions and future research.
|
Multicenter Study |
6 |
8 |