1
Jin Y, Xu S, Shao Z, Luo X, Wang Y, Yu Y, Wang Y. Discovery of depression-associated factors among childhood trauma victims from a large sample size: Using machine learning and network analysis. J Affect Disord 2024; 345:300-310. [PMID: 37865343 DOI: 10.1016/j.jad.2023.10.101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 09/25/2023] [Accepted: 10/15/2023] [Indexed: 10/23/2023]
Abstract
BACKGROUND Experiences of childhood trauma (CT) can lead to serious mental problems, especially depression. It is therefore crucial to identify influential factors related to depression and explore their associations. The objectives were to 1) identify critical depression-related factors using the extreme gradient boosting (XGBoost) method from large-scale survey data; 2) explore associations between these factors for targeted interventions and treatments. METHODS A large-scale epidemiological study covering 63 universities was conducted in Jilin Province, China. The XGBoost model was trained and tested to classify young adults with CT experiences who had or did not have depression (N = 27,671). The essential factors were selected by SHapley Additive exPlanations (SHAP) values. Multiple logistic regression analyses were conducted for validation. The associations between these depression-related factors were further explored using network analysis. RESULTS The XGBoost model selected the top 10 features associated with depression with satisfactory performance (AUC = 0.91; sensitivity = 0.88 and specificity = 0.76). These factors significantly differed between depression and non-depression groups (p < 0.001). There were strong positive associations between anxiety and obsessive-compulsive disorder (OCD), anxiety and post-traumatic stress disorder (PTSD), and social anxiety disorder (SAD) and appearance anxiety, and negative associations between sleep quality and anxiety and between sleep quality and PTSD among CT participants with depression. LIMITATIONS The cross-sectional design cannot establish causality, and biases in self-report measurements cannot be ignored. CONCLUSIONS The XGBoost model and network analysis were useful methods for discovering and understanding depression-related factors in this epidemiological study. Moreover, these essential factors could offer insights into future interventions and treatments for depressed young adults with CT experiences.
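The sensitivity and specificity reported in this abstract are ratios taken from a confusion matrix. As a minimal sketch in Python, assuming binary labels where 1 marks depression, the helper below shows how such values are computed. The function name and the label counts are hypothetical, chosen only so that the ratios come out at 0.88 and 0.76; they are not the study's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    sensitivity = tp / (tp + fn)  # share of true cases that are detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    return sensitivity, specificity


# Hypothetical labels: 100 positive and 100 negative cases (illustration only).
y_true = [1] * 100 + [0] * 100
y_pred = [1] * 88 + [0] * 12 + [0] * 76 + [1] * 24

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
# prints "sensitivity=0.88, specificity=0.76"
```

A classifier with this profile misses 12% of true cases while flagging 24% of non-cases, which is the trade-off the two numbers summarise.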
Affiliation(s)
- Yu Jin
  - College of Education for the Future, Beijing Normal University, Beijing, China
- Shicun Xu
  - Northeast Asian Research Center, Jilin University, Changchun, China
  - Department of Population, Resources and Environment, Northeast Asian Studies College, Jilin University, Changchun, China
  - China Center for Aging Studies and Social-Economic Development, Jilin University, Changchun, China
- Zhixian Shao
  - School of Statistics, Beijing Normal University, Beijing, China
- Xianyu Luo
  - College of Education for the Future, Beijing Normal University, Beijing, China
- Yinzhe Wang
  - Vanke School of Public Health, Tsinghua University, Beijing, China
- Yi Yu
  - Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, Guangzhou, China
  - School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
- Yuanyuan Wang
  - Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, Guangzhou, China
  - School of Psychology, Center for Studies of Psychological Application, and Guangdong Key Laboratory of Mental Health and Cognitive Science, South China Normal University, Guangzhou, China
2
Hanke M, Dijkstra L, Foraita R, Didelez V. Variable selection in linear regression models: Choosing the best subset is not always the best choice. Biom J 2024; 66:e2200209. [PMID: 37643390 DOI: 10.1002/bimj.202200209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2022] [Revised: 06/19/2023] [Accepted: 06/22/2023] [Indexed: 08/31/2023]
Abstract
We consider the question of variable selection in linear regressions, in the sense of identifying the correct direct predictors (those variables that have nonzero coefficients given all candidate predictors). Best subset selection (BSS) is often considered the "gold standard," with its use being restricted only by its NP-hard nature. Alternatives such as the least absolute shrinkage and selection operator (Lasso) or the Elastic net (Enet) have become methods of choice in high-dimensional settings. A recent proposal represents BSS as a mixed-integer optimization problem so that large problems have become computationally feasible. We present an extensive neutral comparison assessing the ability of BSS to select the correct direct predictors compared to forward stepwise selection (FSS), Lasso, and Enet. The simulation considers a range of settings that are challenging regarding dimensionality (number of observations and variables), signal-to-noise ratios, and correlations between predictors. As a fair measure of performance, we primarily used the best possible F1-score for each method, and results were confirmed by alternative performance measures and practical criteria for choosing the tuning parameters and subset sizes. Surprisingly, it was only in settings where the signal-to-noise ratio was high and the variables were uncorrelated that BSS reliably outperformed the other methods, even in low-dimensional settings. Furthermore, FSS performed almost identically to BSS. Our results shed new light on the usual presumption of BSS being, in principle, the best choice for selecting the correct direct predictors. Especially for correlated variables, alternatives like Enet are faster and appear to perform better in practical settings.
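The performance measure named in this abstract, the F1-score of a selected variable set against the true direct predictors, can be sketched in a few lines of Python. The reading of "best possible F1-score" as the maximum F1 over all subsets a method proposes along its tuning path is an assumption made here for illustration, and the function names, variable names, and example path are hypothetical rather than taken from the paper.

```python
def selection_f1(selected, true_support):
    """F1-score of a selected variable set against the true direct predictors."""
    selected, true_support = set(selected), set(true_support)
    tp = len(selected & true_support)  # correctly selected variables
    if tp == 0:
        return 0.0  # also covers an empty selection
    precision = tp / len(selected)     # share of selected variables that are true
    recall = tp / len(true_support)    # share of true variables that were selected
    return 2 * precision * recall / (precision + recall)


def best_possible_f1(candidate_subsets, true_support):
    """Highest F1 over all subsets on a method's tuning path (assumed reading)."""
    return max(selection_f1(s, true_support) for s in candidate_subsets)


# Hypothetical example: three true predictors and a four-step selection path.
true_support = {"x1", "x2", "x3"}
path = [
    {"x1"},
    {"x1", "x2"},
    {"x1", "x2", "x7"},
    {"x1", "x2", "x3", "x7", "x9"},
]
best = best_possible_f1(path, true_support)
print(round(best, 2))  # prints 0.8, reached by the subset {"x1", "x2"}
```

Scoring each method at its best point on the path separates the quality of the candidate subsets from the quality of the rule used to pick a tuning parameter, which is why such a measure can be regarded as a fair basis for comparison.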
Affiliation(s)
- Moritz Hanke
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Louis Dijkstra
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Ronja Foraita
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
- Vanessa Didelez
  - Department of Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS, Bremen, Germany
  - Department of Mathematics and Computer Science, University of Bremen, Bremen, Germany
3
Chang CC, Yeh JH, Chiu HC, Liu TC, Chen YM, Jhou MJ, Lu CJ. Assessing the length of hospital stay for patients with myasthenia gravis based on the data mining MARS approach. Front Neurol 2023; 14:1283214. [PMID: 38156090 PMCID: PMC10752965 DOI: 10.3389/fneur.2023.1283214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 11/27/2023] [Indexed: 12/30/2023]
Abstract
Predicting the length of hospital stay for myasthenia gravis (MG) patients is challenging due to the complex pathogenesis, high clinical variability, and non-linear relationships between variables. Considering the management of MG during hospitalization, it is important to conduct a risk assessment to predict the length of hospital stay. The present study aimed to successfully predict the length of hospital stay for MG based on an expandable data mining technique, multivariate adaptive regression splines (MARS). Data from 196 MG patients' hospitalization were analyzed, and the MARS model was compared with classical multiple linear regression (MLR) and three other machine learning (ML) algorithms. The average hospital stay duration was 12.3 days. The MARS model, leveraging its ability to capture non-linearity, identified four significant factors: disease duration, age at admission, MGFA clinical classification, and daily prednisolone dose. Cut-off points and correlation curves were determined for these risk factors. The MARS model outperformed the MLR and the other ML methods (including least absolute shrinkage and selection operator MLR, classification and regression tree, and random forest) in assessing hospital stay length. This is the first study to utilize data mining methods to explore factors influencing hospital stay in patients with MG. The results highlight the effectiveness of the MARS model in identifying the cut-off points and correlation for risk factors associated with MG hospitalization. Furthermore, a MARS-based formula was developed as a practical tool to assist in the measurement of hospital stay, which can be feasibly supported as an extension of clinical risk assessment.
Affiliation(s)
- Che-Cheng Chang
  - Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
  - PhD Program in Nutrition and Food Science, Fu Jen Catholic University, New Taipei City, Taiwan
- Jiann-Horng Yeh
  - School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Shin Kong Wu Ho-Su Memorial Hospital, Taipei City, Taiwan
  - Department of Neurology, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hou-Chang Chiu
  - School of Medicine, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Neurology, Taipei Medical University, Shuang-Ho Hospital, New Taipei City, Taiwan
- Tzu-Chi Liu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Yen-Ming Chen
  - Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City, Taiwan
- Mao-Jhen Jhou
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City, Taiwan
4
Srinivasan V, Hall J, Wahlster S, Johnson NJ, Branch K. Associations between clinical characteristics of cardiac arrest and early CT head findings of hypoxic ischaemic brain injury following out-of-hospital cardiac arrest. Resuscitation 2023; 190:109858. [PMID: 37270091 DOI: 10.1016/j.resuscitation.2023.109858] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 05/23/2023] [Accepted: 05/25/2023] [Indexed: 06/05/2023]
Abstract
BACKGROUND/OBJECTIVE Post-cardiac arrest patients are vulnerable to hypoxic-ischaemic brain injury (HIBI), but HIBI may not be identified until computed tomography (CT) scan of the brain is obtained post-resuscitation and stabilization. We aimed to evaluate the association of clinical arrest characteristics with early CT findings of HIBI to identify those at the highest risk for HIBI. METHODS This is a retrospective analysis of out-of-hospital cardiac arrest (OHCA) patients who underwent whole-body imaging. Head CT reports were analyzed with an emphasis on findings suggestive of HIBI; HIBI was present if any of the following were noted on the neuroradiologist read: global cerebral oedema, sulcal effacement, blurred grey-white junction, and ventricular compression. The primary exposure was duration of cardiac arrest. Secondary exposures included age, cardiac vs noncardiac etiology, and witnessed vs unwitnessed arrest. The primary outcome was CT findings of HIBI. RESULTS A total of 180 patients (average age 54 years, 32% female, 71% White, 53% witnessed arrest, 32% cardiac etiology of arrest, mean CPR duration of 15 ± 10 minutes) were included in this analysis. CT findings of HIBI were seen in 47 (48.3%) patients. Multivariate logistic regression demonstrated a significant association between CPR duration and HIBI (adjusted OR = 1.1, 95% CI 1.01-1.11, p < 0.01). CONCLUSION Signs of HIBI are commonly seen on CT head within 6 hours of OHCA, occurring in approximately half of patients, and are associated with CPR duration. Determining risk factors for abnormal CT findings can help clinically identify patients at higher risk for HIBI and target interventions appropriately.
Affiliation(s)
- Vasisht Srinivasan
  - Department of Emergency Medicine, University of Washington School of Medicine, United States
- Jane Hall
  - Department of Emergency Medicine, University of Washington School of Medicine, United States
- Sarah Wahlster
  - Department of Neurology, University of Washington School of Medicine, United States
  - Department of Neurosurgery, University of Washington School of Medicine, United States
  - Department of Anesthesiology and Pain Medicine, University of Washington School of Medicine, United States
- Nicholas J Johnson
  - Department of Emergency Medicine, University of Washington School of Medicine, United States
  - Department of Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, University of Washington School of Medicine, United States
- Kelley Branch
  - Department of Medicine, Division of Cardiology, University of Washington School of Medicine, United States
5
Meester M, Tobias TJ, van den Broek J, Meulenbroek CB, Bouwknegt M, van der Poel WH, Stegeman A. Farm biosecurity measures to prevent hepatitis E virus infection in finishing pigs on endemically infected pig farms. One Health 2023; 16:100570. [PMID: 37363225 PMCID: PMC10288132 DOI: 10.1016/j.onehlt.2023.100570] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2022] [Revised: 05/17/2023] [Accepted: 05/23/2023] [Indexed: 06/28/2023]
Abstract
Hepatitis E virus (HEV) can be transmitted from pigs to humans and cause liver inflammation. Pigs are a major reservoir of HEV and most slaughter pigs show evidence of infection by presence of antibodies (ELISA) or viral RNA (PCR). Reducing the number of HEV-infected pigs at slaughter would likely reduce human exposure, yet how this can be achieved is unknown. We aimed to identify and quantify the effect of biosecurity measures to deliver HEV-negative batches of pigs to slaughter. A case-control study was performed with Dutch pig farms selected based on results of multiple slaughter batches. Case farms delivered at least one PCR and ELISA negative batch to slaughter (PCR-ELISA-), indicating absence of HEV infection, and control farms had the highest proportion of PCR and/or ELISA positive batches (PCR+ELISA+), indicating high within-farm transmission. Data about biosecurity and housing were collected via a questionnaire and an audit. Variables were selected by regularization (LASSO regression) and ranked based on the frequency of variable selection. The odds ratios (OR) for the relation between case-control status and the highest ranked variables were determined via grouped logistic regression. Thirty-five case farms, with 10 to 60% PCR-ELISA- batches, and 38 control farms, with on average 40% PCR+ELISA+ batches, were included. Rubber and steel floor material in fattening pens had the highest ranking and increased the odds of a PCR-ELISA- batch by 5.87 (95%CI 3.03-11.6) and 7.13 (95%CI 3.05-16.9), respectively. Cleaning pig driving boards weekly (OR 1.99 (95%CI 1.07-3.80)) and fly control with predatory flies (OR 4.52 (95%CI 1.59-13.5)) were protective, whereas a long fattening period was a risk. This study shows that cleaning and cleanability of floors and fomites and adequate fly control are measures to consider for HEV control in infected farms. Yet, intervention studies are needed to confirm the robustness of these outcomes.
Affiliation(s)
- Marina Meester
  - Farm Animal Health Unit, Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
- Tijs J. Tobias
  - Farm Animal Health Unit, Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
  - Royal GD, Deventer, the Netherlands
- Jan van den Broek
  - Farm Animal Health Unit, Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
- Carmijn B. Meulenbroek
  - Farm Animal Health Unit, Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
- Arjan Stegeman
  - Farm Animal Health Unit, Department of Population Health Sciences, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
6
Agamah FE, Bayjanov JR, Niehues A, Njoku KF, Skelton M, Mazandu GK, Ederveen THA, Mulder N, Chimusa ER, 't Hoen PAC. Computational approaches for network-based integrative multi-omics analysis. Front Mol Biosci 2022; 9:967205. [PMID: 36452456 PMCID: PMC9703081 DOI: 10.3389/fmolb.2022.967205] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 06/12/2022] [Accepted: 10/20/2022] [Indexed: 08/27/2023]
Abstract
Advances in omics technologies allow for holistic studies into biological systems. These studies rely on integrative data analysis techniques to obtain a comprehensive view of the dynamics of cellular processes and molecular mechanisms. Network-based integrative approaches have revolutionized multi-omics analysis by providing the framework to represent interactions between multiple different omics layers in a graph, which may faithfully reflect the molecular wiring in a cell. Here we review network-based multi-omics/multi-modal integrative analytical approaches. We classify these approaches according to the type of omics data supported, the methods and/or algorithms implemented, their node and/or edge weighting components, and their ability to identify key nodes and subnetworks. We show how these approaches can be used to identify biomarkers, disease subtypes, crosstalk, causality, and molecular drivers of physiological and pathological mechanisms. We provide insight into the most appropriate methods and tools for research questions, as showcased around the aetiology and treatment of COVID-19, that can be informed by multi-omics data integration. We conclude with an overview of challenges associated with multi-omics network-based analysis, such as reproducibility, heterogeneity, and (biological) interpretability of the results, and we highlight some future directions for network-based integration.
Affiliation(s)
- Francis E. Agamah
  - Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Jumamurat R. Bayjanov
  - Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Anna Niehues
  - Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Kelechi F. Njoku
  - Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Michelle Skelton
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Gaston K. Mazandu
  - Division of Human Genetics, Department of Pathology, Institute of Infectious Disease and Molecular Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
  - African Institute for Mathematical Sciences, Cape Town, South Africa
- Thomas H. A. Ederveen
  - Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
- Nicola Mulder
  - Computational Biology Division, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, CIDRI-Africa Wellcome Trust Centre, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Emile R. Chimusa
  - Department of Applied Sciences, Faculty of Health and Life Sciences, Northumbria University, Newcastle, United Kingdom
- Peter A. C. 't Hoen
  - Center for Molecular and Biomolecular Informatics (CMBI), Radboud Institute for Molecular Life Sciences, Radboud University Medical Center, Nijmegen, Netherlands
7
Clifton R, Monaghan EM, Green MJ, Purdy KJ, Green LE. Differences in composition of interdigital skin microbiota predict sheep and feet that develop footrot. Sci Rep 2022; 12:8931. [PMID: 35624131 PMCID: PMC9142565 DOI: 10.1038/s41598-022-12772-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/17/2021] [Accepted: 05/12/2022] [Indexed: 11/29/2022]
Abstract
Footrot has a major impact on health and productivity of sheep worldwide. The current paradigm for footrot pathogenesis is that physical damage to the interdigital skin (IDS) facilitates invasion of the essential pathogen Dichelobacter nodosus. The composition of the IDS microbiota is different in healthy and diseased feet, so an alternative hypothesis is that changes in the IDS microbiota facilitate footrot. We investigated the composition and diversity of the IDS microbiota of ten sheep, five that developed footrot and five that did not (healthy), at weekly intervals for 20 weeks. The IDS microbiota was less diverse on sheep 2+ weeks before they developed footrot than on healthy sheep. This change could be explained by only seven of >2000 bacterial taxa detected. The incubation period of footrot is 8–10 days, and there was a further reduction in microbial diversity on feet that developed footrot in that incubation period. We conclude that there are two stages of dysbiosis in footrot: the first predisposes sheep to footrot and the second occurs in feet during the incubation of footrot. These findings represent a step change in our understanding of the role of the IDS microbiota in footrot pathogenesis.
Affiliation(s)
- Rachel Clifton
  - Institute of Microbiology and Infection, University of Birmingham, Edgbaston, UK
- Emma M Monaghan
  - Institute of Microbiology and Infection, University of Birmingham, Edgbaston, UK
- Martin J Green
  - School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, UK
- Kevin J Purdy
  - School of Life Sciences, University of Warwick, Coventry, UK
- Laura E Green
  - Institute of Microbiology and Infection, University of Birmingham, Edgbaston, UK
8
Development and Validation of Web-Based Tool to Predict Lamina Propria Fibrosis in Eosinophilic Esophagitis. Am J Gastroenterol 2022; 117:272-279. [PMID: 34932022 PMCID: PMC8858426 DOI: 10.14309/ajg.0000000000001587] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 07/01/2021] [Accepted: 11/19/2021] [Indexed: 02/03/2023]
Abstract
INTRODUCTION Approximately half of esophageal biopsies from patients with eosinophilic esophagitis (EoE) contain inadequate lamina propria, making it impossible to determine the lamina propria fibrosis (LPF). This study aimed to develop and validate a web-based tool to predict LPF in esophageal biopsies with inadequate lamina propria. METHODS Prospectively collected demographic and clinical data and scores for 7 relevant EoE histology scoring system epithelial features from patients with EoE participating in the Consortium of Eosinophilic Gastrointestinal Disease Researchers observational study were used to build the models. Using the least absolute shrinkage and selection operator method, variables strongly associated with LPF were identified. Logistic regression was used to develop models to predict grade and stage of LPF. The grade model was validated using an independent data set. RESULTS Of 284 patients in the discovery data set, median age (quartiles) was 16 (8-31) years, 68.7% were male patients, and 93.4% were White. Age of the patient, basal zone hyperplasia, dyskeratotic epithelial cells, and surface epithelial alteration were associated with presence of LPF. The area under the receiver operating characteristic curve for the grade model was 0.84 (95% confidence interval: 0.80-0.89) and for stage model was 0.79 (95% confidence interval: 0.74-0.84). Our grade model had 82% accuracy in predicting the presence of LPF in an external validation data set. DISCUSSION We developed parsimonious models (grade and stage) to predict presence of LPF in esophageal biopsies with inadequate lamina propria and validated our grade model. Our predictive models can be easily used in the clinical setting to include LPF in clinical decisions and determine its effect on treatment outcomes.
9
Kucukseymen S, Arafati A, Al-Otaibi T, El-Rewaidy H, Fahmy AS, Ngo LH, Nezafat R. Noncontrast Cardiac Magnetic Resonance Imaging Predictors of Heart Failure Hospitalization in Heart Failure With Preserved Ejection Fraction. J Magn Reson Imaging 2021; 55:1812-1825. [PMID: 34559435 DOI: 10.1002/jmri.27932] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/06/2021] [Revised: 09/11/2021] [Accepted: 09/13/2021] [Indexed: 12/28/2022]
Abstract
BACKGROUND Heart failure patients with preserved ejection fraction (HFpEF) are at increased risk of future hospitalization. Contrast agents are often contra-indicated in HFpEF patients due to the high prevalence of concomitant kidney disease. Therefore, the prognostic value of noncontrast cardiac magnetic resonance imaging (MRI) for HF-hospitalization is important. PURPOSE To develop and test an explainable machine learning (ML) model to investigate the incremental value of noncontrast cardiac MRI for predicting HF-hospitalization. STUDY TYPE Retrospective. POPULATION A total of 203 HFpEF patients (mean, 64 ± 12 years, 48% women) referred for cardiac MRI were randomly split into training/validation (143 patients, ~70%) and test sets (60 patients, ~30%). FIELD STRENGTH A 1.5 T, balanced steady-state free precession (bSSFP) sequence. ASSESSMENT Two ML models were built based on the tree boosting technique and the eXtreme Gradient Boosting model (XGBoost): 1) basic clinical ML model using clinical and echocardiographic data and 2) cardiac MRI-based ML model that included noncontrast cardiac MRI markers in addition to the basic model. The primary end point was defined as HF-hospitalization. STATISTICAL TESTS ML tool was used for advanced statistics, and the Elastic Net method for feature selection. Area under the receiver operating characteristic (ROC) curve (AUC) was compared between models using DeLong's test. To gain insight into the ML model, the SHapley Additive exPlanations (SHAP) method was leveraged. A P-value <0.05 was considered statistically significant. RESULTS During follow-up (mean, 50 ± 39 months), 85 patients (42%) reached the end point. The cardiac MRI-based ML model using the XGBoost algorithm provided a significantly superior prediction of HF-hospitalization (AUC: 0.81) compared to the basic model (AUC: 0.64). The SHAP analysis revealed left atrium (LA) and right ventricle (RV) strains as top imaging markers contributing to its performance with cutoff values of 17.5% and -15%, respectively. DATA CONCLUSIONS Using an ML model, RV and LA strains measured in noncontrast cardiac MRI provide incremental value in predicting future hospitalization in HFpEF. EVIDENCE LEVEL 3. TECHNICAL EFFICACY Stage 2.
Affiliation(s)
- Selcuk Kucukseymen
  - Department of Medicine, Cardiovascular Division, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
- Arghavan Arafati
  - Department of Medicine, Cardiovascular Division, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
- Talal Al-Otaibi
  - Department of Medicine, Cardiovascular Division, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
- Hossam El-Rewaidy
  - Department of Medicine, Cardiovascular Division, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
  - Department of Computer Science, Technical University of Munich, Munich, Germany
- Ahmed S Fahmy
  - Department of Medicine, Cardiovascular Division, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
- Long H Ngo
  - Department of Medicine, Cardiovascular Division, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
- Reza Nezafat
  - Department of Medicine, Cardiovascular Division, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, Massachusetts, USA
10
Nam SM, Peterson TA, Seo KY, Han HW, Kang JI. Discovery of Depression-Associated Factors From a Nationwide Population-Based Survey: Epidemiological Study Using Machine Learning and Network Analysis. J Med Internet Res 2021; 23:e27344. [PMID: 34184998 PMCID: PMC8277318 DOI: 10.2196/27344] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/22/2021] [Revised: 03/06/2021] [Accepted: 05/06/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND In epidemiological studies, finding the best subset of factors is challenging when the number of explanatory variables is large. OBJECTIVE Our study had two aims. First, we aimed to identify essential depression-associated factors using the extreme gradient boosting (XGBoost) machine learning algorithm from big survey data (the Korea National Health and Nutrition Examination Survey, 2012-2016). Second, we aimed to achieve a comprehensive understanding of multifactorial features in depression using network analysis. METHODS An XGBoost model was trained and tested to classify "current depression" and "no lifetime depression" for a data set of 120 variables for 12,596 cases. The optimal XGBoost hyperparameters were set by an automated machine learning tool (TPOT), and a high-performance sparse model was obtained by feature selection using the feature importance value of XGBoost. We performed statistical tests on the model and nonmodel factors using survey-weighted multiple logistic regression and drew a correlation network among factors. We also adopted statistical tests for the confounder or interaction effect of selected risk factors when it was suspected on the network. RESULTS The XGBoost-derived depression model consisted of 18 factors with an area under the weighted receiver operating characteristic curve of 0.86. Two nonmodel factors could be found using the model factors, and the factors were classified into direct (P<.05) and indirect (P≥.05), according to the statistical significance of the association with depression. Perceived stress and asthma were the most remarkable risk factors, and urine specific gravity was a novel protective factor. The depression-factor network showed clusters of socioeconomic status and quality of life factors and suggested that educational level and sex might be predisposing factors. Indirect factors (eg, diabetes, hypercholesterolemia, and smoking) were involved in confounding or interaction effects of direct factors. 
Triglyceride level was a confounder of hypercholesterolemia and diabetes, smoking had a significant risk in females, and weight gain was associated with depression involving diabetes. CONCLUSIONS XGBoost and network analysis were useful to discover depression-related factors and their relationships and can be applied to epidemiological studies using big survey data.
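The "area under the weighted receiver operating characteristic curve" reported in this abstract can be illustrated with a minimal sketch. It assumes each respondent carries a survey weight and uses one common formulation: the weighted share of (depressed, non-depressed) pairs in which the depressed case receives the higher score, with ties counting half. The function name `weighted_auc` and the toy data are hypothetical and are not taken from the study.

```python
def weighted_auc(scores, labels, weights):
    """Weighted AUC: weighted share of (positive, negative) pairs that are
    ranked correctly by the score; tied pairs count half."""
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = 0.0
    total = 0.0
    for s_pos, w_pos in pos:
        for s_neg, w_neg in neg:
            pair_weight = w_pos * w_neg  # each pair counts by the product of its weights
            total += pair_weight
            if s_pos > s_neg:
                correct += pair_weight
            elif s_pos == s_neg:
                correct += 0.5 * pair_weight
    return correct / total


# Toy data (invented): model scores, true labels, and survey weights.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
unweighted = weighted_auc(scores, labels, [1.0, 1.0, 1.0, 1.0])
weighted = weighted_auc(scores, labels, [1.0, 3.0, 1.0, 1.0])
print(unweighted, weighted)
```

Up-weighting the poorly ranked second case lowers the estimate, which shows why a weighted and an unweighted AUC can differ on the same predictions.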
Affiliation(s)
- Sang Min Nam
  - Department of Ophthalmology, CHA Bundang Medical Center, CHA University, Seongnam, Republic of Korea
- Thomas A Peterson
  - UCSF REACH Informatics Core, Department of Orthopaedic Surgery, Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, United States
- Kyoung Yul Seo
  - Department of Ophthalmology, Institute of Vision Research, Eye and Ear Hospital, Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
- Hyun Wook Han
  - Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam, Republic of Korea
- Jee In Kang
  - Department of Psychiatry, Institute of Behavioral Science in Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
11
Lewis KE, Green MJ, Witt J, Green LE. Multiple model triangulation to identify factors associated with lameness in British sheep flocks. Prev Vet Med 2021; 193:105395. [PMID: 34119859 PMCID: PMC8326248 DOI: 10.1016/j.prevetmed.2021.105395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2021] [Revised: 05/25/2021] [Accepted: 05/29/2021] [Indexed: 11/13/2022]
Abstract
Multiple model triangulation identifies variables that are likely true positives. Triangulation increases confidence in which management practices to recommend. Effective management of ewes can lower the prevalence of lameness in ewes and lambs.
Identification of factors associated with an outcome can be challenging when the number of explanatory variables is large in relation to the number of observations. Multiple model triangulation, where results from several model types are combined, improves the likelihood of identifying true predictor variables. The aim of this study was to use triangulation to identify covariates likely to be truly associated with the prevalence of lameness in sheep flocks in Great Britain. Data were collected using a questionnaire sent to 3200 sheep farmers in Great Britain in 2018. The useable response rate was 14.1%. The geometric mean prevalence of lameness was 1.4% (95% CI 1.2−1.7) for ewes and 0.6% (95% CI 0.5−0.9) for lambs; however, approximately 60% of flocks had >2% prevalence of lameness in ewes. Four model types were investigated: two generalised linear models (negative binomial and quasi-Poisson) built using stepwise selection, and two elastic net models (Poisson and Gaussian distributions) refined with selection stability estimation. Triangulated covariates were those selected in three or all four models: 10 for ewes and 12 for lambs. Higher prevalence of lameness in ewes was associated with 5−100% feet bleeding during routine foot trimming compared with not foot trimming, footbathing the flock to treat severe footrot (SFR) and always using formalin in footbaths, both compared with not footbathing, using FootVax™ for <1 year compared with not using FootVax™, and never quarantining new or returning sheep to the farm for >3 weeks compared with always. Lower prevalence of lameness in ewes was associated with vaccinating with FootVax™ for >5 years compared with not vaccinating, peat soil compared with no peat soil, and having no lame ewes to treat.
Higher prevalence of lameness in lambs was associated with 5−100% feet bleeding during routine foot trimming, always foot trimming ewes with SFR, not knowingly selecting replacement ewes from ewes that were never lame compared with always, replacement sheep purchased and homebred compared with only homebred, treating lambs >3 days after recognition of lameness compared with 0-3 days and footbathing the flock to treat interdigital dermatitis compared with not footbathing at all. Lower prevalence of lameness in lambs was associated with peat soil, flocks in Scotland versus England, an altitude of >230−500 m compared with ≤230 m, never using antibiotic injection to treat lambs with SFR compared with always, and having no lame lambs to treat. We conclude triangulation identified reliable management practices for farmers to implement to minimise lameness in sheep.
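The triangulation rule described in this abstract (keep covariates selected in three or all four models) can be sketched as a simple vote count. The model labels follow the abstract, but the covariate names and the selections assigned to each model are invented for illustration and are not the study's results; `triangulate` is a hypothetical helper.

```python
from collections import Counter


def triangulate(selections, min_models=3):
    """Return covariates selected by at least `min_models` of the models."""
    counts = Counter(
        covariate
        for selected in selections.values()
        for covariate in set(selected)  # count each covariate once per model
    )
    return sorted(c for c, n in counts.items() if n >= min_models)


# Hypothetical selections per model type (illustrative only).
selections = {
    "negative_binomial_glm": {"peat_soil", "footbath_formalin", "quarantine_never"},
    "quasi_poisson_glm": {"peat_soil", "footbath_formalin", "flock_size"},
    "elastic_net_poisson": {"peat_soil", "footbath_formalin", "quarantine_never"},
    "elastic_net_gaussian": {"peat_soil", "altitude"},
}

triangulated = triangulate(selections)
print(triangulated)  # ['footbath_formalin', 'peat_soil']
```

A covariate chosen by only two of the four models, such as the invented `quarantine_never`, falls below the threshold and is excluded.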
Affiliation(s)
- K E Lewis
  - School of Life Sciences, Gibbet Hill, Warwick University, Coventry, United Kingdom
- M J Green
  - School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, United Kingdom
- J Witt
  - School of Life Sciences, Gibbet Hill, Warwick University, Coventry, United Kingdom
- L E Green
  - Institute of Microbiology and Infection, College of Life and Environmental Sciences, University of Birmingham, Birmingham, United Kingdom
12
Lima E, Hyde R, Green M. Model selection for inferential models with high dimensional data: synthesis and graphical representation of multiple techniques. Sci Rep 2021; 11:412. [PMID: 33431921 PMCID: PMC7801732 DOI: 10.1038/s41598-020-79317-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/11/2020] [Accepted: 12/07/2020] [Indexed: 12/18/2022] Open
Abstract
Inferential research commonly involves identification of causal factors from within high dimensional data, but selection of the 'correct' variables can be problematic. One specific problem is that results vary depending on the statistical method employed, and it has been argued that triangulation of multiple methods is advantageous to safely identify the correct, important variables. To date, no formal method of triangulation has been reported that incorporates both model stability and coefficient estimates; in this paper we develop an adaptable, straightforward method to achieve this. Six methods of variable selection were evaluated using simulated datasets of different dimensions with known underlying relationships. We used a bootstrap methodology to combine stability matrices across methods and estimate aggregated coefficient distributions. Novel graphical approaches provided a transparent route to visualise and compare results between methods. The proposed aggregated method provides a flexible route to formally triangulate results across any chosen number of variable selection methods and provides a combined result that incorporates uncertainty arising from between-method variability. In these simulated datasets, the combined method generally performed as well as or better than the individual methods, with low error rates and clearer demarcation of the true causal variables than for the individual methods.
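The idea of combining bootstrap selection stability across methods can be sketched roughly as follows. This is not the authors' procedure: the two selection rules are simple correlation-based stand-ins for the six methods evaluated in the paper, coefficient aggregation is omitted, and all names, thresholds, and the simulated data are assumptions made for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def select_by_threshold(X, y, threshold=0.3):
    """Stand-in method 1: keep variables with |r| above a fixed threshold."""
    return {name for name, col in X.items() if abs(pearson(col, y)) > threshold}


def select_top_k(X, y, k=2):
    """Stand-in method 2: keep the k variables with the largest |r|."""
    ranked = sorted(X, key=lambda name: abs(pearson(X[name], y)), reverse=True)
    return set(ranked[:k])


def combined_stability(X, y, methods, n_boot=200, seed=0):
    """Share of (bootstrap sample, method) runs in which each variable is selected."""
    rng = random.Random(seed)
    n = len(y)
    counts = {name: 0 for name in X}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xb = {name: [col[i] for i in idx] for name, col in X.items()}
        yb = [y[i] for i in idx]
        for method in methods:
            for name in method(Xb, yb):
                counts[name] += 1
    total = n_boot * len(methods)
    return {name: c / total for name, c in counts.items()}


# Simulated data with a known relationship: only x0 and x1 drive y.
data_rng = random.Random(42)
n = 300
X = {f"x{j}": [data_rng.gauss(0, 1) for _ in range(n)] for j in range(6)}
y = [2.0 * X["x0"][i] - 1.5 * X["x1"][i] + data_rng.gauss(0, 1) for i in range(n)]

stability = combined_stability(X, y, [select_by_threshold, select_top_k])
for name, value in stability.items():
    print(name, round(value, 2))
```

In this toy setting the two true predictors should be selected in nearly every run and the noise variables in almost none, which is the kind of demarcation the abstract describes.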
Affiliation(s)
- Eliana Lima
  - School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK
- Robert Hyde
  - School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK
- Martin Green
  - School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK
13
Zandler H, Senftl T, Vanselow KA. Reanalysis datasets outperform other gridded climate products in vegetation change analysis in peripheral conservation areas of Central Asia. Sci Rep 2020; 10:22446. [PMID: 33384431 PMCID: PMC7775429 DOI: 10.1038/s41598-020-79480-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/26/2020] [Accepted: 12/08/2020] [Indexed: 11/29/2022] Open
Abstract
Global environmental research requires long-term climate data. Yet, meteorological infrastructure is missing in the vast majority of the world’s protected areas. Therefore, gridded products are frequently used as the only available climate data source in peripheral regions. However, associated evaluations are commonly biased towards well-observed areas and, consequently, station-based datasets. As evaluations of vegetation monitoring abilities are lacking for regions with poor data availability, we analyzed the potential of several state-of-the-art climate datasets (CHIRPS, CRU, ERA5-Land, GPCC-Monitoring-Product, IMERG-GPM, MERRA-2, MODIS-MOD10A1) for assessing NDVI anomalies (MODIS-MOD13Q1) in two particularly suitable remote conservation areas. We calculated anomalies of 156 climate variables and seasonal periods during 2001–2018, correlated these with vegetation anomalies while taking the multiple comparison problem into consideration, and computed their spatial performance to derive suitable parameters. Our results showed that four datasets (MERRA-2, ERA5-Land, MOD10A1, CRU) were suitable for vegetation analysis in both regions, showing significant correlations controlled at a false discovery rate < 5% in more than half of the analyzed areas. Cross-validated variable selection and importance assessment based on the Boruta algorithm indicated high importance of the reanalysis datasets ERA5-Land and MERRA-2 in both areas but greater differences and variability between the regions for all other products. CHIRPS, GPCC and the bias-corrected version of MERRA-2 were unsuitable and not important in both regions. We provide evidence that reanalysis datasets are most suitable for spatiotemporally consistent environmental analysis, whereas gauge- or satellite-based products and their combinations are highly variable and may not be applicable in peripheral areas.
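Controlling the false discovery rate across many correlation tests can be sketched with the Benjamini-Hochberg step-up procedure. The abstract does not name the procedure the authors used, so Benjamini-Hochberg is an assumption here, and the p-values below are invented for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans marking which hypotheses are rejected
    when the false discovery rate is controlled at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose p-value is under its threshold.
        if p_values[i] <= rank / m * q:
            max_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[i] = True
    return rejected


# Invented p-values standing in for correlation tests of climate variables.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
significant = benjamini_hochberg(p_values, q=0.05)
print(significant)
```

Only the two smallest p-values survive here, even though five are below 0.05 individually, which illustrates how the correction tightens the threshold as the number of tests grows.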
Affiliation(s)
- Harald Zandler
  - Working Group of Climatology, Department of Geography, University of Bayreuth, Universitätsstr. 30, 95447, Bayreuth, Germany
  - Bayreuth Center of Ecology and Environmental Research, University of Bayreuth, Dr. Hans-Frisch-Straße 1-3, 95448, Bayreuth, Germany
- Thomas Senftl
  - Working Group of Climatology, Department of Geography, University of Bayreuth, Universitätsstr. 30, 95447, Bayreuth, Germany
- Kim André Vanselow
  - Institute of Geography, Friedrich-Alexander-Universität Erlangen-Nürnberg, Wetterkreuz 15, 91058, Erlangen, Germany