1
Rafiee M, Jahangiri-Rad M, Mohseni-Bandpei A, Razmi E. Impacts of socioeconomic and environmental factors on neoplasms incidence rates using machine learning and GIS: a cross-sectional study in Iran. Sci Rep 2024; 14:10604. [PMID: 38719879 PMCID: PMC11078954 DOI: 10.1038/s41598-024-61397-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 05/06/2024] [Indexed: 05/12/2024]
Abstract
Neoplasm is an umbrella term for both benign and malignant conditions. Correlations between socioeconomic and environmental factors and the occurrence of new-onset neoplasms have been demonstrated in a body of research. Nevertheless, few studies have specifically addressed the nature of these relationships, the significance of the risk factors, and their geographic variation, particularly in low- and middle-income communities. This study therefore set out to (1) analyze spatiotemporal variations in the age-adjusted incidence rate (AAIR) of neoplasms in Iran across five time periods, (2) investigate relationships between a set of environmental and socioeconomic indicators and the AAIR of neoplasms across the country, and (3) evaluate geographical variation in their relative importance. Our cross-sectional study design was based on county-level data from 2010 to 2020. AAIR data were acquired from the Institute for Health Metrics and Evaluation (IHME). Hot spot analyses and Anselin Local Moran's I indices were used to identify high- and low-risk clusters of the AAIR of neoplasms. Multiscale geographically weighted regression (MGWR) was used to evaluate the association between each explanatory variable and the AAIR of neoplasms. Using random forests (RF), we also examined the relationships between environmental (e.g., UV index and PM2.5 concentration) and socioeconomic (e.g., Gini coefficient and literacy rate) factors and the AAIR of neoplasms. The AAIR of neoplasms displayed a significant increasing trend over the study period. According to the MGWR, the UV index was the only factor that varied significantly across space and was associated with the AAIR of neoplasms in Iran. The RF model showed good accuracy on both training and testing data, with R² values greater than 0.91 and 0.92, respectively. The UV index and the Gini coefficient ranked highest among the variables for predicting the AAIR of neoplasms, based on the relative influence of each variable. More research using machine learning approaches that can consider all possible determinants is required to assess the outcomes of health strategies and to inform policy planning.
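The cluster detection step named in this abstract can be illustrated with a minimal sketch of Anselin Local Moran's I. This is not the authors' GIS workflow: the function name, the toy rates, and the neighbor matrix are hypothetical, and the sketch assumes row-standardized binary contiguity weights.

```python
# Minimal sketch of Anselin Local Moran's I with row-standardized weights.
# All inputs below are hypothetical toy values, not data from the study.

def local_morans_i(values, weights):
    """Return one Local Moran's I statistic per region.

    values  -- list of rates, one per region
    weights -- square 0/1 neighbor matrix (list of lists); each row is
               row-standardized before the spatial lag is computed
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]      # deviations from the global mean
    m2 = sum(d * d for d in z) / n      # second moment (divisor n)
    if m2 == 0:
        return [0.0] * n                # no variation, no spatial pattern
    stats = []
    for i in range(n):
        row_sum = sum(weights[i])
        # Spatial lag: mean deviation of region i's neighbors.
        lag = (sum(w * zj for w, zj in zip(weights[i], z)) / row_sum
               if row_sum else 0.0)
        stats.append(z[i] * lag / m2)
    return stats


# Four hypothetical regions arranged on a line: 0-1-2-3.
toy_rates = [1.0, 1.0, 5.0, 5.0]
toy_neighbors = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
toy_local_i = local_morans_i(toy_rates, toy_neighbors)
print(toy_local_i)
```

A positive value marks a region whose rate resembles that of its neighbors (a low-low or high-high cluster), while values near zero mark regions with mixed neighbors. Significance is normally judged with a permutation test, which this sketch omits.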
Affiliation(s)
- Mohammad Rafiee
  - Air Quality and Climate Change Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Department of Environmental Health Engineering, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mahsa Jahangiri-Rad
  - Department of Environmental Health Engineering, School of Health, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran
  - Water Purification Research Center, Islamic Azad University, Tehran, Iran
- Anoushiravan Mohseni-Bandpei
  - Air Quality and Climate Change Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Department of Environmental Health Engineering, School of Public Health and Safety, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Elham Razmi
  - Department of Environmental Health Engineering, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
2
McCarty JC, Cross RE, Laane CLE, Hoftiezer YAJ, Gavagnin A, Regazzoni P, Fernandez Dell'Oca A, Jupiter JB, Bhashyam AR. Teardrop Alignment Changes After Volar Locking Plate Fixation of Distal Radius Fractures With Volar Ulnar Fragments. Hand (N Y) 2024:15589447241233762. [PMID: 38439630 DOI: 10.1177/15589447241233762] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2024]
Abstract
BACKGROUND: We assessed factors associated with change in the radiographic teardrop angle following volar locking plate (VLP) fixation of volarly displaced intra-articular distal radius fractures with volar ulnar fragments (VUF) within the ICUC database. The primary outcome was change in radiographic alignment on follow-up imaging, defined as a change in teardrop angle from intra-operative fluoroscopy greater than 5°.
METHODS: Patients with distal radius fractures treated with a VLP were identified within the ICUC database, an international collaborative and publicly available dataset. The primary outcome was volar rim loss of reduction on follow-up imaging, defined as a change in radiographic alignment from intra-operative fluoroscopy, a teardrop angle less than 50°, or loss of normal radiocarpal alignment. The secondary outcome was final range of motion (ROM) of the affected extremity. The radiographic Soong classification was used to grade plate position. Descriptive statistics were used to assess the distributions of variables. A Random Forest supervised machine learning algorithm was used to rank variable importance for predicting the primary outcome. Traditional descriptive statistics were used to compare patient, fracture, and treatment characteristics with volar rim loss of reduction. Volar rim loss of reduction and final ROM, both in degrees and relative to the contralateral unaffected limb, were also assessed.
RESULTS: Fifty patients with volarly displaced, intra-articular distal radius fractures treated with a VLP were identified. Six patients were observed to have a volar rim loss of reduction, but none required reoperation. Volar ulnar fragment size, Soong grade 0, and postfixation axial plate position in relation to the sigmoid notch were significantly associated (P < .05) with volar rim loss of reduction. All cases of volar rim loss of reduction occurred when the VUF was 10.8 mm or less.
CONCLUSIONS: In the Random Forest machine learning algorithm, the size of the VUF was the most important variable for predicting volar rim loss of reduction, followed by postfixation axial plate position relative to the sigmoid notch and the number of volar fragments. There were no significant differences in ROM between patients with volar ulnar escape and those without.
Affiliation(s)
- Justin C McCarty
  - Hand & Arm Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, USA
  - Department of Plastic and Reconstructive Surgery, Massachusetts General Hospital, Boston, USA
- Rachel E Cross
  - Hand & Arm Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, USA
  - Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, USA
- Charlotte L E Laane
  - Hand & Arm Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, USA
  - Trauma Research Unit, Department of Surgery, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands
- Yannick Albert J Hoftiezer
  - Hand & Arm Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, USA
  - Department of Plastic, Reconstructive and Hand Surgery, Radboud Institute for Health Sciences, Radboud University Medical Center, Nijmegen, The Netherlands
- Aquiles Gavagnin
  - Department of Orthopedics, Hospital Britanico Montevideo, Uruguay
- Jesse B Jupiter
  - Hand & Arm Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, USA
- Abhiram R Bhashyam
  - Hand & Arm Service, Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, USA
3
Wies C, Miltenberger R, Grieser G, Jahn-Eimermacher A. Exploring the variable importance in random forests under correlations: a general concept applied to donor organ quality in post-transplant survival. BMC Med Res Methodol 2023; 23:209. [PMID: 37726680 PMCID: PMC10507897 DOI: 10.1186/s12874-023-02023-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023]
Abstract
Random Forests are a powerful and frequently applied machine learning tool. The permutation variable importance (VIMP) has been proposed to improve the explainability of such a pure prediction model. It describes the expected increase in prediction error after randomly permuting a variable and thereby disturbing its association with the outcome. However, VIMPs measure only a variable's marginal influence, which can make their interpretation difficult or even misleading. In the present work we address the general need for improving the explainability of prediction models by exploring VIMPs in the presence of correlated variables. In particular, we propose using a variable's residual information to investigate whether its permutation importance partially or totally originates from correlated predictors. Hypothesis tests are derived from a resampling algorithm and can further support results by providing test decisions and p-values. In simulation studies we show that the proposed test controls type I error rates. When the methods are applied to a Random Forest analysis of post-transplant survival after kidney transplantation, the importance of kidney donor quality for predicting post-transplant survival is shown to be high. However, the transplant allocation policy introduces correlations with other well-known predictors, which raises the concern that the importance of kidney donor quality may simply originate from these predictors. The proposed method addresses this concern and demonstrates that kidney donor quality plays an important role in post-transplant survival, regardless of correlations with other predictors.
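The definition of permutation importance given in this abstract (increase in prediction error after permuting one variable) can be sketched as follows. This is plain permutation importance, not the authors' residual-based test; the function names, the fixed toy predictor, and the simulated data are all hypothetical.

```python
import random


def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, column, n_repeats=20, seed=0):
    """Average increase in MSE after shuffling one column of `rows`.

    predict -- any fitted model exposed as a function row -> prediction
    rows    -- list of feature lists
    column  -- index of the feature to permute
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(r) for r in rows])
    increases = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)  # breaks the link between this column and y
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        increases.append(mse(y, [predict(r) for r in permuted]) - baseline)
    return sum(increases) / n_repeats


# Hypothetical simulated data: x2 is strongly correlated with x1,
# but only x1 drives the outcome.
rng = random.Random(42)
rows, y = [], []
for _ in range(200):
    x1 = rng.gauss(0.0, 1.0)
    x2 = x1 + rng.gauss(0.0, 0.1)
    rows.append([x1, x2])
    y.append(3.0 * x1 + rng.gauss(0.0, 0.5))


def toy_predict(row):
    """Stand-in for a fitted model; it uses only the first feature."""
    return 3.0 * row[0]


vimp_x1 = permutation_importance(toy_predict, rows, y, column=0)
vimp_x2 = permutation_importance(toy_predict, rows, y, column=1)
print(vimp_x1, vimp_x2)
```

Because the stand-in model ignores x2, permuting x2 changes nothing and its importance is zero. A forest fitted to the same data could split on x2 in place of x1 and assign it part of the importance, which is the interpretive difficulty with correlated predictors that the abstract describes.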
Affiliation(s)
- Christoph Wies
  - Department of Mathematics and Natural Sciences, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany
  - Digital Biomarkers for Oncology, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 223, Heidelberg, 69120, Germany
  - Medical Facility, University Heidelberg, Im Neuenheimer Feld 672, Heidelberg, 69120, Germany
- Robert Miltenberger
  - Department of Mathematics and Natural Sciences, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany
- Gunter Grieser
  - Department of Computer Science, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany
- Antje Jahn-Eimermacher
  - Department of Mathematics and Natural Sciences, Darmstadt University of Applied Sciences, Schöfferstraße 3, Darmstadt, 64295, Germany
4
Huang J, Zhou Y, Zhang H, Wu Y. A neural network model to screen feature genes for pancreatic cancer. BMC Bioinformatics 2023; 24:193. [PMID: 37170188 PMCID: PMC10176951 DOI: 10.1186/s12859-023-05322-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 12/09/2022] [Accepted: 05/05/2023] [Indexed: 05/13/2023]
Abstract
Pancreatic cancer remains a worldwide problem because of its high degree of malignancy and high mortality. Neural network model analysis is an efficient machine learning method that can quickly and accurately predict disease feature genes. The aim of our research was to build a neural network model that would help screen out feature genes for the diagnosis of pancreatic cancer and the prediction of its prognosis. Our study confirmed that the neural network model is a reliable way to predict feature genes of pancreatic cancer, and that infiltrating immune cells, especially neutrophils, play an essential role in the development of pancreatic cancer. ANO1, AHNAK2, and ADAM9 were eventually identified as feature genes of pancreatic cancer that help in diagnosis and in predicting prognosis. Neural network model analysis offers a new approach to finding intervention targets for pancreatic cancer.
Affiliation(s)
- Jing Huang
  - Department of Gastroenterology, First Hospital of Jiaxing, Jiaxing, 314001, Zhejiang, China
- Yuting Zhou
  - Department of Respiratory, The 904th Hospital of Joint Logistic Support Force of PLA, Affiliated Hospital of Jiangnan University, Wuxi, 214000, Jiangsu, China
- Haoran Zhang
  - Department of Gastroenterology, First Hospital of Jiaxing, Jiaxing, 314001, Zhejiang, China
- Yiming Wu
  - Department of Gastroenterology, First Hospital of Jiaxing, Jiaxing, 314001, Zhejiang, China
5
Deng H, Zhao N, Wang Y. Identifying Chinese social media users' need for affect from their online behaviors. Front Public Health 2023; 10:1045279. [PMID: 36703844 PMCID: PMC9871915 DOI: 10.3389/fpubh.2022.1045279] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2022] [Accepted: 12/08/2022] [Indexed: 01/12/2023]
Abstract
The need for affect (NFA), which refers to the motivation to approach or avoid emotion-inducing situations, is a valuable indicator for mental health monitoring and intervention, as well as for many other applications. Traditionally, NFA has been measured using self-reports, which are poorly suited to today's online scenarios because they do not support fast, large-scale assessment. This study proposed an automatic and non-invasive method for recognizing NFA based on social media behavioral data. The NFA questionnaire scores of 934 participants and their social media data were acquired. We then ran machine learning algorithms to train predictive models that can be used to automatically identify the NFA levels of online users. The results showed that Extreme Gradient Boosting (XGB) performed best among several algorithms. The Pearson correlation coefficients between predicted scores and NFA questionnaire scores reached 0.25 (NFA avoidance), 0.31 (NFA approach), and 0.34 (NFA total), and the split-half reliabilities were 0.66-0.70. Our research demonstrated that adolescents' NFA can be identified from their social media behaviors, and opened a novel way of non-intrusively assessing users' NFA that can be used for mental health monitoring and other situations that require large-scale NFA measurement.
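The evaluation metric reported in this abstract, the Pearson correlation between predicted and questionnaire scores, can be computed as in the minimal sketch below. The function name and the toy scores are hypothetical and are not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of the centered values.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical questionnaire scores and model predictions.
toy_questionnaire = [12.0, 15.0, 9.0, 20.0, 14.0]
toy_predicted = [13.0, 14.0, 11.0, 16.0, 15.0]
toy_r = pearson_r(toy_questionnaire, toy_predicted)
print(round(toy_r, 3))
```

In a study like this one, the coefficient would be computed on held-out participants so that it reflects out-of-sample agreement rather than fit to the training data.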
Affiliation(s)
- Hong Deng
  - Institute of Psychology, Chinese Academy of Sciences, Beijing, China
  - Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Nan Zhao (corresponding author)
  - Institute of Psychology, Chinese Academy of Sciences, Beijing, China
  - Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
- Yilin Wang
  - Institute of Psychology, Chinese Academy of Sciences, Beijing, China
  - Department of Psychology, University of Chinese Academy of Sciences, Beijing, China
6
Zhang Y, Hua S, Jiang Q, Xie Z, Wu L, Wang X, Shi F, Dong S, Jiang J. Identification of Feature Genes of a Novel Neural Network Model for Bladder Cancer. Front Genet 2022; 13:912171. [PMID: 35719407 PMCID: PMC9198295 DOI: 10.3389/fgene.2022.912171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022]
Abstract
Background: The combination of deep learning methods and oncogenomics can provide an effective diagnostic method for malignant tumors; thus, we attempted to construct a reliable artificial neural network model as a novel diagnostic tool for bladder cancer (BLCA). Methods: Three expression profiling datasets (GSE61615, GSE65635, and GSE100926) were downloaded from the Gene Expression Omnibus (GEO) database. GSE61615 and GSE65635 were used as the training group, while GSE100926 was used as the test group. Differentially expressed genes (DEGs) were filtered based on logFC and FDR values. We also performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses to explore the biological functions of the DEGs. Subsequently, we used a random forest algorithm to identify feature genes and then constructed a neural network model. The same procedures were applied to the test group to validate the reliability of the model. We also explored the degree of immune cell infiltration and the correlation coefficients using the CiberSort algorithm and the corrplot R package. A qRT-PCR assay was used to examine the expression levels of the feature genes in vitro. Results: A total of 265 DEGs were filtered out and were significantly enriched in muscle system processes, collagen-containing and focal adhesion signaling pathways. Based on the random forest algorithm, we selected 14 feature genes to construct the neural network model. The area under the curve (AUC) of the training group was 0.950 (95% CI: 0.850-1.000), and the AUC of the test group was 0.667 (95% CI: 0.333-1.000). In addition, we observed significant differences in the content of infiltrating immune cells and in the expression levels of the feature genes. Conclusion: After repeated verification, our neural network model showed clinical feasibility for identifying bladder cancer patients and provided a potential target to improve the management of BLCA.
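The AUC values quoted in this abstract summarize how well a model's scores separate cases from controls. A minimal sketch of the computation is given below, using the pairwise-comparison (Mann-Whitney) form of the AUC. The function name and the toy labels and scores are hypothetical, and confidence intervals are not computed.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample receives
    a higher score than a randomly chosen negative one; ties count as 0.5.
    labels -- sequence of 0/1 class labels
    scores -- sequence of model scores, higher meaning "more likely positive"
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = tumor, 0 = normal) and network output scores.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(toy_auc)
```

The quadratic loop is adequate for small expression datasets; a rank-based formulation gives the same value with better scaling on large samples.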
Affiliation(s)
- Yongqing Zhang
  - Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shan Hua
  - Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Qiheng Jiang
  - Department of Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Zhiwen Xie
  - Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Lei Wu
  - Department of Urology, Shanghai General Hospital, Nanjing Medical University School of Medicine, Shanghai, China
- Xinjie Wang
  - Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Fei Shi
  - Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Shengli Dong
  - Nursing Department, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Juntao Jiang
  - Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
  - Department of Medicine, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
7
Cho B, Geng E, Arvind V, Valliani AA, Tang JE, Schwartz J, Dominy C, Cho SK, Kim JS. Understanding Artificial Intelligence and Predictive Analytics: A Clinically Focused Review of Machine Learning Techniques. JBJS Rev 2022; 10:01874474-202203000-00013. [PMID: 35302963 DOI: 10.2106/jbjs.rvw.21.00142] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/16/2023]
Abstract
» Machine learning and artificial intelligence have seen tremendous growth in recent years and have been applied in numerous studies in the field of orthopaedics. » Machine learning will soon become critical in the day-to-day operations of orthopaedic practice; therefore, it is imperative that providers become accustomed to and familiar with not only the terminology but also the fundamental techniques behind the technology. » A foundation of knowledge regarding machine learning is critical for physicians so they can begin to understand the details in the algorithms that are being developed, which provide improved accuracy compared with clinicians, decreased time required, and a heightened ability to triage patients.
Affiliation(s)
- Brian Cho
  - Department of Orthopedics, Icahn School of Medicine at Mount Sinai, New York, NY
8
Chen Z, Chen J, Zhou J, Lei F, Zhou F, Qin JJ, Zhang XJ, Zhu L, Liu YM, Wang H, Chen MM, Zhao YC, Xie J, Shen L, Song X, Zhang X, Yang C, Liu W, Zhang X, Guo D, Yan Y, Liu M, Mao W, Liu L, Ye P, Xiao B, Luo P, Zhang Z, Lu Z, Wang J, Lu H, Xia X, Wang D, Liao X, Peng G, Liang L, Yang J, Chen G, Azzolini E, Aghemo A, Ciccarelli M, Condorelli G, Stefanini GG, Wei X, Zhang BH, Huang X, Xia J, Yuan Y, She ZG, Guo J, Wang Y, Zhang P, Li H. A risk score based on baseline risk factors for predicting mortality in COVID-19 patients. Curr Med Res Opin 2021; 37:917-927. [PMID: 33729889 PMCID: PMC8054492 DOI: 10.1080/03007995.2021.1904862] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/08/2023]
Abstract
BACKGROUND: We aimed to develop a sensitive and clinically applicable risk assessment tool for identifying coronavirus disease 2019 (COVID-19) patients with a high risk of mortality at hospital admission. This model would assist frontline clinicians in optimizing medical treatment with limited resources. METHODS: A total of 6415 patients from seven hospitals in Wuhan city were assigned to the training and testing cohorts. A total of 6351 patients from another three hospitals in Wuhan, 2169 patients from outside of Wuhan, and 553 patients from Milan, Italy were assigned to three independent validation cohorts. A total of 64 candidate clinical variables at hospital admission were analyzed by random forest and least absolute shrinkage and selection operator (LASSO) analyses. RESULTS: Eight factors, namely, Oxygen saturation, blood Urea nitrogen, Respiratory rate, admission before the date the national Maximum number of daily new cases was reached, Age, Procalcitonin, C-reactive protein (CRP), and absolute Neutrophil counts, were identified as having significant associations with mortality in COVID-19 patients. A composite score based on these eight risk factors, termed the OURMAPCN-score, predicted the risk of mortality among the COVID-19 patients, with a C-statistic of 0.92 (95% confidence interval [CI] 0.90-0.93). The hazard ratio for all-cause mortality for patients with an OURMAPCN-score > 11 compared with those with scores ≤ 11 was 18.18 (95% CI 13.93-23.71; p < .0001). The predictive performance, specificity, and sensitivity of the score were validated in three independent cohorts. CONCLUSIONS: The OURMAPCN-score is a risk assessment tool for estimating the mortality risk of COVID-19 patients based on a limited number of baseline parameters. This tool can assist physicians in optimizing the clinical management of COVID-19 patients with limited hospital resources.
Collapse
Affiliation(s)
- Ze Chen
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Jing Chen
- Institute of Model Animal, Wuhan University, Wuhan, China
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, China
| | - Jianghua Zhou
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Fang Lei
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Feng Zhou
- Institute of Model Animal, Wuhan University, Wuhan, China
- Medical Science Research Center, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Juan-Juan Qin
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Xiao-Jing Zhang
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Lihua Zhu
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Ye-Mao Liu
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Haitao Wang
- Department of Hepatobiliary and Pancreatic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Ming-Ming Chen
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Yan-Ci Zhao
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Jing Xie
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
| | - Lijun Shen
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Xiaohui Song
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Xingyuan Zhang
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Chengzhang Yang
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Weifang Liu
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Xiao Zhang
- Eye Center, Renmin Hospital of Wuhan University, Wuhan, China
| | - Deliang Guo
- Department of Hepatobiliary and Pancreatic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Youqin Yan
- Infections Department, Wuhan Seventh Hospital, Wuhan, China
| | - Mingyu Liu
- The Ninth Hospital of Wuhan City, Wuhan, China
| | - Weiming Mao
- Department of General Surgery, Huanggang Central Hospital, Huanggang, China
| | - Liming Liu
- Department of General Surgery, Ezhou Central Hospital, Ezhou, China
| | - Ping Ye
- Department of Cardiology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Bing Xiao
- Department of Stomatology, Xiantao First People’s Hospital, Xiantao, China
| | - Pengcheng Luo
- Department of Urology, Wuhan Third Hospital and Tongren Hospital of Wuhan University, Wuhan, China
| | - Zixiong Zhang
- The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi, China
| | - Zhigang Lu
- Department of Neurology, The First People’s Hospital of Jingmen affiliated to Hubei Minzu University, Jingmen, China
| | - Junhai Wang
- Department of Orthopedics, The First People’s Hospital of Jingmen affiliated to Hubei Minzu University, Jingmen, China
| | - Haofeng Lu
- Department of Hepatobiliary Surgery, The First Affiliated Hospital of Changjiang University, Jingzhou, China
| | - Xigang Xia
- Department of Hepatobiliary Surgery, Jingzhou Central Hospital, Jingzhou, China
| | - Daihong Wang
- Department of Hepatobiliary and Pancreatic Surgery, Xianning Central Hospital, Hubei Province, Xianning, China
| | - Xiaofeng Liao
- Xiangyang Central Hospital, Affiliated Hospital of Hubei University of Arts and Science, Xiangyang, China
| | - Gang Peng
- Department of Hepatobiliary and Pancreatic Surgery, Suizhou Central Hospital Affiliated to Hubei Medical College, Suizhou, China
| | - Liang Liang
- Department of Cardiology, The First College of Clinical Medical Science, China Three Gorges University and Yichang Central People's Hospital and Institute of Cardiovascular Diseases, China Three Gorges University, Yichang China
| | - Jun Yang
- Department of Cardiology, The First College of Clinical Medical Science, China Three Gorges University and Yichang Central People's Hospital and Institute of Cardiovascular Diseases, China Three Gorges University, Yichang China
| | - Guohua Chen
- Department of Neurology, Wuhan First Hospital/Wuhan Hospital of Traditional Chinese and Western Medicine, Wuhan, China
| | - Elena Azzolini
- Humanitas Clinical and Research Hospital IRCCS, Rozzano-Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele-Milan, Italy
| | - Alessio Aghemo
- Humanitas Clinical and Research Hospital IRCCS, Rozzano-Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele-Milan, Italy
| | - Michele Ciccarelli
- Humanitas Clinical and Research Hospital IRCCS, Rozzano-Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele-Milan, Italy
| | - Gianluigi Condorelli
- Humanitas Clinical and Research Hospital IRCCS, Rozzano-Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele-Milan, Italy
| | - Giulio G. Stefanini
- Humanitas Clinical and Research Hospital IRCCS, Rozzano-Milan, Italy
- Department of Biomedical Sciences, Humanitas University, Pieve Emanuele-Milan, Italy
| | - Xiang Wei
- Division of Cardiothoracic and Vascular Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Bing-Hong Zhang
- Departments of Neonatology, Renmin Hospital of Wuhan University, Wuhan, China
| | - Xiaodong Huang
- Department of Gastroenterology, Wuhan Third Hospital and Tongren Hospital of Wuhan University, Wuhan, China
| | - Jiahong Xia
- Department of Cardiology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yufeng Yuan
- Department of Hepatobiliary and Pancreatic Surgery, Zhongnan Hospital of Wuhan University, Wuhan, China
| | - Zhi-Gang She
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
| | - Jiao Guo
- Guangdong Metabolic Diseases Research Center of Integrated Chinese and Western Medicine and Key Laboratory of Glucolipid Metabolic Disorder, Guangdong TCM Key Laboratory for Metabolic Diseases, Ministry of Education of China and Institute of Chinese Medicine, Guangdong Pharmaceutical University, Guangzhou, China
- CONTACT Jiao Guo Institute of Chinese Medicine, Guangdong Pharmaceutical University, 280 Wai Huan Dong Road, Guangzhou510006, China
| | - Yibin Wang
- Departments of Anesthesiology, Physiology and Medicine, Cardiovascular Research Laboratories, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Yibin Wang Departments of Anesthesiology, Physiology and Medicine, Cardiovascular Research Laboratories, David Geffen School of Medicine, University of California, CHS 37-200J, Los Angeles, 90095CA, USA
| | - Peng Zhang
- Institute of Model Animal, Wuhan University, Wuhan, China
- Medical Science Research Center, Zhongnan Hospital of Wuhan University, Wuhan, China
- Peng Zhang, Medical Science Research Center, Zhongnan Hospital of Wuhan University, 169 Donghu Road, Wuhan 430071, China
| | - Hongliang Li
- Department of Cardiology, Renmin Hospital, School of Basic Medical Science, Wuhan University, Wuhan, China
- Institute of Model Animal, Wuhan University, Wuhan, China
- Medical Science Research Center, Zhongnan Hospital of Wuhan University, Wuhan, China
- Hongliang Li, Department of Cardiology, Renmin Hospital of Wuhan University, 99 Zhangzhidong Road, Wuhan 430060, China; Institute of Model Animal of Wuhan University, 169 Donghu Road, Wuhan 430071, China
9
|
Ghazaleh N, Houghton R, Palermo G, Schobel SA, Wijeratne PA, Long JD. Ranking the Predictive Power of Clinical and Biological Features Associated With Disease Progression in Huntington's Disease. Front Neurol 2021; 12:678484. [PMID: 34093422 PMCID: PMC8176643 DOI: 10.3389/fneur.2021.678484] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/09/2021] [Accepted: 04/26/2021] [Indexed: 12/01/2022] Open
Abstract
Huntington's disease (HD) is characterised by a triad of cognitive, behavioural, and motor symptoms which lead to functional decline and loss of independence. With potential disease-modifying therapies in development, there is interest in accurately measuring HD progression and characterising prognostic variables to improve efficiency of clinical trials. Using the large, prospective Enroll-HD cohort, we investigated the relative contribution and ranking of potential prognostic variables in patients with manifest HD. A random forest regression model was trained to predict change of clinical outcomes based on the variables, which were ranked based on their contribution to the prediction. The highest-ranked variables included novel predictors of progression—being accompanied at clinical visit, cognitive impairment, age at diagnosis and tetrabenazine or antipsychotics use—in addition to established predictors, cytosine adenine guanine (CAG) repeat length and CAG-age product. The novel prognostic variables improved the ability of the model to predict clinical outcomes and may be candidates for statistical control in HD clinical studies.
Affiliation(s)
| | | | | | | | - Peter A Wijeratne
- Department of Computer Science, Centre for Medical Imaging Computing, University College London, London, United Kingdom; Department of Neurodegenerative Disease, Huntington's Disease Research Centre, Queen Square Institute of Neurology, University College London, London, United Kingdom
| | - Jeffrey D Long
- Department of Psychiatry, University of Iowa, Iowa City, IA, United States; Department of Biostatistics, University of Iowa, Iowa City, IA, United States
10
|
Pellagatti M, Masci C, Ieva F, Paganoni AM. Generalized mixed-effects random forest: A flexible approach to predict university student dropout. Stat Anal Data Min 2021. [DOI: 10.1002/sam.11505] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/30/2022]
Affiliation(s)
| | - Chiara Masci
- MOX—Department of Mathematics Politecnico di Milano Milan Italy
| | - Francesca Ieva
- MOX—Department of Mathematics Politecnico di Milano Milan Italy
11
|
Casanova R, Gaussoin SA, Wallace R, Baker LD, Chen JC, Manson JE, Henderson VW, Sachs BC, Justice JN, Whitsel EA, Hayden KM, Rapp SR. Investigating Predictors of Preserved Cognitive Function in Older Women Using Machine Learning: Women's Health Initiative Memory Study. J Alzheimers Dis 2021; 84:1267-1278. [PMID: 34633318 PMCID: PMC8934040 DOI: 10.3233/jad-210621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
BACKGROUND Identification of factors that may help to preserve cognitive function in late life could elucidate mechanisms and facilitate interventions to improve the lives of millions of people. However, the large number of potential factors associated with cognitive function poses an analytical challenge. OBJECTIVE We used data from the longitudinal Women's Health Initiative Memory Study (WHIMS) and machine learning to investigate 50 demographic, biomedical, behavioral, social, and psychological predictors of preserved cognitive function in later life. METHODS Participants in WHIMS and two consecutive follow-up studies who were at least 80 years old and had at least one cognitive assessment following their 80th birthday were classified as cognitively preserved or cognitively impaired. Preserved cognitive function was defined as having a score ≥39 on the most recent administration of the modified Telephone Interview for Cognitive Status (TICSm) and a mean score across all assessments ≥39. Cognitively impaired participants were those adjudicated by experts to have probable dementia or at least two adjudications of mild cognitive impairment within the 14 years of follow-up and a last TICSm score <31. Random Forests was used to rank the predictors of preserved cognitive function. RESULTS Discrimination between groups based on area under the curve was 0.80 (95% CI: 0.76-0.85). Women with preserved cognitive function were younger, better educated, less forgetful, less depressed, and more optimistic at study enrollment. They also reported better physical function and less sleep disturbance, and had lower systolic blood pressure, hemoglobin, and blood glucose levels. CONCLUSION The predictors of preserved cognitive function include demographic, psychological, physical, metabolic, and vascular factors, suggesting a complex mix of potential contributors.
Affiliation(s)
- Ramon Casanova
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Sarah A Gaussoin
- Department of Biostatistics and Data Science, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Robert Wallace
- College of Public Health, University of Iowa, Iowa City, IA, USA
- Epidemiology and Internal Medicine, University of Iowa, Iowa City, IA, USA
| | - Laura D Baker
- Department of Gerontology and Geriatrics, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jiu-Chiuan Chen
- Department of Preventive Medicine and Neurology, University of Southern California, Los Angeles, CA, USA
| | - JoAnn E Manson
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Victor W Henderson
- Department of Epidemiology and Population Health and of Neurology and Neurological Sciences, Stanford University, Stanford, CA, USA
| | - Bonnie C Sachs
- Department of Social Sciences & Health Policy, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Department of Neurology, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Jamie N Justice
- Department of Gerontology and Geriatrics, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Eric A Whitsel
- Department of Epidemiology, Gillings School of Global Public Health and Department of Medicine, School of Medicine, University of North Carolina, Chapel Hill, NC, USA
| | - Kathleen M Hayden
- Department of Social Sciences & Health Policy, Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Stephen R Rapp
- Department of Social Sciences & Health Policy, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Department of Psychiatry and Behavioral Medicine, Wake Forest School of Medicine, Winston-Salem, NC, USA
12
|
Debeer D, Strobl C. Conditional permutation importance revisited. BMC Bioinformatics 2020; 21:307. [PMID: 32664864 PMCID: PMC7362659 DOI: 10.1186/s12859-020-03622-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/22/2019] [Accepted: 06/19/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Random forest based variable importance measures have become popular tools for assessing the contributions of the predictor variables in a fitted random forest. In this article we reconsider a frequently used variable importance measure, the Conditional Permutation Importance (CPI). We argue and illustrate that the CPI corresponds to a more partial quantification of variable importance and suggest several improvements in its methodology and implementation that enhance its practical value. In addition, we introduce the threshold value in the CPI algorithm as a parameter that can make the CPI more partial or more marginal. RESULTS By means of extensive simulations, where the original version of the CPI is used as the reference, we examine the impact of the proposed methodological improvements. The simulation results show how the improved CPI methodology increases the interpretability and stability of the computations. In addition, the newly proposed implementation decreases the computation times drastically and is more widely applicable. The improved CPI algorithm is made freely available as an add-on package to the open-source software R. CONCLUSION The proposed methodology and implementation of the CPI is computationally faster and leads to more stable results. It has a beneficial impact on practical research by making random forest analyses more interpretable.
Affiliation(s)
- Dries Debeer
- University of Zurich, Psychological Methods, Evaluation and Statistics, Binzmuehlestrasse 14, Box 27, Zurich, 8050, Switzerland; KU Leuven, Faculty of Psychology and Educational Sciences, Etienne Sabbelaan 51 box 7654, Kortrijk, 8500, Belgium; KU Leuven, imec research group ITEC, Etienne Sabbelaan 51 box 7654, Kortrijk, 8500, Belgium
| | - Carolin Strobl
- University of Zurich, Psychological Methods, Evaluation and Statistics, Binzmuehlestrasse 14, Box 27, Zurich, 8050, Switzerland
13
|
Henneghan A, Haley AP, Kesler S. Exploring Relationships Among Peripheral Amyloid Beta, Tau, Cytokines, Cognitive Function, and Psychosomatic Symptoms in Breast Cancer Survivors. Biol Res Nurs 2020; 22:126-138. [PMID: 31707784 PMCID: PMC7068749 DOI: 10.1177/1099800419887230] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/28/2022]
Abstract
OBJECTIVE Accelerated brain aging has been proposed to explain cancer-related cognitive impairment, but empirical evidence for this relationship is lacking. The purpose of this study was to evaluate amyloid beta (Aβ) and tau, biomarkers of neurodegeneration, in relation to cognition in breast cancer survivors (BCSs). We explored relationships among peripheral concentrations of Aβ-42, Aβ-40, tau, and cytokines; cognitive function; and psychosomatic symptoms in a cohort of BCSs post-chemotherapy. METHODS This secondary analysis of a cross-sectional study was conducted with 65 BCSs. Serum total Aβ-42, Aβ-40, and tau levels were measured with single molecule array technology. Cytokines (interleukin [IL]-6, tumor necrosis factor [TNF]-α, granulocyte-macrophage colony-stimulating factor [GM-CSF], interferon [IFN]-γ, IL-10, IL-12, IL-13, IL-1β, IL-2, IL-4, IL-5, IL-7, and IL-8) were simultaneously measured in serum using multiplex assays. Cognitive function was measured with five standardized neuropsychological tests and psychosomatic symptoms (stress, loneliness, anxiety, depressive symptoms, fatigue, sleep quality, and daytime sleepiness) with self-report questionnaires. Data analyses included correlations and random forest regression (RFR). RESULTS Significant correlations were identified among hip-to-waist ratio, number of treatment modalities, Aβ-42, Aβ-40, and tau levels (rs = .27-.35, ps < .05). RFR modeling including Aβ-42, Aβ-40, tau, and cytokines as features explained significant variance in cognitive function (R² = .71, F = 9.01, p < .0001) and psychosomatic symptoms (R² = .74, F = 10.22, p < .0001). CONCLUSIONS This study suggests that neurodegenerative biomarkers interact with cytokines to influence cognitive functioning and psychosomatic symptoms in BCSs following chemotherapy, but additional research is needed.
Affiliation(s)
- Ashley Henneghan
- School of Nursing; Department of Oncology, University of Texas at Austin, Austin, TX, USA
| | - Andreana P. Haley
- Department of Psychology, College of Liberal Arts, University of Texas at Austin, Austin, TX, USA
| | - Shelli Kesler
- School of Nursing; Department of Oncology, University of Texas at Austin, Austin, TX, USA
14
|
Fabris F, Doherty A, Palmer D, de Magalhães JP, Freitas AA. A new approach for interpreting Random Forest models and its application to the biology of ageing. Bioinformatics 2019; 34:2449-2456. [PMID: 29462247 PMCID: PMC6041990 DOI: 10.1093/bioinformatics/bty087] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/20/2017] [Accepted: 02/15/2018] [Indexed: 01/11/2023] Open
Abstract
Motivation This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model. Results The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure. Availability and implementation The dataset and source codes used in this paper are available as 'Supplementary Material' and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Fabio Fabris
- School of Computing, University of Kent, Canterbury, Kent, UK
| | - Aoife Doherty
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Daniel Palmer
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - João Pedro de Magalhães
- Integrative Genomics of Ageing Group, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
| | - Alex A Freitas
- School of Computing, University of Kent, Canterbury, Kent, UK
15
|
Zhang L, Geelen A, Boshuizen HC, Ferreira J, Ocké MC. Importance of details in food descriptions in estimating population nutrient intake distributions. Nutr J 2019; 18:17. [PMID: 30876417 PMCID: PMC6419831 DOI: 10.1186/s12937-019-0443-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/10/2018] [Accepted: 03/07/2019] [Indexed: 12/24/2022] Open
Abstract
Background National food consumption surveys are important policy instruments for monitoring the food consumption of a population. To serve multiple purposes, this type of survey usually collects comprehensive food information using dietary assessment methods such as 24-h dietary recalls (24HRs). However, the collection and handling of such detailed information require tremendous effort. We aimed to improve the efficiency of data collection and handling in 24HRs by identifying less important characteristics of food descriptions (facets) and assessing the impact of disregarding them on energy and nutrient intake distributions. Methods In the Dutch National Food Consumption Survey 2007–2010, food consumption data were collected through interviewer-administered 24HRs using GloboDiet software in 3819 persons. Interviewers asked participants about the characteristics of each food item according to applicable facets. Food consumption data were subsequently linked to the food composition database. The importance of facets for predicting energy and each of the 33 nutrients was estimated using the random forest algorithm. Then a simulation study was performed to determine the influence of deleting less important facets on population nutrient intake distributions. Results We identified 35% of facets as unimportant and deleted them from the total food consumption database. The majority (79.4%) of the percent differences between percentile estimates of the population nutrient intake distributions before and after facet deletion ranged from 0 to 1%, while 20% of cases ranged from 1 to 5% and 0.6% of cases were more than 10%. Conclusion We concluded that our procedure was successful in identifying less important food descriptions in estimating population nutrient intake distributions. The reduction in food descriptions has the potential to reduce the time needed for conducting interviews and data handling while maintaining the data quality of the survey.
Electronic supplementary material The online version of this article (10.1186/s12937-019-0443-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Liangzi Zhang
- Division of Human Nutrition, Wageningen University, Wageningen, the Netherlands; National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
| | - Anouk Geelen
- Division of Human Nutrition, Wageningen University, Wageningen, the Netherlands
| | - Hendriek C Boshuizen
- Division of Human Nutrition, Wageningen University, Wageningen, the Netherlands; National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
| | - José Ferreira
- National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
| | - Marga C Ocké
- Division of Human Nutrition, Wageningen University, Wageningen, the Netherlands; National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
16
|
Bias in the intervention in prediction measure in random forests: illustrations and recommendations. Bioinformatics 2018; 35:2343-2345. [DOI: 10.1093/bioinformatics/bty959] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/07/2018] [Revised: 10/31/2018] [Accepted: 11/20/2018] [Indexed: 01/09/2023] Open
Abstract
Supplementary information Supplementary data are available at Bioinformatics online.
17
|
Te Beest DE, Mes SW, Wilting SM, Brakenhoff RH, van de Wiel MA. Improved high-dimensional prediction with Random Forests by the use of co-data. BMC Bioinformatics 2017; 18:584. [PMID: 29281963 PMCID: PMC5745983 DOI: 10.1186/s12859-017-1993-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 07/03/2017] [Accepted: 12/06/2017] [Indexed: 12/13/2022] Open
Abstract
Background Prediction in high-dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary ‘co-data’ can be used to improve the performance of a Random Forest in such a setting. Results Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities that are used to draw candidate variables with co-data moderated sampling probabilities. Co-data here are defined as any type of information that is available on the variables of the primary data but does not use its response labels. These moderated sampling probabilities are, inspired by empirical Bayes, learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. Conclusion The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1993-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Dennis E Te Beest
- Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands
| | - Steven W Mes
- Department of Otolaryngology-Head and Neck Surgery, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands
| | - Saskia M Wilting
- Department of Medical Oncology, Erasmus MC Cancer Institute, Erasmus University Medical Center, Rotterdam, 3015 CE, The Netherlands
| | - Ruud H Brakenhoff
- Department of Otolaryngology-Head and Neck Surgery, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands
| | - Mark A van de Wiel
- Department of Epidemiology and Biostatistics, VU University Medical Center, Amsterdam, 1007 MB, The Netherlands; Department of Mathematics, VU University, Amsterdam, 1081 HV, The Netherlands