1
Wang LZ, Chi JF, Ding YQ, Yao HY, Guo Q, Yang HQ. Transformer fault diagnosis method based on SMOTE and NGO-GBDT. Sci Rep 2024; 14:7179. [PMID: 38531936 DOI: 10.1038/s41598-024-57509-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 03/19/2024] [Indexed: 03/28/2024] Open
Abstract
To improve the accuracy of transformer fault diagnosis and to reduce the loss of identification accuracy that unbalanced samples cause through insufficient model training, this paper proposes a transformer fault diagnosis method based on SMOTE and NGO-GBDT. First, the Synthetic Minority Over-sampling Technique (SMOTE) was used to expand the minority samples. Second, the non-coding ratio method was used to construct multi-dimensional feature parameters, and a Light Gradient Boosting Machine (LightGBM) feature optimization strategy was introduced to screen the optimal feature subset. Finally, the Northern Goshawk Optimization (NGO) algorithm was used to optimize the parameters of a Gradient Boosting Decision Tree (GBDT), which then performed the transformer fault diagnosis. The results show that the proposed method reduces the misjudgment of minority samples. Compared with other integrated models, the proposed method has high fault identification accuracy, a low misjudgment rate, and stable performance.
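The oversampling step named in this abstract can be illustrated with a minimal sketch of the general SMOTE idea: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. This is not the authors' implementation; the function name `smote`, the neighbour count `k`, and the toy data are assumptions made for illustration.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x), by squared Euclidean distance
        others = [p for p in minority if p is not x]
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p))
        )[:k]
        nb = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + u * (b - a) for a, b in zip(x, nb)])
    return synthetic

# Hypothetical two-feature minority class (illustrative values only)
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by that class.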
Affiliation(s)
- Li-Zhong Wang
  - State Grid Zhejiang Power Co., Ltd, Hangzhou Linping Power Supply Company, Hangzhou, 311199, China
- Jian-Fei Chi
  - State Grid Zhejiang Power Co., Ltd, Hangzhou Linping Power Supply Company, Hangzhou, 311199, China
- Ye-Qiang Ding
  - State Grid Zhejiang Power Co., Ltd, Hangzhou Linping Power Supply Company, Hangzhou, 311199, China
- Hai-Yan Yao
  - Hangzhou Electric Power Equipment Manufacturing Co., Ltd, Yuhang Qunli Complete Sets Electricity Manufacturing Branch Electric, Hangzhou, 311000, China
- Qiang Guo
  - Hangzhou Electric Power Equipment Manufacturing Co., Ltd, Yuhang Qunli Complete Sets Electricity Manufacturing Branch Electric, Hangzhou, 311000, China
- Hai-Qi Yang
  - School of Mechanical Engineering, Northeast Electric Power University, Jilin, 132012, China
2
Qiu T, Wang S, Hu D, Feng N, Cui L. Predicting Risk of Bullying Victimization among Primary and Secondary School Students: Based on a Machine Learning Model. Behav Sci (Basel) 2024; 14:73. [PMID: 38275356 PMCID: PMC10813723 DOI: 10.3390/bs14010073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Revised: 01/12/2024] [Accepted: 01/18/2024] [Indexed: 01/27/2024] Open
Abstract
School bullying among primary and secondary school students has received increasing attention, and identifying relevant factors is a crucial way to reduce the risk of bullying victimization. Machine learning methods can help researchers predict and identify individual risk behaviors. Through a machine learning approach (i.e., the gradient boosting decision tree model, GBDT), the present longitudinal study aims to systematically examine individual, family, and school environment factors that can predict the risk of bullying victimization among primary and secondary school students a year later. A total of 2767 participants (2065 secondary school students, 702 primary school students, 55.20% female students, mean age at T1 was 12.22) completed measures of 24 predictors at the first wave, including individual factors (e.g., self-control, gender, grade), family factors (family cohesion, parental control, parenting style), peer factor (peer relationship), and school factors (teacher-student relationship, learning capacity). A year later (i.e., T2), they completed the Olweus Bullying Questionnaire. The GBDT model predicted whether primary and secondary school students would be exposed to school bullying after one year by training a series of base learners and outputting the importance ranking of predictors. The GBDT model performed well. The GBDT model yielded the top 6 predictors: teacher-student relationship, peer relationship, family cohesion, negative affect, anxiety, and denying parenting style. The protective factors (i.e., teacher-student relationship, peer relationship, and family cohesion) and risk factors (i.e., negative affect, anxiety, and denying parenting style) associated with the risk of bullying victimization a year later among primary and secondary school students are identified by using a machine learning approach. 
The GBDT model can be used as a tool to predict the future risk of bullying victimization for children and adolescents and to help improve the effectiveness of school bullying interventions.
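The phrase "training a series of base learners" can be made concrete with a small sketch of gradient boosting. It is a simplification under stated assumptions, not the study's model: it uses one numeric feature, single-split trees (stumps), and squared loss on 0/1 labels, whereas the study predicts a binary outcome from 24 predictors. All names (`fit_stump`, `fit_gbdt`, `predict`) and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single threshold on a 1-D feature that minimises squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so the right side is never empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def fit_gbdt(xs, ys, n_trees=50, lr=0.1):
    """Fit stumps one after another, each on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared loss, the negative gradient is simply the residual
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, preds)]
    return base, lr, stumps

def predict(model, x):
    base, lr, stumps = model
    return base + sum(lr * (lm if x <= t else rm) for t, lm, rm in stumps)

# Toy data: the outcome switches from 0 to 1 once the feature exceeds 3
model = fit_gbdt([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
low_risk, high_risk = predict(model, 2), predict(model, 5)
```

Each stump corrects a fraction (`lr`) of the remaining error, so the ensemble approaches the labels gradually rather than in one step.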
Affiliation(s)
- Tian Qiu
  - Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
- Sizhe Wang
  - School of Statistics, East China Normal University, Shanghai 200062, China
- Di Hu
  - Silver School of Social Work, New York University, New York, NY 10012, USA
- Ningning Feng
  - Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
  - Shanghai Centre for Brain Science and Brain-Inspired Technology, Shanghai 200062, China
- Lijuan Cui
  - Shanghai Key Laboratory of Mental Health and Psychological Crisis Intervention, Institute of Brain and Education Innovation, School of Psychology and Cognitive Science, East China Normal University, Shanghai 200062, China
  - Shanghai Centre for Brain Science and Brain-Inspired Technology, Shanghai 200062, China
3
Lee JH, Cho JH, Kim BJ, Lee WE. Machine learning approach for carbon disclosure in the Korean market: The role of environmental performance. Sci Prog 2024; 107:368504231220766. [PMID: 38234092 PMCID: PMC10798094 DOI: 10.1177/00368504231220766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2024]
Abstract
Over the past few decades, scholars have employed a wide range of methodologies to determine the factors influencing firms' voluntary carbon disclosure. Most of these studies have been conducted in advanced markets. This article aims to examine the trend of voluntary carbon disclosure in the Korean financial market by utilizing machine learning models such as Random Forest and Gradient Boosted Decision Tree. Based on a set of hand-collected carbon disclosure data, we initially demonstrated significantly better performance of machine learning models compared to the traditional logistic model. Regarding the factors influencing disclosure, we consistently find the importance of environmental scores, emphasizing the role of the emerging mega-trend of ESG management practices in disclosure decisions. However, in contrast to recent studies, we do not find that the unique Korean governance structure, chaebol, has any significantly different implications in terms of prediction performance and variable importance in carbon disclosure decisions.
Affiliation(s)
- Jeong Hwan Lee
  - College of Economics and Finance, Hanyang University, Seoul, Korea
- Bong Jun Kim
  - College of Economics and Finance, Hanyang University, Seoul, Korea
- Won Eung Lee
  - College of Economics and Finance, Hanyang University, Seoul, Korea
4
Liu X, Zhu B, Dai XW, Xu ZA, Li R, Qian Y, Lu YP, Zhang W, Liu Y, Zheng J. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier. BMC Genomics 2023; 24:765. [PMID: 38082413 PMCID: PMC10712101 DOI: 10.1186/s12864-023-09834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Lysine glutarylation (Kglu) is one of the most important Post-translational modifications (PTMs), which plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of the Kglu site is important for elucidating protein molecular function. Due to the time-consuming and expensive limitations of traditional biological experiments, computational-based Kglu site prediction research is gaining more and more attention. RESULTS In this paper, we proposed GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy issues, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73%, and 98.14% on five-fold cross-validation as well as 90.11%, and 96.75% on the independent test dataset, respectively. CONCLUSION GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite .
Affiliation(s)
- Xin Liu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Bao Zhu
  - Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Xia-Wei Dai
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Zhi-Ao Xu
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Rui Li
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yuting Qian
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ya-Ping Lu
  - School of Humanities and Arts, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China
- Wenqing Zhang
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yong Liu
  - Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Junnian Zheng
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center of Clinical Oncology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, China
5
Sabanayagam C, He F, Nusinovici S, Li J, Lim C, Tan G, Cheng CY. Prediction of diabetic kidney disease risk using machine learning models: A population-based cohort study of Asian adults. eLife 2023; 12:e81878. [PMID: 37706530 PMCID: PMC10531395 DOI: 10.7554/elife.81878] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/14/2022] [Accepted: 09/12/2023] [Indexed: 09/15/2023] Open
Abstract
Background Machine learning (ML) techniques improve disease prediction by identifying the most relevant features in multidimensional data. We compared the accuracy of ML algorithms for predicting incident diabetic kidney disease (DKD). Methods We utilized longitudinal data from 1365 Chinese, Malay, and Indian participants aged 40-80 y with diabetes but free of DKD who participated in the baseline and 6-year follow-up visit of the Singapore Epidemiology of Eye Diseases Study (2004-2017). Incident DKD (11.9%) was defined as an estimated glomerular filtration rate (eGFR) <60 mL/min/1.73 m2 with at least 25% decrease in eGFR at follow-up from baseline. A total of 339 features, including participant characteristics, retinal imaging, and genetic and blood metabolites, were used as predictors. Performances of several ML models were compared to each other and to logistic regression (LR) model based on established features of DKD (age, sex, ethnicity, duration of diabetes, systolic blood pressure, HbA1c, and body mass index) using area under the receiver operating characteristic curve (AUC). Results ML model Elastic Net (EN) had the best AUC (95% CI) of 0.851 (0.847-0.856), which was 7.0% relatively higher than by LR 0.795 (0.790-0.801). Sensitivity and specificity of EN were 88.2 and 65.9% vs. 73.0 and 72.8% by LR. The top 15 predictors included age, ethnicity, antidiabetic medication, hypertension, diabetic retinopathy, systolic blood pressure, HbA1c, eGFR, and metabolites related to lipids, lipoproteins, fatty acids, and ketone bodies. Conclusions Our results showed that ML, together with feature selection, improves prediction accuracy of DKD risk in an asymptomatic stable population and identifies novel risk factors, including metabolites. Funding This study was supported by the National Medical Research Council, NMRC/OFLCG/001/2017 and NMRC/HCSAINV/MOH-001019-00. 
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
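The "7.0% relatively higher" figure in the Results can be checked directly from the two reported AUC values. The short calculation below is only an arithmetic check; the variable names are illustrative.

```python
auc_en = 0.851  # Elastic Net AUC reported in the abstract
auc_lr = 0.795  # logistic regression AUC reported in the abstract

absolute_gain = auc_en - auc_lr          # difference in AUC points
relative_gain = absolute_gain / auc_lr   # gain relative to the LR baseline

print(f"absolute: {absolute_gain:.3f}, relative: {relative_gain:.1%}")
```

The relative figure divides the 0.056 absolute difference by the LR baseline, which is why it is larger than the 5.6-point gap in AUC itself.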
Affiliation(s)
- Charumathi Sabanayagam
  - Singapore Eye Research Institute, Singapore, Singapore
  - Ophthalmology and Visual Sciences Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
- Feng He
  - Singapore Eye Research Institute, Singapore, Singapore
- Jialiang Li
  - Department of Statistics and Data Science, National University of Singapore, Singapore, Singapore
- Cynthia Lim
  - Department of Renal Medicine, Singapore General Hospital, Singapore, Singapore
- Gavin Tan
  - Singapore Eye Research Institute, Singapore, Singapore
  - Ophthalmology and Visual Sciences Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
- Ching Yu Cheng
  - Singapore Eye Research Institute, Singapore, Singapore
  - Ophthalmology and Visual Sciences Academic Clinical Program, Duke-NUS Medical School, Singapore, Singapore
6
Bahrami S, Hajian-Tilaki K, Bayani M, Chehrazi M, Mohamadi-Pirouz Z, Amoozadeh A. Bayesian model averaging for predicting factors associated with length of COVID-19 hospitalization. BMC Med Res Methodol 2023; 23:163. [PMID: 37415112 PMCID: PMC10326965 DOI: 10.1186/s12874-023-01981-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/30/2023] [Accepted: 06/18/2023] [Indexed: 07/08/2023] Open
Abstract
INTRODUCTION The length of hospital stay (LOHS) caused by COVID-19 has imposed a financial burden on the healthcare service system and a high psychological burden on patients and health workers. The purpose of this study is to apply Bayesian model averaging (BMA) based on linear regression models to determine the predictors of the LOHS of COVID-19. METHODS In this historical cohort study, of 5100 COVID-19 patients registered in the hospital database, 4996 were eligible to enter the study. The data included demographic, clinical, and biomarker variables and the LOHS. Factors affecting the LOHS were fitted in six models: the stepwise, AIC, and BIC methods in classical linear regression models, two BMA models using Occam's Window and Markov Chain Monte Carlo (MCMC) methods, and the GBDT algorithm, a new machine learning method. RESULTS The average length of hospitalization was 6.7 ± 5.7 days. In fitting classical linear models, both the stepwise and AIC methods (R² = 0.168 and adjusted R² = 0.165) performed better than BIC (R² = 0.160 and adjusted R² = 0.158). In fitting the BMA, the Occam's Window model performed better than MCMC, with R² = 0.174. The GBDT method, with R² = 0.64, performed worse than the BMA in the testing dataset but not in the training dataset. Based on the six fitted models, ICU hospitalization, respiratory distress, age, diabetes, CRP, PO2, WBC, AST, BUN, and NLR were significant predictors of the LOHS of COVID-19. CONCLUSION The BMA with Occam's Window method fits better and performs better than the other models in predicting the factors affecting the LOHS in the testing dataset.
Affiliation(s)
- Shabnam Bahrami
  - Student Research Center, Research Institute, Babol University of Medical Sciences, Babol, Iran
- Karimollah Hajian-Tilaki
  - Department of Biostatistics and Epidemiology, School of Public Health, Babol University of Medical Sciences, Babol, Iran
  - Social Determinants of Health Research Center, Research Institute, Babol University of Medical Sciences, Babol, Iran
- Masomeh Bayani
  - Department of Infectious Diseases, Ayatollah Rohani Hospital, Babol University of Medical Sciences, Babol, Iran
- Mohammad Chehrazi
  - Department of Biostatistics and Epidemiology, School of Public Health, Babol University of Medical Sciences, Babol, Iran
  - Neonatal Research Unit, Imperial College London, Exhibition Rd, South Kensington, London, SW7 2BX, UK
- Zahra Mohamadi-Pirouz
  - Student Research Center, Research Institute, Babol University of Medical Sciences, Babol, Iran
- Abazar Amoozadeh
  - Social Determinants of Health Research Center, Research Institute, Babol University of Medical Sciences, Babol, Iran
7
Guo Z, Xu M, Yang Y, Li Y, Wu H, Zhu Z, Zhao Y. CED: A case-level explainable paramedical diagnosis via Ada GBDT. Comput Biol Med 2023; 153:106500. [PMID: 36592608 DOI: 10.1016/j.compbiomed.2022.106500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/30/2022] [Accepted: 12/27/2022] [Indexed: 12/31/2022]
Abstract
OBJECTIVE The rapid growth of medical data has greatly promoted the wide exploitation of machine learning for paramedical diagnosis. Inversely proportional to their performance, most machine learning models generally suffer from the lack of explainability, especially the local explainability of the model, that is, the case-specific explainability. MATERIALS AND METHODS In this paper, we proposed a GBDT (Gradient Boosting Decision Tree)-based explainable model for case-specific paramedical diagnostics, and mainly make the following contributions: (1) an adaptive gradient boosting decision tree (AdaGBDT) model is proposed to boost the path-mining for decision effectively; (2) to learn a case-specific feature importance embedding for a specific patient, the bi-side mutual information is applied to characterize the backtracking on the decision path; (3) through the collaborative decision-making by globally explainable AdaGBDT with case-based reasoning (CBR) in the case-specific metric space, some hard cases can be identified by the means of visualized interpretation. The performance of our model is evaluated on the Wisconsin diagnostic breast cancer dataset and the UCI heart disease dataset. RESULTS Experiments conducted on two datasets show that our AdaGBDT achieves the best performance, with the F1-value of 0.9647 and 0.8405 respectively. Moreover, a series of experimental analyses and case studies further illustrate the excellent performance of feature importance embedding. CONCLUSION The proposed case-specific explainable paramedical diagnosis via AdaGBDT has excellent predictive performance, with both promising case-level and consistent global explainability.
Affiliation(s)
- Zhenyu Guo
  - Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
- Muhao Xu
  - Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
- Yuchen Yang
  - Department of Biology, Johns Hopkins University Krieger School of Arts and Sciences, Baltimore, MD, USA
- Youru Li
  - Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
- Haiyan Wu
  - Department of Otolaryngology, Peking Union Medical College Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Zhenfeng Zhu
  - Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
- Yao Zhao
  - Institute of Information Science, Beijing Jiaotong University, Beijing, China; Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China
8
Ren Z, Zhao Y, Han X, Yue M, Wang B, Zhao Z, Wen B, Hong Y, Wang Q, Hong Y, Zhao T, Wang N, Zhao P. An objective model for diagnosing comorbid cognitive impairment in patients with epilepsy based on the clinical-EEG functional connectivity features. Front Neurosci 2023; 16:1060814. [PMID: 36711136 PMCID: PMC9878185 DOI: 10.3389/fnins.2022.1060814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2022] [Accepted: 12/28/2022] [Indexed: 01/15/2023] Open
Abstract
Objective Cognitive impairment (CI) is a common disorder in patients with epilepsy (PWEs). An objective method for diagnosing CI in PWEs would be beneficial in practice. This study aimed to construct a diagnostic model for CI in PWEs using clinical features and the phase locking value (PLV) functional connectivity features of the electroencephalogram (EEG). Methods PWEs who met the inclusion and exclusion criteria were divided into a cognitively normal (CON) group (n = 55) and a CI group (n = 76). The 23 clinical features and 684 PLV EEG features at the time of patient visit were screened and ranked using the Fisher score. Adaptive Boosting (AdaBoost) and Gradient Boosting Decision Tree (GBDT) were used as algorithms to construct diagnostic models of CI in PWEs with pure clinical features, pure PLV EEG features, or combined clinical and PLV EEG features. The performance of these models was assessed using five-fold cross-validation. Results The GBDT-built model with combined clinical and PLV EEG features performed the best, with accuracy, precision, recall, and F1-score of 90.11%, 93.40%, 89.50%, and 91.39%, respectively, and an area under the curve (AUC) of 0.95. The top 5 features found to influence the model performance based on the Fisher scores were the magnetic resonance imaging (MRI) findings of the head for abnormalities, educational attainment, PLV EEG in the beta (β)-band C3-F4, seizure frequency, and PLV EEG in the theta (θ)-band Fp1-Fz. A total of 12 of the top 5% of features were statistically different PLV EEG features, eight of which were in the θ band. Conclusion The model constructed from the combined clinical and PLV EEG features could effectively identify CI in PWEs and has potential as a useful objective evaluation method. The PLV EEG in the θ band could be a potential biomarker for the complementary diagnosis of CI comorbid with epilepsy.
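The phase locking value used as the connectivity feature here has a compact standard definition: the magnitude of the average unit phasor of the phase difference between two channels. The sketch below assumes the instantaneous phases have already been extracted (the abstract does not say how; a Hilbert transform on band-filtered signals is a common choice). The function name and the synthetic phase series are illustrative only.

```python
import cmath
import math

def phase_locking_value(phases_a, phases_b):
    """PLV = |mean(exp(i * (phase_a - phase_b)))|, a value between 0 and 1."""
    n = len(phases_a)
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / n

# Synthetic phases (radians), not EEG data
phases_a = [0.1 * k for k in range(100)]
locked_b = [p - 0.5 for p in phases_a]  # constant lag of 0.5 rad
drifting_b = [p - 2 * math.pi * k / 100 for k, p in enumerate(phases_a)]  # lag sweeps a full cycle

plv_locked = phase_locking_value(phases_a, locked_b)
plv_drifting = phase_locking_value(phases_a, drifting_b)
```

A constant phase lag gives a PLV of 1 regardless of the lag's size, while a lag that sweeps evenly around the circle gives a PLV near 0.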
Affiliation(s)
- Zhe Ren
  - Department of Neurology, Zhengzhou University People’s Hospital, Zhengzhou, Henan, China
- Yibo Zhao
  - Department of Neurology, Zhengzhou University People’s Hospital, Zhengzhou, Henan, China
- Xiong Han (corresponding author)
  - Department of Neurology, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Mengyan Yue
  - Department of Rehabilitation, The First Hospital of Shanxi Medical University, Taiyuan, Shanxi, China
- Bin Wang
  - Department of Neurology, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Zongya Zhao
  - School of Medical Engineering, Xinxiang Medical University, Xinxiang, Henan, China
- Bin Wen
  - School of Life Sciences and Technology, Xi’an Jiaotong University, Xi’an, Shaanxi, China
- Yang Hong
  - Department of Neurology, People’s Hospital of Henan University, Zhengzhou, Henan, China
- Qi Wang
  - Department of Neurology, Zhengzhou University People’s Hospital, Zhengzhou, Henan, China
- Yingxing Hong
  - Department of Neurology, People’s Hospital of Henan University, Zhengzhou, Henan, China
- Ting Zhao
  - Department of Neurology, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Na Wang
  - Department of Neurology, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, Henan, China
- Pan Zhao
  - Department of Neurology, Henan Provincial People’s Hospital, People’s Hospital of Zhengzhou University, Zhengzhou, Henan, China
9
Zhang S, Wang J, Li X, Liang Y. M6A-GSMS: Computational identification of N6-methyladenosine sites with GBDT and stacking learning in multiple species. J Biomol Struct Dyn 2022; 40:12380-12391. [PMID: 34459713 DOI: 10.1080/07391102.2021.1970628] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
N6-methyladenosine (m6A) is one of the most abundant forms of RNA methylation modification currently known. It is involved in a wide range of biological processes, including degradation, stability, and alternative splicing. Therefore, convenient and efficient m6A prediction technologies are urgently needed. In this work, a novel predictor based on GBDT and stacking learning, called M6A-GSMS, is developed to identify m6A sites. To achieve accurate prediction, we explore RNA sequence information from four aspects: correlation, structure, physicochemical properties, and pseudo ribonucleic acid composition. After the GBDT algorithm is used for feature selection, a stacking model is constructed by combining seven basic classifiers. Compared with other state-of-the-art methods, M6A-GSMS obtains excellent performance in identifying m6A sites. The prediction accuracy for A. thaliana, D. melanogaster, M. musculus, S. cerevisiae, and human reaches 88.4%, 60.8%, 80.5%, 92.4%, and 61.8%, respectively. This method provides an effective prediction tool for the investigation of m6A sites. In addition, all the datasets and codes are currently available at https://github.com/Wang-Jinyue/M6A-GSMS. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- Jinyue Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- Xinjie Li
  - School of Mathematics and Statistics, Xidian University, Xi'an, P. R. China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, P. R. China
10
Xu X, Lin M, Xu T. Epilepsy Seizures Prediction Based on Nonlinear Features of EEG Signal and Gradient Boosting Decision Tree. Int J Environ Res Public Health 2022; 19:ijerph191811326. [PMID: 36141613 PMCID: PMC9517630 DOI: 10.3390/ijerph191811326] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/09/2022] [Revised: 09/05/2022] [Accepted: 09/06/2022] [Indexed: 05/17/2023]
Abstract
Epilepsy is a common neurological disorder with sudden and recurrent seizures. Early prediction of seizures and effective intervention can significantly reduce the harm suffered by patients. In this paper, a method based on nonlinear features of EEG signal and gradient boosting decision tree (GBDT) is proposed for early prediction of epilepsy seizures. First, the EEG signals were divided into two categories: those that had seizures onset over a period of time (represented by InT) and those that did not. Second, the noise in the EEG was removed using complementary ensemble empirical mode decomposition (CEEMD) and wavelet threshold denoising. Third, the nonlinear features of the two categories of EEG were extracted, including approximate entropy, sample entropy, permutation entropy, spectral entropy and wavelet entropy. Fourth, a GBDT classifier with random forest as the initial result was designed to distinguish the two categories of EEG. Fifth, a two-step "k of n" method was used to reduce the number of false alarms. The proposed method was evaluated on 13 patients' EEG data from the CHB-MIT Scalp EEG Database. Based on ten-fold cross validation, the average accuracy was 91.76% when the InT was taken at 30 min, and 38 out of 39 seizures were successfully predicted. When the InT was taken for 40 min, the average accuracy was 92.50% and all 42 seizures selected were successfully predicted. The results indicate the effectiveness of the proposed method for predicting epilepsy seizures.
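The "k of n" idea in the fifth step can be sketched as a simple smoothing rule over per-window classifier outputs: an alarm is raised only when at least k of the most recent n windows were classified as pre-seizure. The abstract does not describe the details of the two-step variant, so the single-stage rule below, its function name, and the example labels are assumptions for illustration.

```python
from collections import deque

def k_of_n_alarms(window_labels, k, n):
    """Return one alarm flag per window: True when at least k of the last n labels are positive."""
    recent = deque(maxlen=n)  # sliding buffer holding the latest n window labels
    alarms = []
    for label in window_labels:
        recent.append(label)
        alarms.append(sum(recent) >= k)
    return alarms

# Hypothetical per-window outputs (1 = classified as pre-seizure)
labels = [0, 1, 0, 0, 1, 1, 1, 0, 0, 0]
alarms = k_of_n_alarms(labels, k=3, n=4)
```

In this example the isolated positive at the second window never triggers an alarm, while the run of three positives does, which is how the rule suppresses sporadic false alarms.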
11
Hou L, Hu L, Gao W, Sheng W, Hao Z, Chen Y, Li J. Construction of a Risk Prediction Model for Hospital-Acquired Pulmonary Embolism in Hospitalized Patients. Clin Appl Thromb Hemost 2021; 27:10760296211040868. [PMID: 34558325 PMCID: PMC8495515 DOI: 10.1177/10760296211040868] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/23/2022] Open
Abstract
The purpose of this study is to establish a novel pulmonary embolism (PE) risk
prediction model based on machine learning (ML) methods and to evaluate the
predictive performance of the model and the contribution of variables to the
predictive performance. We conducted a retrospective study at the Shanghai Tenth
People's Hospital and collected the clinical data of in-patients that received
pulmonary computed tomography imaging between January 1, 2014 and December 31,
2018. We trained several ML models, including logistic regression (LR), support
vector machine (SVM), random forest (RF), and gradient boosting decision tree
(GBDT), compared the models with representative baseline algorithms, and
investigated their predictability and feature interpretation. A total of 3619
patients were included in the study. We discovered that the GBDT model
demonstrated the best prediction with an area under the curve value of 0.799,
whereas those of the RF, LR, and SVM models were 0.791, 0.716, and 0.743,
respectively. The sensitivities of the GBDT, LR, RF, and SVM models were 63.9%,
68.1%, 71.5%, and 75%, respectively; the specificities were 81.1%, 66.1%, 72.7%,
and 65.1%, respectively; and the accuracies were 77.8%, 66.5%, 72.5%, and 67%,
respectively. We discovered that the maximum D-dimer level contributed the most
to the outcome prediction, followed by the extreme growth rate of the plasma
fibrinogen level, in-hospital duration, and extreme growth rate of the D-dimer
level. The study demonstrates the superiority of the GBDT model in predicting
the risk of PE in hospitalized patients. However, in order to be applied in
clinical practice and provide support for clinical decision-making, the
predictive performance of the model needs to be prospectively verified.
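The sensitivity, specificity, and accuracy figures quoted above all derive from a binary confusion matrix. As a minimal sketch of those definitions, the helper below computes them from raw counts; the function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to illustrate the arithmetic.
sens, spec, acc = binary_metrics(tp=64, fn=36, tn=81, fp=19)
print(f"{sens:.3f} {spec:.3f} {acc:.3f}")  # 0.640 0.810 0.725
```

Because accuracy weights the two classes by their prevalence, a model with lower sensitivity can still have the highest accuracy when negatives dominate, which is consistent with the pattern reported for the GBDT model.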
Affiliation(s)
- Lengchen Hou
  - Shanghai Tenth People's Hospital, Shanghai, China. *As co-first authors, the two authors have an equally important contribution to this research.
- Longjun Hu
  - Shanghai Tenth People's Hospital, Shanghai, China. *As co-first authors, the two authors have an equally important contribution to this research.
- Wenxue Gao
  - Shanghai Tenth People's Hospital, Shanghai, China
- Wenbo Sheng
  - Shanghai Synyi Medical Technology Co., Ltd, Shanghai, China
- Zedong Hao
  - Shanghai Synyi Medical Technology Co., Ltd, Shanghai, China
- Yiwei Chen
  - Shanghai Synyi Medical Technology Co., Ltd, Shanghai, China
- Jiyu Li
  - Shanghai Tenth People's Hospital, Shanghai, China
|
12
|
Ren Z, Xin Y, Wang Z, Liu D, Ho RCM, Ho CSH. What Factors Are Most Closely Associated With Mood Disorders in Adolescents During the COVID-19 Pandemic? A Cross-Sectional Study Based on 1,771 Adolescents in Shandong Province, China. Front Psychiatry 2021; 12:728278. [PMID: 34603106 PMCID: PMC8481827 DOI: 10.3389/fpsyt.2021.728278] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 06/21/2021] [Accepted: 08/24/2021] [Indexed: 12/02/2022]
Abstract
Background and Aims: COVID-19 has been proven to harm adolescents' mental health, and several psychological influencing factors have been proposed. However, the importance of these factors in the development of mood disorders in adolescents during the pandemic still eludes researchers, and practical strategies for mental health education are limited. Methods: We constructed a sample of 1,771 adolescents from three junior high schools, three senior high schools, and three independent universities in Shandong province, China. The sample was stratified 5:4:3 across the age groups 12-15, 15-18, and 18-19 years. We examined the subjects' anxiety, depression, psychological resilience, perceived social support, coping strategies, subjective social/school status, screen time, and sleep quality with suitable psychological scales. We constructed machine learning models with four widely used classifiers: k-nearest neighbors, logistic regression (LR), gradient-boosted decision tree (GBDT), and a combination of GBDT and LR (GBDT + LR). We used Shapley additive explanations (SHAP) values to measure how the features affected the dependent variables. The area under the curve (AUC) of the receiver operating characteristic (ROC) curve was used to evaluate the performance of the models. Results: Among the participants, the current rates of anxiety and depression symptoms were 28.3% and 30.8%, respectively. The descriptive and univariate analyses showed that all of the factors included were statistically related to mood disorders. Among the four machine learning algorithms, the GBDT + LR algorithm achieved the best performance for anxiety and depression, with average AUC values of 0.819 and 0.857, respectively. We found that poor sleep quality was the most significant risk factor for mood disorders among Chinese adolescents. In addition, according to the feature importance (SHAP) of the psychological factors, we proposed a five-step mental health education strategy to be used during the COVID-19 pandemic (sleep quality-resilience-coping strategy-social support-perceived social status). Conclusion: In this study, we performed a cross-sectional investigation to examine the psychological impact of COVID-19 on adolescents. We applied machine learning algorithms to quantify the importance of each factor. In addition, we proposed a five-step mental health education strategy for school psychologists.
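The models above are compared by the area under the ROC curve. As a minimal sketch of what that number measures, the function below computes AUC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The name `roc_auc` and the toy labels and scores are illustrative assumptions; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of predicted scores, higher meaning "more positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy data: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the sample size, which is fine for illustration; a sort-based rank computation gives the same value for larger datasets.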
Affiliation(s)
- Ziyuan Ren
  - Department of Medical Psychology and Ethics, School of Basic Medicine Sciences, Cheeloo College of Medicine, Shandong University, Jinan, China
- Yaodong Xin
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Zhonglin Wang
  - School of Physical Science, University of California, Irvine, Irvine, CA, United States
- Dexiang Liu
  - Department of Medical Psychology and Ethics, School of Basic Medicine Sciences, Cheeloo College of Medicine, Shandong University, Jinan, China
- Roger C M Ho
  - Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Institute of Health Innovation and Technology (iHealthtech), National University of Singapore, Singapore, Singapore
- Cyrus S H Ho
  - Department of Psychological Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
|
13
|
Cui Y, Zhu D, Liu Y. PRG: A Distance Measurement Algorithm Based on Phase Regeneration. Sensors (Basel) 2018; 18:s18082595. [PMID: 30096778 PMCID: PMC6111935 DOI: 10.3390/s18082595] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/19/2018] [Revised: 08/04/2018] [Accepted: 08/06/2018] [Indexed: 11/16/2022]
Abstract
With the booming development of the Internet of Things (IoT) industry, the demand for positioning technology in various IoT application scenarios has also greatly increased. To meet the positioning requirements of IoT applications, we propose a distance measurement method based on phase regeneration (PRG) that can provide positioning capability for IoT applications in indoor and outdoor environments. The PRG algorithm consists of two phases: a coarse ranging phase and a fine ranging phase. A fingerprint positioning algorithm based on Gradient Boost Decision Tree (GBDT) is used to determine the coarse distance. The host machine measures the difference between the transmitted carrier phase and the received regenerative carrier phase to fix the fine distance, and the coarse distance is then used to determine the carrier phase integer ambiguity. Finally, high-precision ranging is realized. Simulation results show that the PRG method can achieve range finding with decimeter-level precision at a 10 MHz subcarrier frequency.
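The way a coarse estimate resolves the integer ambiguity of a phase measurement can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a round-trip measurement, so the phase wraps every half wavelength, and it assumes the coarse estimate is off by less than half of that unambiguous interval. The function name `resolve_distance`, its parameters, and the example numbers are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def resolve_distance(phase_rad, coarse_m, freq_hz=10e6, round_trip=True):
    """Combine a wrapped phase measurement with a coarse distance estimate.

    Assumption: with a round-trip measurement the phase repeats every
    half wavelength, so that is the unambiguous interval.
    """
    wavelength = C / freq_hz
    unambiguous = wavelength / 2 if round_trip else wavelength
    # Fine (fractional) distance inside one unambiguous interval.
    frac = (phase_rad % (2 * math.pi)) / (2 * math.pi) * unambiguous
    # Integer number of whole intervals, chosen to best match the coarse estimate.
    n = round((coarse_m - frac) / unambiguous)
    return n * unambiguous + frac


# Hypothetical scenario: true distance 40.3 m, coarse estimate off by 2.7 m.
interval = C / 10e6 / 2                     # about 14.99 m at 10 MHz
true_d = 40.3
phase = 2 * math.pi * ((true_d % interval) / interval)
print(round(resolve_distance(phase, coarse_m=43.0), 3))  # 40.3
```

At 10 MHz the unambiguous interval under this round-trip assumption is roughly 15 m, so the coarse stage only needs to be accurate to within about 7.5 m for the integer to be chosen correctly; the final precision then depends on the phase measurement alone.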
Affiliation(s)
- Yansong Cui
  - School of Electronic Engineering, Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Beijing 100876, China.
- Di Zhu
  - School of Electronic Engineering, Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Beijing 100876, China.
- Yanxu Liu
  - School of Electronic Engineering, Beijing University of Posts and Telecommunications, No. 10 Xitucheng Road, Beijing 100876, China.
|
14
|
Jia C, Yang Q, Zou Q. NucPosPred: Predicting species-specific genomic nucleosome positioning via four different modes of general PseKNC. J Theor Biol 2018; 450:15-21. [PMID: 29678692 DOI: 10.1016/j.jtbi.2018.04.025] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 01/26/2018] [Revised: 04/13/2018] [Accepted: 04/16/2018] [Indexed: 11/20/2022]
Abstract
The nucleosome is the basic structure of chromatin in eukaryotic cells, with essential roles in the regulation of many biological processes, such as DNA transcription, replication and repair, and RNA splicing. Because of the importance of nucleosomes, the factors that determine their positioning within genomes should be investigated. High-resolution nucleosome-positioning maps are now available for organisms including Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans, enabling the identification of nucleosome positioning by application of computational tools. Here, we describe a novel predictor called NucPosPred, which was specifically designed for large-scale identification of nucleosome positioning in C. elegans and D. melanogaster genomes. NucPosPred was separately optimized for each species for four types of DNA sequence feature extraction, with consideration of two classification algorithms (gradient-boosting decision tree and support vector machine). The overall accuracy obtained with NucPosPred was 92.29% for C. elegans and 88.26% for D. melanogaster, outperforming previous methods and demonstrating the potential for species-specific prediction of nucleosome positioning. For the convenience of most experimental scientists, a web-server for the predictor NucPosPred is available at http://121.42.167.206/NucPosPred/index.jsp.
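The abstract does not list the four PseKNC feature modes, so the sketch below does not reproduce them. It only illustrates the k-mer composition component on which pseudo k-tuple nucleotide composition is built: a DNA sequence is mapped to a fixed-length vector of normalized k-mer frequencies that a classifier such as GBDT or SVM can consume. The function name `kmer_frequencies` and the example sequence are illustrative assumptions.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies in lexicographic ACGT order.

    k-mers containing characters other than A, C, G, T are skipped.
    The result has 4**k entries that sum to 1 for a non-empty count.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


# Illustrative sequence: dinucleotides AC, CG, GT, TA, AC.
features = kmer_frequencies("ACGTAC", k=2)
print(len(features), features[1])  # 16 0.4
```

A full PseKNC vector would append correlation terms derived from physicochemical properties of the k-mers; those terms and their weights are specific to the chosen mode and are left out here.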
Affiliation(s)
- Cangzhi Jia
  - College of Science, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.
- Qing Yang
  - College of Science, Dalian Maritime University, No. 1 Linghai Road, Dalian 116026, China.
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China.
|