1
Kubiak KB, Więckowska B, Jodłowska-Siewert E, Guzik P. Visualising and quantifying the usefulness of new predictors stratified by outcome class: The U-smile method. PLoS One 2024; 19:e0303276. [PMID: 38768166 PMCID: PMC11104627 DOI: 10.1371/journal.pone.0303276] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/11/2024] [Accepted: 04/22/2024] [Indexed: 05/22/2024]
Abstract
Binary classification methods encompass various algorithms to categorize data points into two distinct classes. Binary prediction, in contrast, estimates the likelihood of a binary event occurring. We introduce a novel graphical and quantitative approach, the U-smile method, for assessing prediction improvement stratified by binary outcome class. The U-smile method utilizes a smile-like plot and novel coefficients to measure the relative and absolute change in prediction compared with the reference method. The likelihood-ratio test was used to assess the significance of the change in prediction. Logistic regression models using the Heart Disease dataset and generated random variables were employed to validate the U-smile method. The receiver operating characteristic (ROC) curve was used to compare the results of the U-smile method. The likelihood-ratio test demonstrated that the proposed coefficients consistently generated smile-shaped U-smile plots for the most informative predictors. The U-smile plot proved more effective than the ROC curve in comparing the effects of adding new predictors to the reference method. It effectively highlighted differences in model performance for both non-events and events. Visual analysis of the U-smile plots provided an immediate impression of the usefulness of different predictors at a glance. The U-smile method can guide the selection of the most valuable predictors. It can also be helpful in applications beyond prediction.
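The likelihood-ratio test mentioned in the abstract can be illustrated with a minimal sketch. It assumes two nested models fitted by maximum likelihood, where the new model adds exactly one predictor to the reference model, so the test statistic is compared against a chi-square distribution with one degree of freedom. The function names and the probability vectors are hypothetical and are not taken from the paper; the U-smile coefficients themselves are not reproduced here.

```python
import math


def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def lr_test_one_predictor(y, p_ref, p_new):
    """Likelihood-ratio test for a new model that adds ONE predictor.

    Assumption: both models are nested and fitted by maximum likelihood.
    For 1 degree of freedom the chi-square survival function is
    erfc(sqrt(x / 2)), so no external library is needed.
    """
    stat = 2.0 * (log_likelihood(y, p_new) - log_likelihood(y, p_ref))
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical predictions for three non-events followed by three events.
y = [0, 0, 0, 1, 1, 1]
p_ref = [0.5] * 6                        # reference model: uninformative
p_new = [0.2, 0.3, 0.4, 0.6, 0.7, 0.8]   # new model: one added predictor
stat, p_value = lr_test_one_predictor(y, p_ref, p_new)
print(f"LR statistic = {stat:.3f}, p = {p_value:.3f}")
```

Splitting `y`, `p_ref`, and `p_new` by outcome class before comparing them is the kind of stratification the abstract describes, although the exact coefficients would have to come from the paper itself.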
Affiliation(s)
- Katarzyna B. Kubiak
  - Department of Computer Science and Statistics, Poznan University of Medical Sciences, Poznan, Poland
- Barbara Więckowska
  - Department of Computer Science and Statistics, Poznan University of Medical Sciences, Poznan, Poland
- Przemysław Guzik
  - Department of Cardiology - Intensive Therapy and Internal Medicine, Poznan University of Medical Sciences, Poznan, Poland
  - University Centre for Sports and Medical Studies, Poznan University of Medical Sciences, Poznan, Poland
2
Shi J, Tang J, Liu L, Zhang C, Chen W, Qi M, Han Z, Chen X. Integrative Analyses of Bulk and Single-Cell RNA Seq Identified the Shared Genes in Acute Respiratory Distress Syndrome and Rheumatoid Arthritis. Mol Biotechnol 2024:10.1007/s12033-024-01141-6. [PMID: 38656728 DOI: 10.1007/s12033-024-01141-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 03/06/2024] [Indexed: 04/26/2024]
Abstract
Acute respiratory distress syndrome (ARDS), a progressive status of acute lung injury (ALI), is primarily caused by an immune-mediated inflammatory disorder, which can be an acute pulmonary complication of rheumatoid arthritis (RA). As a chronic inflammatory disease regulated by the immune system, RA is closely associated with the occurrence and progression of respiratory diseases. However, it remains elusive whether there are shared genes between the molecular mechanisms underlying RA and ARDS. The objective of this study is to identify potential shared genes for further clinical drug discovery through integrated analysis of bulk RNA sequencing datasets obtained from the Gene Expression Omnibus database, employing differentially expressed genes (DEGs) analysis and weighted gene co-expression network analysis (WGCNA). The hub genes were identified through the intersection of common DEGs and WGCNA-derived genes. The Random Forest (RF) and least absolute shrinkage and selection operator (LASSO) algorithms were subsequently employed to identify key shared target genes associated with two diseases. Additionally, RA immune infiltration analysis and COVID-19 single-cell transcriptome analysis revealed the correlation between these key genes and immune cells. A total of 59 shared genes were identified from the intersection of DEGs and gene clusters obtained through WGCNA, which analyzed the integrated gene matrix of ALI/ARDS and RA. The RF and LASSO algorithms were employed to screen for target genes specific to ALI/ARDS and RA, respectively. The final set of overlapping genes (FCMR, ADAM28, HK3, GRB10, UBE2J1, HPSE, DDX24, BATF, and CST7) all exhibited a strong predictive effect with an area under the curve (AUC) value greater than 0.8. Then, the immune infiltration analysis revealed a strong correlation between UBE2J1 and plasma cells in RA. 
Furthermore, scRNA-seq analysis demonstrated differential expression of these nine target genes primarily in T cells and NK cells, with CST7 showing a significant positive correlation specifically with NK cells. Beyond that, transcriptome sequencing was conducted on lung tissue collected from ALI mice, confirming the substantial differential expression of FCMR, HK3, UBE2J1, and BATF. This study provides unprecedented evidence linking the pathophysiological mechanisms of ALI/ARDS and RA to immune regulation, which offers novel understanding for future clinical treatment and experimental research.
Affiliation(s)
- Jun Shi
  - School of Medicine, South China University of Technology, Guangzhou, 510006, China
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
- Jiajia Tang
  - School of Medicine, South China University of Technology, Guangzhou, 510006, China
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
- Lu Liu
  - School of Medicine, South China University of Technology, Guangzhou, 510006, China
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
- Chunyang Zhang
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
- Wei Chen
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
- Man Qi
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
- Zhihai Han
  - School of Medicine, South China University of Technology, Guangzhou, 510006, China
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
- Xuxin Chen
  - School of Medicine, South China University of Technology, Guangzhou, 510006, China
  - Department of Pulmonary and Critical Care Medicine, The Sixth Medical Center of Chinese PLA General Hospital, Beijing, 100048, China
3
Li Q, Tang X, Li W. Potential diagnostic markers and biological mechanism for osteoarthritis with obesity based on bioinformatics analysis. PLoS One 2023; 18:e0296033. [PMID: 38127891 PMCID: PMC10735003 DOI: 10.1371/journal.pone.0296033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 12/04/2023] [Indexed: 12/23/2023]
Abstract
Numerous observational studies have shown that obesity (OB) is a significant risk factor for the occurrence and progression of osteoarthritis (OA), but the molecular mechanism linking them remains unclear. The study aimed to identify the key genes and pathogenesis of OA with OB. We obtained two OA and two OB datasets from the Gene Expression Omnibus (GEO) database. First, differentially expressed gene (DEG) analysis, weighted gene co-expression network analysis (WGCNA), and machine learning algorithms were used to identify key genes for diagnosing OA with OB, and a nomogram and receiver operating characteristic (ROC) curves were then used to assess the diagnostic value of the key genes. Second, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to explore the pathogenesis of OA with OB. Third, CIBERSORT was used to investigate immunocyte dysregulation in OA and OB. Two genes (SOD2, ZNF24) were finally identified as key genes for OA with OB. Both had high diagnostic value according to the nomogram and ROC curve analysis. Additionally, functional analysis indicated that oxidative stress and the inflammatory response are shared pathogenic mechanisms of OB and OA. Finally, in OA and OB, immune infiltration analysis showed that SOD2 correlated closely with M2 macrophages, regulatory T cells, and CD8 T cells, and that ZNF24 correlated with regulatory T cells. Overall, these genes might serve as new biomarkers or potential therapeutic targets for OA and OB comorbidity.
Affiliation(s)
- Qiu Li
  - Department of Cardiovascular, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430077, China
- Xijie Tang
  - Department of Orthopedics, Wuhan Third Hospital, School of Medicine, Wuhan University of Science and Technology, Wuhan, 430061, China
- Weihua Li
  - Department of Cardiovascular, Liyuan Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430077, China
4
Chen SF, Su CC, Huang CC, Ogink PT, Yen HK, Groot OQ, Hu MH. External validation of machine learning algorithm predicting prolonged opioid prescriptions in opioid-naïve lumbar spine surgery patients using a Taiwanese cohort. J Formos Med Assoc 2023; 122:1321-1330. [PMID: 37453900 DOI: 10.1016/j.jfma.2023.06.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Revised: 06/26/2023] [Accepted: 06/30/2023] [Indexed: 07/18/2023]
Abstract
BACKGROUND/PURPOSE Identifying patients at risk of prolonged opioid use after surgery enables appropriate prescription and personalized treatment plans. The Skeletal Oncology Research Group machine learning algorithm (SORG-MLA) was developed to predict the risk of prolonged opioid use in opioid-naive patients after lumbar spine surgery. However, its utility in a different country remains unknown. METHODS A Taiwanese cohort of 2795 patients aged 20 years or older who underwent primary surgery for lumbar decompression from 2010 to 2018 was used to validate the SORG-MLA. Discrimination (area under the receiver operating characteristic curve [AUROC] and area under the precision-recall curve [AUPRC]), calibration, overall performance (Brier score), and decision curve analysis were assessed. RESULTS Among 2795 patients, the prolonged opioid prescription rate was 5.2%. Patients in the validation cohort were older, more often had an inpatient disposition, and more commonly had a history of NSAID use. Despite these differences, the SORG-MLA provided good discriminative ability (AUROC of 0.71 and AUPRC of 0.36) and good overall performance (Brier score of 0.044, compared with 0.039 in the developmental cohort). However, the probability of prolonged opioid prescription tended to be overestimated (calibration intercept of -0.07 and calibration slope of 1.45). Decision curve analysis suggested greater clinical net benefit in a wide range of clinical scenarios. CONCLUSION The SORG-MLA retained good discriminative ability and overall performance in a geographically and medicolegally different region. It was suitable for identifying patients at risk of prolonged postoperative opioid use in Taiwan.
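Two of the metrics named in this abstract, the Brier score and the AUROC, are simple enough to sketch directly. The code below is an illustration only, not the study's analysis code: the function names and the toy outcome and probability vectors are hypothetical. It also computes the Brier score of a constant prediction equal to the event rate, which is `rate * (1 - rate)` and gives a rough baseline for reading a reported Brier score at a given prevalence.

```python
def brier_score(y, p):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def auroc(y, p):
    """AUROC as the fraction of (event, non-event) pairs ranked correctly.

    Ties count as half a correct pair. This is O(n_pos * n_neg), which is
    fine for a sketch but slow for large cohorts.
    """
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def null_brier(event_rate):
    """Brier score of always predicting the event rate itself."""
    return event_rate * (1.0 - event_rate)


# Hypothetical toy data: two non-events and two events.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(f"Brier = {brier_score(y, p):.4f}, AUROC = {auroc(y, p):.2f}")

# Baseline for the 5.2% prescription rate quoted in the abstract.
print(f"Null-model Brier at 5.2% prevalence = {null_brier(0.052):.4f}")
```

At a 5.2% event rate the constant-prediction baseline is about 0.049, which is the natural point of comparison for the reported Brier score of 0.044.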
Affiliation(s)
- Shin-Fu Chen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taiwan; Department of Medical Education, National Taiwan University Hospital, Taiwan
- Chih-Chi Su
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taiwan; Department of Medical Education, National Taiwan University Hospital, Taiwan
- Chuan-Ching Huang
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taiwan; Department of Biomedical Engineering, College of Medicine and College of Engineering, National Taiwan University, Taipei, Taiwan
- Paul T Ogink
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands
- Hung-Kuan Yen
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Hsin-Chu Branch, Taiwan; Department of Medical Education, National Taiwan University Hospital, Hsin-Chu Branch, Taiwan
- Olivier Q Groot
  - Department of Orthopaedics, University Medical Center Utrecht, Utrecht, the Netherlands; Department of Orthopaedic Surgery, Massachusetts General Hospital, Boston, USA
- Ming-Hsiao Hu
  - Department of Orthopaedic Surgery, National Taiwan University Hospital, Taiwan; Department of Orthopedics, National Taiwan University College of Medicine, Taiwan
5
Kurosawa R, Iida K, Ajiro M, Awaya T, Yamada M, Kosaki K, Hagiwara M. PDIVAS: Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing. BMC Genomics 2023; 24:601. [PMID: 37817060 PMCID: PMC10563346 DOI: 10.1186/s12864-023-09645-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 09/01/2023] [Indexed: 10/12/2023]
Abstract
BACKGROUND Deep-intronic variants that alter RNA splicing have not been evaluated effectively in the search for the causes of genetic diseases. Identifying such pathogenic variants among the vast number of deep-intronic variants (approximately 1,500,000 variants per individual) is a technical challenge. Thus, we developed a Pathogenicity predictor for Deep-Intronic Variants causing Aberrant Splicing (PDIVAS) to easily detect pathogenic deep-intronic variants. RESULTS PDIVAS was trained with an ensemble machine-learning algorithm to classify pathogenic and benign variants in a curated dataset. The dataset consists of manually curated pathogenic splice-altering variants (SAVs) and commonly observed benign variants within deep introns. Splicing features and a splicing constraint metric were used to maximize the predictive sensitivity and specificity, respectively. PDIVAS showed an average precision of 0.92 and a maximum MCC of 0.88 in classifying these variants, both better than those of previous predictors. When PDIVAS was applied to genome sequencing analysis at a threshold with 95% sensitivity for reported pathogenic SAVs, an average of 27 pathogenic candidates were extracted per individual. Furthermore, the causative variants in simulated patient genomes were prioritized more efficiently than by previous predictors. CONCLUSION Incorporating PDIVAS into variant interpretation pipelines will enable efficient detection of disease-causing deep-intronic SAVs and contribute to improving the diagnostic yield. PDIVAS is publicly available at https://github.com/shiro-kur/PDIVAS.
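The "maximum MCC" reported above is the Matthews correlation coefficient evaluated at the best score threshold. The following is a minimal sketch of that idea, assuming that a variant is called pathogenic when its score is greater than or equal to the threshold. The function names, labels, and scores are hypothetical; this is not the PDIVAS implementation.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal is empty (the usual convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def max_mcc(y, scores):
    """Scan every observed score as a threshold; return (best MCC, threshold).

    Assumption: score >= threshold means "predicted pathogenic".
    On ties the lowest threshold reaching the best MCC is kept.
    """
    best_value, best_threshold = -1.0, None
    for t in sorted(set(scores)):
        tp = sum(1 for yi, s in zip(y, scores) if yi == 1 and s >= t)
        fp = sum(1 for yi, s in zip(y, scores) if yi == 0 and s >= t)
        fn = sum(1 for yi, s in zip(y, scores) if yi == 1 and s < t)
        tn = sum(1 for yi, s in zip(y, scores) if yi == 0 and s < t)
        value = mcc(tp, fp, tn, fn)
        if value > best_value:
            best_value, best_threshold = value, t
    return best_value, best_threshold


# Hypothetical labels (1 = pathogenic) and predictor scores.
y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value, threshold = max_mcc(y, scores)
print(f"max MCC = {value:.3f} at threshold {threshold}")
```

Because the threshold is chosen on the same data on which MCC is reported, a maximum MCC is an optimistic summary; the abstract's separate 95%-sensitivity threshold is the operating point used in practice.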
Affiliation(s)
- Ryo Kurosawa
  - Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Kei Iida
  - Faculty of Science and Engineering, Kindai University, 3-4-1 Kowakae, Higashi-osaka, Osaka, 577-8502, Japan
  - Medical Research Support Center, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Masahiko Ajiro
  - Division of Cancer RNA Research, National Cancer Center Research Institute, Tokyo, 104-0045, Japan
  - Department of Drug Discovery Medicine, Graduate School of Medicine, Kyoto University, Yoshida Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Tomonari Awaya
  - Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
  - Laboratory of Tumor Microenvironment and Immunity, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
- Mamiko Yamada
  - Center for Medical Genetics, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Kenjiro Kosaki
  - Center for Medical Genetics, Keio University School of Medicine, Tokyo, 160-8582, Japan
- Masatoshi Hagiwara
  - Department of Anatomy and Developmental Biology, Graduate School of Medicine, Kyoto University, Yoshida-Konoe-cho, Sakyo-ku, Kyoto, 606-8501, Japan
6
Ebrahimi A, Wiil UK, Baskaran R, Peimankar A, Andersen K, Nielsen AS. AUD-DSS: a decision support system for early detection of patients with alcohol use disorder. BMC Bioinformatics 2023; 24:329. [PMID: 37658294 PMCID: PMC10474761 DOI: 10.1186/s12859-023-05450-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2023] [Accepted: 08/21/2023] [Indexed: 09/03/2023]
Abstract
BACKGROUND Alcohol use disorder (AUD) causes significant morbidity, mortality, and injuries. According to reports, approximately 5% of all registered deaths in Denmark could be due to AUD. The problem is compounded by the late identification of patients with AUD, a situation that can cause enormous problems, from psychological to physical to economic problems. Many individuals suffering from AUD never undergo specialist treatment during their addiction due to obstacles such as taboo and the poor performance of current screening tools. Therefore, there is a lack of rapid intervention. This can be mitigated by the early detection of patients with AUD. A clinical decision support system (DSS) powered by machine learning (ML) methods can be used to diagnose patients' AUD status earlier. METHODS This study proposes an effective AUD prediction model (AUDPM), which can be used in a DSS. The proposed model consists of four distinct components: (1) imputation to address missing values using the k-nearest neighbours approach, (2) recursive feature elimination with cross validation to select the most relevant subset of features, (3) a hybrid synthetic minority oversampling technique-edited nearest neighbour approach to remove noise and balance the distribution of the training data, and (4) an ML model for the early detection of patients with AUD. Two data sources, including a questionnaire and electronic health records of 2571 patients, were collected from Odense University Hospital in the Region of Southern Denmark for the AUD-Dataset. Then, the AUD-Dataset was used to build ML models. The results of different ML models, such as support vector machine, K-nearest neighbour, decision tree, random forest, and extreme gradient boosting, were compared. Finally, a combination of all these models in an ensemble learning approach was selected for the AUDPM. 
RESULTS The results revealed that the proposed ensemble AUDPM outperformed other single models and our previous study results, achieving 0.96, 0.94, 0.95, and 0.97 precision, recall, F1-score, and accuracy, respectively. In addition, we designed and developed an AUD-DSS prototype. CONCLUSION It was shown that our proposed AUDPM achieved high classification performance. In addition, we identified clinical factors related to the early detection of patients with AUD. The designed AUD-DSS is intended to be integrated into the existing Danish health care system to provide novel information to clinical staff if a patient shows signs of harmful alcohol use; in other words, it gives staff a good reason for having a conversation with patients for whom a conversation is relevant.
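The four metrics reported in the results all follow from the confusion matrix, and F1 is the harmonic mean of precision and recall. The sketch below shows those definitions with hypothetical counts (the study's confusion matrix is not given in the abstract) and uses the reported precision and recall only as a consistency check on the reported F1-score.

```python
def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall (0.0 if both are 0)."""
    total = precision + recall
    return 2.0 * precision * recall / total if total else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to exercise the formulas.
print(classification_metrics(tp=8, fp=2, tn=9, fn=1))

# Consistency check on the abstract: precision 0.96 and recall 0.94
# imply an F1-score that rounds to the reported 0.95.
print(round(f1_from(0.96, 0.94), 2))
```

Accuracy cannot be checked the same way, because it also depends on the class balance of the evaluation set, which the abstract does not state.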
Affiliation(s)
- Ali Ebrahimi
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Uffe Kock Wiil
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Ruben Baskaran
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Abdolrahman Peimankar
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Kjeld Andersen
  - Unit for Clinical Alcohol Research, Clinical Institute, University of Southern Denmark, Odense, Denmark
- Anette Søgaard Nielsen
  - Unit for Clinical Alcohol Research, Clinical Institute, University of Southern Denmark, Odense, Denmark
7
Li B, Zhang Y, Peng H, Fan Q, He S, Zhang Y, Shi S, Zhang Y, Ma A. Multi-semantic feature fusion attention network for binary code similarity detection. Sci Rep 2023; 13:4096. [PMID: 36907937 PMCID: PMC10008825 DOI: 10.1038/s41598-023-31280-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2022] [Accepted: 03/09/2023] [Indexed: 03/14/2023]
Abstract
Binary code similarity detection (BCSD) plays an important role in binary application security testing. It can be applied in several fields, such as software plagiarism detection, malware analysis, and vulnerability detection. Most existing research is based on recurrent neural networks, which have difficulty capturing the overall or long-distance semantic information of functions. In addition, existing works simply extract high-level semantic features and lack in-depth investigation of mechanisms for fusing low-level and high-level semantic features. In this paper, we propose a multi-semantic feature fusion attention network (MFFA-Net) for BCSD. MFFA-Net contains two critical modules: semantic feature fusion (SFF) and attention feature fusion (AFF). The SFF module concatenates multiple semantic features to represent the semantics of a function, which helps capture its overall semantic information. The AFF module is designed to extract useful information from the various features by assigning an attention matrix that models the relationships between them. To evaluate the proposed method, we conducted extensive experiments on two datasets. MFFA-Net achieves AUC values of 99.6% and 98.3% on the two datasets, respectively. The experimental results show that MFFA-Net has better performance for BCSD.
Affiliation(s)
- Bangling Li
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
- Yuting Zhang
  - Data & AI Technology Company, China Telecom Corporation Ltd, Beijing, 100011, China
- Huaxi Peng
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
- Qiguang Fan
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
- Shen He
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
- Yan Zhang
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
- Songquan Shi
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
- Yang Zhang
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
- Ailiang Ma
  - Department of Security Technology, China Mobile Research Institute, Beijing, 100053, China
8
Ebrahimi A, Wiil UK, Naemi A, Mansourvar M, Andersen K, Nielsen AS. Identification of clinical factors related to prediction of alcohol use disorder from electronic health records using feature selection methods. BMC Med Inform Decis Mak 2022; 22:304. [PMID: 36424597 PMCID: PMC9686074 DOI: 10.1186/s12911-022-02051-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2021] [Accepted: 11/16/2022] [Indexed: 11/25/2022]
Abstract
BACKGROUND High dimensionality in electronic health records (EHR) poses a significant computational problem for any systematic search for predictive, diagnostic, or prognostic patterns. Feature selection (FS) methods have been shown to be effective in feature reduction as well as in identifying risk factors related to the prediction of clinical disorders. This paper examines the prediction of patients with alcohol use disorder (AUD) using machine learning (ML) and attempts to identify risk factors related to the diagnosis of AUD. METHODS We used an FS framework consisting of two operational levels: base selectors and ensemble selectors. The first level consists of five FS methods: three filter methods, one wrapper method, and one embedded method. Base selector outputs are aggregated to develop four ensemble FS methods. The outputs of the FS methods were then fed into three ML algorithms, support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF), to compare and identify the best feature subset for the prediction of AUD from EHRs. RESULTS In terms of feature reduction, the embedded FS method significantly reduced the number of features from 361 to 131. In terms of classification performance, RF based on 272 features selected by our proposed ensemble method (Union FS) achieved the highest accuracy in predicting patients with AUD, 96%, and outperformed all other models in terms of AUROC, AUPRC, Precision, Recall, and F1-Score. Considering the limitations of embedded and wrapper methods, the best overall performance was achieved by our proposed Union Filter FS, which reduced the number of features to 223 and improved Precision, Recall, and F1-Score in RF from 0.77, 0.65, and 0.71 to 0.87, 0.81, and 0.84, respectively. 
Our findings indicate that, besides gender, age, and length of stay at the hospital, diagnosis related to digestive organs, bones, muscles and connective tissue, and the nervous systems are important clinical factors related to the prediction of patients with AUD. CONCLUSION Our proposed FS method could improve the classification performance significantly. It could identify clinical factors related to prediction of AUD from EHRs, thereby effectively helping clinical staff to identify and treat AUD patients and improving medical knowledge of the AUD condition. Moreover, the diversity of features among female and male patients as well as gender disparity were investigated using FS methods and ML techniques.
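The aggregation of base selector outputs into an ensemble can be sketched with plain set operations. The abstract names a "Union FS" ensemble but does not define the other three ensemble methods, so the vote-threshold variant below is an assumption added for illustration, and the feature names are hypothetical.

```python
from collections import Counter


def union_fs(selections):
    """Keep a feature if ANY base selector chose it."""
    chosen = set()
    for selected in selections:
        chosen |= set(selected)
    return sorted(chosen)


def vote_fs(selections, min_votes):
    """Keep a feature chosen by at least `min_votes` base selectors.

    Assumption: this generalisation is not described in the abstract.
    min_votes=1 reproduces the union; min_votes=len(selections) gives
    the intersection.
    """
    votes = Counter(f for selected in selections for f in set(selected))
    return sorted(f for f, count in votes.items() if count >= min_votes)


# Hypothetical outputs of three base selectors.
base_outputs = [
    ["age", "sex"],
    ["age", "length_of_stay"],
    ["sex", "age"],
]
print(union_fs(base_outputs))              # every feature any selector kept
print(vote_fs(base_outputs, min_votes=2))  # features kept by two or more
```

A union keeps more features than any single selector, which is consistent with the abstract's Union FS retaining 272 features while the embedded method alone retained 131.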
Affiliation(s)
- Ali Ebrahimi
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Uffe Kock Wiil
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Amin Naemi
  - SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, Odense, Denmark
- Marjan Mansourvar
  - Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Kjeld Andersen
  - Unit for Clinical Alcohol Research, Clinical Institute, University of Southern Denmark, Odense, Denmark
- Anette Søgaard Nielsen
  - Unit for Clinical Alcohol Research, Clinical Institute, University of Southern Denmark, Odense, Denmark
9
Wang Y, Huang Z, Xiao Y, Wan W, Yang X. The shared biomarkers and pathways of systemic lupus erythematosus and metabolic syndrome analyzed by bioinformatics combining machine learning algorithm and single-cell sequencing analysis. Front Immunol 2022; 13:1015882. [PMID: 36341378 PMCID: PMC9627509 DOI: 10.3389/fimmu.2022.1015882] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/10/2022] [Accepted: 10/03/2022] [Indexed: 11/24/2022]
Abstract
Background Systemic lupus erythematosus (SLE) is one of the most prevalent systemic autoimmune diseases, and metabolic syndrome (MetS) is the most common metabolic disorder, comprising hypertension, dyslipidemia, and obesity. Although clinical evidence suggests potential associations between SLE and MetS, the underlying pathogenesis remains unclear. Methods The microarray data sets of SLE and MetS were obtained from the Gene Expression Omnibus (GEO) database. To identify the shared genes between SLE and MetS, differentially expressed gene (DEG) analysis and weighted gene co-expression network analysis (WGCNA) were conducted. Then, GO and KEGG analyses were performed, and a protein-protein interaction (PPI) network was constructed. Next, Random Forest and LASSO algorithms were used to screen shared hub genes, and a diagnostic model was built using the machine learning technique XG-Boost. Subsequently, CIBERSORT and GSVA were used to estimate the correlation between shared hub genes and immune infiltration as well as metabolic pathways. Finally, the significant hub genes were verified using single-cell RNA sequencing (scRNA-seq) data. Results Using limma and WGCNA, we identified 153 shared feature genes, which were enriched in immune- and metabolic-related pathways. Further, 20 shared hub genes were screened and successfully used to build a prognostic model. Those shared hub genes were associated with immunological and metabolic processes in peripheral blood. The scRNA-seq results verified that TNFSF13B and OAS1, which had the highest diagnostic efficacy, were mainly expressed by monocytes. Additionally, they showed positive correlations with the pathways for the metabolism of xenobiotics and cholesterol, both of which were shown to be active in this comorbidity and concentrated in monocytes. Conclusion This study identified shared hub genes and constructed an effective diagnostic model in SLE and MetS. 
TNFSF13B and OAS1 had a positive correlation with cholesterol and xenobiotic metabolism. Both of these two biomarkers and metabolic pathways were potentially linked to monocytes, which provides novel insights into the pathogenesis and combined therapy of SLE comorbidity with MetS.
Collapse
Affiliation(s)
- Yingyu Wang
  - Division of Rheumatology, Huashan Hospital, Fudan University, Shanghai, China
  - Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, China
- Zhongzhou Huang
  - Division of Rheumatology, Huashan Hospital, Fudan University, Shanghai, China
  - Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, China
- Yu Xiao
  - Division of Rheumatology, Huashan Hospital, Fudan University, Shanghai, China
  - Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, China
- Weiguo Wan
  - Division of Rheumatology, Huashan Hospital, Fudan University, Shanghai, China
  - Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, China
- Xue Yang
  - Division of Rheumatology, Huashan Hospital, Fudan University, Shanghai, China
  - Institute of Rheumatology, Immunology and Allergy, Fudan University, Shanghai, China
*Correspondence: Weiguo Wan; Xue Yang
|
10
|
Yang J, Zhang L, Tang X, Han M. CodnNet: A lightweight CNN architecture for detection of COVID-19 infection. Appl Soft Comput 2022; 130:109656. [PMID: 36188336 PMCID: PMC9508701 DOI: 10.1016/j.asoc.2022.109656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2021] [Revised: 08/17/2022] [Accepted: 09/20/2022] [Indexed: 11/26/2022]
Abstract
The application of convolutional neural networks (CNNs) to the detection of COVID-19 infection has yielded favorable results. However, because of excessive model parameters, CNN-based detection of COVID-19 suffers from low recall and high computational complexity. In this paper, a novel lightweight CNN model, CodnNet, is proposed for quick detection of COVID-19 infection. CodnNet builds more effective dense connections based on the DenseNet network to make features highly reusable and to enhance the interactivity of local and global features. It also uses depthwise separable convolution with large convolution kernels instead of traditional convolution to enlarge the receptive field and enhance classification performance while reducing model complexity. The 5-fold cross-validation results on Kaggle's COVID-19 Dataset showed that CodnNet has an average precision of 97.9%, recall of 97.4%, F1 score of 97.7%, accuracy of 98.5%, mAP of 99.3%, and mAUC of 99.7%. Compared with typical CNNs, CodnNet achieved better classification accuracy and generalization performance with fewer parameters and lower computational complexity. Therefore, the CodnNet model provides a good reference for quick detection of COVID-19 infection.
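To illustrate why replacing a traditional convolution with a depthwise separable one reduces model complexity, the sketch below counts the weights of a single layer under both designs. The kernel size and channel counts are hypothetical values chosen for illustration; they are not taken from CodnNet, and bias terms are ignored.

```python
def standard_conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a standard 2-D convolution (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


def depthwise_separable_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer: a large 7x7 kernel with 64 input and 64 output channels.
k, c_in, c_out = 7, 64, 64
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {standard}, separable: {separable}, ratio: {standard / separable:.1f}x")
```

For a fixed number of channels, the saving grows with the kernel size, which is one reason large kernels tend to be paired with depthwise separable convolutions.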
|
11
|
A machine learning algorithm for predicting prolonged postoperative opioid prescription after lumbar disc herniation surgery. An external validation study using 1,316 patients from a Taiwanese cohort. Spine J 2022; 22:1119-1130. [PMID: 35202784 DOI: 10.1016/j.spinee.2022.02.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/02/2022] [Revised: 01/31/2022] [Accepted: 02/14/2022] [Indexed: 02/03/2023]
Abstract
BACKGROUND CONTEXT Preoperative prediction of prolonged postoperative opioid prescription helps identify patients for increased surveillance after surgery. The SORG machine learning model has been developed and successfully tested using 5,413 patients from the United States (US) to predict the risk of prolonged opioid prescription after surgery for lumbar disc herniation. However, external validation is an often-overlooked element in the process of incorporating prediction models into current clinical practice. This cannot be stressed enough for prediction models in which medicolegal and cultural differences may play a major role. PURPOSE The authors aimed to investigate the generalizability of the SORG prediction model, developed in US patients, to a Taiwanese patient cohort. STUDY DESIGN Retrospective study at a large academic medical center in Taiwan. PATIENT SAMPLE A total of 1,316 patients aged 20 years or older who underwent initial operative management for lumbar disc herniation between 2010 and 2018. OUTCOME MEASURES The primary outcome of interest was prolonged opioid prescription, defined as continued opioid prescription for at least 90 to 180 days after the first surgery for lumbar disc herniation at our institution. METHODS Baseline characteristics were compared between the external validation cohort and the original developmental cohorts. Discrimination (area under the receiver operating characteristic curve and area under the precision-recall curve), calibration, overall performance (Brier score), and decision curve analysis were used to assess the performance of the SORG ML algorithm in the validation cohort. This study had no funding source or conflicts of interest. RESULTS Overall, 1,316 patients were identified, of whom 41 (3.1%) had sustained postoperative opioid prescription. The validation cohort differed from the development cohort on several variables: 93% of Taiwanese patients received NSAIDs preoperatively compared with 22% of US patients, while 30% of Taiwanese patients received opioids versus 25% in the US. Despite these differences, the SORG prediction model retained good discrimination (area under the receiver operating characteristic curve of 0.76 and area under the precision-recall curve of 0.33) and good overall performance (Brier score of 0.028 compared with a null model Brier score of 0.030) while somewhat overestimating the chance of prolonged opioid use (calibration slope of 1.07 and calibration intercept of -0.87). Decision curve analysis showed the SORG model was suitable for clinical use. CONCLUSIONS Despite differences at baseline and a very strict opioid policy, the SORG algorithm for prolonged opioid use after surgery for lumbar disc herniation has good discriminative ability and good overall performance in a Han Chinese patient group in Taiwan. This freely available digital application can be used to identify high-risk patients and tailor prevention policies for these patients that may mitigate the long-term adverse consequences of opioid dependence: https://sorg-apps.shinyapps.io/lumbardiscopioid/.
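The null model Brier score quoted in the abstract can be reproduced from the reported event count alone. The sketch below assumes that the null model assigns every patient the cohort prevalence (41 of 1,316) as the predicted probability, which is the usual convention; the abstract does not spell this out. The outcome vector is rebuilt from the counts and is not patient-level data.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Counts reported in the abstract: 41 events among 1,316 patients.
n_patients = 1316
n_events = 41

# Reconstructed outcome vector; the order of patients does not affect the score.
outcomes = [1] * n_events + [0] * (n_patients - n_events)

# Assumed null model: every patient is given the overall prevalence.
prevalence = n_events / n_patients
null_brier = brier_score([prevalence] * n_patients, outcomes)

print(f"prevalence = {prevalence:.3f}, null Brier score = {null_brier:.3f}")
```

With so few events, the null model's Brier score is already close to zero, so the gap between 0.030 and the model's 0.028 is best read alongside the discrimination and calibration figures rather than on its own.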
|
12
|
Mao N, Shi Y, Lian C, Wang Z, Zhang K, Xie H, Zhang H, Chen Q, Cheng G, Xu C, Dai Y. Intratumoral and peritumoral radiomics for preoperative prediction of neoadjuvant chemotherapy effect in breast cancer based on contrast-enhanced spectral mammography. Eur Radiol 2022; 32:3207-3219. [DOI: 10.1007/s00330-021-08414-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/12/2021] [Revised: 09/26/2021] [Accepted: 10/13/2021] [Indexed: 12/14/2022]
|
13
|
Qi J, Lei J, Li N, Huang D, Liu H, Zhou K, Dai Z, Sun C. Machine learning models to predict in-hospital mortality in septic patients with diabetes. Front Endocrinol (Lausanne) 2022; 13:1034251. [PMID: 36465642 PMCID: PMC9709414 DOI: 10.3389/fendo.2022.1034251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 10/25/2022] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Sepsis is a leading cause of morbidity and mortality in hospitalized patients. To date, no well-established longitudinal networks link molecular mechanisms to clinical phenotypes in sepsis. Adding to the problem, about one in five patients presents with diabetes. For this subgroup, management is challenging and prognosis is difficult to evaluate. METHODS A total of 7,001 patients from three databases were enrolled on the basis of the Sepsis-3 criteria and a diabetes diagnosis. Input variables were selected manually on the basis of correlation analysis, leaving 53 variables. A total of 5,727 records were collected from the Medical Information Mart for Intensive Care database and randomly split into a training set and an internal validation set at a ratio of 7:3. Logistic regression with lasso regularization, Bayes logistic regression, decision tree, random forest, and XGBoost were then used to build predictive models on the training set, and the models were tested on the internal validation set. Data from the eICU Collaborative Research Database (n = 815) and the dtChina critical care database (n = 459) were used as external validation sets to test model performance. RESULTS In the internal validation set, the accuracy values of logistic regression with lasso regularization, Bayes logistic regression, decision tree, random forest, and XGBoost were 0.878, 0.883, 0.865, 0.883, and 0.882, respectively. In external validation set 1, the corresponding values were 0.879, 0.877, 0.865, 0.886, and 0.875. In external validation set 2, they were 0.715, 0.745, 0.763, 0.760, and 0.699. CONCLUSION The top three models for the internal validation set were Bayes logistic regression, random forest, and XGBoost, whereas the top three models for external validation set 1 were random forest, logistic regression with lasso regularization, and Bayes logistic regression. The top three models for external validation set 2 were decision tree, random forest, and Bayes logistic regression. The random forest model performed well on the training set and all three validation sets. The most important features are age, albumin, and lactate.
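The ranking stated in the conclusion can be checked directly against the accuracy values reported in the results. The sketch below is a simple sort over those published numbers; the dictionary layout and the helper name `top_models` are illustrative choices, not part of the study.

```python
# Accuracy values as reported in the abstract, keyed by validation set and model.
accuracy = {
    "internal validation": {
        "lasso logistic regression": 0.878,
        "Bayes logistic regression": 0.883,
        "decision tree": 0.865,
        "random forest": 0.883,
        "XGBoost": 0.882,
    },
    "external validation 1": {
        "lasso logistic regression": 0.879,
        "Bayes logistic regression": 0.877,
        "decision tree": 0.865,
        "random forest": 0.886,
        "XGBoost": 0.875,
    },
    "external validation 2": {
        "lasso logistic regression": 0.715,
        "Bayes logistic regression": 0.745,
        "decision tree": 0.763,
        "random forest": 0.760,
        "XGBoost": 0.699,
    },
}


def top_models(scores, k=3):
    """Return the k model names with the highest accuracy (ties keep dictionary order)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


for dataset, scores in accuracy.items():
    print(dataset, "->", top_models(scores))

# Models that appear in the top three of every validation set.
consistent = set.intersection(*(set(top_models(s)) for s in accuracy.values()))
print("in every top three:", sorted(consistent))
```

Accuracy alone can be misleading when in-hospital mortality is relatively rare, so a ranking like this is only a first-pass comparison.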
|