1. Yousef H, Malagurski Tortei B, Castiglione F. Predicting multiple sclerosis disease progression and outcomes with machine learning and MRI-based biomarkers: a review. J Neurol 2024; 271:6543-6572. [PMID: 39266777; PMCID: PMC11447111; DOI: 10.1007/s00415-024-12651-3] [Received: 05/08/2024] [Revised: 08/16/2024] [Accepted: 08/17/2024] [Indexed: 09/14/2024]
Abstract
Multiple sclerosis (MS) is a demyelinating neurological disorder with a highly heterogeneous clinical presentation and course of progression. Disease-modifying therapies are the only available treatment, as there is no known cure for the disease. Careful selection of suitable therapies is necessary, as they can be accompanied by serious risks and adverse effects such as infection. Magnetic resonance imaging (MRI) plays a central role in the diagnosis and management of MS, although MRI lesions show only moderate associations with clinical outcomes in MS, a discrepancy known as the clinico-radiological paradox. With the advent of machine learning (ML) in healthcare, the predictive power of MRI can be improved by leveraging both traditional and advanced ML algorithms capable of analyzing increasingly complex patterns within neuroimaging data. The purpose of this review was to examine the application of MRI-based ML for prediction of MS disease progression. Studies were divided into five main categories: predicting conversion from clinically isolated syndrome to MS, cognitive outcome, EDSS-related disability, motor disability and disease activity. The performance of ML models is discussed, and the most influential MRI-derived biomarkers are highlighted. Overall, MRI-based ML presents a promising avenue for MS prognosis. However, integration of imaging biomarkers with other multimodal patient data shows great potential for advancing personalized healthcare approaches in MS.
Affiliation(s)
- Hibba Yousef
  - Technology Innovation Institute, Biotechnology Research Center, P.O. Box 9639, Masdar City, Abu Dhabi, United Arab Emirates
- Brigitta Malagurski Tortei
  - Technology Innovation Institute, Biotechnology Research Center, P.O. Box 9639, Masdar City, Abu Dhabi, United Arab Emirates
- Filippo Castiglione
  - Technology Innovation Institute, Biotechnology Research Center, P.O. Box 9639, Masdar City, Abu Dhabi, United Arab Emirates
  - Institute for Applied Computing (IAC), National Research Council of Italy, Rome, Italy

2. Mehrbakhsh Z, Hassanzadeh R, Behnampour N, Tapak L, Zarrin Z, Khazaei S, Dinu I. Machine learning-based evaluation of prognostic factors for mortality and relapse in patients with acute lymphoblastic leukemia: a comparative simulation study. BMC Med Inform Decis Mak 2024; 24:261. [PMID: 39285373; PMCID: PMC11404043; DOI: 10.1186/s12911-024-02645-6] [Received: 01/06/2024] [Accepted: 08/21/2024] [Indexed: 09/22/2024]
Abstract
BACKGROUND Predicting mortality and relapse in children with acute lymphoblastic leukemia (ALL) is crucial for effective treatment and follow-up management. ALL is a common and deadly childhood cancer that often relapses after remission. In this study, we aimed to apply and evaluate machine learning-based models for predicting mortality and relapse in pediatric ALL patients. METHODS This retrospective cohort study was conducted on 161 children younger than 16 years with ALL. Survival status (dead/alive) and relapse (yes/no) were the outcome variables. Ten machine learning (ML) algorithms were used to predict mortality and relapse. The performance of the algorithms was evaluated by cross-validation and reported as mean sensitivity, specificity, accuracy and area under the curve (AUC). Finally, prognostic factors were identified based on the best algorithms. RESULTS On the test data sets, the mean accuracy of the ML algorithms ranged from 64% to 74% for predicting mortality and from 64% to 84% for predicting relapse. The mean AUC of the ML algorithms for both mortality and relapse was above 64%. The most important prognostic factors for predicting both mortality and relapse were age at diagnosis, hemoglobin and platelets. In addition, significant prognostic factors for predicting mortality included clinical side effects such as splenomegaly, hepatomegaly and lymphadenopathy. CONCLUSIONS Our results showed that artificial neural networks and bagging algorithms outperformed other algorithms in predicting mortality, while boosting and random forest algorithms excelled in predicting relapse in ALL patients across all criteria. These results offer significant clinical insights into the prognostic factors for children with ALL, which can inform treatment decisions and improve patient outcomes.
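The evaluation protocol in this abstract (cross-validation, with sensitivity, specificity and accuracy averaged over folds) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's pipeline. The stratified round-robin fold assignment, the hypothetical one-feature threshold model (`fit_stump`, `predict_stump`) and the toy data all stand in for the ten algorithms and the 161-patient cohort. AUC is left out for brevity.

```python
from statistics import mean


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        # NaN when a fold contains no positives (or no negatives).
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


def stratified_folds(y, k):
    """Split indices into k folds, dealing each class out round-robin
    so every fold keeps roughly the overall event rate."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, v in enumerate(y) if v == label]
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def fit_stump(x, y):
    """Hypothetical one-feature model: cut halfway between the class means."""
    pos = [xi for xi, yi in zip(x, y) if yi == 1]
    neg = [xi for xi, yi in zip(x, y) if yi == 0]
    return (mean(pos) + mean(neg)) / 2


def predict_stump(cut, x):
    return [1 if xi > cut else 0 for xi in x]


def cross_validate(x, y, k=5):
    """Fit on k-1 folds, score the held-out fold, report the mean of each metric."""
    per_fold = []
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        cut = fit_stump([x[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = predict_stump(cut, [x[i] for i in test_idx])
        per_fold.append(confusion_metrics([y[i] for i in test_idx], y_pred))
    return {name: mean(f[name] for f in per_fold) for name in per_fold[0]}


# Toy, perfectly separable data: every fold scores 1.0 on all three metrics.
x = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_validate(x, y, k=5))  # {'sensitivity': 1.0, 'specificity': 1.0, 'accuracy': 1.0}
```

Stratifying the folds matters in small cohorts like this one. A fold with no events would leave sensitivity undefined, and the sketch returns NaN in that case.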
Affiliation(s)
- Zahra Mehrbakhsh
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Student Research Committee, Hamadan University of Medical Sciences, Hamadan, Iran
- Roghayyeh Hassanzadeh
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Student Research Committee, Hamadan University of Medical Sciences, Hamadan, Iran
- Nasser Behnampour
  - Department of Biostatistics and Epidemiology, School of Health, Golestan University of Medical Sciences, Gorgan, Iran
- Leili Tapak
  - Department of Biostatistics, School of Public Health, Hamadan University of Medical Sciences, Hamadan, Iran
  - Modeling of Noncommunicable Diseases Research Center, Hamadan University of Medical Sciences, Hamadan, Iran
- Ziba Zarrin
  - Department of Photogrammetry and Remote Sensing, K.N. Toosi University of Technology, Tehran, Iran
- Salman Khazaei
  - Health Sciences Research Center, Health Sciences & Technology Research Institute, Hamadan University of Medical Sciences, Hamadan, Iran
- Irina Dinu
  - School of Public Health, University of Alberta, Edmonton, Canada

3. Lee M, Park T, Shin JY, Park M. A comprehensive multi-task deep learning approach for predicting metabolic syndrome with genetic, nutritional, and clinical data. Sci Rep 2024; 14:17851. [PMID: 39090161; PMCID: PMC11294629; DOI: 10.1038/s41598-024-68541-1] [Received: 02/26/2024] [Accepted: 07/24/2024] [Indexed: 08/04/2024]
Abstract
Metabolic syndrome (MetS) is a complex disorder characterized by a cluster of metabolic abnormalities, including abdominal obesity, hypertension, elevated triglycerides, reduced high-density lipoprotein cholesterol, and impaired glucose tolerance. It poses a significant public health concern, as individuals with MetS are at an increased risk of developing cardiovascular diseases and type 2 diabetes. Early and accurate identification of individuals at risk for MetS is essential. Various machine learning approaches have been employed to predict MetS, such as logistic regression, support vector machines, and several boosting techniques. However, these methods treat MetS as a binary status and ignore the fact that it comprises five components. Therefore, a method that accounts for these characteristics of MetS is needed. In this study, we propose a multi-task deep learning model designed to predict MetS and its five components simultaneously. The benefit of multi-task learning is that it can manage multiple tasks with a single model, and learning related tasks may enhance the model's predictive performance. To assess the efficacy of our proposed method, we compared its performance with that of several single-task approaches, including logistic regression, support vector machine, CatBoost, LightGBM, XGBoost and one-dimensional convolutional neural network. For the construction of our multi-task deep learning model, we utilized data from the Korean Association Resource (KARE) project, which includes 352,228 single nucleotide polymorphisms (SNPs) from 7729 individuals. In addition to genomic data, we also considered lifestyle, dietary, and socio-economic factors that affect chronic diseases. By evaluating metrics such as accuracy, precision, F1-score, and the area under the receiver operating characteristic curve, we demonstrate that our multi-task learning model surpasses traditional single-task machine learning models in predicting MetS.
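Because MetS is defined from its five components, the six prediction targets are linked, and multi-task learning exploits that link. The sketch below rests on two assumptions. First, it uses the widely used rule that MetS is present when at least three of the five components are; the abstract does not state which definition the KARE analysis used. Second, the joint loss is the sum of per-task binary cross-entropies. The component names and helper functions are hypothetical, and the shared network that would produce the six probabilities is omitted.

```python
import math

# Hypothetical component names, in a fixed task order.
COMPONENTS = ["abdominal_obesity", "hypertension", "high_triglycerides",
              "low_hdl", "impaired_glucose"]


def multitask_labels(components):
    """Return the six targets: MetS status followed by the five components.

    Assumes MetS = 1 when at least three of the five components are present.
    """
    flags = [int(components[name]) for name in COMPONENTS]
    mets = int(sum(flags) >= 3)
    return [mets] + flags


def bce(y, p, eps=1e-12):
    """Binary cross-entropy for one prediction, clipped for numerical safety."""
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def multitask_loss(labels, probs, weights=None):
    """Joint loss: (optionally weighted) sum of the per-task losses.

    In a multi-task network the six probabilities come from task heads on a
    shared representation, so every task's error updates the shared layers.
    """
    weights = weights or [1.0] * len(labels)
    return sum(w * bce(y, p) for w, y, p in zip(weights, labels, probs))


person = {"abdominal_obesity": True, "hypertension": True,
          "high_triglycerides": False, "low_hdl": True,
          "impaired_glucose": False}
labels = multitask_labels(person)  # [1, 1, 1, 0, 1, 0]: MetS, then five components
print(multitask_loss(labels, [0.5] * 6))
```

Equal task weights are the simplest choice. The optional `weights` argument shows where a model could emphasize the MetS head over its components.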
Affiliation(s)
- Minhyuk Lee
  - Department of Statistics, Korea University, Seoul, Republic of Korea
- Taesung Park
  - Department of Statistics, Seoul National University, Seoul, Republic of Korea
- Ji-Yeon Shin
  - Department of Preventive Medicine, School of Medicine, Kyungpook National University, Daegu, Republic of Korea
- Mira Park
  - Department of Preventive Medicine, School of Medicine, Eulji University, Daejeon, Republic of Korea

4. Cho JH, Kim M, Nam HS, Park SY, Lee YS. Age and medial compartmental OA were important predictors of the lateral compartmental OA in the discoid lateral meniscus: Analysis using machine learning approach. Knee Surg Sports Traumatol Arthrosc 2024; 32:1660-1671. [PMID: 38651559; DOI: 10.1002/ksa.12196] [Received: 11/09/2023] [Revised: 03/16/2024] [Accepted: 03/28/2024] [Indexed: 04/25/2024]
Abstract
PURPOSE The objective of this study was to develop a machine learning model to predict lateral compartment osteoarthritis (OA) in the discoid lateral meniscus (DLM) and then to identify factors contributing to lateral compartment OA, with a key focus on the patient's age. METHODS Data were collected from 611 patients with symptomatic DLM diagnosed using magnetic resonance imaging between April 2003 and May 2022. Twenty features covering demographic, clinical and radiological data were used with six algorithms to develop the predictive machine learning models. Shapley additive explanation (SHAP) analysis was performed on the best model, in addition to subgroup analyses according to age. RESULTS The extreme gradient boosting classifier was identified as the best prediction model, with the highest area under the receiver operating characteristic curve (AUROC) among all models (0.968), regardless of age (AUROC of 0.977 in the younger group and 0.937 in the older group). In the SHAP analysis, the most predictive feature was age, followed by the presence of medial compartment OA. In the subgroup analysis, the most predictive feature was age in the younger group, whereas it was the presence of medial compartment OA in the older group. CONCLUSION The machine learning model developed in this study showed high performance in predicting lateral compartment OA of the DLM. Age was identified as the most important factor, followed by medial compartment OA. In subgroup analysis, medial compartment OA was the most important factor in the older age group, whereas age remained the most important factor in the younger age group. These findings provide insights that may prove useful for establishing treatment strategies for patients with symptomatic DLM. LEVEL OF EVIDENCE Level III.
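SHAP ranks features such as age and medial compartment OA by their Shapley values. A Shapley value is each feature's average marginal contribution to a prediction over all coalitions of the other features. For a handful of features these values can be computed exactly by enumerating coalitions, which is what this sketch does. It assumes that features outside a coalition are replaced by a single baseline value, one common simplification. The SHAP library's tree explainer for gradient-boosted models computes attributions differently and far more efficiently. The three-feature `toy_risk` model is hypothetical, not the study's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition S are set to their baseline value.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_risk(z):
    """Hypothetical risk score over [age_in_decades, medial_oa (0/1), other]."""
    age, medial_oa, other = z
    return 0.3 * age + 0.5 * medial_oa + 0.2 * age * medial_oa + 0.1 * other


phi = shapley_values(toy_risk, x=[5, 1, 2], baseline=[3, 0, 0])
print([round(v, 3) for v in phi])  # [0.8, 1.3, 0.2]
```

The values add up to `toy_risk(x) - toy_risk(baseline)` (here 2.3). This is the additivity property that makes SHAP attributions easy to read. The age-by-OA interaction term is split between the two features that share it.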
Affiliation(s)
- Joon Hee Cho
  - Department of Orthopedic Surgery, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam-si, Korea
- Myeongju Kim
  - Division of Clinical Medicine, Center for Artificial Intelligence in Healthcare, Seoul National University Bundang Hospital, Seongnam-si, Korea
- Hee Seung Nam
  - Department of Orthopedic Surgery, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam-si, Korea
- Seong Yun Park
  - Department of Orthopedic Surgery, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam-si, Korea
- Yong Seuk Lee
  - Department of Orthopedic Surgery, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam-si, Korea

5. Jiang S, Xu W, Xia Q, Yi M, Zhou Y, Shang J, Cheng X. Application of machine learning in the study of cobalt-based oxide catalysts for antibiotic degradation: An innovative reverse synthesis strategy. J Hazard Mater 2024; 471:134309. [PMID: 38653133; DOI: 10.1016/j.jhazmat.2024.134309] [Received: 02/13/2024] [Revised: 04/07/2024] [Accepted: 04/13/2024] [Indexed: 04/25/2024]
Abstract
This study addresses antibiotic pollution in global water bodies by integrating machine learning and optimization algorithms to develop a novel reverse synthesis strategy for inorganic catalysts. We meticulously analyzed data from 96 studies, ensuring quality through preprocessing steps. Employing the AdaBoost model, we achieved 90.57% accuracy in classification and an R² value of 0.93 in regression, showcasing strong predictive power. A key innovation is the Sparrow Search Algorithm (SSA), which optimizes catalyst selection and experimental setup tailored to specific antibiotics. Empirical experiments validated SSA's efficacy, with degradation rates of 94% for Levofloxacin and 97% for Norfloxacin, aligning closely with predictions within a 2% margin of error. This research advances theoretical understanding and offers practical applications in material science and environmental engineering, significantly enhancing catalyst design efficiency and accuracy through the fusion of advanced machine learning techniques and optimization algorithms.
Affiliation(s)
- Siyuan Jiang
  - Key Laboratory for Environmental Pollution Prediction and Control, Gansu Province, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, PR China
- Wen Xu
  - Key Laboratory for Environmental Pollution Prediction and Control, Gansu Province, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, PR China
- Qi Xia
  - Key Laboratory for Environmental Pollution Prediction and Control, Gansu Province, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, PR China
- Ming Yi
  - Key Laboratory for Environmental Pollution Prediction and Control, Gansu Province, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, PR China
- Yuerong Zhou
  - Key Laboratory for Environmental Pollution Prediction and Control, Gansu Province, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, PR China
- Jiangwei Shang
  - Key Laboratory for Environmental Pollution Prediction and Control, Gansu Province, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, PR China
- Xiuwen Cheng
  - Key Laboratory for Environmental Pollution Prediction and Control, Gansu Province, College of Earth and Environmental Sciences, Lanzhou University, Lanzhou 730000, PR China

6. Zhang X, Chen S, Zhang P, Wang C, Wang Q, Zhou X. Staging of Liver Fibrosis Based on Energy Valley Optimization Multiple Stacking (EVO-MS) Model. Bioengineering (Basel) 2024; 11:485. [PMID: 38790352; PMCID: PMC11117710; DOI: 10.3390/bioengineering11050485] [Received: 04/23/2024] [Revised: 05/09/2024] [Accepted: 05/10/2024] [Indexed: 05/26/2024]
Abstract
Currently, staging the degree of liver fibrosis relies predominantly on liver biopsy, a method with potential risks such as bleeding and infection. With the rapid development of medical imaging devices, quantifying liver fibrosis through image processing has become feasible. Stacking is an effective ensemble technique, but manually tuning it to find the optimal configuration is challenging. Therefore, this paper proposes a novel EVO-MS model, a multiple stacking ensemble learning model optimized by the energy valley optimization (EVO) algorithm to select the most informative features for fibrosis quantification. Liver contours are profiled from 415 biopsy-proven CT cases, from which 10 shape features are calculated and input into a Support Vector Machine (SVM) classifier to generate predictions. The EVO algorithm is then applied to find the optimal parameter combination for fusing six base models, K-Nearest Neighbors (KNN), Decision Tree (DT), Naive Bayes (NB), Extreme Gradient Boosting (XGB), Gradient Boosting Decision Tree (GBDT), and Random Forest (RF), into a well-performing ensemble model. Experimental results indicate that selecting 3-5 feature parameters yields satisfactory classification results, with features such as contour roundness non-uniformity (Rmax), maximum peak height of contour (Rp), and maximum valley depth of contour (Rm) significantly influencing classification accuracy. The improved EVO algorithm combined with a multiple stacking model achieves an accuracy of 0.864, a precision of 0.813, a sensitivity of 0.912, a specificity of 0.824, and an F1-score of 0.860, demonstrating the effectiveness of the EVO-MS model in staging the degree of liver fibrosis.
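The reported figures can be checked for consistency. The F1-score is the harmonic mean of precision and sensitivity (recall), F1 = 2PR / (P + R). Plugging in the reported precision of 0.813 and sensitivity of 0.912 reproduces the stated F1-score of 0.860 to three decimal places. The helper below is a generic function, not code from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported EVO-MS figures: precision 0.813, sensitivity (recall) 0.912.
print(round(f1_score(0.813, 0.912), 3))  # 0.86
```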
Affiliation(s)
- Xuejun Zhang
  - School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning 530004, China
- Shengxiang Chen
  - School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Pengfei Zhang
  - School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Chun Wang
  - School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Qibo Wang
  - School of Computer, Electronics and Information, Guangxi University, Nanning 530004, China
- Xiangrong Zhou
  - Department of Electrical, Electronic and Computer Engineering, Gifu University, Gifu 501-1193, Japan

7. Zhou J, Liu W, Zhou H, Lau KK, Wong GH, Chan WC, Zhang Q, Knapp M, Wong IC, Luo H. Identifying dementia from cognitive footprints in hospital records among Chinese older adults: a machine-learning study. Lancet Reg Health West Pac 2024; 46:101060. [PMID: 38638410; PMCID: PMC11025003; DOI: 10.1016/j.lanwpc.2024.101060] [Received: 11/28/2023] [Revised: 02/09/2024] [Accepted: 03/25/2024] [Indexed: 04/20/2024]
Abstract
Background By combining theory-driven and data-driven methods, this study aimed to develop dementia predictive algorithms among Chinese older adults guided by the cognitive footprint theory. Methods Electronic medical records from the Clinical Data Analysis and Reporting System in Hong Kong were employed. We included patients with dementia diagnosed at age 65 or older between 2010 and 2018, and 1:1 matched dementia-free controls. We identified 51 features, comprising exposures to established modifiable factors and other factors before and after age 65. The performances of four machine learning models, including LASSO, multilayer perceptron (MLP), XGBoost, and LightGBM, were compared with logistic regression models, for all patients and for subgroups by age. Findings A total of 159,920 individuals (40.5% male; mean age [SD]: 83.97 [7.38]) were included. Compared with the model that included only established modifiable factors (area under the curve [AUC] 0.689, 95% CI [0.684, 0.694]), predictive accuracy improved substantially for models with all factors (0.774 [0.770, 0.778]). Machine learning and logistic regression models performed similarly, with AUCs ranging from 0.773 (0.768, 0.777) for LASSO to 0.780 (0.776, 0.784) for MLP. Antipsychotics, education, antidepressants, head injury, and stroke were identified as the most important predictors in the total sample. Age-specific models identified different important features, with cardiovascular and infectious diseases becoming prominent at older ages. Interpretation The models showed satisfactory performance in identifying dementia. These algorithms can be used in clinical practice to assist decision-making and allow timely, cost-effective interventions. Funding The Research Grants Council of Hong Kong under the Early Career Scheme 27110519.
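The AUCs above are reported with 95% confidence intervals, but the abstract does not say how those intervals were obtained. One common and simple option is a percentile bootstrap over patients. The sketch below shows that approach in plain Python, with the AUC computed as the Mann-Whitney probability that a random case scores higher than a random control. The function names, the number of resamples and the seed are illustrative assumptions. The pairwise AUC loop is quadratic, so it suits a sketch, not 159,920 records.

```python
import random


def auc(labels, scores):
    """Probability a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Point AUC plus a percentile bootstrap CI, resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(labels, scores), lo, hi


labels = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.3, 0.9]
print(bootstrap_auc_ci(labels, scores, n_boot=500))
```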
Affiliation(s)
- Jiayi Zhou
  - Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong SAR, China
- Wenlong Liu
  - Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Huiquan Zhou
  - Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
- Kui Kai Lau
  - Department of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Gloria H.Y. Wong
  - Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong SAR, China
- Wai Chi Chan
  - Department of Psychiatry, The University of Hong Kong, Hong Kong SAR, China
- Qingpeng Zhang
  - Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Musketeers Foundation Institute of Data Science, The University of Hong Kong, Hong Kong SAR, China
- Martin Knapp
  - Care Policy and Evaluation Centre (CPEC), The London School of Economics and Political Science, London, UK
- Ian C.K. Wong
  - Centre for Safe Medication Practice and Research, Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
  - Laboratory of Data Discovery for Health (D24H), Hong Kong Science and Technology Park, Sha Tin, Hong Kong SAR, China
  - Aston Pharmacy School, Aston University, Birmingham B4 7ET, UK
- Hao Luo
  - Department of Social Work and Social Administration, The University of Hong Kong, Hong Kong SAR, China
  - Department of Computer Science, The University of Hong Kong, Hong Kong SAR, China

8. Jacobs P, Khoche S. Artificial Intelligence and Echocardiography: A Genuinely Interesting Conundrum. J Cardiothorac Vasc Anesth 2024; 38:1065-1067. [PMID: 38378322; DOI: 10.1053/j.jvca.2024.01.014] [Received: 01/15/2024] [Accepted: 01/17/2024] [Indexed: 02/22/2024]
Affiliation(s)
- Paul Jacobs
  - Department of Anesthesiology, Division of Cardiothoracic Anesthesia, University of California, San Diego, Thornton Hospital, La Jolla, CA
- Swapnil Khoche
  - Department of Anesthesiology, Division of Cardiothoracic Anesthesia, University of California, San Diego, Thornton Hospital, La Jolla, CA

9. Romano D, Novielli P, Diacono D, Cilli R, Pantaleo E, Amoroso N, Bellantuono L, Monaco A, Bellotti R, Tangaro S. Insights from Explainable Artificial Intelligence of Pollution and Socioeconomic Influences for Respiratory Cancer Mortality in Italy. J Pers Med 2024; 14:430. [PMID: 38673057; PMCID: PMC11051343; DOI: 10.3390/jpm14040430] [Received: 03/04/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 04/28/2024]
Abstract
Respiratory malignancies, encompassing cancers affecting the lungs, the trachea, and the bronchi, pose a significant and dynamic public health challenge. Given that air pollution stands as a significant contributor to the onset of these ailments, discerning the most detrimental agents becomes imperative for crafting policies aimed at mitigating exposure. This study advocates for the utilization of explainable artificial intelligence (XAI) methodologies, leveraging remote sensing data, to ascertain the primary influencers on the prediction of standard mortality rates (SMRs) attributable to respiratory cancer across Italian provinces, utilizing both environmental and socioeconomic data. By scrutinizing thirteen distinct machine learning algorithms, we endeavor to pinpoint the most accurate model for categorizing Italian provinces as either above or below the national average SMR value for respiratory cancer. Furthermore, employing XAI techniques, we delineate the salient factors crucial in predicting the two classes of SMR. Through our machine learning scrutiny, we illuminate the environmental and socioeconomic factors pertinent to mortality in this disease category, thereby offering a roadmap for prioritizing interventions aimed at mitigating risk factors.
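The classification target described above, provinces above or below the national average SMR, can be sketched as follows. The abstract defines neither quantity, so this rests on two assumptions. The SMR is taken as the ratio of observed to expected deaths, as in the usual standardized mortality ratio. The "national average" is taken as the unweighted mean of provincial SMRs. The province labels and counts are hypothetical, for illustration only.

```python
from statistics import mean


def smr(observed, expected):
    """Standardized mortality ratio: observed deaths / expected deaths (assumed definition)."""
    return observed / expected


def label_provinces(counts):
    """Map each province to 1 (SMR above the average) or 0 (at or below it).

    counts: {province: (observed_deaths, expected_deaths)}
    The threshold is the unweighted mean of provincial SMRs (an assumption).
    """
    ratios = {p: smr(o, e) for p, (o, e) in counts.items()}
    threshold = mean(ratios.values())
    return {p: int(r > threshold) for p, r in ratios.items()}


# Hypothetical provinces and counts, for illustration only.
example = {"P1": (120, 100), "P2": (80, 100), "P3": (95, 100)}
print(label_provinces(example))  # {'P1': 1, 'P2': 0, 'P3': 0}
```

A population-weighted average would shift the threshold toward the larger provinces and could flip labels near the boundary. Which threshold the study used is not stated in the abstract.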
Affiliation(s)
- Donato Romano
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
- Pierfrancesco Novielli
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
- Domenico Diacono
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
- Roberto Cilli
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Ester Pantaleo
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Nicola Amoroso
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento di Farmacia Scienze del Farmaco, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Loredana Bellantuono
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento di Biomedicina Traslazionale e Neuroscienze, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Alfonso Monaco
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Roberto Bellotti
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
  - Dipartimento Interateneo di Fisica “M. Merlin”, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
- Sabina Tangaro
  - Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
  - Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy

10. Chafai N, Bonizzi L, Botti S, Badaoui B. Emerging applications of machine learning in genomic medicine and healthcare. Crit Rev Clin Lab Sci 2024; 61:140-163. [PMID: 37815417; DOI: 10.1080/10408363.2023.2259466] [Received: 06/19/2023] [Accepted: 09/12/2023] [Indexed: 10/11/2023]
Abstract
The integration of artificial intelligence (AI) technologies has propelled the progress of clinical and genomic medicine in recent years. The significant increase in computing power has enabled AI models to analyze and extract features from extensive medical data and images, contributing to the advancement of intelligent diagnostic tools. AI models have been used in personalized medicine to integrate patients' clinical data and genomic information. This integration allows for the identification of customized treatment recommendations, ultimately leading to enhanced patient outcomes. Notwithstanding these advancements, the application of AI in medicine is impeded by various obstacles, such as the limited availability of clinical and genomic data, the diversity of datasets, ethical implications, and the inconclusive interpretation of AI models' results. In this review, a comprehensive evaluation of multiple machine learning algorithms used in clinical and genomic medicine is conducted. Furthermore, we present an overview of the implementation of AI in clinical medicine, drug discovery, and genomic medicine. Finally, a number of constraints on the implementation of AI within the healthcare industry are examined.
Affiliation(s)
- Narjice Chafai
  - Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
- Luigi Bonizzi
  - Department of Biomedical, Surgical and Dental Science, University of Milan, Milan, Italy
- Sara Botti
  - PTP Science Park, Via Einstein - Loc. Cascina Codazza, Lodi, Italy
- Bouabid Badaoui
  - Laboratory of Biodiversity, Ecology, and Genome, Faculty of Sciences, Department of Biology, Mohammed V University in Rabat, Rabat, Morocco
  - African Sustainable Agriculture Research Institute (ASARI), Mohammed VI Polytechnic University (UM6P), Laâyoune, Morocco

11. Staerk C, Byrd A, Mayr A. Recent Methodological Trends in Epidemiology: No Need for Data-Driven Variable Selection? Am J Epidemiol 2024; 193:370-376. [PMID: 37771042; DOI: 10.1093/aje/kwad193] [Received: 01/20/2023] [Revised: 08/02/2023] [Accepted: 09/27/2023] [Indexed: 09/30/2023]
Abstract
Variable selection in regression models is a particularly important issue in epidemiology, where one usually encounters observational studies. Unlike in randomized trials or experiments, confounding is often not controlled by the study design but has to be accounted for by suitable statistical methods. For instance, when risk factors are to be identified with unconfounded effect estimates, multivariable regression techniques can help to adjust for confounders. We investigated the current practice of variable selection in 4 major epidemiologic journals in 2019 and found that the majority of articles used subject-matter knowledge to determine a priori the set of included variables. Compared with previous reviews from 2008 and 2015, fewer articles applied data-driven variable selection. Furthermore, for most articles the main aim of analysis was hypothesis-driven effect estimation in rather low-dimensional data situations (i.e., large sample size compared with the number of variables). Based on our results, we discuss the role of data-driven variable selection in epidemiology.
12
Lu MY, Huang CF, Hung CH, Tai C, Mo LR, Kuo HT, Tseng KC, Lo CC, Bair MJ, Wang SJ, Huang JF, Yeh ML, Chen CT, Tsai MC, Huang CW, Lee PL, Yang TH, Huang YH, Chong LW, Chen CL, Yang CC, Yang S, Cheng PN, Hsieh TY, Hu JT, Wu WC, Cheng CY, Chen GY, Zhou GX, Tsai WL, Kao CN, Lin CL, Wang CC, Lin TY, Lin C, Su WW, Lee TH, Chang TS, Liu CJ, Dai CY, Kao JH, Lin HC, Chuang WL, Peng CY, Tsai CW, Chen CY, Yu ML. Artificial intelligence predicts direct-acting antivirals failure among hepatitis C virus patients: A nationwide hepatitis C virus registry program. Clin Mol Hepatol 2024; 30:64-79. [PMID: 38195113 PMCID: PMC10776298 DOI: 10.3350/cmh.2023.0287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 11/02/2023] [Accepted: 11/20/2023] [Indexed: 01/11/2024]
Abstract
BACKGROUND/AIMS Despite the high efficacy of direct-acting antivirals (DAAs), approximately 1-3% of hepatitis C virus (HCV) patients fail to achieve a sustained virological response. We conducted a nationwide study to investigate risk factors associated with DAA treatment failure and applied machine-learning algorithms to identify subjects who may fail to respond to DAA therapy. METHODS We analyzed the Taiwan HCV Registry Program database to explore predictors of DAA failure in HCV patients. Fifty-five host and virological features were assessed using multivariate logistic regression, decision tree, random forest, eXtreme Gradient Boosting (XGBoost), and artificial neural network. The primary outcome was undetectable HCV RNA at 12 weeks after the end of treatment. RESULTS The training (n=23,955) and validation (n=10,346) datasets had similar baseline demographics, with an overall DAA failure rate of 1.6% (n=538). Multivariate logistic regression analysis revealed that liver cirrhosis, hepatocellular carcinoma, poor DAA adherence, and higher hemoglobin A1c were significantly associated with virological failure. XGBoost outperformed the other algorithms and the logistic regression models, with an area under the receiver operating characteristic curve of 1.000 in the training dataset and 0.803 in the validation dataset. The top five predictors of treatment failure were HCV RNA, body mass index, α-fetoprotein, platelets, and FIB-4 index. The accuracy, sensitivity, specificity, positive predictive value, and negative predictive value of the XGBoost model (cutoff value=0.5) were 99.5%, 69.7%, 99.9%, 97.4%, and 99.5%, respectively, for the entire dataset. CONCLUSION Machine learning algorithms effectively provide risk stratification for DAA failure and additional information on the factors associated with DAA failure.
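All five figures reported for the XGBoost model come from one 2×2 confusion matrix, obtained by thresholding predicted probabilities at the cutoff. The Python sketch below illustrates that calculation and is not the authors' code. The function names are hypothetical. It assumes that a probability exactly equal to the cutoff counts as a predicted failure; the abstract does not specify this.

The sketch also makes one reasoning step explicit. With a failure rate of about 1.6%, a model that never predicts failure already reaches roughly 98.4% accuracy. The 69.7% sensitivity and 97.4% PPV therefore say more about the model than the 99.5% accuracy does.

```python
def counts_at_cutoff(probs, labels, cutoff=0.5):
    """Return (tp, fn, fp, tn) for binary labels (1 = DAA failure).

    Assumption: a probability >= cutoff is treated as a predicted failure.
    """
    tp = fn = fp = tn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= cutoff else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 1:
            fn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def _ratio(num, den):
    # Undefined metrics (e.g. PPV when nothing is predicted positive) become NaN.
    return num / den if den else float("nan")


def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "sensitivity": _ratio(tp, tp + fn),  # share of true failures that are flagged
        "specificity": _ratio(tn, tn + fp),  # share of responders correctly cleared
        "ppv": _ratio(tp, tp + fp),          # share of flagged patients who truly fail
        "npv": _ratio(tn, tn + fn),          # share of cleared patients who truly respond
    }


# Hypothetical cohort of 1,000 patients with 16 failures (1.6%), scored by a
# trivial model that never predicts failure: accuracy looks high, sensitivity is zero.
baseline = confusion_metrics(tp=0, fn=16, fp=0, tn=984)
print(round(baseline["accuracy"], 3), baseline["sensitivity"])  # prints: 0.984 0.0
```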
Affiliation(s)
- Ming-Ying Lu
- School of Medicine and Doctoral Program of Clinical and Experimental Medicine, College of Medicine and Center of Excellence for Metabolic Associated Fatty Liver Disease, National Sun Yat-sen University, Kaohsiung, Taiwan
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Chung-Feng Huang
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Ph.D. Program in Translational Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, and Academia Sinica, Taipei, Taiwan
- Chao-Hung Hung
- Division of Hepatogastroenterology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Chi-Ming Tai
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan
- School of Medicine for International Students, College of Medicine, I-Shou University, Kaohsiung, Taiwan
- Lein-Ray Mo
- Division of Gastroenterology, Tainan Municipal Hospital (Managed By Show Chwan Medical Care Corporation), Tainan, Taiwan
- Hsing-Tao Kuo
- School of Medicine and Doctoral Program of Clinical and Experimental Medicine, College of Medicine and Center of Excellence for Metabolic Associated Fatty Liver Disease, National Sun Yat-sen University, Kaohsiung, Taiwan
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Chi Mei Medical Center, Yongkang District, Tainan, Taiwan
- Kuo-Chih Tseng
- Department of Internal Medicine, Dalin Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Chiayi, Taiwan
- School of Medicine, Tzuchi University, Hualien, Taiwan
- Ching-Chu Lo
- Division of Gastroenterology, Department of Internal Medicine, St. Martin De Porres Hospital, Chiayi, Taiwan
- Ming-Jong Bair
- Division of Gastroenterology, Department of Internal Medicine, Taitung Mackay Memorial Hospital, Taitung, Taiwan
- Mackay Medical College, New Taipei City, Taiwan
- Szu-Jen Wang
- Division of Gastroenterology, Department of Internal Medicine, Yuan’s General Hospital, Kaohsiung, Taiwan
- Jee-Fu Huang
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Ming-Lun Yeh
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Chun-Ting Chen
- Division of Gastroenterology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Division of Gastroenterology, Department of Internal Medicine, Tri-Service General Hospital Penghu Branch, National Defense Medical Center, Taipei, Taiwan
- Ming-Chang Tsai
- School of Medicine, Chung Shan Medical University, Department of Internal Medicine, Chung Shan Medical University Hospital, Taichung, Taiwan
- Chien-Wei Huang
- Division of Gastroenterology, Kaohsiung Armed Forces General Hospital, Kaohsiung, Taiwan
- Pei-Lun Lee
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Chi Mei Medical Center, Liouying, Tainan, Taiwan
- Yi-Hsiang Huang
- Division of Gastroenterology and Hepatology, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
- Institute of Clinical Medicine, School of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Lee-Won Chong
- Division of Hepatology and Gastroenterology, Department of Internal Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
- School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan
- Chien-Lin Chen
- Department of Medicine, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation and Tzu Chi University, Hualien, Taiwan
- Chi-Chieh Yang
- Department of Gastroenterology, Division of Internal Medicine, Show Chwan Memorial Hospital, Changhua, Taiwan
- Sheng-Shun Yang
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- Pin-Nan Cheng
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Tsai-Yuan Hsieh
- Division of Gastroenterology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Jui-Ting Hu
- Liver Center, Cathay General Hospital, Taipei, Taiwan
- Wen-Chih Wu
- Wen-Chih Wu Clinic, Fengshan, Kaohsiung, Taiwan
- Chien-Yu Cheng
- Division of Infectious Diseases, Department of Internal Medicine, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan, Taiwan
- Guei-Ying Chen
- Penghu Hospital, Ministry of Health and Welfare, Penghu, Taiwan
- Wei-Lun Tsai
- Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
- Chien-Neng Kao
- National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
- Chih-Lang Lin
- Liver Research Unit, Department of Hepato-Gastroenterology and Community Medicine Research Center, Chang Gung Memorial Hospital at Keelung, College of Medicine, Chang Gung University, Keelung, Taiwan
- Chia-Chi Wang
- Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation and School of Medicine, Tzu Chi University, Taipei, Taiwan
- Ta-Ya Lin
- Cishan Hospital, Ministry of Health and Welfare, Kaohsiung, Taiwan
- Chih-Lin Lin
- Department of Gastroenterology, Renai Branch, Taipei City Hospital, Taipei, Taiwan
- Wei-Wen Su
- Department of Gastroenterology and Hepatology, Changhua Christian Hospital, Changhua, Taiwan
- Tzong-Hsi Lee
- Division of Gastroenterology and Hepatology, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Te-Sheng Chang
- Division of Hepatogastroenterology, Department of Internal Medicine, Chang Gung Memorial Hospital, Chiayi, Taiwan and College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Chun-Jen Liu
- Hepatitis Research Center and Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Chia-Yen Dai
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Jia-Horng Kao
- Hepatitis Research Center and Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Han-Chieh Lin
- Division of Gastroenterology and Hepatology, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
- Institute of Clinical Medicine, School of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Wan-Long Chuang
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Cheng-Yuan Peng
- Center for Digestive Medicine, Department of Internal Medicine, China Medical University Hospital, Taichung, Taiwan
- School of Medicine, China Medical University, Taichung, Taiwan
- Chun-Wei Tsai
- Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
- Chi-Yi Chen
- Division of Gastroenterology and Hepatology, Department of Medicine, Ditmanson Medical Foundation Chiayi Christian Hospital, Chiayi, Taiwan
- Ming-Lung Yu
- School of Medicine and Doctoral Program of Clinical and Experimental Medicine, College of Medicine and Center of Excellence for Metabolic Associated Fatty Liver Disease, National Sun Yat-sen University, Kaohsiung, Taiwan
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Division of Hepatogastroenterology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- TACR Study Group
- School of Medicine and Doctoral Program of Clinical and Experimental Medicine, College of Medicine and Center of Excellence for Metabolic Associated Fatty Liver Disease, National Sun Yat-sen University, Kaohsiung, Taiwan
- Hepatobiliary Division, Department of Internal Medicine and Hepatitis Center, Kaohsiung Medical University Hospital, Kaohsiung Medical University, Kaohsiung, Taiwan
- Hepatitis Research Center, College of Medicine and Center for Liquid Biopsy and Cohort Research, Kaohsiung Medical University, Kaohsiung, Taiwan
- Ph.D. Program in Translational Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung, and Academia Sinica, Taipei, Taiwan
- Division of Hepatogastroenterology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, E-Da Hospital, I-Shou University, Kaohsiung, Taiwan
- School of Medicine for International Students, College of Medicine, I-Shou University, Kaohsiung, Taiwan
- Division of Gastroenterology, Tainan Municipal Hospital (Managed By Show Chwan Medical Care Corporation), Tainan, Taiwan
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Chi Mei Medical Center, Yongkang District, Tainan, Taiwan
- Department of Internal Medicine, Dalin Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation, Chiayi, Taiwan
- School of Medicine, Tzuchi University, Hualien, Taiwan
- Division of Gastroenterology, Department of Internal Medicine, St. Martin De Porres Hospital, Chiayi, Taiwan
- Division of Gastroenterology, Department of Internal Medicine, Taitung Mackay Memorial Hospital, Taitung, Taiwan
- Mackay Medical College, New Taipei City, Taiwan
- Division of Gastroenterology, Department of Internal Medicine, Yuan’s General Hospital, Kaohsiung, Taiwan
- Division of Gastroenterology, Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Division of Gastroenterology, Department of Internal Medicine, Tri-Service General Hospital Penghu Branch, National Defense Medical Center, Taipei, Taiwan
- School of Medicine, Chung Shan Medical University, Department of Internal Medicine, Chung Shan Medical University Hospital, Taichung, Taiwan
- Division of Gastroenterology, Kaohsiung Armed Forces General Hospital, Kaohsiung, Taiwan
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Chi Mei Medical Center, Liouying, Tainan, Taiwan
- Lotung Poh-Ai Hospital, Yilan, Taiwan
- Division of Gastroenterology and Hepatology, Department of Medicine, Taipei Veterans General Hospital, Taipei, Taiwan
- Institute of Clinical Medicine, School of Medicine, National Yang-Ming Chiao Tung University, Taipei, Taiwan
- Division of Hepatology and Gastroenterology, Department of Internal Medicine, Shin Kong Wu Ho-Su Memorial Hospital, Taipei, Taiwan
- School of Medicine, Fu-Jen Catholic University, New Taipei City, Taiwan
- Department of Medicine, Hualien Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation and Tzu Chi University, Hualien, Taiwan
- Department of Gastroenterology, Division of Internal Medicine, Show Chwan Memorial Hospital, Changhua, Taiwan
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, Taichung Veterans General Hospital, Taichung, Taiwan
- Division of Gastroenterology and Hepatology, Department of Internal Medicine, National Cheng Kung University Hospital, College of Medicine, National Cheng Kung University, Tainan, Taiwan
- Liver Center, Cathay General Hospital, Taipei, Taiwan
- Wen-Chih Wu Clinic, Fengshan, Kaohsiung, Taiwan
- Division of Infectious Diseases, Department of Internal Medicine, Taoyuan General Hospital, Ministry of Health and Welfare, Taoyuan, Taiwan
- Penghu Hospital, Ministry of Health and Welfare, Penghu, Taiwan
- Zhou Guoxiong Clinic, Penghu, Taiwan
- Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
- National Taiwan University Hospital Hsin-Chu Branch, Hsinchu, Taiwan
- Liver Research Unit, Department of Hepato-Gastroenterology and Community Medicine Research Center, Chang Gung Memorial Hospital at Keelung, College of Medicine, Chang Gung University, Keelung, Taiwan
- Taipei Tzu Chi Hospital, Buddhist Tzu Chi Medical Foundation and School of Medicine, Tzu Chi University, Taipei, Taiwan
- Cishan Hospital, Ministry of Health and Welfare, Kaohsiung, Taiwan
- Department of Gastroenterology, Renai Branch, Taipei City Hospital, Taipei, Taiwan
- Department of Gastroenterology and Hepatology, Changhua Christian Hospital, Changhua, Taiwan
- Division of Gastroenterology and Hepatology, Far Eastern Memorial Hospital, New Taipei City, Taiwan
- Division of Hepatogastroenterology, Department of Internal Medicine, Chang Gung Memorial Hospital, Chiayi, Taiwan and College of Medicine, Chang Gung University, Taoyuan, Taiwan
- Hepatitis Research Center and Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Center for Digestive Medicine, Department of Internal Medicine, China Medical University Hospital, Taichung, Taiwan
- School of Medicine, China Medical University, Taichung, Taiwan
- Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, Taiwan
- Division of Gastroenterology and Hepatology, Department of Medicine, Ditmanson Medical Foundation Chiayi Christian Hospital, Chiayi, Taiwan
13
Yolchuyeva S, Giacomazzi E, Tonneau M, Lamaze F, Orain M, Coulombe F, Malo J, Belkaid W, Routy B, Joubert P, Manem VS. Imaging-Based Biomarkers Predict Programmed Death-Ligand 1 and Survival Outcomes in Advanced NSCLC Treated With Nivolumab and Pembrolizumab: A Multi-Institutional Study. JTO Clin Res Rep 2023; 4:100602. [PMID: 38124790 PMCID: PMC10730368 DOI: 10.1016/j.jtocrr.2023.100602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 10/18/2023] [Accepted: 11/08/2023] [Indexed: 12/23/2023]
Abstract
Background Although the immune checkpoint inhibitors nivolumab and pembrolizumab have shown promise in patients with advanced NSCLC, some patients do not respond or relapse after an initial response. It remains unclear who will benefit from these therapies, so there is an unmet clinical need for robust biomarkers. Methods Patients with advanced NSCLC (N = 323) who were treated with pembrolizumab or nivolumab were retrospectively identified from two institutions. Radiomics features extracted from baseline pretreatment computed tomography scans, together with clinical variables, were used to build predictive models for overall survival (OS), progression-free survival (PFS), and programmed death-ligand 1 (PD-L1) expression. To develop the imaging and integrative clinical-imaging predictive models, we used the XGBoost learning algorithm with the ReliefF feature selection method and validated the models in an independent cohort. Model performance was evaluated with the concordance index for OS and PFS and with the area under the curve for PD-L1. Results We developed radiomics and ensemble radiomics-clinical predictive models for OS, PFS, and PD-L1 expression. In the validation cohort, the radiomics model achieved concordance indices of 0.60 and 0.61 for OS and PFS, respectively, and an area under the curve of 0.61 for PD-L1. The combined radiomics-clinical model performed better, with 0.65, 0.63, and 0.68 for OS, PFS, and PD-L1 in the validation cohort, respectively. Conclusions Pretreatment computed tomography imaging together with clinical data can serve as predictive biomarkers for PD-L1 and survival end points. These imaging-driven approaches may help expand the therapeutic options for nonresponders and improve the selection of patients who would benefit from immune checkpoint inhibitors.
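The concordance index used for OS and PFS measures how often, among comparable pairs of patients, the patient with the higher predicted risk has the shorter survival. Below is a minimal plain-Python sketch of Harrell's C-index, written for illustration. The abstract does not state which C-index variant or implementation was used.

In this simplified version:
- A pair counts as comparable only when the earlier time is an observed event.
- Pairs with tied times are skipped.
- Tied risk scores count as half-concordant.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index (simplified sketch).

    times: follow-up times; events: 1 if the event was observed, 0 if censored;
    risk_scores: model output where a higher score means a worse predicted prognosis.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[j] > times[i]:  # patient j outlived patient i's observed event
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# The second patient is censored at t=2, so only pairs anchored on observed events count.
print(harrell_c_index(times=[1, 2, 3], events=[1, 0, 1], risk_scores=[0.9, 0.1, 0.4]))  # prints 1.0
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. This puts the validation values of 0.60 to 0.65 reported above in context as modest discrimination.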
Affiliation(s)
- Sevinj Yolchuyeva
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Quebec, Canada
- Elena Giacomazzi
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Quebec, Canada
- Marion Tonneau
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
- Université de Médecine de Lille, Lille, France
- Fabien Lamaze
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
- Michele Orain
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
- Julie Malo
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
- Wiam Belkaid
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
- Bertrand Routy
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Quebec, Canada
- Centre Hospitalier Universitaire de Montréal, Hemato-Oncology Service, Quebec, Canada
- Philippe Joubert
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
- Department of Molecular Biology, Medical Biochemistry and Pathology, Laval University, Quebec, Canada
- Venkata S.K. Manem
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Quebec, Canada
- Quebec Heart & Lung Institute Research Center, Quebec, Canada
- Centre de Recherche du CHU de Québec – Université Laval, Quebec, Canada
14
Lee G, Yoon Y, Lee K. Anomaly Detection Using an Ensemble of Multi-Point LSTMs. Entropy (Basel) 2023; 25:1480. [PMID: 37998172 PMCID: PMC10670439 DOI: 10.3390/e25111480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 10/13/2023] [Accepted: 10/16/2023] [Indexed: 11/25/2023]
Abstract
As technologies that store time-series data, such as smartwatches and smart factories, become common, we are collectively accumulating a great deal of time-series data. With this accumulation, time-series anomaly detection is becoming increasingly important for detecting abnormal patterns in applications such as cyber-intrusion detection, fraud detection, social network anomaly detection, and industrial anomaly detection. In the past, time-series anomaly detection algorithms mainly focused on univariate data. However, as time-series data have become more complex, deep learning-based time-series anomaly detection methods have been actively developed. Currently, most industries rely on deep learning algorithms to detect time-series anomalies. In this paper, we propose an anomaly detection algorithm based on an ensemble of multi-point LSTMs that can be applied to three time-series domains. The proposed model works in three steps. In the first step, model selection, models are trained within a user-specified range and the most suitable ones are selected automatically. In the second step, the outputs of the M selected LSTMs are collected into an output vector by stacking the previously selected models. In the final step, anomalies are detected using the output vector from the second step. We compared the performance of the proposed model with other state-of-the-art deep learning models for time-series anomaly detection on three real-world datasets. Our method shows excellent accuracy, efficient execution time, and a good F1 score on all three datasets, although training the LSTM ensemble requires more time.
Affiliation(s)
- Kichun Lee
- Department of Industrial Engineering, College of Engineering, Hanyang University, 222 Wangsimni-ro, Seongdong-gu, Seoul 133-791, Republic of Korea; (G.L.); (Y.Y.)
15
Ramírez-Sanz JM, Maestro-Prieto JA, Arnaiz-González Á, Bustillo A. Semi-supervised learning for industrial fault detection and diagnosis: A systemic review. ISA Trans 2023:S0019-0578(23)00434-2. [PMID: 37778919 DOI: 10.1016/j.isatra.2023.09.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 08/03/2023] [Accepted: 09/22/2023] [Indexed: 10/03/2023]
Abstract
The automation of Fault Detection and Diagnosis (FDD) is a central task for many industries today. A myriad of methods are in use, although the most recent leading contenders are data-driven approaches and especially Machine Learning (ML) methods. ML algorithms fall into two main categories, supervised and unsupervised methods, depending on whether or not the instances are labeled with the expected outputs. However, an approach called Semi-Supervised Learning (SSL) has recently emerged, which trains on a few labeled instances together with unlabeled instances. SSL can significantly improve the accuracy of conventional ML models in industrial environments where labeled data are scarce. Over the past few years it has been tested as a promising solution for several FDD problems, although there had been no systemic review of this type of approach before the present one. This study organizes the existing literature on SSL for FDD using the taxonomy of van Engelen & Hoos. The most and least frequently used SSL algorithms are identified and discussed in terms of different fault detection tasks and their most common dataset structures. Finally, the conclusions propose a set of best practices for implementation under real industrial conditions, so as to avoid some of the most common pitfalls.
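Self-training is one of the wrapper methods in the van Engelen & Hoos taxonomy. It is a simple way to see how SSL uses unlabeled data:
1. Fit a model on the few labeled instances.
2. Pseudo-label the unlabeled instances the model is most confident about.
3. Refit on the enlarged set, and repeat.

The sketch below is a toy written under stated assumptions and is not a method taken from the review. It uses a nearest-centroid classifier and scores confidence as the margin between the two nearest class centroids. The two-feature sensor readings labeled "normal" or "fault" are hypothetical.

```python
from math import dist


def centroids(points, labels):
    """Nearest-centroid 'model': the mean feature vector of each class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(col) / len(ps) for col in zip(*ps)) for y, ps in groups.items()}


def predict(model, point):
    """Assign the class whose centroid is closest."""
    return min(model, key=lambda y: dist(point, model[y]))


def self_train(labeled, labels, unlabeled, per_round=1):
    """Self-training: repeatedly pseudo-label the most confident unlabeled points and refit."""
    X, Y = list(labeled), list(labels)
    pool = list(unlabeled)
    while pool:
        model = centroids(X, Y)  # refit on labeled + pseudo-labeled data
        scored = []
        for p in pool:
            d = sorted(dist(p, c) for c in model.values())
            # Confidence = gap between the nearest and second-nearest centroid.
            margin = d[1] - d[0] if len(d) > 1 else float("inf")
            scored.append((margin, p, predict(model, p)))
        scored.sort(key=lambda s: s[0], reverse=True)
        for _, p, y in scored[:per_round]:
            X.append(p)
            Y.append(y)
            pool.remove(p)
    return X, Y


# Two labeled readings plus four unlabeled ones (hypothetical data).
X, Y = self_train([(0.0, 0.0), (10.0, 10.0)], ["normal", "fault"],
                  [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 9.0)])
print(dict(zip(X, Y)))
```

Wrong pseudo-labels are reinforced in later rounds. Practical implementations therefore usually add a confidence threshold or a held-out labeled set to decide when to stop. That caution fits the review's emphasis on real industrial conditions.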
Affiliation(s)
- Andrés Bustillo
- Universidad de Burgos, Avda. Cantabria s/n, Burgos, 09006, Burgos, Spain
16
Astle DE, Johnson MH, Akarca D. Toward computational neuroconstructivism: a framework for developmental systems neuroscience. Trends Cogn Sci 2023; 27:726-744. [PMID: 37263856 DOI: 10.1016/j.tics.2023.04.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 01/05/2023] [Accepted: 04/19/2023] [Indexed: 06/03/2023]
Abstract
Brain development is underpinned by complex interactions between neural assemblies, driving structural and functional change. This neuroconstructivism (the notion that neural functions are shaped by these interactions) is core to some developmental theories. However, because these interactions are complex, the underlying developmental mechanisms are challenging to understand. Elsewhere in neurobiology, a computational revolution has shown that mathematical models of hidden biological mechanisms can bridge observations with theory building. Can we build a similar computational framework yielding mechanistic insights for brain development? Here, we outline the conceptual and technical challenges of addressing this theory gap, and demonstrate that there is great potential in specifying brain development as mathematically defined processes operating within physical constraints. We provide examples, along with the broader ingredients needed as the field explores computational explanations of system-wide development.
Affiliation(s)
- Duncan E Astle
- Department of Psychiatry, University of Cambridge, Cambridge, CB2 2QQ, UK; MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK.
- Mark H Johnson
- Department of Psychology, University of Cambridge, Cambridge, CB2 3EB, UK; Centre for Brain and Cognitive Development, Birkbeck, University of London, London, WC1E 7JL, UK
- Danyal Akarca
- MRC Cognition and Brain Sciences Unit, University of Cambridge, Cambridge, CB2 7EF, UK
17
Yolchuyeva S, Giacomazzi E, Tonneau M, Ebrahimpour L, Lamaze FC, Orain M, Coulombe F, Malo J, Belkaid W, Routy B, Joubert P, Manem VSK. A Radiomics-Clinical Model Predicts Overall Survival of Non-Small Cell Lung Cancer Patients Treated with Immunotherapy: A Multicenter Study. Cancers (Basel) 2023; 15:3829. [PMID: 37568646 PMCID: PMC10417039 DOI: 10.3390/cancers15153829] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/26/2023] [Revised: 07/14/2023] [Accepted: 07/24/2023] [Indexed: 08/13/2023]
Abstract
BACKGROUND Immune checkpoint inhibitors (ICIs) are a major breakthrough in cancer treatment and provide improved long-term survival in a subset of non-small cell lung cancer (NSCLC) patients. However, prognostic and predictive biomarkers of immunotherapy remain an unmet clinical need. In this work, we aim to leverage imaging data and clinical variables to develop survival risk models among advanced NSCLC patients treated with immunotherapy. METHODS This retrospective study includes a total of 385 patients from two institutions who were treated with ICIs. Radiomics features extracted from pretreatment CT scans were used to build predictive models. The objectives were to predict overall survival (OS) and to build a classifier for short- and long-term survival groups. We employed the XGBoost learning method to build radiomics and integrated clinical-radiomics predictive models. Feature selection and model building were performed and validated on a multicenter cohort. RESULTS We developed parsimonious models that were associated with OS and a classifier for short- and long-term survivor groups. The concordance indices (C-index) of the radiomics model for predicting OS were 0.61 and 0.57 in the discovery and validation cohorts, respectively. The area under the curve (AUC) values of the radiomics models for the short- and long-term groups were 0.65 and 0.58 in the discovery and validation cohorts, respectively. The accuracy of the combined radiomics-clinical model was 0.63 and 0.62 for predicting OS and 0.77 and 0.62 for classifying the survival groups in the discovery and validation cohorts, respectively. CONCLUSIONS We developed and validated novel radiomics and integrated radiomics-clinical survival models among NSCLC patients treated with ICIs. This model has important translational implications, as it can be used to identify a subset of patients who are not likely to benefit from immunotherapy. The developed imaging biomarkers may allow early prediction of the low-survival group, though additional validation of these radiomics models is warranted.
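For the binary short- versus long-term survival classifier, the AUC has a simple interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. This is the Mann-Whitney formulation, and for a binary outcome it equals a concordance index computed over positive-negative pairs. Below is a minimal pure-Python sketch. It assumes label 1 marks the group of interest (for example, short-term survivors) and that tied scores count as half.

```python
def auc_mann_whitney(scores, labels):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # prints 0.75
```

The pairwise loop costs O(n_pos × n_neg) time. A rank-based computation gives the same value and scales better for large cohorts.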
Affiliation(s)
- Sevinj Yolchuyeva
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Trois-Rivières, QC G8Z 4M3, Canada
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
- Elena Giacomazzi
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Trois-Rivières, QC G8Z 4M3, Canada
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
- Marion Tonneau
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montréal, QC H2X 0A9, Canada
- Université de Médecine de Lille—Université Henri Warembourg, 59020 Lille, France
- Leyla Ebrahimpour
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
- Department of Physics, Engineering Physics and Optics, Laval University, Quebec City, QC G1V 4G5, Canada
- Fabien C. Lamaze
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
| | - Michele Orain
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
| | - François Coulombe
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
| | - Julie Malo
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montréal, QC H2X 0A9, Canada
| | - Wiam Belkaid
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montréal, QC H2X 0A9, Canada
| | - Bertrand Routy
- Centre de Recherche du Centre Hospitalier Universitaire de Montréal, Montréal, QC H2X 0A9, Canada
| | - Philippe Joubert
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
- Department of Molecular Biology, Medical Biochemistry and Pathology, Laval University, Québec City, QC G1V 0A6, Canada
| | - Venkata S. K. Manem
- Department of Mathematics and Computer Science, Université du Québec à Trois Rivières, Trois-Rivières, QC G8Z 4M3, Canada
- Quebec Heart & Lung Institute Research Center, Québec City, QC G1V 4G5, Canada (M.O.)
18
Jang MG, Cha S, Kim S, Lee S, Lee KE, Shin KH. Application of tree-based machine learning classification methods to detect signals of fluoroquinolones using the Korea Adverse Event Reporting System (KAERS) database. Expert Opin Drug Saf 2023; 22:629-636. [PMID: 36794497 DOI: 10.1080/14740338.2023.2181341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 01/23/2023] [Indexed: 02/17/2023]
Abstract
BACKGROUND Regulatory agencies have raised safety issues concerning fluoroquinolones. This study was conducted to identify signals of fluoroquinolones reported in the Korea Adverse Event Reporting System (KAERS) using tree-based machine learning (ML) methods. RESEARCH DESIGN AND METHODS All adverse events (AEs) associated with the target drugs reported in the KAERS from 2013 to 2017 were matched with drug label information. A dataset containing label-positive and label-negative AEs was arbitrarily divided into training and test sets. Decision tree, random forest (RF), bagging, and gradient boosting machine (GBM) models were fitted on the training set, with hyperparameters tuned using five-fold cross-validation, and then applied to the test set. The ML method with the highest area under the curve (AUC) score was selected as the final ML model. RESULTS Bagging was selected as the final ML model for gemifloxacin (AUC: 1) and levofloxacin (AUC: 0.9987). RF was selected for ciprofloxacin, moxifloxacin, and ofloxacin (AUC: 0.9859, 0.9974, and 0.9999, respectively). We found that the final ML methods detected additional signals that were not detected by the disproportionality analysis (DPA) methods. CONCLUSIONS The bagging- or RF-based ML methods performed better than DPA and detected novel AE signals previously unidentified by the DPA methods.
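The model-selection rule described above (keep the learner with the highest test-set AUC) can be sketched as follows. This assumes binary labels (1 = label-positive AE, 0 = label-negative AE) and computes AUC as the probability that a randomly chosen positive scores higher than a randomly chosen negative; the model names and scores in the usage lines are hypothetical, not results from the study.

```python
from typing import Dict, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney statistic: P(score_pos > score_neg), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def select_best_model(labels: Sequence[int],
                      model_scores: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the name and test-set AUC of the highest-scoring model."""
    results = {name: auc(labels, s) for name, s in model_scores.items()}
    best = max(results, key=results.get)
    return best, results[best]


# Hypothetical test-set scores for two fitted models.
labels = [1, 1, 0, 0]
scores = {"bagging": [0.9, 0.8, 0.2, 0.1], "random_forest": [0.9, 0.3, 0.4, 0.1]}
print(select_best_model(labels, scores))  # ('bagging', 1.0)
```

An AUC of 1, as reported for gemifloxacin, means every label-positive AE was scored above every label-negative one in the test set.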
Affiliation(s)
- Min-Gyo Jang
- College of Pharmacy, Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu, Republic of Korea
- SangHun Cha
- Department of Statistics, College of Natural Sciences, Kyungpook National University, Daegu, Republic of Korea
- Seunghwak Kim
- Department of Statistics, College of Natural Sciences, Kyungpook National University, Daegu, Republic of Korea
- Sojung Lee
- Department of Statistics, College of Natural Sciences, Kyungpook National University, Daegu, Republic of Korea
- Kyeong Eun Lee
- Department of Statistics, College of Natural Sciences, Kyungpook National University, Daegu, Republic of Korea
- Kwang-Hee Shin
- College of Pharmacy, Research Institute of Pharmaceutical Sciences, Kyungpook National University, Daegu, Republic of Korea
19
De Bin R, Stikbakke VG. A boosting first-hitting-time model for survival analysis in high-dimensional settings. LIFETIME DATA ANALYSIS 2023; 29:420-440. [PMID: 35476164 PMCID: PMC10006065 DOI: 10.1007/s10985-022-09553-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/03/2021] [Accepted: 03/25/2022] [Indexed: 06/13/2023]
Abstract
In this paper, we propose a boosting algorithm to extend the applicability of a first hitting time model to high-dimensional frameworks. Based on an underlying stochastic process, first hitting time models do not require the proportional hazards assumption, which is hard to verify in high-dimensional settings, and represent a valid parametric alternative to the Cox model for modelling time-to-event responses. First hitting time models also offer a natural way to integrate low-dimensional clinical and high-dimensional molecular information in a prediction model that avoids the complicated weighting schemes typical of current methods. The performance of our novel boosting algorithm is illustrated with three real data examples.
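To make the first hitting time idea concrete: in a common formulation (threshold regression), a latent health process starts at a level y0 > 0 and evolves as a Wiener process with drift mu and variance sigma2; the event occurs when the process first reaches zero, which gives an inverse Gaussian event time with a closed-form survival function. The sketch below evaluates that survival function only. It assumes this Wiener-process formulation; how y0 and mu are linked to covariates, and the boosting updates proposed in the paper, are not shown.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fht_survival(t: float, y0: float, mu: float, sigma2: float = 1.0) -> float:
    """P(T > t) for T = first time a Wiener process started at y0 > 0 with
    drift mu and variance sigma2 reaches 0 (inverse Gaussian first hitting time).

    S(t) = Phi((y0 + mu*t) / sqrt(sigma2*t))
           - exp(-2*y0*mu / sigma2) * Phi((mu*t - y0) / sqrt(sigma2*t))
    For mu > 0 the process may never reach 0, and S(t) tends to
    1 - exp(-2*y0*mu/sigma2) > 0 as t grows.
    """
    if y0 <= 0 or sigma2 <= 0:
        raise ValueError("y0 and sigma2 must be positive")
    if t <= 0:
        return 1.0
    s = math.sqrt(sigma2 * t)
    return (std_normal_cdf((y0 + mu * t) / s)
            - math.exp(-2.0 * y0 * mu / sigma2) * std_normal_cdf((mu * t - y0) / s))


# A negative drift pulls the process towards the threshold, so survival decays to 0.
for t in (1, 5, 20):
    print(t, round(fht_survival(t, y0=2.0, mu=-0.5), 4))
```

Because no hazard ratio appears anywhere in this survival function, nothing forces hazards to be proportional across patients, which is the property the abstract contrasts with the Cox model.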
Affiliation(s)
- Riccardo De Bin
- Department of Mathematics, University of Oslo, Moltke Moes vei 35, 0851 Oslo, Norway
20
Stöcker A, Steyer L, Greven S. Functional additive models on manifolds of planar shapes and forms. J Comput Graph Stat 2023. [DOI: 10.1080/10618600.2023.2175687] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/05/2023]
Affiliation(s)
- Almond Stöcker
- School of Business and Economics, Humboldt-Universität zu Berlin
- Lisa Steyer
- School of Business and Economics, Humboldt-Universität zu Berlin
- Sonja Greven
- School of Business and Economics, Humboldt-Universität zu Berlin
21
Friedrich S, Groll A, Ickstadt K, Kneib T, Pauly M, Rahnenführer J, Friede T. Regularization approaches in clinical biostatistics: A review of methods and their applications. Stat Methods Med Res 2023; 32:425-440. [PMID: 36384320 PMCID: PMC9896544 DOI: 10.1177/09622802221133557] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 11/18/2022]
Abstract
A range of regularization approaches has been proposed in the data sciences to overcome overfitting, to exploit sparsity or to improve prediction. Using a broad definition of regularization, namely controlling model complexity by adding information in order to solve ill-posed problems or to prevent overfitting, we review a range of approaches within this framework, including penalization, early stopping, ensembling and model averaging. Aspects of their practical implementation are discussed, including available R packages, and examples are provided. To assess the extent to which these approaches are used in medicine, we conducted a review of three general medical journals. It revealed that regularization approaches are rarely applied in clinical applications, with the exception of random effects models. Hence, we suggest a more frequent use of regularization approaches in medical research. In situations where other approaches also work well, the only downside of regularization approaches is the increased complexity of the analyses, which can pose challenges in terms of computational resources and the expertise required of the data analyst. In our view, both can and should be overcome by investments in appropriate computing facilities and educational resources.
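Penalization, the first family listed above, can be illustrated in the simplest possible setting: one centred predictor and a centred response. The sketch below (in Python rather than the R packages the review discusses) minimises the least-squares loss plus either a ridge penalty, which shrinks the slope towards zero in closed form, or a lasso penalty, whose soft-thresholding solution can set the slope exactly to zero. The factors of 1/2 in the objectives are an assumption chosen to keep the formulas simple.

```python
import math
from typing import Sequence


def ridge_1d(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Minimise 0.5*sum((y - b*x)**2) + 0.5*lam*b**2 for centred x and y."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)  # shrinks the OLS slope sxy/sxx towards 0


def lasso_1d(x: Sequence[float], y: Sequence[float], lam: float) -> float:
    """Minimise 0.5*sum((y - b*x)**2) + lam*abs(b) for centred x and y."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    # Soft thresholding: the slope is exactly 0 once lam >= |sxy|.
    return math.copysign(max(abs(sxy) - lam, 0.0), sxy) / sxx


x = [-1.0, 0.0, 1.0]
y = [-2.0, 0.0, 2.0]
print(ridge_1d(x, y, 0.0), ridge_1d(x, y, 2.0))  # 2.0 1.0
print(lasso_1d(x, y, 1.0), lasso_1d(x, y, 5.0))  # 1.5 0.0
```

With lam = 0 both functions return the ordinary least-squares slope; increasing lam trades some bias for lower variance, which is the complexity control the review describes.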
Affiliation(s)
- Sarah Friedrich
- Institute of Mathematics, University of Augsburg, Augsburg, Germany
- Centre for Advanced Analytics and Predictive Sciences, University of Augsburg, Augsburg, Germany
- Andreas Groll
- Department of Statistics, TU Dortmund University, Dortmund, Germany
- Katja Ickstadt
- Department of Statistics, TU Dortmund University, Dortmund, Germany
- Thomas Kneib
- Chair of Statistics and Campus Institute Data Science, Georg-August-University Göttingen, Göttingen, Germany
- Markus Pauly
- Department of Statistics, TU Dortmund University, Dortmund, Germany
- Tim Friede
- Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany
- DZHK (German Center for Cardiovascular Research), partner site Göttingen, Göttingen, Germany
22
Özbek M, Toy HI, Takan I, Asfa S, Arshinchi Bonab R, Karakülah G, Kontou PI, Geronikolou SA, Pavlopoulou A. A Counterintuitive Neutrophil-Mediated Pattern in COVID-19 Patients Revealed through Transcriptomics Analysis. Viruses 2022; 15:104. [PMID: 36680144 PMCID: PMC9866184 DOI: 10.3390/v15010104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 12/25/2022] [Accepted: 12/27/2022] [Indexed: 01/01/2023]
Abstract
The COVID-19 pandemic has persisted for almost three years. However, the mechanisms linking SARS-CoV-2 to its effects on tissues and to disease severity have not been fully elucidated. Since the onset of the pandemic, a wealth of high-throughput data on the host transcriptional response to SARS-CoV-2 infection has been generated. The aim of this study was therefore to assess the effect of SARS-CoV-2 infection on circulating and organ tissue immune responses. We used publicly accessible gene expression data from blood and soft tissues, applying an integrated computational methodology that combined bioinformatics, machine learning, and natural language processing to the relevant transcriptomics data. COVID-19 pathophysiology and severity have mainly been associated with macrophage-elicited responses and a characteristic "cytokine storm". Our counterintuitive findings suggest that COVID-19 pathogenesis could also be mediated through neutrophil abundance and an exacerbated suppression of the immune system, eventually leading to uncontrolled viral dissemination and host cytotoxicity. The findings of this study elucidated new physiological functions of neutrophils, as well as tentative pathways to be explored in asymptomatic-, ethnicity- and locality-, or staging-associated studies.
Affiliation(s)
- Melih Özbek
- Izmir Biomedicine and Genome Center, Balcova, Izmir 35340, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35220, Turkey
- Halil Ibrahim Toy
- Izmir Biomedicine and Genome Center, Balcova, Izmir 35340, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35220, Turkey
- Işil Takan
- Izmir Biomedicine and Genome Center, Balcova, Izmir 35340, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35220, Turkey
- Seyedehsadaf Asfa
- Izmir Biomedicine and Genome Center, Balcova, Izmir 35340, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35220, Turkey
- Reza Arshinchi Bonab
- Izmir Biomedicine and Genome Center, Balcova, Izmir 35340, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35220, Turkey
- Gökhan Karakülah
- Izmir Biomedicine and Genome Center, Balcova, Izmir 35340, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35220, Turkey
- Styliani A. Geronikolou
- Clinical, Translational and Experimental Surgery Research Centre, Biomedical Research Foundation Academy of Athens, 11527 Athens, Greece
- University Research Institute of Maternal and Child Health and Precision Medicine, UNESCO on Adolescent Health Care, National and Kapodistrian University of Athens, Aghia Sophia Children’s Hospital, 11527 Athens, Greece
- Athanasia Pavlopoulou
- Izmir Biomedicine and Genome Center, Balcova, Izmir 35340, Turkey
- Izmir International Biomedicine and Genome Institute, Dokuz Eylül University, Balcova, Izmir 35220, Turkey
23
Maj C, Staerk C, Borisov O, Klinkhammer H, Wai Yeung M, Krawitz P, Mayr A. Statistical learning for sparser fine-mapped polygenic models: The prediction of LDL-cholesterol. Genet Epidemiol 2022; 46:589-603. [PMID: 35938382 DOI: 10.1002/gepi.22495] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/24/2021] [Revised: 06/03/2022] [Accepted: 06/07/2022] [Indexed: 11/10/2022]
Abstract
Polygenic risk scores quantify an individual's genetic predisposition to a particular trait. We propose and illustrate the application of existing statistical learning methods to derive sparser models for genome-wide data with a polygenic signal. Our approach is based on three consecutive steps. First, potentially informative loci are identified by a marginal screening approach. Then, fine-mapping is applied independently to blocks of variants in linkage disequilibrium, where informative variants are retrieved using variable selection methods, including boosting with probing and stochastic searches with the Adaptive Subspace method. Finally, joint prediction models with the selected variants are derived using statistical boosting. In contrast to alternative approaches relying on univariate summary statistics from genome-wide association studies, our three-step approach enables multivariable regression models to be selected and fitted on large-scale genotype data. Based on UK Biobank data, we develop prediction models for LDL-cholesterol as a continuous trait. Additionally, we consider a recent scalable algorithm for the Lasso. Results show that statistical learning approaches based on fine-mapping of genetic signals achieve prediction performance competitive with classical polygenic risk approaches, while yielding sparser risk models.
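Statistical boosting, used in the final step above, is often implemented as componentwise gradient boosting: in each iteration every candidate variable is fitted separately to the current residuals, and only the best-fitting one is updated by a small step. A minimal sketch with squared-error loss follows; it assumes centred predictors (so the simple base learners need no intercept) and plain Python lists instead of genotype matrices, and it is not the authors' implementation. The number of iterations n_iter acts as the main tuning parameter (early stopping), and variables that are never selected keep a coefficient of exactly zero, which is what makes the resulting models sparse.

```python
from typing import List, Sequence, Tuple


def componentwise_l2_boost(X: Sequence[Sequence[float]], y: Sequence[float],
                           n_iter: int = 100, nu: float = 0.1
                           ) -> Tuple[float, List[float], List[int]]:
    """Componentwise L2 boosting with simple linear base learners.

    Assumes the columns of X are centred. Returns the intercept (offset),
    the coefficient vector and the index of the variable chosen in each step.
    """
    n, p = len(X), len(X[0])
    offset = sum(y) / n
    resid = [yi - offset for yi in y]
    beta = [0.0] * p
    selected: List[int] = []
    for _ in range(n_iter):
        best = None  # (residual sum of squares, column index, slope)
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj)
            if sxx == 0:
                continue  # a constant column cannot be fitted
            b = sum(v * r for v, r in zip(xj, resid)) / sxx
            rss = sum((r - b * v) ** 2 for v, r in zip(xj, resid))
            if best is None or rss < best[0]:
                best = (rss, j, b)
        if best is None:
            break
        _, j, b = best
        beta[j] += nu * b  # small step towards the best single-variable fit
        resid = [r - nu * b * row[j] for r, row in zip(resid, X)]
        selected.append(j)
    return offset, beta, selected


# Toy data: only the first (centred) column carries signal.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2, -2, 2, -2]
offset, beta, selected = componentwise_l2_boost(X, y)
print(sorted(set(selected)), [round(b, 3) for b in beta])  # [0] [2.0, 0.0, 0.0]
```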
Affiliation(s)
- Carlo Maj
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Centre for Human Genetics, University of Marburg, Marburg, Germany
- Christian Staerk
- Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
- Oleg Borisov
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Hannah Klinkhammer
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
- Ming Wai Yeung
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Department of Cardiology, University of Groningen, Groningen, The Netherlands
- Peter Krawitz
- Institute for Genomic Statistics and Bioinformatics, Medical Faculty, University Bonn, Bonn, Germany
- Andreas Mayr
- Institute for Medical Biometry, Informatics and Epidemiology, Medical Faculty, University Bonn, Bonn, Germany
24
Nakach FZ, Zerouaoui H, Idri A. Hybrid deep boosting ensembles for histopathological breast cancer classification. HEALTH AND TECHNOLOGY 2022. [DOI: 10.1007/s12553-022-00709-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
25
Iosifidis V, Papadopoulos S, Rosenhahn B, Ntoutsi E. AdaCC: cumulative cost-sensitive boosting for imbalanced classification. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01780-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
Abstract
Class imbalance poses a major challenge for machine learning, as most supervised learning models may exhibit bias towards the majority class and underperform on the minority class. Cost-sensitive learning tackles this problem by treating the classes differently, typically via a user-defined, fixed misclassification cost matrix provided as input to the learner. Tuning such parameters is a challenging task that requires domain knowledge; moreover, wrong adjustments might degrade overall predictive performance. In this work, we propose a novel cost-sensitive boosting approach for imbalanced data that dynamically adjusts the misclassification costs over the boosting rounds in response to the model's performance instead of using a fixed misclassification cost matrix. Our method, called AdaCC, is parameter-free, as it relies on the cumulative behavior of the boosting model to adjust the misclassification costs for the next boosting round, and it comes with theoretical guarantees regarding the training error. Experiments on 27 real-world datasets from different domains with high class imbalance demonstrate the superiority of our method over 12 state-of-the-art cost-sensitive boosting approaches, with consistent improvements across different measures, for instance in the range of [0.3–28.56%] for AUC, [3.4–21.4%] for balanced accuracy, [4.8–45%] for gmean and [7.4–85.5%] for recall.
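AdaCC's cost-update rule is not reproduced here. As a smaller aid, the sketch below computes the imbalance-aware measures named in the abstract from a binary confusion matrix, assuming the minority class is the positive class and using the common definition of gmean as the geometric mean of recall and specificity; the counts are hypothetical.

```python
import math
from typing import Dict


def imbalance_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Recall, balanced accuracy and gmean from a binary confusion matrix.

    The minority class is assumed to be the positive class.
    """
    recall = tp / (tp + fn)          # true positive rate on the minority class
    specificity = tn / (tn + fp)     # true negative rate on the majority class
    return {
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2.0,
        "gmean": math.sqrt(recall * specificity),
    }


print(imbalance_metrics(tp=9, fn=1, tn=60, fp=40))
```

A classifier that always predicts the majority class has recall 0 and therefore gmean 0, however high its plain accuracy, which is why these measures are preferred for imbalanced data.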
26
Machetanz L, Huber D, Lau S, Kirchebner J. Model Building in Forensic Psychiatry: A Machine Learning Approach to Screening Offender Patients with SSD. Diagnostics (Basel) 2022; 12:diagnostics12102509. [PMID: 36292198 PMCID: PMC9600890 DOI: 10.3390/diagnostics12102509] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Revised: 09/28/2022] [Accepted: 10/13/2022] [Indexed: 11/16/2022]
Abstract
Today’s extensive availability of medical data enables the development of predictive models, but this requires suitable statistical methods, such as machine learning (ML). Especially in forensic psychiatry, a complex and cost-intensive field in which risk assessments and predictions of treatment outcomes are central tasks, there is a need for such predictive tools, for example, to anticipate complex treatment courses and to offer appropriate therapy on an individualized basis. This study aimed to develop a first basic model for anticipating adverse treatment courses based on prior compulsory admission and/or conviction as simple and easily objectifiable parameters in offender patients with a schizophrenia spectrum disorder (SSD). With a balanced accuracy of 67% and an AUC of 0.72, gradient boosting proved to be the optimal ML algorithm. Antisocial behavior, physical violence against staff, rule breaking, hyperactivity, delusions of grandeur, fewer feelings of guilt, the need for compulsory isolation, cannabis abuse/dependence, a higher dose of antipsychotics (measured by the olanzapine half-life) and an unfavorable legal prognosis emerged as the ten most influential variables out of a dataset with 209 parameters. Our findings demonstrate an example of the use of ML in developing an easy-to-use predictive model based on a few objectifiable factors.
27
Ouhourane M, Yang Y, Benedet AL, Oualkacha K. Group penalized quantile regression. STAT METHOD APPL-GER 2022. [DOI: 10.1007/s10260-021-00580-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
28
Analyzing Milk Foam Using Machine Learning for Diverse Applications. FOOD ANAL METHOD 2022. [DOI: 10.1007/s12161-022-02379-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
29
Menzenbach J, Kirfel A, Guttenthaler V, Feggeler J, Hilbert T, Ricchiuto A, Staerk C, Mayr A, Coburn M, Wittmann M. PRe-Operative Prediction of postoperative DElirium by appropriate SCreening (PROPDESC) development and validation of a pragmatic POD risk screening score based on routine preoperative data. J Clin Anesth 2022; 78:110684. [DOI: 10.1016/j.jclinane.2022.110684] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/07/2021] [Revised: 02/08/2022] [Accepted: 02/09/2022] [Indexed: 12/19/2022]
30
Wang X, Wang H, Ramazi P, Nah K, Lewis M. A Hypothesis-Free Bridging of Disease Dynamics and Non-pharmaceutical Policies. Bull Math Biol 2022; 84:57. [PMID: 35394257 PMCID: PMC8991680 DOI: 10.1007/s11538-022-01012-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/26/2021] [Accepted: 03/08/2022] [Indexed: 11/22/2022]
Abstract
Accurate prediction of the number of daily or weekly confirmed cases of COVID-19 is critical to controlling the pandemic. Existing mechanistic models capture the disease dynamics well. However, to forecast the future, they require the transmission rate to be known, which limits their predictive power. Typically, a hypothesis is made about the form of the transmission rate as a function of time. Yet the real form is too complex to be modeled mechanistically because of the unknown dynamics of many influential factors. We tackle this problem by using a hypothesis-free machine-learning algorithm to estimate the transmission rate from data on non-pharmaceutical policies, and in turn forecast the confirmed cases using a mechanistic disease model. More specifically, we build a hybrid model consisting of a mechanistic ordinary differential equation (ODE) model and a gradient boosting model (GBM). To calibrate the parameters, we develop an "inverse method" that obtains the transmission rate inversely from the other variables in the ODE model and then feeds it into the GBM to connect it with the policy data. The resulting model forecast the number of daily confirmed cases up to 35 days ahead in the USA with an average mean absolute percentage error of 27%. It can identify the most informative predictive variables, which can be helpful in designing improved forecasters as well as in informing policymakers.
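The error measure quoted above, mean absolute percentage error (MAPE), averages the absolute forecast errors relative to the observed values. A minimal sketch of the standard definition follows; how the authors averaged MAPE across forecast horizons is not described in the abstract and is not reproduced here, and the case counts are hypothetical.

```python
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            raise ValueError("MAPE is undefined when an observed value is 0")
        total += abs((a - f) / a)
    return 100.0 * total / len(actual)


# Hypothetical observed vs forecast daily case counts.
print(mape([100.0, 200.0], [110.0, 150.0]))
```

Here the two relative errors are 10% and 25%, so the MAPE is about 17.5%; on this scale, the reported 27% means forecasts were off by roughly a quarter of the observed count on average.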
Affiliation(s)
- Xiunan Wang
- Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, T6G 2G1, Canada
- Department of Mathematics, University of Tennessee at Chattanooga, Chattanooga, TN, 37403, USA
- Hao Wang
- Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, T6G 2G1, Canada
- Pouria Ramazi
- Department of Mathematics and Statistics, Brock University, St. Catharines, ON, L2S 3A1, Canada
- Kyeongah Nah
- Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, T6G 2G1, Canada
- National Institute for Mathematical Sciences, Daejeon, 34047, Korea
- Mark Lewis
- Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB, T6G 2G1, Canada
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2G1, Canada
31
A likelihood-based boosting algorithm for factor analysis models with binary data. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2021.107412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
32
Mapping Land Cover Types for Highland Andean Ecosystems in Peru Using Google Earth Engine. REMOTE SENSING 2022. [DOI: 10.3390/rs14071562] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Highland Andean ecosystems sustain high levels of floral and faunal biodiversity in areas with diverse topography and provide varied ecosystem services, including the supply of water to cities and downstream agricultural valleys. Google has developed a product specifically designed for mapping purposes (Earth Engine), which enables users to harness the computing power of a cloud-based solution in near-real time for land cover change mapping and monitoring. We explore the feasibility of using this platform for mapping land cover types in topographically complex terrain with highly mixed vegetation types (the Nor Yauyos Cochas Landscape Reserve in the central Andes of Peru) using classification machine learning (ML) algorithms in combination with different sets of remote sensing data. The algorithms were trained using 3601 sampling pixels of (a) normalized spectral bands between the visible and near-infrared spectrum of the Landsat 8 OLI sensor for the 2018 period, (b) spectral indices of vegetation, soil, water, snow, burned areas and bare ground and (c) topography-derived indices (elevation, slope and aspect). Six ML algorithms were tested: CART, random forest, gradient tree boosting, minimum distance, naïve Bayes and support vector machine. The results reveal that ML algorithms produce accurate classifications when spectral bands are used in conjunction with topographic indices, resulting in better discrimination among classes with similar spectral signatures, such as pajonal (tussock grass-dominated cover) and short grasses or rocky groups, and moraines, agricultural and forested areas. The model with the highest explanatory power was obtained from the combination of spectral bands and topographic indices using the random forest algorithm (Kappa = 0.81). Our study presents a first approach of its kind in topographically complex Cordilleran terrain, and we show that Google Earth Engine (GEE) is particularly useful for large-scale land cover mapping and monitoring in mountainous ecosystems subject to rapid changes and conversions, with replicability and scalability to other areas with similar characteristics.
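Cohen's kappa, reported above for the random forest classification (Kappa = 0.81), corrects the observed agreement between reference labels and predicted classes for the agreement expected by chance. A minimal sketch from a square confusion matrix follows, assuming rows hold reference classes and columns hold predicted classes; the counts are hypothetical.

```python
from typing import Sequence


def cohens_kappa(confusion: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa from a square confusion matrix (rows: reference, columns: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected if predictions were independent of the reference labels.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


print(cohens_kappa([[45, 5], [5, 45]]))  # approximately 0.8
```

In this toy matrix the raw agreement is 90%, but half of that would be expected by chance with two balanced classes, so kappa is lower than plain accuracy.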
33
Effectiveness of Artificial Intelligence Models for Cardiovascular Disease Prediction: Network Meta-Analysis. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:5849995. [PMID: 35251153 PMCID: PMC8894073 DOI: 10.1155/2022/5849995] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/28/2021] [Accepted: 01/18/2022] [Indexed: 11/23/2022]
Abstract
Heart failure is the most common cause of death in both males and females around the world. Cardiovascular diseases (CVDs), in particular, are the main cause of death worldwide, accounting for 30% of all fatalities in the United States and 45% in Europe. Artificial intelligence (AI) approaches such as machine learning (ML) and deep learning (DL) models are playing an important role in the advancement of heart failure therapy. The main objective of this study was to perform a network meta-analysis of patients with heart failure, stroke, hypertension, and diabetes by comparing ML and DL models. A comprehensive search of five electronic databases (ScienceDirect, EMBASE, PubMed, Web of Science, and IEEE Xplore) was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. The methodological quality of studies was assessed following the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) guidelines. A random-effects network meta-analysis forest plot with categorical data was used, along with subgroup testing for all four types of treatment, and odds ratios (ORs) with 95% confidence intervals (CIs) were calculated. Pooled network forest plots, funnel plots, and the league table, which shows the best algorithms for each outcome, were analyzed. Seventeen studies, with a total of 285,213 patients with CVDs, were included in the network meta-analysis. The statistical evidence indicated that the DL algorithms performed well in the prediction of heart failure, with an AUC of 0.843 and CI [0.840–0.845], while among the ML algorithms, the gradient boosting machine (GBM) achieved an average accuracy of 91.10% in predicting heart failure. An artificial neural network (ANN) performed well in the prediction of diabetes, with an OR and CI of 0.0905 [0.0489; 0.1673]. A support vector machine (SVM) performed better for the prediction of stroke, with an OR and CI of 25.0801 [11.4824; 54.7803]. Random forest (RF) performed well in the prediction of hypertension, with an OR and CI of 10.8527 [4.7434; 24.8305]. The findings of this work suggest that DL models can effectively advance the prediction of and knowledge about heart failure, but the literature on DL methods in the field of CVDs is limited. As a result, more DL models should be applied in this field. To confirm our findings, further meta-analyses (e.g., Bayesian network meta-analyses) and thorough research with larger numbers of patients are encouraged.
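The per-algorithm results above are odds ratios with 95% confidence intervals. A network meta-analysis pools such estimates across studies with a random-effects model, which is beyond a short example; the sketch below shows only how a single study's odds ratio and its Wald (Woolf) confidence interval on the log scale are obtained from a 2x2 table. The table layout and counts are hypothetical assumptions.

```python
import math
from typing import Tuple


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.959963984540054) -> Tuple[float, float, float]:
    """Odds ratio and Wald (Woolf) CI for the 2x2 table [[a, b], [c, d]].

    Rows: exposed / unexposed (e.g. predicted positive / negative);
    columns: outcome present / absent. The default z gives a 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (no continuity correction applied)")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


# Hypothetical counts, not taken from any included study.
print(odds_ratio_ci(20, 10, 10, 20))
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself, as in the reported intervals such as 25.0801 [11.4824; 54.7803].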
34
Sengupta D, Ali SN, Bhattacharya A, Mustafi J, Mukhopadhyay A, Sengupta K. A deep hybrid learning pipeline for accurate diagnosis of ovarian cancer based on nuclear morphology. PLoS One 2022; 17:e0261181. [PMID: 34995293 PMCID: PMC8741040 DOI: 10.1371/journal.pone.0261181] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/06/2021] [Accepted: 11/24/2021] [Indexed: 12/31/2022]
Abstract
Nuclear morphological features are potent determining factors in the clinical diagnostic approaches that pathologists use to analyze the malignant potential of cancer cells. Given the structural alteration of the nucleus in cancer cells, various groups have developed machine learning techniques based on variation in nuclear morphometric information such as nuclear shape, size and nucleus-cytoplasm ratio, and various non-parametric methods such as deep learning have also been tested for analyzing immunohistochemistry images of tissue samples to diagnose various cancers. We aim to correlate the morphometric features of the nucleus, along with the distribution of nuclear lamin proteins, with classical machine learning to differentiate between normal and ovarian cancer tissues. It has already been shown that in ovarian cancer, the extent of alteration in nuclear shape and morphology can modulate genetic changes and thus can be used to predict the outcome of low- to high-grade serous carcinoma. In this work, we performed exhaustive imaging of ovarian cancer versus normal tissue and developed a dual-pipeline architecture that combines the matrices of morphometric parameters with deep learning techniques for automatic feature extraction from pre-processed images. This novel Deep Hybrid Learning model, derived from classical machine learning algorithms and a standard CNN, showed a training and validation AUC score of 0.99, whereas the test AUC score was 1.00. The improved feature engineering enabled us to successfully differentiate between cancerous and non-cancerous samples in this pilot study.
Affiliation(s)
- Duhita Sengupta
- Biophysics and Structural Genomics Division, Saha Institute of Nuclear Physics, Kolkata, West Bengal, India
- Homi Bhabha National Institute, Mumbai, India
- Sk Nishan Ali
- Artificial Intelligence and Machine Learning Division, MUST Research Trust, Hyderabad, Telangana, India
- Aditya Bhattacharya
- Artificial Intelligence and Machine Learning Division, MUST Research Trust, Hyderabad, Telangana, India
- Joy Mustafi
- Artificial Intelligence and Machine Learning Division, MUST Research Trust, Hyderabad, Telangana, India
- Asima Mukhopadhyay
- Chittaranjan National Cancer Institute, Newtown, Kolkata, West Bengal, India
- Kaushik Sengupta
- Biophysics and Structural Genomics Division, Saha Institute of Nuclear Physics, Kolkata, West Bengal, India
- * E-mail:
| |
Collapse
|
35
|
da Silva Neto SR, Tabosa Oliveira T, Teixeira IV, Aguiar de Oliveira SB, Souza Sampaio V, Lynn T, Endo PT. Machine learning and deep learning techniques to support clinical diagnosis of arboviral diseases: A systematic review. PLoS Negl Trop Dis 2022; 16:e0010061. [PMID: 35025860 PMCID: PMC8791518 DOI: 10.1371/journal.pntd.0010061] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/30/2021] [Revised: 01/26/2022] [Accepted: 12/06/2021] [Indexed: 12/29/2022]
Abstract
BACKGROUND Neglected tropical diseases (NTDs) primarily affect the poorest populations, often living in remote, rural areas, urban slums or conflict zones. Arboviruses are a significant NTD category spread by mosquitoes. Dengue, Chikungunya, and Zika are three arboviruses that affect a large proportion of the population in Latin and South America. The clinical diagnosis of these arboviral diseases is a difficult task due to the concurrent circulation of several arboviruses which present similar symptoms, inaccurate serologic tests resulting from cross-reaction and co-infection with other arboviruses. OBJECTIVE The goal of this paper is to present evidence on the state of the art of studies investigating the automatic classification of arboviral diseases to support clinical diagnosis based on Machine Learning (ML) and Deep Learning (DL) models. METHOD We carried out a Systematic Literature Review (SLR) in which Google Scholar was searched to identify key papers on the topic. From an initial 963 records (956 from string-based search and seven from a single backward snowballing procedure), only 15 relevant papers were identified. RESULTS Results show that current research is focused on the binary classification of Dengue, primarily using tree-based ML algorithms. Only one paper was identified using DL. Five papers presented solutions for multi-class problems, covering Dengue (and its variants) and Chikungunya. No papers were identified that investigated models to differentiate between Dengue, Chikungunya, and Zika. CONCLUSIONS The use of an efficient clinical decision support system for arboviral diseases can improve the quality of the entire clinical process, thus increasing the accuracy of the diagnosis and the associated treatment. It should help physicians in their decision-making process and, consequently, improve the use of resources and the patient's quality of life.
Affiliation(s)
- Vanderson Souza Sampaio
- Universidade do Estado do Amazonas, Manaus, Brazil
- Fundação de Medicina Tropical Dr. Heitor Vieira Dourado, Manaus, Brazil
- Theo Lynn
- Dublin City University, Dublin, Ireland

36
Strömer A, Staerk C, Klein N, Weinhold L, Titze S, Mayr A. Deselection of base-learners for statistical boosting-with an application to distributional regression. Stat Methods Med Res 2021; 31:207-224. [PMID: 34882438 DOI: 10.1177/09622802211051088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/23/2022]
Abstract
We present a new procedure for enhanced variable selection in component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning and allows regression models to be fitted in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models tend to include too many variables in some situations. This occurs particularly for low-dimensional data (p < n), where we observe a slow overfitting behavior of boosting. As a result, more variables are included in the final model without altering the prediction accuracy. Many of these false positives are incorporated with a small coefficient and therefore have a small impact, but they lead to a larger model. We try to overcome this issue by giving the algorithm the chance to deselect base-learners of minor importance. We analyze the impact of the new approach on variable selection and prediction performance in comparison with alternative methods, including boosting with earlier stopping as well as twin boosting. We illustrate our approach with data from an ongoing cohort study of chronic kidney disease patients, where the most influential predictors of a health-related quality-of-life measure are selected in a distributional regression approach based on beta regression.
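The general deselection idea described above can be illustrated with a small NumPy sketch. The sketch rests on three assumptions:

- the loss is squared error (L2);
- each column has one simple linear base-learner;
- a base-learner's importance is its share of the total reduction in empirical risk.

The threshold `tau`, the exact deselection rule and all function names are hypothetical. This is an illustration, not the authors' implementation.

```python
import numpy as np


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1, allowed=None):
    """Component-wise L2 boosting with one simple linear base-learner per column.

    Columns of X are centred internally and y is centred by its mean (the offset).
    Each iteration fits every allowed column to the current residuals by least
    squares, keeps only the best-fitting one, and adds a step of size nu.
    Assumes no constant columns (their least-squares fit would divide by zero).
    Returns (offset, coefficients on centred X, risk reduction per column).
    """
    Xc = X - X.mean(axis=0)
    p = Xc.shape[1]
    allowed = np.arange(p) if allowed is None else np.asarray(allowed)
    offset = y.mean()
    resid = y - offset
    coef = np.zeros(p)
    risk_reduction = np.zeros(p)
    for _ in range(n_iter):
        betas = np.array([Xc[:, j] @ resid / (Xc[:, j] @ Xc[:, j]) for j in allowed])
        rss = np.array([np.sum((resid - b * Xc[:, j]) ** 2) for b, j in zip(betas, allowed)])
        k = int(np.argmin(rss))
        j = allowed[k]
        old_risk = np.mean(resid ** 2)
        resid = resid - nu * betas[k] * Xc[:, j]
        coef[j] += nu * betas[k]
        # Credit the drop in empirical risk to the selected column
        risk_reduction[j] += old_risk - np.mean(resid ** 2)
    return offset, coef, risk_reduction


def boost_with_deselection(X, y, tau=0.01, **kwargs):
    """Fit once, deselect columns whose share of total risk reduction is below
    tau (a hypothetical threshold rule), then refit using only the kept columns."""
    _, _, rr = componentwise_l2_boost(X, y, **kwargs)
    share = rr / rr.sum()
    keep = np.flatnonzero(share >= tau)
    offset, coef, _ = componentwise_l2_boost(X, y, allowed=keep, **kwargs)
    return offset, coef, keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
    offset, coef, keep = boost_with_deselection(X, y, tau=0.01)
    print("kept columns:", keep)
    print("coefficients:", np.round(coef, 2))
```

In the refit, deselected columns are never eligible, so their coefficients stay exactly zero. Columns that were picked only a few times late in the first run are dropped rather than kept with tiny coefficients.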
Affiliation(s)
- Annika Strömer
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany
- Christian Staerk
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany
- Nadja Klein
- Emmy Noether Research Group in Statistics and Data Science, Humboldt-Universität zu Berlin, Germany
- Leonie Weinhold
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany
- Stephanie Titze
- Department of Nephrology and Hypertension, FAU Erlangen-Nuremberg, Germany
- Andreas Mayr
- Department of Medical Biometrics, Informatics and Epidemiology, Faculty of Medicine, University of Bonn, Germany

37
Griesbach C, Groll A, Bergherr E. Joint Modelling Approaches to Survival Analysis via Likelihood-Based Boosting Techniques. Computational and Mathematical Methods in Medicine 2021; 2021:4384035. [PMID: 34819988 PMCID: PMC8608498 DOI: 10.1155/2021/4384035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2021] [Accepted: 10/15/2021] [Indexed: 12/04/2022]
Abstract
Joint models are a powerful class of statistical models which apply to any data where event times are recorded alongside a longitudinal outcome by connecting longitudinal and time-to-event data within a joint likelihood allowing for quantification of the association between the two outcomes without possible bias. In order to make joint models feasible for regularization and variable selection, a statistical boosting algorithm has been proposed, which fits joint models using component-wise gradient boosting techniques. However, these methods have well-known limitations, i.e., they provide no balanced updating procedure for random effects in longitudinal analysis and tend to return biased effect estimation for time-dependent covariates in survival analysis. In this manuscript, we adapt likelihood-based boosting techniques to the framework of joint models and propose a novel algorithm in order to improve inference where gradient boosting has said limitations. The algorithm represents a novel boosting approach allowing for time-dependent covariates in survival analysis and in addition offers variable selection for joint models, which is evaluated via simulations and real world application modelling CD4 cell counts of patients infected with human immunodeficiency virus (HIV). Overall, the method stands out with respect to variable selection properties and represents an accessible way to boosting for time-dependent covariates in survival analysis, which lays a foundation for all kinds of possible extensions.
Affiliation(s)
- Colin Griesbach
- Chair of Spatial Data Science and Statistical Learning, Georg August University, Germany
- Andreas Groll
- Department of Statistics, TU Dortmund University, Germany
- Elisabeth Bergherr
- Chair of Spatial Data Science and Statistical Learning, Georg August University, Germany

38
Mohammed A, Kora R. An effective ensemble deep learning framework for text classification. Journal of King Saud University - Computer and Information Sciences 2021. [DOI: 10.1016/j.jksuci.2021.11.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/17/2022]

39
Le J, Tian Y, Mendes J, Wilson B, Ibrahim M, DiBella E, Adluru G. Deep learning for radial SMS myocardial perfusion reconstruction using the 3D residual booster U-net. Magn Reson Imaging 2021; 83:178-188. [PMID: 34428512 PMCID: PMC8493758 DOI: 10.1016/j.mri.2021.08.007] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/18/2020] [Revised: 08/12/2021] [Accepted: 08/13/2021] [Indexed: 11/24/2022]
Abstract
PURPOSE To develop an end-to-end deep learning solution for quickly reconstructing radial simultaneous multi-slice (SMS) myocardial perfusion datasets with comparable quality to the pixel tracking spatiotemporal constrained reconstruction (PT-STCR) method. METHODS Dynamic contrast enhanced (DCE) radial SMS myocardial perfusion data were obtained from 20 subjects who were scanned at rest and/or stress with or without ECG gating using a saturation recovery radial CAIPI turboFLASH sequence. Input to the networks consisted of complex coil combined images reconstructed using the inverse Fourier transform of undersampled radial SMS k-space data. Ground truth images were reconstructed using the PT-STCR pipeline. The performance of the residual booster 3D U-Net was tested by comparing it to state-of-the-art network architectures including MoDL, CRNN-MRI, and other U-Net variants. RESULTS Results demonstrate significant improvements in speed requiring approximately 8 seconds to reconstruct one radial SMS dataset which is approximately 200 times faster than the PT-STCR method. Images reconstructed with the residual booster 3D U-Net retain quality of ground truth PT-STCR images (0.963 SSIM/40.238 PSNR/0.147 NRMSE). The residual booster 3D U-Net has superior performance compared to existing network architectures in terms of image quality, temporal dynamics, and reconstruction time. CONCLUSION Residual and booster learning combined with the 3D U-Net architecture was shown to be an effective network for reconstructing high-quality images from undersampled radial SMS datasets while bypassing the reconstruction time of the PT-STCR method.
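The abstract reports image quality as SSIM, PSNR and NRMSE. The two simpler metrics can be sketched in NumPy. Normalisation conventions differ between papers and toolkits, so this sketch makes two assumptions:

- the PSNR peak is the largest reference magnitude;
- the NRMSE is normalised by the reference norm.

Either choice may differ from what the authors used. SSIM is omitted because it depends on window and constant choices.

```python
import numpy as np


def psnr(reference, estimate, peak=None):
    """Peak signal-to-noise ratio in dB.

    By default the peak is the largest magnitude in the reference image; using
    the data range (max - min) instead is another common convention.
    Works for real or complex arrays (magnitudes of the errors are used).
    """
    reference = np.asarray(reference)
    estimate = np.asarray(estimate)
    mse = np.mean(np.abs(reference - estimate) ** 2)
    if peak is None:
        peak = np.max(np.abs(reference))
    return 10.0 * np.log10(peak ** 2 / mse)


def nrmse(reference, estimate):
    """RMSE normalised by the RMS of the reference,
    i.e. ||estimate - reference|| / ||reference|| (Frobenius/Euclidean norm)."""
    reference = np.asarray(reference)
    estimate = np.asarray(estimate)
    return np.linalg.norm(estimate - reference) / np.linalg.norm(reference)


if __name__ == "__main__":
    ref = np.ones((4, 4))
    est = ref + 0.1                   # uniform error of 0.1 on every pixel
    print(round(psnr(ref, est), 3))   # 20.0 (peak 1, MSE 0.01)
    print(round(nrmse(ref, est), 3))  # 0.1
```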
Affiliation(s)
- Johnathan Le
- Utah Center for Advanced Imaging Research (UCAIR), Department of Radiology and Imaging Sciences, University of Utah, Salt Lake City, UT, USA; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA
- Ye Tian
- Utah Center for Advanced Imaging Research (UCAIR), Department of Radiology and Imaging Sciences, University of Utah, Salt Lake City, UT, USA; Department of Physics and Astronomy, University of Utah, Salt Lake City, UT, USA; Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California, Los Angeles, CA, USA
- Jason Mendes
- Utah Center for Advanced Imaging Research (UCAIR), Department of Radiology and Imaging Sciences, University of Utah, Salt Lake City, UT, USA
- Brent Wilson
- Department of Cardiology, University of Utah, Salt Lake City, UT, USA
- Mark Ibrahim
- Department of Cardiology, University of Utah, Salt Lake City, UT, USA
- Edward DiBella
- Utah Center for Advanced Imaging Research (UCAIR), Department of Radiology and Imaging Sciences, University of Utah, Salt Lake City, UT, USA; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA
- Ganesh Adluru
- Utah Center for Advanced Imaging Research (UCAIR), Department of Radiology and Imaging Sciences, University of Utah, Salt Lake City, UT, USA; Department of Biomedical Engineering, University of Utah, Salt Lake City, UT, USA

40
Staerk C, Mayr A. Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction. BMC Bioinformatics 2021; 22:441. [PMID: 34530737 PMCID: PMC8447543 DOI: 10.1186/s12859-021-04340-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2021] [Accepted: 08/24/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Statistical boosting is a computational approach to select and estimate interpretable prediction models for high-dimensional biomedical data, leading to implicit regularization and variable selection when combined with early stopping. Traditionally, the set of base-learners is fixed for all iterations and consists of simple regression learners including only one predictor variable at a time. Furthermore, the number of iterations is typically tuned by optimizing the predictive performance, leading to models which often include unnecessarily large numbers of noise variables. RESULTS We propose three consecutive extensions of classical component-wise gradient boosting. In the first extension, called Subspace Boosting (SubBoost), base-learners can consist of several variables, allowing for multivariable updates in a single iteration. To compensate for the larger flexibility, the ultimate selection of base-learners is based on information criteria leading to an automatic stopping of the algorithm. As the second extension, Random Subspace Boosting (RSubBoost) additionally includes a random preselection of base-learners in each iteration, enabling the scalability to high-dimensional data. In a third extension, called Adaptive Subspace Boosting (AdaSubBoost), an adaptive random preselection of base-learners is considered, focusing on base-learners which have proven to be predictive in previous iterations. Simulation results show that the multivariable updates in the three subspace algorithms are particularly beneficial in cases of high correlations among signal covariates. In several biomedical applications the proposed algorithms tend to yield sparser models than classical statistical boosting, while showing a very competitive predictive performance also compared to penalized regression approaches like the (relaxed) lasso and the elastic net. 
CONCLUSIONS The proposed randomized boosting approaches with multivariable base-learners are promising extensions of statistical boosting, particularly suited for highly-correlated and sparse high-dimensional settings. The incorporated selection of base-learners via information criteria induces automatic stopping of the algorithms, promoting sparser and more interpretable prediction models.
Affiliation(s)
- Christian Staerk
- Department of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Andreas Mayr
- Department of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Venusberg-Campus 1, 53127, Bonn, Germany

41
Gupta A, Kulkarni M, Mukherjee A. Accurate prediction of B-form/A-form DNA conformation propensity from primary sequence: A machine learning and free energy handshake. Patterns 2021; 2:100329. [PMID: 34553171 PMCID: PMC8441556 DOI: 10.1016/j.patter.2021.100329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/01/2021] [Revised: 03/25/2021] [Accepted: 07/20/2021] [Indexed: 11/26/2022]
Abstract
DNA carries the genetic code of life, with different conformations associated with different biological functions. Predicting the conformation of DNA from its primary sequence, although desirable, is a challenging problem owing to the polymorphic nature of DNA. We have deployed a host of machine learning algorithms, including the popular state-of-the-art LightGBM (a gradient boosting model), for building prediction models. We used a nested cross-validation strategy to address the issues of overfitting and selection bias. This simultaneously provides an unbiased estimate of the generalization performance of a machine learning algorithm and allows us to tune the hyperparameters optimally. Furthermore, we built a secondary model based on SHAP (SHapley Additive exPlanations) that offers crucial insight into model interpretability. Our detailed model-building strategy and robust statistical validation protocols tackle the formidable challenge of working with small datasets, which is often the case with biological and medical data.
Highlights:
- A robust machine learning model to predict A- or B-DNA conformation
- The outcome of the machine learning model is explained with free energy values
- Our approach works well under class imbalance and limited data constraints

The sequence in the genome of an organism encodes all the information of life. We combine a data-driven approach using machine learning (ML) with the results of free energy calculations to offer a fresh perspective on the long-standing problem of predicting DNA conformation (A or B) from sequence. We trained our ML model using sophisticated state-of-the-art algorithms such as LightGBM along with a nested cross-validation strategy to overcome the common problems of data bias and overfitting when constrained by limited data size. Our study will serve the broader interest of researchers who are not only seeking accurate and reliable predictive models but also want to understand the physical and chemical origins behind the predictions.
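The nested cross-validation strategy mentioned above separates hyperparameter tuning (inner loop) from performance estimation (outer loop). The following is a minimal NumPy sketch. It swaps LightGBM for closed-form ridge regression so that it stays self-contained. The fold counts, candidate penalties and function names are illustrative assumptions, not the authors' protocol.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ beta, beta


def mse(model, X, y):
    intercept, beta = model
    return float(np.mean((y - intercept - X @ beta) ** 2))


def kfold(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k roughly equal folds."""
    return np.array_split(rng.permutation(n), k)


def cv_error(X, y, lam, folds):
    """Mean validation MSE of ridge(lam) over the given folds."""
    errs = []
    for val in folds:
        fit = np.setdiff1d(np.arange(len(y)), val)
        errs.append(mse(ridge_fit(X[fit], y[fit], lam), X[val], y[val]))
    return float(np.mean(errs))


def nested_cv(X, y, lambdas, outer_k=5, inner_k=4, seed=0):
    """Outer folds estimate generalisation error; inner folds choose lambda.

    Each outer test fold is held out from the lambda search entirely, so the
    returned error is not optimistically biased by the tuning step.
    """
    rng = np.random.default_rng(seed)
    outer_errors, chosen = [], []
    for test in kfold(len(y), outer_k, rng):
        train = np.setdiff1d(np.arange(len(y)), test)
        X_tr, y_tr = X[train], y[train]
        inner_folds = kfold(len(y_tr), inner_k, rng)
        best = min(lambdas, key=lambda lam: cv_error(X_tr, y_tr, lam, inner_folds))
        # Refit on the whole outer training split, then score once on the test fold
        outer_errors.append(mse(ridge_fit(X_tr, y_tr, best), X[test], y[test]))
        chosen.append(best)
    return float(np.mean(outer_errors)), chosen


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(120, 8))
    y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=120)
    err, lams = nested_cv(X, y, lambdas=[0.01, 0.1, 1.0, 10.0, 100.0])
    print(f"nested-CV MSE: {err:.3f}; lambda chosen per outer fold: {lams}")
```

Different outer folds may select different penalties. The outer error therefore describes the whole tuning procedure rather than a single fixed hyperparameter.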
Affiliation(s)
- Abhijit Gupta
- Department of Chemistry, Indian Institute of Science Education and Research, Pune, Maharashtra 411008, India
- Mandar Kulkarni
- Division of Biophysical Chemistry, Lund University, Chemical Center, P.O.B. 124, 22100 Lund, Sweden
- Arnab Mukherjee
- Department of Chemistry, Indian Institute of Science Education and Research, Pune, Maharashtra 411008, India

42

43
Sun R, Lerousseau M, Henry T, Carré A, Leroy A, Estienne T, Niyoteka S, Bockel S, Rouyar A, Alvarez Andres É, Benzazon N, Battistella E, Classe M, Robert C, Scoazec JY, Deutsch É. [Artificial intelligence, radiomics and pathomics to predict response and survival of patients treated with radiations]. Cancer Radiother 2021; 25:630-637. [PMID: 34284970 DOI: 10.1016/j.canrad.2021.06.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/08/2021] [Accepted: 06/19/2021] [Indexed: 12/24/2022]
Abstract
Artificial intelligence approaches are increasingly used in medicine and are extremely promising, owing to the growing amount of data produced and the variety of data they can exploit. In particular, the computational analysis of medical images, whether radiological (radiomics) or anatomopathological (pathomics), has yielded many promising results for predicting the prognosis and treatment response of cancer patients. Radiotherapy is a discipline that particularly benefits from these new approaches based on computer science and imaging. This review presents the main principles of artificial intelligence approaches, in particular machine learning, the principles of radiomic and pathomic approaches, and their potential for predicting the prognosis of patients treated with radiotherapy.
Affiliation(s)
- R Sun
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France; Département de radiothérapie, Gustave-Roussy Cancer Campus, 94800 Villejuif, France; Faculté de médecine, université Paris-Sud Paris-Saclay, 94270 Kremlin-Bicêtre, France
- M Lerousseau
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France
- T Henry
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France; Département de médecine nucléaire, Gustave-Roussy Cancer Campus, 94800 Villejuif, France
- A Carré
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France
- A Leroy
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France; TheraPanacea, Paris, France
- T Estienne
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France
- S Niyoteka
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France
- S Bockel
- Département de radiothérapie, Gustave-Roussy Cancer Campus, 94800 Villejuif, France; Faculté de médecine, université Paris-Sud Paris-Saclay, 94270 Kremlin-Bicêtre, France
- A Rouyar
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France
- É Alvarez Andres
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France; TheraPanacea, Paris, France
- N Benzazon
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France
- E Battistella
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France
- C Robert
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France; Département de radiothérapie, Gustave-Roussy Cancer Campus, 94800 Villejuif, France; Faculté de médecine, université Paris-Sud Paris-Saclay, 94270 Kremlin-Bicêtre, France
- J Y Scoazec
- Faculté de médecine, université Paris-Sud Paris-Saclay, 94270 Kremlin-Bicêtre, France; Département de biologie et pathologie médicales, Gustave-Roussy Cancer Campus, 94800 Villejuif, France
- É Deutsch
- Université Paris-Saclay, institut Gustave-Roussy, Inserm, Radiothérapie moléculaire et innovation thérapeutique, 94800 Villejuif, France; Département de radiothérapie, Gustave-Roussy Cancer Campus, 94800 Villejuif, France; Faculté de médecine, université Paris-Sud Paris-Saclay, 94270 Kremlin-Bicêtre, France

44
Griesbach C, Groll A, Bergherr E. Addressing cluster-constant covariates in mixed effects models via likelihood-based boosting techniques. PLoS One 2021; 16:e0254178. [PMID: 34242316 PMCID: PMC8270154 DOI: 10.1371/journal.pone.0254178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/29/2021] [Accepted: 06/22/2021] [Indexed: 11/18/2022]
Abstract
Boosting techniques from the field of statistical learning have become a popular tool for estimating and selecting predictor effects in various regression models and can roughly be separated into two general approaches, namely gradient boosting and likelihood-based boosting. An extensive framework has been proposed to fit generalized mixed models based on boosting; however, in the case of cluster-constant covariates, likelihood-based boosting approaches tend to choose the wrong variables in the selection step, leading to wrong estimates. We propose an improved boosting algorithm for linear mixed models, in which the random effects are properly weighted, disentangled from the fixed-effects updating scheme and corrected for correlations with cluster-constant covariates, in order to improve the quality of estimates and reduce the computational effort. The method outperforms current state-of-the-art approaches from boosting and maximum likelihood inference, as shown via simulations and various data examples.
Affiliation(s)
- Colin Griesbach
- Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
- Andreas Groll
- Faculty of Statistics, TU Dortmund, Dortmund, Germany
- Elisabeth Bergherr
- Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany

45
Tozzo V, Azencott CA, Fiorini S, Fava E, Trucco A, Barla A. Where Do We Stand in Regularization for Life Science Studies? J Comput Biol 2021; 29:213-232. [PMID: 33926217 PMCID: PMC8968832 DOI: 10.1089/cmb.2019.0371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
More and more biologists and bioinformaticians turn to machine learning to analyze large amounts of data. In this context, it is crucial to understand which is the most suitable data analysis pipeline for achieving reliable results. This process may be challenging, due to a variety of factors, the most crucial ones being the data type and the general goal of the analysis (e.g., explorative or predictive). Life science data sets require further consideration as they often contain measures with a low signal-to-noise ratio, high-dimensional observations, and relatively few samples. In this complex setting, regularization, which can be defined as the introduction of additional information to solve an ill-posed problem, is the tool of choice to obtain robust models. Different regularization practices may be used depending both on characteristics of the data and of the question asked, and different choices may lead to different results. In this article, we provide a comprehensive description of the impact and importance of regularization techniques in life science studies. In particular, we provide an intuition of what regularization is and of the different ways it can be implemented and exploited. We propose four general life sciences problems in which regularization is fundamental and should be exploited for robustness. For each of these large families of problems, we enumerate different techniques as well as examples and case studies. Lastly, we provide a unified view of how to approach each data type with various regularization techniques.
Affiliation(s)
- Veronica Tozzo
- Department of Informatics, Bioengineering, Robotics and System Engineering-DIBRIS, University of Genoa, Genoa, Italy
- Chloé-Agathe Azencott
- Centre for Computational Biology-CBIO, MINES ParisTech, PSL Research University, Paris, France; Institut Curie, PSL Research University, Paris, France; INSERM, U900, Paris, France
- Emanuele Fava
- Department of Electrical, Electronic, Telecommunications Engineering, and Naval Architecture (DITEN), University of Genoa, Genoa, Italy
- Andrea Trucco
- Department of Electrical, Electronic, Telecommunications Engineering, and Naval Architecture (DITEN), University of Genoa, Genoa, Italy
- Annalisa Barla
- Department of Informatics, Bioengineering, Robotics and System Engineering-DIBRIS, University of Genoa, Genoa, Italy

46
Tyralis H, Papacharalampous G. Boosting algorithms in energy research: a systematic review. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-05995-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 01/17/2023]

47
Rappl A, Mayr A, Waldmann E. More than one way: exploring the capabilities of different estimation approaches to joint models for longitudinal and time-to-event outcomes. Int J Biostat 2021; 18:127-149. [PMID: 33818032 DOI: 10.1515/ijb-2020-0067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2020] [Accepted: 03/12/2021] [Indexed: 11/15/2022]
Abstract
The development of physical functioning after a caesura in an aged population is still largely unexplored. Analysis of this topic requires modelling the longitudinal trajectories of physical functioning while simultaneously taking terminal events (deaths) into account. Separate analysis of both results in biased estimates, since it neglects the inherent connection between the two outcomes. Thus, this type of data-generating process is best modelled jointly. To facilitate this, several software applications have been made available. They differ in model formulation and estimation technique (likelihood-based, Bayesian inference, statistical boosting), and a comparison of the different approaches is necessary to identify their capabilities and limitations. Therefore, we compared the performance of the packages JM, joineRML, JMbayes and JMboost of the R software environment with respect to estimation accuracy, variable selection properties and prediction precision. With these findings, we then illustrate the topic of physical functioning after a caesura with data from the German Ageing Survey (DEAS). The results suggest that for smaller data sets and theory-driven modelling, likelihood-based methods (expectation maximization, JM, joineRML) or Bayesian inference (JMbayes) are preferable, whereas statistical boosting (JMboost) is a better choice for high-dimensional data and in data exploration settings.
Affiliation(s)
- Anja Rappl
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Institut für Medizininformatik, Biometrie und Epidemiologie, Waldstraße 6, Erlangen 91054, Germany
- Andreas Mayr
- Rheinische Friedrich-Wilhelms-Universität Bonn, Institut für Medizinische Biometrie, Informatik und Epidemiologie, Venusberg-Campus 1, Bonn 53127, Germany
- Elisabeth Waldmann
- Friedrich-Alexander-Universität Erlangen-Nürnberg, Institut für Medizininformatik, Biometrie und Epidemiologie, Waldstraße 6, Erlangen 91054, Germany

48
Explanation and Probabilistic Prediction of Hydrological Signatures with Statistical Boosting Algorithms. Remote Sensing 2021. [DOI: 10.3390/rs13030333] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
Hydrological signatures, i.e., statistical features of streamflow time series, are used to characterize the hydrology of a region. A relevant problem is the prediction of hydrological signatures in ungauged regions using the attributes obtained from remote sensing measurements at ungauged and gauged regions together with estimated hydrological signatures from gauged regions. The relevant framework is formulated as a regression problem, where the attributes are the predictor variables and the hydrological signatures are the dependent variables. Here we aim to provide probabilistic predictions of hydrological signatures using statistical boosting in a regression setting. We predict 12 hydrological signatures using 28 attributes in 667 basins in the contiguous US. We provide a formal assessment of probabilistic predictions using quantile scores. We also exploit the statistical boosting properties with respect to the interpretability of derived models. It is shown that probabilistic predictions at quantile levels 2.5% and 97.5% using linear models as base learners exhibit better performance than more flexible boosting models that use both linear models and stumps (i.e., one-level decision trees). In contrast, boosting models that use both linear models and stumps perform better than boosting with linear models when used for point predictions. Moreover, it is shown that climatic indices and topographic characteristics are the most important attributes for predicting hydrological signatures.
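The quantile scores mentioned above can be made concrete with the pinball loss, the standard scoring function for a single predicted quantile. The following is a minimal plain-Python sketch. The function name and the simple averaging over observations are assumptions, and the paper may scale or aggregate the score differently (some authors multiply it by 2).

```python
def quantile_score(y_obs, q_pred, alpha):
    """Mean quantile (pinball) loss of predictions q_pred for the alpha-quantile.

    For each residual u = y - q the loss is alpha * u when u >= 0 and
    (alpha - 1) * u when u < 0, so under-prediction of a high quantile and
    over-prediction of a low quantile are penalised most. Lower is better.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if len(y_obs) != len(q_pred) or len(y_obs) == 0:
        raise ValueError("need equally long, non-empty sequences")
    total = 0.0
    for y, q in zip(y_obs, q_pred):
        u = y - q
        total += alpha * u if u >= 0 else (alpha - 1.0) * u
    return total / len(y_obs)


if __name__ == "__main__":
    # Scoring a constant 97.5% quantile prediction of 2.0 against three observations
    print(quantile_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], alpha=0.975))  # about 0.333
```

At alpha = 0.5 the score reduces to half the mean absolute error. Evaluating the 2.5% and 97.5% levels separately therefore checks the two tails of a predictive interval.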
49
Griesbach C, Säfken B, Waldmann E. Gradient boosting for linear mixed models. Int J Biostat 2021; 17:317-329. [PMID: 34826371 DOI: 10.1515/ijb-2020-0136] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/15/2020] [Accepted: 12/07/2020] [Indexed: 11/15/2022]
Abstract
Gradient boosting from the field of statistical learning is widely known as a powerful framework for estimating and selecting predictor effects in various regression models, adapting concepts from classification theory. Current boosting approaches also offer methods that account for random effects and thus enable the fitting of mixed models for longitudinal and clustered data. However, these approaches have several flaws: on the one hand, unbalanced effect selection with falsely induced shrinkage and a low convergence rate; on the other hand, biased estimates of the random effects. We therefore propose a new boosting algorithm that explicitly accounts for the random structure by excluding it from the selection procedure, properly corrects the random-effects estimates, and additionally provides likelihood-based estimation of the random-effects variance structure. Simulations and data examples show that the new algorithm offers an organic and unbiased fitting approach.
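The abstract describes the algorithm only at a high level. The following is a minimal, simplified sketch of the general idea for a random-intercept model, assuming squared-error loss, single-covariate linear base learners, shrunken (BLUP-style) random-intercept updates that never compete in the selection step, and EM-style variance updates. It is not the authors' implementation: the function name `boost_random_intercept` is hypothetical, and the paper's correction of the random effects is replaced here by a simple centring step.

```python
import numpy as np


def boost_random_intercept(X, y, groups, n_iter=200, nu=0.1):
    """Component-wise L2 boosting with one random intercept per group.

    Simplified illustration, NOT the published algorithm:
    - fixed effects: one covariate is selected and updated per iteration;
    - random intercepts: refitted every iteration, outside the selection;
    - variance components: EM-style updates of sigma^2 and tau^2.
    Assumes no column of X is identically zero.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    levels, g = np.unique(np.asarray(groups), return_inverse=True)  # g: group index per row
    n_i = np.bincount(g).astype(float)                               # group sizes
    intercept = y.mean()
    beta = np.zeros(p)
    b = np.zeros(len(levels))
    sigma2, tau2 = y.var(), y.var()
    col_ss = (X ** 2).sum(axis=0)                                    # x_j' x_j per column

    for _ in range(n_iter):
        # 1) Fixed effects: fit every single-covariate learner to the residual,
        #    then update only the best-fitting one by a step of size nu.
        r = y - intercept - X @ beta - b[g]
        coef = (X.T @ r) / col_ss
        rss = ((r[:, None] - X * coef) ** 2).sum(axis=0)
        j = np.argmin(rss)
        beta[j] += nu * coef[j]

        # 2) Random intercepts: shrunken group sums with lambda = sigma^2 / tau^2.
        r_fixed = y - intercept - X @ beta
        lam = sigma2 / tau2
        b = np.bincount(g, weights=r_fixed) / (n_i + lam)
        intercept += b.mean()  # move any common shift into the fixed intercept
        b -= b.mean()          # so the random intercepts stay centred at zero

        # 3) Variance components: EM-style updates using Var(b_i | y).
        post_var = sigma2 / (n_i + lam)
        resid = y - intercept - X @ beta - b[g]
        sigma2 = (resid @ resid + (n_i * post_var).sum()) / n
        tau2 = max(np.mean(b ** 2 + post_var), 1e-8)

    return intercept, beta, b, sigma2, tau2
```

In this sketch the number of iterations regularises only the fixed effects, because the random intercepts are refitted in full at every step instead of competing with the covariates for selection.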
Affiliation(s)
- Colin Griesbach
- Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
- Benjamin Säfken
- Chair of Statistics, Georg-August-Universität Göttingen, Göttingen, Germany
- Elisabeth Waldmann
- Department of Medical Informatics, Biometry and Epidemiology, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
50
Wu L, Han L, Li Q, Wang G, Zhang H, Li L. Using Interactome Big Data to Crack Genetic Mysteries and Enhance Future Crop Breeding. MOLECULAR PLANT 2021; 14:77-94. [PMID: 33340690 DOI: 10.1016/j.molp.2020.12.012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/14/2020] [Revised: 12/11/2020] [Accepted: 12/14/2020] [Indexed: 05/27/2023]
Abstract
The functional genes underlying phenotypic variation and their interactions represent "genetic mysteries". Understanding and utilizing these genetic mysteries is key to mitigating the current threats to agriculture posed by population growth and individual food preferences. Owing to advances in high-throughput multi-omics technologies, we are entering an Interactome Big Data era that is certain to revolutionize genetic research. In this article, we provide a brief overview of current strategies for exploring genetic mysteries. We then introduce methods for constructing and analyzing Interactome Big Data and summarize currently available interactome resources. Next, we discuss how Interactome Big Data can be used as a versatile tool to dissect genetic mysteries. We propose an integrated strategy that could revolutionize genetic research by combining Interactome Big Data with machine learning, which mines information hidden in Big Data to identify the genetic models or networks that control various traits, and we also provide a detailed procedure for the systematic dissection of genetic mysteries. Finally, we discuss three promising future breeding strategies that use Interactome Big Data to improve crop yield and quality.
Affiliation(s)
- Leiming Wu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Linqian Han
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Qing Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
- Guoying Wang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Hongwei Zhang
- Institute of Crop Science, Chinese Academy of Agricultural Sciences, Beijing 100081, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China