201
Farook TH, Haq TM, Ramees L, Dudley J. Deep learning and predictive modelling for generating normalised muscle function parameters from signal images of mandibular electromyography. Med Biol Eng Comput 2024; 62:1763-1779. [PMID: 38376739 PMCID: PMC11076382 DOI: 10.1007/s11517-024-03047-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Accepted: 02/06/2024] [Indexed: 02/21/2024]
Abstract
Challenges arise in accessing archived signal outputs due to proprietary software limitations. There is a notable lack of exploration in open-source mandibular EMG signal conversion for continuous access and analysis, hindering tasks such as pattern recognition and predictive modelling for temporomandibular joint complex function. This study aimed to develop a workflow to extract normalised signal parameters from images of mandibular muscle EMG and to identify optimal clustering methods for quantifying signal intensity and activity durations. A workflow utilising OpenCV, variational encoders and Neurokit2 generated and augmented 866 unique EMG signals from jaw movement exercises. k-means, GMM and DBSCAN were employed for normalisation and cluster-centric signal processing. The workflow was validated with data collected from 66 participants, measuring temporalis, masseter and digastric muscles. DBSCAN (0.35 to 0.54) and GMM (0.09 to 0.24) exhibited lower silhouette scores for mouth opening, anterior protrusion and lateral excursions, while k-means performed best (0.10 to 0.11) for temporalis and masseter muscles during chewing activities. The study successfully developed a deep learning workflow capable of extracting normalised signal data from EMG images and generating quantifiable parameters for muscle activity duration and general functional intensity.
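As a rough illustration of the silhouette score used in this abstract to compare clustering methods, the sketch below computes the mean silhouette coefficient for a labelled set of points. It assumes Euclidean distance and toy 2-D points; the function name `silhouette_score` and the data are hypothetical and are not taken from the study's workflow.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good={good:.2f} bad={bad:.2f}")
```

Scores lie between -1 and 1; values near 1 indicate compact, well-separated clusters, while values near or below 0 indicate overlapping or misassigned clusters.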
Affiliation(s)
- Taseef Hasan Farook
- Adelaide Dental School, The University of Adelaide, Adelaide, SA, 5000, Australia.
- Lameesa Ramees
- Adelaide Dental School, The University of Adelaide, Adelaide, SA, 5000, Australia
- James Dudley
- Adelaide Dental School, The University of Adelaide, Adelaide, SA, 5000, Australia
202
Yamkovoy K, Patil P, Dunn D, Erdman E, Bernson D, Swathi PA, Nall SK, Zhang Y, Wang J, Brinkley-Rubinstein L, LeMasters KH, White LF, Barocas JA. Using decision tree models and comprehensive statewide data to predict opioid overdoses following prison release. Ann Epidemiol 2024; 94:81-90. [PMID: 38710239 PMCID: PMC11117432 DOI: 10.1016/j.annepidem.2024.04.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 04/28/2024] [Accepted: 04/29/2024] [Indexed: 05/08/2024]
Abstract
PURPOSE Identifying predictors of opioid overdose following release from prison is critical for opioid overdose prevention. METHODS We leveraged an individually linked, state-wide database from 2015-2020 to predict the risk of opioid overdose within 90 days of release from Massachusetts state prisons. We developed two decision tree modeling schemes: a model fit on all individuals with a single weight for those that experienced an opioid overdose and models stratified by race/ethnicity. We compared the performance of each model using several performance measures and identified factors that were most predictive of opioid overdose within racial/ethnic groups and across models. RESULTS We found that out of 44,246 prison releases in Massachusetts between 2015-2020, 2237 (5.1%) resulted in opioid overdose in the 90 days following release. The performance of the two predictive models varied. The single weight model had high sensitivity (79%) and low specificity (56%) for predicting opioid overdose and was more sensitive for White non-Hispanic individuals (sensitivity = 84%) than for racial/ethnic minority individuals. CONCLUSIONS Stratified models had better balanced performance metrics for both White non-Hispanic and racial/ethnic minority groups and identified different predictors of overdose between racial/ethnic groups. Across racial/ethnic groups and models, involuntary commitment (involuntary treatment for alcohol/substance use disorder) was an important predictor of opioid overdose.
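The sensitivity and specificity figures in this abstract, and the comparison between groups, can be illustrated with a minimal sketch. The labels, predictions and group names below are hypothetical toy values, and the helper names `sensitivity_specificity` and `by_group` are assumptions for illustration; none of this reproduces the study's models or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

def by_group(y_true, y_pred, groups):
    """Compute the same two metrics separately within each group."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, x in enumerate(groups) if x == g]
        result[g] = sensitivity_specificity(
            [y_true[i] for i in idx], [y_pred[i] for i in idx]
        )
    return result

# Hypothetical outcomes (1 = event) and predictions for two groups.
groups = ["A"] * 5 + ["B"] * 5
y_true = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

overall = sensitivity_specificity(y_true, y_pred)
per_group = by_group(y_true, y_pred, groups)
print(overall, per_group)
```

Reporting the metrics per group, as in the toy example, makes it visible when a single model is more sensitive for one group than for another.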
Affiliation(s)
- Kristina Yamkovoy
- University of Colorado School of Medicine, Division of General Internal Medicine, Aurora, CO, USA
- Prasad Patil
- Boston University School of Public Health, Boston, MA, USA
- Devon Dunn
- Massachusetts Department of Public Health, Boston, MA, USA
- Dana Bernson
- Massachusetts Department of Public Health, Boston, MA, USA
- Pallavi Aytha Swathi
- University of Colorado School of Medicine, Division of General Internal Medicine, Aurora, CO, USA
- Samantha K Nall
- University of Colorado School of Medicine, Division of General Internal Medicine, Aurora, CO, USA
- Yanjia Zhang
- Boston University School of Public Health, Boston, MA, USA
- Katherine H LeMasters
- University of Colorado School of Medicine, Division of General Internal Medicine, Aurora, CO, USA
- Laura F White
- Boston University School of Public Health, Boston, MA, USA
- Joshua A Barocas
- University of Colorado School of Medicine, Division of General Internal Medicine, Aurora, CO, USA; University of Colorado School of Medicine, Division of Infectious Diseases, Aurora, CO, USA.
203
Fusaroli M, Salvo F, Begaud B, AlShammari TM, Bate A, Battini V, Brueckner A, Candore G, Carnovale C, Crisafulli S, Cutroneo PM, Dolladille C, Drici MD, Faillie JL, Goldman A, Hauben M, Herdeiro MT, Mahaux O, Manlik K, Montastruc F, Noguchi Y, Norén GN, Noseda R, Onakpoya IJ, Pariente A, Poluzzi E, Salem M, Sartori D, Trinh NTH, Tuccori M, van Hunsel F, van Puijenbroek E, Raschi E, Khouri C. The REporting of A Disproportionality Analysis for DrUg Safety Signal Detection Using Individual Case Safety Reports in PharmacoVigilance (READUS-PV): Explanation and Elaboration. Drug Saf 2024; 47:585-599. [PMID: 38713347 PMCID: PMC11116264 DOI: 10.1007/s40264-024-01423-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Accepted: 03/07/2024] [Indexed: 05/08/2024]
Abstract
In pharmacovigilance, disproportionality analyses based on individual case safety reports are widely used to detect safety signals. Unfortunately, publishing disproportionality analyses lacks specific guidelines, often leading to incomplete and ambiguous reporting, and carries the risk of incorrect conclusions when data are not placed in the correct context. The REporting of A Disproportionality analysis for drUg Safety signal detection using individual case safety reports in PharmacoVigilance (READUS-PV) statement was developed to address this issue by promoting transparent and comprehensive reporting of disproportionality studies. While the statement paper explains in greater detail the procedure followed to develop these guidelines, with this explanation paper we present the 14 items retained for READUS-PV guidelines, together with an in-depth explanation of their rationale and bullet points to illustrate their practical implementation. Our primary objective is to foster the adoption of the READUS-PV guidelines among authors, editors, peer reviewers, and readers of disproportionality analyses. Enhancing transparency, completeness, and accuracy of reporting, as well as proper interpretation of their results, READUS-PV guidelines will ultimately facilitate evidence-based decision making in pharmacovigilance.
Affiliation(s)
- Michele Fusaroli
- Department of Medical and Surgical Sciences, Alma Mater Studiorum, University of Bologna, Bologna, Italy.
- Francesco Salvo
- Université de Bordeaux, INSERM, BPH, Team AHeaD, U1219, 33000, Bordeaux, France.
- Service de Pharmacologie Médicale, CHU de Bordeaux, INSERM, U1219, 33000, Bordeaux, France.
- Bernard Begaud
- Université de Bordeaux, INSERM, BPH, Team AHeaD, U1219, 33000, Bordeaux, France
- Andrew Bate
- Global Safety, GSK, Brentford, UK
- Department of Non-Communicable Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
- Vera Battini
- Pharmacovigilance and Clinical Research, International Centre for Pesticides and Health Risk Prevention, Department of Biomedical and Clinical Sciences (DIBIC), ASST Fatebenefratelli-Sacco University Hospital, Università degli Studi di Milano, Milan, Italy
- Carla Carnovale
- Pharmacovigilance and Clinical Research, International Centre for Pesticides and Health Risk Prevention, Department of Biomedical and Clinical Sciences (DIBIC), ASST Fatebenefratelli-Sacco University Hospital, Università degli Studi di Milano, Milan, Italy
- Paola Maria Cutroneo
- Unit of Clinical Pharmacology, Sicily Pharmacovigilance Regional Centre, University Hospital of Messina, Messina, Italy
- Charles Dolladille
- UNICAEN, EA4650 SEILIRM, CHU de Caen Normandie, Normandie University, Caen, France
- Department of Pharmacology, CHU de Caen Normandie, Caen, France
- Milou-Daniel Drici
- Department of Clinical Pharmacology, Université Côte d'Azur Medical Center, Nice, France
- Jean-Luc Faillie
- Desbrest Institute of Epidemiology and Public Health, Department of Medical Pharmacology and Toxicology, INSERM, Univ Montpellier, Regional Pharmacovigilance Centre, CHU Montpellier, Montpellier, France
- Adam Goldman
- Department of Internal Medicine, Sheba Medical Center, Ramat-Gan, Israel
- Department of Epidemiology and Preventive Medicine, School of Public Health, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
- Manfred Hauben
- Pfizer Inc, New York, NY, USA
- Department of Family and Community Medicine, New York Medical College, Valhalla, New York, USA
- Maria Teresa Herdeiro
- Department of Medical Sciences, IBIMED-Institute of Biomedicine, University of Aveiro, 3810-193, Aveiro, Portugal
- Katrin Manlik
- Medical Affairs and Pharmacovigilance, Bayer AG, Berlin, Germany
- François Montastruc
- Department of Medical and Clinical Pharmacology, Centre of PharmacoVigilance and Pharmacoepidemiology, Faculty of Medicine, Toulouse University Hospital (CHU), Toulouse, France
- CIC 1436, Team PEPSS (Pharmacologie En Population cohorteS et biobanqueS), Toulouse University Hospital, Toulouse, France
- Yoshihiro Noguchi
- Laboratory of Clinical Pharmacy, Gifu Pharmaceutical University, Gifu, Japan
- Roberta Noseda
- Institute of Pharmacological Sciences of Southern Switzerland, Division of Clinical Pharmacology and Toxicology, Ente Ospedaliero Cantonale, Lugano, Switzerland
- Igho J Onakpoya
- Department for Continuing Education, University of Oxford, Oxford, UK
- Antoine Pariente
- Université de Bordeaux, INSERM, BPH, Team AHeaD, U1219, 33000, Bordeaux, France
- Service de Pharmacologie Médicale, CHU de Bordeaux, INSERM, U1219, 33000, Bordeaux, France
- Elisabetta Poluzzi
- Department of Medical and Surgical Sciences, Alma Mater Studiorum, University of Bologna, Bologna, Italy
- Daniele Sartori
- Uppsala Monitoring Centre, Uppsala, Sweden
- Centre for Evidence-Based Medicine, Nuffield, Department of Primary Care Health Sciences, University of Oxford, Oxford, UK
- Nhung T H Trinh
- PharmacoEpidemiology and Drug Safety Research Group, Department of Pharmacy, University of Oslo, Oslo, Norway
- Marco Tuccori
- Tuscany Regional Centre, Unit of Adverse Drug Reaction Monitoring, University Hospital of Pisa, Pisa, Italy
- Florence van Hunsel
- Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, The Netherlands
- PharmacoTherapy, Epidemiology and Economics, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands
- Eugène van Puijenbroek
- Netherlands Pharmacovigilance Centre Lareb, 's-Hertogenbosch, The Netherlands
- PharmacoTherapy, Epidemiology and Economics, University of Groningen, Groningen Research Institute of Pharmacy, Groningen, The Netherlands
- Emanuel Raschi
- Department of Medical and Surgical Sciences, Alma Mater Studiorum, University of Bologna, Bologna, Italy
- Charles Khouri
- Pharmacovigilance Department, Université Grenoble Alpes, Grenoble Alpes University Hospital, Grenoble, France
- UMR 1300-HP2 Laboratory, Université Grenoble Alpes, INSERM, Grenoble Alpes University, Grenoble, France
204
Wang J, Ma J, Zhou Z, Xie X, Zhang H, Wu Y, Qu H. TacPrint: Visualizing the Biomechanical Fingerprint in Table Tennis. IEEE Transactions on Visualization and Computer Graphics 2024; 30:2955-2967. [PMID: 38619948 DOI: 10.1109/tvcg.2024.3388555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
Table tennis is a sport that demands high levels of technical proficiency and body coordination from players. Biomechanical fingerprints can provide valuable insights into players' habitual movement patterns and characteristics, allowing them to identify and improve technical weaknesses. Despite the potential, few studies have developed effective methods for generating such fingerprints. To address this gap, we propose TacPrint, a framework for generating a biomechanical fingerprint for each player. TacPrint leverages machine learning techniques to extract comprehensive features from biomechanics data collected by inertial measurement units (IMU) and employs the attention mechanism to enhance model interpretability. After generating fingerprints, TacPrint provides a visualization system to facilitate the exploration and investigation of these fingerprints. In order to validate the effectiveness of the framework, we designed an experiment to evaluate the model's performance and conducted a case study with the system. The results of our experiment demonstrated the high accuracy and effectiveness of the model. Additionally, we discussed the potential of TacPrint to be extended to other sports.
205
Pazo M, Gerassis S, Araújo M, Margarida Antunes I, Rigueira X. Enhancing water quality prediction for fluctuating missing data scenarios: A dynamic Bayesian network-based processing system to monitor cyanobacteria proliferation. The Science of the Total Environment 2024; 927:172340. [PMID: 38608909 DOI: 10.1016/j.scitotenv.2024.172340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Revised: 04/05/2024] [Accepted: 04/07/2024] [Indexed: 04/14/2024]
Abstract
Tackling the impact of missing data in water management is crucial to ensure the reliability of scientific research that informs decision-making processes in public health. The goal of this study is to ascertain the root causes associated with cyanobacteria proliferation under major missing data scenarios. For this purpose, a dynamic missing data management methodology is proposed using Bayesian Machine Learning for accurate surface water quality prediction of a river from Limia basin (Spain). The methodology used entails a sequence of analytical steps, starting with data pre-processing, followed by the selection of a reliable dynamic Bayesian missing value prediction system, leading finally to a supervised analysis of the behavioral patterns exhibited by cyanobacteria. For that, a total of 2,118,844 data points were used, with 205,316 (9.69 %) missing values identified. The machine learning testing showed the iterative structural expectation maximization (SEM) as the best performing algorithm, above the dynamic imputation (DI) and entropy-based dynamic imputation methods (EBDI), enhancing in some cases the accuracy of imputations by approximately 50 % in R2, RMSE, NRMSE, and logarithmic loss values. These findings can impact how data on water quality is being processed and studied, thus, opening the door for more reliable water management strategies that better inform public health decisions.
Affiliation(s)
- M Pazo
- CINTECX, Universidade de Vigo, Grupo de Xestión Segura e Sostible de Recursos Minerais, Dpto. De Enxeñaría dos Recursos Naturais e Medio Ambiente, 36310 Vigo, Spain.
- S Gerassis
- CINTECX, Universidade de Vigo, Grupo de Xestión Segura e Sostible de Recursos Minerais, Dpto. De Enxeñaría dos Recursos Naturais e Medio Ambiente, 36310 Vigo, Spain
- M Araújo
- CINTECX, Universidade de Vigo, Grupo de Xestión Segura e Sostible de Recursos Minerais, Dpto. De Enxeñaría dos Recursos Naturais e Medio Ambiente, 36310 Vigo, Spain
- I Margarida Antunes
- Institute of Earth Sciences (ICT), Pole of University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
- X Rigueira
- CINTECX, Universidade de Vigo, Grupo de Xestión Segura e Sostible de Recursos Minerais, Dpto. De Enxeñaría dos Recursos Naturais e Medio Ambiente, 36310 Vigo, Spain
206
Tan HS, Wang K, Mcbeth R. Exploring UMAP in hybrid models of entropy-based and representativeness sampling for active learning in biomedical segmentation. Comput Biol Med 2024; 176:108605. [PMID: 38772054 DOI: 10.1016/j.compbiomed.2024.108605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2023] [Revised: 02/18/2024] [Accepted: 05/11/2024] [Indexed: 05/23/2024]
Abstract
In this work, we study various hybrid models of entropy-based and representativeness sampling techniques in the context of active learning in medical segmentation, in particular examining the role of UMAP (Uniform Manifold Approximation and Projection) as a technique for capturing representativeness. Although UMAP has been shown to be viable as a general-purpose dimension reduction method in diverse areas, its role in deep learning-based medical segmentation has yet to be extensively explored. Using the cardiac and prostate datasets in the Medical Segmentation Decathlon for validation, we found that a novel hybrid Entropy-UMAP sampling technique achieved a statistically significant Dice score advantage over the random baseline (3.2% for cardiac, 4.5% for prostate), and attained the highest Dice coefficient among the spectrum of 10 distinct active learning methodologies we examined. This provides preliminary evidence that there is an interesting synergy between entropy-based and UMAP methods when the former precedes the latter in a hybrid model of active learning.
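Two quantities named in this abstract lend themselves to a short sketch: the Dice coefficient used for evaluation, and an entropy score of the kind used to rank unlabelled images by model uncertainty. The code assumes binary masks and per-pixel foreground probabilities given as flat lists; the function names and toy values are hypothetical, and the UMAP-based representativeness step is not shown.

```python
from math import log2

def dice_coefficient(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * intersection / total

def mean_binary_entropy(probs):
    """Average per-pixel entropy (in bits) of foreground probabilities."""
    def h(p):
        if p <= 0.0 or p >= 1.0:
            return 0.0  # a fully confident pixel carries no uncertainty
        return -(p * log2(p) + (1 - p) * log2(1 - p))
    return sum(h(p) for p in probs) / len(probs)

# Hypothetical masks: 3 foreground pixels each, 2 of them shared.
truth = [0, 1, 1, 1, 0, 0]
pred = [0, 1, 1, 0, 1, 0]
dice = dice_coefficient(pred, truth)

# Hypothetical unlabelled images, ranked by mean predictive entropy.
candidates = {"confident": [0.99, 0.01, 0.98], "uncertain": [0.5, 0.6, 0.4]}
ranked = sorted(candidates, key=lambda k: mean_binary_entropy(candidates[k]),
                reverse=True)
print(dice, ranked)
```

In an entropy-based active learning loop, the highest-ranked images would be the first candidates sent for annotation.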
Affiliation(s)
- Hai Siong Tan
- University of Pennsylvania, Perelman School of Medicine, Department of Radiation Oncology, Philadelphia, USA.
- Rafe Mcbeth
- University of Pennsylvania, Perelman School of Medicine, Department of Radiation Oncology, Philadelphia, USA
207
Adelson RP, Garikipati A, Zhou Y, Ciobanu M, Tawara K, Barnes G, Singh NP, Mao Q, Das R. Machine Learning Approach with Harmonized Multinational Datasets for Enhanced Prediction of Hypothyroidism in Patients with Type 2 Diabetes. Diagnostics (Basel) 2024; 14:1152. [PMID: 38893680 PMCID: PMC11172278 DOI: 10.3390/diagnostics14111152] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Revised: 05/24/2024] [Accepted: 05/29/2024] [Indexed: 06/21/2024] Open
Abstract
Type 2 diabetes (T2D) is a global health concern with increasing prevalence. Comorbid hypothyroidism (HT) exacerbates kidney, cardiac, neurological and other complications of T2D; these risks can be mitigated pharmacologically upon detecting HT. The current HT standard of care (SOC) screening in T2D is infrequent, delaying HT diagnosis and treatment. We present a first-to-date machine learning algorithm (MLA) clinical decision tool to classify patients as low vs. high risk for developing HT comorbid with T2D; the MLA was developed using readily available patient data from harmonized multinational datasets. The MLA was trained on data from NIH All of Us (AoU) and UK Biobank (UKBB) (Combined dataset) and achieved a high negative predictive value (NPV) of 0.989 and an AUROC of 0.762 in the Combined dataset, exceeding AUROCs for the models trained on AoU or UKBB alone (0.666 and 0.622, respectively), indicating that increasing dataset diversity for MLA training improves performance. This high-NPV automated tool can supplement SOC screening and rule out T2D patients with low HT risk, allowing for the prioritization of lab-based testing for at-risk patients. Conversely, an MLA output that designates a patient to be at risk of developing HT allows for tailored clinical management and thereby promotes improved patient outcomes.
Affiliation(s)
- Qingqing Mao
- Montera, Inc. dba Forta, 548 Market St, PMB 89605, San Francisco, CA 94104-5401, USA; (R.P.A.); (A.G.); (Y.Z.); (M.C.); (K.T.); (G.B.); (N.P.S.); (R.D.)
208
Akter S, Mustafa HA. Analysis and interpretability of machine learning models to classify thyroid disease. PLoS One 2024; 19:e0300670. [PMID: 38820460 PMCID: PMC11142566 DOI: 10.1371/journal.pone.0300670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Accepted: 03/01/2024] [Indexed: 06/02/2024] Open
Abstract
Thyroid disease classification plays a crucial role in early diagnosis and effective treatment of thyroid disorders. Machine learning (ML) techniques have demonstrated remarkable potential in this domain, offering accurate and efficient diagnostic tools. Most of the real-life datasets have imbalanced characteristics that hamper the overall performance of the classifiers. Existing data balancing techniques process the whole dataset at a time that sometimes causes overfitting and underfitting. However, the complexity of some ML models, often referred to as "black boxes," raises concerns about their interpretability and clinical applicability. This paper presents a comprehensive study focused on the analysis and interpretability of various ML models for classifying thyroid diseases. In our work, we first applied a new data-balancing mechanism using a clustering technique and then analyzed the performance of different ML algorithms. To address the interpretability challenge, we explored techniques for model explanation and feature importance analysis using eXplainable Artificial Intelligence (XAI) tools globally as well as locally. Finally, the XAI results are validated with the domain experts. Experimental results have shown that our proposed mechanism is efficient in diagnosing thyroid disease and can explain the models effectively. The findings can contribute to bridging the gap between adopting advanced ML techniques and the clinical requirements of transparency and accountability in diagnostic decision-making.
Affiliation(s)
- Sumya Akter
- Institute of Information and Communication Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Dinajpur, Bangladesh
- Hossen A. Mustafa
- Institute of Information and Communication Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
209
Darkhawaja R, Hänggi J, Bringolf-Isler B, Kayser B, Suggs LS, Kwiatkowski M, Probst-Hensch N. Weekend physical activity profiles and their relationship with quality of life: The SOPHYA cohort of Swiss children and adolescents. PLoS One 2024; 19:e0298890. [PMID: 38820541 PMCID: PMC11142694 DOI: 10.1371/journal.pone.0298890] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 01/31/2024] [Indexed: 06/02/2024] Open
Abstract
INTRODUCTION Quality of life (QoL) is an important health indicator among children and adolescents. Evidence on the effect of physical activity (PA)-related behaviors on QoL among youth remains inconsistent. Conventional accelerometer-derived PA metrics and guidelines with a focus on whole weeks may not adequately characterize QoL relevant PA behavior. OBJECTIVE This study aims to a) identify clusters of accelerometer-derived PA profiles during weekend days among children and adolescents living in Switzerland, b) assess their cross-sectional and predictive association with overall QoL and its dimensions, and c) investigate whether the associations of QoL with the newly identified clusters persist upon adjustment for the commonly used PA metrics moderate-to-vigorous physical activity (MVPA) and time spent in sedentary behavior (SB). METHODS The population-based Swiss children's Objectively measured PHYsical Activity (SOPHYA) cohort among children and adolescents aged 6 to 16 years was initiated at baseline in 2013. PA and QoL information was obtained twice over a five-year follow-up period. The primary endpoint is the overall QoL score and its six dimension scores obtained by KINDL® questionnaire. The primary predictor is the cluster membership of accelerometer-derived weekend PA profile. Clusters were obtained by applying the k-medoid algorithm to the distance matrix of profiles obtained by pairwise alignments of PA time series using the Dynamic Time Warping (DTW) algorithm. Secondary predictors are accelerometer-derived conventional PA metrics MVPA and SB from two combined weekend days. Linear regression models were applied to assess a) the cross-sectional association between PA cluster membership and QoL at baseline and b) the predictive association between PA cluster membership at baseline and QoL at follow-up, adjusting for baseline QoL. 
RESULTS The study sample for deriving PA profile clusters consisted of 51.4% girls and had an average age of 10.9 (SD 2.5) years. The elbow and silhouette methods indicated that weekend PA profiles are best classified in two or four clusters. The most differentiating characteristic for both the two-cluster classification ("lower activity" and "high activity") and the four-cluster classification ("inactive", "low activity", "medium activity", and "high activity") was the participant's mean counts per 15-second epoch. Participants assigned to high activity clusters were younger and more often male. Neither the clustered PA profiles nor MVPA or SB were cross-sectionally or predictively associated with overall QoL. The only association of a conventional PA metric with QoL while adjusting for cluster membership was observed between MVPA during the weekend days and social well-being, with a mean score difference of 2.4 (95%CI: 0.3 to 4.5; p = 0.025). CONCLUSION The absence of strong associations of PA metrics for the weekend with QoL, except for the positive association between MVPA during the weekend days and social well-being, is in line with results from two randomized studies not showing efficacy of PA interventions on youth QoL. But because PA decreases with age, its promotion and relevance to QoL remain important research topics. Larger longitudinal study samples with more than two follow-up time points of children and adolescents are needed to derive novel accelerometer-derived PA profiles and to associate them with QoL dimensions.
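The clustering step described in the METHODS of this abstract (pairwise Dynamic Time Warping distances followed by k-medoid clustering) can be sketched in part as follows. This is a minimal, unconstrained DTW with absolute difference as the local cost, plus a helper that picks the medoid of a set of series; the full k-medoid iteration is omitted, and the function names and toy activity profiles are hypothetical rather than drawn from the SOPHYA data.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with |x - y| as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]

def distance_matrix(series):
    """Symmetric matrix of pairwise DTW distances."""
    return [[dtw_distance(s, t) for t in series] for s in series]

def medoid_index(matrix):
    """Index of the series with the smallest total distance to the others."""
    return min(range(len(matrix)), key=lambda i: sum(matrix[i]))

# Hypothetical activity counts: the first two share a shape but are shifted.
profiles = [
    [0, 0, 5, 5, 0],
    [0, 5, 5, 0, 0],
    [2, 2, 2, 2, 2],
]
matrix = distance_matrix(profiles)
medoid = medoid_index(matrix)
print(matrix, medoid)
```

Because DTW allows time points to be stretched during alignment, the two shifted profiles end up at distance zero, which is the property that makes it suitable for comparing activity patterns that occur at different times of day.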
Affiliation(s)
- Ranin Darkhawaja
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland
- University of Basel, Basel, Switzerland
- Johanna Hänggi
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland
- University of Basel, Basel, Switzerland
- Bettina Bringolf-Isler
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland
- University of Basel, Basel, Switzerland
- Bengt Kayser
- Institute of Sport Sciences, University of Lausanne, Lausanne, Switzerland
- L. Suzanne Suggs
- Institute for Public Health and Institute of Communication and Public Policy, Università della Svizzera Italiana, Lugano, Switzerland
- Marek Kwiatkowski
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland
- University of Basel, Basel, Switzerland
- Nicole Probst-Hensch
- Swiss Tropical and Public Health Institute, Allschwil, Switzerland
- University of Basel, Basel, Switzerland
210
Wang X, Tang Z, Shao J, Robertson S, Gómez MÁ, Zhang S. HoopTransformer: Advancing NBA Offensive Play Recognition with Self-Supervised Learning from Player Trajectories. Sports Med 2024:10.1007/s40279-024-02030-3. [PMID: 38814566 DOI: 10.1007/s40279-024-02030-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/28/2024] [Indexed: 05/31/2024]
Abstract
BACKGROUND AND OBJECTIVE Understanding and recognizing basketball offensive set plays, which involve intricate interactions between players, have always been regarded as challenging tasks for untrained humans, not to mention machines. In this study, our objective is to propose an artificial intelligence model that can automatically recognize offensive plays using a novel self-supervised learning approach. METHODS The dataset was collected by SportVU from 632 games during the 2015-2016 season of the National Basketball Association (NBA), with a total of 90,524 possessions. A multi-agent motion prediction pretraining model was built on the basis of axial-attention transformer and trained with different masking strategies: motion prediction (MP), motion reconstruction (MR), and MP + MR joint strategy. A downstream play-level classification task and similarity search were used to evaluate the models' performance. RESULTS The results showed that the MP + MR joint masking strategy maximized the ability of the model compared with individual masking strategies. For the classification task, the joint strategy achieved a top-1 accuracy of 81.5% and top-3 accuracy of 97.5%. In the similarity search evaluation, the joint strategy attained a top-5 accuracy of 76% and top-10 accuracy of 59%. Additionally, with the same MP + MR joint masking strategy, our HoopTransformer model outperformed the two baseline models in the classification task and similarity search. CONCLUSION This study presents a self-supervised learning model and demonstrates the effectiveness and potential of the model in accurately comprehending and capturing player movements and complex interactions during offensive plays.
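For readers unfamiliar with the top-1 and top-3 accuracy figures quoted for the classification task, a minimal sketch of the usual definition follows: a prediction counts as correct when the true class is among the k highest-scoring classes. The scores and labels are hypothetical toy values and the function name is an assumption. The sketch covers the classification metric only; the similarity-search metrics in the abstract may be defined differently.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true class is among the k top-scored classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # class indices sorted from highest to lowest score
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Hypothetical class scores for four samples over three play types.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.4, 0.5, 0.1],
    [0.2, 0.3, 0.5],
]
labels = [0, 1, 0, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
top3 = top_k_accuracy(scores, labels, k=3)
print(top1, top2, top3)
```

Under this definition the accuracy can only stay the same or increase as k grows, since each larger candidate set contains the smaller one.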
Affiliation(s)
- Xing Wang
- Facultad de Ciencias de la Actividad Física y del Deporte, Universidad Politécnica de Madrid, Madrid, Spain.
- Zitian Tang
- Athletic Performance and Data Science Laboratory, Division of Sports Science and Physical Education, Tsinghua University, Beijing, China
- Computer Science Department, Brown University, Providence, RI, USA
- Jianchong Shao
- Athletic Performance and Data Science Laboratory, Division of Sports Science and Physical Education, Tsinghua University, Beijing, China
- Sam Robertson
- Institute for Health and Sport, Victoria University, Melbourne, Australia
- Miguel-Ángel Gómez
- Facultad de Ciencias de la Actividad Física y del Deporte, Universidad Politécnica de Madrid, Madrid, Spain
- Shaoliang Zhang
- Athletic Performance and Data Science Laboratory, Division of Sports Science and Physical Education, Tsinghua University, Beijing, China
211
|
Abdalgader K, Matroud AA, Hossin K. Experimental study on short-text clustering using transformer-based semantic similarity measure. PeerJ Comput Sci 2024; 10:e2078. [PMID: 38855231 PMCID: PMC11157522 DOI: 10.7717/peerj-cs.2078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2023] [Accepted: 05/03/2024] [Indexed: 06/11/2024]
Abstract
Sentence clustering plays a central role in various text-processing activities and has received extensive attention for measuring semantic similarity between compared sentences. However, relatively little focus has been placed on evaluating clustering performance using available similarity measures that adopt low-dimensional continuous representations. Such representations are crucial in domains like sentence clustering, where traditional word co-occurrence representations often achieve poor results when clustering semantically similar sentences that share no common words. This article presents a new implementation that incorporates a sentence similarity measure based on the notion of embedding representation for evaluating the performance of three types of text clustering methods: partitional clustering, hierarchical clustering, and fuzzy clustering, on standard textual datasets. This measure derives its semantic information from pre-training models designed to simulate human knowledge about words in natural language. The article also compares the performance of the used similarity measure by training it on two state-of-the-art pre-training models to investigate which yields better results. We argue that the superior performance of the selected clustering methods stems from their more effective use of the semantic information offered by this embedding-based similarity measure. Furthermore, we use hierarchical clustering, the best-performing method, for a text summarization task and report the results. The implementation in this article demonstrates that incorporating the sentence embedding measure leads to significantly improved performance in both text clustering and text summarization tasks.
Affiliation(s)
- Khaled Abdalgader
  - Department of Computer Science and Engineering, American University of Ras Al Khaimah, Ras Al Khaimah, United Arab Emirates
- Khaled Hossin
  - Department of Mechanical and Industrial Engineering, American University of Ras Al Khaimah, Ras Al Khaimah, United Arab Emirates
212
Adam N, Wieder R. Temporal Association Rule Mining: Race-Based Patterns of Treatment-Adverse Events in Breast Cancer Patients Using SEER-Medicare Dataset. Biomedicines 2024; 12:1213. [PMID: 38927419 PMCID: PMC11200891 DOI: 10.3390/biomedicines12061213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Revised: 05/17/2024] [Accepted: 05/20/2024] [Indexed: 06/28/2024] Open
Abstract
PURPOSE Disparities in the screening, treatment, and survival of African American (AA) patients with breast cancer extend to adverse events experienced with systemic therapy. However, data are limited and difficult to obtain. We addressed this challenge by applying temporal association rule (TAR) mining using the SEER-Medicare dataset for differences in the association of specific adverse events (AEs) and treatments (TRs) for breast cancer between AA and White women. We considered two categories of cancer care providers and settings: practitioners providing care in the outpatient units of hospitals and institutions and private practitioners providing care in their offices. PATIENTS AND METHODS We considered women enrolled in the Medicare fee-for-service option at age 65 who qualified by age and not disability, who were diagnosed with breast cancer with attributed patient factors of age and race, marital status, comorbidities, prior malignancies, prior therapy, disease factors of stage, grade, and ER/PR and Her2 status and laterality. We included 141 HCPCS drug J codes for chemotherapy, biotherapy, and hormone therapy drugs, which we consolidated into 46 mechanistic categories and generated AE data. We consolidated AEs from ICD9 codes into 18 categories associated with breast cancer therapy. We applied TAR mining to determine associations between the 46 TR and 18 AE categories in the context of the patient categories outlined. We applied the spark.mllib implementation of the FPGrowth algorithm, a parallel version called PFP. We considered differences of at least one unit of lift as significant between groups. The model's results demonstrated a high overlap between the model's identified TR-AEs associated set and the actual set. RESULTS Our results demonstrate that specific TR/AE associations are highly dependent on race, stage, and venue of care administration.
CONCLUSIONS Our data demonstrate the usefulness of this approach in identifying differences in the associations between TRs and AEs in different populations and serve as a reference for predicting the likelihood of AEs in different patient populations treated for breast cancer. Our novel approach using unsupervised learning enables the discovery of association rules while paying special attention to temporal information, resulting in greater predictive and descriptive power as a patient's health and life status change over time.
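The abstract compares groups by lift, so a small sketch of how support, confidence, and lift are computed for a single treatment-to-adverse-event rule may help. It uses a handful of invented patient records with hypothetical labels (`TR:taxane`, `AE:neuropathy`), ignores the temporal ordering that TAR mining adds, and does not reproduce the FPGrowth/PFP implementation the authors ran in Spark.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is a set of item labels; antecedent and consequent are sets.
    Assumes both itemsets occur at least once, otherwise a division by zero occurs.
    """
    n = len(transactions)
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_cons = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n              # P(antecedent and consequent)
    confidence = n_both / n_ante      # P(consequent | antecedent)
    lift = confidence / (n_cons / n)  # confidence relative to the base rate
    return support, confidence, lift


# Invented records: one set of treatment (TR) and adverse-event (AE) labels per patient.
transactions = [
    {"TR:taxane", "AE:neuropathy"},
    {"TR:taxane", "AE:neuropathy"},
    {"TR:taxane"},
    {"TR:hormone"},
    {"TR:hormone", "AE:neuropathy"},
    {"TR:hormone"},
    {"TR:hormone"},
    {"TR:hormone"},
]

support, confidence, lift = rule_metrics(
    transactions, {"TR:taxane"}, {"AE:neuropathy"}
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the adverse event is more frequent among patients receiving the treatment than in the whole cohort. Computing the same rule separately for two patient groups and comparing the two lift values corresponds to the kind of between-group difference the abstract thresholds at one unit.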
Affiliation(s)
- Nabil Adam
  - Phalcon, LLC., Manhasset, NY 11030, USA;
  - Rutgers University, Newark Campus, Newark, NJ 07102, USA
- Robert Wieder
  - Rutgers New Jersey Medical School, Newark, NJ 07103, USA
  - Rutgers Cancer Institute of New Jersey, Newark, NJ 07103, USA
213
Archbold J, Clohessy S, Herath D, Griffiths N, Oyebode O. An agent-based model of the spread of behavioural risk-factors for cardiovascular disease in city-scale populations. PLoS One 2024; 19:e0303051. [PMID: 38805418 PMCID: PMC11132484 DOI: 10.1371/journal.pone.0303051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 04/18/2024] [Indexed: 05/30/2024] Open
Abstract
Cardiovascular disease (CVD) is the leading cause of mortality globally, and is the second main cause of mortality in the UK. Four key modifiable behaviours are known to increase CVD risk, namely: tobacco use, unhealthy diet, physical inactivity and harmful use of alcohol. Behaviours that increase the risk of CVD can spread through social networks because individuals consciously and unconsciously mimic the behaviour of others they relate to and admire. Exploiting these social influences may lead to effective and efficient public health interventions to prevent CVD. This project aimed to construct and validate an agent-based model (ABM) of how the four major behavioural risk-factors for CVD spread through social networks in a population, and examine whether the model could be used to identify targets for public health intervention and to test intervention strategies. Previous ABMs have typically focused on a single risk factor or considered very small populations. We created a city-scale ABM to model the behavioural risk-factors of individuals, their social networks (spousal, household, friendship and workplace), the spread of behaviours through these social networks, and the subsequent impact on the development of CVD. We compared the model output (predicted CVD events over a ten year period) to observed data, demonstrating that the model output is realistic. The model output is stable up to at least a population size of 1.2M agents (the maximum tested). We found that there is scope for the modelled interventions targeting the spread of these behaviours to change the number of CVD events experienced by the agents over ten years. Specifically, we modelled the impact of workplace interventions to show that the ABM could be useful for identifying targets for public health intervention. The model itself is Open Source and is available for use or extension by other researchers.
Affiliation(s)
- James Archbold
  - Department of Computer Science, University of Warwick, Coventry, United Kingdom
- Sophie Clohessy
  - Warwick Medical School, University of Warwick, Coventry, United Kingdom
- Deshani Herath
  - Warwick Medical School, University of Warwick, Coventry, United Kingdom
- Nathan Griffiths
  - Department of Computer Science, University of Warwick, Coventry, United Kingdom
- Oyinlola Oyebode
  - Wolfson Institute of Population Health, Queen Mary University of London, London, United Kingdom
214
Kumar V, Banerjee A, Roy K. Breaking the Barriers: Machine-Learning-Based c-RASAR Approach for Accurate Blood-Brain Barrier Permeability Prediction. J Chem Inf Model 2024; 64:4298-4309. [PMID: 38700741 DOI: 10.1021/acs.jcim.4c00433] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/28/2024]
Abstract
The intricate nature of the blood-brain barrier (BBB) poses a significant challenge in predicting drug permeability, which is crucial for assessing central nervous system (CNS) drug efficacy and safety. This research utilizes an innovative approach, the classification read-across structure-activity relationship (c-RASAR) framework, that leverages machine learning (ML) to enhance the accuracy of BBB permeability predictions. The c-RASAR framework seamlessly integrates principles from both read-across and QSAR methodologies, underscoring the need to consider similarity-related aspects during the development of the c-RASAR model. It is crucial to note that the primary goal of this research is not to introduce yet another model for predicting BBB permeability but rather to showcase the refinement in predicting the BBB permeability of organic compounds through the introduction of a c-RASAR approach. This groundbreaking methodology aims to elevate the accuracy of assessing neuropharmacological implications and streamline the process of drug development. In this study, an ML-based c-RASAR linear discriminant analysis (LDA) model was developed using a dataset of 7807 compounds, encompassing both BBB-permeable and -nonpermeable substances sourced from the B3DB database (freely accessible from https://github.com/theochem/B3DB), for predicting BBB permeability in lead discovery for CNS drugs. The model's predictive capability was then validated using three external sets: one containing 276,518 natural products (NPs) from the LOTUS database (accessible from https://lotus.naturalproducts.net/download) for data gap filling, another comprising 13,002 drug-like/drug compounds from the DrugBank database (available from https://go.drugbank.com/), and a third set of 56 FDA-approved drugs to assess the model's reliability. Further diversifying the predictive arsenal, various other ML-based c-RASAR models were also developed for comparison purposes. 
The proposed c-RASAR framework emerged as a powerful tool for predicting BBB permeability. This research not only advances the understanding of molecular determinants influencing CNS drug permeability but also provides a versatile computational platform for the rapid assessment of diverse compounds, facilitating informed decision-making in drug development and design.
Affiliation(s)
- Vinay Kumar
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Arkaprava Banerjee
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Kunal Roy
  - Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
215
Tiribelli S, Calvaresi D. Rethinking Health Recommender Systems for Active Aging: An Autonomy-Based Ethical Analysis. SCIENCE AND ENGINEERING ETHICS 2024; 30:22. [PMID: 38801621 PMCID: PMC11129984 DOI: 10.1007/s11948-024-00479-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 04/02/2024] [Indexed: 05/29/2024]
Abstract
Health Recommender Systems (HRS) are promising Artificial-Intelligence-based tools supporting healthy lifestyles and therapy adherence in healthcare and medicine. Among the most supported areas, it is worth mentioning active aging (AA). However, current HRS supporting AA raise ethical challenges that still need to be properly formalized and explored. This study proposes to rethink HRS for AA through an autonomy-based ethical analysis. In particular, a brief overview of the technical aspects of HRS allows us to shed light on the ethical risks and challenges they might raise for individuals' well-being as they age. Moreover, the study proposes a categorization, understanding, and possible preventive/mitigation actions for the elicited risks and challenges through rethinking the AI ethics core principle of autonomy. Finally, elaborating on autonomy-related ethical theories, the paper proposes an autonomy-based ethical framework and shows how it can foster the development of autonomy-enabling HRS for AA.
Affiliation(s)
- Simona Tiribelli
  - Department of Political Sciences, Communication, and International Relations, University of Macerata, 62100, Macerata, Italy.
  - Institute for Technology and Global Health, PathCheck Foundation, 955 Massachusetts Ave, Cambridge, MA, 02139, USA.
- Davide Calvaresi
  - University of Applied Sciences and Arts Western Switzerland (HES-SO), Rue de l'Industrie 23, 1950, Sion, Switzerland
216
Sbodio ML, López V, Hoang TL, Brisimi T, Picco G, Vejsbjerg I, Rho V, Mac Aonghusa P, Kristiansen M, Segrave-Daly J. Collaborative artificial intelligence system for investigation of healthcare claims compliance. Sci Rep 2024; 14:11884. [PMID: 38789503 PMCID: PMC11126731 DOI: 10.1038/s41598-024-62665-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 05/20/2024] [Indexed: 05/26/2024] Open
Abstract
Healthcare fraud, waste and abuse are costly problems that have huge impact on society. Traditional approaches to identify non-compliant claims rely on auditing strategies requiring trained professionals, or on machine learning methods requiring labelled data and possibly lacking interpretability. We present Clais, a collaborative artificial intelligence system for claims analysis. Clais automatically extracts human-interpretable rules from healthcare policy documents (0.72 F1-score), and it enables professionals to edit and validate the extracted rules through an intuitive user interface. Clais executes the rules on claim records to identify non-compliance: on this task Clais significantly outperforms two baseline machine learning models, and its median F1-score is 1.0 (IQR = 0.83 to 1.0) when executing the extracted rules, and 1.0 (IQR = 1.0 to 1.0) when executing the same rules after human curation. Professionals confirm through a user study the usefulness of Clais in making their workflow simpler and more effective.
Affiliation(s)
- Marco Luca Sbodio
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland.
- Vanessa López
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- Thanh Lam Hoang
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- Theodora Brisimi
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- Gabriele Picco
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- Inge Vejsbjerg
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- Valentina Rho
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- Pol Mac Aonghusa
  - IBM Research Europe, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- Morten Kristiansen
  - IBM Watson Health, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
- John Segrave-Daly
  - IBM Watson Health, Building 3, IBM Technology Campus, Damastown Industrial Park, Mulhuddart, Dublin 15, Ireland
217
Hamar Á, Mohammed D, Váradi A, Herczeg R, Balázsfalvi N, Fülesdi B, László I, Gömöri L, Gergely PA, Kovacs GL, Jáksó K, Gombos K. COVID-19 mortality prediction in Hungarian ICU settings implementing random forest algorithm. Sci Rep 2024; 14:11941. [PMID: 38789490 PMCID: PMC11126653 DOI: 10.1038/s41598-024-62791-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Accepted: 05/19/2024] [Indexed: 05/26/2024] Open
Abstract
The emergence of newer SARS-CoV-2 variants of concern (VOCs) profoundly changed the ICU demography; this shift in the virus's genotype and its correlation to lethality in the ICUs is still not fully investigated. We aimed to survey ICU patients' clinical and laboratory parameters in correlation with SARS-CoV-2 variant genotypes to lethality. 503 COVID-19 ICU patients were included in our study beginning in January 2021 through November 2022 in Hungary. Furthermore, we implemented random forest (RF) as a potential predictor regarding SARS-CoV-2 lethality among 649 ICU patients in two ICU centers. Survival analysis and comparison of hypertension (HT), diabetes mellitus (DM), and vaccination effects were conducted. Logistic regression identified DM as a significant mortality risk factor (OR: 1.55, 95% CI 1.06-2.29, p = 0.025), while HT showed marginal significance. Additionally, vaccination demonstrated protection against mortality (p = 0.028). RF detected lethality with 81.42% accuracy (95% CI 73.01-88.11%, [AUC]: 91.6%), key predictors being PaO2/FiO2 ratio, lymphocyte count, and chest Computed Tomography Severity Score (CTSS). Although a smaller number of patients require ICU treatment among Omicron cases, the likelihood of survival has not proportionately increased for those who are admitted to the ICU. In conclusion, our RF model supports more effective clinical decision-making among ICU COVID-19 patients.
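As a reading aid for the logistic-regression result above (OR 1.55, 95% CI 1.06-2.29, p = 0.025), the following sketch shows how an odds ratio, its confidence interval, and a two-sided p-value relate on the log scale. It assumes a Wald-type interval that is symmetric in log(OR), which is the usual default but is not stated in the abstract; the function name is illustrative.

```python
import math


def wald_from_or_ci(or_point: float, ci_low: float, ci_high: float,
                    z_crit: float = 1.959964):
    """Recover the standard error, z statistic and two-sided p-value from an
    odds ratio and its 95% CI, assuming the CI is symmetric on the log scale."""
    log_or = math.log(or_point)
    # Width of the CI on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


se, z, p = wald_from_or_ci(1.55, 1.06, 2.29)
print(f"SE(log OR)={se:.3f}  z={z:.2f}  p={p:.3f}")
```

With the rounded figures from the abstract this gives a p-value of roughly 0.026, which agrees with the reported 0.025 up to rounding of the published odds ratio and interval.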
Affiliation(s)
- Ágoston Hamar
  - Department of Laboratory Medicine, Medical School, University of Pécs, Pécs, Hungary
  - Molecular Medicine Research Group, Szentágothai Research Centre, University of Pécs, Pécs, Hungary
- Daryan Mohammed
  - Molecular Medicine Research Group, Szentágothai Research Centre, University of Pécs, Pécs, Hungary
- Alex Váradi
  - Molecular Medicine Research Group, Szentágothai Research Centre, University of Pécs, Pécs, Hungary
  - Institute of Metagenomics, University of Debrecen, Debrecen, Hungary
- Róbert Herczeg
  - Molecular Medicine Research Group, Szentágothai Research Centre, University of Pécs, Pécs, Hungary
- Norbert Balázsfalvi
  - Department of Anaesthesiology and Intensive Care, University of Debrecen, Debrecen, Hungary
- Béla Fülesdi
  - Department of Anaesthesiology and Intensive Care, University of Debrecen, Debrecen, Hungary
- István László
  - Department of Anaesthesiology and Intensive Care, University of Debrecen, Debrecen, Hungary
- Lídia Gömöri
  - Doctoral School of Neuroscience, University of Debrecen, Debrecen, Hungary
- Gabor Laszlo Kovacs
  - Department of Laboratory Medicine, Medical School, University of Pécs, Pécs, Hungary
  - Molecular Medicine Research Group, Szentágothai Research Centre, University of Pécs, Pécs, Hungary
- Krisztián Jáksó
  - Department of Anaesthesiology and Intensive Care, Clinical Centre, University of Pécs, Pécs, Hungary
- Katalin Gombos
  - Department of Laboratory Medicine, Medical School, University of Pécs, Pécs, Hungary.
  - Molecular Medicine Research Group, Szentágothai Research Centre, University of Pécs, Pécs, Hungary.
218
Castanho EN, Aidos H, Madeira SC. Biclustering data analysis: a comprehensive survey. Brief Bioinform 2024; 25:bbae342. [PMID: 39007596 PMCID: PMC11247412 DOI: 10.1093/bib/bbae342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 05/16/2024] [Accepted: 07/01/2024] [Indexed: 07/16/2024] Open
Abstract
Biclustering, the simultaneous clustering of rows and columns of a data matrix, has proved its effectiveness in bioinformatics due to its capacity to produce local instead of global models, evolving from a key technique used in gene expression data analysis into one of the most used approaches for pattern discovery and identification of biological modules, used in both descriptive and predictive learning tasks. This survey presents a comprehensive overview of biclustering. It proposes an updated taxonomy for its fundamental components (bicluster, biclustering solution, biclustering algorithms, and evaluation measures) and applications. We unify scattered concepts in the literature with new definitions to accommodate the diversity of data types (such as tabular, network, and time series data) and the specificities of biological and biomedical data domains. We further propose a pipeline for biclustering data analysis and discuss practical aspects of incorporating biclustering in real-world applications. We highlight prominent application domains, particularly in bioinformatics, and identify typical biclusters to illustrate the analysis output. Moreover, we discuss important aspects to consider when choosing, applying, and evaluating a biclustering algorithm. We also relate biclustering with other data mining tasks (clustering, pattern mining, classification, triclustering, N-way clustering, and graph mining). Thus, it provides theoretical and practical guidance on biclustering data analysis, demonstrating its potential to uncover actionable insights from complex datasets.
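To make the idea of a bicluster and an evaluation measure concrete, here is a minimal sketch of the mean squared residue (the coherence score introduced by Cheng and Church) on a toy matrix. The choice of this particular measure is an assumption made for illustration; the abstract does not say which measures the survey covers, and the matrix values are invented.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by `rows` and `cols`.

    A value of 0 means the selected rows and columns follow a perfectly
    additive pattern (each row is another row plus a constant shift).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r)
        for j in range(n_c)
    )
    return total / (n_r * n_c)


# Toy expression-like matrix; rows 0-2 restricted to columns 0 and 2 shift additively.
data = [
    [1.0, 9.0, 2.0, 0.0],
    [3.0, 0.0, 4.0, 7.0],
    [5.0, 2.0, 6.0, 1.0],
    [0.0, 8.0, 9.0, 3.0],
]

local = mean_squared_residue(data, rows=[0, 1, 2], cols=[0, 2])
whole = mean_squared_residue(data, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(local, whole)
```

The selected submatrix scores 0 while the full matrix does not, which illustrates the "local instead of global" point made in the abstract: the pattern exists only on a subset of rows and a subset of columns.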
Affiliation(s)
- Eduardo N Castanho
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Helena Aidos
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
- Sara C Madeira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, Campo Grande 16, P-1749-016 Lisbon, Portugal
219
Xu N, Xu L, Wang Y, Liu W, Xu W, Hu X, Han ZK. Unraveling the formation of oxygen vacancies on the surface of transition metal-doped ceria utilizing artificial intelligence. NANOSCALE 2024; 16:9853-9860. [PMID: 38712569 DOI: 10.1039/d3nr05950b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Ceria has been extensively utilized in different fields, with surface oxygen vacancies playing a central role. However, versatile oxygen vacancy regulation is still in its infancy. In this work, we propose an effective strategy to manipulate the oxygen vacancy formation energy via transition metal doping by combining first-principles calculations and analytical learning. We elucidate the underlying mechanism driving the formation of oxygen vacancies using combined symbolic regression and data analytics techniques. The results show that the Fermi level of the system and the electronegativity of the dopants are the paramount parameters (features) influencing the formation of oxygen vacancies. These insights not only enhance our understanding of the oxygen vacancy formation mechanism in ceria-based materials to improve their functionality but also potentially lay the groundwork for future strategies in the rational design of other transition metal oxide-based catalysts.
Affiliation(s)
- Ning Xu
  - Department of Physics, School of Physical Science and Technology, Ningbo University, Ningbo, 315211, China.
  - School of Materials Science and Engineering, Zhejiang University, Hangzhou, 310027, China.
- Liangliang Xu
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-Ro, Yuseong-Gu, Daejeon 34141, Republic of Korea
- Yue Wang
  - Department of Electrical Engineering, Hanyang University, Seoul 04763, Republic of Korea
- Wen Liu
  - School of Materials Science and Engineering, Zhejiang University, Hangzhou, 310027, China.
- Wenwu Xu
  - Department of Physics, School of Physical Science and Technology, Ningbo University, Ningbo, 315211, China.
- Xiaojuan Hu
  - School of Materials Science and Engineering, Zhejiang University, Hangzhou, 310027, China.
  - Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany.
- Zhong-Kang Han
  - School of Materials Science and Engineering, Zhejiang University, Hangzhou, 310027, China.
220
Sadeghi MA, Stevens D, Kundu S, Sanghera R, Dagher R, Yedavalli V, Jones C, Sair H, Luna LP. Detecting Alzheimer's Disease Stages and Frontotemporal Dementia in Time Courses of Resting-State fMRI Data Using a Machine Learning Approach. JOURNAL OF IMAGING INFORMATICS IN MEDICINE 2024:10.1007/s10278-024-01101-1. [PMID: 38780666 DOI: 10.1007/s10278-024-01101-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 05/25/2024]
Abstract
Early, accurate diagnosis of neurodegenerative dementia subtypes such as Alzheimer's disease (AD) and frontotemporal dementia (FTD) is crucial for the effectiveness of their treatments. However, distinguishing these conditions becomes challenging when symptoms overlap or the conditions present atypically. Resting-state fMRI (rs-fMRI) studies have demonstrated condition-specific alterations in AD, FTD, and mild cognitive impairment (MCI) compared to healthy controls (HC). Here, we used machine learning to build a diagnostic classification model based on these alterations. We curated all rs-fMRIs and their corresponding clinical information from the ADNI and FTLDNI databases. Imaging data underwent preprocessing, time course extraction, and feature extraction in preparation for the analyses. The imaging features data and clinical variables were fed into gradient-boosted decision trees with fivefold nested cross-validation to build models that classified four groups: AD, FTD, HC, and MCI. The mean and 95% confidence intervals for model performance metrics were calculated using the unseen test sets in the cross-validation rounds. The model built using only imaging features achieved 74.4% mean balanced accuracy, 0.94 mean macro-averaged AUC, and 0.73 mean macro-averaged F1 score. It accurately classified FTD (F1 = 0.99), HC (F1 = 0.99), and MCI (F1 = 0.86) fMRIs but mostly misclassified AD scans as MCI (F1 = 0.08). Adding clinical variables to model inputs raised balanced accuracy to 91.1%, macro-averaged AUC to 0.99, macro-averaged F1 score to 0.92, and improved AD classification accuracy (F1 = 0.74). In conclusion, a multimodal model based on rs-fMRI and clinical data accurately differentiates AD-MCI vs. FTD vs. HC.
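The macro-averaged F1 score quoted above is the unweighted mean of the per-class F1 scores, so a single poorly classified group pulls it down sharply. The sketch below computes per-class and macro F1 from a confusion matrix and then averages the four per-class values reported for the imaging-only model. The toy confusion matrix is invented, and the paper's own figure is a mean over cross-validation folds, so the agreement shown is a plausibility check rather than a reproduction.

```python
def per_class_f1(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    k = len(confusion)
    scores = []
    for c in range(k):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true c, predicted other
        fp = sum(confusion[r][c] for r in range(k)) - tp  # other, predicted c
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


# Invented 2-class confusion matrix, only to exercise the functions.
toy = [[8, 2],
       [1, 9]]
print(per_class_f1(toy), macro_f1(toy))

# Per-class F1 values reported in the abstract for the imaging-only model.
reported = {"FTD": 0.99, "HC": 0.99, "MCI": 0.86, "AD": 0.08}
macro_from_reported = sum(reported.values()) / len(reported)
print(round(macro_from_reported, 2))
```

Averaging the four reported values gives 0.73, in line with the macro-averaged F1 stated for the imaging-only model, and shows how the low AD score (0.08) dominates the shortfall.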
Affiliation(s)
- Mohammad Amin Sadeghi
  - Division of Neuroradiology, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medicine, 600 N Wolfe St, Phipps B100F, Baltimore, MD, 21287, USA
- Daniel Stevens
  - Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA
- Shinjini Kundu
  - Division of Neuroradiology, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medicine, 600 N Wolfe St, Phipps B100F, Baltimore, MD, 21287, USA
- Rohan Sanghera
  - University of Cambridge, School of Clinical Medicine, Cambridge, UK
- Richard Dagher
  - Division of Neuroradiology, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medicine, 600 N Wolfe St, Phipps B100F, Baltimore, MD, 21287, USA
- Vivek Yedavalli
  - Division of Neuroradiology, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medicine, 600 N Wolfe St, Phipps B100F, Baltimore, MD, 21287, USA
- Craig Jones
  - Division of Neuroradiology, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medicine, 600 N Wolfe St, Phipps B100F, Baltimore, MD, 21287, USA
  - Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
  - The Malone Center for Engineering in Healthcare, The Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Haris Sair
  - Division of Neuroradiology, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medicine, 600 N Wolfe St, Phipps B100F, Baltimore, MD, 21287, USA
  - The Malone Center for Engineering in Healthcare, The Whiting School of Engineering, Johns Hopkins University, Baltimore, MD, USA
- Licia P Luna
  - Division of Neuroradiology, Russell H. Morgan Department of Radiology and Radiological Science, Johns Hopkins Medicine, 600 N Wolfe St, Phipps B100F, Baltimore, MD, 21287, USA.
221
Raslan E, Alrahmawy MF, Mohammed YA, Tolba AS. Evaluation of data representation techniques for vibration based road surface condition classification. Sci Rep 2024; 14:11620. [PMID: 38773123 PMCID: PMC11109277 DOI: 10.1038/s41598-024-61757-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Accepted: 05/09/2024] [Indexed: 05/23/2024] Open
Abstract
The accurate classification of road surface conditions plays a vital role in ensuring road safety and effective maintenance. Vibration-based techniques have shown promise in this domain, leveraging the unique vibration signatures generated by vehicles to identify different road conditions. In this study, we focus on utilizing vehicle-mounted vibration sensors to collect road surface vibrations and comparing various data representation techniques for classifying road surface conditions into four classes: normal road surface, potholes, bad road surface, and speedbumps. Our experimental results reveal that the combination of multiple data representation techniques results in higher performance, with an average accuracy of 93.4%. This suggests that the integration of deep neural networks and signal processing techniques can produce a high-level representation better suited for challenging multivariate time series classification issues.
Collapse
Affiliation(s)
- E Raslan
- New Damietta Institute for Engineering & Technology, New Damietta, Egypt.
- Faculty of Computer and Information, Mansoura University, Mansoura, Egypt.
| | - Mohammed F Alrahmawy
- Faculty of Computer and Information, Mansoura University, Mansoura, Egypt
- Faculty of Computer Science & Engineering, New Mansoura University, Gamasa, 35712, Egypt
- University of Economics and Human Sciences, Warsaw, Poland
| | - Y A Mohammed
- New Heliopolis Institute for Engineering & Automotive and Energy Technologies, New Heliopolis, Egypt
| | - A S Tolba
- Faculty of Computer and Information, Mansoura University, Mansoura, Egypt
- New Heliopolis Institute for Engineering & Automotive and Energy Technologies, New Heliopolis, Egypt
| |
|
222
|
Pirruccello JP, Di Achille P, Choi SH, Rämö JT, Khurshid S, Nekoui M, Jurgens SJ, Nauffal V, Kany S, Ng K, Friedman SF, Batra P, Lunetta KL, Palotie A, Philippakis AA, Ho JE, Lubitz SA, Ellinor PT. Deep learning of left atrial structure and function provides link to atrial fibrillation risk. Nat Commun 2024; 15:4304. [PMID: 38773065 PMCID: PMC11109224 DOI: 10.1038/s41467-024-48229-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2021] [Accepted: 04/24/2024] [Indexed: 05/23/2024] Open
Abstract
Increased left atrial volume and decreased left atrial function have long been associated with atrial fibrillation. The availability of large-scale cardiac magnetic resonance imaging data paired with genetic data provides a unique opportunity to assess the genetic contributions to left atrial structure and function, and understand their relationship with risk for atrial fibrillation. Here, we use deep learning and surface reconstruction models to measure left atrial minimum volume, maximum volume, stroke volume, and emptying fraction in 40,558 UK Biobank participants. In a genome-wide association study of 35,049 participants without pre-existing cardiovascular disease, we identify 20 common genetic loci associated with left atrial structure and function. We find that polygenic contributions to increased left atrial volume are associated with atrial fibrillation and its downstream consequences, including stroke. Through Mendelian randomization, we find evidence supporting a causal role for left atrial enlargement and dysfunction on atrial fibrillation risk.
Affiliation(s)
- James P Pirruccello
- Division of Cardiology, University of California San Francisco, San Francisco, CA, USA.
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA.
- Bakar Computational Health Sciences Institute, University of California San Francisco, San Francisco, CA, USA.
- Cardiovascular Genetics Center, University of California San Francisco, San Francisco, CA, USA.
| | - Paolo Di Achille
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Seung Hoan Choi
- Cardiovascular Disease Initiative, Broad Institute, Cambridge, MA, USA
| | - Joel T Rämö
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
| | - Shaan Khurshid
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Mahan Nekoui
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Sean J Jurgens
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Experimental Cardiology, Amsterdam UMC, University of Amsterdam, Amsterdam, NL, Netherlands
- Amsterdam Cardiovascular Sciences, Heart Failure & Arrhythmias, University of Amsterdam, Amsterdam, NL, Netherlands
| | - Victor Nauffal
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Division of Cardiovascular Medicine, Brigham and Women's Hospital, Boston, MA, USA
| | - Shinwan Kany
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Cardiology, University Heart and Vascular Center Hamburg-Eppendorf, Hamburg, Germany
| | | | - Samuel F Friedman
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Puneet Batra
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Kathryn L Lunetta
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Aarno Palotie
- Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Analytic and Translational Genetics Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Boston, MA, USA
| | | | - Jennifer E Ho
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Harvard Medical School, Boston, MA, USA
- CardioVascular Institute, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Steven A Lubitz
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Patrick T Ellinor
- Cardiovascular Disease Initiative, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| |
|
223
|
Franklin G, Stephens R, Piracha M, Tiosano S, Lehouillier F, Koppel R, Elkin PL. The Sociodemographic Biases in Machine Learning Algorithms: A Biomedical Informatics Perspective. Life (Basel) 2024; 14:652. [PMID: 38929638 PMCID: PMC11204917 DOI: 10.3390/life14060652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/21/2024] [Revised: 04/24/2024] [Accepted: 04/26/2024] [Indexed: 06/28/2024] Open
Abstract
Artificial intelligence models represented in machine learning algorithms are promising tools for risk assessment used to guide clinical and other health care decisions. Machine learning algorithms, however, may house biases that propagate stereotypes, inequities, and discrimination that contribute to socioeconomic health care disparities. The biases include those related to some sociodemographic characteristics such as race, ethnicity, gender, age, insurance, and socioeconomic status from the use of erroneous electronic health record data. Additionally, there is concern that training data and algorithmic biases in large language models pose potential drawbacks. These biases affect the lives and livelihoods of a significant percentage of the population in the United States and globally. The social and economic consequences of the associated backlash cannot be underestimated. Here, we outline some of the sociodemographic, training data, and algorithmic biases that undermine sound health care risk assessment and medical decision-making and that should be addressed in the health care system. We present a perspective and overview of these biases by gender, race, ethnicity, age, historically marginalized communities, algorithmic bias, biased evaluations, implicit bias, selection/sampling bias, socioeconomic status biases, biased data distributions, cultural biases, insurance status bias, confirmation bias, information bias, and anchoring biases. We also make recommendations to improve large language model training data, including de-biasing techniques such as counterfactual role-reversed sentences during knowledge distillation, fine-tuning, prefix attachment at training time, the use of toxicity classifiers, retrieval augmented generation, and algorithmic modification to mitigate the biases moving forward.
Affiliation(s)
- Gillian Franklin
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA; (G.F.); (R.S.); (M.P.); (F.L.); (R.K.)
- Department of Veterans Affairs, Knowledge Based Systems and Western New York, Veterans Affairs, Buffalo, NY 14215, USA
| | - Rachel Stephens
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA; (G.F.); (R.S.); (M.P.); (F.L.); (R.K.)
| | - Muhammad Piracha
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA; (G.F.); (R.S.); (M.P.); (F.L.); (R.K.)
| | - Shmuel Tiosano
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA; (G.F.); (R.S.); (M.P.); (F.L.); (R.K.)
| | - Frank Lehouillier
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA; (G.F.); (R.S.); (M.P.); (F.L.); (R.K.)
- Department of Veterans Affairs, Knowledge Based Systems and Western New York, Veterans Affairs, Buffalo, NY 14215, USA
| | - Ross Koppel
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA; (G.F.); (R.S.); (M.P.); (F.L.); (R.K.)
- Institute for Biomedical Informatics, Perelman School of Medicine, and Sociology Department, University of Pennsylvania, Philadelphia, PA 19104, USA
| | - Peter L. Elkin
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA; (G.F.); (R.S.); (M.P.); (F.L.); (R.K.)
- Department of Veterans Affairs, Knowledge Based Systems and Western New York, Veterans Affairs, Buffalo, NY 14215, USA
| |
|
224
|
Dong X, Zhao C, Song X, Zhang L, Liu Y, Wu J, Xu Y, Xu N, Liu J, Yu H, Yang K, Zhou X. PresRecST: a novel herbal prescription recommendation algorithm for real-world patients with integration of syndrome differentiation and treatment planning. J Am Med Inform Assoc 2024; 31:1268-1279. [PMID: 38598532 PMCID: PMC11105127 DOI: 10.1093/jamia/ocae066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 02/21/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024] Open
Abstract
OBJECTIVES Herbal prescription recommendation (HPR) is a hot topic and a challenging issue in the field of clinical decision support for traditional Chinese medicine (TCM). However, almost all previous HPR methods have not adhered to the clinical principles of syndrome differentiation and treatment planning of TCM, which has resulted in suboptimal performance and difficulties in application to real-world clinical scenarios. MATERIALS AND METHODS We emphasize the synergy among diagnosis and treatment procedures in real-world TCM clinical settings to propose the PresRecST model, which effectively combines the key components of symptom collection, syndrome differentiation, treatment method determination, and herb recommendation. This model integrates a self-curated TCM knowledge graph to learn high-quality representations of TCM biomedical entities and performs 3 stages of clinical predictions to meet the principle of systematic sequential procedure of TCM decision making. RESULTS To address the limitations of previous datasets, we constructed the TCM-Lung dataset, which is suitable for the simultaneous training of syndrome differentiation, treatment method determination, and herb recommendation. Overall experimental results on 2 datasets demonstrate that the proposed PresRecST outperforms the state-of-the-art algorithm by significant margins (eg, improvements of P@5 by 4.70%, P@10 by 5.37%, P@20 by 3.08% compared with the best baseline). DISCUSSION The workflow of PresRecST effectively integrates the embedding vectors of the knowledge graph for progressive recommendation tasks, and it closely aligns with the actual diagnostic and treatment procedures followed by TCM doctors. A series of ablation experiments and a case study show the availability and interpretability of PresRecST, indicating that the proposed PresRecST can be beneficial for assisting diagnosis and treatment in real-world TCM clinical settings. CONCLUSION Our technology can be applied in a progressive recommendation scenario, providing recommendations for related items in a progressive manner, which can assist in providing more reliable diagnoses and herbal therapies for TCM clinical tasks.
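The P@5, P@10 and P@20 figures in this abstract are precision-at-k scores. As a rough illustration only, the sketch below assumes the common definition (the share of the top k recommended items that appear in the reference set); the abstract does not spell out the exact evaluation protocol, and the function name `precision_at_k` and the herb labels are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are in the reference set.

    Assumes the common P@k definition; the denominator is k even when
    fewer than k items were recommended.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


# Hypothetical ranked recommendation and reference prescription.
ranked_herbs = ["herb_a", "herb_b", "herb_c", "herb_d", "herb_e"]
prescribed = {"herb_a", "herb_c", "herb_x"}
score = precision_at_k(ranked_herbs, prescribed, k=5)
```

Here two of the five top-ranked items are in the reference set, so `score` is 2/5.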
Affiliation(s)
- Xin Dong
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| | - Chenxi Zhao
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| | - Xinpeng Song
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| | - Lei Zhang
- National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Yu Liu
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| | - Jun Wu
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| | - Yiran Xu
- Department of Computer Science, Cornell University, New York, NY 14853, United States
| | - Ning Xu
- National Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
| | - Jialing Liu
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| | - Haibin Yu
- The First Affiliated Hospital, Henan University of Chinese Medicine, Zhengzhou 450000, China
- Collaborative Innovation Center for Chinese Medicine and Respiratory Diseases Co-Constructed by Henan Province & Education Ministry of P.R. China, Henan University of Chinese Medicine, Zhengzhou 450046, China
| | - Kuo Yang
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| | - Xuezhong Zhou
- Beijing Key Lab of Traffic Data Analysis and Mining, Institute of Medical Intelligence, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
| |
|
225
|
Liu C, Cao B, Zhang J. s-TBN: A New Neural Decoding Model to Identify Stimulus Categories From Brain Activity Patterns. IEEE Trans Neural Syst Rehabil Eng 2024; 32:1934-1943. [PMID: 38722722 DOI: 10.1109/tnsre.2024.3399191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/21/2024]
Abstract
Neural decoding remains a challenging and active topic in neurocomputing science. Recently, many studies have shown that brain network patterns containing rich spatiotemporal structural information represent the brain's activation information under external stimuli. In the traditional method, brain network features are obtained directly using standard machine learning methods and provided to a classifier, which then decodes the external stimuli. However, this method cannot effectively extract the multidimensional structural information hidden in the brain network. Furthermore, studies on tensors have shown that the tensor decomposition model can fully mine the unique spatiotemporal structural characteristics of data with a multidimensional structure. This research proposed a stimulus-constrained Tensor Brain Network (s-TBN) model that involves tensor decomposition and stimulus category-constraint information. The model was verified on real neuroimaging data obtained via magnetoencephalography and functional magnetic resonance imaging. Experimental results show that the s-TBN model achieves accuracy gains of more than 11.06% and 18.46% compared with other methods on the two modal datasets. These results demonstrate the superiority of extracting discriminative characteristics using the s-TBN model, especially for decoding object stimuli with semantic information.
|
226
|
Lin J, Hong B, Cai Z, Lu P, Lin K. MASMDDI: multi-layer adaptive soft-mask graph neural network for drug-drug interaction prediction. Front Pharmacol 2024; 15:1369403. [PMID: 38831885 PMCID: PMC11144894 DOI: 10.3389/fphar.2024.1369403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2024] [Accepted: 04/23/2024] [Indexed: 06/05/2024] Open
Abstract
Accurately predicting Drug-Drug Interaction (DDI) is a critical and challenging aspect of the drug discovery process, particularly in preventing adverse reactions in patients undergoing combination therapy. However, current DDI prediction methods often overlook the interaction information between chemical substructures of drugs, focusing solely on the interaction information between drugs and failing to capture sufficient chemical substructure details. To address this limitation, we introduce a novel DDI prediction method: Multi-layer Adaptive Soft Mask Graph Neural Network (MASMDDI). Specifically, we first design a multi-layer adaptive soft mask graph neural network to extract substructures from molecular graphs. Second, we employ an attention mechanism to mine substructure feature information and update latent features. In this process, to optimize the final feature representation, we decompose drug-drug interactions into pairwise interaction correlations between the core substructures of each drug. Third, we use these features to predict the interaction probabilities of DDI tuples and evaluate the model using real-world datasets. Experimental results demonstrate that the proposed model outperforms state-of-the-art methods in DDI prediction. Furthermore, MASMDDI exhibits excellent performance in predicting DDIs of unknown drugs in two tasks that are more aligned with real-world scenarios. In particular, in the transductive scenario using the DrugBank dataset, the ACC, AUROC, and AUPRC scores of MASMDDI are 0.9596, 0.9903, and 0.9894, respectively, which are 2% higher than the best-performing baseline.
Affiliation(s)
- Junpeng Lin
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
| | - Binsheng Hong
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
| | - Zhongqi Cai
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
| | - Ping Lu
- School of Economics and Management, Xiamen University of Technology, Xiamen, China
| | - Kaibiao Lin
- School of Computer and Information Engineering, Xiamen University of Technology, Xiamen, China
| |
|
227
|
Zhang J, Zhang M, Yu Y, Yu R. An innovative method integrating run theory and DBSCAN for complete three-dimensional drought structures. Sci Total Environ 2024; 926:171901. [PMID: 38521270 DOI: 10.1016/j.scitotenv.2024.171901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Revised: 03/01/2024] [Accepted: 03/20/2024] [Indexed: 03/25/2024]
Abstract
Drought displays dynamic and uncertain spatiotemporal characteristics, thus it is typically not confined to fixed temporal-spatial boundaries. Existing drought clustering methods often involve spatially clustering drought points or grids into patches, subsequently connected over time to form three-dimensional structures. Despite this process being able to extract three-dimensional drought clusters, it is likely to overlook mild or relatively small, isolated drought patches. To overcome this limitation, this paper presented an effective method (named STD-CLUSTER) for identifying drought clusters with complete three-dimensional structures. The method initially employed run theory to extract drought events as "lines" and subsequently clustered these events using the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm. A case study on the 2006 flash drought in the Yangtze River Basin demonstrated that STD-CLUSTER successfully clustered drought events and ensured the integrity of drought clusters by considering small, isolated, or disconnected patches. Additionally, an in-depth analysis using STD-CLUSTER examined seasonal drought events in China from 1991 to 2022, identifying a total of 35 drought clusters. These clusters began and ended with small-area patches, exhibiting features of expansion, contraction, spread, merging, and splitting over time. Furthermore, seasonal changes significantly influenced the evolution of drought clusters, with affected area and severity increasing in spring and summer and decreasing in autumn and winter. The applicability of the proposed method extends beyond various geographical regions and time scales, providing effective support for comprehensively investigating the spatiotemporal evolution of drought.
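The run-theory step mentioned in this abstract can be sketched roughly as follows. This is not the authors' STD-CLUSTER implementation: it only illustrates the generic idea of treating each maximal run of consecutive time steps below a threshold as one drought event, for a single grid cell. The threshold value, the `DroughtRun` type and the `extract_runs` function are assumptions made for illustration, and the subsequent DBSCAN clustering of events is not shown.

```python
from typing import List, NamedTuple


class DroughtRun(NamedTuple):
    start: int        # index of the first time step below the threshold
    duration: int     # number of consecutive time steps below the threshold
    severity: float   # cumulative deficit (threshold - value) over the run


def extract_runs(index_values: List[float], threshold: float = -0.5) -> List[DroughtRun]:
    """Split a drought-index series into maximal runs below the threshold."""
    runs: List[DroughtRun] = []
    start = None
    severity = 0.0
    for t, value in enumerate(index_values):
        if value < threshold:
            if start is None:
                # A new run begins at this time step.
                start = t
                severity = 0.0
            severity += threshold - value
        elif start is not None:
            # The run ended at the previous time step.
            runs.append(DroughtRun(start, t - start, severity))
            start = None
    if start is not None:
        # The series ended while still below the threshold.
        runs.append(DroughtRun(start, len(index_values) - start, severity))
    return runs


# Hypothetical drought-index series for one grid cell.
series = [0.2, -1.0, -1.5, 0.1, -0.6, -0.7, -0.8]
events = extract_runs(series, threshold=-0.5)
```

With this series, `events` holds two runs: one starting at index 1 lasting 2 steps, and one starting at index 4 lasting 3 steps. In a workflow like the one described, each run's attributes (location, start, duration) could then serve as features for a density-based clustering step.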
Affiliation(s)
- Jing Zhang
- Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Min Zhang
- Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| | - Yang Yu
- Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ruide Yu
- Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
| |
|
228
|
Lencastre P, Lotfigolian M, Lind PG. Identifying Autism Gaze Patterns in Five-Second Data Records. Diagnostics (Basel) 2024; 14:1047. [PMID: 38786345 PMCID: PMC11119316 DOI: 10.3390/diagnostics14101047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2024] [Revised: 05/15/2024] [Accepted: 05/16/2024] [Indexed: 05/25/2024] Open
Abstract
One of the most challenging problems when diagnosing autism spectrum disorder (ASD) is the need for long sets of data. Collecting data during such long periods is challenging, particularly when dealing with children. This challenge motivates the investigation of possible classifiers of ASD that do not need such long data sets. In this paper, we use eye-tracking data sets covering only 5 s and introduce one metric able to distinguish between ASD and typically developed (TD) gaze patterns based on such short time-series and compare it with two benchmarks, one using the traditional eye-tracking metrics and one state-of-the-art AI classifier. Although the data can only track possible disorders in visual attention and our approach is not a substitute to medical diagnosis, we find that our newly introduced metric can achieve an accuracy of 93% in classifying eye gaze trajectories from children with ASD surpassing both benchmarks while needing fewer data. The classification accuracy of our method, using a 5 s data series, performs better than the standard metrics in eye-tracking and is at the level of the best AI benchmarks, even when these are trained with longer time series. We also discuss the advantages and limitations of our method in comparison with the state of the art: besides needing a low amount of data, this method is a simple, understandable, and straightforward criterion to apply, which often contrasts with "black box" AI methods.
Affiliation(s)
- Pedro Lencastre
- Department of Computer Science, Oslo Metropolitan University, N-0130 Oslo, Norway (P.G.L.)
- OsloMet Artificial Intelligence Lab, Pilestredet 52, N-0166 Oslo, Norway
- NordSTAR—Nordic Center for Sustainable and Trustworthy AI Research, Pilestredet 52, N-0166 Oslo, Norway
| | - Maryam Lotfigolian
- Department of Computer Science, Oslo Metropolitan University, N-0130 Oslo, Norway (P.G.L.)
| | - Pedro G. Lind
- Department of Computer Science, Oslo Metropolitan University, N-0130 Oslo, Norway (P.G.L.)
- OsloMet Artificial Intelligence Lab, Pilestredet 52, N-0166 Oslo, Norway
- NordSTAR—Nordic Center for Sustainable and Trustworthy AI Research, Pilestredet 52, N-0166 Oslo, Norway
- Simula Research Laboratory, Numerical Analysis and Scientific Computing, N-0164 Oslo, Norway
| |
|
229
|
Eddy E, Campbell E, Bateman S, Scheme E. Understanding the influence of confounding factors in myoelectric control for discrete gesture recognition. J Neural Eng 2024; 21:036015. [PMID: 38722304 DOI: 10.1088/1741-2552/ad4915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2024] [Accepted: 05/09/2024] [Indexed: 05/18/2024]
Abstract
Discrete myoelectric control-based gesture recognition has recently gained interest as a possible input modality for many emerging ubiquitous computing applications. Unlike the continuous control commonly employed in powered prostheses, discrete systems seek to recognize the dynamic sequences associated with gestures to generate event-based inputs. More akin to those used in general-purpose human-computer interaction, these could include, for example, a flick of the wrist to dismiss a phone call or a double tap of the index finger and thumb to silence an alarm. Myoelectric control systems have been shown to achieve near-perfect classification accuracy, but in highly constrained offline settings. Real-world, online systems are subject to 'confounding factors' (i.e. factors that hinder the real-world robustness of myoelectric control that are not accounted for during typical offline analyses), which inevitably degrade system performance, limiting their practical use. Although these factors have been widely studied in continuous prosthesis control, there has been little exploration of their impacts on discrete myoelectric control systems for emerging applications and use cases. Correspondingly, this work examines, for the first time, three confounding factors and their effect on the robustness of discrete myoelectric control: (1) limb position variability, (2) cross-day use, and (3) gesture elicitation speed, a newly identified confound faced by discrete systems. Results from four different discrete myoelectric control architectures, (1) Majority Vote LDA, (2) Dynamic Time Warping, (3) an LSTM network trained with Cross Entropy, and (4) an LSTM network trained with Contrastive Learning, show that classification accuracy is significantly degraded (p<0.05) as a result of each of these confounds. This work establishes that confounding factors are a critical barrier that must be addressed to enable the real-world adoption of discrete myoelectric control for robust and reliable gesture recognition.
Affiliation(s)
- Ethan Eddy
- University of New Brunswick, Fredericton, NB E3B 5A3, Canada
| | - Evan Campbell
- University of New Brunswick, Fredericton, NB E3B 5A3, Canada
| | - Scott Bateman
- University of New Brunswick, Fredericton, NB E3B 5A3, Canada
| | - Erik Scheme
- University of New Brunswick, Fredericton, NB E3B 5A3, Canada
| |
|
230
|
Kopell BH, Kaji DA, Liharska LE, Vornholt E, Valentine A, Lund A, Hashemi A, Thompson RC, Lohrenz T, Johnson JS, Bussola N, Cheng E, Park YJ, Shah P, Ma W, Searfoss R, Qasim S, Miller GM, Chand NM, Aristel A, Humphrey J, Wilkins L, Ziafat K, Silk H, Linares LM, Sullivan B, Feng C, Batten SR, Bang D, Barbosa LS, Twomey T, White JP, Vannucci M, Hadj-Amar B, Cohen V, Kota P, Moya E, Rieder MK, Figee M, Nadkarni GN, Breen MS, Kishida KT, Scarpa J, Ruderfer DM, Narain NR, Wang P, Kiebish MA, Schadt EE, Saez I, Montague PR, Beckmann ND, Charney AW. Multiomic foundations of human prefrontal cortex tissue function. medRxiv 2024:2024.05.17.24307537. [PMID: 38798344 PMCID: PMC11118644 DOI: 10.1101/2024.05.17.24307537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
The prefrontal cortex (PFC) is a region of the brain that in humans is involved in the production of higher-order functions such as cognition, emotion, perception, and behavior. Neurotransmission in the PFC produces higher-order functions by integrating information from other areas of the brain. At the foundation of neurotransmission, and by extension at the foundation of higher-order brain functions, are an untold number of coordinated molecular processes involving the DNA sequence variants in the genome, RNA transcripts in the transcriptome, and proteins in the proteome. These "multiomic" foundations are poorly understood in humans, perhaps in part because most modern studies that characterize the molecular state of the human PFC use tissue obtained when neurotransmission and higher-order brain functions have ceased (i.e., the postmortem state). Here, analyses are presented on data generated for the Living Brain Project (LBP) to investigate whether PFC tissue from individuals with intact higher-order brain function has characteristic multiomic foundations. Two complementary strategies were employed towards this end. The first strategy was to identify in PFC samples obtained from living study participants a signature of RNA transcript expression associated with neurotransmission measured intracranially at the time of PFC sampling, in some cases while participants performed a task engaging higher-order brain functions. The second strategy was to perform multiomic comparisons between PFC samples obtained from individuals with intact higher-order brain function at the time of sampling (i.e., living study participants) and PFC samples obtained in the postmortem state. RNA transcript expression within multiple PFC cell types was associated with fluctuations of dopaminergic, serotonergic, and/or noradrenergic neurotransmission in the substantia nigra measured while participants played a computer game that engaged higher-order brain functions. 
A subset of these associations - termed the "transcriptional program associated with neurotransmission" (TPAWN) - were reproduced in analyses of brain RNA transcript expression and intracranial neurotransmission data obtained from a second LBP cohort and from a cohort in an independent study. RNA transcripts involved in TPAWN were found to be (1) enriched for RNA transcripts associated with measures of neurotransmission in rodent and cell models, (2) enriched for RNA transcripts encoded by evolutionarily constrained genes, (3) depleted of RNA transcripts regulated by common DNA sequence variants, and (4) enriched for RNA transcripts implicated in higher-order brain functions by human population genetic studies. In PFC excitatory neurons of living study participants, higher expression of the genes in TPAWN tracked with higher expression of RNA transcripts that in rodent PFC samples are markers of a class of excitatory neurons that connect the PFC to deep brain structures. TPAWN was further reproduced by RNA transcript expression patterns differentiating living PFC samples from postmortem PFC samples, and significant differences between living and postmortem PFC samples were additionally observed with respect to (1) the expression of most primary RNA transcripts, mature RNA transcripts, and proteins, (2) the splicing of most primary RNA transcripts into mature RNA transcripts, (3) the patterns of co-expression between RNA transcripts and proteins, and (4) the effects of some DNA sequence variants on RNA transcript and protein expression. Taken together, this report highlights that studies of brain tissue obtained in a safe and ethical manner from large cohorts of living individuals can help advance understanding of the multiomic foundations of brain function.
|
231
|
Wen J, Gabrys B, Musial K. Evolutionary Digital Twin-Oriented Complex Networked Systems driven by node features and the mutation of feature preferences. PLoS One 2024; 19:e0303571. [PMID: 38753719 PMCID: PMC11098356 DOI: 10.1371/journal.pone.0303571] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2023] [Accepted: 04/26/2024] [Indexed: 05/18/2024] Open
Abstract
Accurate modelling of complex social systems, where people interact with each other and those interactions change over time, has been a research challenge for many years. This study proposes an evolutionary Digital Twin-Oriented Complex Networked System (DT-CNS) framework that considers heterogeneous node features and changeable connection preferences. We create heterogeneous preference mutation mechanisms to characterise nodes' adaptive decisions on preference mutation in response to interaction patterns and epidemic risks. In this space, we use nodes' interaction utilities to characterise the positive feedback from interactions and the negative impact of epidemic risks. We also introduce a social capital constraint to better harness the density of social connections. The nodes' heterogeneous preference mutation styles include the (i) inactive style that keeps initial social preferences, (ii) ignorant style that randomly mutates preferences, (iii) egocentric style that optimises individual interaction utility, (iv) cooperative style that optimises the total interaction utilities by group decisions and (v) collaborative style that further allows the cooperative nodes to transfer social capital. Our simulation experiments on evolutionary DT-CNSs reveal that heterogeneous preference mutation styles lead to various interaction and infection patterns. The results also show that (i) increasing social capital enables higher interactions but higher infection risks and uncertainty in decision-making; (ii) group decisions outperform individual decisions by eliminating the unawareness of the decisions of other nodes; (iii) the collaborative nodes under a strict social capital limit can promote interactions, reduce infection risks and achieve higher overall interaction utilities.
Affiliation(s)
- Jiaqi Wen
- Complex Adaptive Systems, Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Bogdan Gabrys
- Complex Adaptive Systems, Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Katarzyna Musial
- Complex Adaptive Systems, Data Science Institute, University of Technology Sydney, Sydney, NSW, Australia
| |
|
232
|
Kovacs KD, Beres B, Kanyo N, Szabó B, Peter B, Bősze S, Szekacs I, Horvath R. Single-cell classification based on label-free high-resolution optical data of cell adhesion kinetics. Sci Rep 2024; 14:11231. [PMID: 38755203 PMCID: PMC11099063 DOI: 10.1038/s41598-024-61257-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2024] [Accepted: 05/03/2024] [Indexed: 05/18/2024] Open
Abstract
Selecting and isolating various cell types is a critical procedure in many applications, including immune therapy, regenerative medicine, and cancer research. Usually, these selection processes involve some labeling or another invasive step potentially affecting cellular functionality or damaging the cell. In the current proof of principle study, we first introduce an optical biosensor-based method capable of classification between healthy and numerous cancerous cell types in a label-free setup. We present high classification accuracy based on the monitored single-cell adhesion kinetic signals. We developed a high-throughput data processing pipeline to build a benchmark database of ~4500 single-cell adhesion measurements of a normal preosteoblast (MC3T3-E1) and various cancer (HeLa, LCLC-103H, MDA-MB-231, MCF-7) cell types. Several datasets were used with different cell-type selections to test the performance of deep learning-based classification models, reaching accuracies above 70-80% depending on the classification task. Beyond testing these models, we aimed to draw interpretable biological insights from their results; thus, we applied a deep neural network visualization method (grad-CAM) to reveal the basis on which these complex models made their decisions. Our proof-of-concept work demonstrated the success of a deep neural network using merely label-free adhesion kinetic data to classify single mammalian cells into different cell types. We propose our method for label-free single-cell profiling and in vitro cancer research involving adhesion. The employed label-free measurement is noninvasive and does not affect cellular functionality. Therefore, it could also be adapted for applications where the selected cells need further processing, such as immune therapy and regenerative medicine.
Affiliation(s)
- Kinga Dora Kovacs
- Nanobiosensorics Laboratory, Institute of Technical Physics and Materials Science MFA, HUN-REN Centre for Energy Research, Konkoly-Thege út 29-33, 1121, Budapest, Hungary
- Department of Biological Physics, Eötvös University, Budapest, Hungary
| | - Balint Beres
- Nanobiosensorics Laboratory, Institute of Technical Physics and Materials Science MFA, HUN-REN Centre for Energy Research, Konkoly-Thege út 29-33, 1121, Budapest, Hungary
- Department of Automation and Applied Informatics, Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, Műegyetem Rkp. 3., 1111, Budapest, Hungary
| | - Nicolett Kanyo
- Nanobiosensorics Laboratory, Institute of Technical Physics and Materials Science MFA, HUN-REN Centre for Energy Research, Konkoly-Thege út 29-33, 1121, Budapest, Hungary
| | - Balint Szabó
- Department of Biological Physics, Eötvös University, Budapest, Hungary
- Cellsorter Kft., Budapest, Hungary
| | - Beatrix Peter
- Nanobiosensorics Laboratory, Institute of Technical Physics and Materials Science MFA, HUN-REN Centre for Energy Research, Konkoly-Thege út 29-33, 1121, Budapest, Hungary
| | - Szilvia Bősze
- HUN-REN-ELTE Research Group of Peptide Chemistry, Hungarian Research Network, Eötvös Loránd University, 1117, Budapest, Hungary
| | - Inna Szekacs
- Nanobiosensorics Laboratory, Institute of Technical Physics and Materials Science MFA, HUN-REN Centre for Energy Research, Konkoly-Thege út 29-33, 1121, Budapest, Hungary
| | - Robert Horvath
- Nanobiosensorics Laboratory, Institute of Technical Physics and Materials Science MFA, HUN-REN Centre for Energy Research, Konkoly-Thege út 29-33, 1121, Budapest, Hungary.
| |
|
233
|
Adediran GA, Cox R, Jürgens MD, Morel E, Cross R, Carter H, Pereira MG, Read DS, Johnson AC. Fate and behaviour of Microplastics (> 25µm) within the water distribution network, from water treatment works to service reservoirs and customer taps. WATER RESEARCH 2024; 255:121508. [PMID: 38552487 DOI: 10.1016/j.watres.2024.121508] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Revised: 03/11/2024] [Accepted: 03/22/2024] [Indexed: 04/24/2024]
Abstract
Water treatment works have previously shown high efficiency in removing microplastics > 25 µm from raw source water. However, what is less well known is the extent to which microplastics of this size class are generated or lost within the water distribution network, particularly whether there is a greater presence in the customer tap than in the water treatment works outlet. This study focused on the presence of 21 different types of synthetic polymer particles with sizes larger than 25 µm examined through multiple rounds of sampling at outlets of water treatment works (WTW), service reservoirs (SR), and customer taps (CT) managed by seven different water companies in Britain. Nineteen different types of polymers were detected; their signature and concentration varied based on the round of sampling, the location within the water supply network, and the water company responsible for managing the supply. Among the polymers examined, polyamide (PA), polyethylene terephthalate (PET), polypropylene (PP), and polystyrene (PS) were the most commonly found. Apart from PET having its highest concentration of 0.0189 microplastics per litre (MP/L) in the SR, the concentrations of the other three most frequent polymers (PS = 0.017 MP/L, PA = 0.0752 MP/L, PP = 0.1513 MP/L) were highest in the CT. The overall prevalence of this size of microplastics in the network is low, but there was a high variability of polymer types and occurrences. These spatial and temporal variations suggested that the MP in the distribution network may exist as a series of pulses. Given the presence and polymer types, the potential for some of the microplastics to originate from materials used in the water network and domestic plumbing systems cannot be ruled out. As found before, the absolute number of microplastics in the water distribution network remained extremely low.
Affiliation(s)
- Gbotemi A Adediran
- UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire OX10 8BB, UK.
| | - Ruairidh Cox
- UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire OX10 8BB, UK
| | - Monika D Jürgens
- UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire OX10 8BB, UK
| | - Elise Morel
- UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire OX10 8BB, UK
| | - Richard Cross
- UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire OX10 8BB, UK
| | - Heather Carter
- UK Centre for Ecology & Hydrology, Lancaster Environment Centre, Library Avenue, Bailrigg, Lancaster LA1 4AP, UK
| | - M Glória Pereira
- UK Centre for Ecology & Hydrology, Lancaster Environment Centre, Library Avenue, Bailrigg, Lancaster LA1 4AP, UK
| | - Daniel S Read
- UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire OX10 8BB, UK
| | - Andrew C Johnson
- UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire OX10 8BB, UK
| |
|
234
|
Cortesi M, Giordano E. Driving cell response through deep learning, a study in simulated 3D cell cultures. Heliyon 2024; 10:e29395. [PMID: 38699000 PMCID: PMC11063986 DOI: 10.1016/j.heliyon.2024.e29395] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/17/2023] [Revised: 03/27/2024] [Accepted: 04/08/2024] [Indexed: 05/05/2024] Open
Abstract
Computational simulations are becoming increasingly relevant in biomedical research, providing strategies to reproduce experimental results, improve the resolution of in-vitro experiments, and predict the system's behavior in untested conditions. Their use to determine the features associated with an extensive response to treatment and optimize treatment schedules has, however, received little attention. To bridge this gap, we propose a deep learning framework capable of reliably classifying simulated time series data and identifying class-defining features. This information will be shown to be useful for determining which changes in treatment schedule elicit a more extensive cellular response. This analysis pipeline will be initially tested on a synthetic dataset created ad hoc to assess its accuracy in identifying the most relevant portion of the signals. Subsequently, this method will be applied to simulations describing the behaviors of populations of cancer cells treated with either one or two drugs in different concentrations. The proposed method will be shown to be effective in identifying which changes in the treatment protocol lead to a more extensive response to treatment. While lacking direct experimental validation, this result holds great potential for the integration of in-silico and in-vitro analyses and the effective optimization of experimental conditions in complex experimental setups.
Affiliation(s)
- Marilisa Cortesi
- Department of Electrical, Electronic and Information Engineering ”G.Marconi” (DEI), Alma Mater Studiorum – University of Bologna, via dell'Università 50, Cesena, 47521, FC, Italy
- Gynaecological Cancer Research Group, School of Clinical Medicine, University of New South Wales, High Street, Kensington, 2033, NSW, Australia
| | - Emanuele Giordano
- Department of Electrical, Electronic and Information Engineering ”G.Marconi” (DEI), Alma Mater Studiorum – University of Bologna, via dell'Università 50, Cesena, 47521, FC, Italy
| |
|
235
|
Zhang X, Teng X, Zhang J, Lai Q, Cai J. Enhancing pathological complete response prediction in breast cancer: the role of dynamic characterization of DCE-MRI and its association with tumor heterogeneity. Breast Cancer Res 2024; 26:77. [PMID: 38745321 PMCID: PMC11094888 DOI: 10.1186/s13058-024-01836-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2024] [Accepted: 05/07/2024] [Indexed: 05/16/2024] Open
Abstract
BACKGROUND Early prediction of pathological complete response (pCR) is important for deciding appropriate treatment strategies for patients. In this study, we aimed to quantify the dynamic characteristics of dynamic contrast-enhanced magnetic resonance images (DCE-MRI) and investigate their value for improving pCR prediction as well as their association with tumor heterogeneity in breast cancer patients. METHODS The DCE-MRI, clinicopathologic record, and full transcriptomic data of 785 breast cancer patients receiving neoadjuvant chemotherapy were retrospectively included from a public dataset. Dynamic features of DCE-MRI were computed from extracted phase-varying radiomic feature series using 22 CAnonical Time-series CHaracteristics. A dynamic model and a radiomic model were developed by logistic regression using dynamic features and traditional radiomic features, respectively. Various combined models with clinical factors were also developed to find the optimal combination, and the significance of each component was evaluated. All the models were evaluated on an independent test set in terms of area under the receiver operating characteristic curve (AUC). To explore the potential underlying biological mechanisms, radiogenomic analysis was implemented on patient subgroups stratified by the dynamic model to identify differentially expressed genes (DEGs) and enriched pathways. RESULTS A 10-feature dynamic model and a 4-feature radiomic model were developed (AUC = 0.688, 95%CI: 0.635-0.741 and AUC = 0.650, 95%CI: 0.595-0.705) and tested (AUC = 0.686, 95%CI: 0.594-0.778 and AUC = 0.626, 95%CI: 0.529-0.722), with the dynamic model showing slightly higher AUC (train p = 0.181, test p = 0.222). The combined model of clinical, radiomic, and dynamic achieved the highest AUC in pCR prediction (train: 0.769, 95%CI: 0.722-0.816 and test: 0.762, 95%CI: 0.679-0.845).
Compared with the clinical-radiomic combined model (train AUC = 0.716, 95%CI: 0.665-0.767 and test AUC = 0.695, 95%CI: 0.656-0.714), adding the dynamic component brought significant improvement in model performance (train p < 0.001 and test p = 0.005). Radiogenomic analysis identified 297 DEGs, including CXCL9, CCL18, and HLA-DPB1, which are known to be associated with breast cancer prognosis or angiogenesis. Gene set enrichment analysis further revealed enrichment of gene ontology terms and pathways related to the immune system. CONCLUSION Dynamic characteristics of DCE-MRI were quantified and used to develop a dynamic model for improving pCR prediction in breast cancer patients. The dynamic model was associated with tumor heterogeneity in prognostic-related gene expression and immune-related pathways.
Affiliation(s)
- Xinyu Zhang
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Xinzhi Teng
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Jiang Zhang
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Qingpei Lai
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China
| | - Jing Cai
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China.
- The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China.
| |
|
236
|
García Sánchez N, Ugarte Carro E, Prieto-Santamaría L, Rodríguez-González A. Protein sequence analysis in the context of drug repurposing. BMC Med Inform Decis Mak 2024; 24:122. [PMID: 38741115 DOI: 10.1186/s12911-024-02531-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2023] [Accepted: 05/08/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION Drug repurposing speeds up the development of new treatments, being less costly, risky, and time consuming than de novo drug discovery. There are numerous biological elements that contribute to the development of diseases and, as a result, to the repurposing of drugs. METHODS In this article, we analysed the potential role of protein sequences in drug repurposing scenarios. For this purpose, we embedded the protein sequences using four state-of-the-art methods and validated their capacity to encapsulate essential biological information through visualization. Then, we compared the differences in sequence distance between protein-drug target pairs in drug repurposing and non-repurposing data. Thus, we were able to uncover patterns that define protein sequences in repurposing cases. RESULTS We found statistically significant sequence distance differences between protein pairs in the repurposing data and the rest of the protein pairs in non-repurposing data. In this manner, we verified the potential of using numerical representations of sequences to generate repurposing hypotheses in the future.
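The abstract does not say which distance was used between embedded sequences. As a minimal sketch, assuming cosine distance between two fixed-length embedding vectors (the function name and the toy vectors are hypothetical, not taken from the article), a pairwise sequence distance could be computed as follows:

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Toy 2-dimensional "embeddings" (assumed values, for illustration only).
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
parallel = cosine_distance([1.0, 2.0], [2.0, 4.0])
```

Distances computed this way for repurposing and non-repurposing protein pairs could then be compared with a statistical test of the analyst's choice.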
Affiliation(s)
- Natalia García Sánchez
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
| | - Esther Ugarte Carro
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
| | - Lucía Prieto-Santamaría
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain
| | - Alejandro Rodríguez-González
- Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, 28223, Spain.
- ETS de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, 28660, Spain.
| |
|
237
|
Li J, Xu F, Song S, Qi J. A maize seed variety identification method based on improving deep residual convolutional network. FRONTIERS IN PLANT SCIENCE 2024; 15:1382715. [PMID: 38803603 PMCID: PMC11128617 DOI: 10.3389/fpls.2024.1382715] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/06/2024] [Accepted: 04/19/2024] [Indexed: 05/29/2024]
Abstract
Seed quality and safety are related to national food security, and seed variety purity is an essential indicator in seed quality detection. This study established a maize seed dataset comprising 5877 images of six different types and proposed a maize seed recognition model based on an improved ResNet50 framework. Firstly, we introduced the ResStage structure in the early stage of the original model, which facilitated the network's learning process and enabled more efficient information propagation across the network layers. Meanwhile, in the later residual blocks of the model, we introduced both the efficient channel attention (ECA) mechanism and depthwise separable (DS) convolution, which reduced the model's parameter cost and enabled the capturing of more precise and detailed features. Finally, a Swish-PReLU mixed activation function was introduced globally to improve the overall predictive power of the model. The results showed that our model achieved an impressive accuracy of 91.23% in corn seed classification, surpassing other related models. Compared with the original model, our model improved the accuracy by 7.07%, reduced the loss value by 0.19, and decreased the number of parameters by 40%. The research suggested that this method can efficiently classify corn seeds, holding significant value in seed variety identification.
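To illustrate why depthwise separable (DS) convolution lowers the parameter cost, the sketch below counts the weights of a single layer. The channel widths and kernel size are illustrative assumptions, not values from the article, and the 40% reduction reported above refers to the whole model rather than to one layer.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return k * k * c_in * c_out


def ds_conv_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Assumed layer shape: 256 input channels, 256 output channels, 3 x 3 kernel.
standard = conv_params(256, 256, 3)
separable = ds_conv_params(256, 256, 3)
reduction = 1 - separable / standard
print(standard, separable, round(reduction, 3))
```

For this assumed layer the separable form needs roughly one ninth of the weights, because the spatial filtering and the channel mixing are factored into two cheaper steps.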
Affiliation(s)
- Jian Li
- College of Information Technology, Jilin Agricultural University, Changchun, China
- College of Information Technology, Jilin Bioinformatics Research Center, Changchun, China
| | - Fan Xu
- College of Information Technology, Jilin Agricultural University, Changchun, China
- College of Information Technology, Jilin Bioinformatics Research Center, Changchun, China
| | - Shaozhong Song
- School of Data Science and Artificial Intelligence, Jilin Engineering Normal University, Changchun, China
| | - Ji Qi
- College of Engineering Technical, Jilin Agricultural University, Changchun, China
| |
|
238
|
Boussina A, Langouche L, Obirieze AC, Sinha M, Mack H, Leineweber W, Aralar A, Pride DT, Coleman TP, Fraley SI. Machine learning based DNA melt curve profiling enables automated novel genotype detection. BMC Bioinformatics 2024; 25:185. [PMID: 38730317 PMCID: PMC11088152 DOI: 10.1186/s12859-024-05747-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2023] [Accepted: 03/14/2024] [Indexed: 05/12/2024] Open
Abstract
Surveillance for genetic variation of microbial pathogens, both within and among species, plays an important role in informing research, diagnostic, prevention, and treatment activities for disease control. However, large-scale systematic screening for novel genotypes remains challenging in part due to technological limitations. Towards addressing this challenge, we present an advancement in universal microbial high resolution melting (HRM) analysis that is capable of accomplishing both known genotype identification and novel genotype detection. Specifically, this novel surveillance functionality is achieved through time-series modeling of sequence-defined HRM curves, which is uniquely enabled by the large-scale melt curve datasets generated using our high-throughput digital HRM platform. Taking the detection of bacterial genotypes as a model application, we demonstrate that our algorithms accomplish an overall classification accuracy over 99.7% and perform novelty detection with a sensitivity of 0.96, specificity of 0.96 and Youden index of 0.92. Since HRM-based DNA profiling is an inexpensive and rapid technique, our results add support for the feasibility of its use in surveillance applications.
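The reported Youden index follows directly from the stated sensitivity and specificity, since J = sensitivity + specificity - 1. A short check (the function name is a hypothetical helper):

```python
def youden_index(sensitivity, specificity):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Values quoted in the abstract for novelty detection.
j = youden_index(0.96, 0.96)
print(round(j, 2))  # 0.92
```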
Affiliation(s)
- Aaron Boussina
- Division of Biomedical Informatics, University of California San Diego, La Jolla, CA, 92093, USA
| | - Lennart Langouche
- Department of Nanoengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Augustine C Obirieze
- Department of Nanoengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Mridu Sinha
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - Hannah Mack
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - William Leineweber
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - April Aralar
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA
| | - David T Pride
- Department of Pathology, University of California San Diego, La Jolla, CA, 92093, USA
| | - Todd P Coleman
- Department of Bioengineering, Stanford University, Stanford, CA, 94305, USA.
| | - Stephanie I Fraley
- Department of Bioengineering, University of California San Diego, La Jolla, CA, 92093, USA.
| |
|
239
|
Peng H, Wang H, Kong W, Li J, Goh WWB. Optimizing differential expression analysis for proteomics data via high-performing rules and ensemble inference. Nat Commun 2024; 15:3922. [PMID: 38724498 PMCID: PMC11082229 DOI: 10.1038/s41467-024-47899-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2023] [Accepted: 04/16/2024] [Indexed: 05/12/2024] Open
Abstract
Identification of differentially expressed proteins in a proteomics workflow typically encompasses five key steps: raw data quantification, expression matrix construction, matrix normalization, missing value imputation (MVI), and differential expression analysis. The plethora of options in each step makes it challenging to identify optimal workflows that maximize the identification of differentially expressed proteins. To identify optimal workflows and their common properties, we conduct an extensive study involving 34,576 combinatoric experiments on 24 gold standard spike-in datasets. Applying frequent pattern mining techniques to top-ranked workflows, we uncover high-performing rules that demonstrate optimality has conserved properties. Via machine learning, we confirm optimal workflows are indeed predictable, with average cross-validation F1 scores and Matthews correlation coefficients surpassing 0.84. We introduce an ensemble inference to integrate results from individual top-performing workflows for expanding differential proteome coverage and resolving inconsistencies. Ensemble inference provides gains in pAUC (up to 4.61%) and G-mean (up to 11.14%) and facilitates effective aggregation of information across varied quantification approaches such as topN, directLFQ, MaxLFQ intensities, and spectral counts. However, further development and evaluation are needed to establish acceptable frameworks for conducting ensemble inference on multiple proteomics workflows.
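The scores named above can be computed from a confusion matrix. The sketch below uses the standard definitions of F1 and the Matthews correlation coefficient, and assumes the common definition of G-mean as the geometric mean of sensitivity and specificity; the abstract does not spell out its exact formula, and the counts are hypothetical.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical confusion-matrix counts for one workflow.
tp, fp, tn, fn = 40, 10, 45, 5
print(f1_score(tp, fp, fn), mcc(tp, fp, tn, fn), g_mean(tp, fp, tn, fn))
```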
Affiliation(s)
- Hui Peng
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - He Wang
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Weijia Kong
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Jinyan Li
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China.
| | - Wilson Wen Bin Goh
- Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore, Singapore.
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore.
- Center for Biomedical Informatics, Nanyang Technological University, Singapore, Singapore.
- Center of AI in Medicine, Nanyang Technological University, Singapore, Singapore.
- Division of Neurology, Department of Brain Sciences, Faculty of Medicine, Imperial College London, London, UK.
| |
|
240
|
Cavallaro L, De Meo P, Fiumara G, Liotta A. On the sensitivity of centrality metrics. PLoS One 2024; 19:e0299255. [PMID: 38722923 PMCID: PMC11081296 DOI: 10.1371/journal.pone.0299255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2023] [Accepted: 02/07/2024] [Indexed: 05/13/2024] Open
Abstract
Despite the huge importance that centrality metrics have in understanding the topology of a network, too little is known about the effects that small alterations in the topology of the input graph induce in the norm of the vector that stores the node centralities. If these effects are small, it could be possible to avoid re-calculating the vector of centrality metrics when minimal changes occur in the network topology, which would allow for significant computational savings. Hence, after formalising the notion of centrality, three of the most basic metrics were herein considered (i.e., Degree, Eigenvector, and Katz centrality). To perform the simulations, two probabilistic failure models were used to describe alterations in network topology: Uniform (i.e., all nodes can be independently deleted from the network with a fixed probability) and Best Connected (i.e., the probability a node is removed depends on its degree). Our analysis suggests that small variations in the topology of the input graph determine small variations in Degree centrality, independently of the topological features of the input graph; conversely, both Eigenvector and Katz centralities can be extremely sensitive to changes in the topology of the input graph. In other words, if the input graph has some specific features, even small changes in the topology of the input graph can have catastrophic effects on the Eigenvector or Katz centrality.
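The experiment described above can be sketched on a toy graph: apply the Uniform failure model, then compare the norm of the centrality vector before and after. This is a minimal illustration under assumptions, not the authors' simulation code; the graph, the attenuation factor `alpha`, the removal probability, and all function names are hypothetical, and only Degree and Katz centrality are covered.

```python
import math
import random


def degree_centrality(adj):
    """Degree of each node in an undirected graph stored as {node: set(neighbours)}."""
    return {v: float(len(nbrs)) for v, nbrs in adj.items()}


def katz_centrality(adj, alpha=0.1, beta=1.0, iters=200):
    """Fixed-point iteration x = alpha * A x + beta; converges when alpha < 1 / lambda_max."""
    x = {v: 1.0 for v in adj}
    for _ in range(iters):
        x = {v: alpha * sum(x[u] for u in adj[v]) + beta for v in adj}
    return x


def uniform_failure(adj, p, rng):
    """Delete each node independently with probability p, dropping its edges too."""
    kept = {v for v in adj if rng.random() >= p}
    return {v: adj[v] & kept for v in kept}


def norm(centrality):
    """Euclidean norm of a centrality vector stored as a dict."""
    return math.sqrt(sum(c * c for c in centrality.values()))


def relative_norm_change(metric, before, after):
    """Relative change in the centrality-vector norm between two graphs."""
    n0 = norm(metric(before))
    return abs(norm(metric(after)) - n0) / n0


# Hypothetical 6-node ring with one chord (0-3); the maximum degree is 3, so alpha = 0.1 is safe.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
ring[0].add(3)
ring[3].add(0)

damaged = uniform_failure(ring, p=0.2, rng=random.Random(0))
for name, metric in [("degree", degree_centrality), ("katz", katz_centrality)]:
    print(name, relative_norm_change(metric, ring, damaged))
```

A Best Connected variant would replace the fixed probability `p` with one that depends on each node's degree.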
Affiliation(s)
- Lucia Cavallaro
- Institute for Computing and Information Sciences, Radboud University, Nijmegen, The Netherlands
| | | | | | - Antonio Liotta
- Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
| |
|
241
|
Chang TL, Xia H, Mahajan S, Mahajan R, Maisog J, Vattikuti S, Chow CC, Chang JC. Interpretable (not just posthoc-explainable) medical claims modeling for discharge placement to reduce preventable all-cause readmissions or death. PLoS One 2024; 19:e0302871. [PMID: 38722929 PMCID: PMC11081343 DOI: 10.1371/journal.pone.0302871] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2023] [Accepted: 04/15/2024] [Indexed: 05/13/2024] Open
Abstract
We developed an inherently interpretable multilevel Bayesian framework for representing variation in regression coefficients that mimics the piecewise linearity of ReLU-activated deep neural networks. We used the framework to formulate a survival model for using medical claims to predict hospital readmission and death that focuses on discharge placement, adjusting for confounding in estimating causal local average treatment effects. We trained the model on a 5% sample of Medicare beneficiaries from 2008 and 2011, based on their 2009-2011 inpatient episodes (approximately 1.2 million), and then tested the model on 2012 episodes (approximately 400 thousand). The model scored an out-of-sample AUROC of approximately 0.75 on predicting all-cause readmissions, defined using official Centers for Medicare and Medicaid Services (CMS) methodology, or death within 30 days of discharge, being competitive against XGBoost and a Bayesian deep neural network, demonstrating that one need not sacrifice interpretability for accuracy. Crucially, as a regression model, it provides what black boxes cannot: its exact gold-standard global interpretation, explicitly defining how the model performs its internal "reasoning" for mapping the input data features to predictions. In doing so, we identify relative risk factors and quantify the effect of discharge placement. We also show that the posthoc explainer SHAP provides explanations that are inconsistent with the ground truth model reasoning that our model readily admits.
Affiliation(s)
- Ted L. Chang
- Sound Prediction Inc., Columbus, OH, United States of America
- Mederrata Research Inc., Columbus, OH, United States of America
| | - Hongjing Xia
- Sound Prediction Inc., Columbus, OH, United States of America
- Mederrata Research Inc., Columbus, OH, United States of America
| | - Sonya Mahajan
- Sound Prediction Inc., Columbus, OH, United States of America
- Mederrata Research Inc., Columbus, OH, United States of America
| | - Rohit Mahajan
- Sound Prediction Inc., Columbus, OH, United States of America
- Mederrata Research Inc., Columbus, OH, United States of America
| | - Joe Maisog
- Lee Health, Fort Myers, FL, United States of America
| | - Shashaank Vattikuti
- Sleep Research Center, Walter Reed Army Institute of Research, Silver Spring, MD, United States of America
| | - Carson C. Chow
- Mederrata Research Inc., Columbus, OH, United States of America
- Laboratory of Biological Modeling, NIDDK, National Institutes of Health, Bethesda, MD, United States of America
| | - Joshua C. Chang
- Sound Prediction Inc., Columbus, OH, United States of America
- Mederrata Research Inc., Columbus, OH, United States of America
- Epidemiology and Biostatistics Section, Rehabilitation Medicine Department, The National Institutes of Health, Bethesda, MD, United States of America
|
242
|
Rui M, Rosa F, Viberti A, Brun F, Massaglia S, Blanc S. Understanding Factors Associated with Interest in Sustainability-Certified Wine among American and Italian Consumers. Foods 2024; 13:1468. [PMID: 38790768 PMCID: PMC11120048 DOI: 10.3390/foods13101468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2024] [Revised: 05/03/2024] [Accepted: 05/08/2024] [Indexed: 05/26/2024] Open
Abstract
The wine industry has been witnessing a growth in businesses crafting sustainability-certified wines and in the attention of consumers to sustainability, especially in the United States and Italy. To identify the characteristics of consumers who prefer sustainability-certified wine, this study analysed the relationship between consumers' demographics, wine buying behaviour, and interest in sustainability-certified wine, focusing on these two countries for comparison. Data were collected through an online survey of US and Italian consumers. Through correspondence analysis, k-modes clustering analysis, and multi-way correspondence analysis, this study revealed a stronger relationship between demographics and interest in sustainability-certified wine among US consumers than Italian consumers. In particular, middle-aged US consumers exhibited a greater interest than seniors. The patterns of connections between consumers' wine buying behaviour and interest in sustainable wine were similar for the two countries. In particular, consumers who purchase wine weekly had a keen interest, and those who purchase wine sporadically had no or little interest. Furthermore, this study uncovered the intricate relationship among various variables, providing a comprehensive understanding of the association between wine consumer characteristics and their interest in sustainability-certified wine.
Affiliation(s)
- Mingze Rui
- Department of Agricultural, Forest, and Food Sciences, University of Turin, 10095 Grugliasco, Italy; (M.R.); (F.B.); (S.M.)
| | | | | | - Filippo Brun
- Department of Agricultural, Forest, and Food Sciences, University of Turin, 10095 Grugliasco, Italy; (M.R.); (F.B.); (S.M.)
| | - Stefano Massaglia
- Department of Agricultural, Forest, and Food Sciences, University of Turin, 10095 Grugliasco, Italy; (M.R.); (F.B.); (S.M.)
- Centro Interdipartimentale Viticoltura e Vino (CONViVi), University of Turin, 12051 Alba, Italy
| | - Simone Blanc
- Department of Agricultural, Forest, and Food Sciences, University of Turin, 10095 Grugliasco, Italy; (M.R.); (F.B.); (S.M.)
|
243
|
Stańczyk U, Zielosko B, Baron G. Importance of Characteristic Features and Their Form for Data Exploration. ENTROPY (BASEL, SWITZERLAND) 2024; 26:404. [PMID: 38785653 PMCID: PMC11119179 DOI: 10.3390/e26050404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 04/27/2024] [Accepted: 05/03/2024] [Indexed: 05/25/2024]
Abstract
The nature of the input features is one of the key factors indicating what kind of tools, methods, or approaches can be used in a knowledge discovery process. Depending on the characteristics of the available attributes, some techniques may perform unsatisfactorily or may not run at all without additional preprocessing steps. The types of variables and their domains affect performance. Any changes to their form can influence it as well, or even enable some learners. On the other hand, the relevance of features for a task constitutes another element with a noticeable impact on data exploration. The importance of attributes can be estimated through the application of mechanisms belonging to the feature selection and reduction area, such as rankings. In the described research framework, the data form was conditioned on relevance by the proposed procedure of gradual discretisation controlled by a ranking of attributes. Supervised and unsupervised discretisation methods were applied to datasets from the stylometric domain for the task of binary authorship attribution. Extensive tests were performed for the selected classifiers, and they indicated many cases of enhanced prediction for partially discretised datasets.
Affiliation(s)
- Urszula Stańczyk
- Department of Computer Graphics, Vision and Digital Systems, Silesian University of Technology, Akademicka 2A, 44-100 Gliwice, Poland;
| | - Beata Zielosko
- Institute of Computer Science, University of Silesia in Katowice, Bȩdzińska 39, 41-200 Sosnowiec, Poland;
| | - Grzegorz Baron
- Department of Computer Graphics, Vision and Digital Systems, Silesian University of Technology, Akademicka 2A, 44-100 Gliwice, Poland;
|
244
|
Shen Z, Zhang F, Guo Z, Qu R, Wei Y, Wang J, Zhang W, Xing X, Zhang Y, Liu J, Tang D. Association between air pollution and male sexual function: A nationwide observational study in China. JOURNAL OF HAZARDOUS MATERIALS 2024; 469:134010. [PMID: 38492404 DOI: 10.1016/j.jhazmat.2024.134010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 03/05/2024] [Accepted: 03/09/2024] [Indexed: 03/18/2024]
Abstract
This study aimed to explore the associations between air pollution and male sexual function. A total of 5047 male subjects in China were included in this study. The average air pollution exposure (PM2.5, PM10, SO2, CO, NO2, and O3) was assessed for the 1, 3, 6, and 12 months preceding the participants' response. Male sexual function was evaluated using the International Index of Erectile Function-5 (IIEF-5) and the Premature Ejaculation Diagnostic Tool (PEDT). Generalized linear models were utilized to explore the associations between air pollution and male sexual function. The K-prototype algorithm was used to identify associations among specific populations. Significant adverse effects on the IIEF-5 score were observed with NO2 exposure during the preceding 1, 3, and 6 months (1 m: β = -5.26E-05; 3 m: β = -4.83E-05; 6 m: β = -4.23E-05, P < 0.05). PM2.5 exposure during the preceding 12 months was found to significantly negatively affect the PEDT score after adjusting for confounding variables. Our research indicated negative correlations between air pollutant exposures and male sexual function for the first time. Furthermore, these associations were more pronounced among participants who maintained a normal BMI, exhibited extroverted traits, and currently smoked and consumed alcohol.
Affiliation(s)
- Ziyuan Shen
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei 230032, Anhui, China
| | - Feng Zhang
- Reproductive Medical Center, Renmin Hospital of Wuhan University, Wuhan 430060, Hubei, China
| | - Zihan Guo
- Reproductive Medical Center, Renmin Hospital of Wuhan University, Wuhan 430060, Hubei, China
| | - Rui Qu
- Reproductive Medical Center, Renmin Hospital of Wuhan University, Wuhan 430060, Hubei, China
| | - Yiqiu Wei
- Reproductive Medical Center, Renmin Hospital of Wuhan University, Wuhan 430060, Hubei, China
| | - Jingxuan Wang
- Reproductive Medical Center, Renmin Hospital of Wuhan University, Wuhan 430060, Hubei, China
| | - Weiqian Zhang
- Reproductive Medical Center, Renmin Hospital of Wuhan University, Wuhan 430060, Hubei, China
| | - Xing Xing
- Department of Epidemiology and Biostatistics, School of Public Health, Anhui Medical University, Hefei 230032, Anhui, China
| | - Yan Zhang
- Department of Clinical Laboratory, Renmin Hospital of Wuhan University, Wuhan 430060, Hubei, China.
| | - Jue Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China; Institute for Global Health and Development, Peking University, Beijing 100871, China; Ministry of Education, Key Laboratory of Epidemiology of Major Diseases, Peking University, Beijing 100083, China.
| | - Dongdong Tang
- Reproductive Medicine Center, Department of Obstetrics and Gynecology, the First Affiliated Hospital of Anhui Medical University, Hefei 230022, China; NHC Key Laboratory of Study on Abnormal Gametes and Reproductive Tract, Anhui Medical University, Hefei 230032, China; Key Laboratory of Population Health Across Life Cycle (Anhui Medical University), Ministry of Education of the People's Republic of China, Hefei 230032, China.
|
245
|
Liu J, Duan Z, Hu X, Zhong J, Yin Y. Detracking Autoencoding Conditional Generative Adversarial Network: Improved Generative Adversarial Network Method for Tabular Missing Value Imputation. ENTROPY (BASEL, SWITZERLAND) 2024; 26:402. [PMID: 38785651 PMCID: PMC11120050 DOI: 10.3390/e26050402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/20/2024] [Accepted: 04/21/2024] [Indexed: 05/25/2024]
Abstract
Due to various reasons, such as limitations in data collection and interruptions in network transmission, gathered data often contain missing values. Existing state-of-the-art generative adversarial imputation methods face three main issues: limited applicability, neglect of latent categorical information that could reflect relationships among samples, and an inability to balance local and global information. We propose a novel generative adversarial model named DTAE-CGAN that incorporates detracking autoencoding and conditional labels to address these issues. This enhances the network's ability to learn inter-sample correlations and makes full use of all data information in incomplete datasets, rather than learning random noise. We conducted experiments on six real datasets of varying sizes, comparing our method with four classic imputation baselines. The results demonstrate that our proposed model consistently exhibited superior imputation accuracy.
Affiliation(s)
- Jingrui Liu
- College of Computer Science, Chongqing University, Chongqing 400044, China
- Chongqing University-University of Cincinnati Joint Co-op Institute, Chongqing University, Chongqing 400044, China
| | - Zixin Duan
- School of Pharmaceutical Sciences, Chongqing University, Chongqing 401331, China
| | - Xinkai Hu
- College of Computer Science, Chongqing University, Chongqing 400044, China
| | - Jingxuan Zhong
- College of Mechanical and Vehicle Engineering, Chongqing University, Chongqing 400044, China
| | - Yunfei Yin
- College of Computer Science, Chongqing University, Chongqing 400044, China
|
246
|
Brookshire G, Kasper J, Blauch NM, Wu YC, Glatt R, Merrill DA, Gerrol S, Yoder KJ, Quirk C, Lucero C. Data leakage in deep learning studies of translational EEG. Front Neurosci 2024; 18:1373515. [PMID: 38765672 PMCID: PMC11099244 DOI: 10.3389/fnins.2024.1373515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Accepted: 04/04/2024] [Indexed: 05/22/2024] Open
Abstract
A growing number of studies apply deep neural networks (DNNs) to recordings of human electroencephalography (EEG) to identify a range of disorders. In many studies, EEG recordings are split into segments, and each segment is randomly assigned to the training or test set. As a consequence, data from individual subjects appears in both the training and the test set. Could high test-set accuracy reflect data leakage from subject-specific patterns in the data, rather than patterns that identify a disease? We address this question by testing the performance of DNN classifiers using segment-based holdout (in which segments from one subject can appear in both the training and test set), and comparing this to their performance using subject-based holdout (where all segments from one subject appear exclusively in either the training set or the test set). In two datasets (one classifying Alzheimer's disease, and the other classifying epileptic seizures), we find that performance on previously-unseen subjects is strongly overestimated when models are trained using segment-based holdout. Finally, we survey the literature and find that the majority of translational DNN-EEG studies use segment-based holdout. Most published DNN-EEG studies may dramatically overestimate their classification performance on new subjects.
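The difference between the two holdout schemes described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the toy subject IDs, and the 25% test fraction are assumptions made purely for illustration.

```python
import random


def segment_based_split(segments, test_fraction=0.25, seed=0):
    """Shuffle individual segments, so one subject can land in both sets."""
    rng = random.Random(seed)
    shuffled = list(segments)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def subject_based_split(segments, test_fraction=0.25, seed=0):
    """Assign whole subjects to train or test, so no subject is in both."""
    rng = random.Random(seed)
    subjects = sorted({s["subject"] for s in segments})
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [s for s in segments if s["subject"] not in test_subjects]
    test = [s for s in segments if s["subject"] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Subjects whose segments appear in both sets (potential leakage)."""
    return {s["subject"] for s in train} & {s["subject"] for s in test}


# Toy data (hypothetical): 8 subjects with 10 EEG segments each.
segments = [{"subject": f"S{i}", "segment": j} for i in range(8) for j in range(10)]

train_seg, test_seg = segment_based_split(segments)
train_sub, test_sub = subject_based_split(segments)

print("segment-based, shared subjects:", sorted(shared_subjects(train_seg, test_seg)))
print("subject-based, shared subjects:", sorted(shared_subjects(train_sub, test_sub)))
```

With subject-based holdout the set of shared subjects is empty by construction, whereas segment-based holdout will typically place segments from the same subject on both sides of the split.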
Affiliation(s)
| | - Jake Kasper
- SPARK Neuro Inc., New York, NY, United States
| | - Nicholas M. Blauch
- SPARK Neuro Inc., New York, NY, United States
- Neuroscience Institute, Carnegie Mellon University, Pittsburgh, PA, United States
| | | | - Ryan Glatt
- Pacific Brain Health Center, Pacific Neuroscience Institute and Foundation, Santa Monica, CA, United States
| | - David A. Merrill
- Pacific Brain Health Center, Pacific Neuroscience Institute and Foundation, Santa Monica, CA, United States
- Saint John's Cancer Institute at Providence Saint John's Health Center, Santa Monica, CA, United States
- Psychiatry and Biobehavioral Sciences, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine at University of California, Los Angeles, Los Angeles, CA, United States
| | | | | | - Colin Quirk
- SPARK Neuro Inc., New York, NY, United States
| | - Ché Lucero
- SPARK Neuro Inc., New York, NY, United States
|
247
|
Höpfl S, Albadry M, Dahmen U, Herrmann KH, Kindler EM, König M, Reichenbach JR, Tautenhahn HM, Wei W, Zhao WT, Radde NE. Bayesian modelling of time series data (BayModTS): a FAIR workflow to process sparse and highly variable data. Bioinformatics 2024; 40:btae312. [PMID: 38741151 PMCID: PMC11128094 DOI: 10.1093/bioinformatics/btae312] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 04/11/2024] [Accepted: 05/13/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION: Systems biology aims to better understand living systems through mathematical modelling of experimental and clinical data. A pervasive challenge in quantitative dynamical modelling is the integration of time series measurements, which often have high variability and low sampling resolution. Approaches are required to utilize such information while consistently handling uncertainties. RESULTS: We present BayModTS (Bayesian modelling of time series data), a new FAIR (findable, accessible, interoperable, and reusable) workflow for processing and analysing sparse and highly variable time series data. BayModTS consistently transfers uncertainties from data to model predictions, including process knowledge via parameterized models. Further, credible differences in the dynamics of different conditions can be identified by filtering noise. To demonstrate the power and versatility of BayModTS, we applied it to three hepatic datasets gathered from three different species and with different measurement techniques: (i) blood perfusion measurements by magnetic resonance imaging in rat livers after portal vein ligation, (ii) pharmacokinetic time series of different drugs in normal and steatotic mice, and (iii) CT-based volumetric assessment of human liver remnants after clinical liver resection. AVAILABILITY AND IMPLEMENTATION: The BayModTS codebase is available on GitHub at https://github.com/Systems-Theory-in-Systems-Biology/BayModTS. The repository contains a Python script for the executable BayModTS workflow and a widely applicable SBML (systems biology markup language) model for retarded transient functions. In addition, all examples from the paper are included in the repository. Data and code of the application examples are stored on DaRUS: https://doi.org/10.18419/darus-3876. The raw MRI ROI voxel data were uploaded to DaRUS: https://doi.org/10.18419/darus-3878. The steatosis metabolite data are published on FairdomHub: 10.15490/fairdomhub.1.study.1070.1.
Affiliation(s)
- Sebastian Höpfl
- Institute for Stochastics and Applications, University of Stuttgart, 70569 Stuttgart, Germany
| | - Mohamed Albadry
- Experimental Transplantation Surgery, Department of General, Vascular and Visceral Surgery, University Hospital Jena, 07745 Jena, Germany
- Department of Pathology, Faculty of Veterinary Medicine, Menoufia University, Shebin Elkom, Menoufia, Egypt
| | - Uta Dahmen
- Experimental Transplantation Surgery, Department of General, Vascular and Visceral Surgery, University Hospital Jena, 07745 Jena, Germany
| | - Karl-Heinz Herrmann
- Medical Physics Group, Institute for Diagnostic and Interventional Radiology, University Hospital Jena, 07743 Jena, Germany
| | - Eva Marie Kindler
- Clinic for General, Visceral and Vascular Surgery, Jena University Hospital, 07747 Jena, Germany
| | - Matthias König
- Institute for Biology, Faculty of Life Sciences, Humboldt-University Berlin, 10115 Berlin, Germany
| | - Jürgen Rainer Reichenbach
- Medical Physics Group, Institute for Diagnostic and Interventional Radiology, University Hospital Jena, 07743 Jena, Germany
| | - Hans-Michael Tautenhahn
- Clinic for Visceral, Transplantation, Thoracic and Vascular Surgery, Leipzig University Hospital, 04103 Leipzig, Germany
| | - Weiwei Wei
- Experimental Transplantation Surgery, Department of General, Vascular and Visceral Surgery, University Hospital Jena, 07745 Jena, Germany
| | - Wan-Ting Zhao
- Medical Physics Group, Institute for Diagnostic and Interventional Radiology, University Hospital Jena, 07743 Jena, Germany
| | - Nicole Erika Radde
- Institute for Stochastics and Applications, University of Stuttgart, 70569 Stuttgart, Germany
|
248
|
Gombolay GY, Silva A, Schrum M, Gopalan N, Hallman-Cooper J, Dutt M, Gombolay M. Effects of explainable artificial intelligence in neurology decision support. Ann Clin Transl Neurol 2024; 11:1224-1235. [PMID: 38581138 PMCID: PMC11093252 DOI: 10.1002/acn3.52036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 02/20/2024] [Accepted: 02/27/2024] [Indexed: 04/08/2024] Open
Abstract
OBJECTIVE: Artificial intelligence (AI)-based decision support systems (DSS) are utilized in medicine, but their underlying decision-making processes are usually unknown. Explainable AI (xAI) techniques provide insight into DSS, but little is known about how to design xAI for clinicians. Here we investigate the impact of various xAI techniques on a clinician's interaction with an AI-based DSS in decision-making tasks as compared to a general population. METHODS: We conducted a randomized, blinded study in which members of the Child Neurology Society and American Academy of Neurology were compared to a general population. Participants received recommendations from a DSS via a random assignment of an xAI intervention (decision tree, crowd sourced agreement, case-based reasoning, probability scores, counterfactual reasoning, feature importance, templated language, and no explanations). Primary outcomes included test performance and perceived explainability, trust, and social competence of the DSS. Secondary outcomes included compliance, understandability, and agreement per question. RESULTS: We had 81 neurology participants and 284 participants in the general population. Decision trees were perceived as more explainable by the medical versus general population (P < 0.01) and as more explainable than probability scores within the medical population (P < 0.001). Increasing neurology experience and perceived explainability degraded performance (P = 0.0214). Performance was not predicted by xAI method but by perceived explainability. INTERPRETATION: xAI methods have different impacts on a medical versus general population; thus, xAI is not uniformly beneficial, and there is no one-size-fits-all approach. Further user-centered xAI research that targets clinicians and develops personalized DSS for them is needed.
Affiliation(s)
- Grace Y Gombolay
- Department of Pediatrics, Division of Neurology, Children's Healthcare of Atlanta, Emory University School of Medicine, Atlanta, GA, USA
| | - Andrew Silva
- Georgia Institute of Technology, Atlanta, GA, USA
| | | | | | - Jamika Hallman-Cooper
- Department of Pediatrics, Division of Neurology, Children's Healthcare of Atlanta, Emory University School of Medicine, Atlanta, GA, USA
| | - Monideep Dutt
- Department of Pediatrics, Division of Neurology, Children's Healthcare of Atlanta, Emory University School of Medicine, Atlanta, GA, USA
| | - Matthew Gombolay
- Department of Pediatrics, Division of Neurology, Children's Healthcare of Atlanta, Emory University School of Medicine, Atlanta, GA, USA
|
249
|
Jung M, Lee KO, Kim HR, Koh SB, Gim JA. Four modeling approaches to study restrictions on everyday life and social activities due to chronic diseases with consequences of suicidal behavior. J Psychiatr Res 2024; 173:355-362. [PMID: 38581904 DOI: 10.1016/j.jpsychires.2024.03.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 03/20/2024] [Accepted: 03/24/2024] [Indexed: 04/08/2024]
Abstract
The purpose of this study was to examine the association between disability in everyday life and social activities due to chronic diseases and suicidal ideation (SI), suicidal plan (SP), and suicidal attempt (SA), using the cross-sectional 2016-2018 dataset of the Korea National Health and Nutrition Examination Survey (KNHANES). Variables associated with SI, SP, and SA were identified through random forest (RF), decision tree, generalized linear model (GLM), and support vector machine (SVM) models, and the performance of each model is listed. A total of 17,323 (males: 7,530; females: 9,793) respondents from the KNHANES from 2016 to 2018 were included in the study. The relationships between disease-related restrictions on daily life and social activities and the three stages of suicidal behavior were analyzed using the R functions randomForest, ctree, glm, and ksvm (R version 4.2.0). The F1-score is a measure used to evaluate the performance of a model in binary classification. A score of 1 indicates good performance, whereas a score of 0 signifies poor performance. Disability in everyday life and social activities due to chronic diseases leads to suicidal behaviors. In our study, we examined the impact of limitations in daily living and social activities on suicidal behaviors among participants. Our findings revealed that for those experiencing such limitations, the odds ratios (ORs) for SI were 6.10 (95% CI: 3.99-9.34) for males and 2.61 (1.79-3.81) for females. The ORs for SP were 3.69 (2.36-5.78) for males and 3.94 (2.70-5.75) for females. Similarly, the ORs for SA were 5.04 (2.51-10.13) for males and 2.71 (1.48-4.98) for females, indicating a significant association between these limitations and increased suicidal behaviors, with differences observed between genders. These results underscore the necessity of addressing daily living and social activity restrictions when considering mental health interventions and suicide prevention strategies. For SA, the F1-scores of RF, GLM, and SVM were 0.8192, 0.6887, and 0.9687, respectively. Among the patients with chronic disease, those with sequelae, low incomes, and low levels of education had limitations in daily activities and social activities, which increased the likelihood of suicidal thoughts, planning, and attempts.
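For readers unfamiliar with the F1-score mentioned in this abstract, the following minimal Python sketch shows the standard definition: the harmonic mean of precision and recall for a binary classifier. The confusion-matrix counts are hypothetical and are not taken from the study, which used R rather than Python.

```python
def f1_score(tp, fp, fn):
    """F1 = 2 * precision * recall / (precision + recall).

    tp, fp, fn are counts of true positives, false positives,
    and false negatives for the positive class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
example = f1_score(tp=8, fp=2, fn=2)
print(round(example, 4))
```

A classifier with no false positives and no false negatives scores 1, and one that finds no true positives scores 0.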
Affiliation(s)
- Myoungjee Jung
- Division of Cancer Screening, National Cancer Center, South Korea
| | - Kwang Ok Lee
- Department of Nursing, Sangmyung University, South Korea
| | - Hae-Rim Kim
- Department of Statistics, University of Seoul, South Korea
| | - Sang-Baek Koh
- Institute of Genomic Cohort, Yonsei University Wonju College of Medicine, South Korea.
| | - Jeong-An Gim
- Department of Medical Science, Soonchunhyang University, South Korea.
|
250
|
Zhou X, Wang X. Memory and Communication Efficient Federated Kernel k-Means. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2024; 35:7114-7125. [PMID: 36315538 DOI: 10.1109/tnnls.2022.3213777] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
A federated kernel k-means (FedKKM) algorithm is developed in this article to conduct distributed clustering with low memory consumption on user devices. In FedKKM, a federated eigenvector approximation (FEA) algorithm is designed to iteratively determine the low-dimensional approximate vectors of the transformed feature vectors, using only low-dimensional random feature vectors. To maintain high communication efficiency in each iteration of FEA, a communication-efficient Lanczos algorithm (CELA) is further designed in FEA to reduce the communication cost. Based on the low-dimensional approximate vectors, the clustering result is obtained by leveraging a distributed linear k-means algorithm. A theoretical analysis shows that: 1) FEA has a convergence rate of O(1/T), where T is the number of iterations; 2) the scalability of FedKKM is not affected by the dataset size since the communication cost of FedKKM is independent of the number of users' data; and 3) FedKKM is a (1+ϵ) approximation algorithm. The experimental results show that FedKKM achieves clustering quality comparable to that of a centralized kernel k-means. Compared with state-of-the-art schemes, FedKKM reduces the memory consumption on user devices by up to 94% and also reduces the communication cost by more than 40%.
|