1
Bombina P, Tally D, Abrams ZB, Coombes KR. SillyPutty: Improved clustering by optimizing the silhouette width. PLoS One 2024; 19:e0300358. [PMID: 38848330 PMCID: PMC11161052 DOI: 10.1371/journal.pone.0300358] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 02/26/2024] [Indexed: 06/09/2024] Open
Abstract
Clustering is an important task in biomedical science, and it is widely believed that different data sets are best clustered using different algorithms. When choosing between clustering algorithms on the same data set, researchers typically rely on global measures of quality, such as the mean silhouette width, and overlook the fine details of clustering. However, the silhouette width actually computes scores that describe how well each individual element is clustered. Inspired by this observation, we developed a novel clustering method, called SillyPutty. Unlike existing methods, SillyPutty uses the silhouette width for individual elements as a tool to optimize the mean silhouette width. This shift in perspective allows for a more granular evaluation of clustering quality, potentially addressing limitations in current methodologies. To test the SillyPutty algorithm, we first simulated a series of data sets using the Umpire R package and then used real-world data from The Cancer Genome Atlas. Using these data sets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed. Availability: The SillyPutty R package can be downloaded from the Comprehensive R Archive Network (CRAN).
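Note: the per-element silhouette width mentioned in this abstract can be illustrated with a minimal Python sketch. The silhouette formula s(i) = (b - a) / max(a, b) is the standard definition; the `greedy_refine` step (move the worst-scoring element to its nearest other cluster) is only an assumed reading of "using individual silhouette widths to optimize the mean", not the SillyPutty package's actual implementation. All function names here are hypothetical.

```python
from math import dist


def mean_distances(points, labels, i):
    """Mean Euclidean distance from point i to each cluster, skipping i itself."""
    sums, counts = {}, {}
    for j, q in enumerate(points):
        if j == i:
            continue
        c = labels[j]
        sums[c] = sums.get(c, 0.0) + dist(points[i], q)
        counts[c] = counts.get(c, 0) + 1
    return {c: sums[c] / counts[c] for c in sums}


def silhouette_widths(points, labels):
    """Per-element silhouette width s(i) = (b - a) / max(a, b)."""
    widths = []
    for i in range(len(points)):
        means = mean_distances(points, labels, i)
        own = labels[i]
        others = [m for c, m in means.items() if c != own]
        if own not in means or not others:
            widths.append(0.0)  # singleton cluster, or only one cluster present
            continue
        a, b = means[own], min(others)
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths


def greedy_refine(points, labels, max_steps=100):
    """Assumed refinement loop: reassign the worst-scoring element each step."""
    labels = list(labels)
    for _ in range(max_steps):
        widths = silhouette_widths(points, labels)
        worst = min(range(len(points)), key=widths.__getitem__)
        if widths[worst] >= 0:
            break  # every element is, on average, at least as close to its own cluster
        means = mean_distances(points, labels, worst)
        del means[labels[worst]]          # consider only the other clusters
        labels[worst] = min(means, key=means.get)
    return labels
```

For example, with two well-separated groups of three points and one point deliberately mislabeled, the mislabeled point has a negative silhouette width and a single reassignment repairs the partition.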
Affiliation(s)
- Polina Bombina
  - Department of Biostatistics, Data Science and Epidemiology, Georgia Cancer Center at Augusta University, Augusta, GA, United States of America
- Dwayne Tally
  - Department of Informatics, Indiana University, United States of America
- Zachary B. Abrams
  - Division of Data Science and Biostatistics, Institute for Informatics, Washington University School of Medicine, Saint Louis, MO, United States of America
- Kevin R. Coombes
  - Department of Biostatistics, Data Science and Epidemiology, Georgia Cancer Center at Augusta University, Augusta, GA, United States of America
2
Martins GL, Ferreira DS, Carneiro CM, Nogueira-Paiva NC, Bianchi AGC. Trajectory-driven computational analysis for element characterization in Trypanosoma cruzi video microscopy. PLoS One 2024; 19:e0304716. [PMID: 38829872 PMCID: PMC11146708 DOI: 10.1371/journal.pone.0304716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 05/14/2024] [Indexed: 06/05/2024] Open
Abstract
Optical microscopy videos enable experts to analyze the motion of several biological elements. Particularly in blood samples infected with Trypanosoma cruzi (T. cruzi), microscopy videos reveal a dynamic scenario where the parasites' motions are conspicuous. While parasites are self-propelled, cells are inert and may be displaced by dynamic events, such as fluid flow and microscope focus adjustments. This paper analyzes the trajectories of T. cruzi and blood cells to discriminate between these elements by identifying the following motion patterns: collateral, fluctuating, and pan-tilt-zoom (PTZ). We consider two approaches: i) classification experiments for discrimination between parasites and cells; and ii) clustering experiments to identify the cell motion. We propose the trajectory step dispersion (TSD) descriptor, based on standard deviation, to characterize these elements; it outperforms state-of-the-art descriptors. Our results confirm that motion is valuable in discriminating T. cruzi from the cells. Since the parasites perform the collateral motion, their trajectory steps tend toward randomness. The cells may assume fluctuating motion following a homogeneous and directional path or PTZ motion with trajectory steps in a restricted area. Thus, our findings may contribute to developing new computational tools focused on trajectory analysis, which can advance the study and medical diagnosis of Chagas disease.
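Note: the abstract says only that the TSD descriptor is "based on standard deviation" and does not give its formula. The Python sketch below is therefore an illustrative stand-in, not the authors' descriptor: it assumes a trajectory is a list of per-frame (x, y) positions and measures the population standard deviation of the step lengths. The function name `step_dispersion` is hypothetical.

```python
from math import hypot
from statistics import pstdev


def step_dispersion(track):
    """Standard deviation of step lengths along a trajectory.

    track: list of (x, y) positions, one per video frame.
    This is an assumed, simplified dispersion measure, not the published TSD.
    """
    steps = [
        hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(track, track[1:])
    ]
    # pstdev needs at least one value; a track with fewer than two points has no steps.
    return pstdev(steps) if steps else 0.0
```

Under this assumption, a track that advances by equal steps along a line has zero dispersion, while a track whose step lengths vary from frame to frame has a positive value.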
Affiliation(s)
- Geovani L. Martins
  - Postgraduate Program in Computer Science, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
  - Department of Computing, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
- Daniel S. Ferreira
  - Department of Computing, Federal Institute of Education, Science, and Technology of Ceará, Maracanaú, CE, Brazil
- Claudia M. Carneiro
  - Nucleus of Biological Sciences Research, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
  - Department of Clinical Analysis, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
- Nivia C. Nogueira-Paiva
  - Nucleus of Biological Sciences Research, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
- Andrea G. C. Bianchi
  - Postgraduate Program in Computer Science, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
  - Department of Computing, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
3
Raharinirina NA, Sunkara V, von Kleist M, Fackeldey K, Weber M. Multi-Input data ASsembly for joint Analysis (MIASA): A framework for the joint analysis of disjoint sets of variables. PLoS One 2024; 19:e0302425. [PMID: 38728301 PMCID: PMC11086896 DOI: 10.1371/journal.pone.0302425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 04/04/2024] [Indexed: 05/12/2024] Open
Abstract
The joint analysis of two datasets [Formula: see text] and [Formula: see text] that describe the same phenomena (e.g. the cellular state), but measure disjoint sets of variables (e.g. mRNA vs. protein levels) is currently challenging. Traditional methods typically analyze single interaction patterns such as variance or covariance. However, problem-tailored external knowledge may contain multiple different information about the interaction between the measured variables. We introduce MIASA, a holistic framework for the joint analysis of multiple different variables. It consists of assembling multiple different information such as similarity vs. association, expressed in terms of interaction-scores or distances, for subsequent clustering/classification. In addition, our framework includes a novel qualitative Euclidean embedding method (qEE-Transition) which enables using Euclidean-distance/vector-based clustering/classification methods on datasets that have a non-Euclidean-based interaction structure. As an alternative to conventional optimization-based multidimensional scaling methods which are prone to uncertainties, our qEE-Transition generates a new vector representation for each element of the dataset union [Formula: see text] in a common Euclidean space while strictly preserving the original ordering of the assembled interaction-distances. To demonstrate our work, we applied the framework to three types of simulated datasets: samples from families of distributions, samples from correlated random variables, and time-courses of statistical moments for three different types of stochastic two-gene interaction models. We then compared different clustering methods with vs. without the qEE-Transition. For all examples, we found that the qEE-Transition followed by Ward clustering had superior performance compared to non-agglomerative clustering methods but had a varied performance against ultrametric-based agglomerative methods. 
We also tested the qEE-Transition followed by supervised and unsupervised machine learning methods and found promising results; however, more work is needed for optimal parametrization of these methods. As a future perspective, our framework points to the importance of further development and validation of distance-distribution models that aim to capture multiple complex interactions between different variables.
Affiliation(s)
- Nomenjanahary Alexia Raharinirina
  - Department of Mathematics & Computer Science, Freie Universität Berlin, Berlin, Germany
  - Department of Modeling and Simulation of Complex Processes, Zuse Institute Berlin, Berlin, Germany
- Vikram Sunkara
  - Department of Visual and Data-Centric Computing, Zuse Institute Berlin, Berlin, Germany
- Max von Kleist
  - Department of Mathematics & Computer Science, Freie Universität Berlin, Berlin, Germany
  - Project Groups, Robert-Koch Institute, Berlin, Germany
- Konstantin Fackeldey
  - Department of Modeling and Simulation of Complex Processes, Zuse Institute Berlin, Berlin, Germany
  - Institute of Mathematics, Technical University Berlin, Berlin, Germany
- Marcus Weber
  - Department of Modeling and Simulation of Complex Processes, Zuse Institute Berlin, Berlin, Germany
4
da Silva GD, Silva FN, de Arruda HF, e Souza BC, Costa LDF, Amancio DR. Using full-text content to characterize and identify best seller books: A study of early 20th-century literature. PLoS One 2024; 19:e0302070. [PMID: 38669247 PMCID: PMC11051604 DOI: 10.1371/journal.pone.0302070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 03/27/2024] [Indexed: 04/28/2024] Open
Abstract
Artistic pieces can be studied from several perspectives, one example being their reception among readers over time. In the present work, we approach this topic from the standpoint of literary works, particularly assessing the task of predicting whether a book will become a best seller. Unlike previous approaches, we focused on the full content of books and considered visualization and classification tasks. We employed visualization for the preliminary exploration of the data structure and properties, involving SemAxis and linear discriminant analyses. To obtain quantitative and more objective results, we employed various classifiers. Such approaches were used along with a dataset containing (i) books published from 1895 to 1923 and consecrated as best sellers by the Publishers Weekly Bestseller Lists and (ii) literary works published in the same period but not mentioned in that list. Our comparison of methods revealed that the best-achieved result, combining a bag-of-words representation with a logistic regression classifier, led to an average accuracy of 0.75 both for the leave-one-out and 10-fold cross-validations. Such an outcome highlights the difficulty in predicting the success of books with high accuracy, even using the full content of the texts. Nevertheless, our findings provide insights into the factors leading to the relative success of a literary work.
Affiliation(s)
- Filipi N. Silva
  - The Observatory on Social Media (OSoMe), Indiana University, Bloomington, Indiana, United States of America
- Bárbara C. e Souza
  - Institute of Mathematics and Computer Science – USP, São Carlos, SP, Brazil
- Diego R. Amancio
  - Institute of Mathematics and Computer Science – USP, São Carlos, SP, Brazil
5
Lyver D, Nica M, Cot C, Cacciapaglia G, Mohammadi Z, Thommes EW, Cojocaru MG. Population mobility, well-mixed clustering and disease spread: a look at COVID-19 Spread in the United States and preventive policy insights. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:5604-5633. [PMID: 38872550 DOI: 10.3934/mbe.2024247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
The epidemiology of pandemics is classically viewed using geographical and political borders; however, these artificial divisions can result in a misunderstanding of the current epidemiological state within a given region. To improve upon current methods, we propose a clustering algorithm which is capable of recasting regions into well-mixed clusters such that they have a high level of interconnection while minimizing the external flow of the population towards other clusters. Moreover, we analyze and identify so-called core clusters, clusters that retain their features over time (temporally stable) and independent of the presence or absence of policy measures. In order to demonstrate the capabilities of this algorithm, we use USA county-level cellular mobility data to divide the country into such clusters. Herein, we show a more granular spread of SARS-CoV-2 throughout the first weeks of the pandemic. Moreover, we are able to identify areas (groups of counties) that were experiencing above average levels of transmission within a state, as well as pan-state areas (clusters overlapping more than one state) with very similar disease spread. Therefore, our method enables policymakers to make more informed decisions on the use of public health interventions within their jurisdiction, as well as guide collaboration with surrounding regions to benefit the general population in controlling the spread of communicable diseases.
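Note: the abstract describes well-mixed clusters as groups of regions with high internal interconnection and little population flow toward other clusters, but it does not state the objective function. As a hedged illustration only, the Python sketch below scores a given partition by the fraction of each cluster's outgoing trips that stay inside the cluster. The data layout (`flows`, `cluster_of`) and the function name `retention` are assumptions made for this example, not the authors' algorithm.

```python
def retention(flows, cluster_of):
    """Fraction of each cluster's outgoing trips that end inside the same cluster.

    flows: dict mapping (origin_region, destination_region) -> number of trips.
    cluster_of: dict mapping region -> cluster id.
    Both structures are hypothetical stand-ins for county-level mobility data.
    """
    inside, total = {}, {}
    for (src, dst), trips in flows.items():
        c = cluster_of[src]
        total[c] = total.get(c, 0) + trips
        if cluster_of[dst] == c:
            inside[c] = inside.get(c, 0) + trips
    # Clusters with no recorded outgoing trips are omitted rather than scored.
    return {c: inside.get(c, 0) / total[c] for c in total if total[c] > 0}
```

A value near 1 for every cluster corresponds to the "well-mixed" property described above; comparing this score across candidate partitions is one simple way to see the trade-off the abstract refers to.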
Affiliation(s)
- David Lyver
  - Department of Mathematics, University of Guelph, Guelph ON N1G 2W1, Canada
- Mihai Nica
  - Department of Mathematics, University of Guelph, Guelph ON N1G 2W1, Canada
- Corentin Cot
  - Laboratoire de Physique des 2 Infinis Irène Joliot Curie (UMR 9012), CNRS/IN2P3, Orsay 91400, France
- Giacomo Cacciapaglia
  - Institut de Physique des 2 Infinis de Lyon (UMR 5822), CNRS/IN2P3 et Université Claude Bernard Lyon 1, Villeurbanne 69622, France
- Zahra Mohammadi
  - Department of Mathematics, University of Guelph, Guelph ON N1G 2W1, Canada
- Edward W Thommes
  - Department of Mathematics, University of Guelph, Guelph ON N1G 2W1, Canada
  - Sanofi, North York ON M2R 3T4, Canada
6
Rodríguez-Fernández A, Aloisi I, Blanco-Alegre C, Vega-Maray AM, Valencia-Barrera RM, Suanno C, Calvo AI, Fraile R, Fernández-González D. Identifying key environmental factors to model Alt a 1 airborne allergen presence and variation. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 917:170597. [PMID: 38307265 DOI: 10.1016/j.scitotenv.2024.170597] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Revised: 01/26/2024] [Accepted: 01/29/2024] [Indexed: 02/04/2024]
Abstract
Fungal spores, commonly found in the atmosphere, can trigger important respiratory disorders. The glycoprotein Alt a 1 is the major allergen present in conidia of the genus Alternaria and has a high clinical relevance for people sensitized to fungi. Exposure to this allergen has been traditionally assessed by aerobiological spore counts, although this does not always offer an accurate estimate of airborne allergen load. This study aims to pinpoint the key factors that explain the presence and variation of Alt a 1 concentration in the atmosphere in order to establish exposure risk periods and improve forecasting models. Alternaria spores were sampled using a Hirst-type volumetric sampler over a five-year period. The allergenic fraction from the bioaerosol was collected using a low-volume cyclone sampler and Alt a 1 quantified by Enzyme-Linked ImmunoSorbent Assay. A cluster analysis was executed in order to group days with similar environmental features and then analyze days with the presence of the allergen in each of them. Subsequently, a quadratic discriminant analysis was performed to evaluate if the selected variables can predict days with high Alt a 1 load. The results indicate that higher temperatures and absolute humidity favor the presence of Alt a 1 in the atmosphere, while time of precipitation is related to days without allergen. Moreover, using the selected parameters, the quadratic discriminant analysis to predict days with allergen showed an accuracy rate between 67 % and 85 %. The mismatch between daily airborne concentration of Alternaria spores and allergen load can be explained by the greater contribution of medium-to-long distance transport of the allergen from the major emission sources as compared with spores. Results highlight the importance of conducting aeroallergen quantification studies together with spore counts to improve the forecasting models of allergy risk, especially for fungal spores.
Affiliation(s)
- Iris Aloisi
  - Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Ana María Vega-Maray
  - Department of Biodiversity and Environmental Management (Botany), University of León, León, Spain
- Chiara Suanno
  - Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy
- Delia Fernández-González
  - Department of Biodiversity and Environmental Management (Botany), University of León, León, Spain; Institute of Atmospheric Sciences and Climate-CNR, Bologna, Italy
7
Newson JJ, Bala J, Giedd JN, Maxwell B, Thiagarajan TC. Leveraging big data for causal understanding in mental health: a research framework. Front Psychiatry 2024; 15:1337740. [PMID: 38439791 PMCID: PMC10910083 DOI: 10.3389/fpsyt.2024.1337740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 02/01/2024] [Indexed: 03/06/2024] Open
Abstract
Over the past 30 years there have been numerous large-scale and longitudinal psychiatric research efforts to improve our understanding and treatment of mental health conditions. However, despite the huge effort by the research community and considerable funding, we still lack a causal understanding of most mental health disorders. Consequently, the majority of psychiatric diagnosis and treatment still operates at the level of symptomatic experience, rather than measuring or addressing root causes. This results in a trial-and-error approach that is a poor fit to underlying causality with poor clinical outcomes. Here we discuss how a research framework that originates from exploration of causal factors, rather than symptom groupings, applied to large scale multi-dimensional data can help address some of the current challenges facing mental health research and, in turn, clinical outcomes. Firstly, we describe some of the challenges and complexities underpinning the search for causal drivers of mental health conditions, focusing on current approaches to the assessment and diagnosis of psychiatric disorders, the many-to-many mappings between symptoms and causes, the search for biomarkers of heterogeneous symptom groups, and the multiple, dynamically interacting variables that influence our psychology. Secondly, we put forward a causal-orientated framework in the context of two large-scale datasets arising from the Adolescent Brain Cognitive Development (ABCD) study, the largest long-term study of brain development and child health in the United States, and the Global Mind Project which is the largest database in the world of mental health profiles along with life context information from 1.4 million people across the globe. 
Finally, we describe how analytical and machine learning approaches such as clustering and causal inference can be used on datasets such as these to help elucidate a more causal understanding of mental health conditions to enable diagnostic approaches and preventative solutions that tackle mental health challenges at their root cause.
Affiliation(s)
- Jerzy Bala
  - Sapien Labs, Arlington, VA, United States
- Jay N. Giedd
  - Department of Psychiatry, University of California, San Diego, La Jolla, CA, United States
- Benjamin Maxwell
  - Department of Psychiatry, University of California, San Diego, La Jolla, CA, United States
  - Rady Children’s Hospital – San Diego, San Diego, CA, United States
8
Petnak T, Cheungpasitporn W, Thongprayoon C, Sodsri T, Tangpanithandee S, Moua T. Phenotypic subtypes of fibrotic hypersensitivity pneumonitis identified by machine learning consensus clustering analysis. Respir Res 2024; 25:41. [PMID: 38238763 PMCID: PMC10797808 DOI: 10.1186/s12931-024-02664-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2023] [Accepted: 01/01/2024] [Indexed: 01/22/2024] Open
Abstract
BACKGROUND Patients with fibrotic hypersensitivity pneumonitis (f-HP) have varied clinical and radiologic presentations whose associated phenotypic outcomes have not been previously described. We conducted a study to evaluate mortality and lung transplant (LT) outcomes among clinical clusters of f-HP as characterized by an unsupervised machine learning approach. METHODS Consensus cluster analysis was performed on a retrospective cohort of f-HP patients diagnosed according to recent international guideline. Demographics, antigen exposure, radiologic, histopathologic, and pulmonary function findings along with comorbidities were included in the cluster analysis. Cox proportional-hazards regression was used to assess mortality or LT risk as a combined outcome for each cluster. RESULTS Three distinct clusters were identified among 336 f-HP patients. Cluster 1 (n = 158, 47%) was characterized by mild restriction on pulmonary function testing (PFT). Cluster 2 (n = 46, 14%) was characterized by younger age, lower BMI, and a higher proportion of identifiable causative antigens with baseline obstructive physiology. Cluster 3 (n = 132, 39%) was characterized by moderate to severe restriction. When compared to cluster 1, mortality or LT risk was lower in cluster 2 (hazard ratio (HR) of 0.42; 95% CI, 0.21-0.82; P = 0.01) and higher in cluster 3 (HR of 1.76; 95% CI, 1.24-2.48; P = 0.001). CONCLUSIONS Three distinct phenotypes of f-HP with unique mortality or transplant outcomes were found using unsupervised cluster analysis, highlighting improved mortality in fibrotic patients with obstructive physiology and identifiable antigens.
Affiliation(s)
- Tananchai Petnak
  - Division of Pulmonary and Pulmonary Critical Care Medicine, Faculty of Medicine Ramathibodi Hospital, Mahidol University, Nakhon Pathom, Thailand
  - Division of Pulmonary and Critical Care Medicine, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, United States
- Charat Thongprayoon
  - Division of Nephrology and Hypertension, Mayo Clinic, Rochester, MN, United States
- Tulaton Sodsri
  - Faculty of Medicine Ramathibodi Hospital, Chakri Naruebodindra Medical Institute, Mahidol University, Samut Prakan, Thailand
- Teng Moua
  - Division of Pulmonary and Critical Care Medicine, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, United States
9
Kariotis S, Tan PF, Lu H, Rhodes CJ, Wilkins MR, Lawrie A, Wang D. Omada: robust clustering of transcriptomes through multiple testing. Gigascience 2024; 13:giae039. [PMID: 38991852 PMCID: PMC11238428 DOI: 10.1093/gigascience/giae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2023] [Revised: 02/09/2024] [Accepted: 06/17/2024] [Indexed: 07/13/2024] Open
Abstract
BACKGROUND Cohort studies increasingly collect biosamples for molecular profiling and are observing molecular heterogeneity. High-throughput RNA sequencing is providing large datasets capable of reflecting disease mechanisms. Clustering approaches have produced a number of tools to help dissect complex heterogeneous datasets, but selecting the appropriate method and parameters to perform exploratory clustering analysis of transcriptomic data requires deep understanding of machine learning and extensive computational experimentation. Tools that assist with such decisions without prior field knowledge are nonexistent. To address this, we have developed Omada, a suite of tools aiming to automate these processes and make robust unsupervised clustering of transcriptomic data more accessible through automated machine learning-based functions. FINDINGS The efficiency of each tool was tested with 7 datasets characterized by different expression signal strengths to capture a wide spectrum of RNA expression datasets. Our toolkit's decisions reflected the real number of stable partitions in datasets where the subgroups are discernible. Within datasets with less clear biological distinctions, our tools either formed stable subgroups with different expression profiles and robust clinical associations or revealed signs of problematic data such as biased measurements. CONCLUSIONS In conclusion, Omada successfully automates the robust unsupervised clustering of transcriptomic data, making advanced analysis accessible and reliable even for those without extensive machine learning expertise. Implementation of Omada is available at http://bioconductor.org/packages/omada/.
Affiliation(s)
- Sokratis Kariotis
  - Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), 30 Medical Dr, 117609, Singapore, Republic of Singapore
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis St, Matrix, 138671, Singapore, Republic of Singapore
  - National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse St, SW3 6LY, London, United Kingdom
- Pei Fang Tan
  - Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), 30 Medical Dr, 117609, Singapore, Republic of Singapore
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis St, Matrix, 138671, Singapore, Republic of Singapore
- Haiping Lu
  - Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, S1 4DP, Sheffield, United Kingdom
- Christopher J Rhodes
  - National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse St, SW3 6LY, London, United Kingdom
- Martin R Wilkins
  - National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse St, SW3 6LY, London, United Kingdom
- Allan Lawrie
  - National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse St, SW3 6LY, London, United Kingdom
- Dennis Wang
  - Singapore Institute for Clinical Sciences, Agency for Science, Technology and Research (A*STAR), 30 Medical Dr, 117609, Singapore, Republic of Singapore
  - Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis St, Matrix, 138671, Singapore, Republic of Singapore
  - National Heart and Lung Institute, Imperial College London, Guy Scadding Building, Dovehouse St, SW3 6LY, London, United Kingdom
  - Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, S1 4DP, Sheffield, United Kingdom
10
Goggin SM, Zunder ER. A hyperparameter-randomized ensemble approach for robust clustering across diverse datasets. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.12.18.571953. [PMID: 38187667 PMCID: PMC10769222 DOI: 10.1101/2023.12.18.571953] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Clustering analysis is widely used to group objects by similarity, but for complex datasets such as those produced by single-cell analysis, the currently available clustering methods are limited by accuracy, robustness, ease of use, and interpretability. To address these limitations, we developed an ensemble clustering method with hyperparameter randomization that outperforms other methods across a broad range of single-cell and synthetic datasets, without the need for manual hyperparameter selection. In addition to hard cluster labels, it also outputs soft cluster memberships to characterize continuum-like regions and per cell overlap scores to quantify the uncertainty in cluster assignment. We demonstrate the improved clustering interpretability from these features by tracing the intermediate stages between handwritten digits in the MNIST dataset, and between tanycyte subpopulations in the hypothalamus. This approach improves the quality of clustering and subsequent downstream analyses for single-cell datasets, and may also prove useful in other fields of data analysis.
Affiliation(s)
- Sarah M. Goggin
  - Neuroscience Graduate Program, School of Medicine, University of Virginia, Charlottesville, VA 22902
- Eli R. Zunder
  - Neuroscience Graduate Program, School of Medicine, University of Virginia, Charlottesville, VA 22902
  - Department of Biomedical Engineering, School of Engineering, University of Virginia, Charlottesville, VA 22902
11
Zarei D, Saghazadeh A, Rezaei N. Subtyping irritable bowel syndrome using cluster analysis: a systematic review. BMC Bioinformatics 2023; 24:478. [PMID: 38102564 PMCID: PMC10724977 DOI: 10.1186/s12859-023-05567-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 11/13/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND Irritable bowel syndrome (IBS) is a common chronic functional gastrointestinal disorder associated with a wide range of clinical symptoms. Some researchers have used cluster analysis (CA), a group of non-supervised learning methods that identifies homogenous clusters within different entities based on their similarity. OBJECTIVE AND METHODS This literature review aims to identify published articles that apply CA to IBS patients. We searched relevant keywords in PubMed, Embase, Web of Science, and Scopus. We reviewed studies in terms of the selected variables, participants' characteristics, data collection, methodology, number of clusters, clusters' profiles, and results. RESULTS Among the 14 articles focused on the heterogeneity of IBS, eight of them utilized K-means Cluster Analysis (K-means CA), four employed Hierarchical Cluster Analysis, and only two studies utilized Latent Class Analysis. Seven studies focused on clinical symptoms, while four articles examined anocolorectal functions. Two studies were centered around immunological findings, and only one study explored microbial composition. The number of clusters obtained ranged from two to seven, showing variation across the studies. Males exhibited lower symptom severity and fewer psychological findings. The association between symptom severity and rectal perception suggests that altered rectal perception serves as a biological indicator of IBS. Ultra-slow waves observed in IBS patients are linked to increased activity of the anal sphincter, higher anal pressure, dystonia, and dyschezia. CONCLUSION IBS has different subgroups based on different factors. Most IBS patients have low clinical severity, good QoL, high rectal sensitivity, delayed left colon transit time, increased systemic cytokines, and changes in microbial composition, including increased Firmicutes-associated taxa and depleted Bacteroidetes-related taxa. 
However, the number of clusters is inconsistent across studies due to the methodological heterogeneity. CA, a valuable non-supervised learning method, is sensitive to hyperparameters like the number of clusters and random initialization of cluster centers. The random nature of these parameters leads to diverse outcomes even with the same algorithm. This has implications for future research and practical applications, necessitating further studies to improve our understanding of IBS and develop personalized treatments.
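The sensitivity to initialization noted above can be illustrated with a minimal sketch. It assumes plain Lloyd's k-means on four toy points; the function name `kmeans` and the data are hypothetical and are not taken from the reviewed studies.

```python
def kmeans(points, centers, max_iter=100):
    """Plain Lloyd's algorithm from explicit starting centers.

    Returns (labels, centers, sse), where sse is the within-cluster
    sum of squared distances.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])))
            for pt in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Move each center to the mean of its members.
        for j in range(len(centers)):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(
        sum((p - c) ** 2 for p, c in zip(pt, centers[lab]))
        for pt, lab in zip(points, labels)
    )
    return labels, centers, sse


# Toy data (hypothetical): corners of a 4-by-1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Two different starting positions for the same algorithm and the same k.
labels_a, _, sse_a = kmeans(points, [(2, 0), (2, 1)])      # splits bottom / top
labels_b, _, sse_b = kmeans(points, [(0, 0.5), (4, 0.5)])  # splits left / right
```

Both runs converge, but to different partitions with different within-cluster error, which is why reported cluster solutions may depend on the starting centers as well as on the chosen number of clusters.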
Affiliation(s)
- Diana Zarei
- School of Medicine, Iran University of Medical Science, Tehran, Iran
- Systematic Review and Meta-Analysis Expert Group (SRMEG), Universal Scientific Education and Research Network (USERN), Tehran, Iran
| | - Amene Saghazadeh
- Research Center for Immunodeficiencies, Children's Medical Center, Tehran University of Medical Sciences, Dr. Qarib St, Keshavarz Blvd, Tehran, 14194, Iran
- Integrated Science Association (ISA), Universal Scientific Education and Research Network (USERN), Tehran, Iran
| | - Nima Rezaei
- Research Center for Immunodeficiencies, Children's Medical Center, Tehran University of Medical Sciences, Dr. Qarib St, Keshavarz Blvd, Tehran, 14194, Iran.
- Integrated Science Association (ISA), Universal Scientific Education and Research Network (USERN), Tehran, Iran.
- Department of Immunology and Biology, School of Medicine, Tehran University of Medical Sciences, Tehran, Iran.
| |
|
12
|
Tohalino JAV, Silva TC, Amancio DR. Using citation networks to evaluate the impact of text length on keyword extraction. PLoS One 2023; 18:e0294500. [PMID: 38011182 PMCID: PMC10681196 DOI: 10.1371/journal.pone.0294500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Accepted: 11/02/2023] [Indexed: 11/29/2023] Open
Abstract
The identification of key concepts within unstructured data is of paramount importance in practical applications. Despite the abundance of proposed methods for extracting primary topics, only a few works have investigated the influence of text length on the performance of keyword extraction (KE) methods. Specifically, many studies lean on abstracts and titles for content extraction from papers, leaving it uncertain whether leveraging the complete content of papers can yield consistent results. Hence, in this study, we employ a network-based approach to evaluate the concordance between keywords extracted from abstracts and those from the entire papers. Community detection methods are utilized to identify interconnected papers in citation networks. Subsequently, paper clusters are formed to identify salient terms within each cluster, employing a methodology akin to the term frequency-inverse document frequency (tf-idf) approach. Once each cluster has been endowed with its distinctive set of key terms, these selected terms are employed to serve as representative keywords at the paper level. The top-ranked words at the cluster level, which also appear in the abstract, are chosen as keywords for the paper. Our findings indicate that the various community detection methods used in KE yield similar levels of accuracy. Notably, text clustering approaches outperform all citation-based methods, while all approaches yield relatively low accuracy values. We also identified a lack of concordance between keywords extracted from the abstracts and those extracted from the corresponding full-text source. Considering that citations and text clustering yield distinct outcomes, combining them in hybrid approaches could offer improved performance.
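The cluster-level ranking described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: each cluster is treated as one "document" for the tf-idf computation, and the function names, tokenization, and toy data are hypothetical rather than the authors' implementation.

```python
import math
from collections import Counter


def cluster_keywords(cluster_docs, top_n=3):
    """Rank terms per cluster with a tf-idf-like score.

    cluster_docs maps a cluster id to a list of tokenized papers.
    Assumption: tf is the term count within the cluster and idf is
    log(number of clusters / number of clusters containing the term).
    """
    counts = {cid: Counter(tok for doc in docs for tok in doc)
              for cid, docs in cluster_docs.items()}
    n_clusters = len(counts)
    df = Counter(term for c in counts.values() for term in c)
    ranked = {}
    for cid, c in counts.items():
        scores = {t: tf * math.log(n_clusters / df[t]) for t, tf in c.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked[cid] = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
    return ranked


def paper_keywords(abstract_tokens, ranked_terms):
    """Keep the top-ranked cluster terms that also occur in the abstract."""
    present = set(abstract_tokens)
    return [t for t in ranked_terms if t in present]


# Hypothetical toy clusters of tokenized papers.
cluster_docs = {
    "A": [["graph", "community", "detection"], ["graph", "network", "method"]],
    "B": [["protein", "folding", "method"], ["protein", "structure"]],
}
ranked = cluster_keywords(cluster_docs)
keywords = paper_keywords(["community", "structure", "in", "graph", "data"],
                          ranked["A"])
```

Here "method" occurs in both clusters, so its idf is zero and it drops out of the ranking, while the paper's keywords are the top cluster terms that its abstract actually contains.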
Affiliation(s)
| | | | - Diego R. Amancio
- Institute of Mathematics and Computer Science – USP, São Carlos, SP, Brazil
| |
|
13
|
Bombina P, Tally D, Abrams ZB, Coombes KR. SillyPutty: Improved clustering by optimizing the silhouette width. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.07.566055. [PMID: 37986817 PMCID: PMC10659363 DOI: 10.1101/2023.11.07.566055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/22/2023]
Abstract
Unsupervised clustering is an important task in biomedical science. We developed a new clustering method, called SillyPutty, for unsupervised clustering. As test data, we generated a series of datasets using the Umpire R package. Using these datasets, we compared SillyPutty to several existing algorithms using multiple metrics (Silhouette Width, Adjusted Rand Index, Entropy, Normalized Within-group Sum of Square errors, and Perfect Classification Count). Our findings revealed that SillyPutty is a valid standalone clustering method, comparable in accuracy to the best existing methods. We also found that the combination of hierarchical clustering followed by SillyPutty has the best overall performance in terms of both accuracy and speed.
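For readers unfamiliar with the Silhouette Width metric listed above, the following minimal sketch computes it per element using the standard definition s(i) = (b - a) / max(a, b). It assumes Euclidean distance and at least two clusters; the function name and toy data are hypothetical, and this is not the SillyPutty algorithm itself.

```python
from math import dist


def silhouette_widths(points, labels):
    """Per-element silhouette width; requires at least two clusters.

    a = mean distance from the element to the other members of its cluster
    b = smallest mean distance from the element to the members of any other cluster
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    widths = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


# Hypothetical 1-D toy data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_widths(points, [0, 0, 1, 1])
bad = silhouette_widths(points, [0, 1, 1, 1])  # the point at 1.0 is misplaced

mean_good = sum(good) / len(good)
mean_bad = sum(bad) / len(bad)
```

A negative width, as for the misplaced point in `bad`, marks an element that sits closer to another cluster than to its own, which is the element-level information that the mean silhouette width averages away.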
Affiliation(s)
- Polina Bombina
- Department of Biostatistics, Data Science, and Epidemiology, Georgia Cancer Center at Augusta University, Augusta, GA, USA
| | - Dwayne Tally
- Department of Informatics, Indiana University, USA
| | - Zachary B. Abrams
- Institute for Informatics, Division of Data Science and Biostatistics, Washington University School of Medicine, Saint Louis, MO, USA
| | - Kevin R. Coombes
- Department of Biostatistics, Data Science, and Epidemiology, Georgia Cancer Center at Augusta University, Augusta, GA, USA
| |
|
14
|
Woodman RJ, Mangoni AA. A comprehensive review of machine learning algorithms and their application in geriatric medicine: present and future. Aging Clin Exp Res 2023; 35:2363-2397. [PMID: 37682491 PMCID: PMC10627901 DOI: 10.1007/s40520-023-02552-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2023] [Accepted: 08/24/2023] [Indexed: 09/09/2023]
Abstract
The increasing access to health data worldwide is driving a resurgence in machine learning research, including data-hungry deep learning algorithms. More computationally efficient algorithms now offer unique opportunities to enhance diagnosis, risk stratification, and individualised approaches to patient management. Such opportunities are particularly relevant for the management of older patients, a group that is characterised by complex multimorbidity patterns and significant interindividual variability in homeostatic capacity, organ function, and response to treatment. Clinical tools that utilise machine learning algorithms to determine the optimal choice of treatment are slowly gaining the necessary approval from governing bodies and being implemented into healthcare, with significant implications for virtually all medical disciplines during the next phase of digital medicine. Beyond obtaining regulatory approval, a crucial element in implementing these tools is the trust and support of the people who use them. In this context, an increased understanding by clinicians of artificial intelligence and machine learning algorithms provides an appreciation of the possible benefits, risks, and uncertainties, and improves the chances for successful adoption. This review provides a broad taxonomy of machine learning algorithms, followed by a more detailed description of each algorithm class, their purpose and capabilities, and examples of their applications, particularly in geriatric medicine. Additional focus is given to the clinical implications and challenges involved in relying on devices with reduced interpretability and the progress made in counteracting the latter via the development of explainable machine learning.
Affiliation(s)
- Richard J Woodman
- Centre of Epidemiology and Biostatistics, College of Medicine and Public Health, Flinders University, GPO Box 2100, Adelaide, SA, 5001, Australia.
| | - Arduino A Mangoni
- Discipline of Clinical Pharmacology, College of Medicine and Public Health, Flinders University, Adelaide, SA, Australia
- Department of Clinical Pharmacology, Flinders Medical Centre, Southern Adelaide Local Health Network, Adelaide, SA, Australia
| |
|
15
|
Ekemeyong Awong LE, Zielinska T. Comparative Analysis of the Clustering Quality in Self-Organizing Maps for Human Posture Classification. SENSORS (BASEL, SWITZERLAND) 2023; 23:7925. [PMID: 37765983 PMCID: PMC10538130 DOI: 10.3390/s23187925] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/03/2023] [Revised: 09/05/2023] [Accepted: 09/13/2023] [Indexed: 09/29/2023]
Abstract
The objective of this article is to develop a methodology for selecting the appropriate number of clusters to group and identify human postures using neural networks with unsupervised self-organizing maps. Although unsupervised clustering algorithms have proven effective in recognizing human postures, many works are limited to testing which data are correctly or incorrectly recognized. They often neglect the task of selecting the appropriate number of groups (where the number of clusters corresponds to the number of output neurons, i.e., the number of postures) using clustering quality assessments. The use of quality scores to determine the number of clusters frees the expert from making subjective decisions about the number of postures, enabling the use of unsupervised learning. Due to high dimensionality and data variability, expert decisions (referred to as data labeling) can be difficult and time-consuming. In our case, there is no manual labeling step. We introduce a new clustering quality score: the discriminant score (DS). We describe the process of selecting the most suitable number of postures using human activity records captured by RGB-D cameras. Comparative studies on the usefulness of popular clustering quality scores (the silhouette coefficient, Dunn index, Calinski-Harabasz index, Davies-Bouldin index, and DS) for posture classification tasks are presented, along with graphical illustrations of the results produced by DS. The findings show that DS offers good quality in posture recognition, effectively following postural transitions and similarities.
Affiliation(s)
- Lisiane Esther Ekemeyong Awong
- Faculty of Power and Aeronautical Engineering, Division of Theory of Machines and Robots, Warsaw University of Technology, 00-665 Warszawa, Poland
| | - Teresa Zielinska
- Faculty of Power and Aeronautical Engineering, Division of Theory of Machines and Robots, Warsaw University of Technology, 00-665 Warszawa, Poland
| |
|
16
|
Ahmadinejad N, Chung Y, Liu L. J-score: a robust measure of clustering accuracy. PeerJ Comput Sci 2023; 9:e1545. [PMID: 37705621 PMCID: PMC10495964 DOI: 10.7717/peerj-cs.1545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2023] [Accepted: 07/27/2023] [Indexed: 09/15/2023]
Abstract
Background Clustering analysis discovers hidden structures in a data set by partitioning them into disjoint clusters. Robust accuracy measures that evaluate the goodness of clustering results are critical for algorithm development and model diagnosis. Common problems of clustering accuracy measures include overlooking unmatched clusters, biases towards excessive clusters, unstable baselines, and difficulties of interpretation. In this study, we presented a novel accuracy measure, J-score, to address these issues. Methods Given a data set with known class labels, J-score quantifies how well the hypothetical clusters produced by clustering analysis recover the true classes. It starts with bidirectional set matching to identify the correspondence between true classes and hypothetical clusters based on Jaccard index. It then computes two weighted sums of Jaccard indices measuring the reconciliation from classes to clusters and vice versa. The final J-score is the harmonic mean of the two weighted sums. Results Through simulation studies and analyses of real data sets, we evaluated the performance of J-score and compared it with existing measures. Our results show that J-score is effective in distinguishing partition structures that differ only by unmatched clusters, rewarding correct inference of class numbers, addressing biases towards excessive clusters, and having a relatively stable baseline. The simplicity of its calculation makes the interpretation straightforward. It is a valuable tool complementary to other accuracy measures. We released an R/jScore package implementing the algorithm.
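The calculation outlined in the Methods can be sketched as below. The abstract does not state the weights, so this sketch assumes each class or cluster is weighted by its share of the data and matched to its best Jaccard partner; the function name `j_score` is hypothetical and this is not the R/jScore package.

```python
def j_score(true_labels, cluster_labels):
    """Sketch of a J-score-style measure (weighting scheme is an assumption)."""
    n = len(true_labels)

    def groups(labels):
        # Turn a label vector into a list of index sets, one per label.
        out = {}
        for i, lab in enumerate(labels):
            out.setdefault(lab, set()).add(i)
        return list(out.values())

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    def weighted_best(src, dst):
        # For each source set, take its best-matching set on the other side,
        # weighted by the source set's relative size (assumed weighting).
        return sum(len(s) / n * max(jaccard(s, d) for d in dst) for s in src)

    classes, clusters = groups(true_labels), groups(cluster_labels)
    r = weighted_best(classes, clusters)   # classes -> clusters
    p = weighted_best(clusters, classes)   # clusters -> classes
    return 2 * r * p / (r + p)             # harmonic mean of the two sums


# Perfect recovery up to label names, and a single over-merged cluster.
perfect = j_score([0, 0, 1, 1], ["b", "b", "a", "a"])
merged = j_score([0, 0, 1, 1], [0, 0, 0, 0])
```

Matching in both directions is what lets the score penalize both unmatched classes and surplus clusters, since a partition that looks good from one side only pulls the harmonic mean down.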
Affiliation(s)
- Navid Ahmadinejad
- Biodesign Institute, Arizona State University, Tempe, AZ, United States of America
- College of Health Solutions, Arizona State University, Phoenix, AZ, United States of America
| | - Yunro Chung
- Biodesign Institute, Arizona State University, Tempe, AZ, United States of America
- College of Health Solutions, Arizona State University, Phoenix, AZ, United States of America
| | - Li Liu
- Biodesign Institute, Arizona State University, Tempe, AZ, United States of America
- College of Health Solutions, Arizona State University, Phoenix, AZ, United States of America
| |
|
17
|
Gao CX, Dwyer D, Zhu Y, Smith CL, Du L, Filia KM, Bayer J, Menssink JM, Wang T, Bergmeir C, Wood S, Cotton SM. An overview of clustering methods with guidelines for application in mental health research. Psychiatry Res 2023; 327:115265. [PMID: 37348404 DOI: 10.1016/j.psychres.2023.115265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 12/15/2022] [Revised: 05/20/2023] [Accepted: 05/21/2023] [Indexed: 06/24/2023]
Abstract
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently introduced. How to choose algorithms to address common issues as well as methods for pre-clustering data processing, clustering evaluation and validation are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
Affiliation(s)
- Caroline X Gao
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia; Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia.
| | - Dominic Dwyer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Ye Zhu
- School of Information Technology, Deakin University, Geelong, VIC, Australia
| | - Catherine L Smith
- Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| | - Lan Du
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Kate M Filia
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Johanna Bayer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Jana M Menssink
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Teresa Wang
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Christoph Bergmeir
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia; Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
| | - Stephen Wood
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Sue M Cotton
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| |
|
18
|
Dai M, Zhang C, Li C, Wang Q, Gao C, Yue R, Yao M, Su Z, Zheng Z. Clinical characteristics and prognosis in systemic lupus erythematosus-associated pulmonary arterial hypertension based on consensus clustering and risk prediction model. Arthritis Res Ther 2023; 25:155. [PMID: 37612772 PMCID: PMC10463535 DOI: 10.1186/s13075-023-03139-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/11/2023] [Accepted: 08/14/2023] [Indexed: 08/25/2023] Open
Abstract
BACKGROUND Pulmonary arterial hypertension (PAH) is a severe complication of systemic lupus erythematosus (SLE). This study aims to explore the clinical characteristics and prognosis in SLE-PAH based on consensus clustering and a risk prediction model. METHODS A total of 205 PAH (including 163 SLE-PAH and 42 idiopathic PAH) patients were enrolled retrospectively based on medical records at the First Affiliated Hospital of Zhengzhou University from July 2014 to June 2021. Unsupervised consensus clustering was used to identify SLE-PAH subtypes that best represent the data pattern. Kaplan-Meier survival analysis was performed for the different subtypes. In addition, the least absolute shrinkage and selection operator (LASSO) combined with a Cox proportional hazards regression model was used to construct the SLE-PAH risk prediction model. RESULTS Clustering analysis defined two subtypes, cluster 1 (n = 134) and cluster 2 (n = 29). Compared with cluster 1, SLE-PAH patients in cluster 2 had less favorable levels of cardiac, kidney, and coagulation function markers, with higher SLE disease activity, less frequent use of PAH medications, and a lower survival rate within 2 years (86.2% vs. 92.8%) (P < 0.05). The risk prediction model was also constructed, including older age at diagnosis (≥ 38 years), anti-dsDNA antibody, neuropsychiatric lupus, and platelet distribution width (PDW). CONCLUSIONS Consensus clustering identified two distinct SLE-PAH subtypes which were associated with survival outcomes. Four prognostic factors for death were discovered to construct the SLE-PAH risk prediction model.
Affiliation(s)
- Mengmeng Dai
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Chunyi Zhang
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Chaoying Li
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Qianqian Wang
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Congcong Gao
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Runzhi Yue
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Menghui Yao
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zhaohui Su
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
| | - Zhaohui Zheng
- Department of Rheumatology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, China.
| |
|
19
|
Santiago E, Quick V, Olfert M, Byrd-Bredbenner C. Relationships of Maternal Employment and Work Impact with Weight-Related Behaviors and Home Environments of Mothers and Their School-Age Children. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:6390. [PMID: 37510622 PMCID: PMC10379117 DOI: 10.3390/ijerph20146390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/18/2023] [Revised: 07/08/2023] [Accepted: 07/10/2023] [Indexed: 07/30/2023]
Abstract
The prevalence of obesity continues to rise. Preventing obesity, especially childhood obesity, is critically important. Parents, especially mothers, play a vital role in preventing childhood obesity. Numerous factors, such as maternal employment, may influence maternal weight-related practices and home environment characteristics that affect the risk of childhood obesity. Given the prevalence of both childhood obesity and maternal employment, this study was conducted to examine how weight-related maternal, child, and household behaviors as well as home environment characteristics differ by maternal employment hours and extends existing research by examining work impact on behaviors and home characteristics. U.S. mothers (n = 527) with at least one school-age child (6 to 11 years), who were between the ages of 25 and 54 years and the main food gatekeeper in the household completed an online survey. ANOVA comparisons of non-working, part-time employed, and full-time employed mothers revealed few differences in any of the variables studied. Cluster analysis of the 336 employed mothers based on six work impact scale scores found three unique clusters characterized as Enthusiastic Earners, Indifferent Earners, and Strained Earners. Few differences in sociodemographic and job characteristics occurred among clusters and the differences noted had small effect sizes. Clusters did not differ by maternal BMI or perceived child weight status. However, the clusters differed in numerous weight-related behaviors and home environment characteristics. Future research should aim to determine the direction of the associations of work impact with weight-related behaviors and home environments as well as identify potential strategies for overcoming the negative effects of employment on weight-related behaviors and environments and weight status as well as clarify other factors that may affect maternal work impact, such as time management, reasons for employment, and stress.
Affiliation(s)
- Elena Santiago
- Maryland SNAP-Ed Department, Family and Consumer Sciences, University of Maryland, Columbia, MD 21044, USA
| | - Virginia Quick
- Department of Nutritional Sciences, Rutgers University, New Brunswick, NJ 08901-8520, USA
| | - Melissa Olfert
- Department of Animal and Nutritional Sciences, University of West Virginia, Morgantown, WV 26506-3740, USA
| | - Carol Byrd-Bredbenner
- Department of Nutritional Sciences, Rutgers University, New Brunswick, NJ 08901-8520, USA
| |
|
20
|
Li Z, Joshi SY, Wang Y, Deshmukh SA, Matson JB. Supramolecular Peptide Nanostructures Regulate Catalytic Efficiency and Selectivity. Angew Chem Int Ed Engl 2023; 62:e202303755. [PMID: 37194941 PMCID: PMC10330506 DOI: 10.1002/anie.202303755] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2023] [Indexed: 05/18/2023]
Abstract
We report three constitutionally isomeric tetrapeptides, each comprising one glutamic acid (E) residue, one histidine (H) residue, and two lysine (KS) residues functionalized with side-chain hydrophobic S-aroylthiooxime (SATO) groups. Depending on the order of amino acids, these amphiphilic peptides self-assembled in aqueous solution into different nanostructures: nanoribbons, a mixture of nanotoroids and nanoribbons, or nanocoils. Each nanostructure catalyzed hydrolysis of a model substrate, with the nanocoils exhibiting the greatest rate enhancement and the highest enzymatic efficiency. Coarse-grained molecular dynamics simulations, analyzed with unsupervised machine learning, revealed clusters of H residues in hydrophobic pockets along the outer edge of the nanocoils, providing insight for the observed catalytic rate enhancement. Finally, all three supramolecular nanostructures catalyzed hydrolysis of the l-substrate only when a pair of enantiomeric Boc-l/d-Phe-ONp substrates were tested. This study highlights how subtle molecular-level changes can influence supramolecular nanostructures, and ultimately affect catalytic efficiency.
Affiliation(s)
- Zhao Li
- Department of Chemistry, Virginia Tech, Blacksburg, VA-24061, USA
- Macromolecules Innovation Institute, Virginia Tech, Blacksburg, VA-24061, USA
| | - Soumil Y Joshi
- Department of Chemical Engineering, Virginia Tech, Blacksburg, VA-24061, USA
- Macromolecules Innovation Institute, Virginia Tech, Blacksburg, VA-24061, USA
| | - Yin Wang
- Engineering Research Center of Cell & Therapeutic Antibody, School of Pharmacy, Shanghai Jiao Tong University, Shanghai, 200240, China
| | - Sanket A Deshmukh
- Department of Chemical Engineering, Virginia Tech, Blacksburg, VA-24061, USA
- Macromolecules Innovation Institute, Virginia Tech, Blacksburg, VA-24061, USA
| | - John B Matson
- Department of Chemistry, Virginia Tech, Blacksburg, VA-24061, USA
- Macromolecules Innovation Institute, Virginia Tech, Blacksburg, VA-24061, USA
| |
|
21
|
Rugard M, Audouze K, Tromelin A. Combining the Classification and Pharmacophore Approaches to Understand Homogeneous Olfactory Perceptions at Peripheral Level: Focus on Two Aroma Mixtures. Molecules 2023; 28:molecules28104028. [PMID: 37241770 DOI: 10.3390/molecules28104028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2023] [Revised: 04/20/2023] [Accepted: 05/03/2023] [Indexed: 05/28/2023] Open
Abstract
The mechanisms involved in the homogeneous perception of odorant mixtures remain largely unknown. With the aim of enhancing knowledge about blending and masking mixture perceptions, we focused on structure-odor relationships by combining the classification and pharmacophore approaches. We built a dataset of about 5000 molecules and their related odors and reduced the multidimensional space defined by 1014 fingerprints representing the structures to a three-dimensional (3D) space using uniform manifold approximation and projection (UMAP). The self-organizing map (SOM) classification was then performed using the 3D coordinates in the UMAP space that defined specific clusters. We explored how the components of two aroma mixtures were allocated among these clusters: a blended mixture (red cordial (RC) mixture, 6 molecules) and a masking binary mixture (isoamyl acetate/whiskey-lactone [IA/WL]). Focusing on clusters containing the components of the mixtures, we looked at the odor notes carried by the molecules belonging to these clusters and also at their structural features by pharmacophore modeling (PHASE). The obtained pharmacophore models suggest that WL and IA could share one or more binding sites at the peripheral level, but that this would be excluded for the components of RC. In vitro experiments will soon be carried out to assess these hypotheses.
Affiliation(s)
- Marylène Rugard
- T3S, Inserm UMR S-1124, Université Paris Cité, F-75006 Paris, France
| | - Karine Audouze
- T3S, Inserm UMR S-1124, Université Paris Cité, F-75006 Paris, France
| | - Anne Tromelin
- Centre des Sciences du Goût et de l'Alimentation, CNRS, INRAE, Institut Agro, Université de Bourgogne, F-21000 Dijon, France
| |
|
22
|
Bertozzi-Villa A, Bever CA, Gerardin J, Proctor JL, Wu M, Harding D, Hollingsworth TD, Bhatt S, Gething PW. An archetypes approach to malaria intervention impact mapping: a new framework and example application. Malar J 2023; 22:138. [PMID: 37101269 PMCID: PMC10131392 DOI: 10.1186/s12936-023-04535-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2022] [Accepted: 03/15/2023] [Indexed: 04/28/2023] Open
Abstract
BACKGROUND As both mechanistic and geospatial malaria modeling methods become more integrated into malaria policy decisions, there is increasing demand for strategies that combine these two methods. This paper introduces a novel archetypes-based methodology for generating high-resolution intervention impact maps based on mechanistic model simulations. An example configuration of the framework is described and explored. METHODS First, dimensionality reduction and clustering techniques were applied to rasterized geospatial environmental and mosquito covariates to find archetypal malaria transmission patterns. Next, mechanistic models were run on a representative site from each archetype to assess intervention impact. Finally, these mechanistic results were reprojected onto each pixel to generate full maps of intervention impact. The example configuration used ERA5 and Malaria Atlas Project covariates, singular value decomposition, k-means clustering, and the Institute for Disease Modeling's EMOD model to explore a range of three-year malaria interventions primarily focused on vector control and case management. RESULTS Rainfall, temperature, and mosquito abundance layers were clustered into ten transmission archetypes with distinct properties. Example intervention impact curves and maps highlighted archetype-specific variation in efficacy of vector control interventions. A sensitivity analysis showed that the procedure for selecting representative sites to simulate worked well in all but one archetype. CONCLUSION This paper introduces a novel methodology which combines the richness of spatiotemporal mapping with the rigor of mechanistic modeling to create a multi-purpose infrastructure for answering a broad range of important questions in the malaria policy space. It is flexible and adaptable to a range of input covariates, mechanistic models, and mapping strategies and can be adapted to the modelers' setting of choice.
Affiliation(s)
- Amelia Bertozzi-Villa
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
  - Malaria Atlas Project, Telethon Kids Institute, Perth, Australia
  - Big Data Institute, Nuffield Department of Medicine, Oxford University, Oxford, UK
- Caitlin A Bever
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Jaline Gerardin
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
  - Department of Preventive Medicine and Institute for Global Health, Northwestern University, Chicago, USA
- Joshua L Proctor
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Meikang Wu
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Dennis Harding
  - Institute for Disease Modeling, Bill & Melinda Gates Foundation, Seattle, USA
- Samir Bhatt
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College, London, UK
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Peter W Gething
  - Malaria Atlas Project, Telethon Kids Institute, Perth, Australia
  - Curtin University, Perth, Australia
23
|
Sieg M, Roselló Atanet I, Tomova MT, Schoeneberg U, Sehy V, Mäder P, März M. Discovering unknown response patterns in progress test data to improve the estimation of student performance. BMC Medical Education 2023; 23:193. [PMID: 36978145 PMCID: PMC10053036 DOI: 10.1186/s12909-023-04172-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 03/17/2023] [Indexed: 06/18/2023]
Abstract
BACKGROUND The Progress Test Medizin (PTM) is a 200-question formative test that is administered to approximately 11,000 students at medical universities (Germany, Austria, Switzerland) each term. Students receive feedback on their knowledge (development) mostly in comparison to their own cohort. In this study, we use the data of the PTM to find groups with similar response patterns. METHODS We performed k-means clustering with a dataset of 5,444 students, selected cluster number k = 5, and answers as features. Subsequently, the data was passed to XGBoost with the cluster assignment as target enabling the identification of cluster-relevant questions for each cluster with SHAP. Clusters were examined by total scores, response patterns, and confidence level. Relevant questions were evaluated for difficulty index, discriminatory index, and competence levels. RESULTS Three of the five clusters can be seen as "performance" clusters: cluster 0 (n = 761) consisted predominantly of students close to graduation. Relevant questions tend to be difficult, but students answered confidently and correctly. Students in cluster 1 (n = 1,357) were advanced, cluster 3 (n = 1,453) consisted mainly of beginners. Relevant questions for these clusters were rather easy. The number of guessed answers increased. There were two "drop-out" clusters: students in cluster 2 (n = 384) dropped out of the test about halfway through after initially performing well; cluster 4 (n = 1,489) included students from the first semesters as well as "non-serious" students both with mostly incorrect guesses or no answers. CONCLUSION Clusters placed performance in the context of participating universities. Relevant questions served as good cluster separators and further supported our "performance" cluster groupings.
Affiliation(s)
- Miriam Sieg
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany
- Iván Roselló Atanet
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
- Mihaela Todorova Tomova
  - Fakultät für Informatik und Automatisierung, Data-Intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ehrenbergstraße 29, 98693, Ilmenau, Germany
- Uwe Schoeneberg
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, Institute of Biometry and Clinical Epidemiology, Charitéplatz 1, 10117, Berlin, Germany
- Victoria Sehy
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
- Patrick Mäder
  - Fakultät für Informatik und Automatisierung, Data-Intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ehrenbergstraße 29, 98693, Ilmenau, Germany
  - Fakultät für Biowissenschaften, Friedrich Schiller Universität Jena, Schloßgasse 10, 07743, Jena, Germany
- Maren März
  - Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin, AG Progress Test Medizin, Charitéplatz 1, 10117, Berlin, Germany
24
|
Buch G, Schulz A, Schmidtmann I, Strauch K, Wild PS. A systematic review and evaluation of statistical methods for group variable selection. Stat Med 2023; 42:331-352. [PMID: 36546512 DOI: 10.1002/sim.9620] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Revised: 10/27/2022] [Accepted: 11/22/2022] [Indexed: 12/24/2022]
Abstract
This review condenses the knowledge on variable selection methods implemented in R and appropriate for datasets with grouped features. The focus is on regularized regressions identified through a systematic review of the literature, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. A total of 14 methods are discussed, most of which use penalty terms to perform group variable selection. Depending on how the methods account for the group structure, they can be classified into knowledge and data-driven approaches. The first encompass group-level and bi-level selection methods, while two-step approaches and collinearity-tolerant methods constitute the second category. The identified methods are briefly explained and their performance compared in a simulation study. This comparison demonstrated that group-level selection methods, such as the group minimax concave penalty, are superior to other methods in selecting relevant variable groups but are inferior in identifying important individual variables in scenarios where not all variables in the groups are predictive. This can be better achieved by bi-level selection methods such as group bridge. Two-step and collinearity-tolerant approaches such as elastic net and ordered homogeneity pursuit least absolute shrinkage and selection operator are inferior to knowledge-driven methods but provide results without requiring prior knowledge. Possible applications in proteomics are considered, leading to suggestions on which method to use depending on existing prior knowledge and research question.
Affiliation(s)
- Gregor Buch
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Rhine-Main, Mainz, Germany
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Andreas Schulz
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Irene Schmidtmann
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Konstantin Strauch
  - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
- Philipp S Wild
  - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  - German Center for Cardiovascular Research (DZHK), partner site Rhine-Main, Mainz, Germany
  - Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  - Institute of Molecular Biology (IMB), Mainz, Germany
25
|
Shekhar H, Sharma A. Global Food Production and Distribution Analysis using Data Mining and Unsupervised Learning. Recent Advances in Food, Nutrition & Agriculture 2023; 14:RAFNA-EPUB-129092. [PMID: 36703599 DOI: 10.2174/2772574x14666230126095121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 09/27/2022] [Accepted: 11/25/2022] [Indexed: 01/28/2023]
Abstract
BACKGROUND Today's food industry is extensive and complicated, encompassing anything from subsistence agriculture to multinational food corporations. The mobility of food and food elements in food systems has a major impact on biodiversity preservation and the overall sustainability of our fragile global ecosystem. Identifying the human and livestock consumption patterns across regions and territories will optimize the dietary standards of the habitually undernourished and the expanding population without substantially increasing the amount of land under cultivation. Food preservation is the basis for economic advancement and social sustainability, so the food industry, both local and global, is fundamental to everyone. As a primary mechanism for ensuring global food preservation, there is currently a strong emphasis on accelerating food supply and decreasing waste. Thus, analyzing the production and distribution of food supply will boost economic sustainability. METHODOLOGY In this paper, we present a quantitative analysis of global and regional food supply to reveal the flow of food and feed products in various parts of the world. Using data mining and machine learning-based approaches, we seek to quantify the production and distribution of food elements. The study aims to employ artificial intelligence-based methods to comprehend the shift and change in supply and consumption patterns with timely distribution to meet the global food instability. The method involves using statistical-based approaches to identify the hidden factors and variables. Feature engineering is used to uncover the interesting features in the dataset, and various clustering-based algorithms, like K-Means, have been utilized to group and identify the similar and most notable features. RESULT AND DISCUSSION The concept of data mining and machine learning-based algorithms has helped us in identifying the global food production and distribution subsystem. 
The identified elements and their relationships can help stakeholders regulate various external and internal factors, including urbanization, urban food needs, the economic, political and social framework, food demand, and supply flows. The exploratory analysis helps in establishing the efficiency and dynamism of food supply and distribution systems. CONCLUSION The outcome demonstrates a pattern indicating the flow of currently grown crops into various endpoints. A few countries with massive populations have shown tremendous growth in their production capacity. Although a few countries produce a large portion of food and feed crops, this output is still insufficient to feed the estimated global population. Significant changes in many people's socioeconomic conditions, as well as radical dietary changes, will also be required to boost agricultural credit and economic foundations.
Affiliation(s)
- Himanshu Shekhar
  - Department of Software Engineering, Delhi Technological University, New Delhi 110042, India
- Abhilasha Sharma
  - Department of Software Engineering, Delhi Technological University, New Delhi 110042, India
26
|
Carniel T, Halloy J, Dalle JM. A novel clustering approach to bipartite investor-startup networks. PLoS One 2023; 18:e0279780. [PMID: 36602981 PMCID: PMC9815571 DOI: 10.1371/journal.pone.0279780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2022] [Accepted: 12/14/2022] [Indexed: 01/06/2023] Open
Abstract
We propose a novel similarity-based clustering approach to venture capital investors that takes as input the bipartite graph of funding interactions between investors and startups and returns clusterings of investors built upon 5 characteristic dimensions. We first validate that investors are clustered in a meaningful manner and present methods of visualizing cluster characteristics. We further analyze the temporal dynamics at the cluster level and observe a meaningful second-order evolution of the sectoral investment trends. Finally, and surprisingly, we report that clusters appear stable even when running the clustering algorithm with all but one of the 5 characteristic dimensions, for instance observing geography-focused clusters without taking into account the geographical dimension or sector-focused clusters without taking into account the sectoral dimension, suggesting the presence of significant underlying complex investment patterns.
Affiliation(s)
- Théophile Carniel
  - Agoranov, Paris, France
  - Université Paris Cité, CNRS, LIED UMR 8236, Paris, France
- José Halloy
  - Université Paris Cité, CNRS, LIED UMR 8236, Paris, France
27
|
Venkatasubramaniam A, Evers L, Thakuriah P, Ampountolas K. Functional distributional clustering using spatio-temporal data. J Appl Stat 2023; 50:909-926. [PMID: 36925906 PMCID: PMC10013458 DOI: 10.1080/02664763.2021.2001443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
This paper presents a new method called the functional distributional clustering algorithm (FDCA) that seeks to identify spatially contiguous clusters and incorporate changes in temporal patterns across overcrowded networks. This method is motivated by a graph-based network composed of sensors arranged over space where recorded observations for each sensor represent a multi-modal distribution. The proposed method is fully non-parametric and generates clusters within an agglomerative hierarchical clustering approach based on a measure of distance that defines a cumulative distribution function over temporal changes for different locations in space. Traditional hierarchical clustering algorithms that are spatially adapted do not typically accommodate the temporal characteristics of the underlying data. The effectiveness of the FDCA is illustrated using an application to both empirical and simulated data from about 400 sensors in a 2.5-square-mile network area in downtown San Francisco, California. The results demonstrate the superior ability of the FDCA in identifying true clusters compared to functional-only and distributional-only algorithms, and similar performance to a model-based clustering algorithm.
Affiliation(s)
- L Evers
  - School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
- P Thakuriah
  - E.J. Bloustein School of Planning & Public Policy, Rutgers University, New Brunswick, NJ, USA
- K Ampountolas
  - James Watt School of Engineering, University of Glasgow, Glasgow, UK
  - Department of Mechanical Engineering, University of Thessaly, Volos, Greece
28
|
Mrukwa G, Polanska J. DiviK: divisive intelligent K-means for hands-free unsupervised clustering in big biological data. BMC Bioinformatics 2022; 23:538. [PMID: 36503372 PMCID: PMC9743550 DOI: 10.1186/s12859-022-05093-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2020] [Accepted: 12/01/2022] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Investigating molecular heterogeneity provides insights into tumour origin and metabolomics. The increasing amount of data gathered makes manual analyses infeasible; therefore, automated unsupervised learning approaches are utilised for discovering tissue heterogeneity. However, automated analyses require experience setting the algorithms' hyperparameters and expert knowledge about the analysed biological processes. Moreover, feature engineering is needed to obtain valuable results because of the numerous features measured. RESULTS We propose DiviK: a scalable stepwise algorithm with local data-driven feature space adaptation for segmenting high-dimensional datasets. The algorithm is compared to the optional solutions (regular k-means, spatial and spectral approaches) combined with different feature engineering techniques (None, PCA, EXIMS, UMAP, Neural Ions). Three quality indices (Dice Index, Rand Index and EXIMS score), focusing on the overall composition of the clustering, coverage of the tumour region and spatial cluster consistency, are used to assess the quality of unsupervised analyses. Algorithms were validated on mass spectrometry imaging (MSI) datasets: 2D human cancer tissue samples and 3D mouse kidney images. The DiviK algorithm performed the best among the four clustering algorithms compared (overall quality score 1.24, 0.58 and 162 for d(0, 0, 0), d(1, 1, 1) and the sum of ranks, respectively), with spectral clustering being mostly second. Feature engineering techniques impact the overall clustering results less than the algorithms themselves (partial η² effect size: 0.141 versus 0.345, Kendall's concordance index: 0.424 versus 0.138 for d(0, 0, 0)). CONCLUSIONS DiviK could be the default choice in the exploration of MSI data.
Thanks to its unique, GMM-based local optimisation of the feature space and deglomerative schema, DiviK results do not strongly depend on the feature engineering technique applied and can reveal the hidden structure in a tissue sample. Additionally, DiviK shows high scalability, and it can process big omics data with more than 1.5 million instances and a few thousand features at once. Finally, due to its simplicity, DiviK is easily generalisable to an even more flexible framework. Therefore, it is helpful for other -omics data (such as single-cell spatial transcriptomics) or tabular data in general (including medical images after appropriate embedding). A generic implementation is freely available under Apache 2.0 license at https://github.com/gmrukwa/divik.
Affiliation(s)
- Grzegorz Mrukwa
  - Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
  - Netguru, Małe Garbary 9, 61-756 Poznań, Poland
- Joanna Polanska
  - Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
29
|
Chawla D, Eriten M, Henak CR. Effect of osmolarity and displacement rate on cartilage microfracture clusters failure into two regimes. J Mech Behav Biomed Mater 2022; 136:105467. [PMID: 36198233 DOI: 10.1016/j.jmbbm.2022.105467] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/04/2022] [Revised: 08/16/2022] [Accepted: 09/12/2022] [Indexed: 11/22/2022]
Abstract
Articular cartilage is a poroviscoelastic (PVE) material with remarkable resistance to fracture and fatigue failure. Cartilage failure mechanisms and material properties that govern failure are incompletely understood. Because cartilage is partially comprised of negatively charged glycosaminoglycans, altering solvent osmolarity can influence PVE relaxations. Therefore, this study aims to use osmolarity as a tool to provide additional data to interpret the role of PVE relaxations and identify cartilage failure regimes. Cartilage fracture was induced using a 100 μm radius spheroconical indenter at controlled displacement rates under three different osmolarity solvents. Secondarily, contact pressure (CP) and strain energy density (SED) were estimated to cluster data into two failure regimes with an expectation maximization algorithm. Critical displacement, critical load, critical time, and critical work to fracture increased with increasing osmolarity at a slow displacement rate, whereas no significant effect was observed at a fast displacement rate. Clustering provided two distinct failure regimes, with regime (I) at lower normalized thickness (contact radius divided by sample thickness), and regime (II) at higher normalized thickness. Varied CP and SED in regime (I) suggest that failure in the regime is strain-governed. Constant CP and SED in regime (II) suggest that failure in the regime is dominantly governed by stress. These regimes can be interpreted as ductile versus brittle, or using a pressurized fragmentation interpretation. These findings demonstrate fundamental failure properties and postulate failure regimes for articular cartilage.
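The abstract says that CP and SED were clustered into two failure regimes with an expectation maximization algorithm but does not describe the model. As a rough illustration only, the Python sketch below fits a two-component Gaussian mixture to a single variable with EM. The Gaussian form, the one-dimensional simplification, the initialization, the function names, and the toy values are all assumptions, not details taken from the study.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(xs, iters=100):
    """Fit a two-component 1-D Gaussian mixture by EM (illustrative only)."""
    n = len(xs)
    grand_mean = sum(xs) / n
    pooled_var = sum((x - grand_mean) ** 2 for x in xs) / n
    # Assumed initialization: means at the data extremes, shared variance.
    mu = [min(xs), max(xs)]
    var = [pooled_var, pooled_var]
    weight = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            p = [weight[k] * normal_pdf(x, mu[k], var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            spread = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            var[k] = max(spread, 1e-6)  # floor avoids a zero variance
    # Hard assignment: regime 0 or 1, whichever is more probable.
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return mu, var, weight, labels


# Hypothetical normalized measurements forming two well-separated groups.
values = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]
means, variances, weights, regimes = em_two_gaussians(values)
```

A real analysis of two correlated quantities such as CP and SED would more likely use a multivariate mixture, which needs covariance matrices rather than the scalar variances used here.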
Affiliation(s)
- Dipul Chawla
  - Department of Mechanical Engineering, University of Wisconsin-Madison, 1513 University Ave., Madison, WI, 53706, USA
- Melih Eriten
  - Department of Mechanical Engineering, University of Wisconsin-Madison, 1513 University Ave., Madison, WI, 53706, USA
- Corinne R Henak
  - Department of Mechanical Engineering, University of Wisconsin-Madison, 1513 University Ave., Madison, WI, 53706, USA
  - Department of Biomedical Engineering, University of Wisconsin-Madison, 1550 University Ave., Madison, WI, 53706, USA
  - Department of Orthopedics and Rehabilitation, University of Wisconsin-Madison, 1111 Highland Ave., Madison, WI, 53705, USA
30
|
Oyewole GJ, Thopil GA. Data clustering: application and trends. Artif Intell Rev 2022; 56:6439-6475. [PMID: 36466764 PMCID: PMC9702941 DOI: 10.1007/s10462-022-10325-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/04/2022] [Indexed: 11/28/2022]
Abstract
Clustering has primarily been used as an analytical technique to group unlabeled data for extracting meaningful information. The fact that no clustering algorithm can solve all clustering problems has resulted in the development of several clustering algorithms with diverse applications. We review data clustering, intending to underscore recent applications in selected industrial sectors and other notable concepts. In this paper, we begin by highlighting clustering components and discussing classification terminologies. Furthermore, specific and general applications of clustering are discussed. Notable concepts on clustering algorithms, emerging variants, measures of similarities/dissimilarities, issues surrounding clustering optimization, validation and data types are outlined. Suggestions are made to emphasize the continued interest in clustering techniques by both scholars and industry practitioners. Key findings in this review show the size of data as a classification criterion; as data sizes for clustering become larger and more varied, the determination of the optimal number of clusters will require new feature extraction methods, validation indices and clustering techniques. In addition, clustering techniques have found growing use in key industry sectors linked to the sustainable development goals, such as manufacturing, transportation and logistics, energy, and healthcare, where clustering is more often integrated with other analytical techniques than used as a stand-alone technique.
Affiliation(s)
- Gbeminiyi John Oyewole
  - Department of Engineering and Technology Management, University of Pretoria, Pretoria, South Africa
- George Alex Thopil
  - Department of Engineering and Technology Management, University of Pretoria, Pretoria, South Africa
31
|
Baghdadi A, Manouchehri N, Patterson Z, Fan W, Bouguila N. Hierarchical Dirichlet and Pitman–Yor process mixtures of shifted‐scaled Dirichlet distributions for proportional data modeling. Comput Intell 2022. [DOI: 10.1111/coin.12558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Ali Baghdadi
  - Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
- Narges Manouchehri
  - Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
- Zachary Patterson
  - Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
- Wentao Fan
  - Department of Computer Science and Technology, Huaqiao University, Xiamen, China
- Nizar Bouguila
  - Concordia Institute for Information Systems Engineering, Concordia University, Montreal, Quebec, Canada
32
|
Coppola P, Allanson J, Naci L, Adapa R, Finoia P, Williams GB, Pickard JD, Owen AM, Menon DK, Stamatakis EA. The complexity of the stream of consciousness. Commun Biol 2022; 5:1173. [PMID: 36329176 PMCID: PMC9633704 DOI: 10.1038/s42003-022-04109-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/04/2021] [Accepted: 10/10/2022] [Indexed: 11/06/2022] Open
Abstract
Typical consciousness can be defined as an individual-specific stream of experiences. Modern consciousness research on dynamic functional connectivity uses clustering techniques to create common bases on which to compare different individuals. We propose an alternative approach by combining modern theories of consciousness and insights arising from phenomenology and dynamical systems theory. This approach enables a representation of an individual's connectivity dynamics in an intrinsically-defined, individual-specific landscape. Given the wealth of evidence relating functional connectivity to experiential states, we assume this landscape is a proxy measure of an individual's stream of consciousness. By investigating the properties of this landscape in individuals in different states of consciousness, we show that consciousness is associated with short term transitions that are less predictable, quicker, but, on average, more constant. We also show that temporally-specific connectivity states are less easily describable by network patterns that are distant in time, suggesting a richer space of possible states. We show that the cortex, cerebellum and subcortex all display consciousness-relevant dynamics and discuss the implication of our results in forming a point of contact between dynamical systems interpretations and phenomenology.
Affiliation(s)
- Peter Coppola
  - Division of Anaesthesia, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
- Judith Allanson
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Department of Neurosciences, Cambridge University Hospitals NHS Foundation, Addenbrooke's Hospital, Cambridge, UK
- Lorina Naci
  - Trinity College Institute of Neuroscience, School of Psychology, Lloyd Building, Trinity College Dublin, Dublin, Ireland
- Ram Adapa
  - Division of Anaesthesia, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
- Paola Finoia
  - Division of Anaesthesia, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Division of Neurosurgery, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
- Guy B Williams
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Wolfson Brain Imaging Centre, University of Cambridge, Cambridge, UK
- John D Pickard
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Division of Neurosurgery, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Wolfson Brain Imaging Centre, University of Cambridge, Cambridge, UK
- Adrian M Owen
  - The Brain and Mind Institute, Western Interdisciplinary Research Building, N6A 5B7 University of Western Ontario, London, ON, Canada
- David K Menon
  - Division of Anaesthesia, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Wolfson Brain Imaging Centre, University of Cambridge, Cambridge, UK
- Emmanuel A Stamatakis
  - Division of Anaesthesia, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
  - Department of Clinical Neurosciences, School of Clinical Medicine, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK
33
|
Keresztes M, Delaney CL, Byrd-Bredbenner C. Maternal Mental Health Status Is Associated with Weight-Related Parenting Cognitions, Home Food Environment Characteristics, and Children's Behaviors. International Journal of Environmental Research and Public Health 2022; 19:13855. [PMID: 36360736 PMCID: PMC9656610 DOI: 10.3390/ijerph192113855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/22/2022] [Revised: 10/17/2022] [Accepted: 10/22/2022] [Indexed: 06/16/2023]
Abstract
Women experience anxiety, depression, and stress at higher levels than men and have more parenting responsibilities, especially establishing health practices in the home. Given children's vulnerability, this study aimed to increase understanding of how mothers' mental health status relates to maternal weight-related cognitions, home food environments, and child health via a cross-sectional survey design. In a cluster analysis, using maternal anxiety, depression, and stress assessments, we placed the sample of 531 mothers of school-age children into four clusters: Cluster 1 had the best mental health status, Cluster 2 had high stress, Cluster 3 had anxiety and moderate stress, and Cluster 4 had anxiety, depression, and high stress. Our results indicate an overall downward trend in weight-related cognitions as mental health worsened. Similarly, as mental health declined, so did home food environment characteristics, such as the greater use of non-recommended child feeding practices, fewer family meals, and greater sugar-sweetened beverage supplies. As mothers' mental health status became poorer, children's general health and mental health quality of life declined, and sugar-sweetened beverage intake increased. Our findings suggest that maternal stress, anxiety, and depression are moderately to strongly linked with mothers' cognitions, home food environments, and children's health. Our results also suggest that mental health interventions for mothers should assess cognitions and home food environments and consider the extent to which these factors are affecting family health.
34
|
Kholod O, Basket W, Liu D, Mitchem J, Kaifi J, Dooley L, Shyu CR. Identification of Immuno-Targeted Combination Therapies Using Explanatory Subgroup Discovery for Cancer Patients with EGFR Wild-Type Gene. Cancers (Basel) 2022; 14:4759. [PMID: 36230688 PMCID: PMC9564073 DOI: 10.3390/cancers14194759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 09/27/2022] [Accepted: 09/28/2022] [Indexed: 11/16/2022] Open
Abstract
(1) Background: Phenotypic and genotypic heterogeneity are characteristic features of cancer patients. To tackle patients’ heterogeneity, immune checkpoint inhibitors (ICIs) represent some of the most promising therapeutic approaches. However, approximately 50% of cancer patients who are eligible for treatment with ICIs do not respond well, especially patients with no targetable mutations. Over the years, multiple patient stratification techniques have been developed to identify homogeneous patient subgroups, although matching a patient subgroup to a treatment option that can improve patients’ health outcomes remains a challenging task. (2) Methods: We extended our Subgroup Discovery algorithm to identify patient subpopulations that could potentially benefit from immuno-targeted combination therapies in four cancer types: head and neck squamous carcinoma (HNSC), lung adenocarcinoma (LUAD), lung squamous carcinoma (LUSC), and skin cutaneous melanoma (SKCM). We employed the proportional odds model to identify significant drug targets and the corresponding compounds that increased the likelihood of stable disease versus progressive disease in cancer patients with the EGFR wild-type (WT) gene. (3) Results: Our pipeline identified six significant drug targets and thirteen specific compounds for cancer patients with the EGFR WT gene. Three of the six drug targets (FCGR2B, IGF1R, and KIT) substantially increased the odds of having stable disease versus progressive disease. Progression-free survival (PFS) of more than 6 months was a common feature among the investigated subgroups. (4) Conclusions: Our approach could help to better select responders for immuno-targeted combination therapies and improve health outcomes for cancer patients with no targetable mutations.
Affiliation(s)
- Olha Kholod
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65212, USA
| | - William Basket
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65212, USA
| | - Danlu Liu
- Department of Electrical Engineering & Computer Science, University of Missouri, Columbia, MO 65212, USA
| | - Jonathan Mitchem
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65212, USA
- Department of Surgery, School of Medicine, University of Missouri, Columbia, MO 65212, USA
- Harry S. Truman Memorial Veterans’ Hospital, Columbia, MO 65201, USA
| | - Jussuf Kaifi
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65212, USA
- Department of Surgery, School of Medicine, University of Missouri, Columbia, MO 65212, USA
| | - Laura Dooley
- Department of Otolaryngology, School of Medicine, University of Missouri, Columbia, MO 65212, USA
| | - Chi-Ren Shyu
- MU Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65212, USA
- Department of Electrical Engineering & Computer Science, University of Missouri, Columbia, MO 65212, USA
- Correspondence:
| |
|
35
|
Min YG, Ju W, Ha YE, Ban JJ, Shin JY, Kim SM, Hong YH, Park SH, Sung JJ. Skin Biopsy as a Novel Diagnostic Aid in Immune-Mediated Neuropathies. J Neuropathol Exp Neurol 2022; 81:1018-1025. [PMID: 36137254 PMCID: PMC9677240 DOI: 10.1093/jnen/nlac085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/06/2023]
Abstract
Immune-mediated neuropathies are a heterogeneous group of inflammatory peripheral nerve disorders. They can be classified according to the domain where the autoimmune process begins: the internode, paranode, or node. However, conventional diagnostic tools, namely electrodiagnosis (EDX) and autoantibody testing, do not fully address this issue. In this institutional cohort study, we investigated the value of dermal myelinated fiber analysis for target domain-based classification. Twenty-seven consecutive patients with immune-mediated neuropathies underwent skin biopsies. The sections were stained with antibodies representative of myelinated fiber domains and were scanned using a confocal microscope. Clinical and pathological features of each patient were reviewed comprehensively. Quantitative morphometric parameters were subjected to clustering analysis, which stratified patients into 3 groups. Cluster 1 ("internodopathy") was characterized by prominent internodal disruption, intact nodes and paranodes, demyelinating EDX pattern, and absence of nodal-paranodal antibodies. Cluster 2 ("paranodopathy") was characterized by paranodal disruption and corresponding antibodies. Morphological changes were restricted to the nodes in cluster 3; we designated this cluster as "nodopathy." This report highlights the utility of skin biopsy as a diagnostic aid to gain pathogenic insight and classify patients with immune-mediated neuropathies.
Affiliation(s)
- Young Gi Min
- Department of Neurology, Seoul National University Hospital, Seoul, Korea; Department of Translational Medicine, Seoul National University College of Medicine, Seoul, Korea
| | - Woohee Ju
- Department of Neurology, Seoul National University Hospital, Seoul, Korea
| | - Ye-Eun Ha
- Department of Neurology, Seoul National University Hospital, Seoul, Korea
| | - Jae-Jun Ban
- Department of Neurology, Seoul National University Hospital, Seoul, Korea; Neuroscience Research Institute, Biomedical Research Institute, Seoul National University College of Medicine, Seoul, Korea
| | - Je-Young Shin
- Department of Neurology, Seoul National University Hospital, Seoul, Korea
| | - Sung-Min Kim
- Department of Neurology, Seoul National University Hospital, Seoul, Korea
| | - Yoon-Ho Hong
- Department of Neurology, Seoul National University Seoul Metropolitan Government Boramae Hospital, Seoul, Korea
| | - Sung-Hye Park
- Department of Pathology, Seoul National University Hospital, Seoul, Korea
| | - Jung-Joon Sung
- Send correspondence to: Jung-Joon Sung, MD, PhD, Department of Neurology, Seoul National University Hospital, Department of Translational Medicine, Seoul National University College of Medicine, 101 Daehangno, Jongnogu, Seoul 03080, Korea; E-mail:
| |
|
36
|
Templeton J, Tran T. Cluster‐based improvement rates for trust establishment models in single or distributed multi‐agent systems. Comput Intell 2022. [DOI: 10.1111/coin.12546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Julian Templeton
- School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada
| | - Thomas Tran
- School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Ontario, Canada
| |
|
37
|
Elkholosy H, Ead R, Hammad A, AbouRizk S. Data mining for forecasting labor resource requirements: a case study of project management staffing requirements. International Journal of Construction Management 2022. [DOI: 10.1080/15623599.2022.2112898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2022]
Affiliation(s)
- Hady Elkholosy
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
| | - Rana Ead
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
| | - Ahmed Hammad
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
| | - Simaan AbouRizk
- Department of Civil and Environmental Engineering, University of Alberta, Edmonton, Canada
| |
|
38
|
Use of mixed-type data clustering algorithm for characterizing temporal and spatial distribution of biosecurity border detections of terrestrial non-indigenous species. PLoS One 2022; 17:e0272413. [PMID: 35943971 PMCID: PMC9362945 DOI: 10.1371/journal.pone.0272413] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2021] [Accepted: 07/19/2022] [Indexed: 11/19/2022]
Abstract
Appropriate inspection protocols and mitigation strategies are a critical component of effective biosecurity measures, enabling implementation of sound management decisions. Statistical models to analyze biosecurity surveillance data are integral to this decision-making process. Our research focuses on analyzing border interception biosecurity data collected from a Class A Nature Reserve, Barrow Island, in Western Australia and the associated covariates describing both spatial and temporal interception patterns. A clustering analysis approach was adopted using a generalization of the popular k-means algorithm appropriate for mixed-type data. The analysis compared the efficiency of clustering using only the numerical data with that of clustering after covariates were added. Based on numerical data only, three clusters gave an acceptable fit and provided information about the underlying data characteristics. Incorporation of covariates into the model suggested four distinct clusters dominated by physical location and type of detection. Clustering increases interpretability of complex models and is useful in data mining to highlight patterns that describe underlying processes in biosecurity and other research areas. Availability of more relevant data would greatly improve the model. Based on outcomes from our research, we recommend broader use of cluster models in biosecurity data, with testing of these models on more datasets to validate the model choice and identify important explanatory variables.
|
39
|
Van Dyck D, Baijot S, Aeby A, De Tiège X, Deconinck N. Cognitive, perceptual, and motor profiles of school-aged children with developmental coordination disorder. Front Psychol 2022; 13:860766. [PMID: 35992485 PMCID: PMC9381813 DOI: 10.3389/fpsyg.2022.860766] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2022] [Accepted: 06/24/2022] [Indexed: 12/05/2022]
Abstract
Developmental coordination disorder (DCD) is a heterogeneous condition. Besides motor impairments, children with DCD often exhibit poor visual perceptual skills and executive functions. This study aimed to characterize the motor, perceptual, and cognitive profiles of children with DCD at the group level and in terms of subtypes. A total of 50 children with DCD and 31 typically developing (TD) peers (7–11 years old) underwent a comprehensive neuropsychological (15 tests) and motor (three subscales of the Movement Assessment Battery for Children-2) assessment. The percentage of children with DCD showing impairments in each measurement was first described. Hierarchical agglomerative and K-means iterative partitioning clustering analyses were then performed to distinguish the subtypes present among the complete sample of children (DCD and TD) in a data-driven way. Moderate to large percentages of children with DCD showed impaired executive functions (92%) and praxis (meaningless gestures and postures, 68%), as well as attentional (52%), visual perceptual (46%), and visuomotor (36%) skills. Clustering analyses identified five subtypes, four of them mainly consisting of children with DCD and one of TD children. These subtypes were characterized by: (i) generalized impairments (8 children with DCD), (ii) impaired manual dexterity, poor balance (static/dynamic), planning, and alertness (15 DCD and 1 TD child), (iii) impaired manual dexterity, cognitive inhibition, and poor visual perception (11 children with DCD), (iv) impaired manual dexterity and cognitive inhibition (15 DCD and 5 TD children), and (v) no impairment (25 TD and 1 child with DCD). Apart from subtle differences, the motor and praxis measures did not discriminate between the four subtypes of children with DCD. The subtypes were, however, characterized by distinct perceptual or cognitive impairments. These results highlight the importance of exhaustively assessing the perceptual and cognitive skills of children with DCD.
Affiliation(s)
- Dorine Van Dyck
- Laboratoire de Neuroanatomie et Neuroimagerie Translationnelles, ULB Neuroscience Institute, Université libre de Bruxelles, Brussels, Belgium
- Department of Neurology, Hôpital Universitaire des Enfants Reine Fabiola, Université libre de Bruxelles, Brussels, Belgium
- *Correspondence: Dorine Van Dyck,
| | - Simon Baijot
- Department of Neurology, Hôpital Universitaire des Enfants Reine Fabiola, Université libre de Bruxelles, Brussels, Belgium
- Neuropsychology and Functional Neuroimaging Research Group at Center for Research in Cognition and Neurosciences, ULB Neurosciences Institute, Université libre de Bruxelles, Brussels, Belgium
| | - Alec Aeby
- Department of Neurology, Hôpital Universitaire des Enfants Reine Fabiola, Université libre de Bruxelles, Brussels, Belgium
- Neuropsychology and Functional Neuroimaging Research Group at Center for Research in Cognition and Neurosciences, ULB Neurosciences Institute, Université libre de Bruxelles, Brussels, Belgium
- Department of Pediatric Neurology, CUB Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université libre de Bruxelles, Brussels, Belgium
| | - Xavier De Tiège
- Laboratoire de Neuroanatomie et Neuroimagerie Translationnelles, ULB Neuroscience Institute, Université libre de Bruxelles, Brussels, Belgium
- Department of Translational Neuroimaging, CUB Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université libre de Bruxelles, Brussels, Belgium
| | - Nicolas Deconinck
- Department of Neurology, Hôpital Universitaire des Enfants Reine Fabiola, Université libre de Bruxelles, Brussels, Belgium
| |
|
40
|
Damigos G, Zacharaki EI, Zerva N, Pavlopoulos A, Chatzikyrkou K, Koumenti A, Moustakas K, Pantos C, Mourouzis I, Lourbopoulos A. Machine learning based analysis of stroke lesions on mouse tissue sections. J Cereb Blood Flow Metab 2022; 42:1463-1477. [PMID: 35209753 PMCID: PMC9274860 DOI: 10.1177/0271678x221083387] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]
Abstract
An unbiased, automated and reliable method for analysis of brain lesions in tissue after ischemic stroke is missing. Manual infarct volumetry is laborious and prone to human error, while threshold-based semi-automated approaches are biased by many false-positive and false-negative data. We therefore developed a novel machine learning, atlas-based method for fully automated stroke analysis in mouse brain slices stained with 2% triphenyltetrazolium chloride (2% TTC), named "StrokeAnalyst", which runs on a user-friendly graphical interface. StrokeAnalyst registers subject images on a common spatial domain (a novel mouse TTC brain atlas of 80 average mathematical images), calculates pixel-based, tissue-intensity statistics (z-scores), applies outlier-detection and machine learning (Random Forest) models to increase accuracy of lesion detection, and produces volumetry data and detailed neuroanatomical information per lesion. We validated StrokeAnalyst in two separate experimental sets using the filament stroke model. StrokeAnalyst detects stroke lesions in a rater-independent and reproducible way, correctly detects hemispheric volumes even in the presence of post-stroke edema, and significantly minimizes false-positive errors compared to threshold-based approaches (false-positive rate 1.2-2.3%, p < 0.05). It can process scanner-acquired, and even smartphone-captured or PDF-retrieved images. Overall, StrokeAnalyst surpasses all previous TTC-volumetry approaches and increases quality, reproducibility and reliability of stroke detection in relevant preclinical models.
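The pixel-wise z-score step mentioned in this abstract can be illustrated with a minimal sketch. It is not the StrokeAnalyst implementation: the function name, the flat-list image representation, and the threshold of 2.0 are all assumptions made for illustration, and the registration and Random Forest stages described in the abstract are left out.

```python
def flag_outlier_pixels(pixels, atlas_mean, atlas_sd, threshold=2.0):
    """Flag pixels whose intensity deviates strongly from an atlas.

    pixels, atlas_mean and atlas_sd are equal-length lists holding, per
    pixel, the subject intensity and the atlas mean and standard deviation.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    flags = []
    for x, mu, sd in zip(pixels, atlas_mean, atlas_sd):
        # z-score: distance from the atlas mean in units of atlas SD.
        # A zero SD carries no spread information, so treat it as z = 0.
        z = (x - mu) / sd if sd > 0 else 0.0
        flags.append(abs(z) > threshold)
    return flags


if __name__ == "__main__":
    print(flag_outlier_pixels([100, 50, 120], [100, 100, 100], [10, 10, 0]))
```

A real pipeline would operate on registered 2D images and would probably pass the z-scores on to a classifier rather than thresholding them directly.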
Affiliation(s)
- Gerasimos Damigos
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece; Department of Electrical and Computer Engineering, University of Patras, Patras, Greece
| | - Evangelia I Zacharaki
- Department of Electrical and Computer Engineering, University of Patras, Patras, Greece
| | - Nefeli Zerva
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
| | - Angelos Pavlopoulos
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
| | - Konstantina Chatzikyrkou
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
| | - Argyro Koumenti
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
| | | | - Constantinos Pantos
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
| | - Iordanis Mourouzis
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece
| | - Athanasios Lourbopoulos
- Department of Pharmacology, Medical School of Athens, National and Kapodistrian University of Athens, Athens, Greece; Institute for Stroke and Dementia Research (ISD), University of Munich Medical Center, Munich, Germany; Neurointensive Care Unit, Schoen Klinik Bad Aibling, Germany
| |
|
41
|
Dimensionality reduction for visualizing high-dimensional biological data. Biosystems 2022; 220:104749. [DOI: 10.1016/j.biosystems.2022.104749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/14/2022] [Revised: 07/02/2022] [Accepted: 07/24/2022] [Indexed: 01/04/2023]
|
42
|
Identification of multidimensional phenotypes using cluster analysis in sarcoid uveitis patients. Am J Ophthalmol 2022; 242:107-115. [PMID: 35752321 DOI: 10.1016/j.ajo.2022.06.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/16/2022] [Revised: 06/01/2022] [Accepted: 06/09/2022] [Indexed: 11/22/2022]
Abstract
PURPOSE: To identify multidimensional phenotypes of sarcoid uveitis patients. DESIGN: Retrospective cohort. METHODS: Study population: Consecutive patients with biopsy-proven, presumed or probable sarcoid uveitis between December 2003 and December 2020 in Lyon. Observation procedure: Data were collected from the clinical notes and consisted of laboratory and imaging findings, systemic treatments and outcome. Systemic sarcoidosis was diagnosed according to Abad's modified criteria, and uveitis was classified according to the Standardization of Uveitis Nomenclature. A hierarchical cluster analysis was performed. Main outcome measure: Identification of different phenotypes of sarcoid uveitis patients. RESULTS: A total of 299 patients were included. Three clusters were identified: 1) younger non-Caucasian patients who presented acute (75.3%), anterior (55.6%) uveitis, and systemic manifestations (87.8%), requiring oral corticosteroids (75.3%) along with immunosuppressive therapy (17.2%) and who were more prone to experience complete visual recovery (84.1%); 2) middle-aged Caucasian patients who presented chronic (91.7%), panuveitis (79.5%) and isolated uveitis at diagnosis (74.8%), requiring systemic treatment with corticosteroids (74.0%) but less frequently immunosuppressive therapy (9.8%) and a worse prognosis (45.3% complete visual recovery); 3) middle-aged Caucasian patients, without preferential chronic or acute uveitis, isolated uveitis at diagnosis (81.4%), more homogeneous in terms of eye involvement repartition, requiring less corticosteroids or immunosuppressive therapy (respectively 54.1% and 13.1%) and having a prognosis close to cluster-2 patients (55.3% complete visual recovery). CONCLUSIONS: This retrospective study suggested the existence of several phenotypes of sarcoid uveitis patients with different progressions and prognoses. Further studies are needed to determine the genetic and environmental factors that could explain these results.
|
43
|
Tay D, Qiu H. Modeling Linguistic (A)Synchrony: A Case Study of Therapist-Client Interaction. Front Psychol 2022; 13:903227. [PMID: 35677134 PMCID: PMC9170272 DOI: 10.3389/fpsyg.2022.903227] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2022] [Accepted: 05/03/2022] [Indexed: 11/13/2022]
Abstract
Interpersonal synchrony is the alignment of responses between social interactants, and is linked to positive outcomes including cooperative behavior, affiliation, and compassion in different social contexts. Language is noted as a key aspect of interpersonal synchrony, but different strands of existing work on linguistic (a)synchrony tend to be methodologically polarized. We introduce a more complementary approach to model linguistic (a)synchrony that is applicable across different interactional contexts, using psychotherapy talk as a case study. We define linguistic synchrony as similarity between linguistic choices that reflect therapists' and clients' socio-psychological stances. Our approach involves (i) computing linguistic variables per session, (ii) k-means cluster analysis to derive a global synchrony measure per dyad, and (iii) qualitative analysis of sample extracts from each dyad. This is demonstrated on sample dyads from psychoanalysis, cognitive-behavioral, and humanistic therapy. The resulting synchrony measures reflect the general philosophy of these therapy types, while further qualitative analyses reveal how (a)synchrony is contextually co-constructed. Our approach provides a systematic and replicable tool for research and self-reflection in psychotherapy and other types of purposive dialogic interaction, on more representative and limited datasets alike.
Affiliation(s)
- Dennis Tay
- Department of English and Communication, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
| | - Han Qiu
- Department of English and Communication, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China
| |
|
44
|
Azzouzi S, Hjouji A, EL-Mekkaoui J, EL Khalfi A. An improved image clustering algorithm based on Kernel method and Tchebychev orthogonal moments. Evolutionary Intelligence 2022. [DOI: 10.1007/s12065-022-00734-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
45
|
Vadapalli S, Abdelhalim H, Zeeshan S, Ahmed Z. Artificial intelligence and machine learning approaches using gene expression and variant data for personalized medicine. Brief Bioinform 2022; 23:6590150. [PMID: 35595537 DOI: 10.1093/bib/bbac191] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 03/08/2022] [Revised: 04/02/2022] [Accepted: 04/26/2022] [Indexed: 12/16/2022]
Abstract
Precision medicine uses genetic, environmental and lifestyle factors to more accurately diagnose and treat disease in specific groups of patients, and it is considered one of the most promising medical efforts of our time. The use of genetics is arguably the most data-rich and complex component of precision medicine. The grand challenge today is the successful assimilation of genetics into precision medicine that translates across different ancestries, diverse diseases and other distinct populations, which will require clever use of artificial intelligence (AI) and machine learning (ML) methods. Our goal here was to review and compare scientific objectives, methodologies, datasets, data sources, ethics and gaps of AI/ML approaches used in genomics and precision medicine. We selected high-quality literature published within the last 5 years that was indexed and available through PubMed Central. Our scope was narrowed to articles that reported application of AI/ML algorithms for statistical and predictive analyses using whole genome and/or whole exome sequencing for gene variants, and RNA-seq and microarrays for gene expression. We did not limit our search to specific diseases or data sources. Based on the scope of our review and comparative analysis criteria, we identified 32 different AI/ML approaches applied in variable genomics studies and report widely adapted AI/ML algorithms for predictive diagnostics across several diseases.
Affiliation(s)
- Sreya Vadapalli
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
| | - Habiba Abdelhalim
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA
| | - Saman Zeeshan
- Rutgers Cancer Institute of New Jersey, Rutgers University, 195 Little Albany St, New Brunswick, NJ, USA
| | - Zeeshan Ahmed
- Rutgers Institute for Health, Health Care Policy and Aging Research, Rutgers University, 112 Paterson St, New Brunswick, NJ, USA; Department of Medicine, Robert Wood Johnson Medical School, Rutgers Biomedical and Health Sciences, 125 Paterson St, New Brunswick, NJ, USA
| |
|
46
|
Vara N, Mirzabeigi M, Sotudeh H, Fakhrahmad SM. Application of k-means clustering algorithm to improve effectiveness of the results recommended by journal recommender system. Scientometrics 2022. [DOI: 10.1007/s11192-022-04397-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
47
|
FCM Clustering Approach Optimization Using Parallel High-Speed Intel FPGA Technology. Journal of Electrical and Computer Engineering 2022. [DOI: 10.1155/2022/8260283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Fuzzy C-Means (FCM) is a widely used clustering algorithm that performs well in various scientific applications. Implementing FCM involves a massive number of computations, and many parallelization techniques based on GPUs and multicore systems have been suggested. In this study, we present a method for optimizing the FCM algorithm for high-speed field-programmable gate array (FPGA) technology using a high-level C-like programming language called Open Computing Language (OpenCL). The method was designed to enable the high-level compiler/synthesis tool to manipulate a task-parallelism model and create an efficient design. Our experimental results (based on several datasets) show that the proposed method makes the FCM execution more than 186 times faster than the conventional design running on a single-core CPU platform. Also, its processing power reached 89 giga floating-point operations per second (GFLOPS).
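For readers unfamiliar with FCM, the computation that this abstract describes accelerating can be sketched in plain Python. This is the textbook sequential algorithm, not the authors' OpenCL/FPGA design; the function name, the default fuzzifier `m=2.0`, the random initialisation, and the stopping tolerance are assumptions chosen for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means on a list of equal-length numeric tuples.

    Returns (centers, memberships); memberships[i][j] is the degree to
    which point i belongs to cluster j, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)**(2/(m-1)).
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - centers[j][d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for j in range(c)]
            new_row = [
                1.0 / sum((dist[j] / dist[k]) ** exponent for k in range(c))
                for j in range(c)]
            shift = max(shift,
                        max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:  # memberships have stopped changing
            break
    return centers, u
```

The two nested loops over points and clusters are independent across points, which is the kind of work that parallel hardware can spread across many processing elements.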
|
49
|
Butyaev A, Drogaris C, Tremblay-Savard O, Waldispühl J. Human-supervised clustering of multidimensional data using crowdsourcing. Royal Society Open Science 2022; 9:211189. [PMID: 35620007 PMCID: PMC9128850 DOI: 10.1098/rsos.211189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 04/29/2022] [Indexed: 06/15/2023]
Abstract
Clustering is a central task in many data analysis applications. However, there is no universally accepted metric to decide the occurrence of clusters. Ultimately, we have to resort to a consensus between experts. The problem is amplified with high-dimensional datasets where classical distances become uninformative and the ability of humans to fully apprehend the distribution of the data is challenged. In this paper, we design a mobile human-computing game as a tool to query human perception for the multidimensional data clustering problem. We propose two clustering algorithms that partially or entirely rely on aggregated human answers and report the results of two experiments conducted on synthetic and real-world datasets. We show that our methods perform on par with or better than the most popular automated clustering algorithms. Our results suggest that hybrid systems leveraging annotations of partial datasets collected through crowdsourcing platforms can be an efficient strategy to capture the collective wisdom for solving abstract computational problems.
|
50
|
Nicholson C, Beattie L, Beattie M, Razzaghi T, Chen S. A machine learning and clustering-based approach for county-level COVID-19 analysis. PLoS One 2022; 17:e0267558. [PMID: 35476849 PMCID: PMC9045668 DOI: 10.1371/journal.pone.0267558] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/22/2021] [Accepted: 04/11/2022] [Indexed: 12/22/2022]
Abstract
COVID-19 is a global pandemic threatening the lives and livelihood of millions of people across the world. Due to its novelty and quick spread, scientists have had difficulty in creating accurate forecasts for this disease. In part, this is due to variation in human behavior and environmental factors that impact disease propagation. This is especially true for regionally specific predictive models due to either limited case histories or other unique factors characterizing the region. This paper employs both supervised and unsupervised methods to identify the critical county-level demographic, mobility, weather, medical capacity, and health-related factors for studying COVID-19 propagation prior to the widespread availability of a vaccine. We use this feature subspace to aggregate counties into meaningful clusters to support more refined disease analysis efforts.
Affiliation(s)
- Charles Nicholson
- School of Industrial and Systems Engineering, University of Oklahoma, Norman, Oklahoma, United States of America
- Data Science and Analytics Institute, University of Oklahoma, Norman, Oklahoma, United States of America
| | - Lex Beattie
- Data Science and Analytics Institute, University of Oklahoma, Norman, Oklahoma, United States of America
| | - Matthew Beattie
- Data Science and Analytics Institute, University of Oklahoma, Norman, Oklahoma, United States of America
| | - Talayeh Razzaghi
- School of Industrial and Systems Engineering, University of Oklahoma, Norman, Oklahoma, United States of America
| | - Sixia Chen
- Department of Biostatistics and Epidemiology, University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma, United States of America
| |
|