1
de Miranda Rohlfs ICP, Noce F, Wilke C, Terry VR, Parsons-Smith RL, Terry PC. Prevalence of Specific Mood Profile Clusters among Elite and Youth Athletes at a Brazilian Sports Club. Sports (Basel) 2024; 12:195. [PMID: 39058086 DOI: 10.3390/sports12070195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 07/11/2024] [Accepted: 07/16/2024] [Indexed: 07/28/2024]
Abstract
Those responsible for elite and youth athletes are increasingly aware of the need to balance the quest for superior performance with the need to protect the physical and psychological wellbeing of athletes. As a result, regular assessment of risks to mental health is a common feature in sports organisations. In the present study, the Brazil Mood Scale (BRAMS) was administered to 898 athletes (387 female, 511 male, age range: 12-44 years) at a leading sports club in Rio de Janeiro using either "past week" or "right now" response timeframes. Using seeded k-means cluster analysis, six distinct mood profile clusters were identified, referred to as the iceberg, surface, submerged, shark fin, inverse iceberg, and inverse Everest profiles. The latter three profiles, which are associated with varying degrees of increased risk to mental health, were reported by 238 athletes (26.5%). The prevalence of these three mood clusters varied according to the response timeframe (past week > right now) and the sex of the athletes (female > male). The prevalence of the iceberg profile varied by athlete sex (male > female), and age (12-17 years > 18+ years). Findings supported use of the BRAMS as a screening tool for the risk of psychological issues among athletes in Brazilian sports organisations.
Affiliation(s)
- Izabel Cristina Provenza de Miranda Rohlfs
  - School of Psychology and Wellbeing, University of Southern Queensland, Toowoomba 4350, Australia
  - Unified Center for the Identification and Development of Performance Athletes (CUIDAR), Clube de Regatas do Flamengo, Rio de Janeiro 22430-041, Brazil
  - School of Physical Education, Physiotherapy and Occupational Therapy, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- Franco Noce
  - School of Physical Education, Physiotherapy and Occupational Therapy, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
- Carolina Wilke
  - School of Physical Education, Physiotherapy and Occupational Therapy, Federal University of Minas Gerais, Belo Horizonte 31270-901, Brazil
  - Faculty of Sport, Technology and Health Sciences, St. Mary's University, London TW1 4SX, UK
- Victoria R Terry
  - Centre for Health Research, University of Southern Queensland, Toowoomba 4350, Australia
  - School of Nursing and Midwifery, University of Southern Queensland, Toowoomba 4350, Australia
- Renée L Parsons-Smith
  - School of Psychology and Wellbeing, University of Southern Queensland, Toowoomba 4350, Australia
- Peter C Terry
  - School of Psychology and Wellbeing, University of Southern Queensland, Toowoomba 4350, Australia
  - Centre for Health Research, University of Southern Queensland, Toowoomba 4350, Australia
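The abstract of entry 1 names seeded k-means cluster analysis without giving details. As a rough illustration only, the sketch below shows the general idea: Lloyd's k-means started from caller-supplied centroids instead of random ones. The function name `seeded_kmeans`, the two-dimensional toy scores, and the seed values are all hypothetical and are not taken from the study.

```python
from math import dist


def seeded_kmeans(points, seeds, max_iter=100):
    """Lloyd's k-means started from caller-supplied seed centroids.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    centroids = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its current centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Toy data (assumed values): two well-separated groups and one seed near each.
points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8)]
seeds = [(0.0, 0.0), (6.0, 6.0)]
labels, centroids = seeded_kmeans(points, seeds)
print(labels)
```

Seeding ties each resulting cluster to a predefined reference profile, which is presumably why a seeded variant suits screening against known mood profiles.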
2
Zong W, Li D, Seney ML, Mcclung CA, Tseng GC. Model-based multifacet clustering with high-dimensional omics applications. Biostatistics 2024:kxae020. [PMID: 39002144 DOI: 10.1093/biostatistics/kxae020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Revised: 05/08/2024] [Accepted: 06/02/2024] [Indexed: 07/15/2024]
Abstract
High-dimensional omics data often contain intricate and multifaceted information, resulting in the coexistence of multiple plausible sample partitions based on different subsets of selected features. Conventional clustering methods typically yield only one clustering solution, limiting their capacity to fully capture all facets of cluster structures in high-dimensional data. To address this challenge, we propose a model-based multifacet clustering (MFClust) method based on a mixture of Gaussian mixture models, where the former mixture achieves facet assignment for gene features and the latter mixture determines cluster assignment of samples. We demonstrate superior facet and cluster assignment accuracy of MFClust through simulation studies. The proposed method is applied to three transcriptomic applications from postmortem brain and lung disease studies. The result captures multifacet clustering structures associated with critical clinical variables and provides intriguing biological insights for further hypothesis generation and discovery.
Affiliation(s)
- Wei Zong
  - Department of Biostatistics, University of Pittsburgh, 130 De Soto St, Pittsburgh, PA 15261, United States
- Danyang Li
  - Translational Neuroscience Program, Department of Psychiatry, Center for Neuroscience, University of Pittsburgh, 3811 O'Hara Street, PA 15213, United States
- Marianne L Seney
  - Translational Neuroscience Program, Department of Psychiatry, Center for Neuroscience, University of Pittsburgh, 3811 O'Hara Street, PA 15213, United States
- Colleen A Mcclung
  - Translational Neuroscience Program, Department of Psychiatry, Center for Neuroscience, University of Pittsburgh, 3811 O'Hara Street, PA 15213, United States
- George C Tseng
  - Department of Biostatistics, University of Pittsburgh, 130 De Soto St, Pittsburgh, PA 15261, United States
3
Mongia A, Zohora FT, Burget NG, Zhou Y, Saunders DC, Wang YJ, Brissova M, Powers AC, Kaestner KH, Vahedi G, Naji A, Schwartz GW, Faryabi RB. AnnoSpat annotates cell types and quantifies cellular arrangements from spatial proteomics. Nat Commun 2024; 15:3744. [PMID: 38702321 PMCID: PMC11068798 DOI: 10.1038/s41467-024-47334-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Accepted: 03/25/2024] [Indexed: 05/06/2024]
Abstract
Cellular composition and anatomical organization influence normal and aberrant organ functions. Emerging spatial single-cell proteomic assays such as Image Mass Cytometry (IMC) and Co-Detection by Indexing (CODEX) have facilitated the study of cellular composition and organization by enabling high-throughput measurement of cells and their localization directly in intact tissues. However, annotation of cell types and quantification of their relative localization in tissues remain challenging. To address these unmet needs for atlas-scale datasets like the Human Pancreas Analysis Program (HPAP), we develop AnnoSpat (Annotator and Spatial Pattern Finder), which uses neural network and point process algorithms to automatically identify cell types and quantify cell-cell proximity relationships. Our study of data from IMC and CODEX shows the higher performance of AnnoSpat in rapid and accurate annotation of cell types compared to alternative approaches. Moreover, the application of AnnoSpat to type 1 diabetic, non-diabetic autoantibody-positive, and non-diabetic organ donor cohorts recapitulates known islet pathobiology and shows differential dynamics of pancreatic polypeptide (PP) cell abundance and CD8+ T cell infiltration in islets during type 1 diabetes progression.
Affiliation(s)
- Aanchal Mongia
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Fatema Tuz Zohora
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Vector Institute, University of Toronto, Toronto, ON, Canada
- Noah G Burget
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Yeqiao Zhou
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Diane C Saunders
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Yue J Wang
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Marcela Brissova
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
- Alvin C Powers
  - Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
  - VA Tennessee Valley Healthcare System, Nashville, TN, USA
- Klaus H Kaestner
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Golnaz Vahedi
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Ali Naji
  - Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Gregory W Schwartz
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
  - Vector Institute, University of Toronto, Toronto, ON, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada
- Robert B Faryabi
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  - Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
4
Knisely BM, Pavliscsak HH. Clustering Research Proposal Submissions to Understand the Unmet Needs of Military Clinicians. Mil Med 2024; 189:e291-e297. [PMID: 37552636 DOI: 10.1093/milmed/usad314] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2023] [Revised: 06/15/2023] [Accepted: 07/26/2023] [Indexed: 08/10/2023]
Abstract
INTRODUCTION: The Advanced Medical Technology Initiative (AMTI) program solicits research proposals for technology demonstrations and performance improvement projects in the domain of military medicine. AMTI is managed by the U.S. Army Telemedicine and Advanced Technology Research Center (TATRC). AMTI proposals span a wide range of topics, for example, treatment of musculoskeletal injury, application of virtual health technology, and demonstration of medical robots. The variety and distribution of central topics in these proposals (problems to be solved and technological solutions proposed) are not well characterized. Characterizing this content over time could highlight over- and under-served problem domains, inspire new technological applications, and inform future research solicitation efforts.
METHODS AND MATERIALS: This research sought to analyze and categorize historic AMTI proposals from 2010 to 2022 (n = 825). The analysis focused specifically on the "Problem to Be Solved" and "Technology to Be Demonstrated" sections of the proposals, whose categorizations are referred to as "Problem-Sets" and "Solution-Sets" (PS and SS), respectively. A semi-supervised document clustering process was applied independently to the two sections. The process consisted of three stages: (1) Manual Document Annotation: a sample of proposals was manually labeled along each thematic axis; (2) Clustering: semi-supervised clustering, informed by the manually annotated sample, was applied to the proposals to produce document clusters; (3) Evaluation and Selection: quantitative and qualitative means were used to evaluate and select an optimal cluster solution. The results of the clustering were then summarized and presented descriptively.
RESULTS: The clustering process identified 24 unique PS and 20 unique SS. The most prevalent PS were Musculoskeletal Injury (12%), Traumatic Injury (11%), and Healthcare Systems Optimization (11%). The most prevalent SS were Sensing and Imaging Technology (27%), Virtual Health (23%), and Physical and Virtual Simulation (11.5%). The most common problem-solution pair was Healthcare Systems Optimization-Virtual Health, followed by Musculoskeletal Injury-Sensing and Imaging Technology. The analysis revealed that problem-solution-set co-occurrences were well distributed throughout the domain space, demonstrating the variety of research conducted in this domain.
CONCLUSIONS: A semi-supervised document clustering approach was applied to a repository of proposals to partially automate the process of document annotation. By applying this process, we successfully extracted thematic content from the proposals related to problems to be addressed and proposed technological solutions. This analysis provides a snapshot of the research supply in the domain of military medicine over the last 12 years. Future work should seek to replicate and improve the document clustering process used. Future efforts should also be made to compare these results to actual published work in the domain of military medicine, revealing differences in demand for research as determined by funding and publishing decision-makers and supply by researchers.
Affiliation(s)
- Benjamin M Knisely
  - Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702, USA
- Holly H Pavliscsak
  - Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702, USA
5
Mokari A, Guo S, Bocklitz T. Exploring the Steps of Infrared (IR) Spectral Analysis: Pre-Processing, (Classical) Data Modelling, and Deep Learning. Molecules 2023; 28:6886. [PMID: 37836728 PMCID: PMC10574384 DOI: 10.3390/molecules28196886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/13/2023] [Accepted: 09/26/2023] [Indexed: 10/15/2023]
Abstract
Infrared (IR) spectroscopy has greatly improved the ability to study biomedical samples because IR spectroscopy measures how molecules interact with infrared light, providing a measurement of the vibrational states of the molecules. Therefore, the resulting IR spectrum provides a unique vibrational fingerprint of the sample. This characteristic makes IR spectroscopy an invaluable and versatile technology for detecting a wide variety of chemicals and is widely used in biological, chemical, and medical scenarios. These include, but are not limited to, micro-organism identification, clinical diagnosis, and explosive detection. However, IR spectroscopy is susceptible to various interfering factors such as scattering, reflection, and interference, which manifest themselves as baseline, band distortion, and intensity changes in the measured IR spectra. Combined with the absorption information of the molecules of interest, these interferences prevent direct data interpretation based on the Beer-Lambert law. Instead, more advanced data analysis approaches, particularly artificial intelligence (AI)-based algorithms, are required to remove the interfering contributions and, more importantly, to translate the spectral signals into high-level biological/chemical information. This leads to the tasks of spectral pre-processing and data modeling, the main topics of this review. In particular, we will discuss recent developments in both tasks from the perspectives of classical machine learning and deep learning.
Affiliation(s)
- Azadeh Mokari
  - Leibniz Institute of Photonic Technology, Member of Research Alliance “Leibniz Health Technologies”, 07745 Jena, Germany
  - Institute of Physical Chemistry, Friedrich Schiller University Jena, 07743 Jena, Germany
- Shuxia Guo
  - Leibniz Institute of Photonic Technology, Member of Research Alliance “Leibniz Health Technologies”, 07745 Jena, Germany
- Thomas Bocklitz
  - Leibniz Institute of Photonic Technology, Member of Research Alliance “Leibniz Health Technologies”, 07745 Jena, Germany
  - Institute of Physical Chemistry, Friedrich Schiller University Jena, 07743 Jena, Germany
  - Institute of Computer Science, Faculty of Mathematics, Physics & Computer Science, University Bayreuth, Universitätsstraße 30, 95447 Bayreuth, Germany
6
Gao CX, Dwyer D, Zhu Y, Smith CL, Du L, Filia KM, Bayer J, Menssink JM, Wang T, Bergmeir C, Wood S, Cotton SM. An overview of clustering methods with guidelines for application in mental health research. Psychiatry Res 2023; 327:115265. [PMID: 37348404 DOI: 10.1016/j.psychres.2023.115265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 05/20/2023] [Accepted: 05/21/2023] [Indexed: 06/24/2023]
Abstract
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently introduced. How to choose algorithms to address common issues as well as methods for pre-clustering data processing, clustering evaluation and validation are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
Affiliation(s)
- Caroline X Gao
  - Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
  - Orygen, Parkville, VIC, Australia
  - Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Dominic Dwyer
  - Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
  - Orygen, Parkville, VIC, Australia
- Ye Zhu
  - School of Information Technology, Deakin University, Geelong, VIC, Australia
- Catherine L Smith
  - Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
- Lan Du
  - Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Kate M Filia
  - Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
  - Orygen, Parkville, VIC, Australia
- Johanna Bayer
  - Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
  - Orygen, Parkville, VIC, Australia
- Jana M Menssink
  - Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
  - Orygen, Parkville, VIC, Australia
- Teresa Wang
  - Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Christoph Bergmeir
  - Faculty of Information Technology, Monash University, Clayton, VIC, Australia
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Stephen Wood
  - Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
  - Orygen, Parkville, VIC, Australia
- Sue M Cotton
  - Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia
  - Orygen, Parkville, VIC, Australia
7
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
8
Frigau L, Romano M, Ortu M, Contu G. Semi-supervised sentiment clustering on natural language texts. STAT METHOD APPL-GER 2023. [DOI: 10.1007/s10260-023-00691-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/05/2023]
Abstract
In this paper, we propose a semi-supervised method to cluster unstructured textual data called semi-supervised sentiment clustering on natural language texts. The aim is to identify clusters homogeneous with respect to the overall sentiment of the texts analyzed. The method combines different techniques and methodologies: Sentiment Analysis, Threshold-based Naïve Bayes classifier, and Network-based Semi-supervised Clustering. It involves different steps. In the first step, the unstructured text is transformed into structured text, and it is categorized into positive or negative classes using a sentiment analysis algorithm. In the second step, the Threshold-based Naïve Bayes classifier is applied to identify the overall sentiment of the texts and to define a specific sentiment value for the topics. In the last step, Network-based Semi-supervised Clustering is applied to partition the instances into disjoint groups. The proposed algorithm is tested on a collection of reviews written by customers on Booking.com. The results have highlighted the capacity of the proposed algorithm to identify clusters that are distinct, non-overlapped, and homogeneous with respect to the overall sentiment. Results are also easily interpretable thanks to the network representation of the instances that helps to understand the relationship between them.
9
Zhong T, Zhang Q, Huang J, Wu M, Ma S. HETEROGENEITY ANALYSIS VIA INTEGRATING MULTI-SOURCES HIGH-DIMENSIONAL DATA WITH APPLICATIONS TO CANCER STUDIES. Stat Sin 2023; 33:729-758. [PMID: 38037567 PMCID: PMC10686523 DOI: 10.5705/ss.202021.0002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2023]
Abstract
This study has been motivated by cancer research, in which heterogeneity analysis plays an important role and can be roughly classified as unsupervised or supervised. In supervised heterogeneity analysis, the finite mixture of regression (FMR) technique is used extensively, under which the covariates affect the response differently in subgroups. High-dimensional molecular and, very recently, histopathological imaging features have been analyzed separately and shown to be effective for heterogeneity analysis. For simpler analysis, they have been shown to contain overlapping, but also independent information. In this article, our goal is to conduct the first and more effective FMR-based cancer heterogeneity analysis by integrating high-dimensional molecular and histopathological imaging features. A penalization approach is developed to regularize estimation, select relevant variables, and, equally importantly, promote the identification of independent information. Consistency properties are rigorously established. An effective computational algorithm is developed. A simulation and an analysis of The Cancer Genome Atlas (TCGA) lung cancer data demonstrate the practical effectiveness of the proposed approach. Overall, this study provides a practical and useful new way of conducting supervised cancer heterogeneity analysis.
Affiliation(s)
- Tingyan Zhong
  - SJTU-Yale Joint Center for Biostatistics, Department of Bioinformatics and Biostatistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, China
- Qingzhao Zhang
  - School of Economics and Wang Yanan Institute for Studies in Economics, Xiamen University, Fujian, China
- Jian Huang
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Kowloon, Hong Kong
- Mengyun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Shanghai, China
- Shuangge Ma
  - Department of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06520-0834, USA
10
Guillard R, Hessas A, Korczowski L, Londero A, Congedo M, Loche V. Comparing Clustering Methods Applied to Tinnitus within a Bootstrapped and Diagnostic-Driven Semi-Supervised Framework. Brain Sci 2023; 13:572. [PMID: 37190537 DOI: 10.3390/brainsci13040572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 03/12/2023] [Accepted: 03/16/2023] [Indexed: 03/31/2023]
Abstract
The understanding of tinnitus has always been elusive and is largely prevented by its intrinsic heterogeneity. To address this issue, scientific research has aimed at defining stable and easily identifiable subphenotypes of tinnitus. This would allow better disentangling of the multiple underlying pathophysiological mechanisms of tinnitus. In this study, three dimensionality reduction techniques and two clustering methods were benchmarked on a database of 2772 tinnitus patients in order to obtain a reliable segmentation of subphenotypes. In this database, tinnitus patients’ endotypes (i.e., parts of a population with a condition with distinct underlying mechanisms) are reported when diagnosed by an ENT expert in tinnitus management. This partial labeling of the dataset enabled the design of an original semi-supervised framework. The objective was to perform a benchmark of different clustering methods to get as close as possible to the initial ENT expert endotypes. To do so, two metrics were used: a primary one, the quality of the separation of the endotypes already identified in the database, and a secondary one, the stability of the obtained clusterings. The relevance of the results was finally reviewed by two ENT experts in tinnitus management. A 20-cluster clustering was selected as the best performing, the most clinically relevant, and the most stable through bootstrapping. This clustering used t-SNE as the dimensionality reduction technique and a k-means algorithm as the clustering method. The characteristics of this clustering are presented in this article.
11
A new method based on ensemble time series for fast and accurate clustering. DATA TECHNOLOGIES AND APPLICATIONS 2023. [DOI: 10.1108/dta-08-2022-0300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Purpose: The common methods for clustering time series are the use of specific distance criteria or the use of standard clustering algorithms. Ensemble clustering is one of the common techniques used in data mining to increase the accuracy of clustering. In this study, a multistep approach for clustering whole time series has been developed, based on segmentation, selection of the best segments, and ensemble clustering of the selected segments. Design/methodology/approach: First, this approach divides the time series dataset into equal segments. In the next step, the best segments are selected using one or more internal clustering criteria, and the selected segments are then combined for final clustering. Two algorithms with different settings have been developed, which differ in their use of a loop and in how the best segments are selected for the final clustering (using one criterion or several criteria simultaneously). A logarithmic relationship limits the number of segments created in the loop. Findings: First, the best setting of the two developed algorithms was selected according to Rand's external criterion and statistical tests. This setting was then compared with different algorithms from the literature in terms of clustering accuracy and execution time. The results indicate higher accuracy and shorter execution time for the proposed approach. Originality/value: This paper proposes a fast and accurate approach for time series clustering in three main steps. This is the first work that uses a combination of segmentation and ensemble clustering. Higher accuracy and shorter execution time are the notable achievements of this study.
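As a rough illustration of the segmentation and ensemble steps described in this abstract, the Python sketch below splits a series into equal segments and combines several per-segment cluster labelings through a co-association matrix. The function names and the co-association consensus are assumptions chosen for illustration; the abstract does not state which consensus function or internal criteria the authors used.

```python
def split_segments(series, n_segments):
    """Split one time series into n_segments equal-length pieces.

    Assumption: trailing values that do not fill a whole segment are dropped.
    """
    size = len(series) // n_segments
    return [series[k * size:(k + 1) * size] for k in range(n_segments)]


def co_association(labelings):
    """Fraction of labelings in which each pair of items shares a cluster.

    labelings: list of label lists, one per selected segment, all covering
    the same items in the same order.
    """
    n_items = len(labelings[0])
    n_runs = len(labelings)
    return [
        [sum(lab[i] == lab[j] for lab in labelings) / n_runs
         for j in range(n_items)]
        for i in range(n_items)
    ]


# Hypothetical example: two segments produced two labelings of three series.
segments = split_segments([1, 2, 3, 4, 5, 6], 3)
consensus = co_association([[0, 0, 1], [0, 1, 1]])
```

A final partition could then be obtained by clustering the items on this matrix, for example by grouping pairs whose co-association exceeds a chosen threshold.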
|
12
|
Mongia A, Saunders DC, Wang YJ, Brissova M, Powers AC, Kaestner KH, Vahedi G, Naji A, Schwartz GW, Faryabi RB. AnnoSpat annotates cell types and quantifies cellular arrangements from spatial proteomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.15.524135. [PMID: 36712052 PMCID: PMC9882100 DOI: 10.1101/2023.01.15.524135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2023]
Abstract
Cellular composition and anatomical organization influence normal and aberrant organ functions. Emerging spatial single-cell proteomic assays such as Image Mass Cytometry (IMC) and Co-Detection by Indexing (CODEX) have facilitated the study of cellular composition and organization by enabling high-throughput measurement of cells and their localization directly in intact tissues. However, annotation of cell types and quantification of their relative localization in tissues remain challenging. To address these unmet needs, we developed AnnoSpat (Annotator and Spatial Pattern Finder), which uses neural network and point process algorithms to automatically identify cell types and quantify cell-cell proximity relationships. Our study of data from IMC and CODEX shows the superior performance of AnnoSpat in rapid and accurate annotation of cell types compared to alternative approaches. Moreover, the application of AnnoSpat to type 1 diabetic, non-diabetic autoantibody-positive, and non-diabetic organ donor cohorts recapitulated known islet pathobiology and showed differential dynamics of pancreatic polypeptide (PP) cell abundance and CD8+ T cell infiltration in islets during type 1 diabetes progression.
Affiliation(s)
- Aanchal Mongia
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Diane C. Saunders
- Department of Medicine, Division of Diabetes, Endocrinology, and Metabolism, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Yue J. Wang
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
| | - Marcela Brissova
- Department of Medicine, Division of Diabetes, Endocrinology, and Metabolism, Vanderbilt University School of Medicine, Nashville, TN, USA
| | - Alvin C. Powers
- Department of Molecular Physiology and Biophysics, Vanderbilt University School of Medicine, Nashville, TN, USA
- Department of Medicine, Division of Diabetes, Endocrinology, and Metabolism, Vanderbilt University School of Medicine, Nashville, TN, USA
- VA Tennessee Valley Healthcare System, Nashville, Tennessee, 37212, USA
- Human Pancreas Analysis Program Consortium
| | - Klaus H. Kaestner
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Human Pancreas Analysis Program Consortium
| | - Golnaz Vahedi
- Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Human Pancreas Analysis Program Consortium
| | - Ali Naji
- Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Institute for Diabetes, Obesity and Metabolism, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Human Pancreas Analysis Program Consortium
| | - Gregory W. Schwartz
- Princess Margaret Cancer Center, University Health Network, Toronto, ON, Canada
| | - Robert B. Faryabi
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Epigenetics Institute, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Human Pancreas Analysis Program Consortium
|
13
|
Knisely BM, Pavliscsak HH. Research proposal content extraction using natural language processing and semi-supervised clustering: A demonstration and comparative analysis. Scientometrics 2023; 128:3197-3224. [PMID: 37101971 PMCID: PMC10083066 DOI: 10.1007/s11192-023-04689-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Accepted: 03/07/2023] [Indexed: 04/28/2023]
Abstract
Funding institutions often solicit text-based research proposals to evaluate potential recipients. Leveraging the information contained in these documents could help institutions understand the supply of research within their domain. In this work, an end-to-end methodology for semi-supervised document clustering is introduced to partially automate classification of research proposals based on thematic areas of interest. The methodology consists of three stages: (1) manual annotation of a document sample; (2) semi-supervised clustering of documents; (3) evaluation of cluster results using quantitative metrics and qualitative ratings (coherence, relevance, distinctiveness) by experts. The methodology is described in detail to encourage replication and is demonstrated on a real-world data set. This demonstration sought to categorize proposals submitted to the US Army Telemedicine and Advanced Technology Research Center (TATRC) related to technological innovations in military medicine. A comparative analysis of method features was performed, including unsupervised vs. semi-supervised clustering, several document vectorization techniques, and several cluster result selection strategies. Outcomes suggest that pretrained Bidirectional Encoder Representations from Transformers (BERT) embeddings were better suited for the task than older text embedding techniques. When comparing expert ratings between algorithms, semi-supervised clustering produced coherence ratings approximately 25% better on average than standard unsupervised clustering, with negligible differences in cluster distinctiveness. Last, it was shown that a cluster result selection strategy that balances internal and external validity produced ideal results. With further refinement, this methodological framework shows promise as a useful analytical tool for institutions to unlock hidden insights from untapped archives and similar administrative document repositories.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11192-023-04689-3.
Affiliation(s)
- Benjamin M. Knisely
- Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702 USA
| | - Holly H. Pavliscsak
- Telemedicine and Advanced Technology Research Center, United States Army Medical Research and Development Command, Fort Detrick, MD 21702 USA
|
14
|
Mvula PK, Branco P, Jourdan GV, Viktor HL. A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning. DISCOVER DATA 2023; 1:4. [PMID: 37038388 PMCID: PMC10079755 DOI: 10.1007/s44248-023-00003-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/27/2023] [Accepted: 03/21/2023] [Indexed: 04/12/2023]
Abstract
In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be carefully designed to be representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and sets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them.
Affiliation(s)
- Paul K. Mvula
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
| | - Paula Branco
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
| | - Guy-Vincent Jourdan
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
| | - Herna L. Viktor
- Present Address: School of Electrical Engineering and Computer Science (EECS), University of Ottawa, 800 King Edward Avenue, Ottawa, K1N 6N5 ON Canada
|
15
|
Faucheux L, Soumelis V, Chevret S. Multiobjective semisupervised learning with a right-censored endpoint adapted to the multiple imputation framework. Biom J 2022; 64:1446-1466. [PMID: 34180091 DOI: 10.1002/bimj.202000365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2020] [Revised: 04/12/2021] [Accepted: 06/05/2021] [Indexed: 12/14/2022]
Abstract
Semisupervised learning aims to use additional knowledge in the search for data structure. In clinical applications, including predictive information in the construction of a data-driven classification is of major importance. This work was motivated by a study that aimed to identify different patterns of immune parameters that would be associated with relapse-free survival in a cohort of breast cancer patients. Supervised and unsupervised objectives can be concomitantly optimized using multiobjective optimization. We propose such a procedure that addresses two challenges in the semisupervised approach, that is, missing data and additional knowledge based on survival time. The former was handled by using multiple imputation and consensus clustering. Survival information was incorporated in the supervised objective through the estimation of a cross-validation error of a Cox regression. A simulation study was performed to assess the performance of the proposed procedure. On complete datasets, the performances were compared to those of an existing modified multiobjective semisupervised learning method. The added value of including the survival data in the learning process was assessed by comparing the procedure to unsupervised learning. The proposed procedure showed better performance than the existing method, notably in the selection of the number of clusters. On incomplete datasets, the procedure showed little sensitivity to most of its parameters, even though a high number of imputations and partition initialization seeds improved the performance. The performance was degraded with a high proportion of missing data (40%) and with more ambiguous data structures. Simulation results and application on real data support the conclusion that our procedure enables the construction of a classification associated with a right-censored endpoint on a possibly incomplete dataset.
Affiliation(s)
- Lilith Faucheux
- Université de Paris, Statistic and epidemiologic research center, INSERM UMR-1153, ECSTRRA Team, Paris, France; Université de Paris, INSERM U976, Paris, France
| | - Vassili Soumelis
- Université de Paris, INSERM U976, Paris, France; Laboratoire d'immunologie, biologie et histocompatibilité, AP-HP, Hôpital Saint-Louis, Paris, France
| | - Sylvie Chevret
- Université de Paris, Statistic and epidemiologic research center, INSERM UMR-1153, ECSTRRA Team, Paris, France; Service de Biostatistique et Information Médicale, AP-HP, Hôpital Saint-Louis, Paris, France
|
16
|
O'Donnell MS, Edmunds DR, Aldridge CL, Heinrichs JA, Monroe AP, Coates PS, Prochazka BG, Hanser SE, Wiechman LA. Defining biologically relevant and hierarchically nested population units to inform wildlife management. Ecol Evol 2022; 12:e9565. [PMID: 36466138 PMCID: PMC9712811 DOI: 10.1002/ece3.9565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2022] [Revised: 10/29/2022] [Accepted: 11/11/2022] [Indexed: 12/05/2022] Open
Abstract
Wildlife populations are increasingly affected by natural and anthropogenic changes that negatively alter biotic and abiotic processes at multiple spatiotemporal scales and therefore require increased wildlife management and conservation efforts. However, wildlife management boundaries frequently lack biological context and mechanisms to assess demographic data across the multiple spatiotemporal scales influencing populations. To address these limitations, we developed a novel approach to define biologically relevant subpopulations of hierarchically nested population levels that could facilitate managing and conserving wildlife populations and habitats. Our approach relied on the Spatial "K"luster Analysis by Tree Edge Removal clustering algorithm, which we applied in an agglomerative manner (bottom-to-top). We modified the clustering algorithm using a workflow and population structure tiers from least-cost paths, which captured biological inferences of habitat conditions (functional connectivity), dispersal capabilities (potential connectivity), genetic information, and functional processes affecting movements. The approach uniquely included context of habitat resources (biotic and abiotic) summarized at multiple spatial scales surrounding locations with breeding site fidelity and constraint-based rules (number of sites grouped and population structure tiers). We applied our approach to greater sage-grouse (Centrocercus urophasianus), a species of conservation concern, across their range within the western United States. This case study produced 13 hierarchically nested population levels (akin to cluster levels, each representing a collection of subpopulations of an increasing number of breeding sites). These closely approximated population closure at finer ecological scales (smaller subpopulation extents with fewer breeding sites; cluster levels ≥2), where >92% of individual sage-grouse's time occurred within their home cluster. 
With available population monitoring data, our approaches can support the investigation of factors affecting population dynamics at multiple scales and assist managers with making informed, targeted, and cost-effective decisions within an adaptive management framework. Importantly, our approach provides the flexibility of including species-relevant context, thereby supporting other wildlife characterized by site fidelity.
Affiliation(s)
| | - David R. Edmunds
- U.S. Geological Survey, Fort Collins Science Center, Fort Collins, Colorado, USA
| | | | - Julie A. Heinrichs
- Natural Resource Ecology Laboratory, U.S. Geological Survey, Fort Collins Science Center, Colorado State University, Fort Collins, Colorado, USA
| | - Adrian P. Monroe
- U.S. Geological Survey, Fort Collins Science Center, Fort Collins, Colorado, USA
| | - Peter S. Coates
- U.S. Geological Survey, Western Ecological Research Center, Dixon, California, USA
| | - Brian G. Prochazka
- U.S. Geological Survey, Western Ecological Research Center, Dixon, California, USA
| | - Steve E. Hanser
- U.S. Geological Survey, Fort Collins Science Center, Fort Collins, Colorado, USA
| | - Lief A. Wiechman
- U.S. Geological Survey, Ecosystems Mission Area, Fort Collins, Colorado, USA
|
17
|
Gupta A, Das S. Transfer Clustering Using a Multiple Kernel Metric Learned Under Multi-Instance Weak Supervision. IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE 2022. [DOI: 10.1109/tetci.2021.3110526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Affiliation(s)
- Avisek Gupta
- Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, India
| | - Swagatam Das
- Electronics and Communication Sciences Unit, Indian Statistical Institute, Kolkata, India
|
18
|
Nish Chandran S, Durgaprasad Gangodkar. Scalable Semi-Supervised Clustering for Face Recognition with Insufficient Labelled Samples. PATTERN RECOGNITION AND IMAGE ANALYSIS 2022. [DOI: 10.1134/s1054661822020055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
19
|
Zhou P, Wang N, Zhao S, Zhang Y. Robust semi-supervised clustering via data transductive warping. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03493-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
20
|
Bunker J, Curtis K, Girolami M, Sriharsha R. A Mixture Modeling Approach for Clustering Log Files with Coreset and User Feedback. Pattern Recognit Lett 2022. [DOI: 10.1016/j.patrec.2022.01.027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
21
|
Pittet I, Kojovic N, Franchini M, Schaer M. Trajectories of imitation skills in preschoolers with autism spectrum disorders. J Neurodev Disord 2022; 14:2. [PMID: 34986807 PMCID: PMC8903579 DOI: 10.1186/s11689-021-09412-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/30/2021] [Accepted: 12/16/2021] [Indexed: 11/23/2022] Open
Abstract
Background: Imitation skills play a crucial role in social cognitive development from early childhood. Many studies have shown a deficit in imitation skills in children with autism spectrum disorders (ASD). Little is known about the development of imitation behaviors in children with ASD. This study aims to measure the trajectories of early imitation skills in preschoolers with ASD and how these skills impact other areas of early development. Methods: For this purpose, we assessed imitation, language, and cognition skills in 177 children with ASD and 43 typically developing (TD) children aged 2 to 5 years, 126 of whom were followed longitudinally, yielding a total of 396 time points. Results: Our results confirmed the presence of an early imitation deficit in toddlers with ASD compared to TD children. The study of the trajectories showed that these difficulties were marked at the age of 2 years and gradually decreased until the age of 5 years. Imitation skills were strongly linked with cognitive and language skills and level of symptoms in our ASD group at baseline. Moreover, the imitation skills at baseline were predictive of the language gains a year later in our ASD group. Using a data-driven clustering method, we delineated different developmental trajectories of imitation skills within the ASD group. Conclusions: The clinical implications of the findings are discussed, particularly the impact of an early imitation deficit on other areas of competence of the young child. Supplementary Information: The online version contains supplementary material available at 10.1186/s11689-021-09412-y.
Affiliation(s)
- Irène Pittet
- Autism Brain and Behavior (ABB) Lab, Department of Psychiatry, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Nada Kojovic
- Autism Brain and Behavior (ABB) Lab, Department of Psychiatry, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | | | - Marie Schaer
- Autism Brain and Behavior (ABB) Lab, Department of Psychiatry, Faculty of Medicine, University of Geneva, Geneva, Switzerland; Fondation Pôle Autisme, Geneva, Switzerland
|
22
|
Semi-supervised consensus clustering based on closed patterns. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2021.107599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
23
|
Liu T, Yu H, Blair RH. Stability estimation for unsupervised clustering: A review. WILEY INTERDISCIPLINARY REVIEWS. COMPUTATIONAL STATISTICS 2022; 14:e1575. [PMID: 36583207 PMCID: PMC9787023 DOI: 10.1002/wics.1575] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/01/2021] [Revised: 11/24/2021] [Accepted: 12/08/2021] [Indexed: 01/01/2023]
Abstract
Cluster analysis remains one of the most challenging yet fundamental tasks in unsupervised learning. This is due in part to the fact that there are no labels or gold standards by which performance can be measured. Moreover, the wide range of clustering methods available is governed by different objective functions, different parameters, and dissimilarity measures. The purpose of clustering is versatile, often playing critical roles in the early stages of exploratory data analysis and as an endpoint for knowledge and discovery. Thus, understanding the quality of a clustering is of critical importance. The concept of stability has emerged as a strategy for assessing the performance and reproducibility of data clustering. The key idea is to produce perturbed data sets that are very close to the original, and cluster them. If the clustering is stable, then the clusters from the original data will be preserved in the perturbed data clustering. The nature of the perturbation and the methods for quantifying similarity between clusterings are nontrivial, and are ultimately what sets many of the stability estimation methods apart. In this review, we provide an overview of the very active research area of cluster stability estimation and discuss some of the open questions and challenges that remain in the field. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification.
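The perturb-and-recluster idea in this abstract can be sketched in a few lines of Python. This is a minimal illustration only: the toy 1-D two-means routine, the bootstrap perturbation, and the Rand index as the similarity measure are all assumptions made here, and the review covers many other choices for each.

```python
import random
from itertools import combinations


def two_means_1d(values, iters=20):
    """Toy 1-D k-means with k=2, initialised at the minimum and maximum."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        left = [v for v, lab in zip(values, labels) if lab == 0]
        right = [v for v, lab in zip(values, labels) if lab == 1]
        lo = sum(left) / len(left) if left else lo
        hi = sum(right) / len(right) if right else hi
    return labels


def rand_index(a, b):
    """Share of item pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def bootstrap_stability(values, n_boot=50, seed=0):
    """Mean agreement between the original clustering and bootstrap reclusterings."""
    rng = random.Random(seed)
    base = two_means_1d(values)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(values)) for _ in values]  # resample with replacement
        boot = two_means_1d([values[i] for i in idx])
        scores.append(rand_index([base[i] for i in idx], boot))
    return sum(scores) / len(scores)


# Hypothetical data with two well-separated groups.
data = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
stability = bootstrap_stability(data)
```

A score close to 1 means the original grouping is largely preserved under resampling; how much lower a score must be before a clustering counts as unstable is a judgment the sketch does not address.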
Affiliation(s)
- Tianmou Liu
- Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, New York, USA
| | - Han Yu
- Roswell Park Comprehensive Cancer Center, Buffalo, New York, USA
| | - Rachael Hageman Blair
- Department of Biostatistics, Institute for Artificial Intelligence and Data Science, State University of New York at Buffalo, Buffalo, New York, USA
|
24
|
Fagherazzi G, Zhang L, Aguayo G, Pastore J, Goetzinger C, Fischer A, Malisoux L, Samouda H, Bohn T, Ruiz-Castell M, Huiart L. Towards precision cardiometabolic prevention: results from a machine learning, semi-supervised clustering approach in the nationwide population-based ORISCAV-LUX 2 study. Sci Rep 2021; 11:16056. [PMID: 34362963 PMCID: PMC8346462 DOI: 10.1038/s41598-021-95487-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/14/2020] [Accepted: 07/27/2021] [Indexed: 11/09/2022] Open
Abstract
Given the rapid increase in the incidence of cardiometabolic conditions, there is an urgent need for better approaches to prevent as many cases as possible and move from a one-size-fits-all approach to a precision cardiometabolic prevention strategy in the general population. We used data from ORISCAV-LUX 2, a nationwide, cross-sectional, population-based study. In the 1356 participants, we used a machine learning semi-supervised cluster method guided by body mass index (BMI) and glycated hemoglobin (HbA1c), and a set of 29 cardiometabolic variables, to identify subgroups of interest for cardiometabolic health. Cluster stability was assessed with the Jaccard similarity index. We observed 4 clusters with very high stability (ranging between 92 and 100%). Based on distinctive features that deviate from the overall population distribution, we labeled Cluster 1 (N = 729, 53.76%) as "Healthy", Cluster 2 (N = 508, 37.46%) as "Family history-Overweight-High Cholesterol", Cluster 3 (N = 91, 6.71%) as "Severe Obesity-Prediabetes-Inflammation" and Cluster 4 (N = 28, 2.06%) as "Diabetes-Hypertension-Poor CV Health". Our work provides an in-depth characterization and thus a better understanding of cardiometabolic health in the general population. Our data suggest that such a clustering approach could now be used to define more targeted and tailored strategies for the prevention of cardiometabolic diseases at a population level. This study provides a first step towards precision cardiometabolic prevention and should be externally validated in other contexts.
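For readers unfamiliar with the Jaccard similarity index mentioned in this abstract, the Python sketch below shows one common way to use it for cluster stability: each original cluster is compared with the clusters found on a perturbed or resampled data set, and the best match is kept. The helper names and the toy data are hypothetical, and the abstract does not describe the exact resampling scheme the authors applied.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of member IDs."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_match_jaccard(reference, perturbed):
    """For each reference cluster, the highest Jaccard similarity with any
    cluster from the perturbed run (cluster names need not correspond)."""
    return {
        name: max(jaccard(members, other) for other in perturbed.values())
        for name, members in reference.items()
    }


# Hypothetical participant IDs: the original clustering and one resampled run.
reference = {"A": [1, 2, 3, 4], "B": [5, 6, 7]}
perturbed = {"x": [1, 2, 3], "y": [4, 5, 6, 7]}
print(best_match_jaccard(reference, perturbed))  # {'A': 0.75, 'B': 0.75}
```

Averaging these best-match values over many resampled runs gives a per-cluster stability figure on the same 0 to 1 scale as the percentages quoted in the abstract.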
Affiliation(s)
- Guy Fagherazzi
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg; Center of Epidemiology and Population Health UMR 1018, Inserm, Gustave Roussy Institute, Paris South - Paris Saclay University, Villejuif, France
| | - Lu Zhang
- Quantitative Biology Unit, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Gloria Aguayo
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Jessica Pastore
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Catherine Goetzinger
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg; University of Luxembourg, 2, avenue de l'Université, 4365, Esch-sur-Alzette, Luxembourg
| | - Aurélie Fischer
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Laurent Malisoux
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Hanen Samouda
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Torsten Bohn
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Maria Ruiz-Castell
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg
| | - Laetitia Huiart
- Deep Digital Phenotyping Research Unit, Department of Population Health, Luxembourg Institute of Health, 1A-B, rue Thomas Edison, 1445, Strassen, Luxembourg; University of Luxembourg, 2, avenue de l'Université, 4365, Esch-sur-Alzette, Luxembourg
|
25
|
Vasquez-Rios G, Menon MC. Kidney Transplant Rejection Clusters and Graft Outcomes: Revisiting Banff in the Era of "Big Data". J Am Soc Nephrol 2021; 32:1009-1011. [PMID: 33824191 PMCID: PMC8259687 DOI: 10.1681/asn.2021030348] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 02/04/2023] Open
Affiliation(s)
- George Vasquez-Rios
- Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York
| | - Madhav C. Menon
- Division of Nephrology, Department of Medicine, Icahn School of Medicine at Mount Sinai, New York, New York; Division of Nephrology, Yale University School of Medicine, New Haven, Connecticut
|
26
|
Zhang Y, Melnykov V, Melnykov I. Semi-supervised clustering of time-dependent categorical sequences with application to discovering education-based life patterns. STAT MODEL 2021. [DOI: 10.1177/1471082x21989170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
A new approach to the analysis of heterogeneous categorical sequences is proposed. The first-order Markov model is employed in a finite mixture setting with initial state and transition probabilities being expressed as functions of time. The expectation–maximization algorithm approach to parameter estimation is implemented in the presence of positive equivalence constraints that determine which observations must be placed in the same class in the solution. The proposed model is applied to a dataset from the British Household Panel Survey to evaluate the association between the education background and life outcomes of study participants. The analysis of the survey data reveals many interesting relationships between the level of education and major life events.
Affiliation(s)
- Yingying Zhang
  - Department of Mathematics and Statistics, University of South Alabama, Mobile, AL, USA
- Volodymyr Melnykov
  - Department of Information Systems, Statistics, and Management Science, The University of Alabama, Tuscaloosa, AL, USA
- Igor Melnykov
  - Department of Mathematics and Statistics, University of Minnesota Duluth, Duluth, MN, USA
|
27
|
Pister A, Buono P, Fekete JD, Plaisant C, Valdivia P. Integrating Prior Knowledge in Mixed-Initiative Social Network Clustering. IEEE Transactions on Visualization and Computer Graphics 2021; 27:1775-1785. [PMID: 33095715 DOI: 10.1109/tvcg.2020.3030347] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
We propose a new approach, called PK-clustering, to help social scientists create meaningful clusters in social networks. Many clustering algorithms exist, but most social scientists find them difficult to understand, and tools do not provide any guidance for choosing algorithms or for evaluating results in light of the scientists' prior knowledge. Our work introduces a new clustering approach and a visual analytics user interface that address this issue. It is based on a process that 1) captures the prior knowledge of the scientists as a set of incomplete clusters, 2) runs multiple clustering algorithms (similarly to clustering ensemble methods), 3) visualizes the results of all the algorithms ranked and summarized by how well each algorithm matches the prior knowledge, 4) evaluates the consensus between user-selected algorithms and 5) allows users to review details and iteratively update the acquired knowledge. We describe our approach using an initial functional prototype, then provide two examples of use and early feedback from social scientists. We believe our clustering approach offers a novel constructive method to iteratively build knowledge while avoiding being overly influenced by the results of often randomly selected black-box clustering algorithms.
|
28
|
Herzog NJ, Magoulas GD. Brain Asymmetry Detection and Machine Learning Classification for Diagnosis of Early Dementia. Sensors 2021; 21:s21030778. [PMID: 33498908 PMCID: PMC7865614 DOI: 10.3390/s21030778] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 12/27/2020] [Revised: 01/20/2021] [Accepted: 01/21/2021] [Indexed: 11/30/2022]
Abstract
Early identification of degenerative processes in the human brain is considered essential for providing proper care and treatment. This may involve detecting structural and functional cerebral changes such as changes in the degree of asymmetry between the left and right hemispheres. Changes can be detected by computational algorithms and used for the early diagnosis of dementia and its stages (amnestic early mild cognitive impairment (EMCI), Alzheimer’s Disease (AD)), and can help to monitor the progress of the disease. In this vein, the paper proposes a data processing pipeline that can be implemented on commodity hardware. It uses features of brain asymmetries, extracted from MRI of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, for the analysis of structural changes, and machine learning classification of the pathology. The experiments provide promising results, distinguishing between subjects with normal cognition (NC) and patients with early or progressive dementia. Supervised machine learning algorithms and convolutional neural networks tested are reaching an accuracy of 92.5% and 75.0% for NC vs. EMCI, and 93.0% and 90.5% for NC vs. AD, respectively. The proposed pipeline offers a promising low-cost alternative for the classification of dementia and can be potentially useful to other brain degenerative disorders that are accompanied by changes in the brain asymmetries.
Affiliation(s)
- Nitsa J. Herzog
  - Department of Computer Science, Birkbeck College, University of London, London WC1E 7HZ, UK
- George D. Magoulas
  - Department of Computer Science, Birkbeck College, University of London, London WC1E 7HZ, UK
  - Birkbeck Knowledge Lab, University of London, London WC1E 7HZ, UK
|
29
|
Becker BN, Luo J, Gray KS, Colson C, Cohen DE, McMurray S, Gregory B, Lohmeyer N, Brunelli SM. Association of Chronic Condition Special Needs Plan With Hospitalization and Mortality Among Patients With End-Stage Kidney Disease. JAMA Netw Open 2020; 3:e2023663. [PMID: 33136135 PMCID: PMC7607441 DOI: 10.1001/jamanetworkopen.2020.23663] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023] Open
Abstract
IMPORTANCE While several studies have demonstrated the benefit of enrollment in chronic condition special needs plans (C-SNPs) for other chronic diseases (eg, diabetes), there is no evaluation of the association of C-SNPs with outcomes among patients with end-stage kidney disease (ESKD). OBJECTIVE To examine whether and to what degree C-SNP enrollment was associated with improved clinical outcomes and quality of life in patients with ESKD. DESIGN, SETTING, AND PARTICIPANTS This multicenter cohort study included 2718 patients who were newly enrolled in an ESKD C-SNP between January 1, 2013, and September 30, 2017, and receiving dialysis from DaVita Kidney Care. Patients were followed up until death, loss to follow-up, or end of study (ie, December 31, 2018). Enrollees in C-SNP were matched via multiple clinical and demographic characteristics with 2 different control populations, as follows: (1) those in the same facilities (n = 2545) or (2) those in similar counties (n = 1986). Patients enrolled in CareMore C-SNPs (n = 206) were excluded from the study. Data analysis was conducted June to December 2019. EXPOSURES Standard ESKD care with dialysis plus access to an integrated care team who worked with the patient and the dialysis team, comprehensive health assessments done by the integrated care team, and access to select benefits (such as vision and dental care) as a C-SNP enrollee. MAIN OUTCOMES AND MEASURES Hospitalizations, mortality, laboratory values indicative of metabolic control, and Kidney Disease Quality of Life 36-item (KDQOL-36) survey scores. RESULTS The 2545 C-SNP enrollees in the facility-matched analysis had a mean (SD) age of 57.2 (12.9) years, and included 968 (38.0%) women, 1328 (52.2%) Hispanic individuals, and 553 (21.7%) African American individuals. 
The 1986 C-SNP enrollees in the county-matched analysis had a mean (SD) age of 57.8 (12.2) years, with 705 (35.5%) women, 1085 (54.6%) Hispanic individuals, and 472 (23.8%) African American individuals. Compared with patients not enrolled in C-SNP, enrollees had lower hospitalization rates, with incidence rate ratios of 0.90 (95% CI, 0.84-0.97; P = .006) in the facility-matched analysis and 0.76 (95% CI, 0.70-0.83; P < .001) in the county-matched analysis. Compared with patients not enrolled in C-SNP, enrollees had decreased mortality risk in the same facilities (hazard ratio, 0.77; 95% CI, 0.68-0.88; P < .001) and in the same counties (hazard ratio, 0.77; 95% CI, 0.66-0.88; P < .001). No significant differences were observed between C-SNP enrollees and matched patients in metabolic laboratory values or KDQOL-36 survey scores. CONCLUSIONS AND RELEVANCE This cohort study found a positive association of C-SNP enrollment with lower rates of hospitalization and mortality. The findings suggest that the additional services and benefits C-SNPs provide may improve outcomes compared with standard of care for patients with ESKD.
Affiliation(s)
- Jiacong Luo
  - DaVita Clinical Research, Minneapolis, Minnesota
- Carey Colson
  - DaVita Clinical Research, Minneapolis, Minnesota
|
30
|
Ohi AQ, Mridha M, Safir FB, Hamid MA, Monowar MM. AutoEmbedder: A semi-supervised DNN embedding system for clustering. Knowl Based Syst 2020. [DOI: 10.1016/j.knosys.2020.106190] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 11/24/2022]
|
31
|
Horne E, Tibble H, Sheikh A, Tsanas A. Challenges of Clustering Multimodal Clinical Data: Review of Applications in Asthma Subtyping. JMIR Med Inform 2020; 8:e16452. [PMID: 32463370 PMCID: PMC7290450 DOI: 10.2196/16452] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/30/2019] [Revised: 12/10/2019] [Accepted: 02/10/2020] [Indexed: 12/27/2022] Open
Abstract
Background In the current era of personalized medicine, there is increasing interest in understanding the heterogeneity in disease populations. Cluster analysis is a method commonly used to identify subtypes in heterogeneous disease populations. The clinical data used in such applications are typically multimodal, which can make the application of traditional cluster analysis methods challenging. Objective This study aimed to review the research literature on the application of clustering multimodal clinical data to identify asthma subtypes. We assessed common problems and shortcomings in the application of cluster analysis methods in determining asthma subtypes, such that they can be brought to the attention of the research community and avoided in future studies. Methods We searched PubMed and Scopus bibliographic databases with terms related to cluster analysis and asthma to identify studies that applied dissimilarity-based cluster analysis methods. We recorded the analytic methods used in each study at each step of the cluster analysis process. Results Our literature search identified 63 studies that applied cluster analysis to multimodal clinical data to identify asthma subtypes. The features fed into the cluster algorithms were of a mixed type in 47 (75%) studies and continuous in 12 (19%), and the feature type was unclear in the remaining 4 (6%) studies. A total of 23 (37%) studies used hierarchical clustering with Ward linkage, and 22 (35%) studies used k-means clustering. Of these 45 studies, 39 had mixed-type features, but only 5 specified dissimilarity measures that could handle mixed-type features. A further 9 (14%) studies used a preclustering step to create small clusters to feed on a hierarchical method. The original sample sizes in these 9 studies ranged from 84 to 349. 
The remaining studies used hierarchical clustering with other linkages (n=3), medoid-based methods (n=3), spectral clustering (n=1), and multiple kernel k-means clustering (n=1), and in 1 study, the methods were unclear. Of 63 studies, 54 (86%) explained the methods used to determine the number of clusters, 24 (38%) studies tested the quality of their cluster solution, and 11 (17%) studies tested the stability of their solution. Reporting of the cluster analysis was generally poor in terms of the methods employed and their justification. Conclusions This review highlights common issues in the application of cluster analysis to multimodal clinical data to identify asthma subtypes. Some of these issues were related to the multimodal nature of the data, but many were more general issues in the application of cluster analysis. Although cluster analysis may be a useful tool for investigating disease subtypes, we recommend that future studies carefully consider the implications of clustering multimodal data, the cluster analysis process itself, and the reporting of methods to facilitate replication and interpretation of findings.
Affiliation(s)
- Elsie Horne
  - Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Holly Tibble
  - Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Aziz Sheikh
  - Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Athanasios Tsanas
  - Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
|
32
|
|
33
|
Kahkoska AR, Nguyen CT, Jiang X, Adair LA, Agarwal S, Aiello AE, Burger KS, Buse JB, Dabelea D, Dolan LM, Imperatore G, Lawrence JM, Marcovina S, Pihoker C, Reboussin BA, Sauder KA, Kosorok MR, Mayer-Davis EJ. Characterizing the weight-glycemia phenotypes of type 1 diabetes in youth and young adulthood. BMJ Open Diabetes Res Care 2020; 8:e000886. [PMID: 32049631 PMCID: PMC7039605 DOI: 10.1136/bmjdrc-2019-000886] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/05/2019] [Revised: 12/27/2019] [Accepted: 01/04/2020] [Indexed: 12/13/2022] Open
Abstract
INTRODUCTION Individuals with type 1 diabetes (T1D) present with diverse body weight status and degrees of glycemic control, which may warrant different treatment approaches. We sought to identify subgroups sharing phenotypes based on both weight and glycemia and compare characteristics across subgroups. RESEARCH DESIGN AND METHODS Participants with T1D in the SEARCH study cohort (n=1817, 6.0-30.4 years) were seen at a follow-up visit >5 years after diagnosis. Hierarchical agglomerative clustering was used to group participants based on five measures summarizing the joint distribution of body mass index z-score (BMIz) and hemoglobin A1c (HbA1c) which were estimated by reinforcement learning tree predictions from 28 covariates. Interpretation of cluster weight status and glycemic control was based on mean BMIz and HbA1c, respectively. RESULTS The sample was 49.5% female and 55.5% non-Hispanic white (NHW); mean±SD age=17.6±4.5 years, T1D duration=7.8±1.9 years, BMIz=0.61±0.94, and HbA1c=76±21 mmol/mol (9.1±1.9)%. Six weight-glycemia clusters were identified, including four normal weight, one overweight, and one subgroup with obesity. No cluster had a mean HbA1c <58 mmol/mol (7.5%). Cluster 1 (34.0%) was normal weight with the lowest HbA1c and comprised 85% NHW participants with the highest socioeconomic position, insulin pump use, dietary quality, and physical activity. Subgroups with very poor glycemic control (ie, ≥108 mmol/mol (≥12.0%); cluster 4, 4.4%, and cluster 5, 7.5%) and obesity (cluster 6, 15.4%) had a lower proportion of NHW youth, lower socioeconomic position, and reported decreased pump use and poorer health behaviors (overall p<0.01). The overweight subgroup with very poor glycemic control (cluster 5) showed the highest lipids and blood pressure (p<0.01). CONCLUSIONS There are distinct subgroups of youth and young adults with T1D that share weight-glycemia phenotypes. 
Subgroups may benefit from tailored interventions addressing differences in clinical care, health behaviors, and underlying health inequity.
Affiliation(s)
- Anna R Kahkoska
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Crystal T Nguyen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiaotong Jiang
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Linda A Adair
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Shivani Agarwal
  - Center for Diabetes Translational Research, Albert Einstein College of Medicine, Bronx, New York, USA
- Allison E Aiello
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Kyle S Burger
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- John B Buse
  - Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Dana Dabelea
  - Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, USA
  - Department of Pediatrics, School of Medicine, University of Colorado, Aurora, Colorado, USA
- Lawrence M Dolan
  - Division of Endocrinology, Cincinnati Children's Hospital Medical Center, Cincinnati, Ohio, USA
- Giuseppina Imperatore
  - Division of Diabetes Translation, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Jean Marie Lawrence
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
- Santica Marcovina
  - Northwest Lipid Metabolism and Diabetes Research Laboratories, Department of Medicine, University of Washington, Seattle, Washington, USA
- Catherine Pihoker
  - Department of Pediatrics, University of Washington, Seattle, Washington, USA
- Beth A Reboussin
  - Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
- Katherine A Sauder
  - Department of Epidemiology, Colorado School of Public Health, Aurora, Colorado, USA
  - Department of Pediatrics, School of Medicine, University of Colorado, Aurora, Colorado, USA
- Michael R Kosorok
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Elizabeth J Mayer-Davis
  - Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  - Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
|
34
|
Beauchemin M. Semi-supervised map regionalization for categorical data. International Journal of Remote Sensing 2019; 40:9401-9411. [DOI: 10.1080/2150704x.2019.1633485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2019] [Accepted: 06/10/2019] [Indexed: 09/02/2023]
Affiliation(s)
- Mario Beauchemin
  - Natural Resources Canada, Canada Centre for Remote Sensing, Ottawa, Canada
|
35
|
Casalino G, Castellano G, Mencar C. Data Stream Classification by Dynamic Incremental Semi-Supervised Fuzzy Clustering. Int J Artif Intell T 2019. [DOI: 10.1142/s0218213019600091] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/18/2022]
Abstract
A data stream classification method called DISSFCM (Dynamic Incremental Semi-Supervised FCM) is presented, which is based on an incremental semi-supervised fuzzy clustering algorithm. The method assumes that partially labeled data belonging to different classes are continuously available during time in form of chunks. Each chunk is processed by semi-supervised fuzzy clustering leading to a cluster-based classification model. The proposed DISSFCM is capable of dynamically adapting the number of clusters to data streams, by splitting low-quality clusters so as to improve classification quality. Experimental results on both synthetic and real-world data show the effectiveness of the proposed method in data stream classification.
Affiliation(s)
- Gabriella Casalino
  - CILab (Computational Intelligence Lab), Department of Computer Science, University of Bari Aldo Moro, Italy
- Giovanna Castellano
  - CILab (Computational Intelligence Lab), Department of Computer Science, University of Bari Aldo Moro, Italy
- Corrado Mencar
  - CILab (Computational Intelligence Lab), Department of Computer Science, University of Bari Aldo Moro, Italy
|
36
|
Abstract
Semi-supervised learning is the branch of machine learning concerned with using labelled as well as unlabelled data to perform certain learning tasks. Conceptually situated between supervised and unsupervised learning, it permits harnessing the large amounts of unlabelled data available in many use cases in combination with typically smaller sets of labelled data. In recent years, research in this area has followed the general trends observed in machine learning, with much attention directed at neural network-based models and generative learning. The literature on the topic has also expanded in volume and scope, now encompassing a broad spectrum of theory, algorithms and applications. However, no recent surveys exist to collect and organize this knowledge, impeding the ability of researchers and engineers alike to utilize it. Filling this void, we present an up-to-date overview of semi-supervised learning methods, covering earlier work as well as more recent advances. We focus primarily on semi-supervised classification, where the large majority of semi-supervised learning research takes place. Our survey aims to provide researchers and practitioners new to the field as well as more advanced readers with a solid understanding of the main approaches and algorithms developed over the past two decades, with an emphasis on the most prominent and currently relevant work. Furthermore, we propose a new taxonomy of semi-supervised classification algorithms, which sheds light on the different conceptual and methodological approaches for incorporating unlabelled data into the training process. Lastly, we show how the fundamental assumptions underlying most semi-supervised learning algorithms are closely connected to each other, and how they relate to the well-known semi-supervised clustering assumption.
|
37
|
Kalintha W, Ono S, Numao M, Fukui KI. Kernelized evolutionary distance metric learning for semi-supervised clustering. Intell Data Anal 2019. [DOI: 10.3233/ida-184283] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022]
Affiliation(s)
- Wasin Kalintha
  - Graduate School of Information Science and Technology, Osaka University, Osaka, Japan
- Satoshi Ono
  - Graduate School of Science and Engineering, Kagoshima University, Kagoshima, Japan
- Masayuki Numao
  - The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
- Ken-ichi Fukui
  - The Institute of Scientific and Industrial Research, Osaka University, Osaka, Japan
|
38
|
Sanyal D, Das S. On semi-supervised active clustering of stable instances with oracles. Inform Process Lett 2019. [DOI: 10.1016/j.ipl.2019.105833] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/01/2022]
|
39
|
O'Donnell MS, Edmunds DR, Aldridge CL, Heinrichs JA, Coates PS, Prochazka BG, Hanser SE. Designing multi‐scale hierarchical monitoring frameworks for wildlife to support management: a sage‐grouse case study. Ecosphere 2019. [DOI: 10.1002/ecs2.2872] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022] Open
Affiliation(s)
- Michael S. O'Donnell
  - U.S. Geological Survey, Fort Collins Science Center, Fort Collins, Colorado 80526, USA
- David R. Edmunds
  - Natural Resource Ecology Laboratory, Colorado State University, in cooperation with the Fort Collins Science Center, U.S. Geological Survey, Fort Collins, Colorado 80526, USA
- Cameron L. Aldridge
  - Natural Resource Ecology Laboratory, Department of Ecosystem Science and Sustainability, Colorado State University, in cooperation with the Fort Collins Science Center, U.S. Geological Survey, Fort Collins, Colorado 80526, USA
- Julie A. Heinrichs
  - Natural Resource Ecology Laboratory, Colorado State University, in cooperation with the Fort Collins Science Center, U.S. Geological Survey, Fort Collins, Colorado 80526, USA
- Peter S. Coates
  - U.S. Geological Survey, Western Ecological Research Center, Dixon, California 95620, USA
- Brian G. Prochazka
  - U.S. Geological Survey, Western Ecological Research Center, Dixon, California 95620, USA
- Steve E. Hanser
  - U.S. Geological Survey, Ecosystems Mission Area, Reston, VA 20192, USA
|
40
|
Dimitriou K, Roussaki I. Location Privacy Protection in Distributed IoT Environments Based on Dynamic Sensor Node Clustering. Sensors (Basel, Switzerland) 2019; 19:E3022. [PMID: 31324012 PMCID: PMC6651351 DOI: 10.3390/s19133022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 05/21/2019] [Revised: 06/25/2019] [Accepted: 07/02/2019] [Indexed: 11/16/2022]
Abstract
One of the most significant challenges in Internet of Things (IoT) environments is the protection of privacy. Failing to guarantee the privacy of sensitive data collected and shared over IoT infrastructures is a critical barrier that delays the wide penetration of IoT technologies in several user-centric application domains. Location information is the most common dynamic information monitored and lies among the most sensitive ones from a privacy perspective. This article introduces a novel mechanism that aims to protect the privacy of location information across Data Centric Sensor Networks (DCSNs) that monitor the location of mobile objects in IoT systems. The respective data dissemination protocols proposed enhance the security of DCSNs, rendering them less vulnerable to intruders interested in obtaining the location information monitored. In this respect, a dynamic clustering algorithm is proposed that clusters the DCSN nodes not only based on the network topology, but also considering the current location of the objects monitored. The proposed techniques do not focus on the prevention of attacks, but on enhancing the privacy of sensitive location information once IoT nodes have been compromised. They have been extensively assessed via a series of experiments conducted over the IoT infrastructure of FIT IoT-LAB, and the respective evaluation results indicate that the dynamic clustering algorithm proposed significantly outperforms existing solutions focusing on enhancing the privacy of location information in IoT.
Affiliation(s)
- Konstantinos Dimitriou
  - School of Electrical and Computer Engineering, National Technical University of Athens, 15773 Athens, Greece
- Ioanna Roussaki
  - School of Electrical and Computer Engineering, National Technical University of Athens, 15773 Athens, Greece
  - Institute of Communication and Computer Systems, 10682 Athens, Greece
|
41
|
|
42
|
Quartiroli A, Parsons-Smith RL, Fogarty GJ, Kuan G, Terry PC. Cross-Cultural Validation of Mood Profile Clusters in a Sport and Exercise Context. Front Psychol 2018; 9:1949. [PMID: 30356841 PMCID: PMC6190738 DOI: 10.3389/fpsyg.2018.01949] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/15/2018] [Accepted: 09/20/2018] [Indexed: 11/13/2022] Open
Abstract
Mood profiling has a long history in the field of sport and exercise. Several novel mood profile clusters were identified and described in the literature recently (Parsons-Smith et al., 2017). In the present study, we investigated whether the same clusters were evident in an Italian-language, sport and exercise context. The Italian Mood Scale (ITAMS; Quartiroli et al., 2017) was administered to 950 Italian-speaking sport participants (659 females, 284 males, 7 unspecified; age range = 16-63 years, M = 25.03, SD = 7.62), and seeded k-means clustering methodology was applied to the responses. Six distinct mood profiles were identified, termed the iceberg, inverse iceberg, inverse Everest, shark fin, surface, and submerged profiles, which closely resembled those reported among English-speaking participants (Parsons-Smith et al., 2017). Significant differences were found in the distribution of specific mood profiles across gender and age groups. Findings supported the cross-cultural generalizability of the six mood profiles and offer new research avenues into their antecedents, correlates and behavioral consequences in Italian-language contexts.
Affiliation(s)
- Alessandro Quartiroli
  - Department of Psychology, University of Wisconsin-La Crosse, La Crosse, WI, United States
- Renée L Parsons-Smith
  - Division of Research and Innovation, University of Southern Queensland, Toowoomba, QLD, Australia
  - School of Social Sciences, University of the Sunshine Coast, Sippy Downs, QLD, Australia
- Gerard J Fogarty
  - Division of Research and Innovation, University of Southern Queensland, Toowoomba, QLD, Australia
- Garry Kuan
  - School of Health Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
  - School of Medical Sciences, Universiti Sains Malaysia, Kelantan, Malaysia
- Peter C Terry
  - Division of Research and Innovation, University of Southern Queensland, Toowoomba, QLD, Australia
|
43
|
|
44
|
Danchin A, Ouzounis C, Tokuyasu T, Zucker JD. No wisdom in the crowd: genome annotation in the era of big data - current status and future prospects. Microb Biotechnol 2018; 11:588-605. [PMID: 29806194 PMCID: PMC6011933 DOI: 10.1111/1751-7915.13284] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Indexed: 12/15/2022] Open
Abstract
Science and engineering rely on the accumulation and dissemination of knowledge to make discoveries and create new designs. Discovery-driven genome research rests on knowledge passed on via gene annotations. In response to the deluge of sequencing big data, standard annotation practice employs automated procedures that rely on majority rules. We argue this hinders progress through the generation and propagation of errors, leading investigators into blind alleys. More subtly, this inductive process discourages the discovery of novelty, which remains essential in biological research and reflects the nature of biology itself. Annotation systems, rather than being repositories of facts, should be tools that support multiple modes of inference. By combining deduction, induction and abduction, investigators can generate hypotheses when accurate knowledge is extracted from model databases. A key stance is to depart from 'the sequence tells the structure tells the function' fallacy, placing function first. We illustrate our approach with examples of critical or unexpected pathways, using MicroScope to demonstrate how tools can be implemented following the principles we advocate. We end with a challenge to the reader.
Affiliation(s)
- Antoine Danchin
  - Integromics, Institute of Cardiometabolism and Nutrition, Hôpital de la Pitié-Salpêtrière, 47 Boulevard de l'Hôpital, 75013, Paris, France
  - School of Biomedical Sciences, Li KaShing Faculty of Medicine, Hong Kong University, 21 Sassoon Road, Pokfulam, Hong Kong
- Christos Ouzounis
  - Biological Computation and Process Laboratory, Centre for Research and Technology Hellas, Chemical Process and Energy Resources Institute, Thessalonica, 57001, Greece
- Taku Tokuyasu
  - Shenzhen Institutes of Advanced Technology, Institute of Synthetic Biology, Shenzhen University Town, 1068 Xueyuan Avenue, Shenzhen, China
- Jean-Daniel Zucker
  - Integromics, Institute of Cardiometabolism and Nutrition, Hôpital de la Pitié-Salpêtrière, 47 Boulevard de l'Hôpital, 75013, Paris, France
|
45
|
Sandini C, Zöller D, Scariati E, Padula MC, Schneider M, Schaer M, Van De Ville D, Eliez S. Development of Structural Covariance From Childhood to Adolescence: A Longitudinal Study in 22q11.2DS. Front Neurosci 2018; 12:327. [PMID: 29867336 PMCID: PMC5968113 DOI: 10.3389/fnins.2018.00327] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/28/2018] [Accepted: 04/26/2018] [Indexed: 12/18/2022] Open
Abstract
Background: Schizophrenia is currently considered a neurodevelopmental disorder of connectivity. Still, few studies have investigated how brain networks develop in children and adolescents who are at risk for developing psychosis. 22q11.2 Deletion Syndrome (22q11DS) offers a unique opportunity to investigate the pathogenesis of schizophrenia from a neurodevelopmental perspective. Structural covariance (SC) is a powerful approach to explore morphometric relations between brain regions that can furthermore detect biomarkers of psychosis, both in 22q11DS and in the general population.
Methods: Here we implement a state-of-the-art sliding-window approach to characterize maturation of SC network architecture in a large longitudinal cohort of patients with 22q11DS (110 with 221 visits) and healthy controls (HCs; 117 with 211 visits). We furthermore propose a new clustering-based approach to group regions according to trajectories of structural connectivity maturation. We correlate measures of SC with development of working memory, a core executive function that is highly affected in both idiopathic psychosis and 22q11DS. Finally, in 22q11DS we explore correlations between SC dysconnectivity and severity of internalizing psychopathology.
Results: In HCs, network architecture underwent a quadratic developmental trajectory, maturing up to mid-adolescence. Late-childhood maturation was particularly evident for fronto-parietal cortices, while Default-Mode-Network-related regions showed a more protracted linear development. Working memory performance was positively correlated with network segregation and fronto-parietal connectivity. In 22q11DS, we demonstrate aberrant maturation of SC, with disturbed architecture selectively emerging during adolescence and correlating with more severe internalizing psychopathology. Patients also presented a lack of typical network development during late childhood that was particularly prominent for frontal connectivity.
Conclusions: Our results suggest that SC maturation may underlie critical cognitive development occurring during late childhood in healthy controls. Aberrant trajectories of SC maturation may reflect core developmental features of 22q11DS, including disturbed cognitive maturation during childhood and predisposition to internalizing psychopathology and psychosis during adolescence.
Affiliation(s)
- Corrado Sandini
  - Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
- Daniela Zöller
  - Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Elisa Scariati
  - Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
- Maria C Padula
  - Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
- Maude Schneider
  - Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
  - Department of Neuroscience, Center for Contextual Psychiatry, Research Group Psychiatry, KU Leuven, Leuven, Belgium
- Marie Schaer
  - Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
- Dimitri Van De Ville
  - Institute of Bioengineering, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
  - Department of Radiology and Medical Informatics, University of Geneva, Geneva, Switzerland
- Stephan Eliez
  - Developmental Imaging and Psychopathology Laboratory, University of Geneva School of Medicine, Geneva, Switzerland
  - Department of Genetic Medicine and Development, University of Geneva School of Medicine, Geneva, Switzerland
|
46
|
Gaynor S, Bair E. Identification of relevant subtypes via preweighted sparse clustering. Comput Stat Data Anal 2017; 116:139-154. [PMID: 29785064 DOI: 10.1016/j.csda.2017.06.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
Cluster analysis methods are used to identify homogeneous subgroups in a data set. In biomedical applications, one frequently applies cluster analysis in order to identify biologically interesting subgroups. In particular, one may wish to identify subgroups that are associated with a particular outcome of interest. Conventional clustering methods generally do not identify such subgroups, particularly when there are a large number of high-variance features in the data set. Conventional methods may identify clusters associated with these high-variance features when one wishes to obtain secondary clusters that are more interesting biologically or more strongly associated with a particular outcome of interest. A modification of sparse clustering can be used to identify such secondary clusters or clusters associated with an outcome of interest. This method correctly identifies such clusters of interest in several simulation scenarios. The method is also applied to a large prospective cohort study of temporomandibular disorders and a leukemia microarray data set.
Affiliation(s)
- Sheila Gaynor
  - Department of Biostatistics, Harvard University, Boston, MA, USA
- Eric Bair
  - Departments of Endodontics and Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
|
47
|
Poswar FDO, Santos LI, Farias LC, Guimarães TA, Santos SHS, Jones KM, de Paula AMB, Palhares RM, D'Angelo MFSV, Guimarães ALS. An adaptation of particle swarm clustering applied in basal cell carcinoma, squamous cell carcinoma of the skin and actinic keratosis. Meta Gene 2017. [DOI: 10.1016/j.mgene.2017.01.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Indexed: 10/20/2022] Open
|
48
|
Vidal R. Structured Sparse Subspace Clustering: A Joint Affinity Learning and Subspace Clustering Framework. IEEE Transactions on Image Processing 2017; 26:2988-3001. [PMID: 28410106 DOI: 10.1109/tip.2017.2691557] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Indexed: 06/07/2023]
Abstract
Subspace clustering refers to the problem of segmenting data drawn from a union of subspaces. State-of-the-art approaches for solving this problem follow a two-stage procedure. In the first step, an affinity matrix is learned from the data using sparse or low-rank minimization techniques. In the second step, the segmentation is found by applying spectral clustering to this affinity. While this approach has led to state-of-the-art results in many applications, it is suboptimal, because it does not exploit the fact that the affinity and the segmentation depend on each other. In this paper, we propose a joint optimization framework - Structured Sparse Subspace Clustering (S3C) - for learning both the affinity and the segmentation. The proposed S3C framework is based on expressing each data point as a structured sparse linear combination of all other data points, where the structure is induced by a norm that depends on the unknown segmentation. Moreover, we extend the proposed S3C framework into Constrained S3C (CS3C) in which available partial side-information is incorporated into the stage of learning the affinity. We show that both the structured sparse representation and the segmentation can be found via a combination of an alternating direction method of multipliers with spectral clustering. Experiments on a synthetic data set, the Extended Yale B face data set, the Hopkins 155 motion segmentation database, and three cancer data sets demonstrate the effectiveness of our approach.
|
49
|
Deepthi P, Thampi SM. Predicting cancer subtypes from microarray data using semi-supervised fuzzy C-means algorithm. Journal of Intelligent & Fuzzy Systems 2017. [DOI: 10.3233/jifs-169222] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/15/2022]
Affiliation(s)
- P.S. Deepthi
  - LBS Centre for Science and Technology, Trivandrum, Kerala, India
  - School of CS and IT, Indian Institute of Information Technology and Management – Kerala, Trivandrum, Kerala, India
- Sabu M. Thampi
  - School of CS and IT, Indian Institute of Information Technology and Management – Kerala, Trivandrum, Kerala, India
|