1
Li Y, Gu J, Li R, Yi H, He J, Gao J. Sensory and motor cortices parcellations estimated via distance-weighted sparse representation with application to autism spectrum disorder. Prog Neuropsychopharmacol Biol Psychiatry 2024; 135:111125. [PMID: 39173993 DOI: 10.1016/j.pnpbp.2024.111125] [Received: 05/25/2024] [Revised: 08/05/2024] [Accepted: 08/19/2024] [Indexed: 08/24/2024]
Abstract
BACKGROUND Motor impairments and sensory processing abnormalities are prevalent in autism spectrum disorder (ASD) and are closely related to the core functions of the primary motor cortex (M1) and the primary somatosensory cortex (S1). Currently, there is limited knowledge about potential therapeutic targets in the subregions of M1 and S1 in ASD patients. This study aims to map clinically significant functional subregions of M1 and S1. METHODS Resting-state functional magnetic resonance imaging data (N_TD = 266) from the Autism Brain Imaging Data Exchange (ABIDE) were used for subregion modeling. We proposed a distance-weighted sparse representation algorithm to construct brain functional networks. Functional subregions of M1 and S1 were identified through consensus clustering at the group level. Differences in the characteristics of functional subregions were analyzed, along with their correlation with clinical scores. RESULTS We observed symmetrical and continuous subregion organization from dorsal to ventral aspects in M1 and S1, with M1 subregions conforming to the functional pattern of the motor homunculus. Significant intergroup differences and clinical correlations were found in the dorsal and ventral aspects of M1 (p < 0.05/3, Bonferroni correction) and the ventromedial BA3 of S1 (p < 0.05/5). These functional characteristics were positively correlated with autism severity. All subregions showed significant results in the ROI-to-ROI intergroup differential analysis (p < 0.05/80). LIMITATIONS The generalizability of the segmentation model requires further evaluation. CONCLUSIONS This study highlights the significance of M1 and S1 in ASD treatment and may provide new insights into brain parcellation and the identification of therapeutic targets for ASD.
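The corrected thresholds quoted in this abstract (p < 0.05/3, p < 0.05/5, p < 0.05/80) follow the Bonferroni rule of dividing the family-wise significance level by the number of comparisons. The following is a minimal sketch of that arithmetic only; the function names and the example p-values are illustrative assumptions and do not come from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag each p-value that survives correction for len(p_values) tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Thresholds for the comparison counts quoted in the abstract.
for n in (3, 5, 80):
    print(n, bonferroni_threshold(0.05, n))

# Hypothetical p-values for three subregions (illustrative only).
print(significant([0.004, 0.030, 0.012]))
```

Bonferroni is conservative when tests are correlated, which is one reason a count as large as 80 ROI-to-ROI comparisons leads to a very small per-test threshold.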
Affiliation(s)
- Yanling Li
  - School of Electrical Engineering and Electronic Information, Xihua University, 9999 Hongguang Avenue, Pixian District, Sichuan Province, Chengdu 610039, China
- Jiahe Gu
  - School of Electrical Engineering and Electronic Information, Xihua University, 9999 Hongguang Avenue, Pixian District, Sichuan Province, Chengdu 610039, China
- Rui Li
  - School of Electrical Engineering and Electronic Information, Xihua University, 9999 Hongguang Avenue, Pixian District, Sichuan Province, Chengdu 610039, China
- Hongtao Yi
  - School of Electrical Engineering and Electronic Information, Xihua University, 9999 Hongguang Avenue, Pixian District, Sichuan Province, Chengdu 610039, China
- Junbiao He
  - School of Electrical Engineering and Electronic Information, Xihua University, 9999 Hongguang Avenue, Pixian District, Sichuan Province, Chengdu 610039, China
- Jingjing Gao
  - School of Information and Communication Engineering, University of Electronic Science and Technology of China, 2006 Xiyuan Avenue, High-tech Zone (West Zone), Sichuan Province, Chengdu 611731, China
2
Aplakidou E, Vergoulidis N, Chasapi M, Venetsianou NK, Kokoli M, Panagiotopoulou E, Iliopoulos I, Karatzas E, Pafilis E, Georgakopoulos-Soares I, Kyrpides NC, Pavlopoulos GA, Baltoumas FA. Visualizing metagenomic and metatranscriptomic data: A comprehensive review. Comput Struct Biotechnol J 2024; 23:2011-2033. [PMID: 38765606 PMCID: PMC11101950 DOI: 10.1016/j.csbj.2024.04.060] [Received: 01/27/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/22/2024]
Abstract
The fields of metagenomics and metatranscriptomics involve the examination of complete nucleotide sequences, gene identification, and analysis of potential biological functions within diverse organisms or environmental samples. Despite the vast opportunities for discovery in metagenomics, the sheer volume and complexity of sequence data often present challenges in processing, analysis, and visualization. This article highlights the critical role of advanced visualization tools in enabling effective exploration, querying, and analysis of these complex datasets. Emphasizing the importance of accessibility, the article categorizes various visualizers based on their intended applications and highlights their utility in empowering bioinformaticians and non-bioinformaticians to interpret and derive insights from meta-omics data effectively.
Affiliation(s)
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nikolaos Vergoulidis
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Chasapi
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Maria Kokoli
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
- Eleni Panagiotopoulou
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Department of Informatics and Telecommunications, Data Science and Information Technologies program, University of Athens, 15784 Athens, Greece
- Ioannis Iliopoulos
  - Department of Basic Sciences, School of Medicine, University of Crete, 71003 Heraklion, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, UK
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikos C. Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Center of New Biotechnologies & Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Greece
  - Hellenic Army Academy, 16673 Vari, Greece
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, Greece
3
Varga M, Bermejo P, Pellicer-Guridi R, Orús R, Molina-Terriza G. Quantum-inspired clustering with light. Sci Rep 2024; 14:21726. [PMID: 39289485 PMCID: PMC11408508 DOI: 10.1038/s41598-024-73053-z] [Received: 05/28/2024] [Accepted: 09/12/2024] [Indexed: 09/19/2024]
Abstract
This article introduces a novel approach to simulating a single-qubit quantum-inspired algorithm using laser beams. Leveraging the polarization states of photonic qubits, and inspired by variational quantum eigensolvers, we develop a variational quantum-inspired algorithm implementing a clustering procedure following the approach proposed by some of us in Sci Rep 13, 13284 (2023). A key aspect of our research involves the utilization of non-orthogonal states within the photonic domain, harnessing the potential of polarization schemes to reproduce unitary circuits. By mapping these non-orthogonal states into polarization states, we achieve an efficient and versatile quantum information processing unit which serves as a clustering device for a diverse set of datasets.
Affiliation(s)
- Miguel Varga
  - Centro de Física de Materiales, UPV-EHU/CSIC, Paseo Manuel de Lardizabal 5, San Sebastián, E-20018, Spain
- Pablo Bermejo
  - Donostia International Physics Center, Paseo Manuel de Lardizabal 4, San Sebastián, E-20018, Spain
- Ruben Pellicer-Guridi
  - Centro de Física de Materiales, UPV-EHU/CSIC, Paseo Manuel de Lardizabal 5, San Sebastián, E-20018, Spain
  - Donostia International Physics Center, Paseo Manuel de Lardizabal 4, San Sebastián, E-20018, Spain
- Román Orús
  - Donostia International Physics Center, Paseo Manuel de Lardizabal 4, San Sebastián, E-20018, Spain
  - Ikerbasque Foundation for Science, Maria Diaz de Haro 3, Bilbao, E-48013, Spain
  - Multiverse Computing, Paseo de Miramón 170, San Sebastián, E-20014, Spain
- Gabriel Molina-Terriza
  - Centro de Física de Materiales, UPV-EHU/CSIC, Paseo Manuel de Lardizabal 5, San Sebastián, E-20018, Spain
  - Donostia International Physics Center, Paseo Manuel de Lardizabal 4, San Sebastián, E-20018, Spain
  - Ikerbasque Foundation for Science, Maria Diaz de Haro 3, Bilbao, E-48013, Spain
4
Kim H, Moon S, Lee J, Kim E, Jin SW, Kim JL, Lee SU, Kim J, Yoo S, Lee J, Song G, Lee J. Fuzzy clustering of 24-2 visual field patterns can detect glaucoma progression. PLoS One 2024; 19:e0309011. [PMID: 39231172 PMCID: PMC11373827 DOI: 10.1371/journal.pone.0309011] [Received: 05/02/2024] [Accepted: 08/03/2024] [Indexed: 09/06/2024]
Abstract
PURPOSE To represent 24-2 visual field (VF) losses of individual patients using a hybrid approach of archetypal analysis (AA) and fuzzy c-means (FCM) clustering. METHODS In this multicenter retrospective study, we classified characteristic patterns of 24-2 VF using AA and decomposed them with FCM clustering. We predicted the change in mean deviation (MD) through supervised machine learning from decomposition coefficient change. In addition, we compared the areas under the receiver operating characteristic curves (AUCs) of the decomposition coefficient slopes to detect VF progression using three criteria: MD slope, Visual Field Index slope, and pointwise linear regression analysis. RESULTS We identified 16 characteristic patterns (archetypes or ATs) of 24-2 VF from 132,938 VFs of 18,033 participants using AA. The hybrid approach using FCM revealed a lower mean squared error and greater correlation coefficient than the AA single approach for predicting MD change (all P ≤ 0.001). Three of 16 AUCs of the FCM decomposition coefficient slopes outperformed the AA decomposition coefficient slopes in detecting VF progression for all three criteria (AT5, superior altitudinal defect; AT10, double arcuate defect; AT13, total loss) (all P ≤ 0.028). CONCLUSION A hybrid approach combining AA and FCM to analyze 24-2 VF can visualize VF tests in characteristic patterns and enhance detection of VF progression with lossless decomposition.
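As background for the fuzzy c-means (FCM) step named in this abstract, here is a minimal sketch of the textbook FCM updates in plain Python. It is generic FCM on toy one-dimensional data, not the authors' hybrid of archetypal analysis and FCM; the function names, the fuzzifier m = 2, the fixed iteration count, and the data values are all assumptions made for illustration.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Membership of each point in each cluster; every row sums to 1."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for p in points:
        dists = [math.dist(p, c) for c in centers]
        if min(dists) == 0.0:
            # Point sits exactly on one or more centers: share membership among them.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** -exponent for d in dists]
            rows.append([v / sum(inv) for v in inv])
    return rows


def fcm_centers(points, memberships, m=2.0):
    """Centers as means of the points weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for i in range(len(memberships[0])):
        weights = [row[i] ** m for row in memberships]
        total = sum(weights)
        centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(n_iter):
        centers = fcm_centers(points, fcm_memberships(points, centers, m), m)
    return centers, fcm_memberships(points, centers, m)


# Toy 1D data with two well-separated groups (illustrative values only).
data = [(0.0,), (0.2,), (0.1,), (10.0,), (10.2,), (9.9,)]
centers, memberships = fuzzy_c_means(data, init_centers=[(1.0,), (8.0,)])
print(centers)
print([round(row[0], 3) for row in memberships])
```

Unlike hard clustering, each observation receives a graded membership in every cluster, which is what allows a single visual field to be described as a weighted mix of characteristic patterns.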
Affiliation(s)
- Hwayeong Kim
  - Department of Ophthalmology, Pusan National University College of Medicine, Busan, Korea
- Sangwoo Moon
  - Department of Ophthalmology, Pusan National University Yangsan Hospital, Pusan National University School of Medicine, Yangsan, Korea
- Joohwang Lee
  - Department of Ophthalmology, Pusan National University College of Medicine, Busan, Korea
- EunAh Kim
  - Department of Ophthalmology, Haeundae Paik Hospital, Inje University College of Medicine, Busan, South Korea
- Sang Wook Jin
  - Department of Ophthalmology, Dong-A University College of Medicine, Busan, Korea
- Jung Lim Kim
  - Department of Ophthalmology, Busan Paik Hospital, Inje University College of Medicine, Busan, Korea
- Seung Uk Lee
  - Department of Ophthalmology, Kosin University College of Medicine, Busan, Korea
- Jinmi Kim
  - Department of Biostatistics, Clinical Trial Center, Biomedical Research Institute, Pusan National University Hospital, Busan, Korea
- Seungtae Yoo
  - Division of Artificial Intelligence, Department of Information Convergence Engineering, Pusan National University, Busan, Korea
- Jiwon Lee
  - Division of Artificial Intelligence, Department of Information Convergence Engineering, Pusan National University, Busan, Korea
- Giltae Song
  - Division of Artificial Intelligence, Department of Information Convergence Engineering, Pusan National University, Busan, Korea
  - Center for Artificial Intelligence Research, Pusan National University, Busan, Korea
  - School of Computer Science and Engineering, Pusan National University, Busan, Korea
- Jiwoong Lee
  - Department of Ophthalmology, Pusan National University College of Medicine, Busan, Korea
  - Biomedical Research Institute, Pusan National University Hospital, Busan, Korea
5
Agarwalla A, Lu Y, Reinholz AK, Marigi EM, Liu JN, Sanchez-Sotelo J. Identifying clinically meaningful subgroups following open reduction and internal fixation for proximal humerus fractures: a risk stratification analysis for mortality and 30-day complications using machine learning. JSES Int 2024; 8:932-940. [PMID: 39280153 PMCID: PMC11401551 DOI: 10.1016/j.jseint.2024.04.015] [Indexed: 09/18/2024]
Abstract
Background Identification of prognostic variables for poor outcomes following open reduction internal fixation (ORIF) of displaced proximal humerus fractures has been limited to singular, linear factors and subjective clinical intuition. Machine learning (ML) has the capability to objectively segregate patients based on various outcome metrics and report the connectivity of variables resulting in the optimal outcome. Therefore, the purpose of this study was to (1) use unsupervised ML to stratify patients to high-risk and low-risk clusters based on postoperative events, (2) compare the ML clusters to the American Society of Anesthesiologists (ASA) classification for assessment of risk, and (3) determine the variables that were associated with high-risk patients after proximal humerus ORIF. Methods The American College of Surgeons-National Surgical Quality Improvement Program database was retrospectively queried for patients undergoing ORIF for proximal humerus fractures between 2005 and 2018. Four unsupervised ML clustering algorithms were evaluated to partition subjects into "high-risk" and "low-risk" subgroups based on combinations of observed outcomes. Demographic, clinical, and treatment variables were compared between these groups using descriptive statistics. A supervised ML algorithm was generated to identify patients who were likely to be "high risk", and its predictions were compared to ASA classification. A game-theory-based explanation algorithm was used to illustrate predictors of "high-risk" status. Results Overall, 4670 patients were included, of which 202 were partitioned into the "high-risk" cluster, while the remaining (4468 patients) were partitioned into the "low-risk" cluster.
Patients in the "high-risk" cluster demonstrated significantly increased rates of the following complications: 30-day mortality, 30-day readmission rates, 30-day reoperation rates, nonroutine discharge rates, length of stay, and rates of all surgical and medical complications assessed with the exception of urinary tract infection (P < .001). The best performing supervised machine learning algorithm for preoperatively identifying "high-risk" patients was the extreme-gradient boost (XGBoost), which achieved an area under the receiver operating characteristics curve of 76.8%, while ASA classification had an area under the receiver operating characteristics curve of 61.7%. Shapley values identified the following predictors of "high-risk" status: greater body mass index, increasing age, ASA class 3, increased operative time, male gender, diabetes, and smoking history. Conclusion Unsupervised ML identified that "high-risk" patients have a higher risk of complications (8.9%) than "low-risk" groups (0.4%) with respect to 30-day complication rate. A supervised ML model selected greater body mass index, increasing age, ASA class 3, increased operative time, male gender, diabetes, and smoking history to effectively predict "high-risk" patients.
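The comparison between XGBoost and ASA classification in this abstract rests on the area under the receiver operating characteristic curve (AUC). As a minimal sketch of what that number measures, the AUC equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative one, with ties counted as one half. The function name and the toy labels and scores below are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 marks a hypothetical "high-risk" patient, 0 "low-risk".
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for reading the reported 76.8% and 61.7%.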
Affiliation(s)
- Avinesh Agarwalla
  - Department of Orthopedic Surgery, Westchester Medical Center, Valhalla, NY, USA
- Yining Lu
  - Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Anna K Reinholz
  - Department of Orthopedic Surgery, Baylor Scott & White Medical Center, Temple, TX, USA
- Erick M Marigi
  - Department of Orthopedic Surgery, Mayo Clinic, Rochester, MN, USA
- Joseph N Liu
  - USC Epstein Family Center for Sports Medicine, Keck Medicine for USC, Los Angeles, CA, USA
6
Wani AA. Comprehensive analysis of clustering algorithms: exploring limitations and innovative solutions. PeerJ Comput Sci 2024; 10:e2286. [PMID: 39314716 PMCID: PMC11419652 DOI: 10.7717/peerj-cs.2286] [Received: 05/22/2024] [Accepted: 08/06/2024] [Indexed: 09/25/2024]
Abstract
This survey rigorously explores contemporary clustering algorithms within the machine learning paradigm, focusing on five primary methodologies: centroid-based, hierarchical, density-based, distribution-based, and graph-based clustering. Through the lens of recent innovations such as deep embedded clustering and spectral clustering, we analyze the strengths, limitations, and the breadth of application domains, ranging from bioinformatics to social network analysis. Notably, the survey introduces novel contributions by integrating clustering techniques with dimensionality reduction and proposing advanced ensemble methods to enhance stability and accuracy across varied data structures. This work uniquely synthesizes the latest advancements and offers new perspectives on overcoming traditional challenges like scalability and noise sensitivity, thus providing a comprehensive roadmap for future research and practical applications in data-intensive environments.
Affiliation(s)
- Aasim Ayaz Wani
  - School of Engineering, Cornell University, Ithaca, New York, United States
7
Zhang F, Wang Q, Li H, Zhou Q, Tan Z, Zu X, Yan X, Zhang S, Ninomiya S, Mu Y, Tao S. Study on the Optimal Leaf Area-to-Fruit Ratio of Pear Trees on the Basis of Bearing Branch Girdling and Machine Learning. Plant Phenomics (Washington, D.C.) 2024; 6:0233. [PMID: 39144673 PMCID: PMC11322523 DOI: 10.34133/plantphenomics.0233] [Received: 11/13/2023] [Accepted: 07/20/2024] [Indexed: 08/16/2024]
Abstract
The leaf area-to-fruit ratio (LAFR) is an important factor affecting fruit quality. Previous studies on LAFR have provided some recommendations for optimal values. However, these recommendations have been quite broad and lack effectiveness during the fruit thinning period. In this study, data on the LAFR and fruit quality of pears at 5 stages were collected by continuously girdling bearing branches throughout the entire fruit development process. Five different clustering algorithms, including KMeans, Agglomerative clustering, Spectral clustering, Birch, and Spectral biclustering, were employed to classify the fruit quality data. Agglomerative clustering yielded the best results when the dataset was divided into 4 clusters. The least squares method was utilized to fit the LAFR corresponding to the best quality cluster, and the optimal LAFR values for 28, 42, 63, 91, and 112 days after flowering were 12.54, 18.95, 23.79, 27.06, and 28.76 dm² (the corresponding leaf-to-fruit ratio values were 19, 29, 36, 41, and 44, respectively). Furthermore, field verification experiments demonstrated that the optimal LAFR contributed to improving pear fruit quality, and a relatively high LAFR beyond the optimum value did not further increase quality. In summary, we optimized the LAFR of pear trees at different stages and confirmed the effectiveness of the optimal LAFR in improving fruit quality. Our research provides a theoretical basis for managing pear tree fruit load and achieving high-quality, clean fruit production.
Affiliation(s)
- Fanhang Zhang
  - Sanya Institute, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Qi Wang
  - Sanya Institute, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Haitao Li
  - Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Qinyang Zhou
  - Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Zhihao Tan
  - Sanya Institute, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Xiaochao Zu
  - Sanya Institute, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Xin Yan
  - Sanya Institute, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Shaoling Zhang
  - Sanya Institute, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Seishi Ninomiya
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo 188-0002, Japan
- Yue Mu
  - Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Shutian Tao
  - Sanya Institute, College of Horticulture, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
8
Aljassam Y, Sophocleous F, Bruse JL, Schot V, Caputo M, Biglino G. Machine Learning and Statistical Shape Modelling Methodologies to Assess Vascular Morphology before and after Aortic Valve Replacement. J Clin Med 2024; 13:4577. [PMID: 39124843 PMCID: PMC11313263 DOI: 10.3390/jcm13154577] [Received: 05/19/2024] [Revised: 07/24/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024]
Abstract
Introduction: Statistical shape modelling (SSM) is used to analyse morphology, discover qualitatively and quantitatively unique shape features within a population, and generate mean shapes and shape modes that show morphological variability. Hierarchical agglomerative clustering is a machine learning analysis used to identify subgroups within a given population in relation to shape features. We tested the application of both methods in the clinically relevant scenario of patients undergoing aortic valve replacement (AVR). Every year, around 5000 patients undergo surgical AVR in the UK. Aims: Evaluate aortic morphology and identify subgroups amongst patients who had undergone AVR, including Ozaki, Ross, and valve-sparing procedures, using SSM and unsupervised hierarchical clustering analysis. This methodological framework can evaluate both pre- and post-surgical variability across subgroups undergoing different surgeries. Methods: Pre- (n = 47) and post- (n = 35) operative three-dimensional (3D) aortic models were reconstructed from computed tomography (CT) and cardiac magnetic resonance (CMR) images. Computational analyses for SSM and hierarchical clustering were run separately for the two subgroups, assessing (a) ascending aorta only and (b) the whole aorta. This allows for exploring possible variations in morphological classification related to the input shape. Results: Most patients in the Ross procedure subgroup exhibited differences in aortic morphology from other subgroups, including an elongated ascending aorta and wide aortic arch pre-operatively, and an elongated ascending aorta with a slightly enlarged sinus post-operatively. In hierarchical clustering, the Ross aortas also appeared to cluster together compared to the other surgical procedures, both pre-operatively and post-operatively. There were significant differences between clusters in terms of clustering distance in the pre-operative analyses (p = 0.003 for ascending aortas, p = 0.016 for whole aortas).
There were no significant differences between the clusters in post-operative analyses (p = 0.47 for ascending, p = 0.19 for whole aorta). Conclusions: We demonstrated the feasibility of evaluating aortic morphology before and after different aortic valve surgeries using SSM and hierarchical clustering. This framework could be used to further explore shape features associated with surgical decision-making pre-operatively and, importantly, to identify subgroups whose morphology is associated with poorer clinical outcomes post-operatively. Statistical shape modelling (SSM) and unsupervised hierarchical clustering are two statistical methods that can be used to assess morphology, show morphological variations, with the latter being able to identify subgroups within a population. These methods have been applied to the population of aortic valve replacement (AVR) patients since there are different surgical procedures (traditional AVR, Ozaki, Ross, and valve-sparing). The aim is to evaluate aortic morphology and identify subgroups within this population before and after surgery. Computed tomography and cardiac magnetic resonance images were reconstructed into 3D models of the ascending aorta and whole aorta, which were then input into SSM and hierarchical clustering. The results show that the Ross aortic morphology is quite different from the other aortas. The clustering did not classify the aortas based on the surgical procedures; however, most of the Ross group did cluster together, indicating low variability within this surgical group.
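To make the hierarchical agglomerative clustering step concrete, the following is a minimal sketch in plain Python. It assumes average linkage and Euclidean distance on toy two-dimensional points; the abstract does not state the linkage criterion or the shape descriptors fed to the clustering, so both choices, along with the function name and data, are assumptions for illustration only.

```python
import math


def agglomerative(points, n_clusters):
    """Merge the two closest clusters (average linkage) until n_clusters remain.

    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]

    def average_distance(a, b):
        # Mean pairwise Euclidean distance between members of two clusters.
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = average_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy 2D "shape feature" vectors (illustrative values only).
features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0)]
print(agglomerative(features, n_clusters=3))
```

This naive version recomputes all pairwise cluster distances at every merge, which is acceptable for cohorts of a few dozen shapes but would be replaced by a library implementation for larger datasets.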
Affiliation(s)
- Yousef Aljassam
  - Department of Translational Health Sciences, Bristol Medical School, University of Bristol, Bristol BS2 8HW, UK
- Froso Sophocleous
  - Department of Translational Health Sciences, Bristol Medical School, University of Bristol, Bristol BS2 8HW, UK
- Jan L. Bruse
  - Fundación Vicomtech, Basque Research and Technology Alliance BRTA, Mikeletegi 57, 20009 Donostia-San Sebastián, Spain
- Vico Schot
  - Department of Translational Health Sciences, Bristol Medical School, University of Bristol, Bristol BS2 8HW, UK
- Massimo Caputo
  - Department of Translational Health Sciences, Bristol Medical School, University of Bristol, Bristol BS2 8HW, UK
  - Bristol Heart Institute, University Hospitals Bristol and Weston NHS Foundation Trust, Bristol BS2 8HW, UK
- Giovanni Biglino
  - Department of Translational Health Sciences, Bristol Medical School, University of Bristol, Bristol BS2 8HW, UK
  - Bristol Heart Institute, University Hospitals Bristol and Weston NHS Foundation Trust, Bristol BS2 8HW, UK
9
Ramamoorthy K, Rajaguru H. Exploitation of Bio-Inspired Classifiers for Performance Enhancement in Liver Cirrhosis Detection from Ultrasonic Images. Biomimetics (Basel) 2024; 9:356. [PMID: 38921235 PMCID: PMC11201414 DOI: 10.3390/biomimetics9060356] [Received: 04/09/2024] [Revised: 06/03/2024] [Accepted: 06/08/2024] [Indexed: 06/27/2024]
Abstract
In the current scenario, liver abnormalities are one of the most serious public health concerns. Cirrhosis of the liver is one of the foremost causes of demise from liver diseases. To accurately predict the status of liver cirrhosis, physicians frequently use automated computer-aided approaches. In this paper, features are extracted from normal and cirrhotic liver ultrasonic images through clustering techniques such as fuzzy c-means (FCM), possibilistic fuzzy c-means (PFCM), and possibilistic c-means (PCM), and through sample entropy. The extracted features are classified as normal and cirrhotic through the Gaussian mixture model (GMM), Softmax discriminant classifier (SDC), harmonic search algorithm (HSA), SVM (linear), SVM (RBF), SVM (polynomial), artificial algae optimization (AAO), and a hybrid classifier combining artificial algae optimization (AAO) with the Gaussian mixture model (GMM). The classifiers' performances are compared based on accuracy, F1 Score, MCC, F measure, error rate, and Jaccard metric (JM). The hybrid classifier AAO-GMM, with the PFCM feature, outperforms the other classifiers and attained an accuracy of 99.03% with an MCC of 0.90.
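The comparison metrics listed in this abstract can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions of accuracy, error rate, F1 score, Matthews correlation coefficient (MCC), and the Jaccard metric; the function name and the example counts are hypothetical and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    f1 = 2 * tp / (2 * tp + fp + fn)
    jaccard = tp / (tp + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; report 0.0 in that case.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "f1": f1,
        "mcc": mcc,
        "jaccard": jaccard,
    }


# Hypothetical counts: cirrhotic images as the positive class.
for name, value in binary_metrics(tp=45, fp=5, tn=45, fn=5).items():
    print(f"{name}: {value:.4f}")
```

Because MCC uses all four cells of the confusion matrix, it can sit well below accuracy when the two classes are imbalanced, so the two figures are best read together.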
Affiliation(s)
- Harikumar Rajaguru
  - Department of ECE, Bannari Amman Institute of Technology, Tamil Nadu 638401, India
10
Vervust W, Zhang DT, Ghysels A, Roet S, van Erp TS, Riccardi E. PyRETIS 3: Conquering rare and slow events without boundaries. J Comput Chem 2024; 45:1224-1234. [PMID: 38345082 DOI: 10.1002/jcc.27319] [Received: 12/11/2023] [Revised: 01/16/2024] [Accepted: 01/18/2024] [Indexed: 04/19/2024]
Abstract
We present and discuss the advancements made in PyRETIS 3, the third instalment of our Python library for efficient and user-friendly rare-event simulation, focused on executing molecular simulations with replica exchange transition interface sampling (RETIS) and its variations. Apart from a general rewiring of the internal code towards a more modular structure, several recently developed sampling strategies have been implemented. These include recently developed Monte Carlo moves to increase path decorrelation and convergence rate, and new ensemble definitions to handle the challenges of long-lived metastable states and transitions with unbounded reactant and product states. Additionally, the post-analysis software PyVisa is now embedded in the main code, allowing fast use of machine-learning algorithms for clustering and visualising collective variables in the simulation data.
Affiliation(s)
- Wouter Vervust
  - IBiTech-BioMMedA Group, Ghent University, Ghent, Belgium
- Daniel T Zhang
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
- An Ghysels
  - IBiTech-BioMMedA Group, Ghent University, Ghent, Belgium
- Sander Roet
  - Department of Chemistry, Utrecht University, Utrecht, The Netherlands
- Titus S van Erp
  - Department of Chemistry, Norwegian University of Science and Technology, Trondheim, Norway
- Enrico Riccardi
  - Department of Energy Resources, University of Stavanger, Stavanger, Norway
| |
Collapse
|
11
|
Rangam H, Sivasankaran SK, Balasubramanian V. Generation of nighttime pedestrian fatal precrash scenarios at junctions in Tamil Nadu, India, using cluster correspondence analysis. Traffic Inj Prev 2024; 25:870-878. [PMID: 38832922 DOI: 10.1080/15389588.2024.2350695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Accepted: 04/29/2024] [Indexed: 06/06/2024]
Abstract
OBJECTIVE Modern transportation amenities and lifestyles have changed people's behavioral patterns while using the road, specifically at nighttime. Pedestrian and driver maneuver behaviors change based on their exposure to the environment. Pedestrians are more vulnerable to fatal injuries at junctions due to increased conflict points with vehicles. Generation of precrash scenarios allows drivers and pedestrians to understand errors on the road during driver maneuvering and pedestrian walking/crossing. This study aims to generate precrash scenarios using comprehensive nighttime fatal pedestrian crashes at junctions in Tamil Nadu, India. METHODS Though numerous studies were available on identifying pedestrian crash patterns, only some focused on identifying crash patterns at junctions at night. We used cluster correspondence analysis (CCA) to address this research gap to identify the patterns in nighttime pedestrian fatal crashes at junctions. Further, high-risk precrash scenarios were generated based on the positive residual means available in each cluster. This study used crash data from the Road Accident Database Management System of Tamil Nadu State in India from 2009 to 2018. Characteristics of pedestrians, drivers, vehicles, crashes, light, and roads were input to the CCA to find optimal clusters using the average silhouette width, Calinski-Harabasz measure, and objective values. RESULTS CCA found 4 clusters with 2 dimensions as optimal clusters, with an objective value of 3.3618 and a valence criteria ratio of 80.03%. Results from the analysis distinctly clustered the pedestrian precrash behaviors: Clusters 1 and 2 on pedestrian walking behaviors and clusters 3 and 4 on crossing behaviors. Moreover, a hidden pattern was observed in cluster 4, such as transgender drivers involved in fatal pedestrian crashes at junctions at night. 
CONCLUSION The generated precrash scenarios may be used to train drivers (novice and inexperienced for nighttime driving), test scenario creation for developing advanced driver/rider assistance systems, hypothesis creation for researchers, and planning of effective strategic interventions for engineers and policymakers to change pedestrian and driver behaviors toward sustainable safety on Indian roads.
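The abstract above selects the number of clusters partly by average silhouette width. As a reminder of what that criterion measures, here is a minimal sketch in plain Python. It assumes numeric points and Euclidean distance, which is a simplification: the study clusters categorical crash attributes via CCA, and that pipeline is not reproduced here. The function names are illustrative only.

```python
import math


def average_silhouette_width(points, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points.

    a: mean distance from point i to the other members of its own cluster.
    b: smallest mean distance from point i to the members of any other cluster.
    Assumption: Euclidean distance on numeric tuples (not the CCA setting).
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) is taken as 0 for singleton clusters
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / len(points)
```

Values near 1 indicate compact, well-separated clusters; values near or below 0 suggest points sit closer to a neighbouring cluster than to their own.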
Affiliation(s)
- Harikrishna Rangam
  - RBG (Rehabilitation Bioengineering Group) Lab, Department of Engineering Design, IIT Madras, Chennai, India
- Sathish Kumar Sivasankaran
  - RBG (Rehabilitation Bioengineering Group) Lab, Department of Engineering Design, IIT Madras, Chennai, India
- Venkatesh Balasubramanian
  - RBG (Rehabilitation Bioengineering Group) Lab, Department of Engineering Design, IIT Madras, Chennai, India
12
Martins GL, Ferreira DS, Carneiro CM, Nogueira-Paiva NC, Bianchi AGC. Trajectory-driven computational analysis for element characterization in Trypanosoma cruzi video microscopy. PLoS One 2024; 19:e0304716. [PMID: 38829872 PMCID: PMC11146708 DOI: 10.1371/journal.pone.0304716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 05/14/2024] [Indexed: 06/05/2024] Open
Abstract
Optical microscopy videos enable experts to analyze the motion of several biological elements. Particularly in blood samples infected with Trypanosoma cruzi (T. cruzi), microscopy videos reveal a dynamic scenario where the parasites' motions are conspicuous. While parasites have self-motion, cells are inert and may assume some displacement under dynamic events, such as fluids and microscope focus adjustments. This paper analyzes the trajectory of T. cruzi and blood cells to discriminate between these elements by identifying the following motion patterns: collateral, fluctuating, and pan-tilt-zoom (PTZ). We consider two approaches: i) classification experiments for discrimination between parasites and cells; and ii) clustering experiments to identify the cell motion. We propose the trajectory step dispersion (TSD) descriptor based on standard deviation to characterize these elements, outperforming state-of-the-art descriptors. Our results confirm that motion is valuable in discriminating T. cruzi from the cells. Since the parasites perform the collateral motion, their trajectory steps tend toward randomness. The cells may assume fluctuating motion following a homogeneous and directional path or PTZ motion with trajectory steps in a restricted area. Thus, our findings may contribute to developing new computational tools focused on trajectory analysis, which can advance the study and medical diagnosis of Chagas disease.
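The abstract states only that the TSD descriptor is based on the standard deviation of trajectory steps; the exact definition is in the paper. The sketch below shows one plausible reading, the population standard deviation of consecutive step lengths, and should be treated as an assumption rather than the authors' formula. The function names are hypothetical.

```python
import math
import statistics


def step_lengths(trajectory):
    """Euclidean length of each step between consecutive (x, y) positions."""
    return [math.dist(p, q) for p, q in zip(trajectory, trajectory[1:])]


def step_dispersion(trajectory):
    """Population standard deviation of the step lengths.

    Assumed stand-in for a step-dispersion descriptor: a steady, directional
    path gives a value near 0, while erratic motion gives a larger value.
    """
    steps = step_lengths(trajectory)
    if len(steps) < 2:
        raise ValueError("need at least three positions (two steps)")
    return statistics.pstdev(steps)
```

Under this reading, an evenly drifting element such as `[(0, 0), (1, 0), (2, 0), (3, 0)]` has zero dispersion, whereas a trajectory that alternates between long jumps and pauses scores higher.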
Affiliation(s)
- Geovani L. Martins
  - Postgraduate Program in Computer Science, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
  - Department of Computing, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
- Daniel S. Ferreira
  - Department of Computing, Federal Institute of Education, Science, and Technology of Ceará, Maracanaú, CE, Brazil
- Claudia M. Carneiro
  - Nucleus of Biological Sciences Research, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
  - Department of Clinical Analysis, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
- Nivia C. Nogueira-Paiva
  - Nucleus of Biological Sciences Research, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
- Andrea G. C. Bianchi
  - Postgraduate Program in Computer Science, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
  - Department of Computing, Federal University of Ouro Preto, Ouro Preto, MG, Brazil
13
Ireddy ATS, Ghorabe FDE, Shishatskaya EI, Ryltseva GA, Dudaev AE, Kozodaev DA, Nosonovsky M, Skorb EV, Zun PS. Benchmarking Unsupervised Clustering Algorithms for Atomic Force Microscopy Data on Polyhydroxyalkanoate Films. ACS Omega 2024; 9:21595-21611. [PMID: 38764678 PMCID: PMC11097174 DOI: 10.1021/acsomega.4c02502] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Revised: 04/11/2024] [Accepted: 04/12/2024] [Indexed: 05/21/2024]
Abstract
Surfaces of polyhydroxyalkanoate (PHA) films of varying monomer compositions are analyzed using atomic force microscopy (AFM) and unsupervised machine learning (ML) algorithms to investigate and classify films based on global attributes such as the scan size, film thickness, and monomer type. The experiment provides benchmarked results for 12 of the most widely used clustering algorithms via a hybrid investigation approach while highlighting the impact of using the Fourier transform (FT) on high-dimensional vectorized data for classification on various pools of data. Our findings indicate that the use of a one-dimensional (1D) FT of vectorized data produces the most accurate outcome. The experiment also provides insights into case-by-case investigations of algorithm performances and the impact of various data pools. Lastly, we show an early version of our tool aimed at investigating surfaces using ML approaches and discuss the results of our current experiment to configure future improvements.
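To make "a 1D FT of vectorized data" concrete, the following sketch flattens a 2D height map row by row and takes the magnitudes of its discrete Fourier transform. The row-major flattening order and the use of magnitudes as features are assumptions for illustration; the abstract does not specify the authors' preprocessing. The naive O(n²) transform is used only to keep the example dependency-free.

```python
import cmath


def vectorize(image):
    """Flatten a 2D grid (list of rows) into a 1D list, row-major (assumed order)."""
    return [value for row in image for value in row]


def dft_magnitudes(signal):
    """Magnitude of each coefficient of the 1D discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n)
    ]
```

The resulting magnitude vector could then be passed to any clustering algorithm in place of the raw pixel vector; a perfectly flat surface, for instance, puts all of its energy in the zero-frequency coefficient.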
Affiliation(s)
- Ashish T. S. Ireddy
  - Infochemistry Scientific Centre, ITMO University, 9 Lomonosova St., 191002 St. Petersburg, Russia
- Fares D. E. Ghorabe
  - Infochemistry Scientific Centre, ITMO University, 9 Lomonosova St., 191002 St. Petersburg, Russia
- Galina A. Ryltseva
  - Siberian Federal University, 79 Svobodnyi Av., 660041 Krasnoyarsk, Russia
- Alexey E. Dudaev
  - Siberian Federal University, 79 Svobodnyi Av., 660041 Krasnoyarsk, Russia
- Michael Nosonovsky
  - Infochemistry Scientific Centre, ITMO University, 9 Lomonosova St., 191002 St. Petersburg, Russia
  - University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53217, United States
- Ekaterina V. Skorb
  - Infochemistry Scientific Centre, ITMO University, 9 Lomonosova St., 191002 St. Petersburg, Russia
- Pavel S. Zun
  - Infochemistry Scientific Centre, ITMO University, 9 Lomonosova St., 191002 St. Petersburg, Russia
14
Barati Jozan MM, Lotfata A, Hamilton HJ, Tabesh H. An inversion-based clustering approach for complex clusters. BMC Res Notes 2024; 17:133. [PMID: 38735941 PMCID: PMC11089746 DOI: 10.1186/s13104-024-06791-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 04/30/2024] [Indexed: 05/14/2024] Open
Abstract
BACKGROUND The choice of an appropriate similarity measure plays a pivotal role in the effectiveness of clustering algorithms. However, many conventional measures rely solely on feature values to evaluate the similarity between objects to be clustered. Furthermore, the assumption of feature independence, while valid in certain scenarios, does not hold true for all real-world problems. Hence, considering alternative similarity measures that account for inter-dependencies among features can enhance the effectiveness of clustering in various applications. METHODS In this paper, we present the Inv measure, a novel similarity measure founded on the concept of inversion. The Inv measure considers the significance of features, the values of all object features, and the feature values of other objects, leading to a comprehensive and precise evaluation of similarity. To assess the performance of our proposed clustering approach that incorporates the Inv measure, we evaluate it on simulated data using the adjusted Rand index. RESULTS The simulation results strongly indicate that inversion-based clustering outperforms other methods in scenarios where clusters are complex, i.e., apparently highly overlapped. This showcases the practicality and effectiveness of the proposed approach, making it a valuable choice for applications that involve complex clusters across various domains. CONCLUSIONS The inversion-based clustering approach may hold significant value in the healthcare industry, offering possible benefits in tasks like hospital ranking, treatment improvement, and high-risk patient identification. In social media analysis, it may prove valuable for trend detection, sentiment analysis, and user profiling. E-commerce may be able to utilize the approach for product recommendation and customer segmentation. The manufacturing sector may benefit from improved quality control, process optimization, and predictive maintenance. 
Additionally, the approach may be applied to traffic management and fleet optimization in the transportation domain. Its versatility and effectiveness make it a promising solution for diverse fields, providing valuable insights and optimization opportunities for complex and dynamic data analysis tasks.
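The evaluation in this abstract relies on the adjusted Rand index (ARI), which scores agreement between two partitions while correcting for chance. The sketch below implements the standard pair-counting formula in plain Python; it illustrates the metric only and says nothing about the Inv measure itself, whose definition is not given in the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items.

    ARI = (index - expected) / (max_index - expected), where `index` counts
    item pairs grouped together in both partitions.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("need at least two items")

    # Pairs that share a cluster in both partitions, in the first, in the second.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = together_true * together_pred / comb(n, 2)
    max_index = (together_true + together_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (together_both - expected) / (max_index - expected)
```

The score is 1.0 for identical partitions (label names do not matter), about 0 for random agreement, and can be negative when the partitions disagree more than chance would predict.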
Affiliation(s)
- Mohammad Mahdi Barati Jozan
  - Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Aynaz Lotfata
  - Department of Pathology, Microbiology, and Immunology, School of Veterinary Medicine, University of California, Davis, USA
- Howard J Hamilton
  - Department of Computer Science, University of Regina, Regina, SK, Canada
- Hamed Tabesh
  - Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
15
Yue S, Fu K, Liu L, Zhao Y. Electrical Sensor Calibration by Fuzzy Clustering with Mandatory Constraint. Sensors (Basel) 2024; 24:3068. [PMID: 38793922 PMCID: PMC11125234 DOI: 10.3390/s24103068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 04/25/2024] [Accepted: 05/09/2024] [Indexed: 05/26/2024]
Abstract
Electrical tomography sensors have been widely used for pipeline parameter detection and estimation. Before they can be used in formal applications, the sensors must be calibrated using enough labeled data. However, due to the high complexity of actual measuring environments, the calibrated sensors are inaccurate since the labeling data may be uncertain, inconsistent, incomplete, or even invalid. Alternatively, it is always possible to obtain partial data with accurate labels, which can form mandatory constraints to correct errors in other labeling data. In this paper, a semi-supervised fuzzy clustering algorithm is proposed, and the fuzzy membership degree in the algorithm leads to a set of mandatory constraints to correct these inaccurate labels. Experiments in a dredger validate the proposed algorithm in terms of its accuracy and stability. This new fuzzy clustering algorithm can generally decrease the error of labeling data in any sensor calibration process.
Affiliation(s)
- Liping Liu
  - School of Electrical Engineering and Automation, Tianjin University, Tianjin 300072, China; (S.Y.); (K.F.); (Y.Z.)
16
Li M, Zhou Z, Zhang Q, Zhang J, Suo Y, Liu J, Shen D, Luo L, Li Y, Li C. Multivariate analysis for data mining to characterize poultry house environment in winter. Poult Sci 2024; 103:103633. [PMID: 38552343 PMCID: PMC11000107 DOI: 10.1016/j.psj.2024.103633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 03/04/2024] [Accepted: 03/05/2024] [Indexed: 04/11/2024] Open
Abstract
The processing and analysis of massive high-dimensional datasets are important issues in precision livestock farming (PLF). This study explored the use of multivariate analysis tools to analyze environmental data from multiple sensors located throughout a broiler house. An experiment was conducted to collect a comprehensive set of environmental data including particulate matter (TSP, PM10, and PM2.5), ammonia, carbon dioxide, air temperature, relative humidity, and in-cage and aisle wind speeds from 60 locations in a typical commercial broiler house. The dataset was divided into 3 growth phases (wk 1-3, 4-6, and 7-9). Spearman's correlation analysis and principal component analysis (PCA) were used to investigate the latent associations between environmental variables, resulting in the identification of variables that played important roles in indoor air quality. Three cluster analysis methods (k-means, k-medoids, and fuzzy c-means cluster analysis (FCM)) were used to group the measured parameters based on their environmental impact in the broiler house. In general, the Spearman and PCA results showed that the in-cage wind speed, aisle wind speed, and relative humidity played critical roles in indoor air quality distribution during broiler rearing. All 3 clustering methods were found to be suitable for grouping data, with FCM outperforming the other 2. Using data clustering, the broiler house spaces were divided into 3, 2, and 2 subspaces (clusters) for wk 1 to 3, 4 to 6, and 7 to 9, respectively. The subspace in the center of the house had poorer air quality than other subspaces.
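Fuzzy c-means differs from k-means and k-medoids in that each observation receives a graded membership in every cluster. The sketch below shows only the standard membership update for a single point given fixed cluster centers, with fuzzifier `m`; the centers, the value `m = 2`, and the function name are illustrative assumptions, and the study's actual configuration is not stated in the abstract.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster.

    Standard update: u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i.
    """
    if m <= 1.0:
        raise ValueError("the fuzzifier m must be greater than 1")
    distances = [math.dist(point, center) for center in centers]

    # A point sitting exactly on one or more centers belongs fully to them.
    on_center = [d == 0.0 for d in distances]
    if any(on_center):
        share = 1.0 / sum(on_center)
        return [share if hit else 0.0 for hit in on_center]

    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]
```

For example, a point at distance 1 from one center and 3 from another gets memberships of 0.9 and 0.1 with `m = 2`. The memberships always sum to 1, and larger `m` makes the assignment softer.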
Affiliation(s)
- Mingyang Li
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Zilin Zhou
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Qiang Zhang
  - Univ Manitoba, Department of Biosystems Engineering, Winnipeg, MB R3T 5V6, Canada
- Jie Zhang
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Yunpeng Suo
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Junze Liu
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Dan Shen
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Lu Luo
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Yansen Li
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
- Chunmei Li
  - Research Center for Livestock Environmental Control and Smart Production, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, Jiangsu Province 210095, China
17
Wang H, Li N, Zhou Y, Yan J, Jiang B, Kong L, Yan X. Fast Fusion Clustering via Double Random Projection. Entropy (Basel) 2024; 26:376. [PMID: 38785624 PMCID: PMC11119451 DOI: 10.3390/e26050376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 04/25/2024] [Accepted: 04/25/2024] [Indexed: 05/25/2024]
Abstract
In unsupervised learning, clustering is a common starting point for data processing. The convex or concave fusion clustering method is a novel approach that is more stable and accurate than traditional methods such as k-means and hierarchical clustering. However, the optimization algorithm used with this method can be slowed down significantly by the complexity of the fusion penalty, which increases the computational burden. This paper introduces a random projection ADMM algorithm based on the Bernoulli distribution and develops a double random projection ADMM method for high-dimensional fusion clustering. These new approaches significantly outperform the classical ADMM algorithm due to their ability to significantly increase computational speed by reducing complexity and improving clustering accuracy by using multiple random projections under a new evaluation criterion. We also demonstrate the convergence of our new algorithm and test its performance on both simulated and real data examples.
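As background for the random projection step mentioned above, here is a minimal sketch of a generic Bernoulli-type random projection: a matrix with independent entries of plus or minus 1/sqrt(k), each sign chosen with probability 1/2, maps d-dimensional rows to k dimensions. This is a textbook construction chosen for illustration; the abstract does not specify the authors' matrix, and their ADMM updates and double-projection scheme are not reproduced. Function names are hypothetical.

```python
import math
import random


def bernoulli_projection_matrix(d, k, rng):
    """d x k matrix with i.i.d. entries +1/sqrt(k) or -1/sqrt(k), each with probability 1/2."""
    scale = 1.0 / math.sqrt(k)
    return [[scale if rng.random() < 0.5 else -scale for _ in range(k)] for _ in range(d)]


def project(rows, matrix):
    """Multiply each d-dimensional row by the d x k matrix, giving k-dimensional rows."""
    k = len(matrix[0])
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(k)]
        for row in rows
    ]
```

With this scaling, squared Euclidean distances are preserved in expectation, which is why clustering in the projected space can approximate clustering in the original space at lower cost. Passing an explicit `random.Random(seed)` keeps runs reproducible.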
Affiliation(s)
- Hongni Wang
  - School of Statistics and Mathematics, Shandong University of Finance and Economics, Jinan 250014, China; (H.W.); (N.L.)
- Na Li
  - School of Statistics and Mathematics, Shandong University of Finance and Economics, Jinan 250014, China; (H.W.); (N.L.)
- Yanqiu Zhou
  - School of Science, Guangxi University of Science and Technology, Liuzhou 545006, China
- Jingxin Yan
  - Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
- Bei Jiang
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2G1, Canada
- Linglong Kong
  - Department of Mathematical and Statistical Sciences, University of Alberta, Edmonton, AB T6G 2G1, Canada
- Xiaodong Yan
  - Zhongtai Securities Institute for Financial Studies, Shandong University, Jinan 250100, China
18
Jiang S, Gai X, Treggiari MM, Stead WW, Zhao Y, Page CD, Zhang AR. Soft phenotyping for sepsis via EHR time-aware soft clustering. J Biomed Inform 2024; 152:104615. [PMID: 38423266 PMCID: PMC11073833 DOI: 10.1016/j.jbi.2024.104615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/25/2024] [Accepted: 02/20/2024] [Indexed: 03/02/2024]
Abstract
OBJECTIVE Sepsis is one of the most serious hospital conditions associated with high mortality. Sepsis is the result of a dysregulated immune response to infection that can lead to multiple organ dysfunction and death. Due to the wide variability in the causes of sepsis, clinical presentation, and the recovery trajectories, identifying sepsis sub-phenotypes is crucial to advance our understanding of sepsis characterization, to choose targeted treatments and optimal timing of interventions, and to improve prognostication. Prior studies have described different sub-phenotypes of sepsis using organ-specific characteristics. These studies applied clustering algorithms to electronic health records (EHRs) to identify disease sub-phenotypes. However, prior approaches did not capture temporal information and made uncertain assumptions about the relationships among the sub-phenotypes for clustering procedures. METHODS We developed a time-aware soft clustering algorithm guided by clinical variables to identify sepsis sub-phenotypes using data available in the EHR. RESULTS We identified six novel sepsis hybrid sub-phenotypes and evaluated them for medical plausibility. In addition, we built an early-warning sepsis prediction model using logistic regression. CONCLUSION Our results suggest that these novel sepsis hybrid sub-phenotypes are promising to provide more accurate information on sepsis-related organ dysfunction and sepsis recovery trajectories which can be important to inform management decisions and sepsis prognosis.
Affiliation(s)
- Shiyi Jiang
  - Department of Electrical & Computer Engineering, Duke University, Durham, 27708, NC, USA
- Xin Gai
  - Department of Statistical Science, Duke University, Durham, 27708, NC, USA
- William W Stead
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, 37235, TN, USA
- Yuankang Zhao
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, 27708, NC, USA
- C David Page
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, 27708, NC, USA
- Anru R Zhang
  - Department of Biostatistics & Bioinformatics, Duke University, Durham, 27708, NC, USA; Department of Computer Science, Duke University, Durham, 27708, NC, USA
19
Zhou J, Dong J, Hou H, Huang L, Li J. High-throughput microfluidic systems accelerated by artificial intelligence for biomedical applications. Lab Chip 2024; 24:1307-1326. [PMID: 38247405 DOI: 10.1039/d3lc01012k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/23/2024]
Abstract
High-throughput microfluidic systems are widely used in biomedical fields for tasks like disease detection, drug testing, and material discovery. Despite the great advances in automation and throughput, the large amounts of data generated by high-throughput microfluidic systems generally outpace the abilities of manual analysis. Recently, the convergence of microfluidic systems and artificial intelligence (AI) has shown promise in solving this issue by significantly accelerating data analysis and improving the capability for intelligent decision-making. This review offers a comprehensive introduction to AI methods and outlines the current advances of high-throughput microfluidic systems accelerated by AI, covering biomedical detection, drug screening, and automated system control and design. Furthermore, the challenges and opportunities in this field are critically discussed.
Affiliation(s)
- Jianhua Zhou
  - School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
  - Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Shenzhen 518107, China
- Jianpei Dong
  - School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
  - Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Shenzhen 518107, China
- Hongwei Hou
  - Beijing Life Science Academy, Beijing 102209, China
- Lu Huang
  - School of Biomedical Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518107, China
  - Key Laboratory of Sensing Technology and Biomedical Instruments of Guangdong Province, School of Biomedical Engineering, Sun Yat-sen University, Shenzhen 518107, China
- Jinghong Li
  - Department of Chemistry, Center for BioAnalytical Chemistry, Key Laboratory of Bioorganic Phosphorus Chemistry & Chemical Biology, Tsinghua University, Beijing 100084, China
  - New Cornerstone Science Laboratory, Shenzhen 518054, China
  - Beijing Life Science Academy, Beijing 102209, China
  - Center for BioAnalytical Chemistry, Hefei National Laboratory of Physical Science at Microscale, University of Science and Technology of China, Hefei 230026, China
20
Kauffmann J, Esders M, Ruff L, Montavon G, Samek W, Muller KR. From Clustering to Cluster Explanations via Neural Networks. IEEE Trans Neural Netw Learn Syst 2024; 35:1926-1940. [PMID: 35797317 DOI: 10.1109/tnnls.2022.3185901] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 06/15/2023]
Abstract
A recent trend in machine learning has been to enrich learned models with the ability to explain their own predictions. The emerging field of explainable AI (XAI) has so far mainly focused on supervised learning, in particular, deep neural network classifiers. In many practical problems, however, the label information is not given and the goal is instead to discover the underlying structure of the data, for example, its clusters. While powerful methods exist for extracting the cluster structure in data, they typically do not answer the question of why a certain data point has been assigned to a given cluster. We propose a new framework that can, for the first time, explain cluster assignments in terms of input features in an efficient and reliable manner. It is based on the novel insight that clustering models can be rewritten as neural networks, or "neuralized." Cluster predictions of the obtained networks can then be quickly and accurately attributed to the input features. Several showcases demonstrate the ability of our method to assess the quality of learned clusters and to extract novel insights from the analyzed data and representations.
21
Senthilnath J, Nagaraj G, Sumanth Simha C, Kulkarni S, Thapa M, Indiramma M, Benediktsson JA. DRBM-ClustNet: A Deep Restricted Boltzmann-Kohonen Architecture for Data Clustering. IEEE Trans Neural Netw Learn Syst 2024; 35:2560-2574. [PMID: 35857728 DOI: 10.1109/tnnls.2022.3190439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
A Bayesian deep restricted Boltzmann-Kohonen architecture for data clustering termed deep restricted Boltzmann machine (DRBM)-ClustNet is proposed. This core-clustering engine consists of a DRBM for processing unlabeled data by creating new features that are uncorrelated and have large variance with each other. Next, the number of clusters is predicted using the Bayesian information criterion (BIC), followed by a Kohonen network (KN)-based clustering layer. The processing of unlabeled data is done in three stages for efficient clustering of the nonlinearly separable datasets. In the first stage, DRBM performs nonlinear feature extraction by capturing the highly complex data representation by projecting the feature vectors of d dimensions into n dimensions. Most clustering algorithms require the number of clusters to be decided a priori; hence, here, to automate the number of clusters in the second stage, we use BIC. In the third stage, the number of clusters derived from BIC forms the input for the KN, which performs clustering of the feature-extracted data obtained from the DRBM. This method overcomes the general disadvantages of clustering algorithms, such as the prior specification of the number of clusters, convergence to local optima, and poor clustering accuracy on nonlinear datasets. In this research, we use two synthetic datasets, 15 benchmark datasets from the UCI Machine Learning repository, and four image datasets to analyze the DRBM-ClustNet. The proposed framework is evaluated based on clustering accuracy and ranked against other state-of-the-art clustering methods. The obtained results demonstrate that the DRBM-ClustNet outperforms state-of-the-art clustering algorithms.
22
Li L, Wang S, Liu X, Zhu E, Shen L, Li K, Li K. Local Sample-Weighted Multiple Kernel Clustering With Consensus Discriminative Graph. IEEE Trans Neural Netw Learn Syst 2024; 35:1721-1734. [PMID: 35839203 DOI: 10.1109/tnnls.2022.3184970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels. Constructing precise and local kernel matrices is proven to be of vital significance in applications since unreliable similarity estimation between distant samples would degrade clustering performance. Although existing localized MKC algorithms exhibit improved performance compared with globally designed competitors, most of them adopt the KNN mechanism to localize the kernel matrix by accounting for τ-nearest neighbors. However, such a coarse manner rests on the unreasonable assumption that all neighbors are equally important, which is impractical in applications. To alleviate such problems, this article proposes a novel local sample-weighted MKC (LSWMKC) model. We first construct a consensus discriminative affinity graph in kernel space, revealing the latent local structures. Furthermore, an optimal neighborhood kernel for the learned affinity graph is output with naturally sparse property and clear block diagonal structure. Moreover, LSWMKC implicitly optimizes adaptive weights on different neighbors with corresponding samples. Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algorithms. The source code of LSWMKC can be publicly accessed from https://github.com/liliangnudt/LSWMKC.
23
Wang H, Torquato S. Designer pair statistics of disordered many-particle systems with novel properties. J Chem Phys 2024; 160:044911. [PMID: 38294317 DOI: 10.1063/5.0189769] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 01/02/2024] [Indexed: 02/01/2024] Open
Abstract
The knowledge of exact analytical functional forms for the pair correlation function g2(r) and its corresponding structure factor S(k) of disordered many-particle systems is limited. For fundamental and practical reasons, it is highly desirable to add to the existing database of analytical functional forms for such pair statistics. Here, we design a plethora of such pair functions in direct and Fourier spaces across the first three Euclidean space dimensions that are realizable by diverse many-particle systems with varying degrees of correlated disorder across length scales, spanning a wide spectrum of hyperuniform, typical nonhyperuniform, and antihyperuniform ones. This is accomplished by utilizing an efficient inverse algorithm that determines equilibrium states with up to pair interactions at positive temperatures that precisely match targeted forms for both g2(r) and S(k). Among other results, we realize an example with the strongest hyperuniform property among known positive-temperature equilibrium states, critical-point systems (implying unusual 1D systems with phase transitions) that are not in the Ising universality class, systems that attain self-similar pair statistics under Fourier transformation, and an experimentally feasible polymer model. We show that our pair functions enable one to achieve many-particle systems with a wide range of translational order and self-diffusion coefficients D, which are inversely related to one another. One can design other realizable pair statistics via linear combinations of our functions or by applying our inverse procedure to other desirable functional forms. Our approach facilitates the inverse design of materials with desirable physical and chemical properties by tuning their pair statistics.
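As a reminder of the notation, the two pair statistics named in the abstract are related by a Fourier transform. The relation below is standard background rather than a result of the paper; ρ denotes the number density, d the space dimension, and h(r) the total correlation function.

```latex
h(\mathbf{r}) = g_2(\mathbf{r}) - 1, \qquad
S(\mathbf{k}) = 1 + \rho\,\tilde{h}(\mathbf{k}), \qquad
\tilde{h}(\mathbf{k}) = \int_{\mathbb{R}^d} h(\mathbf{r})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d\mathbf{r}
% hyperuniform:      S(k) -> 0        as |k| -> 0
% antihyperuniform:  S(k) -> infinity as |k| -> 0
```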
Affiliation(s)
- Haina Wang
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, USA
- Salvatore Torquato
- Department of Chemistry, Department of Physics, Princeton Materials Institute, and Program in Applied and Computational Mathematics, Princeton University, Princeton, New Jersey 08544, USA
- School of Natural Sciences, Institute for Advanced Study, 1 Einstein Drive, Princeton, New Jersey 08540, USA
24
Han W, Zhang S, Gao H, Bu D. Clustering on hierarchical heterogeneous data with prior pairwise relationships. BMC Bioinformatics 2024; 25:40. [PMID: 38262930 PMCID: PMC10807103 DOI: 10.1186/s12859-024-05652-6] [Received: 09/14/2023] [Accepted: 01/12/2024] [Indexed: 01/25/2024] Open
Abstract
BACKGROUND Clustering is a fundamental problem in statistics and has broad applications in various areas. Traditional clustering methods treat features equally and ignore the potential structure brought by the characteristic difference of features. Especially in cancer diagnosis and treatment, several types of biological features are collected and analyzed together. Treating these features equally fails to identify the heterogeneity of both data structure and cancer itself, which leads to incompleteness and inefficacy of current anti-cancer therapies. OBJECTIVES In this paper, we propose a clustering framework based on hierarchical heterogeneous data with prior pairwise relationships. The proposed clustering method fully characterizes the difference of features and identifies potential hierarchical structure by rough and refined clusters. RESULTS The refined clustering further divides the clusters obtained by the rough clustering into different subtypes. Thus it provides a deeper insight into cancer that cannot be obtained with existing clustering methods. The proposed method is also flexible with prior information: additional pairwise relationships of samples can be incorporated to improve clustering performance. Finally, well-grounded statistical consistency properties of our proposed method are rigorously established, including the accurate estimation of parameters and determination of clustering structures. CONCLUSIONS Our proposed method achieves better clustering performance than other methods in simulation studies, and the clustering accuracy increases when prior information is incorporated. Meaningful biological findings are obtained in the analysis of lung adenocarcinoma with clinical imaging data and omics data, showing that the hierarchical structure produced by rough and refined clustering is necessary and reasonable.
Affiliation(s)
- Wei Han
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
- Sanguo Zhang
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
- Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing, China
- Hailong Gao
- School of Mathematics and Statistics, Qingdao University, Qingdao, China
- Deliang Bu
- School of Statistics, Capital University of Economics and Business, Beijing, China.
25
Chen P, Zhang S, Zhao K, Kang X, Rittman T, Liu Y. Robustly uncovering the heterogeneity of neurodegenerative disease by using data-driven subtyping in neuroimaging: A review. Brain Res 2024; 1823:148675. [PMID: 37979603 DOI: 10.1016/j.brainres.2023.148675] [Received: 08/02/2023] [Revised: 10/19/2023] [Accepted: 11/07/2023] [Indexed: 11/20/2023]
Abstract
Neurodegenerative diseases are associated with heterogeneity in genetics, pathology, and clinical manifestation. Understanding this heterogeneity is particularly relevant for clinical prognosis and stratifying patients for disease modifying treatments. Recently, data-driven methods based on neuroimaging have been applied to investigate the subtyping of neurodegenerative disease, helping to disentangle this heterogeneity. We reviewed brain-based subtyping studies in aging and representative neurodegenerative diseases, including Alzheimer's disease, mild cognitive impairment, frontotemporal dementia, and Lewy body dementia, from January 2000 to November 2022. We summarized clustering methods, validation, robustness, reproducibility, and clinical relevance of 71 eligible studies in the present study. We found vast variations in approaches between studies, including ten neuroimaging modalities, 24 cluster algorithms, and 41 methods of cluster number determination. The clinical relevance of subtyping studies was evaluated by summarizing the analysis method of clinical measurements, showing a relatively low clinical utility in the current studies. Finally, we conclude that future studies of heterogeneity in neurodegenerative disease should focus on validation, comparison between subtyping approaches, and prioritise clinical utility.
Affiliation(s)
- Pindong Chen
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; Department of Clinical Neurosciences, University of Cambridge, Cambridge, Cambridgeshire, UK
- Shirui Zhang
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
- Kun Zhao
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
- Xiaopeng Kang
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
- Timothy Rittman
- Department of Clinical Neurosciences, University of Cambridge, Cambridge, Cambridgeshire, UK
- Yong Liu
- Brainnetome Center, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China.
26
Xu Q, Zan H, Ji S. A lightweight mixup-based short texts clustering for contrastive learning. Front Comput Neurosci 2024; 17:1334748. [PMID: 38348466 PMCID: PMC10860753 DOI: 10.3389/fncom.2023.1334748] [Received: 11/07/2023] [Accepted: 12/18/2023] [Indexed: 02/15/2024] Open
Abstract
Traditional distance-based text clustering struggles to distinguish between overlapping representations in medical data. By incorporating contrastive learning, the feature space can be optimized, and mixup can be applied implicitly during the data augmentation phase to reduce the computational burden. Medical case text is prevalent in everyday life, and clustering is a fundamental method of identifying major categories of conditions within vast amounts of unlabeled text. Learning meaningful clustering scores in data relating to rare diseases is difficult due to their unique sparsity. To address this issue, we propose a contrastive clustering method based on mixup, which involves selecting a small batch of data to simulate the experimental environment of rare diseases. The contrastive learning module optimizes the feature space based on the fact that positive pairs share negative samples, and clustering is employed to group data with comparable semantic features. The module mitigates the issue of overlap in data, whilst mixup generates cost-effective virtual features, resulting in superior experimental scores even when using small-batch data and reducing resource usage and time overhead. The proposed technique achieves state-of-the-art results and is a promising strategy for unsupervised text clustering.
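For orientation, mixup as commonly defined builds a virtual sample as a convex combination of two real ones, with the mixing coefficient drawn from a Beta distribution. The sketch below applies this to feature vectors. The abstract does not say how the paper applies mixup "implicitly", so the function name `mixup` and the value `alpha=0.4` are assumptions made for illustration only.

```python
import random


def mixup(x_i, x_j, alpha=0.4, rng=random):
    """Return a virtual feature vector lam * x_i + (1 - lam) * x_j and the drawn lam."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    return mixed, lam


# Mixing two one-hot-like feature vectors with a seeded generator for repeatability.
rng = random.Random(0)
mixed, lam = mixup([1.0, 0.0], [0.0, 1.0], rng=rng)
```

Because the virtual feature is computed from two existing vectors, it costs one weighted sum rather than a second pass through an augmentation pipeline.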
Affiliation(s)
- ShengWei Ji
- School of Artificial Intelligence and Big Data, Hefei University, Hefei, Anhui, China
27
Salmi Y, Bogucka H. Poisoning Attacks against Communication and Computing Task Classification and Detection Techniques. Sensors (Basel) 2024; 24:338. [PMID: 38257431 PMCID: PMC11154489 DOI: 10.3390/s24020338] [Received: 11/10/2023] [Revised: 12/20/2023] [Accepted: 12/30/2023] [Indexed: 01/24/2024]
Abstract
Machine learning-based classification algorithms allow communication and computing (2C) task offloading from the end devices to the edge computing network servers. In this paper, we consider task classification based on the hybrid k-means and k'-nearest neighbors algorithms. Moreover, we examine the poisoning attacks on such ML algorithms, namely noise-like jamming and targeted data feature falsification, and their impact on the effectiveness of 2C task allocation. Then, we also present two anomaly detection methods using noise training and the silhouette score test to detect the poisoned samples and mitigate their impact. Our simulation results show that these attacks have a fatal effect on classification in feature areas where the decision boundary is unclear. They also demonstrate the effectiveness of our countermeasures against the considered attacks.
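A silhouette-based check of the kind mentioned above could look roughly like the sketch below. It computes the standard per-sample silhouette coefficient and flags samples whose score is negative, meaning they sit closer to another cluster than to their own. The zero threshold, the toy data, and the function name `silhouette_samples` are assumptions; the paper's actual test may differ.

```python
import math


def silhouette_samples(points, labels):
    """Return the silhouette coefficient s = (b - a) / max(a, b) for every sample."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from sample i to all other samples, grouped by cluster label.
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster: score set to 0
            continue
        a = sum(own) / len(own)                                 # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b))
    return scores


# Two tight groups plus one sample labeled 0 that lies next to cluster 1.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (9, 9)]
labels = [0, 0, 0, 1, 1, 1, 0]
scores = silhouette_samples(points, labels)
flagged = [i for i, s in enumerate(scores) if s < 0]
```

Here only the last sample is flagged, since its label disagrees with its position in feature space.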
28
Pifferi M, Boner AL, Cangiotti A, Cudazzo A, Maj D, Gracci S, Michelucci A, Bertini V, Piazza M, Valetto A, Caligo MA, Peroni D, Bush A. The genetic framework of primary ciliary dyskinesia assessed by soft computing analysis. Pediatr Pulmonol 2024. [PMID: 38169302 DOI: 10.1002/ppul.26842] [Received: 08/25/2023] [Revised: 11/12/2023] [Accepted: 12/17/2023] [Indexed: 01/05/2024]
Abstract
BACKGROUND International guidelines disagree on how best to diagnose primary ciliary dyskinesia (PCD), not least because many tests rely on pattern recognition. We hypothesized that quantitative distribution of ciliary ultrastructural and motion abnormalities would detect most frequent PCD-causing groups of genes by soft computing analysis. METHODS Archived data on transmission electron microscopy and high-speed video analysis from 212 PCD patients were re-examined to quantitate distribution of ultrastructural (10 parameters) and functional ciliary features (4 beat pattern and 2 frequency parameters). The correlation between ultrastructural and motion features was evaluated by blinded clustering analysis of the first two principal components, obtained from ultrastructural variables for each patient. Soft computing was applied to ultrastructure to predict ciliary beat frequency (CBF) and motion patterns by a regression model. Another model classified the patients into the five most frequent PCD-causing gene groups, from their ultrastructure, CBF and beat patterns. RESULTS The patients were subdivided into six clusters with similar values to homologous ultrastructural phenotype, motion patterns, and CBF, except for clusters 1 and 4, attributable to normal ultrastructure. The regression model confirmed the ability to predict functional ciliary features from ultrastructural parameters. The genetic classification model identified most of the different groups of genes, starting from all quantitative parameters. CONCLUSIONS Applying soft computing methodologies to PCD diagnostic tests optimizes their value by moving from pattern recognition to quantification. The approach may also be useful to evaluate atypical PCD, and novel genetic abnormalities of unclear disease-producing potential in the future.
Affiliation(s)
- Massimo Pifferi
- Department of Pediatrics, University Hospital of Pisa, Pisa, Italy
- Attilio L Boner
- Pediatric Unit, Department of Surgical Science, Dentistry, Gynecology and Pediatrics, Verona University Medical School, Verona, Italy
- Angela Cangiotti
- Electron Microscopy Unit, Department of Experimental and Clinical Medicine, University Hospital of Ancona, Ancona, Italy
- Debora Maj
- Department of Pediatrics, University Hospital of Pisa, Pisa, Italy
- Serena Gracci
- Department of Pediatrics, University Hospital of Pisa, Pisa, Italy
- Angela Michelucci
- Unit of Molecular Genetics, Department of Laboratory Medicine, University Hospital of Pisa, Pisa, Italy
- Veronica Bertini
- Section of Cytogenetics, Department of Laboratory Medicine, University Hospital of Pisa, Pisa, Italy
- Michele Piazza
- Pediatric Unit, Department of Surgical Science, Dentistry, Gynecology and Pediatrics, Verona University Medical School, Verona, Italy
- Angelo Valetto
- Section of Cytogenetics, Department of Laboratory Medicine, University Hospital of Pisa, Pisa, Italy
- Maria Adelaide Caligo
- Unit of Molecular Genetics, Department of Laboratory Medicine, University Hospital of Pisa, Pisa, Italy
- Diego Peroni
- Department of Pediatrics, University Hospital of Pisa, Pisa, Italy
- Andrew Bush
- Department of Paediatric Respiratory Medicine, Imperial College and Royal Brompton Hospital, London, UK
29
Zhang X, Zhang H, Wang Z, Ma X, Luo J, Zhu Y. PWSC: a novel clustering method based on polynomial weight-adjusted sparse clustering for sparse biomedical data and its application in cancer subtyping. BMC Bioinformatics 2023; 24:490. [PMID: 38129803 PMCID: PMC10740247 DOI: 10.1186/s12859-023-05595-4] [Received: 06/07/2023] [Accepted: 12/04/2023] [Indexed: 12/23/2023] Open
Abstract
BACKGROUND Clustering analysis is widely used to interpret biomedical data and uncover new knowledge and patterns. However, conventional clustering methods are not effective when dealing with sparse biomedical data. To overcome this limitation, we propose a hierarchical clustering method called polynomial weight-adjusted sparse clustering (PWSC). RESULTS The PWSC algorithm adjusts feature weights using a polynomial function, redefines the distances between samples, and performs hierarchical clustering analysis based on these adjusted distances. Additionally, we incorporate a consensus clustering approach to determine the optimal number of clusters. This consensus approach utilizes the relative change in the cumulative distribution function to identify the best number of clusters, resulting in more stable clustering results. Leveraging the PWSC algorithm, we successfully classified a cohort of gastric cancer patients, enabling categorization of patients carrying different types of altered genes. Further evaluation using entropy showed a significant improvement (p = 2.905e-05), while the Calinski-Harabasz index (CHI) demonstrated a 100% improvement in the quality of the best classification compared to conventional algorithms. Similarly, significantly increased entropy (p = 0.0336) and a comparable CHI were observed when classifying another colorectal cancer cohort with microbial abundance. These attempts in cancer subtyping demonstrate that PWSC is highly applicable to different types of biomedical data. To facilitate its application, we have developed a user-friendly tool that implements the PWSC algorithm, which can be accessed at http://pwsc.aiyimed.com/. CONCLUSIONS PWSC addresses the limitations of conventional approaches when clustering sparse biomedical data. By adjusting feature weights and employing consensus clustering, we achieve improved clustering results compared to conventional methods. The PWSC algorithm provides a valuable tool for researchers in the field, enabling more accurate and stable clustering analysis. Its application can enhance our understanding of complex biological systems and contribute to advancements in various biomedical disciplines.
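The consensus step can be sketched generically as follows: repeated clustering runs are turned into pairwise co-clustering frequencies, the area under their empirical CDF is computed for each candidate number of clusters, and the relative change in that area between consecutive candidates is inspected. A common heuristic, assumed here and possibly different from the PWSC rule, picks the value after which the relative change becomes small. All function names and the toy label runs are hypothetical.

```python
from itertools import combinations


def consensus_values(runs):
    """Fraction of runs in which each pair of samples lands in the same cluster."""
    n = len(runs[0])
    return [sum(r[i] == r[j] for r in runs) / len(runs)
            for i, j in combinations(range(n), 2)]


def cdf_area(values, grid=100):
    """Area under the empirical CDF of the consensus values over [0, 1]."""
    m = len(values)
    cdf = [sum(v <= t / grid for v in values) / m for t in range(grid + 1)]
    # Trapezoidal rule on an evenly spaced grid.
    return sum((cdf[t] + cdf[t + 1]) / 2 for t in range(grid)) / grid


def relative_area_change(areas):
    """Relative change in CDF area between consecutive candidate cluster numbers."""
    ks = sorted(areas)
    return {k2: (areas[k2] - areas[k1]) / areas[k1] for k1, k2 in zip(ks, ks[1:])}


# Toy label runs (one list per resampled clustering) for k = 2 and k = 3.
runs_by_k = {
    2: [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]],
    3: [[0, 0, 1, 2], [0, 1, 2, 2], [0, 0, 1, 2]],
}
areas = {k: cdf_area(consensus_values(runs)) for k, runs in runs_by_k.items()}
delta = relative_area_change(areas)
```

In this toy case the k = 2 runs agree perfectly, so every consensus value is 0 or 1, while the k = 3 runs disagree on some pairs and produce intermediate values.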
Affiliation(s)
- Xiaomeng Zhang
- Department of Nephrology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei Province, China
- Hongtao Zhang
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430070, Hubei Province, China
- Zhihao Wang
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430070, Hubei Province, China
- Xiaofei Ma
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430070, Hubei Province, China
- Jiancheng Luo
- School of Mathematics and Statistics, Wuhan University, Wuhan, 430070, Hubei Province, China.
- Yingying Zhu
- Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei Province, China.
30
Brito da Silva LE, Rayapati N, Wunsch DC. iCVI-ARTMAP: Using Incremental Cluster Validity Indices and Adaptive Resonance Theory Reset Mechanism to Accelerate Validation and Achieve Multiprototype Unsupervised Representations. IEEE Trans Neural Netw Learn Syst 2023; 34:9757-9770. [PMID: 35353707 DOI: 10.1109/tnnls.2022.3160381] [Indexed: 06/14/2023]
Abstract
This article presents an adaptive resonance theory predictive mapping (ARTMAP) model, which uses incremental cluster validity indices (iCVIs) to perform unsupervised learning, namely, iCVI-ARTMAP. Incorporating iCVIs to the decision-making and many-to-one mapping capabilities of this adaptive resonance theory (ART)-based model can improve the choices of clusters to which samples are incrementally assigned. These improvements are accomplished by intelligently performing the operations of swapping sample assignments between clusters, splitting and merging clusters, and caching the values of variables when iCVI values need to be recomputed. Using recursive formulations enables iCVI-ARTMAP to considerably reduce the computational burden associated with cluster validity index (CVI)-based offline clustering. In this work, six iCVI-ARTMAP variants were realized via the integration of one information-theoretic and five sum-of-squares-based iCVIs into fuzzy ARTMAP. With proper choice of iCVI, iCVI-ARTMAP either outperformed or performed comparably to three ART-based and four non-ART-based clustering algorithms in experiments using benchmark datasets of different natures. Naturally, the performance of iCVI-ARTMAP is subject to the selected iCVI and its suitability to the data at hand; fortunately, it is a general model in which other iCVIs can be easily embedded.
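The benefit of recursive formulations can be illustrated with a generic incremental update of a cluster's centroid and within-cluster sum of squares when one sample is added (a Welford-style update). This is only a sketch of the general idea behind sum-of-squares-based incremental indices; the class name `ClusterStats` is hypothetical, and the specific iCVIs used by iCVI-ARTMAP are not reproduced here.

```python
class ClusterStats:
    """Running centroid and within-cluster sum of squares for one cluster."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.sse = 0.0

    def add(self, x):
        """Fold one sample into the statistics without revisiting earlier samples."""
        self.n += 1
        delta = [xi - mi for xi, mi in zip(x, self.mean)]            # residual vs. old mean
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta)]
        # Welford update: old-mean residual times new-mean residual, summed over dimensions.
        self.sse += sum(di * (xi - mi) for di, xi, mi in zip(delta, x, self.mean))


stats = ClusterStats(dim=2)
for sample in [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]:
    stats.add(sample)
```

Each update touches only the running statistics, so its cost depends on the feature dimension rather than on the number of samples already assigned to the cluster.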
31
Lin G, Zhang Z, Long K, Zhang Y, Lu Y, Geng J, Zhou Z, Feng Q, Lu L, Cao L. GCLR: A self-supervised representation learning pretext task for glomerular filtration barrier segmentation in TEM images. Artif Intell Med 2023; 146:102720. [PMID: 38042604 DOI: 10.1016/j.artmed.2023.102720] [Received: 12/11/2022] [Revised: 10/04/2023] [Accepted: 11/14/2023] [Indexed: 12/04/2023]
Abstract
Automatic segmentation of the three substructures of glomerular filtration barrier (GFB) in transmission electron microscopy (TEM) images holds immense potential for aiding pathologists in renal disease diagnosis. However, the labor-intensive nature of manual annotations limits the training data for a fully-supervised deep learning model. Addressing this, our study harnesses self-supervised representation learning (SSRL) to utilize vast unlabeled data and mitigate annotation scarcity. Our innovation, GCLR, is a hybrid pixel-level pretext task tailored for GFB segmentation, integrating two subtasks: global clustering (GC) and local restoration (LR). GC captures the overall GFB by learning global context representations, while LR refines three substructures by learning local detail representations. Experiments on 18,928 unlabeled glomerular TEM images for self-supervised pre-training and 311 labeled ones for fine-tuning demonstrate that our proposed GCLR obtains the state-of-the-art segmentation results for all three substructures of GFB with the Dice similarity coefficient of 86.56 ± 0.16%, 75.56 ± 0.36%, and 79.41 ± 0.16%, respectively, compared with other representative self-supervised pretext tasks. Our proposed GCLR also outperforms the fully-supervised pre-training methods based on the three large-scale public datasets - MitoEM, COCO, and ImageNet - with less training data and time.
Affiliation(s)
- Guoyu Lin
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Zhentai Zhang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Kaixing Long
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Yiwen Zhang
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Yanmeng Lu
- Central Laboratory, Southern Medical University, Guangzhou, 510515, China
- Jian Geng
- Department of Pathology, School of Basic Medical Sciences, Southern Medical University, Guangzhou, 510515, China; Guangzhou Huayin Medical Laboratory Center, Guangzhou, 510515, China
- Zhitao Zhou
- Central Laboratory, Southern Medical University, Guangzhou, 510515, China
- Qianjin Feng
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China
- Lijun Lu
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China.
- Lei Cao
- School of Biomedical Engineering, Southern Medical University, Guangzhou, 510515, China; Guangdong Provincial Key Laboratory of Medical Image Processing, Southern Medical University, Guangzhou, 510515, China; Guangdong Province Engineering Laboratory for Medical Imaging and Diagnostic Technology, Southern Medical University, Guangzhou, 510515, China.
32
Mishra BK, Mohanty SN, Baidyanath RR, Ali S, Abduvalieva D, Awwad FA, Ismail EAA, Gupta M. An efficient framework for obtaining the initial cluster centers. Sci Rep 2023; 13:20821. [PMID: 38012340 PMCID: PMC10682192 DOI: 10.1038/s41598-023-48220-3] [Received: 09/11/2023] [Accepted: 11/23/2023] [Indexed: 11/29/2023] Open
Abstract
Clustering is an important tool for data mining since it can determine key patterns without any prior supervisory information. The initial selection of cluster centers plays a key role in the ultimate effect of clustering. Researchers often select the initial centers at random in order to obtain them quickly and speed up their models. However, by doing this they sacrifice the true essence of subgroup formation and on numerous occasions end up with poor clustering. For this reason we were inclined towards suggesting a qualitative approach for obtaining the initial cluster centers and also focused on attaining well-separated clusters. Our initial contributions were an alteration to the classical K-means algorithm in an attempt to obtain near-optimal cluster centers. We previously suggested several approaches, namely far efficient K-means (FEKM), modified center K-means (MCKM) and modified FEKM using Quickhull (MFQ), which produced the factual centers leading to excellent cluster formation. K-means, which randomly selects the centers, seems to converge slightly earlier than these methods, which is the latter's only weakness. We continued this line of study to reduce the computational cost of our methods and came up with farthest leap center selection (FLCS). All these methods were thoroughly analyzed by considering clustering effectiveness, correctness, homogeneity, completeness, complexity and actual execution time to convergence. For effectiveness, performance indices such as Dunn's Index, the Davies-Bouldin Index, and the silhouette coefficient were used; for correctness, the Rand measure was used; and for homogeneity and completeness, the V-measure was used. Experimental results on versatile real-world datasets, taken from the UCI repository, suggested that both FEKM and FLCS obtain well-separated centers while the latter converges earlier.
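As a point of comparison with random initialization, the sketch below shows farthest-first traversal, a generic deterministic heuristic that spreads initial centers apart. It is not FEKM, MCKM, MFQ, or FLCS, whose details the abstract does not give. The function name `farthest_first_centers`, the choice of starting point, and the toy data are assumptions.

```python
import math


def farthest_first_centers(points, k):
    """Pick k initial centers deterministically by farthest-first traversal."""
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # Assumed starting rule: begin with the point farthest from the data mean.
    centers = [max(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        # Next center: the point whose nearest already-chosen center is farthest away.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (5, 0), (5, 1)]
centers = farthest_first_centers(points, k=3)
```

Each new center requires a pass over all points, so the selection costs more than a random draw. That extra cost is the kind of trade-off between center quality and speed that the abstract discusses.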
Affiliation(s)
- B K Mishra
- Silicon Institute of Technology, Bhubaneswar, Odisha, 751024, India
- Sachi Nandan Mohanty
- School of Computer Science & Engineering (SCOPE), VIT-AP University, Vijayawada, Andhra Pradesh, 522237, India
- R R Baidyanath
- Silicon Institute of Technology, Bhubaneswar, Odisha, 751024, India
- Shahid Ali
- School of Electronics Engineering, Peking University, Beijing, China.
- D Abduvalieva
- Doctor of Philosophy in Pedagogical Sciences, Tashkent State Pedagogical University, Bunyodkor Avenue, 27, 100070, Tashkent, Uzbekistan
- Fuad A Awwad
- Department of Quantitative Analysis, College of Business Administration, King Saud University, P.O. Box 71115, 11587, Riyadh, Saudi Arabia
- Emad A A Ismail
- Department of Quantitative Analysis, College of Business Administration, King Saud University, P.O. Box 71115, 11587, Riyadh, Saudi Arabia
- Manish Gupta
- Division of Research and Technology, Lovely Professional University, Phagwara, India
33
Liang X, Cao L, Chen H, Wang L, Wang Y, Fu L, Tan X, Chen E, Ding Y, Tang J. A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study. Brief Bioinform 2023; 25:bbad497. [PMID: 38168839 PMCID: PMC10782910 DOI: 10.1093/bib/bbad497] [Received: 06/15/2023] [Revised: 10/13/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024] Open
Abstract
Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. The performance of clustering considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variations in cell-type identification based on the generated label sets. Currently, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms including four deep learning-based clustering algorithms and commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices based on 10 different scRNA-seq benchmarks to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across different datasets, suggesting its sensitivity to certain dataset characteristics. Notably, DESC exhibited promising results for cell subtype identification and capturing cellular heterogeneity. In addition, SC3 requires more memory and exhibits slower computation speed compared to other algorithms for the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
Affiliation(s)
- Xiao Liang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lijie Cao
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Hao Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lidan Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Yangyun Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lijuan Fu
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
- Department of Pharmacology, Academician Workstation, Changsha Medical University, Changsha 410219, China
| | - Xiaqin Tan
- The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
| | - Enxiang Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Yubin Ding
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Jing Tang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| |
|
34
|
Jankowski R, Allard A, Boguñá M, Serrano MÁ. The D-Mercator method for the multidimensional hyperbolic embedding of real networks. Nat Commun 2023; 14:7585. [PMID: 37990019 PMCID: PMC10663512 DOI: 10.1038/s41467-023-43337-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/11/2023] [Accepted: 11/07/2023] [Indexed: 11/23/2023] Open
Abstract
One of the pillars of the geometric approach to networks has been the development of model-based mapping tools that embed real networks in their latent geometry. In particular, the tool Mercator embeds networks into the hyperbolic plane. However, some real networks are better described by the multidimensional formulation of the underlying geometric model. Here, we introduce D-Mercator, a model-based embedding method that produces multidimensional maps of real networks into the (D + 1)-hyperbolic space, where the similarity subspace is represented as a D-sphere. We used D-Mercator to produce multidimensional hyperbolic maps of real networks and estimated their intrinsic dimensionality in terms of navigability and community structure. Multidimensional representations of real networks are instrumental in the identification of factors that determine connectivity and in elucidating fundamental issues that hinge on dimensionality, such as the presence of universality in critical behavior.
Affiliation(s)
- Robert Jankowski
- Departament de Física de la Matèria Condensada, Universitat de Barcelona, Martí i Franquès 1, 08028, Barcelona, Spain
- Universitat de Barcelona Institute of Complex Systems (UBICS), Universitat de Barcelona, Barcelona, Spain
| | - Antoine Allard
- Département de Physique, de Génie Physique et d'optique, Université Laval, Québec, Québec, G1V 0A6, Canada
- Centre Interdisciplinaire en Modélisation Mathématique, Université Laval, Québec, Québec, G1V 0A6, Canada
| | - Marián Boguñá
- Departament de Física de la Matèria Condensada, Universitat de Barcelona, Martí i Franquès 1, 08028, Barcelona, Spain
- Universitat de Barcelona Institute of Complex Systems (UBICS), Universitat de Barcelona, Barcelona, Spain
| | - M Ángeles Serrano
- Departament de Física de la Matèria Condensada, Universitat de Barcelona, Martí i Franquès 1, 08028, Barcelona, Spain
- Universitat de Barcelona Institute of Complex Systems (UBICS), Universitat de Barcelona, Barcelona, Spain
- ICREA, Pg. Lluís Companys 23, E-08010, Barcelona, Spain
| |
|
35
|
Wang M, Mei J, Darras KFA, Liu F. VGGish-based detection of biological sound components and their spatio-temporal variations in a subtropical forest in eastern China. PeerJ 2023; 11:e16462. [PMID: 38025750 PMCID: PMC10656901 DOI: 10.7717/peerj.16462] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 10/24/2023] [Indexed: 12/01/2023] Open
Abstract
Passive acoustic monitoring technology is widely used to monitor the diversity of vocal animals, but the question of how to quickly extract effective sound patterns remains a challenge due to the difficulty of distinguishing biological sounds within multiple sound sources in a soundscape. In this study, we address the potential application of the VGGish model, pre-trained on Google's AudioSet dataset, for the extraction of acoustic features, together with an unsupervised clustering method based on the Gaussian mixture model, to identify various sound sources from a soundscape of a subtropical forest in China. The results show that different biotic and abiotic components can be distinguished from various confounding sound sources. Birds and insects were the two primary biophony sound sources, and their sounds displayed distinct temporal patterns across both diurnal and monthly time frames and distinct spatial patterns in the landscape. Using the clustering and modeling method of the general sound feature set, we quickly depicted the soundscape in a subtropical forest ecosystem, which could be used to track dynamic changes in the acoustic environment and provide help for biodiversity and ecological environment monitoring.
Affiliation(s)
- Mei Wang
- Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Jinjuan Mei
- Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
| | - Kevin FA Darras
- Sustainable Agricultural Systems & Engineering Lab, School of Engineering, Westlake University, Hangzhou, China
| | - Fanglin Liu
- Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
| |
|
36
|
Song H, Kim M, Park D, Shin Y, Lee JG. Learning From Noisy Labels With Deep Neural Networks: A Survey. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:8135-8153. [PMID: 35254993 DOI: 10.1109/tnnls.2022.3152527] [Citation(s) in RCA: 45] [Impact Index Per Article: 45.0] [Indexed: 06/14/2023]
Abstract
Deep learning has achieved remarkable success in numerous domains with help from large amounts of big data. However, the quality of data labels is a concern because of the lack of high-quality labels in many real-world scenarios. As noisy labels severely degrade the generalization performance of deep neural networks, learning from noisy labels (robust training) is becoming an important task in modern deep learning applications. In this survey, we first describe the problem of learning with label noise from a supervised learning perspective. Next, we provide a comprehensive review of 62 state-of-the-art robust training methods, all of which are categorized into five groups according to their methodological difference, followed by a systematic comparison of six properties used to evaluate their superiority. Subsequently, we perform an in-depth analysis of noise rate estimation and summarize the typically used evaluation methodology, including public noisy datasets and evaluation metrics. Finally, we present several promising research directions that can serve as a guideline for future studies.
|
37
|
Xu N, Li Q, Zhu W, Li Q, Finkelman RB, Engle MA, Wang R, Wang Z. Advocating the Use of Bayesian Network in Analyzing the Modes of Occurrence of Elements in Coal. ACS Omega 2023; 8:39096-39109. [PMID: 37901523 PMCID: PMC10600927 DOI: 10.1021/acsomega.3c04109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2023] [Accepted: 09/21/2023] [Indexed: 10/31/2023]
Abstract
Modes of occurrence of elements in coal are important because they can be used not only to understand the origin of inorganic components in coal but also to determine their impact on the environment and human health and the deposition process of coal seams. Statistical analysis is a commonly used indirect method for analyzing the modes of occurrence of elements in coal, and hierarchical clustering is widely used for this purpose. However, hierarchical clustering may lead to misleading results because it focuses on clusters of elements rather than on a single element. To tackle this issue, we use the first part of a well-known Bayesian network structure learning algorithm, i.e., the Peter-Clark (PC) algorithm, to explore the relationships in the coal elemental data and then infer modes of occurrence of elements in coal. A data set containing 95 Late Paleozoic coal samples from the Datanhao and Adaohai mines in Inner Mongolia, China, is used for the performance evaluation. Analytical results show that many instructive and surprising insights can be drawn from the first part of the PC algorithm. Compared with the hierarchical clustering algorithm, the first part of the PC algorithm demonstrates superiority in analyzing the modes of occurrence of elements in coal.
Affiliation(s)
- Na Xu
- College of Geoscience and Survey Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
| | - Qiang Li
- College of Geoscience and Survey Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
| | - Wei Zhu
- College of Geoscience and Survey Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
| | - Qing Li
- Department of Computing, Hong Kong Polytechnic University, Hung Hom, Kowloon, HKSAR, Hong Kong, China
| | - Robert B. Finkelman
- College of Geoscience and Survey Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
- University of Texas at Dallas, Richardson, Texas 75080, United States
| | - Mark A. Engle
- Department of Earth, Environmental and Resource Sciences, University of Texas at El Paso, 500 West University Avenue, El Paso, Texas 79968, United States
| | - Ru Wang
- College of Geoscience and Survey Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
| | - Zhiwei Wang
- College of Geoscience and Survey Engineering, China University of Mining and Technology (Beijing), Beijing 100083, China
| |
|
38
|
Saeipour P, Sarbakhsh P, Salemi S, Bakhtari Aghdam F. A Fuzzy Clustering Approach to Identify Pedestrians' Traffic Behavior Patterns. J Res Health Sci 2023; 23:e00592. [PMID: 38315907 PMCID: PMC10660506 DOI: 10.34172/jrhs.2023.127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 09/09/2023] [Accepted: 09/25/2023] [Indexed: 02/07/2024] Open
Abstract
BACKGROUND Pattern recognition of pedestrians' traffic behavior can enhance the management efficiency of interested groups by targeting access to them and facilitating planning via more specific surveys. This study aimed to evaluate pedestrians' traffic behavior patterns with a fuzzy clustering algorithm and to assess the factors related to higher-risk traffic behavior of pedestrians. STUDY DESIGN This study is a secondary methodological study based on the data from a cross-sectional study. METHODS Fuzzy c-means (FCM), a machine learning clustering method, was applied to identify the pattern of traffic behaviors using data collected from 600 pedestrians in Urmia, Iran via the Pedestrian Behavior Questionnaire (PBQ), based on 5 domains of the PBQ. Multiple logistic regression was fitted to identify risk factors of traffic behaviors. RESULTS Results revealed two clusters consisting of lower-risk and higher-risk behaviors. The majority of pedestrians (64.33%) were in the lower-risk cluster. Subjects ≤ 33 years old (odds ratio [OR] = 1.92, P < 0.001), subjects with ≤ 6 years of education (OR = 1.74, P = 0.010), males (OR = 1.90, P = 0.001), unmarried pedestrians (OR = 3.61, P = 0.007), and users of public transportation (OR = 2.01, P = 0.002) were more likely to have higher-risk traffic behavior. CONCLUSION We identified traffic behavior patterns of Urmia pedestrians with lower-risk and higher-risk behaviors via FCM. The findings from this study would be helpful for policymakers to promote safety measures and train pedestrians.
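For readers unfamiliar with fuzzy c-means, the following is a minimal sketch of the standard textbook update rules, written with the Python standard library only. It is not the authors' implementation: the function name, the default fuzzifier m = 2, the stopping tolerance, and the toy two-dimensional scores are all assumptions made for illustration.

```python
import random


def fuzzy_c_means(points, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Return (centers, memberships) for a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])
    centers = []
    for _ in range(n_iter):
        # Centers are means of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum for d in range(dim)]
            )
        # Memberships follow from relative distances to all centers.
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - c[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                for c in centers
            ]
            new_row = [
                1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0)) for j in range(n_clusters))
                for k in range(n_clusters)
            ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


# Hypothetical questionnaire-style scores forming two well-separated groups.
scores = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9), (5.0, 5.1), (4.9, 5.0), (5.1, 4.9)]
centers, memberships = fuzzy_c_means(scores, n_clusters=2)
hard_labels = [max(range(2), key=lambda k: row[k]) for row in memberships]
```

Unlike hard clustering, each subject receives a degree of membership in every cluster; taking the largest membership per row, as in `hard_labels`, recovers a two-group split comparable to the lower-risk and higher-risk clusters reported above.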
Affiliation(s)
- Parisa Saeipour
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Parvin Sarbakhsh
- Department of Statistics and Epidemiology, Faculty of Health, Tabriz University of Medical Sciences, Tabriz, Iran
| | - Saman Salemi
- Department of Medicine, Islamic Azad University Tehran Medical Sciences, Tehran, Iran
| | | |
|
39
|
Gao CX, Dwyer D, Zhu Y, Smith CL, Du L, Filia KM, Bayer J, Menssink JM, Wang T, Bergmeir C, Wood S, Cotton SM. An overview of clustering methods with guidelines for application in mental health research. Psychiatry Res 2023; 327:115265. [PMID: 37348404 DOI: 10.1016/j.psychres.2023.115265] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Revised: 05/20/2023] [Accepted: 05/21/2023] [Indexed: 06/24/2023]
Abstract
Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles are subsequently introduced. How to choose algorithms to address common issues as well as methods for pre-clustering data processing, clustering evaluation and validation are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.
Affiliation(s)
- Caroline X Gao
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia; Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| | - Dominic Dwyer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Ye Zhu
- School of Information Technology, Deakin University, Geelong, VIC, Australia
| | - Catherine L Smith
- Department of Epidemiology and Preventative Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, Australia
| | - Lan Du
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Kate M Filia
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Johanna Bayer
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Jana M Menssink
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Teresa Wang
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia
| | - Christoph Bergmeir
- Faculty of Information Technology, Monash University, Clayton, VIC, Australia; Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
| | - Stephen Wood
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| | - Sue M Cotton
- Centre for Youth Mental Health, The University of Melbourne, Parkville, VIC, Australia; Orygen, Parkville, VIC, Australia
| |
|
40
|
Yang L, Fan W, Bouguila N. Deep Clustering Analysis via Dual Variational Autoencoder With Spherical Latent Embeddings. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:6303-6312. [PMID: 34941534 DOI: 10.1109/tnnls.2021.3135460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
In recent years, clustering methods based on deep generative models have received great attention in various unsupervised applications, due to their capabilities for learning promising latent embeddings from original data. This article proposes a novel clustering method based on variational autoencoder (VAE) with spherical latent embeddings. The merits of our clustering method can be summarized as follows. First, instead of considering the Gaussian mixture model (GMM) as the prior over latent space as in a variety of existing VAE-based deep clustering methods, the von Mises-Fisher mixture model prior is deployed in our method, leading to spherical latent embeddings that can explicitly control the balance between the capacity of decoder and the utilization of latent embedding in a principled way. Second, a dual VAE structure is leveraged to impose the reconstruction constraint for the latent embedding and its corresponding noise counterpart, which embeds the input data into a hyperspherical latent space for clustering. Third, an augmented loss function is proposed to enhance the robustness of our model, which results in a self-supervised manner through the mutual guidance between the original data and the augmented ones. The effectiveness of the proposed deep generative clustering method is validated through comparisons with state-of-the-art deep clustering methods on benchmark datasets. The source code of the proposed model is available at https://github.com/fwt-team/DSVAE.
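For reference, the standard form of the von Mises-Fisher density and of a mixture built from it is given below. The notation (z for a unit-norm latent vector in d dimensions, μ_k for mean directions, κ_k for concentrations, π_k for mixing weights) is an assumption chosen here and is not taken from the article.

```latex
% von Mises-Fisher density on the unit sphere S^{d-1}, with \|z\| = \|\mu\| = 1 and \kappa \ge 0
f(z \mid \mu, \kappa) = C_d(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} z\right),
\qquad
C_d(\kappa) = \frac{\kappa^{d/2 - 1}}{(2\pi)^{d/2}\, I_{d/2 - 1}(\kappa)}

% K-component mixture prior over the latent space
p(z) = \sum_{k=1}^{K} \pi_k\, f(z \mid \mu_k, \kappa_k),
\qquad
\sum_{k=1}^{K} \pi_k = 1
```

Here I_ν denotes the modified Bessel function of the first kind of order ν. A larger κ concentrates mass around the mean direction μ, which is the sense in which such a prior yields spherical latent embeddings.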
|
41
|
Lin Y, Wang Y, Qu H, Xiong Y. Research on stress curve clustering algorithm of Fiber Bragg grating sensor. Sci Rep 2023; 13:11815. [PMID: 37479882 PMCID: PMC10362002 DOI: 10.1038/s41598-023-39058-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2023] [Accepted: 07/19/2023] [Indexed: 07/23/2023] Open
Abstract
The global stress distribution and state parameter analysis of a building's main structure is an urgent problem to be solved in the online state assessment technology of building structure health. In this paper, a stress curve clustering algorithm for fiber Bragg grating stress sensors based on a density clustering algorithm is proposed. To solve the problem of the large dimension and sparse sample space of sensor stress curves, the distance between samples is measured based on an improved cosine similarity. To address the low efficiency and poor results of traditional clustering algorithms, a density clustering algorithm based on mutual nearest neighbors is used. Finally, the classification of the daily stress load characteristics of the sensor is realized, which provides a basis for constructing the mathematical analysis model of building health. The experimental results show that the stress curve clustering method proposed in this paper is better than the latest clustering algorithms such as HDBSCAN, CBKM, K-means++, FINCH and NPIR, and is suitable for the feature classification of stress curves of fiber Bragg grating sensors.
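The two ingredients named in this abstract, a cosine-based distance and grouping by mutual nearest neighbors, can be sketched as follows. This is a simplified illustration, not the authors' algorithm: it uses plain cosine distance rather than their improved similarity, it groups curves by connected components of a mutual k-nearest-neighbor graph rather than by their density procedure, and the function names and toy curves are hypothetical.

```python
from math import sqrt


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mutual_knn_clusters(curves, k=2):
    """Label curves by connected components of the mutual k-nearest-neighbour graph."""
    n = len(curves)
    neighbours = []
    for i in range(n):
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine_distance(curves[i], curves[j]),
        )
        neighbours.append(set(order[:k]))
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in neighbours[i]:
                # Keep an edge only if i and j list each other as neighbours.
                if i in neighbours[j] and labels[j] == -1:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Hypothetical daily stress curves: three rising and three falling profiles.
curves = [
    [1, 2, 3, 4], [2, 4, 6, 8.1], [1, 2.1, 3, 4],
    [4, 3, 2, 1], [8, 6, 4.1, 2], [4, 3, 2.1, 1],
]
labels = mutual_knn_clusters(curves, k=2)
```

Because cosine distance depends on the shape of a curve rather than its magnitude, the scaled curve `[2, 4, 6, 8.1]` lands in the same group as `[1, 2, 3, 4]`.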
Affiliation(s)
- Yisen Lin
- School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, 541004, China
| | - Ye Wang
- School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, 541004, China
| | - Huichen Qu
- School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, 541004, China
| | - Yiwen Xiong
- School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, 541004, China
| |
|
42
|
Adnan M, Slavic G, Martin Gomez D, Marcenaro L, Regazzoni C. Systematic and Comprehensive Review of Clustering and Multi-Target Tracking Techniques for LiDAR Point Clouds in Autonomous Driving Applications. Sensors (Basel, Switzerland) 2023; 23:6119. [PMID: 37447967 DOI: 10.3390/s23136119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/03/2023] [Accepted: 06/27/2023] [Indexed: 07/15/2023]
Abstract
Autonomous vehicles (AVs) rely on advanced sensory systems, such as Light Detection and Ranging (LiDAR), to function seamlessly in intricate and dynamic environments. LiDAR produces highly accurate 3D point clouds, which are vital for the detection, classification, and tracking of multiple targets. A systematic review and classification of various clustering and Multi-Target Tracking (MTT) techniques are necessary due to the inherent challenges posed by LiDAR data, such as density, noise, and varying sampling rates. As part of this study, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was employed to examine the challenges and advancements in MTT techniques and clustering for LiDAR point clouds within the context of autonomous driving. Searches were conducted in major databases such as IEEE Xplore, ScienceDirect, SpringerLink, ACM Digital Library, and Google Scholar, utilizing customized search strategies. We identified and critically reviewed 76 relevant studies based on rigorous screening and evaluation processes, assessing their methodological quality, data handling adequacy, and reporting compliance. As a result of this comprehensive review and classification, we were able to provide a detailed overview of current challenges, research gaps, and advancements in clustering and MTT techniques for LiDAR point clouds, thus contributing to the field of autonomous driving. Researchers and practitioners working in the field of autonomous driving will benefit from this study, which was characterized by transparency and reproducibility on a systematic basis.
Affiliation(s)
- Muhammad Adnan
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
- Departamento de Ingeniería de Sistemas y Automática, Universidad Carlos III de Madrid, Butarque 15, Leganés, 28911 Madrid, Spain
| | - Giulia Slavic
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
- Departamento de Ingeniería de Sistemas y Automática, Universidad Carlos III de Madrid, Butarque 15, Leganés, 28911 Madrid, Spain
| | - David Martin Gomez
- Departamento de Ingeniería de Sistemas y Automática, Universidad Carlos III de Madrid, Butarque 15, Leganés, 28911 Madrid, Spain
| | - Lucio Marcenaro
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
| | - Carlo Regazzoni
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
| |
|
43
|
Pauk J, Daunoraviciene K, Ziziene J, Minta-Bielecka K, Dzieciol-Anikiej Z. Classification of muscle activity patterns in healthy children using biclustering algorithm. Biomed Signal Process Control 2023. [DOI: 10.1016/j.bspc.2023.104731] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]
|
44
|
Soubra R, Mourad-Chehade F, Chkeir A. Automation of the Timed Up and Go Test Using a Doppler Radar System for Gait and Balance Analysis in Elderly People. Journal of Healthcare Engineering 2023; 2023:2016262. [PMID: 37426725 PMCID: PMC10325879 DOI: 10.1155/2023/2016262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2022] [Revised: 02/07/2023] [Accepted: 02/27/2023] [Indexed: 07/11/2023]
Abstract
The timed up and go (TUG) test is a simple, valid, and reliable clinical tool that is widely used to assess mobility in elderly people. Several research studies have been conducted to automate the TUG test using wearable sensors or motion-tracking systems. Despite their promising results, the adopted technological systems present inconveniences in terms of acceptability and privacy protection. In this work, we propose to overcome these problems by using a Doppler radar system set into the backrest of a chair in order to automate the TUG test and extract additional information from its phases (i.e., transfer, walk, and turn). We intend to segment its phases and extract spatiotemporal gait parameters automatically. Our methodology is mainly based on a multiresolution analysis of radar signals. We proposed a segmentation technique based on the extraction of limb oscillation signals through a semisupervised machine learning approach, on the one hand, and the application of the DARC algorithm, on the other hand. Once the speed signals of the torso and limb oscillations were detected, we suggested estimating 14 gait parameters. All our approaches were validated by comparing outcomes to those obtained from a reference Vicon system. High correlation coefficients were obtained by comparing the speed signals of the torso (ρ = 0.8), the speed signals of limb oscillations (ρ = 0.91), the initial and final indices of TUG phases (ρ = 0.95), and the extracted parameters (percentage error < 4.8) obtained after radar signal processing to those obtained from the Vicon system.
Affiliation(s)
- Racha Soubra
- Laboratory of Computer Science and Digital Society (LIST3N), University of Technology of Troyes, Troyes, France
| | - Farah Mourad-Chehade
- Laboratory of Computer Science and Digital Society (LIST3N), University of Technology of Troyes, Troyes, France
| | - Aly Chkeir
- Laboratory of Computer Science and Digital Society (LIST3N), University of Technology of Troyes, Troyes, France
| |
|
45
|
Vlahek D, Mongus D. An Efficient Iterative Approach to Explainable Feature Learning. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:2606-2618. [PMID: 34478388 DOI: 10.1109/tnnls.2021.3107049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
This article introduces a new iterative approach to explainable feature learning. During each iteration, new features are first generated by applying arithmetic operations on the input set of features. These are then evaluated in terms of probability distribution agreements between values of samples belonging to different classes. Finally, a graph-based approach for feature selection is proposed, which allows for selecting high-quality and uncorrelated features to be used in feature generation during the next iteration. As shown by the results, the proposed method improved the accuracy of all tested classifiers, where the best accuracies were achieved using random forest. In addition, the method turned out to be insensitive to both of the input parameters, while it outperformed the state of the art on nine out of 15 test sets and achieved comparable results on the others. Finally, we demonstrate the explainability of the learned feature representation for knowledge discovery.
|
46
|
Jurdana V, Lopac N, Vrankic M. Sparse Time-Frequency Distribution Reconstruction Using the Adaptive Compressed Sensed Area Optimized with the Multi-Objective Approach. Sensors (Basel, Switzerland) 2023; 23:4148. [PMID: 37112488 PMCID: PMC10143442 DOI: 10.3390/s23084148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 04/17/2023] [Accepted: 04/19/2023] [Indexed: 06/19/2023]
Abstract
Compressive sensing (CS) of the signal ambiguity function (AF) and enforcing the sparsity constraint on the resulting signal time-frequency distribution (TFD) has been shown to be an efficient method for time-frequency signal processing. This paper proposes a method for adaptive CS-AF area selection, which extracts the magnitude-significant AF samples through a clustering approach using the density-based spatial clustering algorithm. Moreover, an appropriate criterion for the performance of the method is formalized, i.e., component concentration and preservation, as well as interference suppression, are measured utilizing the information obtained from the short-term and the narrow-band Rényi entropies, while component connectivity is evaluated using the number of regions with continuously-connected samples. The CS-AF area selection and reconstruction algorithm parameters are optimized using an automatic multi-objective meta-heuristic optimization method, minimizing the here-proposed combination of measures as objective functions. Consistent improvement in CS-AF area selection and TFD reconstruction performance has been achieved without requiring a priori knowledge of the input signal for multiple reconstruction algorithms. This was demonstrated for both noisy synthetic and real-life signals.
Affiliation(s)
- Vedran Jurdana
- Faculty of Engineering, University of Rijeka, 51000 Rijeka, Croatia
| | - Nikola Lopac
- Faculty of Maritime Studies, University of Rijeka, 51000 Rijeka, Croatia
- Center for Artificial Intelligence and Cybersecurity, University of Rijeka, 51000 Rijeka, Croatia
| | - Miroslav Vrankic
- Faculty of Engineering, University of Rijeka, 51000 Rijeka, Croatia
| |
|
47
|
Yang F, Li C, Peng Y, Liu J, Yao Y, Wen J, Yang S. Locating the propagation source in complex networks with observers-based similarity measures and direction-induced search. Soft Comput 2023; 27:1-27. [PMID: 37362267 PMCID: PMC10072820 DOI: 10.1007/s00500-023-08000-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/24/2023] [Indexed: 04/07/2023]
Abstract
Locating the propagation source is one of the most important strategies for controlling harmful diffusion processes on complex networks. Most existing methods consider only the infection time information of the observers and ignore their diffusion direction information, which also helps to locate the source. In this paper, we use both the diffusion direction information and the infection time information to locate the source. We introduce a relaxed direction-induced search (DIS) that uses the diffusion direction information of the observers to approximate the actual diffusion tree on a network. Based on the relaxed DIS, we further use the infection time information of the observers to define two observers-based similarity measures: the Infection Time Similarity and the Infection Time Order Similarity. Combining these two similarity measures with the relaxed DIS, we propose a novel source locating method. We validate the performance of the proposed method on a series of synthetic and real networks. The experimental results show that the proposed method is feasible and effective in accurately locating the propagation source.
Affiliation(s)
- Fan Yang
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, 545006 China
  - Key Laboratory of Intelligent Information Processing and Graph Processing, Guangxi University of Science and Technology, Liuzhou, 545006 China
- Chungui Li
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, 545006 China
- Yong Peng
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, 545006 China
- Jingxian Liu
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, 545006 China
  - Key Laboratory of Intelligent Information Processing and Graph Processing, Guangxi University of Science and Technology, Liuzhou, 545006 China
- Yabing Yao
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou, 730050 China
- Jiayan Wen
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, 545006 China
- Shuhong Yang
  - School of Computer Science and Technology, Guangxi University of Science and Technology, Liuzhou, 545006 China
48
An ensemble algorithm using quantum evolutionary optimization of weighted type-II fuzzy system and staged Pegasos Quantum Support Vector Classifier with multi-criteria decision making system for diagnosis and grading of breast cancer. Soft Comput 2023. [DOI: 10.1007/s00500-023-07939-x] [Indexed: 03/20/2023]
49
Baltoumas FA, Karatzas E, Paez-Espino D, Venetsianou NK, Aplakidou E, Oulas A, Finn RD, Ovchinnikov S, Pafilis E, Kyrpides NC, Pavlopoulos GA. Exploring microbial functional biodiversity at the protein family level-From metagenomic sequence reads to annotated protein clusters. Frontiers in Bioinformatics 2023; 3:1157956. [PMID: 36959975 PMCID: PMC10029925 DOI: 10.3389/fbinf.2023.1157956] [Received: 02/07/2023] [Accepted: 02/21/2023] [Indexed: 03/06/2023] [Open Access]
Abstract
Metagenomics has enabled accessing the genetic repertoire of natural microbial communities. Metagenome shotgun sequencing has become the method of choice for studying and classifying microorganisms from various environments. To this end, several methods have been developed to process and analyze the sequence data from raw reads to end-products such as predicted protein sequences or families. In this article, we provide a thorough review to simplify such processes and discuss the alternative methodologies that can be followed in order to explore biodiversity at the protein family level. We provide details for analysis tools and we comment on their scalability as well as their advantages and disadvantages. Finally, we report the available data repositories and recommend various approaches for protein family annotation related to phylogenetic distribution, structure prediction and metadata enrichment.
Affiliation(s)
- Fotis A. Baltoumas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Nefeli K. Venetsianou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Eleni Aplakidou
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Anastasis Oulas
  - The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
- Robert D. Finn
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, United Kingdom
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, United States
- Evangelos Pafilis
  - Institute of Marine Biology, Biotechnology and Aquaculture (IMBBC), Hellenic Centre for Marine Research (HCMR), Heraklion, Greece
- Nikos C. Kyrpides
  - Lawrence Berkeley National Laboratory, DOE Joint Genome Institute, Berkeley, CA, United States
- Georgios A. Pavlopoulos
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Center of New Biotechnologies and Precision Medicine, Department of Medicine, School of Health Sciences, National and Kapodistrian University of Athens, Athens, Greece
  - Hellenic Army Academy, Vari, Greece
50
Deng P, Li T, Wang D, Wang H, Peng H, Horng SJ. Multi-view clustering guided by unconstrained non-negative matrix factorization. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110425] [Indexed: 03/06/2023]