1
|
Jiang Y, Dang Y, Wu Q, Yuan B, Gao L, You C. Using a k-means clustering to identify novel phenotypes of acute ischemic stroke and development of its Clinlabomics models. Front Neurol 2024; 15:1366307. [PMID: 38601342 PMCID: PMC11004235 DOI: 10.3389/fneur.2024.1366307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2024] [Accepted: 03/11/2024] [Indexed: 04/12/2024]
Abstract
Objective: Acute ischemic stroke (AIS) is a heterogeneous condition. This study aimed to stratify that heterogeneity, identify novel phenotypes, and develop Clinlabomics models of the phenotypes that can guide more personalized treatment of AIS. Methods: In a retrospective analysis, consecutive AIS and non-AIS inpatients were enrolled. An unsupervised k-means clustering algorithm was used to classify AIS patients into distinct novel phenotypes, and clinical and laboratory data were compared across the phenotypes. Next, the least absolute shrinkage and selection operator (LASSO) algorithm was used to select essential variables, and Clinlabomics predictive models of the phenotypes were built with a support vector machine (SVM) classifier. Model performance was evaluated by the area under the curve (AUC), accuracy, sensitivity, and specificity. Results: Three phenotypes were derived in 909 AIS patients [median age 64 (IQR: 17) years, 69% male]. In phenotype 1 (N = 401), patients were relatively young and obese and had significantly elevated levels of lipids. Phenotype 2 (N = 463) was associated with abnormal ion levels. Phenotype 3 (N = 45) was characterized by the highest level of inflammation, accompanied by mild multiple-organ dysfunction. The external validation cohort comprised 507 prospectively collected AIS patients [median age 60 (IQR: 18) years, 70% male]. Phenotype characteristics were similar in the validation cohort. After LASSO analysis, Clinlabomics models of phenotypes 1 and 2 were constructed by the SVM algorithm, yielding high AUC (0.977, 95% CI: 0.961-0.993 and 0.984, 95% CI: 0.971-0.997), accuracy (0.936, 95% CI: 0.922-0.956 and 0.952, 95% CI: 0.938-0.972), sensitivity (0.984, 95% CI: 0.968-0.998 and 0.958, 95% CI: 0.939-0.984), and specificity (0.892, 95% CI: 0.874-0.926 and 0.945, 95% CI: 0.923-0.969).
Conclusion: In this study, three novel phenotypes that reflected the abnormal variables of AIS patients were identified, and Clinlabomics models of the phenotypes were established, which are conducive to individualized treatment.
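The clustering step named in this abstract can be illustrated with a minimal sketch of k-means (Lloyd's algorithm) in plain Python. This is not the authors' implementation: the function name `kmeans`, its parameters, and the toy data are assumptions chosen only to show the assign-then-update loop, and the LASSO and SVM stages are not reproduced.

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids

# Hypothetical toy data: two well-separated groups of standardised measurements.
toy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (100.0, 0.0), (101.0, 0.0), (102.0, 0.0)]
toy_labels, toy_centroids = kmeans(toy, k=2)
```

In practice the features would be standardised first and the number of clusters chosen with a validity criterion; neither step is described in the abstract, so both are left out here.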
Affiliation(s)
- Yao Jiang
- Laboratory Medicine Center, The Second Hospital and Clinical Medical School, Lanzhou University, Lanzhou, China
| | - Yingqiang Dang
- Laboratory Medicine Center, The Second Hospital and Clinical Medical School, Lanzhou University, Lanzhou, China
| | - Qian Wu
- Laboratory Medicine Center, The Second Hospital and Clinical Medical School, Lanzhou University, Lanzhou, China
| | - Boyao Yuan
- Department of Neurology, The Second Hospital and Clinical Medical School, Lanzhou University, Lanzhou, China
| | - Lina Gao
- Laboratory Medicine Center, The Second Hospital and Clinical Medical School, Lanzhou University, Lanzhou, China
| | - Chongge You
- Laboratory Medicine Center, The Second Hospital and Clinical Medical School, Lanzhou University, Lanzhou, China
| |
|
2
|
Liang X, Cao L, Chen H, Wang L, Wang Y, Fu L, Tan X, Chen E, Ding Y, Tang J. A critical assessment of clustering algorithms to improve cell clustering and identification in single-cell transcriptome study. Brief Bioinform 2023; 25:bbad497. [PMID: 38168839 PMCID: PMC10782910 DOI: 10.1093/bib/bbad497] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2023] [Revised: 10/13/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. The performance of clustering considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variations in cell-type identification based on the generated label sets. Currently, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms including four deep learning-based clustering algorithms and commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices based on 10 different scRNA-seq benchmarks to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across different datasets, suggesting its sensitivity to certain dataset characteristics. Notably, DESC exhibited promising results for cell subtype identification and capturing cellular heterogeneity. In addition, SC3 requires more memory and exhibits slower computation speed compared to other algorithms for the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
Affiliation(s)
- Xiao Liang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lijie Cao
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Hao Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lidan Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Yangyun Wang
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| | - Lijuan Fu
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
- Department of Pharmacology, Academician Workstation, Changsha Medical University, Changsha 410219, China
| | - Xiaqin Tan
- The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
| | - Enxiang Chen
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Yubin Ding
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- Joint International Research Laboratory of Reproduction and Development of the Ministry of Education of China, School of Public Health, Chongqing Medical University, Chongqing 400016, China
| | - Jing Tang
- Department of Obstetrics and Gynecology, Women and Children’s Hospital of Chongqing Medical University, Chongqing 401147, China
- School of Basic Medicine, Chongqing Medical University, Chongqing 400016, China
| |
|
3
|
Qi L, Huang L, Zhang Y, Chen Y, Wang J, Zhang X. A Real-Time Vessel Detection and Tracking System Based on LiDAR. Sensors (Basel) 2023; 23:9027. [PMID: 38005415 PMCID: PMC10674757 DOI: 10.3390/s23229027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 10/23/2023] [Accepted: 10/24/2023] [Indexed: 11/26/2023]
Abstract
Vessel detection and tracking is of utmost importance to river traffic. Efficient detection and tracking technology offers an effective solution to challenges related to river traffic safety and congestion. Traditional image-based object detection and tracking algorithms encounter issues such as target ID switching, difficulties in feature extraction, and reduced robustness due to occlusion, target overlap, and changes in brightness and contrast. To detect and track vessels more accurately, a vessel detection and tracking algorithm based on the LiDAR point cloud was proposed. For vessel detection, statistical filtering algorithms were integrated into the Euclidean clustering algorithm to mitigate the effect of ripples on vessel detection. Our detection accuracy of vessels improved by 3.3% to 8.3% compared to three conventional algorithms. For vessel tracking, L-shape fitting of detected vessels can improve the efficiency of tracking, and a simple and efficient tracking algorithm is presented. Compared with three traditional tracking algorithms, the proposed algorithm achieved an improvement in multiple object tracking accuracy (MOTA) and a reduction in ID switch times and number of missed detections. The results demonstrate that LiDAR point cloud-based vessel detection can significantly enhance the accuracy of vessel detection and tracking.
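The detection idea described here, a statistical filter followed by Euclidean clustering, can be sketched in plain Python as follows. The abstract does not give the filter rule or any thresholds, so the mean-distance-to-k-nearest-neighbours criterion, the function names, and all parameter values below are assumptions for illustration, not the authors' method.

```python
import math
from collections import deque

def statistical_filter(points, k=3, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours is unusually large."""
    mean_knn = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn.append(sum(dists[:k]) / k)
    mu = sum(mean_knn) / len(mean_knn)
    sigma = math.sqrt(sum((d - mu) ** 2 for d in mean_knn) / len(mean_knn))
    return [p for p, d in zip(points, mean_knn) if d <= mu + std_ratio * sigma]

def euclidean_clusters(points, tolerance, min_size=1):
    """Group points linked by chains of neighbours no farther apart than `tolerance`."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [j for j in unvisited
                    if math.dist(points[i], points[j]) <= tolerance]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        if len(members) >= min_size:  # discard tiny groups
            clusters.append(sorted(members))
    return sorted(clusters)

# Hypothetical 2D returns: two compact objects plus one isolated ripple-like point.
cloud = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5),
         (10.0, 0.0), (10.5, 0.0), (10.0, 0.5), (10.5, 0.5),
         (5.0, 20.0)]
kept = statistical_filter(cloud, k=3, std_ratio=1.0)
groups = euclidean_clusters(kept, tolerance=1.0, min_size=2)
```

This brute-force version is quadratic in the number of points; a real-time system would normally use a spatial index such as a k-d tree for the neighbour searches.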
Affiliation(s)
| | - Lei Huang
- School of Mechanical Engineering, Nanjing Forestry University of China, Nanjing 210037, China; (L.Q.); (Y.Z.); (Y.C.); (J.W.); (X.Z.)
| | | | | | | | | |
|
4
|
Khvorykh GV, Sapozhnikov NA, Limborska SA, Khrunin AV. Evaluation of Density-Based Spatial Clustering for Identifying Genomic Loci Associated with Ischemic Stroke in Genome-Wide Data. Int J Mol Sci 2023; 24:15355. [PMID: 37895035 PMCID: PMC10607504 DOI: 10.3390/ijms242015355] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2023] [Revised: 09/19/2023] [Accepted: 09/28/2023] [Indexed: 10/29/2023]
Abstract
The genetic architecture of ischemic stroke (IS), which is one of the leading causes of death worldwide, is complex and underexplored. The traditional approach for associative gene mapping is genome-wide association studies (GWASs), testing individual single-nucleotide polymorphisms (SNPs) across the genomes of case and control groups. The purpose of this research is to develop an alternative approach in which groups of SNPs are examined rather than individual ones. We proposed, validated and applied to real data a new workflow consisting of three key stages: grouping SNPs in clusters, inferring the haplotypes in the clusters and testing haplotypes for the association with phenotype. To group SNPs, we applied the clustering algorithms DBSCAN and HDBSCAN to linkage disequilibrium (LD) matrices, representing pairwise r² values between all genotyped SNPs. These clustering algorithms have never before been applied to genotype data as part of the workflow of associative studies. In total, 883,908 SNPs and insertion/deletion polymorphisms from people of European ancestry (4929 cases and 652 controls) were processed. The subsequent testing for frequencies of haplotypes restored in the clusters of SNPs revealed dozens of genes associated with IS and suggested the complex role that protocadherin molecules play in IS. The developed workflow was validated with the use of a simulated dataset of similar ancestry and the same sample sizes. The results of classic GWASs are also provided and discussed. The considered clustering algorithms can be applied to genotypic data to identify the genomic loci associated with different qualitative traits, using the workflow presented in this research.
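The first stage of the workflow, grouping SNPs by density-based clustering of an LD matrix, might look like the sketch below. Two things are assumptions rather than details from the abstract: r² is approximated as the squared Pearson correlation of 0/1/2 genotype dosages, and r² is converted to a distance as 1 - r². The function names and toy genotypes are hypothetical, and HDBSCAN is not covered.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)

def dbscan(dist, eps, min_samples):
    """DBSCAN on a precomputed distance matrix; returns one label per item, -1 = noise."""
    n = len(dist)
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbours = [j for j in range(n) if dist[i][j] <= eps]
        if len(neighbours) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point previously marked as noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = [m for m in range(n) if dist[j][m] <= eps]
            if len(nb) >= min_samples:  # only core points expand the cluster
                queue.extend(nb)
    return labels

# Hypothetical dosages for five SNPs across six individuals.
snps = [
    [0, 1, 2, 0, 1, 2],
    [0, 1, 2, 0, 1, 2],  # identical to SNP 0, so r² = 1
    [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with SNP 0, so r² = 1
    [0, 0, 1, 1, 2, 2],  # weakly correlated with SNP 0, r² = 0.25
    [1, 1, 1, 1, 1, 1],  # monomorphic, r² treated as 0
]
dist = [[0.0 if i == j else 1.0 - r_squared(a, b) for j, b in enumerate(snps)]
        for i, a in enumerate(snps)]
snp_labels = dbscan(dist, eps=0.2, min_samples=2)
```

With `eps=0.2`, two SNPs count as neighbours when r² is at least 0.8; that threshold is arbitrary here and would need tuning on real data.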
Affiliation(s)
| | | | | | - Andrey V. Khrunin
- National Research Centre “Kurchatov Institute”, Kurchatov Sq. 2, Moscow 123182, Russia; (G.V.K.); (N.A.S.); (S.A.L.)
| |
|
5
|
Adnan M, Slavic G, Martin Gomez D, Marcenaro L, Regazzoni C. Systematic and Comprehensive Review of Clustering and Multi-Target Tracking Techniques for LiDAR Point Clouds in Autonomous Driving Applications. Sensors (Basel) 2023; 23:6119. [PMID: 37447967 DOI: 10.3390/s23136119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Revised: 06/03/2023] [Accepted: 06/27/2023] [Indexed: 07/15/2023]
Abstract
Autonomous vehicles (AVs) rely on advanced sensory systems, such as Light Detection and Ranging (LiDAR), to function seamlessly in intricate and dynamic environments. LiDAR produces highly accurate 3D point clouds, which are vital for the detection, classification, and tracking of multiple targets. A systematic review and classification of various clustering and Multi-Target Tracking (MTT) techniques are necessary due to the inherent challenges posed by LiDAR data, such as density, noise, and varying sampling rates. As part of this study, the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was employed to examine the challenges and advancements in MTT techniques and clustering for LiDAR point clouds within the context of autonomous driving. Searches were conducted in major databases such as IEEE Xplore, ScienceDirect, SpringerLink, ACM Digital Library, and Google Scholar, utilizing customized search strategies. We identified and critically reviewed 76 relevant studies based on rigorous screening and evaluation processes, assessing their methodological quality, data handling adequacy, and reporting compliance. As a result of this comprehensive review and classification, we were able to provide a detailed overview of current challenges, research gaps, and advancements in clustering and MTT techniques for LiDAR point clouds, thus contributing to the field of autonomous driving. Researchers and practitioners working in the field of autonomous driving will benefit from this study, which was characterized by transparency and reproducibility on a systematic basis.
Affiliation(s)
- Muhammad Adnan
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
- Departamento de Ingeniería de Sistemas y Automática, Universidad Carlos III de Madrid, Butarque 15, Leganés, 28911 Madrid, Spain
| | - Giulia Slavic
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
- Departamento de Ingeniería de Sistemas y Automática, Universidad Carlos III de Madrid, Butarque 15, Leganés, 28911 Madrid, Spain
| | - David Martin Gomez
- Departamento de Ingeniería de Sistemas y Automática, Universidad Carlos III de Madrid, Butarque 15, Leganés, 28911 Madrid, Spain
| | - Lucio Marcenaro
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
| | - Carlo Regazzoni
- Department of Electrical, Electronic, Telecommunications Engineering and Naval Architecture (DITEN), University of Genova, Via Opera Pia 11a, I-16145 Genoa, Italy
| |
|
6
|
Portela D, Amaral R, Rodrigues PP, Freitas A, Costa E, Fonseca JA, Sousa-Pinto B. Unsupervised algorithms to identify potential under-coding of secondary diagnoses in hospitalisations databases in Portugal. HEALTH INF MANAG J 2023:18333583221144663. [PMID: 36802958 DOI: 10.1177/18333583221144663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2023]
Abstract
BACKGROUND Quantifying and dealing with lack of consistency in administrative databases (namely, under-coding) requires tracking patients longitudinally without compromising anonymity, which is often a challenging task. OBJECTIVE This study aimed to (i) assess and compare different hierarchical clustering methods on the identification of individual patients in an administrative database that does not easily allow tracking of episodes from the same patient; (ii) quantify the frequency of potential under-coding; and (iii) identify factors associated with such phenomena. METHOD We analysed the Portuguese National Hospital Morbidity Dataset, an administrative database registering all hospitalisations occurring in Mainland Portugal between 2011 and 2015. We applied different approaches of hierarchical clustering methods (either isolated or combined with partitional clustering methods) to identify potential individual patients based on demographic variables and comorbidities. Diagnosis codes were grouped into the Charlson and Elixhauser comorbidity defined groups. The algorithm displaying the best performance was used to quantify potential under-coding. A generalised mixed model (GML) of binomial regression was applied to assess factors associated with such potential under-coding. RESULTS We observed that the hierarchical cluster analysis (HCA) + k-means clustering method with comorbidities grouped according to the Charlson defined groups was the algorithm displaying the best performance (with a Rand Index of 0.99997). We identified potential under-coding in all Charlson comorbidity groups, ranging from 3.5% (overall diabetes) to 27.7% (asthma). Overall, being male, having medical admission, dying during hospitalisation or being admitted at more specific and complex hospitals were associated with increased odds of potential under-coding.
DISCUSSION We assessed several approaches to identify individual patients in an administrative database and, subsequently, by applying the HCA + k-means algorithm, we tracked coding inconsistency and potentially improved data quality. We reported consistent potential under-coding in all defined groups of comorbidities and potential factors associated with such lack of completeness. CONCLUSION Our proposed methodological framework could both enhance data quality and act as a reference for other studies relying on databases with similar problems.
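The Rand Index reported in the results measures how often two partitions agree on whether a pair of items belongs together. A minimal sketch of the standard definition is shown below; the function name `rand_index` and the example labels are illustrative, and the abstract does not say how the reference partition was obtained.

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two partitions agree (same/same or different/different)."""
    agree = total = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        agree += same_true == same_pred  # True counts as 1
        total += 1
    return agree / total

# Hypothetical example: four episodes, true patients vs. clustered patients.
score = rand_index([0, 0, 1, 1], [0, 0, 1, 2])  # 5 of 6 pairs agree
```

Because most pairs in a large database are trivially "different" in both partitions, the plain Rand Index tends to sit close to 1; a chance-corrected variant such as the adjusted Rand index is often reported alongside it.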
Affiliation(s)
- Diana Portela
- Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Portugal
- ACES Entre o Douro e Vouga I - Feira/Arouca, Portugal
- Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Portugal
| | - Rita Amaral
- Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Portugal
- Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Portugal
- ESS, IPP - Porto Health School, Polytechnic Institute of Porto, Portugal
| | - Pedro P Rodrigues
- Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Portugal
- Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Portugal
| | - Alberto Freitas
- Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Portugal
- Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Portugal
| | - Elísio Costa
- Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Portugal
- Research Unit on Applied Molecular Biosciences (UCIBIO-REQUIMTE), Faculty of Pharmacy, University of Porto, Portugal
| | - João A Fonseca
- Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Portugal
- Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Portugal
| | - Bernardo Sousa-Pinto
- Department of Community Medicine, Information and Health Decision Sciences (MEDCIDS), Faculty of Medicine, University of Porto, Portugal
- Center for Health Technology and Services Research (CINTESIS), Faculty of Medicine, University of Porto, Portugal
| |
|
7
|
Rojas-Torres IL, Ahmad M, Martín Álvarez JM, Golpe AA, Gil Herrera RDJ. Mental health, suicide attempt, and family function for adolescents' primary health care during the COVID-19 pandemic. F1000Res 2022; 11:529. [PMID: 36545375 PMCID: PMC9751494 DOI: 10.12688/f1000research.109603.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/20/2022] [Indexed: 01/13/2023]
Abstract
Background: The study's purpose was to identify associations between mental health risk, suicide attempts, and family function. Methods: A correlational, descriptive, and cross-sectional study was carried out in a group of adolescents in the last grade of secondary school to establish the association between mental health risk, suicide attempt, and family functionality. The instruments used were the self-report questionnaire, the suicide risk assessment scale, and the family APGAR. Data analysis was performed using an artificial intelligence algorithm (Gower clustering). Results: 246 adolescents responded to the three instruments, which made it possible to select those with correlations of sensitive interest and, based on these, to design an intervention plan. Psychological distress was found in 28%, psychotic symptoms in 85%, and problematic alcohol use in 9%. Good family functioning was identified in 34% and some type of family dysfunction in 66%. In terms of suicide risk, 74% were at low risk, 24% at medium risk, and 2% at high risk. A correlation was found in a group comprising 15% of the respondents. Conclusions: The risk of mental health deterioration and the suicide risk during this pandemic period seem to be related to family functionality.
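Clustering on Gower dissimilarity is a common choice when questionnaire data mix numeric scores with categorical answers, which is presumably why it was used here. The sketch below shows only the standard Gower dissimilarity between two records; the function name, the `ranges` convention, and the example records are assumptions, and the abstract does not state which clustering algorithm was run on top of the distances.

```python
def gower_distance(a, b, ranges):
    """Gower dissimilarity between two records with mixed feature types.

    `ranges[i]` is the observed range (max - min) of numeric feature i,
    or None when feature i is categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0        # categorical: simple mismatch
        else:
            total += abs(x - y) / r if r else 0.0  # numeric: range-normalised difference
    return total / len(ranges)

# Hypothetical records: (numeric score, risk category, family-function category).
rec_a = (10, "low", "functional")
rec_b = (15, "low", "dysfunctional")
d_ab = gower_distance(rec_a, rec_b, ranges=(20, None, None))  # (0.25 + 0 + 1) / 3
```

The resulting pairwise distances can be passed to any algorithm that accepts a precomputed dissimilarity matrix, for example hierarchical clustering or k-medoids.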
Affiliation(s)
- Indiana-Luz Rojas-Torres
- Universidad Simón Bolívar, Facultad Ciencias de la Salud, Barranquilla, Colombia; Universidad Americana de Europa (UNADE), Cancún, Mexico
| | - Mostapha Ahmad
- Universidad Simón Bolívar, Facultad Ciencias de la Salud, Barranquilla, Colombia
| | | | | | | |
|
8
|
Romeo-Arroyo E, Soria J, Mora M, Laport F, Moreno-Fernandez-de-Leceta A, Vázquez-Araújo L. Exploratory Research on Sweetness Perception: Decision Trees to Study Electroencephalographic Data and Its Relationship with the Explicit Response to Sweet Odor, Taste, and Flavor. Sensors (Basel) 2022; 22:6787. [PMID: 36146136 PMCID: PMC9504051 DOI: 10.3390/s22186787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 09/04/2022] [Accepted: 09/06/2022] [Indexed: 06/16/2023]
Abstract
Using implicit responses to determine consumers' response to different stimuli is becoming a popular approach, but research is still needed to understand the outputs of the different technologies used to collect data. In the present research, electroencephalography (EEG) responses and self-reported liking and emotions were collected on different stimuli (odor, taste, flavor samples) to better understand sweetness perception. Artificial intelligence analytics were used to classify the implicit responses, identifying decision trees to discriminate the stimuli by activated sensory system (odor/taste/flavor) and by nature of the stimuli ('sweet' vs. 'non-sweet' odors; 'sweet-taste', 'sweet-flavor', and 'non-sweet flavor'; and 'sweet stimuli' vs. 'non-sweet stimuli'). Significant differences were found among self-reported liking of the stimuli and the emotions elicited by the stimuli, but no clear relationship was identified between explicit and implicit data. The present research contributes useful data to EEG-linked research and to EEG data analysis, although much is still unknown about how to properly exploit implicit measurement technologies and their data.
Affiliation(s)
- Elena Romeo-Arroyo
- BCC Innovation, Technology Center in Gastronomy, Basque Culinary Center, 20009 Donostia-San Sebastián, Spain
- Basque Culinary Center, Faculty of Gastronomic Sciences, Mondragon Unibertsitatea, 20009 Donostia-San Sebastián, Spain
| | - Javier Soria
- i3B, Ibermática Institute of Innovation, Gipuzkoa Technology Park, Paseo Mikeletegi, 5, 20009 Donostia-San Sebastián, Spain
| | - María Mora
- BCC Innovation, Technology Center in Gastronomy, Basque Culinary Center, 20009 Donostia-San Sebastián, Spain
- Basque Culinary Center, Faculty of Gastronomic Sciences, Mondragon Unibertsitatea, 20009 Donostia-San Sebastián, Spain
| | - Francisco Laport
- CITIC Research Center, University of A Coruña, 15008 A Coruña, Spain
| | | | - Laura Vázquez-Araújo
- BCC Innovation, Technology Center in Gastronomy, Basque Culinary Center, 20009 Donostia-San Sebastián, Spain
- Basque Culinary Center, Faculty of Gastronomic Sciences, Mondragon Unibertsitatea, 20009 Donostia-San Sebastián, Spain
| |
|
9
|
Pan D, Fan J, Nie Z, Sun Z, Zhang J, Tong Y, He B, Song C, Kohmura Y, Yabashi M, Ishikawa T, Shen Y, Jiang H. Quantitative analysis of the effect of radiation on mitochondria structure using coherent diffraction imaging with a clustering algorithm. IUCrJ 2022; 9:223-230. [PMID: 35371506 PMCID: PMC8895015 DOI: 10.1107/s2052252521012963] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2021] [Accepted: 12/07/2021] [Indexed: 06/14/2023]
Abstract
Radiation damage and a low signal-to-noise ratio are the primary factors that limit spatial resolution in coherent diffraction imaging (CDI) of biomaterials using X-ray sources. Introduced here is a clustering algorithm named ConvRe based on deep learning, and it is applied to obtain accurate and consistent image reconstruction from noisy diffraction patterns of weakly scattering biomaterials. To investigate the impact of X-ray radiation on soft biomaterials, CDI experiments were performed on mitochondria from human embryonic kidney cells using synchrotron radiation. Benefiting from the new algorithm, structural changes in the mitochondria induced by X-ray radiation damage were quantitatively characterized and analysed at the nanoscale with different radiation doses. This work also provides a promising approach for improving the imaging quality of biomaterials with XFEL-based plane-wave CDI.
Affiliation(s)
- Dan Pan
- School of Physical Science and Technology and Center for Transformative Science, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, People’s Republic of China
| | - Jiadong Fan
- School of Physical Science and Technology and Center for Transformative Science, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, People’s Republic of China
| | - Zhenzhen Nie
- State Key Laboratory of Medicinal Chemical Biology and College of Life Sciences, Nankai University, 94 Weijin Road, Tianjin 300071, People’s Republic of China
| | - Zhibin Sun
- School of Physical Science and Technology and Center for Transformative Science, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, People’s Republic of China
- Photon Science Division, Paul Scherrer Institute, 5232 Villigen, Switzerland
| | - Jianhua Zhang
- School of Physical Science and Technology and Center for Transformative Science, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, People’s Republic of China
| | - Yajun Tong
- School of Physical Science and Technology and Center for Transformative Science, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, People’s Republic of China
| | - Bo He
- School of Physical Science and Technology and Center for Transformative Science, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, People’s Republic of China
| | - Changyong Song
- Department of Physics, Pohang University of Science and Technology, Pohang 37673, South Korea
| | - Yoshiki Kohmura
- SPring-8 Center, RIKEN, 1-1-1, Kouto, Sayo, Hyogo 679-5148, Japan
| | - Makina Yabashi
- SPring-8 Center, RIKEN, 1-1-1, Kouto, Sayo, Hyogo 679-5148, Japan
| | - Tetsuya Ishikawa
- SPring-8 Center, RIKEN, 1-1-1, Kouto, Sayo, Hyogo 679-5148, Japan
| | - Yuequan Shen
- State Key Laboratory of Medicinal Chemical Biology and College of Life Sciences, Nankai University, 94 Weijin Road, Tianjin 300071, People’s Republic of China
| | - Huaidong Jiang
- School of Physical Science and Technology and Center for Transformative Science, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, People’s Republic of China
| |
|
10
|
Ionita M, Schretzenmair R, Jones D, Moore J, Wang LS, Rogers W. Tailor: Targeting heavy tails in flow cytometry data with fast, interpretable mixture modeling. Cytometry A 2021; 99:133-144. [PMID: 33476090 DOI: 10.1002/cyto.a.24307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2020] [Revised: 12/22/2020] [Accepted: 01/05/2021] [Indexed: 11/11/2022]
Abstract
Automated clustering workflows are increasingly used for the analysis of high parameter flow cytometry data. This trend calls for algorithms which are able to quickly process tens of millions of data points, to compare results across subjects or time points, and to provide easily actionable interpretations of the results. To this end, we created Tailor, a model-based clustering algorithm specialized for flow cytometry data. Our approach leverages a phenotype-aware binning scheme to provide a coarse model of the data, which is then refined using a multivariate Gaussian mixture model. We benchmark Tailor using a simulation study and two flow cytometry data sets, and show that the results are robust to moderate departures from normality and inter-sample variation. Moreover, Tailor provides automated, non-overlapping annotations of its clusters, which facilitates interpretation of results and downstream analysis. Tailor is released as an R package, and the source code is publicly available at www.github.com/matei-ionita/Tailor.
Affiliation(s)
- Matei Ionita
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Richard Schretzenmair
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Derek Jones
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Jonni Moore
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - Wade Rogers
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA.,Corporate/Research, Still Pond Cytomics, West Chester, PA, USA
| |
Collapse
|
11
|
Lapegna M, Balzano W, Meyer N, Romano D. Clustering Algorithms on Low-Power and High-Performance Devices for Edge Computing Environments. Sensors (Basel) 2021; 21:5395. [PMID: 34450837 DOI: 10.3390/s21165395] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 06/09/2021] [Revised: 07/28/2021] [Accepted: 08/07/2021] [Indexed: 11/16/2022]
Abstract
The synergy between Artificial Intelligence and the Edge Computing paradigm promises to transfer decision-making processes to the periphery of sensor networks without the involvement of central data servers. For this reason, we have recently witnessed the rapid development of devices that integrate sensors and computing resources on a single board to process data directly where they are collected. Given the context in which they are used, the main feature of these boards is reduced energy consumption, even though their absolute computing power is not comparable to that of modern high-end CPUs. Among the most popular Artificial Intelligence techniques, clustering algorithms are practical tools for discovering correlations or affinities within data collected in large datasets, but a parallel implementation is an essential requirement because of their high computational cost. Therefore, in the present work, we investigate how to implement clustering algorithms on parallel and low-energy devices for edge computing environments. In particular, we present experiments on two devices with different features: the quad-core UDOO X86 Advanced+ board and the GPU-based NVIDIA Jetson Nano board, evaluating them in terms of both performance and energy consumption. The experiments show that they achieve a more favorable trade-off between these two requirements than other high-end computing devices.
|
12
|
Shor O, Benninger F, Khrennikov A. Dendrogramic Representation of Data: CHSH Violation vs. Nonergodicity. Entropy (Basel) 2021; 23:e23080971. [PMID: 34441111 PMCID: PMC8392696 DOI: 10.3390/e23080971] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/13/2021] [Revised: 07/21/2021] [Accepted: 07/23/2021] [Indexed: 11/23/2022]
Abstract
This paper is devoted to the foundational problems of dendrogramic holographic theory (DH theory). We used the ontic–epistemic (implicate–explicate order) methodology. The epistemic counterpart is based on the representation of data by dendrograms constructed with hierarchic clustering algorithms. The ontic universe is described as a p-adic tree; it is zero-dimensional, totally disconnected, disordered, and bounded (in p-adic ultrametric spaces). Classical–quantum interrelations lose their sharpness; generally, simple dendrograms are “more quantum” than complex ones. We used the CHSH inequality as a measure of quantum-likeness. We demonstrate that it can be violated by classical experimental data represented by dendrograms. The seed of this violation is neither nonlocality nor a rejection of realism, but the nonergodicity of dendrogramic time series. Generally, the violation of ergodicity is one of the basic features of DH theory. The dendrogramic representation leads to the local realistic model that violates the CHSH inequality. We also considered DH theory for Minkowski geometry and monitored the dependence of CHSH violation and nonergodicity on geometry, as well as a Lorentz transformation of data.
Affiliation(s)
- Oded Shor
- Felsenstein Medical Research Center, Beilinson Hospital, Petach Tikva 4941492, Israel
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel
- Felix Benninger
- Felsenstein Medical Research Center, Beilinson Hospital, Petach Tikva 4941492, Israel
- Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv 6997801, Israel
- Department of Neurology, Rabin Medical Center, Petach Tikva 4941492, Israel
- Andrei Khrennikov
- Department of Mathematics, Faculty of Technology, Linnaeus University, 35195 Växjö, Sweden
|
13
|
Liu F, Zhou Z, Cai M, Wen Y, Zhang J. AGNEP: An Agglomerative Nesting Clustering Algorithm for Phenotypic Dimension Reduction in Joint Analysis of Multiple Phenotypes. Front Genet 2021; 12:648831. [PMID: 33981331 PMCID: PMC8107386 DOI: 10.3389/fgene.2021.648831] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/02/2021] [Accepted: 04/01/2021] [Indexed: 11/17/2022]
Abstract
Genome-wide association studies (GWAS) have identified thousands of genetic variants associated with complex traits and diseases. Compared with analyzing a single phenotype at a time, the joint analysis of multiple phenotypes can improve statistical power by taking into account the information from phenotypes. However, most established joint algorithms ignore the different levels of correlation between multiple phenotypes; instead, they simultaneously analyze all phenotypes in a genetic model. Thus, they may fail to capture the genetic structure of phenotypes and consequently reduce the statistical power. In this study, we develop a novel method, the agglomerative nesting clustering algorithm for phenotypic dimension reduction analysis (AGNEP), to jointly analyze multiple phenotypes for GWAS. First, AGNEP uses an agglomerative nesting clustering algorithm to group correlated phenotypes and then applies principal component analysis (PCA) to generate representative phenotypes for each group. Finally, multivariate analysis is employed to test associations between genetic variants and the representative phenotypes rather than all phenotypes. We perform three simulation experiments with various genetic structures and a real dataset analysis for 19 Arabidopsis phenotypes. Compared to established methods, AGNEP performs better in terms of statistical power, computing time, and the number of quantitative trait nucleotides (QTNs) detected. The analysis of the Arabidopsis real dataset further illustrates the efficiency of AGNEP for detecting QTNs, which are confirmed by The Arabidopsis Information Resource gene bank.
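The grouping step described in this abstract (agglomerative clustering of correlated phenotypes) can be illustrated with a minimal sketch. This is not the AGNEP implementation: the distance `1 - |r|`, the average linkage, the `max_distance` cut-off, and all function names are assumptions chosen for illustration, and the subsequent PCA step is omitted.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def group_phenotypes(phenotypes, max_distance=0.5):
    """Bottom-up grouping of phenotypes (hypothetical helper).

    phenotypes: dict mapping a phenotype name to its measurements.
    Distance between two phenotypes is assumed to be 1 - |r|; clusters
    are merged by average linkage until the closest pair of clusters is
    farther apart than max_distance.
    """
    names = list(phenotypes)
    d = {
        (p, q): 1.0 - abs(pearson(phenotypes[p], phenotypes[q]))
        for p in names for q in names if p != q
    }

    def linkage(c1, c2):
        # Average pairwise distance between members of two clusters.
        return sum(d[(p, q)] for p in c1 for q in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]  # start with singletons
    while len(clusters) > 1:
        best, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best > max_distance:
            break  # remaining clusters are too weakly correlated to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each resulting group would then be summarized by a representative phenotype (the abstract names PCA for this), which is left out here to keep the sketch short.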
Affiliation(s)
- Fengrong Liu
- College of Science, Nanjing Agricultural University, Nanjing, China
- School of Data Science, University of Science and Technology of China, Hefei, China
- Ziyang Zhou
- College of Science, Nanjing Agricultural University, Nanjing, China
- Mingzhi Cai
- College of Science, Nanjing Agricultural University, Nanjing, China
- Yangjun Wen
- College of Science, Nanjing Agricultural University, Nanjing, China
- Jin Zhang
- College of Science, Nanjing Agricultural University, Nanjing, China
- Postdoctoral Research Station of Crop Science, Nanjing Agricultural University, Nanjing, China
|
14
|
Alsuhaim AF, Azmi AM, Hussain M. Improving the Retrieval of Arabic Web Search Results Using Enhanced k-Means Clustering Algorithm. Entropy (Basel) 2021; 23:449. [PMID: 33920374 DOI: 10.3390/e23040449] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/01/2021] [Revised: 04/02/2021] [Accepted: 04/07/2021] [Indexed: 11/30/2022]
Abstract
Traditional information retrieval systems return a ranked list of results to a user’s query. This list is often long, and the user cannot explore all the results retrieved. It is also ineffective for a highly ambiguous language such as Arabic. The modern writing style of Arabic excludes the diacritical marking, without which Arabic words become ambiguous. For a search query, the user has to skim over the document to infer if the word has the same meaning they are after, which is a time-consuming task. It is hoped that clustering the retrieved documents will collate documents into clear and meaningful groups. In this paper, we use an enhanced k-means clustering algorithm, which yields a faster clustering time than the regular k-means. The algorithm uses the distance calculated from previous iterations to minimize the number of distance calculations. We propose a system to cluster Arabic search results using the enhanced k-means algorithm, labeling each cluster with the most frequent word in the cluster. This system will help Arabic web users identify each cluster’s topic and go directly to the required cluster. Experimentally, the enhanced k-means algorithm reduced the execution time by 60% for the stemmed dataset and 47% for the non-stemmed dataset when compared to the regular k-means, while slightly improving the purity.
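The idea of reusing distances from previous iterations can be sketched as follows. This is one common way to realize it and not necessarily the authors' exact procedure: the rule "a point keeps its cluster if its distance to its own updated centroid did not grow" is an assumption, as are the function names and the `init_centroids` parameter.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_of(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(members) for col in zip(*members))


def enhanced_kmeans(points, init_centroids, max_iter=100):
    """k-means that caches each point's distance to its own centroid.

    Returns (labels, centroids, number_of_distance_computations).
    """
    centroids = [tuple(c) for c in init_centroids]
    k = len(centroids)
    labels, cached = [], []
    # First pass: every point is compared with every centroid.
    for p in points:
        ds = [dist(p, c) for c in centroids]
        j = ds.index(min(ds))
        labels.append(j)
        cached.append(ds[j])
    calls = len(points) * k

    for _ in range(max_iter):
        # Recompute each centroid; an empty cluster keeps its old centroid.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            updated.append(centroid_of(members) if members else centroids[j])
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated

        for i, p in enumerate(points):
            d_own = dist(p, centroids[labels[i]])
            calls += 1
            if d_own <= cached[i]:
                # Assumed shortcut: the point did not move away from its
                # centroid, so skip the other k - 1 comparisons.
                cached[i] = d_own
                continue
            # Otherwise fall back to a full comparison.
            ds = [dist(p, c) for c in centroids]
            calls += k
            labels[i] = ds.index(min(ds))
            cached[i] = ds[labels[i]]
    return labels, centroids, calls
```

The returned counter makes the saving visible: a plain k-means pass costs `len(points) * k` distance computations, while this variant often needs close to `len(points)` once assignments stabilize. The shortcut is a heuristic, so the final partition may differ from that of standard k-means.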
|
15
|
Cecilia JM, Cano JC, Morales-García J, Llanes A, Imbernón B. Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms. Sensors (Basel) 2020; 20:s20216335. [PMID: 33172017 PMCID: PMC7664181 DOI: 10.3390/s20216335] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/12/2020] [Revised: 10/30/2020] [Accepted: 11/03/2020] [Indexed: 11/16/2022]
Abstract
The Internet of Things (IoT) is becoming a new socioeconomic revolution in which data and immediacy are the main ingredients. IoT generates large datasets on a daily basis, but these are currently considered "dark data", i.e., data generated but never analyzed. The efficient analysis of this data is mandatory to create intelligent applications for the next generation of IoT applications that benefit society. Artificial Intelligence (AI) techniques are very well suited to identifying hidden patterns and correlations in this data deluge. In particular, clustering algorithms are of the utmost importance for performing exploratory data analysis to identify a set (a.k.a., cluster) of similar objects. Clustering algorithms are computationally heavy workloads and need to be executed on high-performance computing clusters, especially to deal with large datasets. This execution on HPC infrastructures is an energy-hungry procedure with additional issues, such as high-latency communications or privacy. Edge computing is a recently proposed paradigm that enables light-weight computations at the edge of the network to solve these issues. In this paper, we provide an in-depth analysis of emergent edge computing architectures that include low-power Graphics Processing Units (GPUs) to speed up these workloads. Our analysis includes performance and power consumption figures for Nvidia's latest AGX Xavier, comparing the energy-performance ratio of these low-cost platforms with a high-performance cloud-based counterpart. Three different clustering algorithms (i.e., k-means, Fuzzy Minimals (FM), and Fuzzy C-Means (FCM)) are designed to be optimally executed on edge and cloud platforms, showing a speed-up factor of up to 11× for the GPU code compared to sequential counterpart versions on the edge platforms and energy savings of up to 150% between the edge computing and HPC platforms.
Affiliation(s)
- José M. Cecilia
- Computer Engineering Department (DISCA), Universitat Politécnica de Valencia (UPV), 46022 Valencia, Spain
- Juan-Carlos Cano
- Computer Engineering Department (DISCA), Universitat Politécnica de Valencia (UPV), 46022 Valencia, Spain
- Juan Morales-García
- Computer Science Department, Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Antonio Llanes
- Computer Science Department, Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
- Baldomero Imbernón
- Computer Science Department, Universidad Católica de Murcia (UCAM), 30107 Murcia, Spain
|
16
|
Gakne PV, O'Keefe K. Tightly-Coupled GNSS/Vision Using a Sky-Pointing Camera for Vehicle Navigation in Urban Areas. Sensors (Basel) 2018; 18:E1244. [PMID: 29673230 DOI: 10.3390/s18041244] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 02/27/2018] [Revised: 04/03/2018] [Accepted: 04/09/2018] [Indexed: 11/16/2022]
Abstract
This paper presents a method of fusing the ego-motion of a robot or a land vehicle estimated from an upward-facing camera with Global Navigation Satellite System (GNSS) signals for navigation purposes in urban environments. A sky-pointing camera is mounted on the top of a car and synchronized with a GNSS receiver. The advantages of this configuration are two-fold: firstly, for the GNSS signals, the upward-facing camera will be used to classify the acquired images into sky and non-sky (also known as segmentation). A satellite falling into the non-sky areas (e.g., buildings, trees) will be rejected and not considered for the final position solution computation. Secondly, the sky-pointing camera (with a field of view of about 90 degrees) is helpful for urban area ego-motion estimation in the sense that it does not see most of the moving objects (e.g., pedestrians, cars) and thus is able to estimate the ego-motion with fewer outliers than is typical with a forward-facing camera. The GNSS and visual information systems are tightly-coupled in a Kalman filter for the final position solution. Experimental results demonstrate the ability of the system to provide satisfactory navigation solutions and better accuracy than the GNSS-only and the loosely-coupled GNSS/vision, 20 percent and 82 percent (in the worst case) respectively, in a deep urban canyon, even in conditions with fewer than four GNSS satellites.
|
17
|
Cha J, Jo HJ, Gibson WS, Lee JM. Functional organization of the human posterior cingulate cortex, revealed by multiple connectivity-based parcellation methods. Hum Brain Mapp 2017; 38:2808-2818. [PMID: 28294456 DOI: 10.1002/hbm.23570] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 12/31/2015] [Revised: 02/26/2017] [Accepted: 03/06/2017] [Indexed: 11/12/2022]
Abstract
Based on cytoarchitecture, the posterior cingulate cortex (PCC) is thought to comprise two distinct functional subregions: the dorsal and ventral PCC (dPCC and vPCC). However, functional subregions do not completely match anatomical boundaries in the human brain. To understand the relationship between the functional organization of regions and anatomical features, it is necessary to apply parcellation algorithms based on functional properties. We therefore defined functionally informed subregions in the human PCC by parcellation of regions with similar patterns of functional connectivity in the resting brain. We used various patterns of functional connectivity, namely local, whole-brain and diffuse functional connections of the PCC, and various clustering methods, namely hierarchical, spectral, and k-means clustering, to investigate the subregions of the PCC. Overall, the approximate anatomical boundaries and the predicted functional regions overlapped closely. Using hierarchical clustering, the PCC could be clearly separated into two anatomical subregions, namely the dPCC and vPCC, and further divided into four subregions segregated by local functional connectivity patterns. We show that the PCC could be separated into two (dPCC and vPCC) or four subregions based on local functional connections and hierarchical clustering, and that subregions of PCC display differential global functional connectivity, particularly along the dorsal-ventral axis. These results suggest that differences in functional connectivity between dPCC and vPCC may be due to differences in local connectivity between these functionally hierarchical subregions of the PCC. Hum Brain Mapp 38:2808-2818, 2017. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Jungho Cha
- Department of Biomedical Engineering, Hanyang University, Seoul, South Korea
- Hang Joon Jo
- Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota
- William S Gibson
- Department of Neurologic Surgery, Mayo Clinic, Rochester, Minnesota
- Jong-Min Lee
- Department of Biomedical Engineering, Hanyang University, Seoul, South Korea
|
18
|
Parks DR, Khettabi FE, Chase E, Hoffman RA, Perfetto SP, Spidlen J, Wood JC, Moore WA, Brinkman RR. Evaluating flow cytometer performance with weighted quadratic least squares analysis of LED and multi-level bead data. Cytometry A 2017; 91:232-249. [PMID: 28160404 PMCID: PMC5483398 DOI: 10.1002/cyto.a.23052] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 09/06/2016] [Revised: 11/23/2016] [Accepted: 12/28/2016] [Indexed: 11/06/2022]
Abstract
We developed a fully automated procedure for analyzing data from LED pulses and multilevel bead sets to evaluate backgrounds and photoelectron scales of cytometer fluorescence channels. The method improves on previous formulations by fitting a full quadratic model with appropriate weighting and by providing standard errors and peak residuals as well as the fitted parameters themselves. Here we describe the details of the methods and procedures involved and present a set of illustrations and test cases that demonstrate the consistency and reliability of the results. The automated analysis and fitting procedure is generally quite successful in providing good estimates of the Spe (statistical photoelectron) scales and backgrounds for all the fluorescence channels on instruments with good linearity. The precision of the results obtained from LED data is almost always better than that from multilevel bead data, but the bead procedure is easy to carry out and provides results good enough for most purposes. Including standard errors on the fitted parameters is important for understanding the uncertainty in the values of interest. The weighted residuals give information about how well the data fits the model, and particularly high residuals indicate bad data points. Known photoelectron scales and measurement channel backgrounds make it possible to estimate the precision of measurements at different signal levels and the effects of compensated spectral overlap on measurement quality. Combining this information with measurements of standard samples carrying dyes of biological interest, we can make accurate comparisons of dye sensitivity among different instruments. Our method is freely available through the R/Bioconductor package flowQB. © 2017 International Society for Advancement of Cytometry.
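A weighted quadratic least-squares fit of the general kind named in this abstract can be sketched as below. This is not the flowQB code: the abstract does not state which quantities are fitted or how the weights are chosen, so `xs`, `ys`, `ws`, and both function names are hypothetical. The sketch minimizes the weighted sum of squared residuals for y = c0 + c1·x + c2·x² by solving the 3×3 normal equations.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def weighted_quadratic_fit(xs, ys, ws):
    """Return [c0, c1, c2] minimizing sum(w * (y - c0 - c1*x - c2*x^2)^2)."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for x, y, w in zip(xs, ys, ws):
        basis = (1.0, x, x * x)
        for i in range(3):
            b[i] += w * basis[i] * y
            for j in range(3):
                A[i][j] += w * basis[i] * basis[j]
    return solve_linear(A, b)


def weighted_residuals(xs, ys, ws, coeffs):
    """Residuals scaled by sqrt(weight); large values flag suspect points."""
    c0, c1, c2 = coeffs
    return [
        math.sqrt(w) * (y - (c0 + c1 * x + c2 * x * x))
        for x, y, w in zip(xs, ys, ws)
    ]
```

If the weights are taken as inverse variances of the measurements (an assumption), the scaled residuals are dimensionless and can be compared across points, which is the sense in which high residuals indicate bad data points.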
Affiliation(s)
- David R. Parks
- Shared FACS Facility and Department of Genetics, Stanford University, Stanford, CA, USA
- Eric Chase
- Cytek Biosciences, Inc., Fremont, CA, USA
- James C.S. Wood
- Wake Forest University Baptist Medical Center, Comprehensive Cancer Center and Department of Cancer Biology, Winston-Salem, NC, US
- Wayne A. Moore
- Shared FACS Facility and Department of Genetics, Stanford University, Stanford, CA, USA
|
19
|
Barley AJ, Thomson RC. Assessing the performance of DNA barcoding using posterior predictive simulations. Mol Ecol 2016; 25:1944-57. [PMID: 26915049 DOI: 10.1111/mec.13590] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 09/30/2015] [Revised: 01/05/2016] [Accepted: 01/18/2016] [Indexed: 02/05/2023]
Abstract
Accurate estimates of biodiversity are required for research in a broad array of biological subdisciplines including ecology, evolution, systematics, conservation and biodiversity science. The use of statistical models and genetic data, particularly DNA barcoding, has been suggested as an important tool for remedying the large gaps in our current understanding of biodiversity. However, the reliability of biodiversity estimates obtained using these approaches depends on how well the statistical models that are used describe the evolutionary process underlying the genetic data. In this study, we utilize data from the Barcode of Life Database and posterior predictive simulations to assess the performance of DNA barcoding under commonly used substitution models. We demonstrate that the success of DNA barcoding varies widely across DNA substitution models and that model choice has a substantial impact on the number of operational taxonomic units identified (changing results by ~4-31%). Additionally, we demonstrate that the widely followed practice of a priori assuming the Kimura 2-parameter model for DNA barcoding is statistically unjustified and should be avoided. Using both data-based and inference-based test statistics, we detect variation in model performance across taxonomic groups, clustering algorithms, genetic divergence thresholds and substitution models. Taken together, these results illustrate the importance of considering both model selection and model adequacy in studies quantifying biodiversity.
Affiliation(s)
- Anthony J Barley
- Department of Biology, University of Hawai'i at Mānoa, Honolulu, HI, 96822, USA
- Robert C Thomson
- Department of Biology, University of Hawai'i at Mānoa, Honolulu, HI, 96822, USA
|
20
|
Burton JN, Liachko I, Dunham MJ, Shendure J. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 (Bethesda) 2014; 4:1339-46. [PMID: 24855317 DOI: 10.1534/g3.114.011825] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Indexed: 12/31/2022]
Abstract
Microbial communities consist of mixed populations of organisms, including unknown species in unknown abundances. These communities are often studied through metagenomic shotgun sequencing, but standard library construction methods remove long-range contiguity information; thus, shotgun sequencing and de novo assembly of a metagenome typically yield a collection of contigs that cannot readily be grouped by species. Methods for generating chromatin-level contact probability maps, e.g., as generated by the Hi-C method, provide a signal of contiguity that is completely intracellular and contains both intrachromosomal and interchromosomal information. Here, we demonstrate how this signal can be exploited to reconstruct the individual genomes of microbial species present within a mixed sample. We apply this approach to two synthetic metagenome samples, successfully clustering the genome content of fungal, bacterial, and archaeal species with more than 99% agreement with published reference genomes. We also show that the Hi-C signal can secondarily be used to create scaffolded genome assemblies of individual eukaryotic species present within the microbial community, with higher levels of contiguity than some of the species’ published reference genomes.
|
21
|
Garyfallidis E, Brett M, Correia MM, Williams GB, Nimmo-Smith I. QuickBundles, a Method for Tractography Simplification. Front Neurosci 2012; 6:175. [PMID: 23248578 PMCID: PMC3518823 DOI: 10.3389/fnins.2012.00175] [Citation(s) in RCA: 138] [Impact Index Per Article: 11.5] [Received: 08/09/2012] [Accepted: 11/20/2012] [Indexed: 11/13/2022]
Abstract
Diffusion MR data sets produce large numbers of streamlines which are hard to visualize, interact with, and interpret in a clinically acceptable time scale, despite numerous proposed approaches. As a solution we present a simple, compact, tailor-made clustering algorithm, QuickBundles (QB), that overcomes the complexity of these large data sets and provides informative clusters in seconds. Each QB cluster can be represented by a single centroid streamline; collectively these centroid streamlines can be taken as an effective representation of the tractography. We provide a number of tests to show how the QB reduction has good consistency and robustness. We show how the QB reduction can help in the search for similarities across several subjects.
Affiliation(s)
- Eleftherios Garyfallidis
- Wolfson College, University of Cambridge, Cambridge, UK; Medical Research Council Cognition and Brain Sciences Unit, Cambridge, UK
|
22
|
Banković Z, Fraga D, Moya JM, Vallejo JC. Detecting unknown attacks in wireless sensor networks that contain mobile nodes. Sensors (Basel) 2012; 12:10834-50. [PMID: 23112632 DOI: 10.3390/s120810834] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/10/2012] [Revised: 07/28/2012] [Accepted: 07/31/2012] [Indexed: 11/16/2022]
Abstract
As wireless sensor networks are usually deployed in unattended areas, security policies cannot be updated in a timely fashion upon identification of new attacks. This gives attackers enough time to cause significant damage. Thus, it is of great importance to provide protection from unknown attacks. However, existing solutions mostly concentrate on known attacks. On the other hand, mobility can make the sensor network more resilient to failures, reactive to events, and able to support disparate missions with a common set of sensors, yet the problem of security becomes more complicated. In order to address the issue of security in networks with mobile nodes, we propose a machine learning solution for anomaly detection along with a feature extraction process that tries to detect temporal and spatial inconsistencies in the sequences of sensed values and the routing paths used to forward these values to the base station. We also propose a special way to treat mobile nodes, which is the main novelty of this work. The data produced in the presence of an attacker are treated as outliers and detected using clustering techniques. These techniques are further coupled with a reputation system, thereby isolating compromised nodes in a timely fashion. The proposal performs well at detecting and confining previously unseen attacks, including cases in which mobile nodes are compromised.
|
23
|
Brookings T, Grashow R, Marder E. Statistics of neuronal identification with open- and closed-loop measures of intrinsic excitability. Front Neural Circuits 2012; 6:19. [PMID: 22557947 PMCID: PMC3338007 DOI: 10.3389/fncir.2012.00019] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 01/28/2012] [Accepted: 04/06/2012] [Indexed: 11/13/2022]
Abstract
In complex nervous systems patterns of neuronal activity and measures of intrinsic neuronal excitability are often used as criteria for identifying and/or classifying neurons. We asked how well identification of neurons by conventional measures of intrinsic excitability compares with a measure of neuronal excitability derived from a neuron’s behavior in a dynamic clamp constructed two-cell network. We used four cell types from the crab stomatogastric ganglion: the pyloric dilator, lateral pyloric, gastric mill, and dorsal gastric neurons. Each neuron was evaluated for six conventional measures of intrinsic excitability (intrinsic properties, IPs). Additionally, each neuron was coupled by reciprocal inhibitory synapses made with the dynamic clamp to a Morris–Lecar model neuron and the resulting network was assayed for four measures of network activity (network activity properties, NAPs). We searched for linear combinations of IPs that correlated with each NAP, and combinations of NAPs that correlated with each IP. In the process we developed a method to correct for multiple correlations while searching for correlating features. When properly controlled for multiple correlations, four of the IPs were correlated with NAPs, and all four NAPs were correlated with IPs. Neurons were classified into cell types by training a linear classifier on sets of properties, or using k-medoids clustering. The IPs were modestly successful in classifying the neurons, and the NAPs were more successful. Combining the two measures did better than either measure alone, but not well enough to classify neurons with perfect accuracy, thus reiterating that electrophysiological measures of single-cell properties alone are not sufficient for reliable cell identification.
Affiliation(s)
- Ted Brookings
- Volen Center and Biology Department, Brandeis University, Waltham, MA, USA
|