51
|
Single-Trial Kernel-Based Functional Connectivity for Enhanced Feature Extraction in Motor-Related Tasks. SENSORS 2021; 21:s21082750. [PMID: 33924672 PMCID: PMC8069819 DOI: 10.3390/s21082750] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/15/2021] [Revised: 04/01/2021] [Accepted: 04/08/2021] [Indexed: 02/06/2023]
Abstract
Motor learning is associated with functional brain plasticity, involving specific functional connectivity changes in the neural networks. However, the degree of learning new motor skills varies among individuals, which is mainly due to the between-subject variability in brain structure and function captured by electroencephalographic (EEG) recordings. Here, we propose a kernel-based functional connectivity measure to deal with inter/intra-subject variability in motor-related tasks. To this end, from spatio-temporal-frequency patterns, we extract the functional connectivity between EEG channels through their Gaussian kernel cross-spectral distribution. Further, we optimize the spectral combination weights within a sparse-based ℓ2-norm feature selection framework matching the motor-related labels that perform the dimensionality reduction of the extracted connectivity features. From the validation results in three databases with motor imagery and motor execution tasks, we conclude that the single-trial Gaussian functional connectivity measure provides very competitive classifier performance values, being less affected by feature extraction parameters, like the sliding time window, and avoiding the use of prior linear spatial filtering. We also provide interpretability for the clustered functional connectivity patterns and hypothesize that the proposed kernel-based metric is promising for evaluating motor skills.
|
52
|
Wang P, Zhang G, Li Y, Oad A, Huang G. Stochastic Neighbor Embedding Algorithm and its Application in Molecular Biological Data. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200414093636] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
Abstract
With the advent of the era of big data, the numbers and the dimensions of data are increasingly becoming larger. It is very critical to reduce dimensions or visualize data and then uncover the hidden patterns of characteristics or the mechanism underlying data. Stochastic Neighbor Embedding (SNE) has been developed for data visualization over the last ten years. Due to its efficiency in the visualization of data, SNE has been applied to a wide range of fields. We briefly reviewed the SNE algorithm and its variants, summarizing application of it in visualizing single-cell sequencing data, single nucleotide polymorphisms, and mass spectrometry imaging data. We also discussed the strength and the weakness of the SNE, with a special emphasis on how to set parameters to promote quality of visualization, and finally indicated potential development of SNE in the coming future.
Affiliation(s)
- Pan Wang
  - Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China
- Guiyang Zhang
  - Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China
- You Li
  - Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China
- Ammar Oad
  - Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China
- Guohua Huang
  - Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang 422000, China
|
53
|
Automatic Image-Based Event Detection for Large-N Seismic Arrays Using a Convolutional Neural Network. REMOTE SENSING 2021. [DOI: 10.3390/rs13030389] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
Passive seismic experiments have been proposed as a cost-effective and non-invasive alternative to controlled-source seismology, allowing body–wave reflections based on seismic interferometry principles to be retrieved. However, from the huge volume of the recorded ambient noise, only selected time periods (noise panels) are contributing constructively to the retrieval of reflections. We address the issue of automatic scanning of ambient noise data recorded by a large-N array in search of body–wave energy (body–wave events) utilizing a convolutional neural network (CNN). It consists of computing first both amplitude and frequency attribute values at each receiver station for all divided portions of the recorded signal (noise panels). The created 2-D attribute maps are then converted to images and used to extract spatial and temporal patterns associated with the body–wave energy present in the data to build binary CNN-based classifiers. The ensemble of two multi-headed CNN models trained separately on the frequency and amplitude attribute maps demonstrates better generalization ability than each of its participating networks. We also compare the prediction performance of our deep learning (DL) framework with a conventional machine learning (ML) algorithm called XGBoost. The DL-based solution applied to 240 h of ambient seismic noise data recorded by the Kylylahti array in Finland demonstrates high detection accuracy and the superiority over the ML-based one. The ensemble of CNN-based models managed to find almost three times more verified body–wave events in the full unlabelled dataset than it was provided at the training stage. Moreover, the high-level abstraction features extracted at the deeper convolution layers can be used to perform unsupervised clustering of the classified panels with respect to their visual characteristics.
|
54
|
Natural Language Processing Based Method for Clustering and Analysis of Aviation Safety Narratives. AEROSPACE 2020. [DOI: 10.3390/aerospace7100143] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 11/16/2022]
Abstract
The complexity of commercial aviation operations has grown substantially in recent years, together with a diversification of techniques for collecting and analyzing flight data. As a result, data-driven frameworks for enhancing flight safety have grown in popularity. Data-driven techniques offer efficient and repeatable exploration of patterns and anomalies in large datasets. Text-based flight safety data presents a unique challenge in its subjectivity, and relies on natural language processing tools to extract underlying trends from narratives. In this paper, a methodology is presented for the analysis of aviation safety narratives based on text-based accounts of in-flight events and categorical metadata parameters which accompany them. An extensive pre-processing routine is presented, including a comparison between numeric models of textual representation for the purposes of document classification. A framework for categorizing and visualizing narratives is presented through a combination of k-means clustering and 2-D mapping with t-Distributed Stochastic Neighbor Embedding (t-SNE). A cluster post-processing routine is developed for identifying driving factors in each cluster and building a hierarchical structure of cluster and sub-cluster labels. The Aviation Safety Reporting System (ASRS), which includes over a million de-identified voluntarily submitted reports describing aviation safety incidents for commercial flights, is analyzed as a case study for the methodology. The method results in the identification of 10 major clusters and a total of 31 sub-clusters. The identified groupings are post-processed through metadata-based statistical analysis of the learned clusters. The developed method shows promise in uncovering trends from clusters that are not evident in existing anomaly labels in the data and offers a new tool for obtaining insights from text-based safety data that complement existing approaches.
|
55
|
Chen L, Guo Q, Liu Z, Zhang S, Zhang H. Enhanced synchronization-inspired clustering for high-dimensional data. COMPLEX INTELL SYST 2020. [DOI: 10.1007/s40747-020-00191-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]
Abstract
The synchronization-inspired clustering algorithm (Sync) is a novel and outstanding clustering algorithm, which can accurately cluster datasets with any shape, density and distribution. However, the high-dimensional dataset with high dimensionality, high noise, and high redundancy brings some new challenges for the synchronization-inspired clustering algorithm, resulting in a significant increase in clustering time and a decrease in clustering accuracy. To address these challenges, an enhanced synchronization-inspired clustering algorithm, namely SyncHigh, is developed in this paper to quickly and accurately cluster the high-dimensional datasets. First, a PCA-based (Principal Component Analysis) dimension purification strategy is designed to find the principal components in all attributes. Second, a density-based data merge strategy is constructed to reduce the number of objects participating in the synchronization-inspired clustering algorithm, thereby speeding up clustering time. Third, the Kuramoto Model is enhanced to deal with mass differences between objects caused by the density-based data merge strategy. Finally, extensive experimental results on synthetic and real-world datasets show the effectiveness and efficiency of our SyncHigh algorithm.
|
56
|
Chatzimparmpas A, Martins RM, Kerren A. t-viSNE: Interactive Assessment and Interpretation of t-SNE Projections. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2020; 26:2696-2714. [PMID: 32305922 DOI: 10.1109/tvcg.2020.2986996] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Indexed: 06/11/2023]
Abstract
t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of multidimensional data has proven to be a popular approach, with successful applications in a wide range of domains. Despite their usefulness, t-SNE projections can be hard to interpret or even misleading, which hurts the trustworthiness of the results. Understanding the details of t-SNE itself and the reasons behind specific patterns in its output may be a daunting task, especially for non-experts in dimensionality reduction. In this article, we present t-viSNE, an interactive tool for the visual exploration of t-SNE projections that enables analysts to inspect different aspects of their accuracy and meaning, such as the effects of hyper-parameters, distance and neighborhood preservation, densities and costs of specific neighborhoods, and the correlations between dimensions and visual patterns. We propose a coherent, accessible, and well-integrated collection of different views for the visualization of t-SNE projections. The applicability and usability of t-viSNE are demonstrated through hypothetical usage scenarios with real data sets. Finally, we present the results of a user study where the tool's effectiveness was evaluated. By bringing to light information that would normally be lost after running t-SNE, we hope to support analysts in using t-SNE and making its results better understandable.
|
57
|
Peña-Solórzano CA, Albrecht DW, Bassed RB, Gillam J, Harris PC, Dimmock MR. Semi-supervised labelling of the femur in a whole-body post-mortem CT database using deep learning. Comput Biol Med 2020; 122:103797. [PMID: 32658723 DOI: 10.1016/j.compbiomed.2020.103797] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/29/2020] [Revised: 04/29/2020] [Accepted: 04/29/2020] [Indexed: 01/16/2023]
Abstract
A deep learning pipeline was developed and used to localize and classify a variety of implants in the femur contained in whole-body post-mortem computed tomography (PMCT) scans. The results provide a proof-of-principle approach for labelling content not described in medical/autopsy reports. The pipeline, which incorporated residual networks and an autoencoder, was trained and tested using n = 450 full-body PMCT scans. For the localization component, Dice scores of 0.99, 0.96, and 0.98 and mean absolute errors of 3.2, 7.1, and 4.2 mm were obtained in the axial, coronal, and sagittal views, respectively. A regression analysis found the orientation of the implant to the scanner axis and also the relative positioning of extremities to be statistically significant factors. For the classification component, test cases were properly labelled as nail (N+), hip replacement (H+), knee replacement (K+) or without-implant (I-) with an accuracy >97%. The recall for I- and H+ cases was 1.00, but fell to 0.82 and 0.65 for cases with K+ and N+. This semi-automatic approach provides a generalized structure for image-based labelling of features, without requiring time-consuming segmentation.
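For readers unfamiliar with the two metrics quoted in this abstract, the following is a minimal sketch of how a Dice score and a per-class recall are typically computed. The function names, the toy pixel sets, and the counts are illustrative assumptions and are not taken from the paper.

```python
def dice_score(predicted, truth):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two sets of pixel coordinates."""
    predicted, truth = set(predicted), set(truth)
    if not predicted and not truth:
        return 1.0  # convention: two empty masks agree perfectly
    return 2 * len(predicted & truth) / (len(predicted) + len(truth))


def recall(true_positives, false_negatives):
    """Fraction of actual positives that were labelled as positive."""
    return true_positives / (true_positives + false_negatives)


# Hypothetical 2-D masks given as (row, column) pixel coordinates.
predicted_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
truth_mask = {(0, 1), (1, 0), (1, 1), (1, 2)}

overlap = dice_score(predicted_mask, truth_mask)  # 3 shared pixels out of 4 + 4
nail_recall = recall(true_positives=13, false_negatives=7)  # made-up counts
print(overlap, nail_recall)
```

A recall well below 1.00 for a class, as reported for K+ and N+, means that a share of true cases of that class were assigned a different label, even when overall accuracy is high.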
Affiliation(s)
- C A Peña-Solórzano
  - Department of Medical Imaging and Radiation Sciences, Monash University, Wellington Rd, Clayton, Melbourne, VIC, 3800, Australia.
- D W Albrecht
  - Clayton School of Information Technology, Monash University, Wellington Rd, Clayton, Melbourne, VIC, 3800, Australia.
- R B Bassed
  - Victorian Institute of Forensic Medicine, 57-83 Kavanagh St., Southbank, Melbourne, VIC, 3006, Australia; Department of Forensic Medicine, Monash University, Wellington Rd, Clayton, Melbourne, VIC, 3800, Australia.
- J Gillam
  - Land Division, Defence Science and Technology Group, Fishermans Bend, Melbourne, VIC, 3207, Australia.
- P C Harris
  - The Royal Children's Hospital Melbourne, 50 Flemington Road, Parkville, Melbourne, VIC, 3052, Australia; Department of Orthopaedic Surgery, Western Health, Footscray Hospital, Gordon St, Footscray, Melbourne, VIC, 3011, Australia.
- M R Dimmock
  - Department of Medical Imaging and Radiation Sciences, Monash University, Wellington Rd, Clayton, Melbourne, VIC, 3800, Australia.
|
58
|
Aliverti E, Tilson JL, Filer DL, Babcock B, Colaneri A, Ocasio J, Gershon TR, Wilhelmsen KC, Dunson DB. Projected t-SNE for batch correction. Bioinformatics 2020; 36:3522-3527. [PMID: 32176244 PMCID: PMC7267829 DOI: 10.1093/bioinformatics/btaa189] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/16/2019] [Revised: 03/02/2020] [Accepted: 03/12/2020] [Indexed: 12/31/2022]
Abstract
MOTIVATION: Low-dimensional representations of high-dimensional data are routinely employed in biomedical research to visualize, interpret and communicate results from different pipelines. In this article, we propose a novel procedure to directly estimate t-SNE embeddings that are not driven by batch effects. Without correction, interesting structure in the data can be obscured by batch effects. The proposed algorithm can therefore significantly aid visualization of high-dimensional data. RESULTS: The proposed methods are based on linear algebra and constrained optimization, leading to efficient algorithms and fast computation in many high-dimensional settings. Results on artificial single-cell transcription profiling data show that the proposed procedure successfully removes multiple batch effects from t-SNE embeddings, while retaining fundamental information on cell types. When applied to single-cell gene expression data to investigate mouse medulloblastoma, the proposed method successfully removes batches related with mice identifiers and the date of the experiment, while preserving clusters of oligodendrocytes, astrocytes, and endothelial cells and microglia, which are expected to lie in the stroma within or adjacent to the tumours. AVAILABILITY AND IMPLEMENTATION: Source code implementing the proposed approach is available as an R package at https://github.com/emanuelealiverti/BC_tSNE, including a tutorial to reproduce the simulation studies. CONTACT: aliverti@stat.unipd.it.
Affiliation(s)
- Emanuele Aliverti
  - Department of Statistical Sciences, University of Padova, Padova 35121, Italy
- Dayne L Filer
  - RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
  - Department of Genetics
- Timothy R Gershon
  - Department of Neurology
  - UNC Neuroscience Center
  - Carolina Institute for Developmental Disabilities
  - Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, NC 27599, USA
- Kirk C Wilhelmsen
  - RENCI, University of North Carolina, Chapel Hill, NC 27517, USA
  - Department of Genetics
  - Department of Neurology
- David B Dunson
  - Department of Statistical Science, Duke University, Durham, NC 27708, USA
|
59
|
Linderman GC, Mishne G, Jaffe A, Kluger Y, Steinerberger S. Randomized near-neighbor graphs, giant components and applications in data science. J Appl Probab 2020; 57:458-476. [PMID: 32913373 PMCID: PMC7480951 DOI: 10.1017/jpr.2020.21] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
Abstract
If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $c_d \log n$ nearest neighbors, where $d \geq 2$ is the dimension and $c_d$ is a constant depending on the dimension, then it is well known that the graph is connected with high probability. We prove that it suffices to connect every point to $c_{d,1} \log\log n$ points chosen randomly among its $c_{d,2} \log n$ nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log n$ instead of $\sim n \log n$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of connecting each point to its $k$ nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$ nearest neighbors and only connect to those without sacrificing quality of results. This approach can simplify and accelerate computation; we illustrate this with experimental results in spectral clustering of large-scale datasets.
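The construction described in this abstract can be illustrated with a small sketch: build a k-nearest-neighbor list for random points in the unit square, keep only k' randomly chosen neighbors per point, and measure the largest connected component of the sparsified graph. The values of n, k and k', the brute-force neighbor search, and all function names are illustrative assumptions; the constants in the paper's theorem are not reproduced here.

```python
import random


def knn_indices(points, k):
    """Brute-force k nearest neighbours of every point (excluding itself)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def sparsify(neighbours, k_prime, rng):
    """Keep k_prime neighbours chosen uniformly at random from each k-NN list."""
    return [rng.sample(nbrs, k_prime) for nbrs in neighbours]


def largest_component(n, adjacency):
    """Size of the largest connected component, treating edges as undirected."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, nbrs in enumerate(adjacency):
        for j in nbrs:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_i] = root_j

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values())


rng = random.Random(0)
n, d, k, k_prime = 300, 2, 12, 3  # illustrative sizes only
points = [[rng.random() for _ in range(d)] for _ in range(n)]

full = knn_indices(points, k)
sparse = sparsify(full, k_prime, rng)

edges_full = sum(len(nbrs) for nbrs in full)
edges_sparse = sum(len(nbrs) for nbrs in sparse)
giant = largest_component(n, sparse)
print(edges_full, edges_sparse, giant)
```

The sparsified graph stores a quarter of the directed edges of the full k-NN graph in this toy setting; comparing `giant` with `n` for different `k_prime` gives a rough feel for how much connectivity survives the subsampling.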
Affiliation(s)
- George C Linderman
  - Postal address: Applied Mathematics, Yale University, New Haven, CT 06511
- Gal Mishne
  - Postal address: Applied Mathematics, Yale University, New Haven, CT 06511
- Ariel Jaffe
  - Postal address: Applied Mathematics, Yale University, New Haven, CT 06511
- Yuval Kluger
  - Dept. of Pathology & Applied Mathematics, Yale University, New Haven, CT 06511
|
60
|
Škvorc U, Eftimov T, Korošec P. Understanding the problem space in single-objective numerical optimization using exploratory landscape analysis. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106138] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 11/27/2022]
|
61
|
Zhang Y, Kim MS, Reichenberger ER, Stear B, Taylor DM. Scedar: A scalable Python package for single-cell RNA-seq exploratory data analysis. PLoS Comput Biol 2020; 16:e1007794. [PMID: 32339163 PMCID: PMC7217489 DOI: 10.1371/journal.pcbi.1007794] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/08/2019] [Revised: 05/12/2020] [Accepted: 03/17/2020] [Indexed: 11/25/2022]
Abstract
In single-cell RNA-seq (scRNA-seq) experiments, the number of individual cells has increased exponentially, and the sequencing depth of each cell has decreased significantly. As a result, analyzing scRNA-seq data requires extensive considerations of program efficiency and method selection. In order to reduce the complexity of scRNA-seq data analysis, we present scedar, a scalable Python package for scRNA-seq exploratory data analysis. The package provides a convenient and reliable interface for performing visualization, imputation of gene dropouts, detection of rare transcriptomic profiles, and clustering on large-scale scRNA-seq datasets. The analytical methods are efficient, and they also do not assume that the data follow certain statistical distributions. The package is extensible and modular, which would facilitate the further development of functionalities for future requirements with the open-source development community. The scedar package is distributed under the terms of the MIT license at https://pypi.org/project/scedar.
Affiliation(s)
- Yuanchao Zhang
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
  - Department of Genetics, Rutgers University, Piscataway, New Jersey, United States of America
- Man S. Kim
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Erin R. Reichenberger
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Ben Stear
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Deanne M. Taylor
  - Department of Biomedical and Health Informatics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
  - Department of Pediatrics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
|
62
|
Linderman GC, Steinerberger S. NUMERICAL INTEGRATION ON GRAPHS: WHERE TO SAMPLE AND HOW TO WEIGH. MATHEMATICS OF COMPUTATION 2020; 89:1933-1952. [PMID: 33927452 PMCID: PMC8081285 DOI: 10.1090/mcom/3515] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Let $G = (V, E, w)$ be a finite, connected graph with weighted edges. We are interested in the problem of finding a subset $W \subset V$ of vertices and weights $a_w$ such that $\frac{1}{|V|} \sum_{v \in V} f(v) \sim \sum_{w \in W} a_w f(w)$ for functions $f : V \to \mathbb{R}$ that are 'smooth' with respect to the geometry of the graph; here $\sim$ indicates that we want the right-hand side to be as close to the left-hand side as possible. The main applications are problems where $f$ is known to vary smoothly over the underlying graph but is expensive to evaluate on even a single vertex. We prove an inequality showing that the integration problem can be rewritten as a geometric problem ('the optimal packing of heat balls'). We discuss how one would construct approximate solutions of the heat ball packing problem; numerical examples demonstrate the efficiency of the method.
Affiliation(s)
- George C Linderman
  - Program in Applied Mathematics, Yale University, New Haven, CT 06511, USA
|
63
|
Chi EC, Gaines BR, Sun WW, Zhou H, Yang J. Provable Convex Co-clustering of Tensors. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2020; 21:214. [PMID: 33312074 PMCID: PMC7731944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Cluster analysis is a fundamental tool for pattern discovery of complex heterogeneous data. Prevalent clustering methods mainly focus on vector or matrix-variate data and are not applicable to general-order tensors, which arise frequently in modern scientific and business applications. Moreover, there is a gap between statistical guarantees and computational efficiency for existing tensor clustering solutions due to the nature of their non-convex formulations. In this work, we bridge this gap by developing a provable convex formulation of tensor co-clustering. Our convex co-clustering (CoCo) estimator enjoys stability guarantees and its computational and storage costs are polynomial in the size of the data. We further establish a non-asymptotic error bound for the CoCo estimator, which reveals a surprising "blessing of dimensionality" phenomenon that does not exist in vector or matrix-variate cluster analysis. Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. Our clustering results provide meaningful business insights to improve advertising effectiveness.
Affiliation(s)
- Eric C Chi
  - Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
- Brian R Gaines
  - Advanced Analytics R&D, SAS Institute Inc., Cary, NC 27513, USA
- Will Wei Sun
  - Krannert School of Management, Purdue University, West Lafayette, IN 47907, USA
- Hua Zhou
  - Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Jian Yang
  - Advertising Sciences, Yahoo Research, Sunnyvale, CA 94089, USA
|
64
|
Abstract
Single-cell transcriptomics yields ever growing data sets containing RNA expression levels for thousands of genes from up to millions of cells. Common data analysis pipelines include a dimensionality reduction step for visualising the data in two dimensions, most frequently performed using t-distributed stochastic neighbour embedding (t-SNE). It excels at revealing local structure in high-dimensional data, but naive applications often suffer from severe shortcomings, e.g. the global structure of the data is not represented accurately. Here we describe how to circumvent such pitfalls, and develop a protocol for creating more faithful t-SNE visualisations. It includes PCA initialisation, a high learning rate, and multi-scale similarity kernels; for very large data sets, we additionally use exaggeration and downsampling-based initialisation. We use published single-cell RNA-seq data sets to demonstrate that this protocol yields superior results compared to the naive application of t-SNE.
Affiliation(s)
- Dmitry Kobak
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany.
- Philipp Berens
  - Institute for Ophthalmic Research, University of Tübingen, Tübingen, Germany.
  - Bernstein Center for Computational Neuroscience, University of Tübingen, Tübingen, Germany.
  - Center for Integrative Neuroscience, University of Tübingen, Tübingen, Germany.
  - Institute for Bioinformatics and Medical Informatics, University of Tübingen, Tübingen, Germany.
|