101
|
Learning visual codebooks for image classification using spectral clustering. Soft Comput 2017. [DOI: 10.1007/s00500-017-2937-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
102
|
Wang J, Ferguson AL. Nonlinear machine learning in simulations of soft and biological materials. MOLECULAR SIMULATION 2017. [DOI: 10.1080/08927022.2017.1400164] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 12/27/2022]
Affiliation(s)
- J. Wang
  - Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL, USA
- A. L. Ferguson
  - Department of Physics, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - Department of Materials Science and Engineering, University of Illinois Urbana-Champaign, Urbana, IL, USA
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL, USA
|
103
|
Kwak N. Implementing Kernel Methods Incrementally by Incremental Nonlinear Projection Trick. IEEE TRANSACTIONS ON CYBERNETICS 2017; 47:4003-4009. [PMID: 28113447 DOI: 10.1109/tcyb.2016.2565683] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
Recently, the nonlinear projection trick (NPT) was introduced, enabling direct computation of the coordinates of samples in a reproducing kernel Hilbert space. With NPT, any machine learning algorithm can be extended to a kernel version without relying on the so-called kernel trick. However, NPT is inherently difficult to implement incrementally because an ever-increasing kernel matrix must be handled as additional training samples are introduced. In this paper, an incremental version of the NPT (INPT) is proposed based on the observation that the centerization step in NPT is unnecessary. Because the proposed INPT does not change the coordinates of the old data, the coordinates obtained by INPT can be used directly in any incremental method to implement a kernel version of that method. The effectiveness of the INPT is shown by applying it to implement incremental versions of kernel methods such as kernel singular value decomposition, kernel principal component analysis, and kernel discriminant analysis, which are used for kernel matrix reconstruction, letter classification, and face image retrieval, respectively.
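The abstract's central property, that coordinates of old samples stay fixed as new samples arrive, can be illustrated with a row-by-row Cholesky factorization of the uncentered kernel matrix. This is a minimal sketch and not the paper's INPT procedure: the class name `IncrementalKernelCoords`, the Gaussian kernel `rbf`, and its `gamma` value are all assumptions made for illustration, and the sketch assumes distinct samples with a strictly positive-definite kernel so that no pivot is zero.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian kernel (an assumed choice; any positive-definite kernel works)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class IncrementalKernelCoords:
    """Hypothetical helper: explicit kernel-space coordinates, built one sample at a time.

    rows[i] holds the coordinates of sample i (length i + 1; later coordinates
    are implicitly zero). Inner products of rows reproduce the uncentered
    kernel matrix, and adding a sample never modifies earlier rows.
    """

    def __init__(self, kernel=rbf):
        self.kernel = kernel
        self.samples = []
        self.rows = []

    def add(self, x):
        row = []
        for j, xj in enumerate(self.samples):
            # Forward substitution against the existing triangular factor.
            partial = sum(row[l] * self.rows[j][l] for l in range(j))
            row.append((self.kernel(x, xj) - partial) / self.rows[j][j])
        # The new sample gets one extra coordinate orthogonal to all old ones.
        residual = self.kernel(x, x) - sum(c * c for c in row)
        row.append(math.sqrt(max(residual, 0.0)))
        self.samples.append(x)
        self.rows.append(row)
        return row


enc = IncrementalKernelCoords()
for point in [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]:
    enc.add(point)
```

Because each new row only reads the earlier rows, the resulting coordinates could be streamed into any incremental linear method, which is the usage pattern the abstract describes.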
|
104
|
Liu W, Ye M, Wei J, Hu X. Compressed constrained spectral clustering framework for large-scale data sets. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.08.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/19/2022]
|
105
|
Tsang IW. Principal Graph and Structure Learning Based on Reversed Graph Embedding. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2017; 39:2227-2241. [PMID: 28114001 PMCID: PMC5899072 DOI: 10.1109/tpami.2016.2635657] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Indexed: 05/09/2023]
Abstract
Many scientific datasets are of high dimension, and the analysis usually requires retaining the most important structures of data. The principal curve is a widely used approach for this purpose. However, many existing methods work only for data with structures that are mathematically formulated by curves, which is quite restrictive for real applications. A few methods can overcome the above problem, but they either require complicated human-made rules for a specific task and lack the flexibility to adapt to different tasks, or cannot obtain explicit structures of data. To address these issues, we develop a novel principal graph and structure learning framework that captures the local information of the underlying graph structure based on reversed graph embedding. As showcases, models that can learn a spanning tree or a weighted undirected ℓ1 graph are proposed, and a new learning algorithm is developed that learns a set of principal points and a graph structure from data simultaneously. The new algorithm is simple with guaranteed convergence. We then extend the proposed framework to deal with large-scale data. Experimental results on various synthetic and six real-world datasets show that the proposed method compares favorably with baselines and can uncover the underlying structure correctly.
|
106
|
Tayal A, Coleman TF, Li Y. Bounding the difference between RankRC and RankSVM and application to multi-level rare class kernel ranking. Data Min Knowl Discov 2017. [DOI: 10.1007/s10618-017-0540-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
|
107
|
Lan L, Zhang K, Ge H, Cheng W, Liu J, Rauber A, Li XL, Wang J, Zha H. Low-rank decomposition meets kernel learning: A generalized Nyström method. ARTIF INTELL 2017. [DOI: 10.1016/j.artint.2017.05.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 10/19/2022]
|
108
|
Fiberprint: A subject fingerprint based on sparse code pooling for white matter fiber analysis. Neuroimage 2017; 158:242-259. [DOI: 10.1016/j.neuroimage.2017.06.083] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 10/24/2016] [Revised: 06/20/2017] [Accepted: 06/30/2017] [Indexed: 11/18/2022] Open
|
109
|
Hofmeyr DP. Clustering by Minimum Cut Hyperplanes. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 2017; 39:1547-1560. [PMID: 27654138 DOI: 10.1109/tpami.2016.2609929] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 06/06/2023]
Abstract
Minimum normalised graph cuts are highly effective ways of partitioning unlabeled data, having been made popular by the success of spectral clustering. This work presents a novel method for learning hyperplane separators which minimise this graph cut objective, when data are embedded in Euclidean space. The optimisation problem associated with the proposed method can be formulated as a sequence of univariate subproblems, in which the optimal hyperplane orthogonal to a given vector is determined. These subproblems can be solved in log-linear time, by exploiting the trivial factorisation of the exponential function. Experimentation suggests that the empirical runtime of the overall algorithm is also log-linear in the number of data. Asymptotic properties of the minimum cut hyperplane, both for a finite sample, and for an increasing sample assumed to arise from an underlying probability distribution are discussed. In the finite sample case the minimum cut hyperplane converges to the maximum margin hyperplane as the scaling parameter is reduced to zero. Applying the proposed methodology, both for fixed scaling, and the large margin asymptotes, is shown to produce high quality clustering models in comparison with state-of-the-art clustering algorithms in experiments using a large collection of benchmark datasets.
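The univariate subproblem mentioned in the abstract (finding the best hyperplane orthogonal to a given vector) can be sketched by projecting the data onto that vector and scoring every split point with the normalised cut. The sketch below is a brute-force illustration only: the function name `best_split`, the Laplace similarity exp(-|a - b| / scale) on the projections, and the `scale` parameter are assumptions, and the log-linear evaluation described in the abstract is not reproduced here.

```python
import math


def best_split(points, v, scale=1.0):
    """Return (ncut, threshold) for the best hyperplane {x : v.x = threshold}.

    Similarities are computed between 1-D projections (an assumption of
    this sketch); every gap between consecutive projections is tried.
    """
    proj = sorted(sum(pi * vi for pi, vi in zip(p, v)) for p in points)
    n = len(proj)
    sim = [[math.exp(-abs(a - b) / scale) for b in proj] for a in proj]
    degree = [sum(row) for row in sim]

    best = (math.inf, None)
    for m in range(1, n):  # left side = first m projections
        cut = sum(sim[i][j] for i in range(m) for j in range(m, n))
        ncut = cut / sum(degree[:m]) + cut / sum(degree[m:])
        threshold = 0.5 * (proj[m - 1] + proj[m])
        best = min(best, (ncut, threshold))
    return best


data = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
ncut, threshold = best_split(data, v=(1.0, 0.0))
```

On this toy data the two groups are separated by a wide gap along the first axis, so the lowest normalised cut falls at the midpoint of that gap.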
|
110
|
Goyal S, Kumar S, Zaveri MA, Shukla AK. Fuzzy Similarity Measure Based Spectral Clustering Framework for Noisy Image Segmentation. INT J UNCERTAIN FUZZ 2017. [DOI: 10.1142/s0218488517500283] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
In recent times, graph-based spectral clustering algorithms have received immense attention in many areas such as data mining, object recognition, and image analysis and processing. The similarity measure commonly used in these clustering algorithms is the Gaussian kernel function, which depends on a sensitive scaling parameter and, when applied to the segmentation of noise-contaminated images, leads to unsatisfactory performance because it neglects spatial pixel information. The present work introduces a novel framework for spectral clustering which embodies local spatial information and a fuzzy similarity measure to tackle the above-mentioned issues. In our approach, we first filter the noise components from the original image using spatial and gray-level information. The similarity matrix is then constructed by employing a similarity measure which takes into account the fuzzy c-partition matrix and the vectors of the cluster centers obtained by the fuzzy c-means clustering algorithm. In the last step, spectral clustering is applied to the derived similarity matrix to obtain the desired segmentation result. Experimental results on the segmentation of synthetic and Berkeley benchmark images with noise demonstrate the effectiveness and robustness of the proposed method, giving it an edge over the clustering-based segmentation methods reported in the literature.
Affiliation(s)
- Subhanshu Goyal
  - Department of Mathematics, Marwadi University, Rajkot, Gujarat 360003, India
- Sushil Kumar
  - Applied Mathematics & Humanities Department, S.V. National Institute of Technology, Surat, Gujarat 395007, India
- M. A. Zaveri
  - Department of Computer Science & Engineering, S.V. National Institute of Technology, Surat, Gujarat 395007, India
- A. K. Shukla
  - Applied Mathematics & Humanities Department, S.V. National Institute of Technology, Surat, Gujarat 395007, India
|
111
|
Spectral clustering using Nyström approximation for the accurate identification of cancer molecular subtypes. Sci Rep 2017; 7:4896. [PMID: 28687729 PMCID: PMC5501792 DOI: 10.1038/s41598-017-05275-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/27/2016] [Accepted: 05/26/2017] [Indexed: 11/08/2022] Open
Abstract
A major challenge in clinical cancer research is the accurate identification of molecular subtypes. While unsupervised clustering methods have been applied for class discovery, clustering remains a bottleneck in developing accurate methods for molecular subtype discovery. In this analysis, we hypothesize that spectral clustering could identify molecular subtypes that correlate with survival outcomes. We propose an accurate subtype identification method, Cancer Subtype Identification with Spectral Clustering using Nyström approximation (CSISCN), for the discovery of molecular subtypes, based on spectral clustering. CSISCN could be used to improve gene expression-based identification of breast cancer molecular subtypes. We demonstrated that CSISCN identified molecular subtypes with distinct clinical outcomes and was valid for the number of molecular subtypes. Furthermore, CSISCN identified molecular subtypes with improved clinical and molecular relevance, significantly outperforming consensus clustering and spectral clustering methods. To test the general applicability of CSISCN, we further applied it to human CRC datasets and AML datasets and demonstrated superior performance compared to the consensus clustering method. In summary, CSISCN demonstrated great potential in gene expression-based subtype identification.
|
112
|
Gong C, Tao D, Liu W, Liu L, Yang J. Label Propagation via Teaching-to-Learn and Learning-to-Teach. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2017; 28:1452-1465. [PMID: 27076470 DOI: 10.1109/tnnls.2016.2514360] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 06/05/2023]
Abstract
How to propagate label information from labeled examples to unlabeled examples over a graph has been intensively studied for a long time. Existing graph-based propagation algorithms usually treat unlabeled examples equally, and transmit seed labels to the unlabeled examples that are connected to the labeled examples in a neighborhood graph. However, such a popular propagation scheme is very likely to yield inaccurate propagation, because it falls short of tackling ambiguous but critical data points (e.g., outliers). To this end, this paper treats the unlabeled examples in different levels of difficulties by assessing their reliability and discriminability, and explicitly optimizes the propagation quality by manipulating the propagation sequence to move from simple to difficult examples. In particular, we propose a novel iterative label propagation algorithm in which each propagation alternates between two paradigms, teaching-to-learn and learning-to-teach (TLLT). In the teaching-to-learn step, the learner conducts the propagation on the simplest unlabeled examples designated by the teacher. In the learning-to-teach step, the teacher incorporates the learner's feedback to adjust the choice of the subsequent simplest examples. The proposed TLLT strategy critically improves the accuracy of label propagation, making our algorithm substantially robust to the values of tuning parameters, such as the Gaussian kernel width used in graph construction. The merits of our algorithm are theoretically justified and empirically demonstrated through experiments performed on both synthetic and real-world data sets.
|
113
|
He L, Ray N, Zhang H. Error bound of Nyström-approximated NCut eigenvectors and its application to training size selection. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.02.011] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
|
114
|
Wang J, Gayatri MA, Ferguson AL. Mesoscale Simulation and Machine Learning of Asphaltene Aggregation Phase Behavior and Molecular Assembly Landscapes. J Phys Chem B 2017; 121:4923-4944. [DOI: 10.1021/acs.jpcb.7b02574] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 01/30/2023]
Affiliation(s)
- Jiang Wang
  - Department of Physics, University of Illinois Urbana-Champaign, 1110 West Green Street, Urbana, Illinois 61801, United States
- Mohit A. Gayatri
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
- Andrew L. Ferguson
  - Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
  - Department of Materials Science and Engineering, University of Illinois Urbana-Champaign, 1304 West Green Street, Urbana, Illinois 61801, United States
|
115
|
|
116
|
Banisch R, Koltai P. Understanding the geometry of transport: Diffusion maps for Lagrangian trajectory data unravel coherent sets. CHAOS (WOODBURY, N.Y.) 2017; 27:035804. [PMID: 28364763 DOI: 10.1063/1.4971788] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 05/09/2023]
Abstract
Dynamical systems often exhibit the emergence of long-lived coherent sets, which are regions in state space that keep their geometric integrity to a high extent and thus play an important role in transport. In this article, we provide a method for extracting coherent sets from possibly sparse Lagrangian trajectory data. Our method can be seen as an extension of diffusion maps to trajectory space, and it allows us to construct "dynamical coordinates," which reveal the intrinsic low-dimensional organization of the data with respect to transport. The only a priori knowledge about the dynamics that we require is a locally valid notion of distance, which renders our method highly suitable for automated data analysis. We show convergence of our method to the analytic transfer operator framework of coherence in the infinite data limit and illustrate its potential on several two- and three-dimensional examples as well as real world data.
Affiliation(s)
- Ralf Banisch
  - School of Mathematics, University of Edinburgh, Edinburgh EH9 3FD, United Kingdom
- Péter Koltai
  - Institute of Mathematics, Freie Universität Berlin, 14195 Berlin, Germany
|
117
|
Pelillo M, Elezi I, Fiorucci M. Revealing structure in large graphs: Szemerédi’s regularity lemma and its use in pattern recognition. Pattern Recognit Lett 2017. [DOI: 10.1016/j.patrec.2016.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
118
|
Banijamali E, Ghodsi A. Fast Spectral Clustering Using Autoencoders and Landmarks. LECTURE NOTES IN COMPUTER SCIENCE 2017. [DOI: 10.1007/978-3-319-59876-5_42] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023]
|
119
|
Iosifidis A, Tefas A, Pitas I. Approximate kernel extreme learning machine for large scale data classification. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2016.09.023] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Indexed: 10/21/2022]
|
120
|
I SK, Chaudhury S. Scalable clustering and applications. PROCEEDINGS OF THE TENTH INDIAN CONFERENCE ON COMPUTER VISION, GRAPHICS AND IMAGE PROCESSING 2016. [DOI: 10.1145/3009977.3010073] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/19/2023]
|
121
|
|
122
|
Mirzaei G, Adeli H. Resting state functional magnetic resonance imaging processing techniques in stroke studies. Rev Neurosci 2016; 27:871-885. [DOI: 10.1515/revneuro-2016-0052] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/15/2016] [Accepted: 10/01/2016] [Indexed: 01/15/2023]
Abstract
In recent years, there has been considerable research interest in the study of brain connectivity using resting state functional magnetic resonance imaging (rsfMRI). Studies have explored the brain networks and connection between different brain regions. These studies have revealed interesting new findings about the brain mapping as well as important new insights in the overall organization of functional communication in the brain network. In this paper, after a general discussion of brain networks and connectivity imaging, the brain connectivity and resting state networks are described with a focus on rsfMRI imaging in stroke studies. Then, techniques for preprocessing of the rsfMRI for stroke patients are reviewed, followed by brain connectivity processing techniques. Recent research on brain connectivity using rsfMRI is reviewed with an emphasis on stroke studies. The authors hope this paper generates further interest in this emerging area of computational neuroscience with potential applications in rehabilitation of stroke patients.
Affiliation(s)
- Golrokh Mirzaei
  - Department of Computer Science and Engineering, The Ohio State University, Marion, OH 43302, United States of America
- Hojjat Adeli
  - Department of Biomedical Engineering, Biomedical Informatics, Neurology, Neuroscience, Electrical and Computer Engineering, Civil and Environmental Engineering, The Ohio State University, Columbus, OH 43210, United States of America
|
123
|
Peng X, Tang H, Zhang L, Yi Z, Xiao S. A Unified Framework for Representation-Based Subspace Clustering of Out-of-Sample and Large-Scale Data. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:2499-2512. [PMID: 26540718 DOI: 10.1109/tnnls.2015.2490080] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 06/05/2023]
Abstract
Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph, which describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and ℓ2-norm-based representations, and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexities are at least proportional to the cube of the data size, which makes them inefficient for solving large-scale problems. Second, they cannot cope with out-of-sample data that are not used to construct the similarity graph. To cluster each out-of-sample datum, the methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of sampling, clustering, coding, and classifying. Furthermore, we give an estimation of the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering a large-scale data set.
|
124
|
|
125
|
Ghoshdastidar D, Adsul AP, Dukkipati A. Learning With Jensen-Tsallis Kernels. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2016; 27:2108-2119. [PMID: 27101624 DOI: 10.1109/tnnls.2016.2550578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Jensen-type [Jensen-Shannon (JS) and Jensen-Tsallis] kernels were first proposed by Martins et al. (2009). These kernels are based on JS divergences that originated in the information theory. In this paper, we extend the Jensen-type kernels on probability measures to define positive-definite kernels on Euclidean space. We show that the special cases of these kernels include dot-product kernels. Since Jensen-type divergences are multidistribution divergences, we propose their multipoint variants, and study spectral clustering and kernel methods based on these. We also provide experimental studies on benchmark image database and gene expression database that show the benefits of the proposed kernels compared with the existing kernels. The experiments on clustering also demonstrate the use of constructing multipoint similarities.
|
126
|
Affiliation(s)
- Arik Nemtsov
  - School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Amir Averbuch
  - School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Alon Schclar
  - School of Computer Science, The Academic College of Tel Aviv-Yaffo, Tel Aviv, Israel
|
127
|
Shi J, Lei Y, Zhou Y, Gong M. Enhanced rough–fuzzy c-means algorithm with strict rough sets properties. Appl Soft Comput 2016. [DOI: 10.1016/j.asoc.2015.12.031] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]
|
128
|
|
129
|
Hadjighasem A, Karrasch D, Teramoto H, Haller G. Spectral-clustering approach to Lagrangian vortex detection. Phys Rev E 2016; 93:063107. [PMID: 27415358 DOI: 10.1103/physreve.93.063107] [Citation(s) in RCA: 97] [Impact Index Per Article: 12.1] [Received: 06/06/2015] [Indexed: 11/07/2022]
Abstract
One of the ubiquitous features of real-life turbulent flows is the existence and persistence of coherent vortices. Here we show that such coherent vortices can be extracted as clusters of Lagrangian trajectories. We carry out the clustering on a weighted graph, with the weights measuring pairwise distances of fluid trajectories in the extended phase space of positions and time. We then extract coherent vortices from the graph using tools from spectral graph theory. Our method locates all coherent vortices in the flow simultaneously, thereby showing high potential for automated vortex tracking. We illustrate the performance of this technique by identifying coherent Lagrangian vortices in several two- and three-dimensional flows.
Affiliation(s)
- Alireza Hadjighasem
  - Department of Mechanical and Process Engineering, Institute of Mechanical Systems, ETH Zürich, Leonhardstrasse 21, 8092 Zürich, Switzerland
- Daniel Karrasch
  - Department of Mechanical and Process Engineering, Institute of Mechanical Systems, ETH Zürich, Leonhardstrasse 21, 8092 Zürich, Switzerland
- Hiroshi Teramoto
  - Molecule & Life Nonlinear Sciences Laboratory, Research Institute for Electronic Science, Hokkaido University, Kita 20 Nishi 10, Kita-ku, Sapporo 001-0020, Japan
- George Haller
  - Department of Mechanical and Process Engineering, Institute of Mechanical Systems, ETH Zürich, Leonhardstrasse 21, 8092 Zürich, Switzerland
|
130
|
Sparks R, Madabhushi A. Out-of-Sample Extrapolation utilizing Semi-Supervised Manifold Learning (OSE-SSL): Content Based Image Retrieval for Histopathology Images. Sci Rep 2016; 6:27306. [PMID: 27264985 PMCID: PMC4893667 DOI: 10.1038/srep27306] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/17/2015] [Accepted: 05/16/2016] [Indexed: 12/22/2022] Open
Abstract
Content-based image retrieval (CBIR) retrieves database images most similar to the query image by (1) extracting quantitative image descriptors and (2) calculating similarity between database and query image descriptors. Recently, manifold learning (ML) has been used to perform CBIR in a low dimensional representation of the high dimensional image descriptor space to avoid the curse of dimensionality. ML schemes are computationally expensive, requiring an eigenvalue decomposition (EVD) for every new query image to learn its low dimensional representation. We present out-of-sample extrapolation utilizing semi-supervised ML (OSE-SSL) to learn the low dimensional representation without recomputing the EVD for each query image. OSE-SSL incorporates semantic information, in the form of partial class labels, into an ML scheme such that the low dimensional representation co-localizes semantically similar images. In the context of prostate histopathology, gland morphology is an integral component of the Gleason score, which enables discrimination between levels of prostate cancer aggressiveness. Images are represented by shape features extracted from the prostate gland. CBIR with OSE-SSL for prostate histology obtained from 58 patient studies yielded an area under the precision recall curve (AUPRC) of 0.53 ± 0.03; by comparison, CBIR with Principal Component Analysis (PCA) to learn a low dimensional space yielded an AUPRC of 0.44 ± 0.01.
Affiliation(s)
- Rachel Sparks
  - University College of London, Centre for Medical Image Computing, London, UK
- Anant Madabhushi
  - Case Western Reserve University, Department of Biomedical Engineering, Cleveland, OH, USA
|
131
|
Abstract
Phylogenetic inference can potentially result in a more accurate tree using data from multiple loci. However, if the loci are incongruent (due to events such as incomplete lineage sorting or horizontal gene transfer), it can be misleading to infer a single tree. To address this, many previous contributions have taken a mechanistic approach, by modeling specific processes. Alternatively, one can cluster loci without assuming how these incongruencies might arise. Such "process-agnostic" approaches typically infer a tree for each locus and cluster these. There are, however, many possible combinations of tree distance and clustering methods; their comparative performance in the context of tree incongruence is largely unknown. Furthermore, because standard model selection criteria such as AIC cannot be applied to problems with a variable number of topologies, the issue of inferring the optimal number of clusters is poorly understood. Here, we perform a large-scale simulation study of phylogenetic distances and clustering methods to infer loci of common evolutionary history. We observe that the best-performing combinations are distances accounting for branch lengths followed by spectral clustering or Ward's method. We also introduce two statistical tests to infer the optimal number of clusters and show that they strongly outperform the silhouette criterion, a general-purpose heuristic. We illustrate the usefulness of the approach by 1) identifying errors in a previous phylogenetic analysis of yeast species and 2) identifying topological incongruence among newly sequenced loci of the globeflower fly genus Chiastocheta. We release treeCl, a new program to cluster genes of common evolutionary history (http://git.io/treeCl).
Affiliation(s)
- Kevin Gori
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
- Tomasz Suchan
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
- Nadir Alvarez
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
- Nick Goldman
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
- Christophe Dessimoz
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Campus, Hinxton, United Kingdom
  - Department of Ecology and Evolution, Biophore Building, UNIL-Sorge, University of Lausanne, Lausanne, Switzerland
  - Department of Genetics, Evolution & Environment, University College London, London, United Kingdom
  - Department of Computer Science, University College London, London, United Kingdom
  - Centre for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics, Biophore, Lausanne, Switzerland
|
132
|
Entropy-Based Incomplete Cholesky Decomposition for a Scalable Spectral Clustering Algorithm: Computational Studies and Sensitivity Analysis. ENTROPY 2016. [DOI: 10.3390/e18050182] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
|
133
|
Jia H, Ding S, Du M. A Nyström spectral clustering algorithm based on probability incremental sampling. Soft Comput 2016. [DOI: 10.1007/s00500-016-2160-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 10/21/2022]
|
134
|
|
135
|
Jothi R, Mohanty SK, Ojha A. Functional grouping of similar genes using eigenanalysis on minimum spanning tree based neighborhood graph. Comput Biol Med 2016; 71:135-48. [PMID: 26945461 DOI: 10.1016/j.compbiomed.2016.02.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 10/05/2015] [Revised: 01/16/2016] [Accepted: 02/12/2016] [Indexed: 10/22/2022]
Abstract
Gene expression data clustering is an important biological process in DNA microarray analysis. Although there have been many clustering algorithms for gene expression analysis, finding a suitable and effective clustering algorithm is always a challenging problem due to the heterogeneous nature of gene profiles. Minimum Spanning Tree (MST) based clustering algorithms have been successfully employed to detect clusters of varying shapes and sizes. This paper proposes a novel clustering algorithm using Eigenanalysis on Minimum Spanning Tree based neighborhood graph (E-MST). As MST of a set of points reflects the similarity of the points with their neighborhood, the proposed algorithm employs a similarity graph obtained from k′ rounds of MST (k′-MST neighborhood graph). By studying the spectral properties of the similarity matrix obtained from k′-MST graph, the proposed algorithm achieves improved clustering results. We demonstrate the efficacy of the proposed algorithm on 12 gene expression datasets. Experimental results show that the proposed algorithm performs better than the standard clustering algorithms.
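A neighborhood graph built from several rounds of MST can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each round computes a minimum spanning tree (or forest) over the edges not used in earlier rounds and that the graph is the union of all rounds. The function names `mst_edges` and `k_round_mst_graph` and the toy points are hypothetical.

```python
from itertools import combinations
from math import dist


def mst_edges(n, edges):
    """Kruskal's algorithm: minimum spanning forest of (weight, u, v) edges."""
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    chosen = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so keep it
            parent[ru] = rv
            chosen.append((w, u, v))
    return chosen


def k_round_mst_graph(points, rounds):
    """Union of the edges of `rounds` successive MSTs (assumed construction)."""
    n = len(points)
    remaining = [(dist(points[u], points[v]), u, v)
                 for u, v in combinations(range(n), 2)]
    graph = set()
    for _ in range(rounds):
        tree = mst_edges(n, remaining)
        if not tree:
            break  # no unused edges are left
        used = {(u, v) for _, u, v in tree}
        graph |= used
        # Later rounds may only use edges that no earlier tree has taken.
        remaining = [e for e in remaining if (e[1], e[2]) not in used]
    return graph


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
graph = k_round_mst_graph(points, rounds=2)
```

A similarity matrix for eigenanalysis would then be defined on the edges in `graph`; how the edges are weighted is left open here because the abstract does not specify it.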
Affiliation(s)
- R Jothi
- Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Madhya Pradesh, India.
| | - Sraban Kumar Mohanty
- Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Madhya Pradesh, India.
| | - Aparajita Ojha
- Indian Institute of Information Technology, Design and Manufacturing Jabalpur, Madhya Pradesh, India.
| |
|
136
|
|
137
|
Athawale T, Sakhaee E, Entezari A. Isosurface Visualization of Data with Nonparametric Models for Uncertainty. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2016; 22:777-786. [PMID: 26529727 DOI: 10.1109/tvcg.2015.2467958] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
The problem of isosurface extraction in uncertain data is an important research problem and may be approached in two ways. One can extract statistics (e.g., mean) from uncertain data points and visualize the extracted field. Alternatively, data uncertainty, characterized by probability distributions, can be propagated through the isosurface extraction process. We analyze the impact of data uncertainty on topology and geometry extraction algorithms. A novel, edge-crossing probability based approach is proposed to predict underlying isosurface topology for uncertain data. We derive a probabilistic version of the midpoint decider that resolves ambiguities that arise in identifying topological configurations. Moreover, the probability density function characterizing positional uncertainty in isosurfaces is derived analytically for a broad class of nonparametric distributions. This analytic characterization can be used for efficient closed-form computation of the expected value and variation in geometry. Our experiments show the computational advantages of our analytic approach over Monte-Carlo sampling for characterizing positional uncertainty. We also show the advantage of modeling underlying error densities in a nonparametric statistical framework as opposed to a parametric statistical framework through our experiments on ensemble datasets and uncertain scalar fields.
|
138
|
|
139
|
|
140
|
Wang Q, Zhang K, Chen Z, Wang D, Jiang G, Marsic I. Enhancing semi-supervised learning through label-aware base kernels. Neurocomputing 2016. [DOI: 10.1016/j.neucom.2015.07.072] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/23/2022]
|
141
|
|
142
|
|
143
|
Hu EL, Kwok JT. Scalable Nonparametric Low-Rank Kernel Learning Using Block Coordinate Descent. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2015; 26:1927-1938. [PMID: 25343772 DOI: 10.1109/tnnls.2014.2361159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/04/2023]
Abstract
Nonparametric kernel learning (NPKL) is a flexible approach to learn the kernel matrix directly without assuming any parametric form. It can be naturally formulated as a semidefinite program (SDP), which, however, is not very scalable. To address this problem, we propose the combined use of low-rank approximation and block coordinate descent (BCD). Low-rank approximation avoids the expensive positive semidefinite constraint in the SDP by replacing the kernel matrix variable with VᵀV, where V is a low-rank matrix. The resultant nonlinear optimization problem is then solved by BCD, which optimizes each column of V sequentially. It can be shown that the proposed algorithm has nice convergence properties and low computational complexities. Experiments on a number of real-world data sets show that the proposed algorithm outperforms state-of-the-art NPKL solvers.
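The reason the factorization removes the positive semidefinite constraint is that any matrix of the form K = (transpose of V) times V satisfies x·Kx = ‖Vx‖² ≥ 0 for every x, so K is positive semidefinite by construction. The sketch below only checks this property numerically on a random factor. It does not implement the block coordinate descent solver, and the helper names and the sizes `r` and `n` are hypothetical.

```python
import random


def gram_from_factor(V):
    """Return K = V^T V, where V is given as r rows of length n."""
    n = len(V[0])
    return [[sum(row[i] * row[j] for row in V) for j in range(n)]
            for i in range(n)]


def quadratic_form(K, x):
    """Evaluate x^T K x for a square matrix K and a vector x."""
    n = len(x)
    return sum(x[i] * K[i][j] * x[j] for i in range(n) for j in range(n))


random.seed(0)
r, n = 2, 6  # rank of the factor and number of samples (illustrative sizes)
V = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(r)]
K = gram_from_factor(V)

# Smallest quadratic form over many random directions; it should never be
# negative beyond floating-point round-off.
worst = min(
    quadratic_form(K, [random.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(1000)
)
```

An optimizer can therefore update the entries of `V` freely, and the implied kernel matrix stays valid without an explicit semidefinite constraint.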
|
144
|
Semertzidis T, Rafailidis D, Strintzis M, Daras P. Large-scale spectral clustering based on pairwise constraints. Inf Process Manag 2015. [DOI: 10.1016/j.ipm.2015.05.007] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 10/23/2022]
|
145
|
Gong C, Fu K, Zhou L, Yang J, He X. Scalable Semi-Supervised Classification via Neumann Series. Neural Process Lett 2015. [DOI: 10.1007/s11063-014-9351-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
|
146
|
Cai D, Chen X. Large Scale Spectral Clustering Via Landmark-Based Sparse Representation. IEEE TRANSACTIONS ON CYBERNETICS 2015; 45:1669-1680. [PMID: 25265642 DOI: 10.1109/tcyb.2014.2358564] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 06/03/2023]
Abstract
Spectral clustering is one of the most popular clustering approaches. However, it is not a trivial task to apply spectral clustering to large-scale problems due to its computational complexity of O(n^3), where n is the number of samples. Recently, many approaches have been proposed to accelerate spectral clustering. Unfortunately, these methods usually sacrifice quite a lot of information from the original data, resulting in degraded performance. In this paper, we propose a novel approach, called landmark-based spectral clustering, for large-scale clustering problems. Specifically, we select p (≪ n) representative data points as the landmarks and represent the original data points as sparse linear combinations of these landmarks. The spectral embedding of the data can then be efficiently computed with the landmark-based representation. The proposed algorithm scales linearly with the problem size. Extensive experiments show the effectiveness and efficiency of our approach compared to the state-of-the-art methods.
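The landmark representation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each point is written as a combination of its `r` nearest landmarks, with Gaussian-kernel weights normalized to sum to one. The weighting scheme, the `bandwidth` parameter, the function name and the toy data are all assumptions made for this example.

```python
from math import dist, exp


def landmark_representation(points, landmarks, r=2, bandwidth=1.0):
    """Return an n x p matrix Z whose rows have at most r nonzero weights."""
    Z = []
    for x in points:
        d = [dist(x, u) for u in landmarks]
        # Indices of the r landmarks closest to x.
        nearest = sorted(range(len(landmarks)), key=d.__getitem__)[:r]
        # Gaussian-kernel weights (an assumed choice of weighting).
        weights = {j: exp(-d[j] ** 2 / (2 * bandwidth ** 2)) for j in nearest}
        total = sum(weights.values())
        row = [0.0] * len(landmarks)
        for j, w in weights.items():
            row[j] = w / total  # normalize so each row sums to one
        Z.append(row)
    return Z


points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
landmarks = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
Z = landmark_representation(points, landmarks, r=2)
```

Because `Z` has only n × p entries with at most `r` nonzeros per row, later steps can work with it instead of a full n × n affinity matrix. The embedding computation itself is not shown here.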
|
147
|
Yoo SW, Guevara P, Jeong Y, Yoo K, Shin JS, Mangin JF, Seong JK. An Example-Based Multi-Atlas Approach to Automatic Labeling of White Matter Tracts. PLoS One 2015; 10:e0133337. [PMID: 26225419 PMCID: PMC4520495 DOI: 10.1371/journal.pone.0133337] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 07/29/2014] [Accepted: 06/25/2015] [Indexed: 11/18/2022]
Abstract
We present an example-based multi-atlas approach for classifying white matter (WM) tracts into anatomic bundles. Our approach exploits expert-provided example data to automatically classify the WM tracts of a subject. Multiple atlases are constructed to model the example data from multiple subjects in order to reflect the individual variability of bundle shapes and trajectories over subjects. For each example subject, an atlas is maintained to allow the example data of a subject to be added or deleted flexibly. A voting scheme is proposed to facilitate the multi-atlas exploitation of example data. For conceptual simplicity, we adopt the same metrics in both example data construction and WM tract labeling. Due to the huge number of WM tracts in a subject, it is time-consuming to label each WM tract individually. Thus, the WM tracts are grouped according to their shape similarity, and WM tracts within each group are labeled simultaneously. To further enhance the computational efficiency, we implemented our approach on the graphics processing unit (GPU). Through nested cross-validation we demonstrated that our approach yielded high classification performance. The average sensitivities for bundles in the left and right hemispheres were 89.5% and 91.0%, respectively, and their average false discovery rates were 14.9% and 14.2%, respectively.
Affiliation(s)
- Sang Wook Yoo
- Department of Biomedical Engineering, Korea University, Seoul, Republic of Korea
- Department of Computer Science, KAIST, Daejeon, Republic of Korea
| | - Pamela Guevara
- IBM, CEA, Gif-sur-Yvette, France
- Institut Fédératif de Recherche 49, Gif-sur-Yvette, France
- University of Concepción, Concepción, Chile
| | - Yong Jeong
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
| | - Kwangsun Yoo
- Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
| | - Joseph S. Shin
- Department of Computer Science, KAIST, Daejeon, Republic of Korea
- Handong Global University, Pohang, Republic of Korea
| | - Jean-Francois Mangin
- Institut Fédératif de Recherche 49, Gif-sur-Yvette, France
- University of Concepción, Concepción, Chile
| | - Joon-Kyung Seong
- Department of Biomedical Engineering, Korea University, Seoul, Republic of Korea
| |
|
148
|
|
149
|
Automated Segmentation of MS Lesions in MR Images Based on an Information Theoretic Clustering and Contrast Transformations. TECHNOLOGIES 2015. [DOI: 10.3390/technologies3020142] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/16/2022]
|
150
|
Yang Y, Ma Z, Yang Y, Nie F, Shen HT. Multitask spectral clustering by exploring intertask correlation. IEEE TRANSACTIONS ON CYBERNETICS 2015; 45:1069-1080. [PMID: 25252288 DOI: 10.1109/tcyb.2014.2344015] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Indexed: 06/03/2023]
Abstract
Clustering, as one of the most classical research problems in pattern recognition and data mining, has been widely explored and applied to various applications. Due to the rapid evolution of data on the Web, more emerging challenges have been posed to traditional clustering techniques: 1) correlations among related clustering tasks and/or within individual tasks are not well captured; 2) the problem of clustering out-of-sample data is seldom considered; and 3) the discriminative property of the cluster label matrix is not well explored. In this paper, we propose a novel clustering model, namely multitask spectral clustering (MTSC), to cope with the above challenges. Specifically, two types of correlations are well considered: 1) intertask clustering correlation, which refers to the relations among different clustering tasks and 2) intratask learning correlation, which enables the processes of learning cluster labels and learning the mapping function to reinforce each other. We incorporate a novel ℓ2,p-norm regularizer to control the coherence of all the tasks based on an assumption that related tasks should share a common low-dimensional representation. Moreover, for each individual task, an explicit mapping function is simultaneously learnt for predicting cluster labels by mapping features to the cluster label matrix. Meanwhile, we show that the learning process can naturally incorporate discriminative information to further improve clustering performance. We explore and discuss the relationships between our proposed model and several representative clustering techniques, including spectral clustering, k-means and discriminative k-means. Extensive experiments on various real-world datasets illustrate the advantage of the proposed MTSC model compared to state-of-the-art clustering approaches.
|