1
|
Bobrowski O, Skraba P. Cluster Persistence for Weighted Graphs. Entropy (Basel) 2023; 25:1587. [PMID: 38136467 PMCID: PMC10743168 DOI: 10.3390/e25121587] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 11/09/2023] [Accepted: 11/22/2023] [Indexed: 12/24/2023]
Abstract
Persistent homology is a natural tool for probing the topological characteristics of weighted graphs, essentially focusing on their 0-dimensional homology. While this area has been thoroughly studied, we present a new approach to constructing a filtration for cluster analysis via persistent homology. The key advantages of the new filtration are that (a) it provides richer signatures for connected components by introducing non-trivial birth times, and (b) it is robust to outliers. The key idea is that nodes are ignored until they belong to sufficiently large clusters. We demonstrate the computational efficiency of our filtration and its practical effectiveness, and explore its properties when applied to random graphs.
Affiliation(s)
- Omer Bobrowski
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK
  - Viterbi Faculty of Electrical and Computer Engineering, Technion, Haifa 3200003, Israel
- Primoz Skraba
  - School of Mathematical Sciences, Queen Mary University of London, London E1 4NS, UK
  - Department for Artificial Intelligence, Jozef Stefan Institute, 1000 Ljubljana, Slovenia
|
2
|
Bobrowski O, Skraba P. A universal null-distribution for topological data analysis. Sci Rep 2023; 13:12274. [PMID: 37507400 PMCID: PMC10382541 DOI: 10.1038/s41598-023-37842-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/27/2022] [Accepted: 06/28/2023] [Indexed: 07/30/2023] Open
Abstract
One of the most elusive challenges within the area of topological data analysis is understanding the distribution of persistence diagrams arising from data. Despite much effort and its many successful applications, this is largely an open problem. We present a surprising discovery: normalized properly, persistence diagrams arising from random point-clouds obey a universal probability law. Our statements are based on extensive experimentation on both simulated and real data, covering point-clouds with vastly different geometry, topology, and probability distributions. Our results also include an explicit well-known distribution as a candidate for the universal law. We demonstrate the power of these new discoveries by proposing a new hypothesis testing framework for computing significance values for individual topological features within persistence diagrams, providing a new quantitative way to assess the significance of structure in data.
Affiliation(s)
- Omer Bobrowski
  - Viterbi Faculty of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel
  - School of Mathematical Sciences, Queen Mary University of London, London, UK
- Primoz Skraba
  - School of Mathematical Sciences, Queen Mary University of London, London, UK
|
3
|
Dawson M, Dudley C, Omoma S, Tung HR, Ciocanel MV. Characterizing emerging features in cell dynamics using topological data analysis methods. Math Biosci Eng 2023; 20:3023-3046. [PMID: 36899570 DOI: 10.3934/mbe.2023143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
Filament-motor interactions inside cells play essential roles in many developmental as well as other biological processes. For instance, actin-myosin interactions drive the emergence or closure of ring channel structures during wound healing or dorsal closure. These dynamic protein interactions and the resulting protein organization lead to rich time-series data generated by using fluorescence imaging experiments or by simulating realistic stochastic models. We propose methods based on topological data analysis to track topological features through time in cell biology data consisting of point clouds or binary images. The framework proposed here is based on computing the persistent homology of the data at each time point and on connecting topological features through time using established distance metrics between topological summaries. The methods retain aspects of monomer identity when analyzing significant features in filamentous structure data, and capture the overall closure dynamics when assessing the organization of multiple ring structures through time. Using applications of these techniques to experimental data, we show that the proposed methods can describe features of the emergent dynamics and quantitatively distinguish between control and perturbation experiments.
Affiliation(s)
- Madeleine Dawson
  - Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, USA
- Carson Dudley
  - Department of Mathematics, Duke University, Durham, NC 27708, USA
- Sasamon Omoma
  - Department of Mathematics, Duke University, Durham, NC 27708, USA
- Hwai-Ray Tung
  - Department of Mathematics, Duke University, Durham, NC 27708, USA
- Maria-Veronica Ciocanel
  - Department of Mathematics, Duke University, Durham, NC 27708, USA
  - Department of Biology, Duke University, Durham, NC 27708, USA
|
4
|
Bobrowski O. Homological connectivity in random Čech complexes. Probab Theory Relat Fields 2022. [DOI: 10.1007/s00440-022-01149-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2022]
|
5
|
Adams H, Moy M. Topology Applied to Machine Learning: From Global to Local. Front Artif Intell 2021; 4:668302. [PMID: 34056580 PMCID: PMC8160457 DOI: 10.3389/frai.2021.668302] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/16/2021] [Accepted: 04/15/2021] [Indexed: 11/24/2022] Open
Abstract
Through the use of examples, we explain one way in which applied topology has evolved since the birth of persistent homology in the early 2000s. The first applications of topology to data emphasized the global shape of a dataset, such as the three-circle model for 3 × 3 pixel patches from natural images, or the configuration space of the cyclo-octane molecule, which is a sphere with a Klein bottle attached via two circles of singularity. In these studies of global shape, short persistent homology bars are disregarded as sampling noise. More recently, however, persistent homology has been used to address questions about the local geometry of data. For instance, how can local geometry be vectorized for use in machine learning problems? Persistent homology and its vectorization methods, including persistence landscapes and persistence images, provide popular techniques for incorporating both local geometry and global topology into machine learning. Our meta-hypothesis is that the short bars are as important as the long bars for many machine learning tasks. In defense of this claim, we survey applications of persistent homology to shape recognition, agent-based modeling, materials science, archaeology, and biology. Additionally, we survey work connecting persistent homology to geometric features of spaces, including curvature and fractal dimension, and various methods that have been used to incorporate persistent homology into machine learning.
Affiliation(s)
- Henry Adams
  - Department of Mathematics, Colorado State University, Fort Collins, CO, United States
- Michael Moy
  - Department of Mathematics, Colorado State University, Fort Collins, CO, United States
|
6
|
Ciocanel MV, Juenemann R, Dawes AT, McKinley SA. Topological Data Analysis Approaches to Uncovering the Timing of Ring Structure Onset in Filamentous Networks. Bull Math Biol 2021; 83:21. [PMID: 33452960 PMCID: PMC7811524 DOI: 10.1007/s11538-020-00847-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/12/2019] [Accepted: 12/11/2020] [Indexed: 11/30/2022]
Abstract
In developmental biology as well as in other biological systems, emerging structure and organization can be captured using time-series data of protein locations. In analyzing this time-dependent data, it is a common challenge not only to determine whether topological features emerge, but also to identify the timing of their formation. For instance, in most cells, actin filaments interact with myosin motor proteins and organize into polymer networks and higher-order structures. Ring channels are examples of such structures that maintain constant diameters over time and play key roles in processes such as cell division, development, and wound healing. Given the limitations in studying interactions of actin with myosin in vivo, we generate time-series data of protein polymer interactions in cells using complex agent-based models. Since the data has a filamentous structure, we propose sampling along the actin filaments and analyzing the topological structure of the resulting point cloud at each time. Building on existing tools from persistent homology, we develop a topological data analysis (TDA) method that assesses effective ring generation in this dynamic data. This method connects topological features through time in a path that corresponds to emergence of organization in the data. In this work, we also propose methods for assessing whether the topological features of interest are significant and thus whether they contribute to the formation of an emerging hole (ring channel) in the simulated protein interactions. In particular, we use the MEDYAN simulation platform to show that this technique can distinguish between the actin cytoskeleton organization resulting from distinct motor protein binding parameters.
Affiliation(s)
- Riley Juenemann
  - Department of Mathematics, Tulane University, New Orleans, USA
- Adriana T Dawes
  - Department of Mathematics and Department of Molecular Genetics, The Ohio State University, Columbus, USA
|
7
|
|
8
|
Abstract
The objective of this study is to examine the asymptotic behavior of Betti numbers of Čech complexes treated as stochastic processes and formed from random points in the $d$-dimensional Euclidean space ${\mathbb{R}}^d$. We consider the case where the points of the Čech complex are generated by a Poisson process with intensity $nf$ for a probability density $f$. We look at the cases where the behavior of the connectivity radius of the Čech complex causes simplices of dimension greater than $k+1$ to vanish in probability, the so-called sparse regime, as well as when the connectivity radius is of the order of $n^{-1/d}$, the critical regime. We establish limit theorems in the aforementioned regimes: central limit theorems for the sparse and critical regimes, and a Poisson limit theorem for the sparse regime. When the connectivity radius of the Čech complex is $o(n^{-1/d})$, i.e. the sparse regime, we can decompose the limiting processes into a time-changed Brownian motion or a time-changed homogeneous Poisson process, respectively. In the critical regime, the limiting process is a centered Gaussian process but has a much more complicated representation, because the Čech complex becomes highly connected with many topological holes of any dimension.
|
9
|
Owada T. Limit theorems for Betti numbers of extreme sample clouds with application to persistence barcodes. Ann Appl Probab 2018. [DOI: 10.1214/17-aap1375] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
|
10
|
|
11
|
Modeling and replicating statistical topology and evidence for CMB nonhomogeneity. Proc Natl Acad Sci U S A 2017; 114:11878-11883. [PMID: 29078301 DOI: 10.1073/pnas.1706885114] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
Under the banner of "big data," the detection and classification of structure in extremely large, high-dimensional data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is "TDA," or "topological data analysis," one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis (the typical case for big data applications), the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity.
|