1
Huynh T, Cang Z. Topological and geometric analysis of cell states in single-cell transcriptomic data. Brief Bioinform 2024; 25:bbae176. [PMID: 38632952 PMCID: PMC11024518 DOI: 10.1093/bib/bbae176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Revised: 01/29/2024] [Accepted: 03/24/2024] [Indexed: 04/19/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) enables dissecting cellular heterogeneity in tissues, resulting in numerous biological discoveries. Various computational methods have been devised to delineate cell types by clustering scRNA-seq data, where clusters are often annotated using prior knowledge of marker genes. In addition to identifying pure cell types, several methods have been developed to identify cells undergoing state transitions, which often rely on prior clustering results. The present computational approaches predominantly investigate the local and first-order structures of scRNA-seq data using graph representations, while scRNA-seq data frequently display complex high-dimensional structures. Here, we introduce scGeom, a tool that exploits the multiscale and multidimensional structures in scRNA-seq data by analyzing the geometry and topology through curvature and persistent homology of both cell and gene networks. We demonstrate the utility of these structural features to reflect biological properties and functions in several applications, where we show that curvatures and topological signatures of cell and gene networks can help indicate transition cells and the differentiation potential of cells. We also illustrate that structural characteristics can improve the classification of cell types.
Affiliation(s)
- Tram Huynh
  - Department of Mathematics and Center for Research in Scientific Computation, North Carolina State University, NC 27695, USA
- Zixuan Cang
  - Department of Mathematics and Center for Research in Scientific Computation, North Carolina State University, NC 27695, USA
2
Xia K, Liu X, Wee J. Persistent Homology for RNA Data Analysis. Methods Mol Biol 2023; 2627:211-229. [PMID: 36959450 DOI: 10.1007/978-1-0716-2974-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Molecular representations are of great importance for machine learning models in RNA data analysis. Essentially, efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of learning models. In this paper, we introduce two persistent models, persistent homology and persistent spectral models, for RNA structure and interaction representations and their applications in RNA data analysis. Different from traditional geometric and graph representations, persistent homology is built on simplicial complexes, which are a generalization of graph models to higher-dimensional situations. Hypergraphs are a further generalization of simplicial complexes, and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models, which combine the filtration process with spectral models, including spectral graph, spectral simplicial complex, and spectral hypergraph, are proposed for molecular representation. The persistent attributes for RNAs can be obtained from these two persistent models and further combined with machine learning models for RNA structure, flexibility, dynamics, and function analysis.
Affiliation(s)
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Xiang Liu
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
  - Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China
- JunJie Wee
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
3
Anand DV, Wei RKJ, Xia K. Coarse-Grained Models for Vault Normal Model Analysis. Methods Mol Biol 2023; 2671:307-318. [PMID: 37308652 DOI: 10.1007/978-1-0716-3222-2_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Recent experiments have shown that the molecular complex of vault has large conformational changes at its shoulder and cap regions in solution. From the comparison of two configuration structures, it has been found that the shoulder region can twist and move outward, while the cap region will rotate and push upward correspondingly. To further understand these experimental results, in this paper, we study the vault dynamics for the first time. Since the vault has an extremely large structure with around 63,336 Cα atoms, the traditional normal mode method with the Cα coarse-grained representation falls short. We employ a newly invented multiscale virtual particle-based anisotropic network model (MVP-ANM). To reduce the complexity, the 39-fold vault structure is coarse-grained to about 6000 virtual particles, which significantly reduces the computational cost while still maintaining the basic structure information. Among the 14 low frequency eigenmodes from Mode 7 to Mode 20, two eigenmodes, i.e., Mode 9 and Mode 20, are found to be directly associated with the experimental observations. In Mode 9, the shoulder region undergoes a significant expansion while the cap part is lifted upward. In Mode 20, a clear rotation of both shoulder and cap regions is observed. Our results are consistent with the experimental observations. More importantly, these low frequency eigenmodes indicate that the vault waist, shoulder and lower cap regions are the most likely regions for the opening of the vault particle. The opening mechanism is highly likely to be rotation and expansion at these regions. As far as we know, this is the first work to provide the normal mode analysis for the vault complex.
Affiliation(s)
- D Vijay Anand
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Ronald Koh Joon Wei
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Kelin Xia
  - School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
4
Affiliation(s)
- Chengyuan Wu
  - Data Analytics Consulting Centre, Department of Statistics and Applied Probability, Faculty of Science, National University of Singapore, Singapore
  - Institute of High Performance Computing, A*STAR, Singapore
- Carol Anne Hargreaves
  - Data Analytics Consulting Centre, Department of Statistics and Applied Probability, Faculty of Science, National University of Singapore, Singapore
5
Pun CS, Lee SX, Xia K. Persistent-homology-based machine learning: a survey and a comparative study. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10146-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
6
Li L, Thompson C, Henselman-Petrusek G, Giusti C, Ziegelmeier L. Minimal Cycle Representatives in Persistent Homology Using Linear Programming: An Empirical Study With User's Guide. Front Artif Intell 2021; 4:681117. [PMID: 34708196 PMCID: PMC8544243 DOI: 10.3389/frai.2021.681117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Accepted: 05/14/2021] [Indexed: 12/24/2022]
Abstract
Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several ℓ1 minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: 1) optimization is effective in reducing the size of cycle representatives, though the extent of the reduction varies according to the dimension and distribution of the underlying data, 2) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis, in most data sets we consider, 3) the choice of linear solvers matters a lot to the computation time of optimizing cycles, 4) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, 5) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in {-1, 0, 1} and therefore, are also solutions to a restricted ℓ0 optimization problem, and 6) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes than in real-world and synthetic point cloud data.
Affiliation(s)
- Lu Li
  - Mathematics, Statistics, and Computer Science Department, Macalester College, Saint Paul, MN, United States
- Connor Thompson
  - Department of Mathematics, Purdue University, West Lafayette, IN, United States
- Chad Giusti
  - Department of Mathematical Sciences, University of Delaware, Newark, DE, United States
- Lori Ziegelmeier
  - Mathematics, Statistics, and Computer Science Department, Macalester College, Saint Paul, MN, United States
7
Weighted persistent homology for osmolyte molecular aggregation and hydrogen-bonding network analysis. Sci Rep 2020; 10:9685. [PMID: 32546801 PMCID: PMC7297731 DOI: 10.1038/s41598-020-66710-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/04/2019] [Accepted: 05/20/2020] [Indexed: 12/24/2022]
Abstract
It has long been observed that trimethylamine N-oxide (TMAO) and urea demonstrate dramatically different properties in a protein folding process. Even with the enormous theoretical and experimental research work on these two osmolytes, various aspects of their underlying mechanisms still remain largely elusive. In this paper, we propose to use the weighted persistent homology to systematically study osmolyte molecular aggregation and the associated hydrogen-bonding network from a local topological perspective. We consider two weighted models, i.e., localized persistent homology (LPH) and interactive persistent homology (IPH). Boltzmann persistent entropy (BPE) is proposed to quantitatively characterize the topological features from LPH and IPH, together with persistent Betti number (PBN). More specifically, from the localized persistent homology models, we have found that TMAO and urea have very different local topology. TMAO is found to exhibit a local network structure. With the concentration increase, the circle elements in these networks show a clear increase in their total numbers and a decrease in their relative sizes. In contrast, urea shows two types of local topological patterns, i.e., local clusters around 6 Å and a few global circle elements at around 12 Å. From the interactive persistent homology models, it has been found that our persistent radial distribution function (PRDF) from the global-scale IPH has the same physical properties as the traditional radial distribution function. Moreover, PRDFs from the local-scale IPH can also be generated and used to characterize the local interaction information. Other than the clear difference of the first peak value of PRDFs at filtration size 4 Å, TMAO and urea also show very different behaviors at the second peak region from filtration size 5 Å to 10 Å. These differences are also reflected in the PBNs and BPEs of the local-scale IPH. This localized topological information has never been revealed before. Since graphs can be transferred into simplicial complexes by the clique complex, our weighted persistent homology models can be used in the analysis of various networks and graphs from any molecular structures and aggregation systems.
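To make the persistent Betti number (PBN) mentioned in this abstract concrete, the following is a minimal sketch, assuming the common definition in which the PBN at filtration value r counts the bars (birth, death) that are alive at r. The function names and the toy barcode are hypothetical, and the paper's exact PBN convention may differ; the Boltzmann persistent entropy is not attempted here because its formula is not given in the abstract.

```python
import math

def persistent_betti_number(barcode, r):
    """Count bars (birth, death) alive at filtration value r.

    Assumed convention: a bar is alive when birth <= r < death.
    """
    return sum(1 for birth, death in barcode if birth <= r < death)

def betti_curve(barcode, radii):
    """Evaluate the persistent Betti number at each filtration value in radii."""
    return [persistent_betti_number(barcode, r) for r in radii]

# Hypothetical dimension-1 barcode (filtration values in angstroms).
toy_barcode = [(0.0, 2.0), (1.0, 4.0), (3.0, math.inf)]
curve = betti_curve(toy_barcode, [0.5, 1.5, 2.0, 3.5, 5.0])
```

Plotting such a curve against the filtration size is one way to compare how many circle elements two systems contain at each length scale.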
8
Ben-Elazar S, Chor B, Yakhini Z. The Functional 3D Organization of Unicellular Genomes. Sci Rep 2019; 9:12734. [PMID: 31484964 PMCID: PMC6726614 DOI: 10.1038/s41598-019-48798-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2019] [Accepted: 08/12/2019] [Indexed: 11/09/2022]
Abstract
Genome conformation capture techniques permit a systematic investigation into the functional spatial organization of genomes, including functional aspects like assessing the co-localization of sets of genomic elements, for example the co-localization of genes targeted by a transcription factor (TF) within a transcription factory. We quantify spatial co-localization using a rigorous statistical model that measures the enrichment of a subset of elements in neighbourhoods inferred from Hi-C data. We also control for co-localization that can be attributed to genomic order. We systematically apply our open-sourced framework, spatial-mHG, to search for spatial co-localization phenomena in multiple unicellular Hi-C datasets with corresponding genomic annotations. Our biological findings shed new light on the functional spatial organization of genomes, including: In C. crescentus, DNA replication genes reside in two genomic clusters that are spatially co-localized. Furthermore, these clusters contain similar gene copies and lie in genomic vicinity to the ori and ter sequences. In S. cerevisiae, Ty5 retrotransposon family elements spatially co-localize at a spatially adjacent subset of telomeres. In N. crassa, both Proteasome lid subcomplex genes and protein refolding genes jointly spatially co-localize at a shared location. An implementation of our algorithms is available online.
9
Steinberg L, Russo J, Frey J. A new topological descriptor for water network structure. J Cheminform 2019; 11:48. [PMID: 31292766 PMCID: PMC6617667 DOI: 10.1186/s13321-019-0369-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/01/2018] [Accepted: 07/02/2019] [Indexed: 11/10/2022]
Abstract
Bulk water molecular dynamics simulations based on a series of atomistic water potentials (TIP3P, TIP4P/Ew, SPC/E and OPC) are compared using new techniques from the field of topological data analysis. The topological invariants (the different degrees of homology) derived from each simulation frame are used to create a series of persistence diagrams from the atomic positions. These are averaged over the simulation time using the persistence image formalism, before being normalised by their total magnitude (the L1 norm) to ensure a size independent descriptor (L1NPI). We demonstrate that the L1NPI formalism is suitable for the analysis of systems where the number of molecules varies by at least a factor of 10. Using standard machine learning techniques, a basic linear SVM, it is shown that differences in water models are able to be isolated to different degrees of homology. In particular, whereas first degree homology is able to distinguish between all atomistic potentials studied, OPC is the only potential that differs in its second degree homology. The L1 normalised persistence images are then used in the comparison of a series of Stillinger-Weber potential simulations to the atomistic potentials and the effects of changing the strength of three-body interactions on the structures is easily evident in L1NPI space, with a reduction in variance of structures as interaction strength increases being the most obvious result. Furthermore, there is a clear tracking in L1NPI space of the λ parameter. The L1NPI formalism presents a useful new technique for the analysis of water and other materials. It is approximately size-independent, and has been shown to contain information as to real structures in the system. We finally present a perspective on the use of L1NPIs and other persistent homology techniques as a descriptor for water solubility.
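The averaging and L1 normalisation described in this abstract can be illustrated with a short sketch. It assumes a simple persistence image built by placing a persistence-weighted Gaussian at each (birth, persistence) point on a fixed grid; the grid size, span, kernel width, weighting, and function names are all hypothetical choices and are not taken from the paper.

```python
import math

def persistence_image(diagram, grid=8, span=4.0, sigma=0.5):
    """Rasterise (birth, death) pairs onto a grid x grid image.

    Each point contributes a Gaussian centred at (birth, persistence),
    weighted by its persistence (an assumed weighting).
    """
    step = span / grid
    image = [[0.0] * grid for _ in range(grid)]
    for birth, death in diagram:
        pers = death - birth
        for i in range(grid):
            for j in range(grid):
                x = (i + 0.5) * step  # birth axis
                y = (j + 0.5) * step  # persistence axis
                d2 = (x - birth) ** 2 + (y - pers) ** 2
                image[i][j] += pers * math.exp(-d2 / (2.0 * sigma ** 2))
    return image

def l1_normalised_mean(images):
    """Average images over frames, then divide by the L1 norm of the mean."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    mean = [[sum(img[i][j] for img in images) / n for j in range(cols)]
            for i in range(rows)]
    total = sum(abs(v) for row in mean for v in row)
    if total == 0.0:
        raise ValueError("cannot normalise an all-zero image")
    return [[v / total for v in row] for row in mean]

# Two hypothetical simulation frames, each a small persistence diagram.
frames = [[(0.5, 1.5), (1.0, 3.0)], [(0.4, 1.6)]]
small = l1_normalised_mean([persistence_image(d) for d in frames])
# Duplicating every point mimics a system with twice as many features.
large = l1_normalised_mean([persistence_image(d * 2) for d in frames])
```

Because this toy image is linear in the points of the diagram, doubling every feature scales the image by two and the normalised descriptor is unchanged, which is the sense in which the normalisation removes the dependence on system size in this sketch.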
Affiliation(s)
- Lee Steinberg
  - School of Chemistry, University of Southampton, Southampton, SO17 1BJ UK
- John Russo
  - School of Mathematics, University of Bristol, Bristol, UK
- Jeremy Frey
  - School of Chemistry, University of Southampton, Southampton, SO17 1BJ UK
10
Xia K, Anand DV, Shikhar S, Mu Y. Persistent homology analysis of osmolyte molecular aggregation and their hydrogen-bonding networks. Phys Chem Chem Phys 2019; 21:21038-21048. [DOI: 10.1039/c9cp03009c] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/13/2022]
Abstract
Dramatically different patterns can be observed in the topological fingerprints for hydrogen-bonding networks from two types of osmolyte systems.
Affiliation(s)
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, School of Biological Sciences, Nanyang Technological University, Singapore
- D. Vijay Anand
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, School of Biological Sciences, Nanyang Technological University, Singapore
- Saxena Shikhar
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Yuguang Mu
  - School of Biological Sciences, Nanyang Technological University, Singapore
11
Anand DV, Meng Z, Xia K. A complex multiscale virtual particle model based elastic network model (CMVP-ENM) for the normal mode analysis of biomolecular complexes. Phys Chem Chem Phys 2019; 21:4359-4366. [DOI: 10.1039/c8cp07442a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/21/2022]
Abstract
The CMVP-ENM for virus normal mode analysis. With a special ratio parameter, CMVP-ENM can characterize the multi-material properties of biomolecular complexes and systematically enhance or suppress the modes for different components.
Affiliation(s)
- D. Vijay Anand
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
- Zhenyu Meng
  - School of Biological Sciences, Nanyang Technological University, Singapore
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, School of Biological Sciences, Nanyang Technological University, Singapore
12
Xia K. Persistent homology analysis of ion aggregations and hydrogen-bonding networks. Phys Chem Chem Phys 2018; 20:13448-13460. [PMID: 29722784 DOI: 10.1039/c8cp01552j] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 11/21/2022]
Abstract
Despite the great advancement of experimental tools and theoretical models, a quantitative characterization of the microscopic structures of ion aggregates and their associated water hydrogen-bonding networks still remains a challenging problem. In this paper, a newly-invented mathematical method called persistent homology is introduced, for the first time, to quantitatively analyze the intrinsic topological properties of ion aggregation systems and hydrogen-bonding networks. The two most distinguishable properties of persistent homology analysis of assembly systems are as follows. First, it does not require a predefined bond length to construct the ion or hydrogen-bonding network. Persistent homology results are determined by the morphological structure of the data only. Second, it can directly measure the size of circles or holes in ion aggregates and hydrogen-bonding networks. To validate our model, we consider two well-studied systems, i.e., NaCl and KSCN solutions, generated from molecular dynamics simulations. They are believed to represent two morphological types of aggregation, i.e., local clusters and extended ion networks. It has been found that the two aggregation types have distinguishable topological features and can be characterized by our topological model very well. Further, we construct two types of networks, i.e., O-networks and H2O-networks, for analyzing the topological properties of hydrogen-bonding networks. It is found that for both models, KSCN systems demonstrate much more dramatic variations in their local circle structures with a concentration increase. A consistent increase of large-sized local circle structures is observed and the sizes of these circles become more and more diverse. In contrast, NaCl systems show no obvious increase of large-sized circles. Instead a consistent decline of the average size of the circle structures is observed and the sizes of these circles become more and more uniform with a concentration increase. 
As far as we know, these unique intrinsic topological features in ion aggregation systems have never been pointed out before. More importantly, our models can be directly used to quantitatively analyze the intrinsic topological invariants, including circles, loops, holes, and cavities, of any network-like structures, such as nanomaterials, colloidal systems, biomolecular assemblies, among others. These topological invariants cannot be described by traditional graph and network models.
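The first property highlighted in this abstract, that no predefined bond length is needed, can be seen in a small sketch of dimension-0 persistent homology for a Vietoris-Rips style filtration on a point cloud. It uses the standard fact that connected components merge exactly at the edge lengths chosen by Kruskal's minimum spanning tree algorithm. The function name and the toy coordinates are hypothetical, pairwise distance is used as the filtration value, and the circle (dimension-1) features analysed in the paper are not covered.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Dimension-0 barcode of a point cloud under a distance filtration.

    Every point is born at 0; a component dies at the edge length that
    merges it into another component (union-find over sorted edges).
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        # Find the component representative with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[a], points[b]), a, b)
        for a, b in combinations(range(len(points)), 2)
    )
    bars = []
    for length, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # one component never dies
    return bars

# Hypothetical ion positions on a line: a close pair and one distant ion.
bars = h0_barcode([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)])
# bars == [(0.0, 1.0), (0.0, 4.0), (0.0, inf)]
```

The long bar ending at 4.0 records the gap between the close pair and the distant ion, so the cluster structure is read off from the barcode rather than from a cutoff chosen in advance.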
Affiliation(s)
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, School of Biological Sciences, Nanyang Technological University, 637371, Singapore
13
Xia K. Multiscale virtual particle based elastic network model (MVP-ENM) for normal mode analysis of large-sized biomolecules. Phys Chem Chem Phys 2018; 20:658-669. [PMID: 29227479 DOI: 10.1039/c7cp07177a] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/20/2023]
Abstract
In this paper, a multiscale virtual particle based elastic network model (MVP-ENM) is proposed for the normal mode analysis of large-sized biomolecules. The multiscale virtual particle (MVP) model is proposed for the discretization of biomolecular density data. With this model, large-sized biomolecular structures can be coarse-grained into virtual particles such that a balance between model accuracy and computational cost can be achieved. An elastic network is constructed by assuming "connections" between virtual particles. The connection is described by a special harmonic potential function, which considers the influence from both the mass distributions and distance relations of the virtual particles. Two independent models, i.e., the multiscale virtual particle based Gaussian network model (MVP-GNM) and the multiscale virtual particle based anisotropic network model (MVP-ANM), are proposed. It has been found that in the Debye-Waller factor (B-factor) prediction, the results from our MVP-GNM with a high resolution are as good as the ones from GNM. Even with low resolutions, our MVP-GNM can still capture the global behavior of the B-factor very well with mismatches predominantly from the regions with large B-factor values. Further, it has been demonstrated that the low-frequency eigenmodes from our MVP-ANM are highly consistent with the ones from ANM even with very low resolutions and a coarse grid. Finally, the great advantage of the MVP-ANM model for large-sized biomolecules has been demonstrated by using two poliovirus structures. The paper ends with a conclusion.
Affiliation(s)
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371