1
Arango AS, Park H, Tajkhorshid E. Topological Learning Approach to Characterizing Biological Membranes. J Chem Inf Model 2024; 64:5242-5252. [PMID: 38912752 DOI: 10.1021/acs.jcim.4c00552] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/25/2024]
Abstract
Biological membranes play key roles in cellular compartmentalization, structure, and signaling pathways. At varying temperatures, individual membrane lipids sample from different configurations, a process that frequently leads to higher-order phase behavior and phenomena. Here, we present a persistent homology (PH)-based method for quantifying the structural features of individual and bulk lipids, providing local and contextual information on lipid tail organization. Our method leverages the mathematical machinery of algebraic topology and machine learning to infer temperature-dependent structural information on lipids from static coordinates. To train our model, we generated multiple molecular dynamics trajectories of dipalmitoyl-phosphatidylcholine membranes at varying temperatures. A fingerprint was then constructed for each set of lipid coordinates by PH filtration, in which interaction spheres were grown around the lipid atoms while tracking their intersections. The sphere filtration formed a simplicial complex that captures enduring key topological features of the configuration landscape using homology, yielding persistence data. Following fingerprint extraction for physiologically relevant temperatures, the persistence data were used to train an attention-based neural network for assignment of effective temperature values to selected membrane regions. Our persistent homology-based method captures the local structural effects, via effective temperature, of lipids adjacent to other membrane constituents, e.g., sterols and proteins. This topological learning approach can predict lipid effective temperatures from static coordinates across multiple spatial resolutions. The tool, called MembTDA, can be accessed at https://github.com/hyunp2/Memb-TDA.
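The sphere-growing filtration described in this abstract can be illustrated, for connected components only (H0), with a small union-find sketch. This is not the MembTDA implementation: the function name `h0_persistence` and the three toy coordinates are assumptions made for illustration, and higher-dimensional features (loops, voids) are not computed here.

```python
import itertools
import math


def h0_persistence(points):
    """Return finite (birth, death) bars for connected components (H0).

    Two spheres of radius r centred on atoms i and j intersect once
    dist(i, j) <= 2 * r, so the pairwise distance is used as the scale
    at which two components merge. Every component is born at scale 0.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by length (the filtration order).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )

    bars = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            # Two components merge: one of them dies at this scale.
            parent[root_i] = root_j
            bars.append((0.0, dist))
    # The last surviving component never dies and is omitted here.
    return bars


# Hypothetical toy coordinates, not taken from any trajectory.
toy_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
bars = h0_persistence(toy_atoms)
print(bars)
```

Long bars correspond to atoms or groups that stay isolated over a wide range of sphere radii; a fingerprint in the sense of the abstract would collect such bars across homology dimensions before passing them to a learning model.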
Affiliation(s)
- Andres S Arango
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Hyun Park
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
- Emad Tajkhorshid
- Theoretical and Computational Biophysics Group, NIH Resource Center for Macromolecular Modeling and Visualization, Beckman Institute for Advanced Science and Technology, Department of Biochemistry, and Center for Biophysics and Quantitative Biology, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, United States
2
Sun X, Yang S, Wu Z, Su J, Hu F, Chang F, Li C. PMSPcnn: Predicting protein stability changes upon single point mutations with convolutional neural network. Structure 2024; 32:838-848.e3. [PMID: 38508191 DOI: 10.1016/j.str.2024.02.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 12/19/2023] [Accepted: 02/22/2024] [Indexed: 03/22/2024]
Abstract
Protein missense mutations and resulting protein stability changes are important causes of many human genetic diseases. However, the accurate prediction of stability changes due to mutations remains a challenging problem. To address this problem, we have developed an unbiased, effective model, PMSPcnn, that is based on a convolutional neural network. We have included an anti-symmetry property to build a balanced training dataset, which improves the prediction, in particular for stabilizing mutations. Persistent homology, which is an effective approach for characterizing protein structures, is used to obtain topological features. Additionally, a regression stratification cross-validation scheme has been proposed to improve the prediction for mutations with extreme ΔΔG. For three test datasets (Ssym, p53, and myoglobin), PMSPcnn achieves a better performance than currently existing predictors. PMSPcnn also outperforms currently available methods for membrane proteins. Overall, PMSPcnn is a promising method for the prediction of protein stability changes caused by single point mutations.
Affiliation(s)
- Xiaohan Sun
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Shuang Yang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Zhixiang Wu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Jingjie Su
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fangrui Hu
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Fubin Chang
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China
- Chunhua Li
- College of Chemistry and Life Science, Beijing University of Technology, Beijing 100124, China.
3
Zhang B, Lin H. Functional loops: Monitoring functional organization of deep neural networks using algebraic topology. Neural Netw 2024; 174:106239. [PMID: 38508049 DOI: 10.1016/j.neunet.2024.106239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2023] [Revised: 03/06/2024] [Accepted: 03/13/2024] [Indexed: 03/22/2024]
Abstract
Various topological methods have emerged in recent years to investigate the inner workings of deep neural networks (DNNs) based on the structural and weight information. However, their effectiveness is restricted due to the stratified structure and volatile weight information. In this study, we explore the relationship between functional organizations and network performance using algebraic topology. Our results indicate that functional loops reveal functional interaction patterns of multiple neurons in DNNs. We also propose functional persistence as a measure of functional complexity and develop an early stopping criterion that achieves competitive results without requiring a validation set.
Affiliation(s)
- Ben Zhang
- School of Mathematical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China; State Key Lab. of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China
- Hongwei Lin
- School of Mathematical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China; State Key Lab. of CAD & CG, Zhejiang University, Hangzhou, 310058, Zhejiang Province, China.
4
Uesugi F, Wen Y, Hashimoto A, Ishii M. Prediction of nanocomposite properties and process optimization using persistent homology and machine learning. Micron 2024; 183:103664. [PMID: 38820861 DOI: 10.1016/j.micron.2024.103664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2024] [Revised: 05/18/2024] [Accepted: 05/22/2024] [Indexed: 06/02/2024]
Abstract
Physical property prediction and synthesis process optimization are key targets in material informatics. In this study, we propose a machine learning approach that utilizes ridge regression to predict the oxygen permeability at fuel cell electrode surfaces and determine the optimal process temperature. These predictions are based on a persistence diagram derived from tomographic images captured using transmission electron microscopy (TEM). Through machine learning analysis of the complex structures present in the Pt/CeO2 nanocomposites, we discovered that l2 regularization considering diverse structural elements is more appropriate than l1 regularization (sparse modeling). Notably, our model successfully captured the activation energy of oxygen permeability, a phenomenon that could not be solely explained by the geometric feature of the Betti numbers, as demonstrated in a previous study. The correspondence between the ridge regression coefficient and persistence diagram revealed the formation process of the local and three-dimensional structures of CeO2 and their contributions to pre-exponential factor and activation energies. This analysis facilitated the determination of the annealing temperature required to achieve the optimal structure and accurately predict the physical properties.
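The contrast this abstract draws between l2 (ridge) and l1 regularization rests on the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy, which shrinks all coefficients rather than zeroing some of them. The sketch below solves that system in plain Python. The feature matrix and targets are toy numbers, not values from the study, and the name `ridge_fit` is an assumption; in the paper's setting each row would be a vectorized persistence diagram.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y by Gauss-Jordan elimination."""
    n_features = len(X[0])

    # Augmented normal-equation matrix [X^T X + lam * I | X^T y].
    A = []
    for a in range(n_features):
        row = [sum(r[a] * r[b] for r in X) for b in range(n_features)]
        row[a] += lam
        row.append(sum(r[a] * t for r, t in zip(X, y)))
        A.append(row)

    for col in range(n_features):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n_features), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n_features):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * p for x, p in zip(A[r], A[col])]

    return [A[i][-1] / A[i][i] for i in range(n_features)]


# Hypothetical toy data: y = 2 * x0 + 3 * x1 exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]

w_unregularized = ridge_fit(X, y, lam=0.0)
w_ridge = ridge_fit(X, y, lam=1.0)
print(w_unregularized, w_ridge)
```

With λ = 0 the fit recovers the generating weights; with λ = 1 both weights shrink toward zero but neither vanishes, which is the behaviour that lets ridge regression keep contributions from many structural elements at once.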
Affiliation(s)
- Fumihiko Uesugi
- National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan.
- Yu Wen
- National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan; University of Tsukuba, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Ayako Hashimoto
- National Institute for Materials Science, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan; University of Tsukuba, 1-2-1 Sengen, Tsukuba, Ibaraki 305-0047, Japan
- Masashi Ishii
- National Institute for Materials Science, 1-1 Namiki, Tsukuba, Ibaraki 305-0044, Japan
5
Siva NK, Singh Y, Hathaway QA, Sengupta PP, Yanamala N. A novel multi-task machine learning classifier for rare disease patterning using cardiac strain imaging data. Sci Rep 2024; 14:10672. [PMID: 38724564 PMCID: PMC11082231 DOI: 10.1038/s41598-024-61201-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Accepted: 05/02/2024] [Indexed: 05/12/2024] Open
Abstract
To provide accurate predictions, current machine learning-based solutions require large, manually labeled training datasets. We implement persistent homology (PH), a topological tool for studying the pattern of data, to analyze echocardiography-based strain data and differentiate between rare diseases like constrictive pericarditis (CP) and restrictive cardiomyopathy (RCM). Patient population (retrospectively registered) included those presenting with heart failure due to CP (n = 51), RCM (n = 47), and patients without heart failure symptoms (n = 53). Longitudinal, radial, and circumferential strains/strain rates for left ventricular segments were processed into topological feature vectors using a machine learning PH workflow. In differentiating CP and RCM, the PH workflow model had a ROC AUC of 0.94 (Sensitivity = 92%, Specificity = 81%), compared with the GLS model AUC of 0.69 (Sensitivity = 65%, Specificity = 66%). In differentiating between all three conditions, the PH workflow model had an AUC of 0.83 (Sensitivity = 68%, Specificity = 84%), compared with the GLS model AUC of 0.68 (Sensitivity = 52% and Specificity = 76%). By employing persistent homology to differentiate the "pattern" of cardiac deformations, our machine-learning approach provides reasonable accuracy when evaluating small datasets and aids in understanding and visualizing patterns of cardiac imaging data in clinically challenging disease states.
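For readers comparing the reported figures, the three metrics quoted in this abstract can be computed as in the sketch below. The labels and scores are invented toy values, not patient data, and the 0.5 decision threshold is an assumption; the study's own evaluation pipeline is not reproduced here.

```python
def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 1)
    fn = sum(1 for l, p in zip(labels, predictions) if l == 1 and p == 0)
    tn = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 0)
    fp = sum(1 for l, p in zip(labels, predictions) if l == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = disease A, 0 = disease B.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]

sensitivity, specificity = sensitivity_specificity(labels, predictions)
auc = roc_auc(labels, scores)
print(sensitivity, specificity, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality over all thresholds, which is why the abstract reports both.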
Affiliation(s)
- Nanda K Siva
- School of Medicine, West Virginia University, Morgantown, WV, USA
- Division of Cardiology, Heart and Vascular Institute, West Virginia University, Morgantown, WV, USA
- Yashbir Singh
- Division of Cardiology, Heart and Vascular Institute, West Virginia University, Morgantown, WV, USA
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
- Quincy A Hathaway
- School of Medicine, West Virginia University, Morgantown, WV, USA
- Division of Cardiology, Heart and Vascular Institute, West Virginia University, Morgantown, WV, USA
- Partho P Sengupta
- Division of Cardiovascular Disease and Hypertension, Rutgers Robert Wood Johnson Medical School, 125 Patterson St, New Brunswick, NJ, 08901, USA.
- Naveena Yanamala
- Division of Cardiovascular Disease and Hypertension, Rutgers Robert Wood Johnson Medical School, 125 Patterson St, New Brunswick, NJ, 08901, USA.
- Institute for Software Research, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
6
Bou Dagher L, Madern D, Malbos P, Brochier-Armanet C. Persistent homology reveals strong phylogenetic signal in 3D protein structures. PNAS Nexus 2024; 3:pgae158. [PMID: 38689707 PMCID: PMC11058471 DOI: 10.1093/pnasnexus/pgae158] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 04/01/2024] [Indexed: 05/02/2024]
Abstract
Changes that occur in proteins over time provide a phylogenetic signal that can be used to decipher their evolutionary history and the relationships between organisms. Sequence comparison is the most common way to access this phylogenetic signal, while approaches based on 3D structure comparisons are still in their infancy. In this study, we propose an effective approach based on Persistent Homology Theory (PH) to extract the phylogenetic information contained in protein structures. PH provides efficient and robust algorithms for extracting and comparing geometric features from noisy datasets at different spatial resolutions. PH has a growing number of applications in the life sciences, including the study of proteins (e.g. classification, folding). However, it has never been used to study the phylogenetic signal they may contain. Here, using 518 protein families, representing 22,940 protein sequences and structures, from 10 major taxonomic groups, we show that distances calculated with PH from protein structures correlate strongly with phylogenetic distances calculated from protein sequences, at both small and large evolutionary scales. We test several methods for calculating PH distances and propose some refinements to improve their relevance for addressing evolutionary questions. This work opens up new perspectives in evolutionary biology by proposing an efficient way to access the phylogenetic signal contained in protein structures, as well as future developments of topological analysis in the life sciences.
Affiliation(s)
- Léa Bou Dagher
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Université Libanaise, Laboratoire de Mathématiques, École Doctorale en Science et Technologie, PO BOX 5 Hadath, Liban
- Dominique Madern
- University Grenoble Alpes, CEA, CNRS, IBS, 38000 Grenoble, France
- Philippe Malbos
- Université Claude Bernard Lyon 1, CNRS, Institut Camille Jordan, UMR5208, F-69622 Villeurbanne, France
- Céline Brochier-Armanet
- Université Claude Bernard Lyon 1, CNRS, VetAgro Sup, Laboratoire de Biométrie et Biologie Évolutive, UMR5558, F-69622 Villeurbanne, France
7
Chen J, Xu Y, Yang X, Cang Z, Geng W, Wei GW. Poisson-Boltzmann-based machine learning model for electrostatic analysis. Biophys J 2024:S0006-3495(24)00107-3. [PMID: 38356263 DOI: 10.1016/j.bpj.2024.02.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 01/26/2024] [Accepted: 02/09/2024] [Indexed: 02/16/2024] Open
Abstract
Electrostatics is of paramount importance to chemistry, physics, biology, and medicine. The Poisson-Boltzmann (PB) theory is a primary model for electrostatic analysis. However, it is highly challenging to compute accurate PB electrostatic solvation free energies for macromolecules due to the nonlinearity, dielectric jumps, charge singularity, and geometric complexity associated with the PB equation. The present work introduces a PB-based machine learning (PBML) model for biomolecular electrostatic analysis. Trained with the second-order accurate MIBPB solver, the proposed PBML model is found to be more accurate and faster than several eminent PB solvers in electrostatic analysis. The proposed PBML model can provide highly accurate PB electrostatic solvation free energy of new biomolecules or new conformations generated by molecular dynamics with much reduced computational cost.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, University of Arkansas, Fayetteville, Arkansas
- Xin Yang
- Department of Mathematics, Southern Methodist University, Dallas, Texas
- Zixuan Cang
- Department of Mathematics, North Carolina State University, Raleigh, North Carolina
- Weihua Geng
- Department of Mathematics, Southern Methodist University, Dallas, Texas.
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan.
8
Wee J, Chen J, Xia K, Wei GW. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. Comput Biol Med 2024; 169:107918. [PMID: 38194782 PMCID: PMC10922365 DOI: 10.1016/j.compbiomed.2024.107918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 12/21/2023] [Accepted: 01/01/2024] [Indexed: 01/11/2024]
Abstract
Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pretrained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves the state-of-the-art by up to 15%.
Affiliation(s)
- JunJie Wee
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jiahui Chen
- Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore.
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA.
9
Wee J, Chen J, Xia K, Wei GW. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. arXiv 2023:arXiv:2310.18760v2. [PMID: 37961732 PMCID: PMC10635294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Protein mutations can significantly influence protein solubility, which results in altered protein functions and leads to various diseases. Despite tremendous effort, machine learning prediction of protein solubility changes upon mutation remains a challenging task as indicated by the poor scores of normalized Correct Prediction Ratio (CPR). Part of the challenge stems from the fact that there are no three-dimensional (3D) structures for the wild-type and mutant proteins. This work integrates persistent Laplacians and a pre-trained Transformer for the task. The Transformer, pretrained with hundreds of millions of protein sequences, embeds wild-type and mutant sequences, while persistent Laplacians track the topological invariant change and homotopic shape evolution induced by mutations in 3D protein structures, which are rendered from AlphaFold2. The resulting machine learning model was trained on an extensive data set labeled with three solubility types. Our model outperforms all existing predictive methods and improves the state-of-the-art by up to 15%.
Affiliation(s)
- JunJie Wee
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Jiahui Chen
- Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA
10
Tarín-Pelló A, Suay-García B, Forés-Martos J, Falcó A, Pérez-Gracia MT. Computer-aided drug repurposing to tackle antibiotic resistance based on topological data analysis. Comput Biol Med 2023; 166:107496. [PMID: 37793206 DOI: 10.1016/j.compbiomed.2023.107496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Revised: 08/29/2023] [Accepted: 09/15/2023] [Indexed: 10/06/2023]
Abstract
The progressive emergence of antimicrobial resistance has become a global health problem in need of rapid solution. Research into new antimicrobial drugs is imperative. Drug repositioning, together with computational mathematical prediction models, could be a fast and efficient method of searching for new antibiotics. The aim of this study was to identify compounds with potential antimicrobial capacity against Escherichia coli from US Food and Drug Administration-approved drugs, and the similarity between known drug targets and E. coli proteins using a topological structure-activity data analysis model. This model has been shown to identify molecules with known antibiotic capacity, such as carbapenems and cephalosporins, as well as new molecules that could act as antimicrobials. Topological similarities were also found between E. coli proteins and proteins from different bacterial species such as Mycobacterium tuberculosis, Pseudomonas aeruginosa and Salmonella Typhimurium, which could imply that the selected molecules have a broader spectrum than expected. These molecules include antitumor drugs, antihistamines, lipid-lowering agents, hypoglycemic agents, antidepressants, nucleotides, and nucleosides, among others. The results presented in this study prove the ability of computational mathematical prediction models to predict molecules with potential antimicrobial capacity and/or possible new pharmacological targets of interest in the design of new antibiotics and in the better understanding of antimicrobial resistance.
Affiliation(s)
- Antonio Tarín-Pelló
- Área de Microbiología, Departamento de Farmacia, Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud Universidad Cardenal Herrera-CEU, CEU Universities, C/ Santiago Ramón y Cajal, 46115, Alfara del Patriarca, Valencia, Spain
- Beatriz Suay-García
- ESI International Chair@CEU-UCH, Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, C/ San Bartolomé 55, 46115, Alfara del Patriarca, Valencia, Spain
- Jaume Forés-Martos
- ESI International Chair@CEU-UCH, Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, C/ San Bartolomé 55, 46115, Alfara del Patriarca, Valencia, Spain
- Antonio Falcó
- ESI International Chair@CEU-UCH, Departamento de Matemáticas, Física y Ciencias Tecnológicas, Universidad Cardenal Herrera-CEU, CEU Universities, C/ San Bartolomé 55, 46115, Alfara del Patriarca, Valencia, Spain
- María-Teresa Pérez-Gracia
- Área de Microbiología, Departamento de Farmacia, Instituto de Ciencias Biomédicas, Facultad de Ciencias de la Salud Universidad Cardenal Herrera-CEU, CEU Universities, C/ Santiago Ramón y Cajal, 46115, Alfara del Patriarca, Valencia, Spain.
11
Wei X, Chen J, Wei GW. Persistent topological Laplacian analysis of SARS-CoV-2 variants. Journal of Computational Biophysics and Chemistry 2023; 22:569-587. [PMID: 37829318 PMCID: PMC10569362 DOI: 10.1142/s2737416523500278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/14/2023]
Abstract
Topological data analysis (TDA) is an emerging field in mathematics and data science. Its central technique, persistent homology, has had tremendous success in many science and engineering disciplines. However, persistent homology has limitations, including its inability to handle heterogeneous information, such as multiple types of geometric objects; being qualitative rather than quantitative, e.g., counting a 5-member ring the same as a 6-member ring; and a failure to describe non-topological changes, such as homotopic changes in protein-protein binding. Persistent topological Laplacians (PTLs), such as persistent Laplacian and persistent sheaf Laplacian, were proposed to overcome the limitations of persistent homology. In this work, we examine the modeling and analysis power of PTLs in the study of the protein structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike receptor binding domain (RBD). First, we employ PTLs to study how the RBD mutation-induced structural changes of RBD-angiotensin-converting enzyme 2 (ACE2) binding complexes are captured in the changes of spectra of the PTLs among SARS-CoV-2 variants. Additionally, we use PTLs to analyze the binding of RBD and ACE2-induced structural changes of various SARS-CoV-2 variants. Finally, we explore the impacts of computationally generated RBD structures on a topological deep learning paradigm and predictions of deep mutational scanning datasets for the SARS-CoV-2 Omicron BA.2 variant. Our results indicate that PTLs have advantages over persistent homology in analyzing protein structural changes and provide a powerful new TDA tool for data science.
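The remark about 5-member and 6-member rings can be made concrete with an ordinary graph Laplacian, used here as a simplified stand-in for the persistent Laplacians of the paper. Both rings have one connected component and one loop, so their Betti numbers agree, yet their nonzero Laplacian eigenvalues differ. The sketch relies on the standard closed form for the cycle graph C_n, eigenvalues 2 - 2cos(2πk/n), and checks it against an explicitly built matrix; all function names are assumptions.

```python
import math


def cycle_laplacian(n):
    """Graph Laplacian (degree minus adjacency) of the cycle graph C_n, n >= 3."""
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 2
        L[i][(i + 1) % n] -= 1
        L[i][(i - 1) % n] -= 1
    return L


def cycle_spectrum(n):
    """Closed-form eigenvalues of the C_n Laplacian, sorted ascending."""
    return sorted(2 - 2 * math.cos(2 * math.pi * k / n) for k in range(n))


def eigen_residual(L, k):
    """Max |(L v - lam v)_j| for the trial eigenvector v_j = cos(2 pi k j / n)."""
    n = len(L)
    lam = 2 - 2 * math.cos(2 * math.pi * k / n)
    v = [math.cos(2 * math.pi * k * j / n) for j in range(n)]
    Lv = [sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
    return max(abs(a - lam * b) for a, b in zip(Lv, v))


spectrum_5 = cycle_spectrum(5)
spectrum_6 = cycle_spectrum(6)

# Zero eigenvalues count connected components (Betti-0): one for each ring.
zeros_5 = sum(1 for lam in spectrum_5 if abs(lam) < 1e-12)
zeros_6 = sum(1 for lam in spectrum_6 if abs(lam) < 1e-12)

# Smallest nonzero eigenvalue: differs between the two rings.
gap_5 = min(lam for lam in spectrum_5 if lam > 1e-12)
gap_6 = min(lam for lam in spectrum_6 if lam > 1e-12)
print(zeros_5, zeros_6, gap_5, gap_6)
```

The zero part of the spectrum carries the same information as homology, while the nonzero part separates the two ring sizes, which is the kind of extra quantitative signal the abstract attributes to Laplacian-based descriptors.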
Affiliation(s)
- Xiaoqi Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
12
Gauthier S, Tran-Dinh A, Morilla I. Plasma proteome dynamics of COVID-19 severity learnt by a graph convolutional network of multi-scale topology. Life Sci Alliance 2023; 6:e202201624. [PMID: 36806094 PMCID: PMC9941303 DOI: 10.26508/lsa.202201624] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 02/06/2023] [Accepted: 02/06/2023] [Indexed: 02/22/2023] Open
Abstract
Efforts to understand the molecular mechanisms of COVID-19 have led to the identification of ACE2 as the main receptor for the SARS-CoV-2 spike protein on cell surfaces. However, there are still important questions about the role of other proteins in disease progression. To address these questions, we modelled the plasma proteome of 384 COVID-19 patients using protein level measurements taken at three different times and incorporating comprehensive clinical evaluation data collected 28 d after hospitalisation. Our analysis can accurately assess the severity of the illness using a metric based on WHO scores. By using topological vectorisation, we identified proteins that vary most in expression based on disease severity, and then utilised these findings to construct a graph convolutional network. This dynamic model allows us to learn the molecular interactions between these proteins, providing a tool to determine the severity of a COVID-19 infection at an early stage and identify potential pharmacological treatments by studying the dynamic interactions between the most relevant proteins.
Affiliation(s)
- Samy Gauthier
- Université Sorbonne Paris Nord, LAGA, CNRS, UMR 7539, Laboratoire d'excellence Inflamex, Villetaneuse, France
- Alexy Tran-Dinh
- Département d'anesthésie-Réanimation, INSERM, Université de Paris, AP-HP, Hôpital Bichat Claude Bernard, Paris, France
- Université de Paris, LVTS, Inserm U1148, Paris, France
- Ian Morilla
- Université Sorbonne Paris Nord, LAGA, CNRS, UMR 7539, Laboratoire d'excellence Inflamex, Villetaneuse, France
- Department of Genetics, University of Malaga, MLiMO, Málaga, Spain
13
Anand DV, Chung MK. Hodge Laplacian of Brain Networks. IEEE Transactions on Medical Imaging 2023; 42:1563-1573. [PMID: 37018280 PMCID: PMC10909176 DOI: 10.1109/tmi.2022.3233876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The closed loops or cycles in a brain network embed higher-order signal transmission paths, which provide fundamental insights into the functioning of the brain. In this work, we propose an efficient algorithm for systematic identification and modeling of cycles using persistent homology and the Hodge Laplacian. Various statistical inference procedures on cycles are developed. We validate our methods on simulations and apply them to brain networks obtained through resting-state functional magnetic resonance imaging. The computer codes for the Hodge Laplacian are given in https://github.com/laplcebeltrami/hodge.
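How a Hodge Laplacian counts cycles can be sketched on a small graph. Assuming the network is treated as a 1-skeleton with no filled triangles (an assumption made for this sketch, not a claim about the linked repository), the 1-Hodge Laplacian reduces to L1 = B1ᵀB1, where B1 is the node-by-edge boundary matrix, and the dimension of its kernel equals the number of independent cycles. The toy network and all function names below are hypothetical.

```python
from fractions import Fraction


def boundary_matrix(n_nodes, edges):
    """Node-by-edge matrix B1: -1 at the tail and +1 at the head of each edge."""
    B = [[0] * len(edges) for _ in range(n_nodes)]
    for e, (u, v) in enumerate(edges):
        B[u][e] = -1
        B[v][e] = 1
    return B


def hodge_laplacian_1(B):
    """L1 = B1^T B1, valid when the complex has no 2-simplices."""
    n_edges = len(B[0])
    return [
        [sum(row[a] * row[b] for row in B) for b in range(n_edges)]
        for a in range(n_edges)
    ]


def matrix_rank(M):
    """Exact rank by Gauss-Jordan elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    rank = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(rank, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        for i in range(len(M)):
            if i != rank and M[i][col] != 0:
                factor = M[i][col] / M[rank][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[rank])]
        rank += 1
        if rank == len(M):
            break
    return rank


# Hypothetical toy network: a triangle 0-1-2 with a dangling edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
B1 = boundary_matrix(4, edges)
L1 = hodge_laplacian_1(B1)

# Independent cycles = dimension of the kernel of L1.
n_cycles = len(edges) - matrix_rank(L1)
print(n_cycles)
```

For this toy network the kernel is one-dimensional, matching the single loop around the triangle; the dangling edge adds no cycle.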
14
Wei X, Chen J, Wei GW. Persistent topological Laplacian analysis of SARS-CoV-2 variants. arXiv 2023:arXiv:2301.10865v2. [PMID: 36748007 PMCID: PMC9900960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
Topological data analysis (TDA) is an emerging field in mathematics and data science. Its central technique, persistent homology, has had tremendous success in many science and engineering disciplines. However, persistent homology has limitations, including its inability to handle heterogeneous information, such as multiple types of geometric objects; being qualitative rather than quantitative, e.g., counting a 5-member ring the same as a 6-member ring; and a failure to describe non-topological changes, such as homotopic changes in protein-protein binding. Persistent topological Laplacians (PTLs), such as persistent Laplacian and persistent sheaf Laplacian, were proposed to overcome the limitations of persistent homology. In this work, we examine the modeling and analysis power of PTLs in the study of the protein structures of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike receptor binding domain (RBD). First, we employ PTLs to study how the RBD mutation-induced structural changes of RBD-angiotensin-converting enzyme 2 (ACE2) binding complexes are captured in the changes of spectra of the PTLs among SARS-CoV-2 variants. Additionally, we use PTLs to analyze the binding of RBD and ACE2-induced structural changes of various SARS-CoV-2 variants. Finally, we explore the impacts of computationally generated RBD structures on a topological deep learning paradigm and predictions of deep mutational scanning datasets for the SARS-CoV-2 Omicron BA.2 variant. Our results indicate that PTLs have advantages over persistent homology in analyzing protein structural changes and provide a powerful new TDA tool for data science.
Affiliation(s)
- Xiaoqi Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
15
Benjamin K, Mukta L, Moryoussef G, Uren C, Harrington HA, Tillmann U, Barbensi A. Homology of homologous knotted proteins. J R Soc Interface 2023; 20:20220727. [PMID: 37122282 PMCID: PMC10130707 DOI: 10.1098/rsif.2022.0727] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
Quantification and classification of protein structures, such as knotted proteins, often requires noise-free and complete data. Here, we develop a mathematical pipeline that systematically analyses protein structures. We showcase this geometric framework on proteins forming open-ended trefoil knots, and we demonstrate that the mathematical tool, persistent homology, faithfully represents their structural homology. This topological pipeline identifies important geometric features of protein entanglement and clusters the space of trefoil proteins according to their depth. Persistence landscapes quantify the topological difference between a family of knotted and unknotted proteins in the same structural homology class. This difference is localized and interpreted geometrically with recent advancements in systematic computation of homology generators. The topological and geometric quantification we find is robust to noisy input data, which demonstrates the potential of this approach in contexts where standard knot theoretic tools fail.
Affiliation(s)
- Lamisah Mukta
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Christopher Uren
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Heather A Harrington
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Wellcome Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Ulrike Tillmann
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- Isaac Newton Institute for Mathematical Sciences, University of Cambridge, Cambridge CB3 0EH, UK
- Agnese Barbensi
- Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK
- School of Mathematics and Statistics, University of Melbourne, Melbourne, Victoria 3010, Australia
16
Abstract
Path homology, proposed by S.-T. Yau and his co-workers, provides a new mathematical model for directed graphs and networks. Persistent path homology (PPH) extends path homology with filtration to deal with asymmetric structures. However, PPH is constrained to purely topological persistence and cannot track the homotopic shape evolution of data during filtration. To overcome this limitation of PPH, the persistent path Laplacian (PPL) is introduced to capture the shape evolution of data. PPL's harmonic spectra fully recover PPH's topological persistence, and its non-harmonic spectra reveal the homotopic shape evolution of data during filtration.
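For orientation, the split between harmonic and non-harmonic spectra can be illustrated with the standard (non-persistent) combinatorial Laplacian on a finite chain complex, which persistent Laplacians generalize. The symbols below (boundary operator $\partial_q$, its adjoint $\partial_q^{\ast}$, Laplacian $L_q$, Betti number $\beta_q$) are notation assumed here for illustration and are not taken from the abstract; this is a sketch of the general idea, not the PPL construction itself.

```latex
L_q \;=\; \partial_{q+1}\,\partial_{q+1}^{\ast} \;+\; \partial_q^{\ast}\,\partial_q ,
\qquad
\dim \ker L_q \;=\; \beta_q .
```

The multiplicity of the zero eigenvalue of $L_q$ (the harmonic part) equals the $q$-th Betti number, so it carries the same information as homology, while the nonzero eigenvalues (the non-harmonic part) change with the geometry of the complex even when the Betti numbers stay fixed.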
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
17
Songdechakraiwut T, Chung MK. TOPOLOGICAL LEARNING FOR BRAIN NETWORKS. Ann Appl Stat 2023; 17:403-433. [PMID: 36911168 PMCID: PMC9997114 DOI: 10.1214/22-aoas1633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2023]
Abstract
This paper proposes a novel topological learning framework that integrates networks of different sizes and topology through persistent homology. Such a challenging task is made possible through the introduction of a computationally efficient topological loss. The use of the proposed loss bypasses the intrinsic computational bottleneck associated with matching networks. We validate the method in extensive statistical simulations to assess its effectiveness when discriminating networks with different topology. The method is further demonstrated in a twin brain imaging study where we determine if brain networks are genetically heritable. The challenge here is due to the difficulty of overlaying the topologically different functional brain networks obtained from resting-state functional MRI onto the template structural brain network obtained through diffusion MRI.
Affiliation(s)
- Moo K Chung
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison
18
Feng H, Wei GW. Virtual screening of DrugBank database for hERG blockers using topological Laplacian-assisted AI models. Comput Biol Med 2023; 153:106491. [PMID: 36599209 PMCID: PMC10120853 DOI: 10.1016/j.compbiomed.2022.106491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2022] [Revised: 11/29/2022] [Accepted: 12/27/2022] [Indexed: 12/29/2022]
Abstract
The human ether-a-go-go (hERG) potassium channel (Kv11.1) plays a critical role in mediating the cardiac action potential. The blockade of this ion channel can potentially lead to fatal disorders and/or long QT syndrome. Many drugs have been withdrawn because of their serious hERG-cardiotoxicity. It is crucial to assess the hERG blockade activity in the early stage of drug discovery. We are particularly interested in the hERG-cardiotoxicity of compounds collected in the DrugBank database, considering that many DrugBank compounds have been approved for therapeutic treatments or have high potential to become drugs. Machine learning-based in silico tools offer a rapid and economical platform to virtually screen DrugBank compounds. We design accurate and robust classifiers for blockers/non-blockers and then build regressors to quantitatively analyze the binding potency of the DrugBank compounds on the hERG channel. Molecular sequences are embedded with two natural language processing (NLP) methods, namely, autoencoder and transformer. Complementary three-dimensional (3D) molecular structures are embedded with two advanced mathematical approaches, i.e., topological Laplacians and algebraic graphs. With our state-of-the-art tools, we reveal that 227 out of the 8641 DrugBank compounds are potential hERG blockers, suggesting serious drug safety problems. Our predictions provide guidance for the further experimental interrogation of DrugBank compounds' hERG-cardiotoxicity.
Affiliation(s)
- Hongsong Feng
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA.
19
De Lara MLD. Persistent homology classification algorithm. PeerJ Comput Sci 2023; 9:e1195. [PMID: 37346603 PMCID: PMC10280283 DOI: 10.7717/peerj-cs.1195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Accepted: 12/01/2022] [Indexed: 06/23/2023]
Abstract
Data classification is an important aspect of machine learning, as it is utilized to solve issues in a wide variety of contexts. There are numerous classifiers, but there is no single best-performing classifier for all types of data, as the no free lunch theorem implies. Topological data analysis is an emerging topic concerned with the shape of data. One of the key tools in this field for analyzing the shape or topological properties of a dataset is persistent homology, an algebraic topology-based method for estimating the topological features of a space of points that persist across several resolutions. This study proposes a supervised learning classification algorithm that makes use of persistent homology between training data classes in the form of persistence diagrams to predict the output category of new observations. Validation of the developed algorithm was performed on real-world and synthetic datasets. The performance of the proposed classification algorithm on these datasets was compared to that of the most widely used classifiers. Validation runs demonstrated that the proposed persistent homology classification algorithm performed on par with, if not better than, the majority of classifiers considered.
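To make "features that persist across several resolutions" concrete, the following is a minimal sketch of zero-dimensional persistence (connected components) for a Vietoris-Rips filtration of a point cloud. It is not the classification algorithm described in the abstract; the function name `h0_persistence` and the sample points are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Return the scales at which connected components merge (H0 deaths).

    Every point is born as its own component at scale 0. Edges are added in
    order of increasing length; a component dies when an edge merges it into
    another one. One component never dies, so n points give n - 1 deaths.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length (the filtration order).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge: one component dies at this scale
            deaths.append(length)
    return deaths


# Two well-separated clusters: three short-lived merges and one long-lived gap.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
deaths = h0_persistence(points)
print(deaths)
```

In this toy cloud, the three small death values reflect merges inside each cluster, while the single large value reflects the gap between the two clusters; a persistence diagram records exactly such (birth, death) pairs.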
Affiliation(s)
- Mark Lexter D. De Lara
- Institute of Mathematical Sciences and Physics, College of Arts and Sciences, University of the Philippines Los Baños, College, Los Baños, Laguna, Philippines
- Institute of Mathematics, University of the Philippines Diliman, Quezon City, Metro Manila, Philippines
20
Xia K, Liu X, Wee J. Persistent Homology for RNA Data Analysis. Methods Mol Biol 2023; 2627:211-229. [PMID: 36959450 DOI: 10.1007/978-1-0716-2974-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Molecular representations are of great importance for machine learning models in RNA data analysis. Essentially, efficient molecular descriptors or fingerprints that characterize the intrinsic structural and interactional information of RNAs can significantly boost the performance of learning models. In this paper, we introduce two persistent models, persistent homology and persistent spectral models, for RNA structure and interaction representations and their applications in RNA data analysis. Different from traditional geometric and graph representations, persistent homology is built on simplicial complexes, which generalize graph models to higher-dimensional situations. The hypergraph is a further generalization of the simplicial complex, and hypergraph-based embedded persistent homology has been proposed recently. Moreover, persistent spectral models, which combine the filtration process with spectral models, including spectral graphs, spectral simplicial complexes, and spectral hypergraphs, are proposed for molecular representation. The persistent attributes for RNAs can be obtained from these two persistent models and further combined with machine learning models for RNA structure, flexibility, dynamics, and function analysis.
Affiliation(s)
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore.
- Xiang Liu
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
- Chern Institute of Mathematics and LPMC, Nankai University, Tianjin, China
- JunJie Wee
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
21
Chen J, Qiu Y, Wang R, Wei GW. Persistent Laplacian projected Omicron BA.4 and BA.5 to become new dominating variants. Comput Biol Med 2022; 151:106262. [PMID: 36379191 PMCID: PMC10754203 DOI: 10.1016/j.compbiomed.2022.106262] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/13/2022] [Revised: 10/21/2022] [Accepted: 10/30/2022] [Indexed: 11/15/2022]
Abstract
Due to its high transmissibility, Omicron BA.1 ousted the Delta variant to become a dominating variant in late 2021 and was replaced by the more transmissible Omicron BA.2 in March 2022. An important question is which new variants will dominate in the future. Topology-based deep learning models have had tremendous success in forecasting emerging variants in the past. However, topology is insensitive to homotopic shape evolution in virus-human protein-protein binding, which is crucial to viral evolution and transmission. This challenge is tackled with the persistent Laplacian, which is able to capture both the topological change and homotopic shape evolution of data. Persistent Laplacian-based deep learning models are developed to systematically evaluate variant infectivity. Our comparative analysis of Alpha, Beta, Gamma, Delta, Lambda, Mu, and Omicron BA.1, BA.1.1, BA.2, BA.2.11, BA.2.12.1, BA.3, BA.4, and BA.5 unveils that Omicron BA.2.11, BA.2.12.1, BA.3, BA.4, and BA.5 are more contagious than BA.2. In particular, BA.4 and BA.5 are about 36% more infectious than BA.2 and are projected to become new dominant variants by natural selection. Moreover, the proposed models outperform the state-of-the-art methods on three major benchmark datasets for mutation-induced protein-protein binding free energy changes. Our key projection about BA.4 and BA.5's dominance made on May 1, 2022 (see arXiv:2205.00532) became a reality in late June 2022.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.
22
Liu J, Xia KL, Wu J, Yau SST, Wei GW. Biomolecular Topology: Modelling and Analysis. ACTA MATHEMATICA SINICA, ENGLISH SERIES 2022; 38:1901-1938. [PMID: 36407804 PMCID: PMC9640850 DOI: 10.1007/s10114-022-2326-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/27/2022] [Accepted: 07/12/2022] [Indexed: 05/25/2023]
Abstract
With the great advancement of experimental tools, a tremendous amount of biomolecular data has been generated and accumulated in various databases. The high dimensionality, structural complexity, nonlinearity, and entanglements of biomolecular data, ranging from DNA knots, RNA secondary structures, protein folding configurations, chromosomes, DNA origami, and molecular assembly to others at the macromolecular level, pose a severe challenge in their analysis and characterization. In the past few decades, mathematical concepts, models, algorithms, and tools from algebraic topology, combinatorial topology, computational topology, and topological data analysis have demonstrated great power and begun to play an essential role in tackling the biomolecular data challenge. In this work, we introduce biomolecular topology, which concerns the topological problems and models originating from biomolecular systems. More specifically, biomolecular topology encompasses topological structures, properties, and relations that emerge from biomolecular structures, dynamics, interactions, and functions. We discuss the various types of biomolecular topology from structures (of proteins, DNAs, and RNAs), protein folding, and protein assembly. A brief discussion of databanks (and databases), theoretical models, and computational algorithms is presented. Further, we systematically review related topological models, including graphs, simplicial complexes, persistent homology, persistent Laplacians, de Rham-Hodge theory, Yau-Hausdorff distance, and the topology-based machine learning models.
Affiliation(s)
- Jian Liu
- School of Mathematical Sciences, Hebei Normal University, Shijiazhuang, 050024 P. R. China
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
- Ke-Lin Xia
- School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, 639798 Singapore
- Jie Wu
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
- Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
- Stephen Shing-Toung Yau
- Yanqi Lake Beijing Institute of Mathematical Sciences and Applications, Beijing, 101408 P. R. China
- Department of Mathematical Sciences, Tsinghua University, Beijing, 100084 P. R. China
- Guo-Wei Wei
- Department of Mathematics & Department of Biochemistry and Molecular Biology & Department of Electrical and Computer Engineering, Michigan State University, Wells Hall 619 Red Cedar Road, East Lansing, MI 48824-1027 USA
23
Woodard J, Iqbal S, Mashaghi A. Circuit topology predicts pathogenicity of missense mutations. Proteins 2022; 90:1634-1644. [PMID: 35394672 PMCID: PMC9543832 DOI: 10.1002/prot.26342] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/15/2021] [Revised: 03/07/2022] [Accepted: 03/30/2022] [Indexed: 12/05/2022]
Abstract
The contact topology of a protein determines important aspects of the folding process. The topological measure of contact order has been shown to be predictive of the rate of folding. Circuit topology is emerging as another fundamental descriptor of biomolecular structure, with predicted effects on the folding rate. We analyze the residue‐based circuit topological environments of 21 K mutations labeled as pathogenic or benign. Multiple statistical lines of reasoning support the conclusion that the number of contacts in two specific circuit topological arrangements, namely inverse parallel and cross relations, with contacts involving the mutated residue have discriminatory value in determining the pathogenicity of human variants. We investigate how results vary with residue type and according to whether the gene is essential. We further explore the relationship to a number of structural features and find that circuit topology provides nonredundant information on protein structures and pathogenicity of mutations. Results may have implications for the polymer physics of protein folding and suggest that “local” topological information, including residue‐based circuit topology and residue contact order, could be useful in improving state‐of‐the‐art machine learning algorithms for pathogenicity prediction.
Affiliation(s)
- Jaie Woodard
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Sumaiya Iqbal
- Center for the Development of Therapeutics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA; Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
- Alireza Mashaghi
- Medical Systems Biophysics and Bioengineering, Leiden Academic Centre for Drug Research, Faculty of Science, Leiden University, Leiden, The Netherlands; Centre for Interdisciplinary Genome Research, Faculty of Science, Leiden University, Leiden, The Netherlands
24
Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022; 122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Indexed: 02/06/2023]
Abstract
Despite tremendous efforts in the past two years, our understanding of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), virus-host interactions, immune response, virulence, transmission, and evolution is still very limited. This limitation calls for further in-depth investigation. Computational studies have become an indispensable component in combating coronavirus disease 2019 (COVID-19) due to their low cost, their efficiency, and the fact that they are free from safety and ethical constraints. Additionally, the mechanism that governs the global evolution and transmission of SARS-CoV-2 cannot be revealed from individual experiments and was discovered by integrating genotyping of massive viral sequences, biophysical modeling of protein-protein interactions, deep mutational data, deep learning, and advanced mathematics. There exists a tsunami of literature on the molecular modeling, simulations, and predictions of SARS-CoV-2 and related developments of drugs, vaccines, antibodies, and diagnostics. To provide readers with a quick update about this literature, we present a comprehensive and systematic methodology-centered review. Aspects such as molecular biophysics, bioinformatics, cheminformatics, machine learning, and mathematics are discussed. This review will be beneficial to researchers who are looking for ways to contribute to SARS-CoV-2 studies and those who are interested in the status of the field.
Affiliation(s)
- Kaifu Gao
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Limei Cheng
- Clinical Pharmacology and Pharmacometrics, Bristol Myers Squibb, Princeton, New Jersey 08536, United States
- Jaclyn Frishcosy
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuta Huzumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yuchi Qiu
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Tom Schluckbier
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Xiaoqi Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
25
Topological Data Analysis Helps to Improve Accuracy of Deep Learning Models for Fake News Detection Trained on Very Small Training Sets. BIG DATA AND COGNITIVE COMPUTING 2022. [DOI: 10.3390/bdcc6030074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Topological data analysis has recently found applications in various areas of science, such as computer vision and understanding of protein folding. However, applications of topological data analysis to natural language processing remain under-researched. This study applies topological data analysis to a particular natural language processing task: fake news detection. We have found that deep learning models are more accurate in this task than topological data analysis. However, assembling a deep learning model with topological data analysis significantly improves the model’s accuracy if the available training set is very small.
26
Skaf Y, Laubenbacher R. Topological data analysis in biomedicine: A review. J Biomed Inform 2022; 130:104082. [PMID: 35508272 DOI: 10.1016/j.jbi.2022.104082] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/30/2021] [Revised: 03/20/2022] [Accepted: 04/23/2022] [Indexed: 01/22/2023]
Abstract
Significant technological advances made in recent years have shepherded a dramatic increase in the utilization of digital technologies for biomedicine: everything from the widespread use of electronic health records to improved medical imaging capabilities and the rising ubiquity of genomic sequencing contributes to a "digitization" of biomedical research and clinical care. With this shift toward computerized tools comes a dramatic increase in the amount of available data, and current tools for data analysis capable of extracting meaningful knowledge from this wealth of information have yet to catch up. This article seeks to provide an overview of emerging mathematical methods that have the potential to improve the abilities of clinicians and researchers to analyze biomedical data but may be hindered from doing so by a lack of conceptual accessibility and awareness in the life sciences research community. In particular, we focus on topological data analysis (TDA), a set of methods grounded in the mathematical field of algebraic topology that seeks to describe and harness features related to the "shape" of data. We aim to make such techniques more approachable to non-mathematicians by providing a conceptual discussion of their theoretical foundations followed by a survey of their published applications to scientific research. Finally, we discuss the limitations of these methods and suggest potential avenues for future work integrating mathematical tools into clinical care and biomedical informatics.
Affiliation(s)
- Yara Skaf
- University of Florida, Department of Mathematics, Gainesville, FL, USA; University of Florida, Department of Medicine, Division of Pulmonary, Critical Care, & Sleep Medicine, Gainesville, FL, USA.
- Reinhard Laubenbacher
- University of Florida, Department of Mathematics, Gainesville, FL, USA; University of Florida, Department of Medicine, Division of Pulmonary, Critical Care, & Sleep Medicine, Gainesville, FL, USA.
27
Grbić J, Wu J, Xia K, Wei GW. ASPECTS OF TOPOLOGICAL APPROACHES FOR DATA SCIENCE. FOUNDATIONS OF DATA SCIENCE (SPRINGFIELD, MO.) 2022; 4:165-216. [PMID: 36712596 PMCID: PMC9881677 DOI: 10.3934/fods.2022002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]
Abstract
We establish a new theory which unifies various aspects of topological approaches for data science, by being applicable both to point cloud data and to graph data, including networks beyond pairwise interactions. We generalize simplicial complexes and hypergraphs to super-hypergraphs and establish super-hypergraph homology as an extension of simplicial homology. Driven by applications, we also introduce super-persistent homology.
Affiliation(s)
- Jelena Grbić
- School of Mathematical Sciences, University of Southampton, Southampton, UK
- Jie Wu
- School of Mathematical Sciences, Center of Topology and Geometry based Technology, Hebei Normal University, Yuhua District, Shijiazhuang, Hebei, 050024 China
- Yanqi Lake Beijing Institute of Mathematical Sciences, Yanqihu, Huairou District, Beijing, 101408 China
- Kelin Xia
- School of Physical and Mathematical Sciences, Nanyang Technological University, SPMS-MAS-05-18, 21 Nanyang Link, 1, Singapore 63737
- Guo-Wei Wei
- Department of Mathematics, Department of Computer Science and Engineering, Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
28
Watanabe S, Yamana H. Overfitting measurement of convolutional neural networks using trained network weights. INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS 2022. [DOI: 10.1007/s41060-022-00332-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
29
Chen J, Wei GW. Mathematical artificial intelligence design of mutation-proof COVID-19 monoclonal antibodies. ARXIV 2022:arXiv:2204.09471v1. [PMID: 35475234 PMCID: PMC9040270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants have compromised existing vaccines and posed a grand challenge to coronavirus disease 2019 (COVID-19) prevention, control, and global economic recovery. For COVID-19 patients, one of the most effective COVID-19 medications is monoclonal antibody (mAb) therapy. The United States Food and Drug Administration (U.S. FDA) has given emergency use authorization (EUA) to a few mAbs, including those from Regeneron, Eli Lilly, etc. However, they are also undermined by SARS-CoV-2 mutations. It is imperative to develop effective mutation-proof mAbs for treating COVID-19 patients infected by all emerging variants and/or the original SARS-CoV-2. We carry out deep mutational scanning to present the blueprint of such mAbs using algebraic topology and artificial intelligence (AI). To reduce the risk of clinical trial-related failure, we select five mAbs either with FDA EUA or in clinical trials as our starting point. We demonstrate that topological AI-designed mAbs are effective against the variants of concern and variants of interest designated by the World Health Organization (WHO), as well as the original SARS-CoV-2. Our topological AI methodologies have been validated by tens of thousands of deep mutational data and their predictions have been confirmed by results from tens of experimental laboratories and population-level statistics of genome isolates from hundreds of thousands of patients.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
30
Wang R, Chen J, Hozumi Y, Yin C, Wei GW. Emerging Vaccine-Breakthrough SARS-CoV-2 Variants. ACS Infect Dis 2022; 8:546-556. [PMID: 35133792 PMCID: PMC8848511 DOI: 10.1021/acsinfecdis.1c00557] [Citation(s) in RCA: 43] [Impact Index Per Article: 21.5] [Received: 10/21/2021] [Indexed: 12/28/2022]
Abstract
The surge of COVID-19 infections has been fueled by new SARS-CoV-2 variants, namely Alpha, Beta, Gamma, Delta, and so forth. The molecular mechanism underlying such surge is elusive due to the existence of 28 554 unique mutations, including 4 653 non-degenerate mutations on the spike protein. Understanding the molecular mechanism of SARS-CoV-2 transmission and evolution is a prerequisite to foresee the trend of emerging vaccine-breakthrough variants and the design of mutation-proof vaccines and monoclonal antibodies. We integrate the genotyping of 1 489 884 SARS-CoV-2 genomes, a library of 130 human antibodies, tens of thousands of mutational data, topological data analysis, and deep learning to reveal SARS-CoV-2 evolution mechanism and forecast emerging vaccine-breakthrough variants. We show that prevailing variants can be quantitatively explained by infectivity-strengthening and vaccine-escape (co-)mutations on the spike protein RBD due to natural selection and/or vaccination-induced evolutionary pressure. We illustrate that infectivity strengthening mutations were the main mechanism for viral evolution, while vaccine-escape mutations become a dominating viral evolutionary mechanism among highly vaccinated populations. We demonstrate that Lambda is as infectious as Delta but is more vaccine-resistant. We analyze emerging vaccine-breakthrough comutations in highly vaccinated countries, including the United Kingdom, the United States, Denmark, and so forth. Finally, we identify sets of comutations that have a high likelihood of massive growth: [A411S, L452R, T478K], [L452R, T478K, N501Y], [V401L, L452R, T478K], [K417N, L452R, T478K], [L452R, T478K, E484K, N501Y], and [P384L, K417N, E484K, N501Y]. We predict they can escape existing vaccines. We foresee an urgent need to develop new virus combating strategies.
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Yuta Hozumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Changchuan Yin
- Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
| |
|
31
|
Noshita K, Murata H, Kirie S. Model-based plant phenomics on morphological traits using morphometric descriptors. BREEDING SCIENCE 2022; 72:19-30. [PMID: 36045892 PMCID: PMC8987841 DOI: 10.1270/jsbbs.21078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2021] [Accepted: 12/20/2021] [Indexed: 06/15/2023]
Abstract
The morphological traits of plants contribute to many important functional features such as radiation interception, lodging tolerance, gas exchange efficiency, spatial competition between individuals and/or species, and disease resistance. Although the importance of plant phenotyping techniques is increasing with advances in molecular breeding strategies, there are barriers to its advancement, including the gap between measured data and phenotypic values, low quantitativity, and low throughput caused by the lack of models for representing morphological traits. In this review, we introduce morphological descriptors that can be used for phenotyping plant morphological traits. Geometric morphometric approaches pave the way to a general-purpose method applicable to single units. Hierarchical structures composed of an indefinite number of multiple elements, which are often observed in plants, can be quantified in terms of their multi-scale topological characteristics using topological data analysis. Theoretical morphological models capture specific anatomical structures, if recognized. These morphological descriptors provide us with the advantages of model-based plant phenotyping, including robust quantification of limited datasets. Moreover, we discuss the future possibility that a system of model-based measurement and model refinement could address the lack of morphological models and the difficulties in scaling out the phenotyping processes.
Affiliation(s)
- Koji Noshita
- Department of Biology, Kyushu University, Fukuoka, Fukuoka 819-0395, Japan
- Plant Frontier Research Center, Kyushu University, Fukuoka, Fukuoka 819-0395, Japan
| | - Hidekazu Murata
- Department of Biology, Kyushu University, Fukuoka, Fukuoka 819-0395, Japan
| | - Shiryu Kirie
- metaPhorest (Bioaesthetics Platform), Department of Electrical Engineering and Bioscience, Waseda University, TWIns, Tokyo 162-8480, Japan
| |
|
32
|
Pun CS, Lee SX, Xia K. Persistent-homology-based machine learning: a survey and a comparative study. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10146-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
|
33
|
Ichinomiya T. Topological data analysis gives two folding paths in HP35(nle-nle), double mutant of villin headpiece subdomain. Sci Rep 2022; 12:2719. [PMID: 35177744 PMCID: PMC8854739 DOI: 10.1038/s41598-022-06682-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2021] [Accepted: 02/04/2022] [Indexed: 11/16/2022] Open
Abstract
The folding dynamics of proteins is a primary area of interest in protein science. We carried out topological data analysis (TDA) of the folding process of HP35(nle-nle), a double-mutant of the villin headpiece subdomain. Using persistent homology and non-negative matrix factorization, we reduced the dimension of protein structure and investigated the flow in the reduced space. We found this protein has two folding paths, distinguished by the pairings of inter-helix residues. Our analysis showed the excellent performance of TDA in capturing the formation of tertiary structure.
Affiliation(s)
- Takashi Ichinomiya
- Department of Systems Biology, Gifu University School of Medicine, Yanagido 1-1, Gifu, 501-1194, Japan; The United Graduate School of Drug Discovery and Medical Information Sciences of Gifu University, Yanagido 1-1, Gifu, 501-1194, Japan.
| |
|
34
|
Chen J, Wei GW. Mathematical artificial intelligence design of mutation-proof COVID-19 monoclonal antibodies. COMMUNICATIONS IN INFORMATION AND SYSTEMS 2022; 22:339-361. [PMID: 36713633 PMCID: PMC9881605 DOI: 10.4310/cis.2022.v22.n3.a3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Emerging severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants have compromised existing vaccines and posed a grand challenge to coronavirus disease 2019 (COVID-19) prevention, control, and global economic recovery. For COVID-19 patients, one of the most effective COVID-19 medications is monoclonal antibody (mAb) therapy. The United States Food and Drug Administration (U.S. FDA) has given emergency use authorization (EUA) to a few mAbs, including those from Regeneron, Eli Lilly, etc. However, they are also undermined by SARS-CoV-2 mutations. It is imperative to develop effective mutation-proof mAbs for treating COVID-19 patients infected by all emerging variants and/or the original SARS-CoV-2. We carry out deep mutational scanning to present the blueprint of such mAbs using algebraic topology and artificial intelligence (AI). To reduce the risk of clinical trial-related failure, we select five mAbs either with FDA EUA or in clinical trials as our starting point. We demonstrate that topological AI-designed mAbs are effective against variants of concern and variants of interest designated by the World Health Organization (WHO), as well as the original SARS-CoV-2. Our topological AI methodologies have been validated by tens of thousands of deep mutational data points, and their predictions have been confirmed by results from tens of experimental laboratories and population-level statistics of genome isolates from hundreds of thousands of patients.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, Michigan State University, East Lansing, MI 48823, USA
| | | |
|
35
|
Jhun B. Topological analysis of the latent geometry of a complex network. CHAOS (WOODBURY, N.Y.) 2022; 32:013116. [PMID: 35105131 DOI: 10.1063/5.0073107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2021] [Accepted: 12/16/2021] [Indexed: 06/14/2023]
Abstract
Most real-world networks are embedded in latent geometries. If a node in a network is found in the vicinity of another node in the latent geometry, the two nodes have a disproportionately high probability of being connected by a link. The latent geometry of a complex network is a central topic of research in network science, which has an expansive range of practical applications, such as efficient navigation, missing link prediction, and brain mapping. Despite the important role of topology in the structures and functions of complex systems, little to no study has been conducted to develop a method to estimate the general unknown latent geometry of complex networks. Topological data analysis, which has attracted extensive attention in the research community owing to its convincing performance, can be directly implemented into complex networks; however, even a small fraction (0.1%) of long-range links can completely erase the topological signature of the latent geometry. Inspired by the fact that long-range links in a network have disproportionately high loads, we develop a set of methods that can analyze the latent geometry of a complex network: the modified persistent homology diagram and the map of the latent geometry. These methods successfully reveal the topological properties of the synthetic and empirical networks used to validate the proposed methods.
Affiliation(s)
- Bukyoung Jhun
- CCSS, CTP, and Department of Physics and Astronomy, Seoul National University, Seoul 08826, South Korea and Department of Physics, The University of Texas at Austin, Austin, Texas 78712, USA
| |
|
36
|
Qin Y, Fasy BT, Wenk C, Summa B. A Domain-Oblivious Approach for Learning Concise Representations of Filtered Topological Spaces for Clustering. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:302-312. [PMID: 34587087 DOI: 10.1109/tvcg.2021.3114872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and we compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster with the potential of less memory usage, while retaining comparable or better quality comparisons.
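The speed advantage of binary codes mentioned in this abstract comes from the Hamming distance, which reduces to an XOR followed by a bit count. The sketch below illustrates only that comparison step; the 8-bit codes and the diagram names are invented for illustration and are not outputs of the authors' GAN-based hashing model.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")

# Hypothetical 8-bit hash codes for three persistence diagrams (illustrative values only).
codes = {
    "diagram_a": 0b10110010,
    "diagram_b": 0b10110110,
    "diagram_c": 0b01001101,
}

def nearest(query: str) -> str:
    """Name of the other diagram whose code is closest to the query in Hamming distance."""
    others = (name for name in codes if name != query)
    return min(others, key=lambda name: hamming(codes[query], codes[name]))

print(hamming(codes["diagram_a"], codes["diagram_b"]))
print(nearest("diagram_a"))
```

Because each comparison is a single integer operation, pairwise distances over a large collection of diagrams stay cheap, which is the property that makes clustering on hashed codes attractive.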
|
37
|
Li S, Liu Y, Chen D, Jiang Y, Nie Z, Pan F. Encoding the atomic structure for machine learning in materials science. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1558] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022]
Affiliation(s)
- Shunning Li
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China
| | - Yuanji Liu
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China
| | - Dong Chen
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China
| | - Yi Jiang
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China
| | - Zhiwei Nie
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China
| | - Feng Pan
- School of Advanced Materials, Peking University, Shenzhen Graduate School, Shenzhen, China
| |
|
38
|
Stenseke J. Persistent homology and the shape of evolutionary games. J Theor Biol 2021; 531:110903. [PMID: 34534569 DOI: 10.1016/j.jtbi.2021.110903] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/13/2021] [Revised: 09/08/2021] [Accepted: 09/09/2021] [Indexed: 11/17/2022]
Abstract
For nearly three decades, spatial games have produced a wealth of insights to the study of behavior and its relation to population structure. However, as different rules and factors are added or altered, the dynamics of spatial models often become increasingly complicated to interpret. To tackle this problem, we introduce persistent homology as a rigorous framework that can be used to both define and compute higher-order features of data in a manner which is invariant to parameter choices, robust to noise, and independent of human observation. Our work demonstrates its relevance for spatial games by showing how topological features of simulation data that persist over different spatial scales reflect the stability of strategies in 2D lattice games. To do so, we analyze the persistent homology of scenarios from two games: a Prisoner's Dilemma and a SIRS epidemic model. The experimental results show how the method accurately detects features that correspond to real aspects of the game dynamics. Unlike other tools that study dynamics of spatial systems, persistent homology can tell us something meaningful about population structure while remaining neutral about the underlying structure itself. Regardless of game complexity, since strategies either succeed or fail to conform to shapes of a certain topology there is much potential for the method to provide novel insights for a wide variety of spatially extended systems in biology, social science, and physics.
Affiliation(s)
- Jakob Stenseke
- Department of Philosophy, Lund University, Helgonavagen 3, Lund 221 00, Sweden.
| |
|
39
|
Wang R, Chen J, Wei GW. Mechanisms of SARS-CoV-2 Evolution Revealing Vaccine-Resistant Mutations in Europe and America. J Phys Chem Lett 2021; 12:11850-11857. [PMID: 34873910 PMCID: PMC8672435 DOI: 10.1021/acs.jpclett.1c03380] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Received: 10/14/2021] [Accepted: 12/02/2021] [Indexed: 05/20/2023]
Abstract
The importance of understanding SARS-CoV-2 evolution cannot be overlooked. Recent studies confirm that natural selection is the dominating mechanism of SARS-CoV-2 evolution, which favors mutations that strengthen viral infectivity. Here, we demonstrate that vaccine-breakthrough or antibody-resistant mutations provide a new mechanism of viral evolution. Specifically, vaccine-resistant mutation Y449S in the spike (S) protein receptor-binding domain, which occurred in co-mutations Y449S and N501Y, has reduced infectivity compared to that of the original SARS-CoV-2 but can disrupt existing antibodies that neutralize the virus. By tracking the evolutionary trajectories of vaccine-resistant mutations in more than 2.2 million SARS-CoV-2 genomes, we reveal that the occurrence and frequency of vaccine-resistant mutations correlate strongly with the vaccination rates in Europe and America. We anticipate that as a complementary transmission pathway, vaccine-breakthrough or antibody-resistant mutations, like those in Omicron, will become a dominating mechanism of SARS-CoV-2 evolution when most of the world's population is either vaccinated or infected. Our study sheds light on SARS-CoV-2 evolution and transmission and enables the design of the next-generation mutation-proof vaccines and antibody drugs.
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
| |
|
40
|
Wei X, Wei GW. Homotopy continuation for the spectra of persistent Laplacians. FOUNDATIONS OF DATA SCIENCE (SPRINGFIELD, MO.) 2021; 3:677-700. [PMID: 35822080 PMCID: PMC9273002 DOI: 10.3934/fods.2021017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/31/2023]
Abstract
The p-persistent q-combinatorial Laplacian defined for a pair of simplicial complexes is a generalization of the q-combinatorial Laplacian. Given a filtration, the spectra of persistent combinatorial Laplacians not only recover the persistent Betti numbers of persistent homology but also provide extra multiscale geometrical information of the data. Paired with machine learning algorithms, the persistent Laplacian has many potential applications in data science. Seeking different ways to find the spectrum of an operator is an active research topic, and it becomes especially interesting when the ideas originate from multiple fields. In this work, we explore an alternative approach for computing the spectrum of persistent Laplacians. As the eigenvalues of a persistent Laplacian matrix are the roots of its characteristic polynomial, one may attempt to find the roots of the characteristic polynomial by homotopy continuation, thus resolving the spectrum of the corresponding persistent Laplacian. We consider a set of simple polytopes and small molecules to prove the principle that algebraic topology, combinatorial graph theory, and algebraic geometry can be integrated to understand the shape of data.
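The link between Laplacian spectra, characteristic polynomials, and Betti numbers described in this abstract can be illustrated on a toy case. The sketch below is an assumption-laden simplification: it uses the ordinary (non-persistent) q-combinatorial Laplacian of a single triangle, obtains the characteristic polynomial exactly with the Faddeev-LeVerrier recursion, and only counts the zero roots instead of tracking roots by homotopy continuation as the paper does. All function and variable names are illustrative.

```python
from fractions import Fraction

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def char_poly(a):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(lambda*I - A) via Faddeev-LeVerrier."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        am = matmul(a, m)
        coeffs.append(-sum(am[i][i] for i in range(n)) / k)
    return coeffs

def zero_root_multiplicity(coeffs):
    """Multiplicity of the root 0, i.e. the number of trailing zero coefficients."""
    count = 0
    for c in reversed(coeffs):
        if c != 0:
            break
        count += 1
    return count

# Boundary matrices of a triangle on vertices 0, 1, 2 with edges (0,1), (0,2), (1,2).
b1 = [[-1, -1, 0], [1, 0, -1], [0, 1, 1]]   # rows: vertices, columns: edges
b2 = [[1], [-1], [1]]                        # rows: edges, column: the 2-simplex (0,1,2)

l0 = matmul(b1, transpose(b1))                       # 0-Laplacian (graph Laplacian)
l1_hollow = matmul(transpose(b1), b1)                # 1-Laplacian, triangle not filled in
l1_filled = add(l1_hollow, matmul(b2, transpose(b2)))  # 1-Laplacian, triangle filled in

betti0 = zero_root_multiplicity(char_poly(l0))
betti1_hollow = zero_root_multiplicity(char_poly(l1_hollow))
betti1_filled = zero_root_multiplicity(char_poly(l1_filled))
print(betti0, betti1_hollow, betti1_filled)
```

Because these Laplacians are symmetric, the multiplicity of the zero root of the characteristic polynomial equals the dimension of the kernel, so the hollow triangle reports one loop and the filled triangle reports none. The nonzero roots carry the additional geometric information the abstract refers to.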
Affiliation(s)
- Xiaoqi Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
| | | |
|
41
|
Chen J, Wang R, Wei GW. SARS-CoV-2 becoming more infectious as revealed by algebraic topology and deep learning. COMMUNICATIONS IN INFORMATION AND SYSTEMS 2021; 21:31-36. [PMID: 34675755 DOI: 10.4310/cis.2021.v21.n1.a2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has led to tremendous human fatality and economic loss. SARS-CoV-2 infectivity is a key reason for the widespread viral transmission, but its rigorous experimental measurement is essentially impossible due to the ongoing genome evolution around the world. We show that artificial intelligence (AI) and algebraic topology (AT) offer an accurate and efficient alternative to the experimental determination of viral infectivity. AI and AT analysis indicates that the ongoing mutations make SARS-CoV-2 more infectious.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
| |
|
42
|
Chen J, Zhao R, Tong Y, Wei GW. Evolutionary de Rham-Hodge method. DISCRETE AND CONTINUOUS DYNAMICAL SYSTEMS. SERIES B 2021; 26:3785-3821. [PMID: 34675756 DOI: 10.3934/dcdsb.2020257] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Indexed: 12/29/2022]
Abstract
The de Rham-Hodge theory is a landmark of 20th-century mathematics and has had a great impact on mathematics, physics, computer science, and engineering. This work introduces an evolutionary de Rham-Hodge method to provide a unified paradigm for the multiscale geometric and topological analysis of evolving manifolds constructed from a filtration, which induces a family of evolutionary de Rham complexes. While the present method can be easily applied to closed manifolds, the emphasis is given to more challenging compact manifolds with 2-manifold boundaries, which require appropriate analysis and treatment of boundary conditions on differential forms to maintain proper topological properties. Three sets of unique evolutionary Hodge Laplacians are proposed to generate three sets of topology-preserving singular spectra, for which the multiplicities of zero eigenvalues correspond exactly to the persistent Betti numbers of dimensions 0, 1 and 2. Additionally, three sets of non-zero eigenvalues further reveal both topological persistence and geometric progression during the manifold evolution. Extensive numerical experiments are carried out via the discrete exterior calculus to demonstrate the potential of the proposed paradigm for data representation and shape analysis of both point cloud data and density maps. To demonstrate the utility of the proposed method, we consider its application to protein B-factor prediction in a few challenging cases for which existing biophysical models break down.
Affiliation(s)
- Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Rundong Zhao
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
| | - Yiying Tong
- Department of Computer Science and Engineering, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
| |
|
43
|
Li L, Thompson C, Henselman-Petrusek G, Giusti C, Ziegelmeier L. Minimal Cycle Representatives in Persistent Homology Using Linear Programming: An Empirical Study With User's Guide. Front Artif Intell 2021; 4:681117. [PMID: 34708196 PMCID: PMC8544243 DOI: 10.3389/frai.2021.681117] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/15/2021] [Accepted: 05/14/2021] [Indexed: 12/24/2022] Open
Abstract
Cycle representatives of persistent homology classes can be used to provide descriptions of topological features in data. However, the non-uniqueness of these representatives creates ambiguity and can lead to many different interpretations of the same set of classes. One approach to solving this problem is to optimize the choice of representative against some measure that is meaningful in the context of the data. In this work, we provide a study of the effectiveness and computational cost of several ℓ₁ minimization optimization procedures for constructing homological cycle bases for persistent homology with rational coefficients in dimension one, including uniform-weighted and length-weighted edge-loss algorithms as well as uniform-weighted and area-weighted triangle-loss algorithms. We conduct these optimizations via standard linear programming methods, applying general-purpose solvers to optimize over column bases of simplicial boundary matrices. Our key findings are: 1) optimization is effective in reducing the size of cycle representatives, though the extent of the reduction varies according to the dimension and distribution of the underlying data, 2) the computational cost of optimizing a basis of cycle representatives exceeds the cost of computing such a basis, in most data sets we consider, 3) the choice of linear solvers matters a lot to the computation time of optimizing cycles, 4) the computation time of solving an integer program is not significantly longer than the computation time of solving a linear program for most of the cycle representatives, using the Gurobi linear solver, 5) strikingly, whether requiring integer solutions or not, we almost always obtain a solution with the same cost and almost all solutions found have entries in {−1, 0, 1} and therefore, are also solutions to a restricted ℓ₀ optimization problem, and 6) we obtain qualitatively different results for generators in Erdős-Rényi random clique complexes than in real-world and synthetic point cloud data.
Affiliation(s)
- Lu Li
- Mathematics, Statistics, and Computer Science Department, Macalester College, Saint Paul, MN, United States
| | - Connor Thompson
- Department of Mathematics, Purdue University, West Lafayette, IN, United States
| | | | - Chad Giusti
- Department of Mathematical Sciences, University of Delaware, Newark, DE, United States
| | - Lori Ziegelmeier
- Mathematics, Statistics, and Computer Science Department, Macalester College, Saint Paul, MN, United States
| |
|
44
|
Wang R, Chen J, Wei GW. The evolution of the mechanisms of SARS-CoV-2 evolution revealing vaccine-resistant mutations in Europe and America. ARXIV 2021:arXiv:2110.04626v1. [PMID: 34642638 PMCID: PMC8509097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
The importance of understanding SARS-CoV-2 evolution cannot be overemphasized. Recent studies confirm that natural selection is the dominating mechanism of SARS-CoV-2 evolution, which favors mutations that strengthen viral infectivity. We demonstrate that vaccine-breakthrough or antibody-resistant mutations provide a new mechanism of viral evolution. Specifically, vaccine-resistant mutation Y449S in the spike (S) protein receptor-binding domain (RBD), which occurred in co-mutation [Y449S, N501Y], has reduced infectivity compared to the original SARS-CoV-2 but can disrupt existing antibodies that neutralize the virus. By tracing the evolutionary trajectories of vaccine-resistant mutations in over 1.9 million SARS-CoV-2 genomes, we reveal that the occurrence and frequency of vaccine-resistant mutations correlate strongly with the vaccination rates in Europe and America. We anticipate that as a complementary transmission pathway, vaccine-resistant mutations will become a dominating mechanism of SARS-CoV-2 evolution when most of the world's population is vaccinated. Our study sheds light on SARS-CoV-2 evolution and transmission and enables the design of the next-generation mutation-proof vaccines and antibody drugs.
Affiliation(s)
- Rui Wang
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Jiahui Chen
- Department of Mathematics, Michigan State University, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, MI 48824, USA
- Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
- Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
| |
|
45
|
Moroni D, Pascali MA. Learning Topology: Bridging Computational Topology and Machine Learning. PATTERN RECOGNITION AND IMAGE ANALYSIS 2021. [DOI: 10.1134/s1054661821030184] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
|
46
|
Yen PTW, Xia K, Cheong SA. Understanding Changes in the Topology and Geometry of Financial Market Correlations during a Market Crash. ENTROPY (BASEL, SWITZERLAND) 2021; 23:1211. [PMID: 34573837 PMCID: PMC8467365 DOI: 10.3390/e23091211] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/19/2021] [Revised: 09/05/2021] [Accepted: 09/06/2021] [Indexed: 12/24/2022]
Abstract
In econophysics, the achievements of information filtering methods over the past 20 years, such as the minimal spanning tree (MST) by Mantegna and the planar maximally filtered graph (PMFG) by Tumminello et al., should be celebrated. Here, we show how one can systematically improve upon this paradigm along two separate directions. First, we used topological data analysis (TDA) to extend the notions of nodes and links in networks to faces, tetrahedrons, or k-simplices in simplicial complexes. Second, we used the Ollivier-Ricci curvature (ORC) to acquire geometric information that cannot be provided by simple information filtering. In this sense, MSTs and PMFGs are but first steps to revealing the topological backbones of financial networks. This is something that TDA can elucidate more fully, following which the ORC can help us flesh out the geometry of financial networks. We applied these two approaches to a recent stock market crash in Taiwan and found that, beyond fusions and fissions, other non-fusion/fission processes such as cavitation, annihilation, rupture, healing, and puncture might also be important. We also successfully identified neck regions that emerged during the crash, based on their negative ORCs, and performed a case study on one such neck region.
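The minimal spanning tree filtering that this abstract takes as its starting point can be sketched in a few lines. The example below assumes the commonly used correlation distance d = sqrt(2(1 - ρ)) and Kruskal's algorithm with a union-find structure; the four tickers and their correlation values are invented for illustration and are not data from the paper.

```python
import math

def mst_from_correlations(names, corr):
    """Kruskal's algorithm on the distances d_ij = sqrt(2 * (1 - rho_ij))."""
    n = len(names)
    edges = sorted(
        (math.sqrt(2.0 * (1.0 - corr[i][j])), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x):
        # Find the component root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                 # the edge joins two components, so keep it
            parent[ri] = rj
            tree.append((names[i], names[j], round(dist, 3)))
    return tree

# Hypothetical correlation matrix for four stocks (illustrative values only).
names = ["A", "B", "C", "D"]
corr = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

for u, v, d in mst_from_correlations(names, corr):
    print(u, v, d)
```

The tree keeps only n - 1 links, all of them pairwise. That limitation is what the abstract addresses by moving from links to higher-dimensional simplices and by adding curvature information.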
Affiliation(s)
- Peter Tsung-Wen Yen
- Center for Crystal Researches, National Sun Yat-sen University, No. 70, Lien-hai Rd., Kaohsiung 80424, Taiwan
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, Singapore 637371, Singapore
| | - Siew Ann Cheong
- Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, Singapore 637371, Singapore
| |
|
47
|
Wang R, Chen J, Hozumi Y, Yin C, Wei GW. Emerging vaccine-breakthrough SARS-CoV-2 variants. ARXIV 2021:arXiv:2109.04509v1. [PMID: 34518803 PMCID: PMC8437313] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
The recent global surge in COVID-19 infections has been fueled by new SARS-CoV-2 variants such as Alpha, Beta, Gamma, and Delta. The molecular mechanism underlying such a surge is elusive due to 4,653 non-degenerate mutations on the spike protein, which is the target of most COVID-19 vaccines. The understanding of the molecular mechanism of transmission and evolution is a prerequisite to foresee the trend of emerging vaccine-breakthrough variants and the design of mutation-proof vaccines and monoclonal antibodies. We integrate the genotyping of 1,489,884 SARS-CoV-2 genome isolates, 130 human antibodies, tens of thousands of mutational data points, topological data analysis, and deep learning to reveal the SARS-CoV-2 evolution mechanism and forecast emerging vaccine-escape variants. We show that infectivity-strengthening and antibody-disruptive co-mutations on the S protein RBD can quantitatively explain the infectivity and virulence of all prevailing variants. We demonstrate that Lambda is as infectious as Delta but is more vaccine-resistant. We analyze emerging vaccine-breakthrough co-mutations in 20 countries, including the United Kingdom, the United States, Denmark, Brazil, and Germany. We envision that natural selection through infectivity will continue to be the main mechanism for viral evolution among unvaccinated populations, while antibody-disruptive co-mutations will fuel the future growth of vaccine-breakthrough variants among fully vaccinated populations. Finally, we have identified the co-mutations that have the greatest likelihood of becoming dominant: [A411S, L452R, T478K], [L452R, T478K, N501Y], [V401L, L452R, T478K], [K417N, L452R, T478K], [L452R, T478K, E484K, N501Y], and [P384L, K417N, E484K, N501Y]. We predict they, particularly the last four, will break through existing vaccines. We foresee an urgent need to develop new vaccines that target these co-mutations.
Affiliation(s)
- Rui Wang
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Jiahui Chen
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Yuta Hozumi
  - Department of Mathematics, Michigan State University, MI 48824, USA
- Changchuan Yin
  - Department of Mathematics, Statistics, and Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, MI 48824, USA
  - Department of Electrical and Computer Engineering, Michigan State University, MI 48824, USA
  - Department of Biochemistry and Molecular Biology, Michigan State University, MI 48824, USA
|
48
|
Songdechakraiwut T, Shen L, Chung M. Topological Learning and Its Application to Multimodal Brain Network Integration. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION : MICCAI ... INTERNATIONAL CONFERENCE ON MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION 2021; 12902:166-176. [PMID: 35098263 PMCID: PMC8797159 DOI: 10.1007/978-3-030-87196-3_16] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 05/20/2023]
Abstract
A long-standing challenge in multimodal brain network analyses is to integrate topologically different brain networks obtained from diffusion and functional MRI in a coherent statistical framework. Existing multimodal frameworks will inevitably destroy the topological difference of the networks. In this paper, we propose a novel topological learning framework that integrates networks of different topology through persistent homology. Such a challenging task is made possible through the introduction of a new topological loss that bypasses intrinsic computational bottlenecks and thus enables us to perform various topological computations and optimizations with ease. We validate the topological loss in extensive statistical simulations with ground truth to assess its effectiveness in discriminating networks. Among many possible applications, we demonstrate the versatility of topological loss in the twin imaging study where we determine the extent to which brain networks are genetically heritable.
Affiliation(s)
- Tananun Songdechakraiwut
  - University of Wisconsin–Madison, USA
  - Correspondence should be addressed to Tananun Songdechakraiwut
- Li Shen
  - University of Pennsylvania, USA
- Moo Chung
  - University of Wisconsin–Madison, USA
|
49
|
Clark AE, Adams H, Hernandez R, Krylov AI, Niklasson AMN, Sarupria S, Wang Y, Wild SM, Yang Q. The Middle Science: Traversing Scale In Complex Many-Body Systems. ACS CENTRAL SCIENCE 2021; 7:1271-1287. [PMID: 34471670 PMCID: PMC8393217 DOI: 10.1021/acscentsci.1c00685] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 05/04/2023]
Abstract
A roadmap is developed that integrates simulation methodology and data science methods to target new theories that traverse the multiple length- and time-scale features of many-body phenomena.
Affiliation(s)
- Aurora E. Clark
  - Department of Chemistry, Washington State University, Pullman, Washington 99163, United States
- Henry Adams
  - Department of Mathematics, Colorado State University, Fort Collins, Colorado 80523, United States
- Rigoberto Hernandez
  - Departments of Chemistry, Chemical and Biomolecular Engineering, and Materials Science and Engineering, Johns Hopkins University, Baltimore, Maryland 21218, United States
- Anna I. Krylov
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Anders M. N. Niklasson
  - Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, United States
- Sapna Sarupria
  - Department of Chemical and Biomolecular Engineering, Center for Optical Materials Science and Engineering Technologies (COMSET), Clemson University, Clemson, South Carolina 29670, United States
  - Department of Chemistry, University of Minnesota, Minneapolis, Minnesota 55455, United States
- Yusu Wang
  - Halıcıoğlu Data Science Institute, University of California, San Diego, La Jolla, California 92093, United States
- Stefan M. Wild
  - Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
- Qian Yang
  - Computer Science and Engineering Department, University of Connecticut, Storrs, Connecticut 06269-4155, United States
|
50
|
Salch A, Regalski A, Abdallah H, Suryadevara R, Catanzaro MJ, Diwadkar VA. From mathematics to medicine: A practical primer on topological data analysis (TDA) and the development of related analytic tools for the functional discovery of latent structure in fMRI data. PLoS One 2021; 16:e0255859. [PMID: 34383838 PMCID: PMC8360597 DOI: 10.1371/journal.pone.0255859] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/30/2021] [Accepted: 07/23/2021] [Indexed: 11/19/2022]
Abstract
fMRI is the preeminent method for collecting signals from the human brain in vivo, for using these signals in the service of functional discovery, and relating these discoveries to anatomical structure. Numerous computational and mathematical techniques have been deployed to extract information from the fMRI signal. Yet, the application of Topological Data Analysis (TDA) remains limited to certain sub-areas such as connectomics (that is, with summarized versions of fMRI data). While connectomics is a natural and important area of application of TDA, applications of TDA in the service of extracting structure from the (non-summarized) fMRI data itself are heretofore nonexistent. “Structure” within fMRI data is determined by dynamic fluctuations in spatially distributed signals over time, and TDA is well positioned to help researchers better characterize mass dynamics of the signal by rigorously capturing shape within it. To accurately motivate this idea, we a) survey an established method in TDA (“persistent homology”) to reveal and describe how complex structures can be extracted from data sets generally, and b) describe how persistent homology can be applied specifically to fMRI data. We provide explanations for some of the mathematical underpinnings of TDA (with expository figures), building ideas in the following sequence: a) fMRI researchers can and should use TDA to extract structure from their data; b) this extraction serves an important role in the endeavor of functional discovery, and c) TDA approaches can complement other established approaches toward fMRI analyses (for which we provide examples). We also provide detailed applications of TDA to fMRI data collected using established paradigms, and offer our software pipeline for readers interested in emulating our methods.
This working overview is both an inter-disciplinary synthesis of ideas (to draw researchers in TDA and fMRI toward each other) and a detailed description of methods that can motivate collaborative research.
Affiliation(s)
- Andrew Salch
  - Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
  - Corresponding author
- Adam Regalski
  - Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
  - Corresponding author
- Hassan Abdallah
  - Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
  - Corresponding author
- Raviteja Suryadevara
  - Department of Mathematics, Wayne State University, Detroit, Michigan, United States of America
  - Department of Psychiatry & Behavioral Neuroscience, Wayne State University, Detroit, Michigan, United States of America
- Michael J. Catanzaro
  - Department of Mathematics, Iowa State University, Ames, Iowa, United States of America
- Vaibhav A. Diwadkar
  - Department of Psychiatry & Behavioral Neuroscience, Wayne State University, Detroit, Michigan, United States of America
|