1
Guzmán-Vargas L, Zabaleta-Ortega A, Guzmán-Sáenz A. Simplicial complex entropy for time series analysis. Sci Rep 2023; 13:22696. [PMID: 38123652 PMCID: PMC10733285 DOI: 10.1038/s41598-023-49958-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Accepted: 12/13/2023] [Indexed: 12/23/2023] Open
Abstract
The complex behavior of many systems in nature requires the application of robust methodologies capable of identifying changes in their dynamics. In the case of time series (which are sensed values of a system during a time interval), several methods have been proposed to evaluate their irregularity. However, for some types of dynamics such as stochastic and chaotic, new approaches are required that can provide a better characterization of them. In this paper we present the simplicial complex approximate entropy, which is based on the conditional probability of the occurrence of elements of a simplicial complex. Our results show that this entropy measure provides a wide range of values with details not easily identifiable with standard methods. In particular, we show that our method is able to quantify the irregularity in simulated random sequences and those from low-dimensional chaotic dynamics. Furthermore, it is possible to consistently differentiate cardiac interbeat sequences from healthy subjects and from patients with heart failure, as well as to identify changes between dynamical states of coupled chaotic maps. Our results highlight the importance of the structures revealed by the simplicial complexes, which holds promise for applications of this approach in various contexts.
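For orientation, the kind of standard irregularity measure the abstract contrasts itself with can be sketched as follows. This is the classical approximate entropy ApEn(m, r), not the simplicial complex variant proposed in the paper; the function name, the defaults m=2 and r=0.2, and the two test series are illustrative assumptions.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.2):
    """Classical approximate entropy ApEn(m, r); r is an absolute tolerance."""
    n = len(series)

    def phi(dim):
        # All overlapping templates of length `dim`.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            # Count templates within Chebyshev distance r (self-match included,
            # so the count is never zero and the logarithm is always defined).
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)


# Hypothetical inputs: a strictly periodic series and uniform random noise.
periodic = [0.0, 1.0] * 50
random.seed(0)
noisy = [random.random() for _ in range(200)]
```

A regular series should score close to zero, while an irregular one scores higher; comparing `approximate_entropy(periodic)` with `approximate_entropy(noisy)` shows the contrast.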
Affiliation(s)
- Lev Guzmán-Vargas
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, 07340, Mexico City, Mexico
- Alvaro Zabaleta-Ortega
  - Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, Instituto Politécnico Nacional, 07340, Mexico City, Mexico
- Aldo Guzmán-Sáenz
  - Topological Data Analysis in Genomics, Thomas J. Watson Research Center, Yorktown Heights, NY, USA
2
Takahashi K, Abe K, Kubota SI, Fukatsu N, Morishita Y, Yoshimatsu Y, Hirakawa S, Kubota Y, Watabe T, Ehata S, Ueda HR, Shimamura T, Miyazono K. An analysis modality for vascular structures combining tissue-clearing technology and topological data analysis. Nat Commun 2022; 13:5239. [PMID: 36097010 PMCID: PMC9468184 DOI: 10.1038/s41467-022-32848-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2021] [Accepted: 08/22/2022] [Indexed: 11/17/2022] Open
Abstract
The blood and lymphatic vasculature networks are not yet fully understood even in mouse because of the inherent limitations of imaging systems and quantification methods. This study aims to evaluate the usefulness of the tissue-clearing technology for visualizing blood and lymphatic vessels in adult mouse. Clear, unobstructed brain/body imaging cocktails and computational analysis (CUBIC) enables us to capture the high-resolution 3D images of organ- or area-specific vascular structures. To evaluate these 3D structural images, signals are first classified from the original captured images by machine learning at pixel base. Then, these classified target signals are subjected to topological data analysis and non-homogeneous Poisson process model to extract geometric features. Consequently, the structural difference of vasculatures is successfully evaluated in mouse disease models. In conclusion, this study demonstrates the utility of CUBIC for analysis of vascular structures and presents its feasibility as an analysis modality in combination with 3D images and mathematical frameworks.
Affiliation(s)
- Kei Takahashi
  - Department of Molecular Pathology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Ko Abe
  - Laboratory of Medical Statistics, Pharmaceutical Science, Faculty of Pharmacy, Kobe Pharmaceutical University, 4-19-1 Motoyama-Kitamachi, Higashi-Nada-ku, Kobe, Hyogo, 658-8558, Japan
- Shimpei I Kubota
  - Department of Molecular Pathology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Noriaki Fukatsu
  - Division of Systems Biology, Graduate School of Medicine, Nagoya University, 65 Tsurumai-Cho, Showa-ku, Nagoya, 466-8550, Japan
- Yasuyuki Morishita
  - Department of Molecular Pathology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Yasuhiro Yoshimatsu
  - Division of Pharmacology, Graduate School of Medical and Dental Sciences, Niigata University, 1-757 Asahimachi-dori, Chuo-ku, Niigata, 951-8510, Japan
- Satoshi Hirakawa
  - Institute for NanoSuit Research, Preeminent Medical Photonics Education & Research Center, Hamamatsu University School of Medicine, 1-20-1 Handayama, Higashi-ku, Hamamatsu, Shizuoka, 431-3125, Japan
- Yoshiaki Kubota
  - Department of Anatomy, Keio University School of Medicine, 35 Shinanomachi, Shinjuku-ku, Tokyo, 160-8582, Japan
- Tetsuro Watabe
  - Department of Biochemistry, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University (TMDU), 1-5-45 Yushima, Bunkyo-ku, Tokyo, 113-8549, Japan
- Shogo Ehata
  - Department of Molecular Pathology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
- Hiroki R Ueda
  - Department of Systems Pharmacology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
  - Laboratory for Synthetic Biology, RIKEN Center for Biosystems Dynamics Research, 1-3 Yamadaoka, Suita, Osaka, 565-0871, Japan
- Teppei Shimamura
  - Division of Systems Biology, Graduate School of Medicine, Nagoya University, 65 Tsurumai-Cho, Showa-ku, Nagoya, 466-8550, Japan
- Kohei Miyazono
  - Department of Molecular Pathology, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
3
Vihinen M. Individual Genetic Heterogeneity. Genes (Basel) 2022; 13:1626. [PMID: 36140794 PMCID: PMC9498725 DOI: 10.3390/genes13091626] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 08/25/2022] [Accepted: 09/08/2022] [Indexed: 11/28/2022] Open
Abstract
Genetic variation has been widely covered in the literature, but not from the perspective of an individual in any species. Here, a synthesis of genetic concepts and variations relevant for individual genetic constitution is provided. All the different levels of genetic information and variation are covered, ranging from whether an organism is unmixed or hybrid, has variations in genome, chromosomes, and more locally in DNA regions, to epigenetic variants or alterations in selfish genetic elements. Genetic constitution and heterogeneity of microbiota are highly relevant for health and wellbeing of an individual. Mutation rates vary widely for variation types, e.g., due to the sequence context. Genetic information guides numerous aspects in organisms. Types of inheritance, whether Mendelian or non-Mendelian, zygosity, sexual reproduction, and sex determination are covered. Functions of DNA and functional effects of variations are introduced, along with mechanisms that reduce and modulate functional effects, including TARAR countermeasures and intraindividual genetic conflict. TARAR countermeasures for tolerance, avoidance, repair, attenuation, and resistance are essential for life, integrity of genetic information, and gene expression. The genetic composition, effects of variations, and their expression are considered also in diseases and personalized medicine. The text synthesizes knowledge and insight on individual genetic heterogeneity and organizes and systematizes the central concepts.
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
4
Migdałek G, Żelawski M. Measuring population-level plant gene flow with topological data analysis. ECOL INFORM 2022. [DOI: 10.1016/j.ecoinf.2022.101740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
5
Lee B, Cyrill SL, Lee W, Melchiotti R, Andiappan AK, Poidinger M, Rötzschke O. Analysis of archaic human haplotypes suggests that 5hmC acts as an epigenetic guide for NCO recombination. BMC Biol 2022; 20:173. [PMID: 35927700 PMCID: PMC9354366 DOI: 10.1186/s12915-022-01353-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2021] [Accepted: 06/17/2022] [Indexed: 11/17/2022] Open
Abstract
Background: Non-crossover (NCO) refers to a mechanism of homologous recombination in which short tracks of DNA are copied between homologue chromatids. The allelic changes are typically restricted to one or few SNPs, which potentially allow for the gradual adaptation and maturation of haplotypes. It is assumed to be a stochastic process but the analysis of archaic and modern human haplotypes revealed a striking variability in local NCO recombination rates.
Methods: NCO recombination rates of 1.9 million archaic SNPs shared with Denisovan hominids were defined by a linkage study and correlated with functional and genomic annotations as well as ChIP-Seq data from modern humans.
Results: We detected a strong correlation between NCO recombination rates and the function of the respective region: low NCO rates were evident in introns and quiescent intergenic regions but high rates in splice sites, exons, 5′- and 3′-UTRs, as well as CpG islands. Correlations with ChIP-Seq data from ENCODE and other public sources further identified epigenetic modifications that associated directly with these recombination events. A particularly strong association was observed for 5-hydroxymethylcytosine marks (5hmC), which were enriched in virtually all of the functional regions associated with elevated NCO rates, including CpG islands and ‘poised’ bivalent regions.
Conclusion: Our results suggest that 5hmC marks may guide the NCO machinery specifically towards functionally relevant regions and, as an intermediate of oxidative demethylation, may open a pathway for environmental influence by specifically targeting recently opened gene loci.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12915-022-01353-9.
Affiliation(s)
- Bernett Lee
  - Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
  - Present address: Lee Kong Chian School of Medicine, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore
- Samantha Leeanne Cyrill
  - Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
  - Present address: Cold Spring Harbor Laboratory, One Bungtown Road, NY, 11724, Cold Spring Harbor, USA
- Wendy Lee
  - Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
- Rossella Melchiotti
  - Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
- Anand Kumar Andiappan
  - Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
- Michael Poidinger
  - Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
  - Present address: Murdoch Children's Research Institute, Royal Children's Hospital, Flemington Road, Parkville, Victoria, 3052, Australia
- Olaf Rötzschke
  - Singapore Immunology Network (SIgN), Agency of Science Technology and Research (A*STAR), 8A Biomedical Drive, Singapore, 138648, Singapore
6
Loughrey C, Fitzpatrick P, Orr N, Jurek-Loughrey A. The topology of data: Opportunities for cancer research. Bioinformatics 2021; 37:3091-3098. [PMID: 34320632 PMCID: PMC8504620 DOI: 10.1093/bioinformatics/btab553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/27/2021] [Revised: 06/14/2021] [Accepted: 07/28/2021] [Indexed: 01/20/2023] Open
Abstract
Motivation: Topological methods have recently emerged as a reliable and interpretable framework for extracting information from high-dimensional data, leading to the creation of a branch of applied mathematics called Topological Data Analysis (TDA). Since then, TDA has been progressively adopted in biomedical research. Biological data collection can result in enormous datasets, comprising thousands of features and spanning diverse datatypes. This presents a barrier to initial data analysis as the fundamental structure of the dataset becomes hidden, obstructing the discovery of important features and patterns. TDA provides a solution to obtain the underlying shape of datasets over continuous resolutions, corresponding to key topological features independent of noise. TDA has the potential to support future developments in healthcare as biomedical datasets rise in complexity and dimensionality. Previous applications extend across the fields of neuroscience, oncology, immunology and medical image analysis. TDA has been used to reveal hidden subgroups of cancer patients, construct organizational maps of brain activity and classify abnormal patterns in medical images. The utility of TDA is broad and to understand where current achievements lie, we have evaluated the present state of TDA in cancer data analysis.
Results: This article aims to provide an overview of TDA in Cancer Research. A brief introduction to the main concepts of TDA is provided to ensure that the article is accessible to readers who are not familiar with this field. Following this, a focussed literature review on the field is presented, discussing how TDA has been applied across heterogeneous datatypes for cancer research.
Affiliation(s)
- Ciara Loughrey
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
- Padraig Fitzpatrick
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
- Nick Orr
  - Patrick G Johnston Centre for Cancer Research, Queen's University Belfast, BT9 7AE, United Kingdom
- Anna Jurek-Loughrey
  - School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, BT9 5BN, United Kingdom
7
Abstract
Viral recombination is a major evolutionary mechanism driving adaptation processes, such as the ability to switch hosts. Understanding global patterns of recombination could help to identify underlying mechanisms and to evaluate the potential risks of rapid adaptation. Conventional approaches (e.g., those based on linkage disequilibrium) are computationally demanding or even intractable when sequence alignments include hundreds of sequences, which is common in viral data sets. We present a comprehensive analysis of recombination across 30 genomic alignments from viruses infecting humans. In order to scale the analysis and avoid the computational limitations of conventional approaches, we apply newly developed topological data analysis methods able to infer recombination rates for large data sets. We show that viruses, such as ZEBOV and MARV, consistently displayed low levels of recombination, whereas high levels of recombination were observed in Sarbecoviruses, HBV, HEV, Rhinovirus A, and HIV. We observe that recombination is more common in positive-sense single-stranded RNA viruses than in negative-sense ones. Interestingly, the comparison across multiple viruses suggests an inverse correlation between genome length and recombination rate. Positional analyses of recombination breakpoints along viral genomes, combined with our approach, detected at least 39 nonuniform patterns of recombination (i.e., cold or hotspots) in 18 viral groups. Among these, noteworthy hotspots are found in MERS-CoV and Sarbecoviruses (at spike, Nucleocapsid and ORF8). In summary, we have developed a fast pipeline to measure recombination that, combined with other approaches, has allowed us to find both common and lineage-specific patterns of recombination among viruses with potential relevance in viral adaptation.
Affiliation(s)
- Juan Ángel Patiño-Galindo
  - Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
- Ioan Filip
  - Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
- Raul Rabadan
  - Program for Mathematical Genomics, Departments of Systems Biology and Biomedical Informatics, Columbia University, New York, NY, USA
8
Bukkuri A, Andor N, Darcy IK. Applications of Topological Data Analysis in Oncology. Front Artif Intell 2021; 4:659037. [PMID: 33928240 PMCID: PMC8076640 DOI: 10.3389/frai.2021.659037] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 01/26/2021] [Accepted: 03/16/2021] [Indexed: 12/12/2022] Open
Abstract
The emergence of the information age in the last few decades brought with it an explosion of biomedical data. But with great power comes great responsibility: there is now a pressing need for new data analysis algorithms to be developed to make sense of the data and transform this information into knowledge which can be directly translated into the clinic. Topological data analysis (TDA) provides a promising path forward: using tools from the mathematical field of algebraic topology, TDA provides a framework to extract insights into the often high-dimensional, incomplete, and noisy nature of biomedical data. Nowhere is this more evident than in the field of oncology, where patient-specific data is routinely presented to clinicians in a variety of forms, from imaging to single cell genomic sequencing. In this review, we focus on applications involving persistent homology, one of the main tools of TDA. We describe some recent successes of TDA in oncology, specifically in predicting treatment responses and prognosis, tumor segmentation and computer-aided diagnosis, disease classification, and cellular architecture determination. We also provide suggestions on avenues for future research including utilizing TDA to analyze cancer time-series data such as gene expression changes during pathogenesis, investigation of the relation between angiogenic vessel structure and treatment efficacy from imaging data, and experimental confirmation that geometric and topological connectivity implies functional connectivity in the context of cancer.
Affiliation(s)
- Anuraag Bukkuri
  - Department of Integrated Mathematical Oncology, Moffitt Cancer Center, Tampa, FL, United States
- Noemi Andor
  - Department of Integrated Mathematical Oncology, Moffitt Cancer Center, Tampa, FL, United States
- Isabel K. Darcy
  - Department of Mathematics, University of Iowa, Iowa City, IA, United States
9
Sardiu ME, Box AC, Haug JS, Washburn MP. Identification of stem cells from large cell populations with topological scoring. Mol Omics 2020; 17:59-65. [PMID: 32924050 DOI: 10.1039/d0mo00039f] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022]
Abstract
Machine learning and topological analysis methods are becoming increasingly used on various large-scale omics datasets. Modern high dimensional flow cytometry data sets share many features with other omics datasets like genomics and proteomics. For example, genomics or proteomics datasets can be sparse and have high dimensionality, and flow cytometry datasets can also share these features. This makes flow cytometry data potentially a suitable candidate for employing machine learning and topological scoring strategies, for example, to gain novel insights into patterns within the data. We have previously developed a Topological Score (TopS) and implemented it for the analysis of quantitative protein interaction network datasets. Here we show that the TopS approach for large scale data analysis is applicable to the analysis of a previously described flow cytometry sorted human hematopoietic stem cell dataset. We demonstrate that TopS is capable of effectively sorting this dataset into cell populations and identifying rare cell populations. We demonstrate the utility of TopS when coupled with multiple approaches including topological data analysis, X-shift clustering, and t-Distributed Stochastic Neighbor Embedding (t-SNE). Our results suggest that TopS could be effectively used to analyze large scale flow cytometry datasets to find rare cell populations.
Affiliation(s)
- Mihaela E Sardiu
  - Stowers Institute for Medical Research, 1000 E. 50th St, Kansas City, MO 64110, USA
10
Rabadán R, Mohamedi Y, Rubin U, Chu T, Alghalith AN, Elliott O, Arnés L, Cal S, Obaya ÁJ, Levine AJ, Cámara PG. Identification of relevant genetic alterations in cancer using topological data analysis. Nat Commun 2020; 11:3808. [PMID: 32732999 PMCID: PMC7393176 DOI: 10.1038/s41467-020-17659-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 08/30/2018] [Accepted: 07/09/2020] [Indexed: 01/05/2023] Open
Abstract
Large-scale cancer genomic studies enable the systematic identification of mutations that lead to the genesis and progression of tumors, uncovering the underlying molecular mechanisms and potential therapies. While some such mutations are recurrently found in many tumors, many others exist solely within a few samples, precluding detection by conventional recurrence-based statistical approaches. Integrated analysis of somatic mutations and RNA expression data across 12 tumor types reveals that mutations of cancer genes are usually accompanied by substantial changes in expression. We use topological data analysis to leverage this observation and uncover 38 elusive candidate cancer-associated genes, including inactivating mutations of the metalloproteinase ADAMTS12 in lung adenocarcinoma. We show that ADAMTS12-/- mice have a five-fold increase in the susceptibility to develop lung tumors, confirming the role of ADAMTS12 as a tumor suppressor gene. Our results demonstrate that data integration through topological techniques can increase our ability to identify previously unreported cancer-related alterations.
Affiliation(s)
- Raúl Rabadán
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Yamina Mohamedi
  - Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Asturias, Spain
  - IUOPA, Instituto Universitario de Oncologia, Oviedo, Asturias, Spain
- Udi Rubin
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
  - Memorial Sloan Kettering Cancer Center, 1275 York Ave, New York, NY, 10065, USA
- Tim Chu
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Adam N Alghalith
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
- Oliver Elliott
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Luis Arnés
  - Departments of Systems Biology and Biomedical Informatics, Columbia University, 1130 St. Nicholas Ave., New York, NY, 10032, USA
- Santiago Cal
  - Departamento de Bioquimica y Biologia Molecular, Universidad de Oviedo, Oviedo, Asturias, Spain
  - IUOPA, Instituto Universitario de Oncologia, Oviedo, Asturias, Spain
- Álvaro J Obaya
  - IUOPA, Instituto Universitario de Oncologia, Oviedo, Asturias, Spain
  - Departamento de Biologia Funcional, Universidad de Oviedo, Oviedo, Asturias, Spain
- Arnold J Levine
  - The Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ, 08540, USA
- Pablo G Cámara
  - Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 3400 Civic Center Blvd., Philadelphia, PA, 19104, USA
11
Tokodi M, Shrestha S, Bianco C, Kagiyama N, Casaclang-Verzosa G, Narula J, Sengupta PP. Interpatient Similarities in Cardiac Function: A Platform for Personalized Cardiovascular Medicine. JACC Cardiovasc Imaging 2020; 13:1119-1132. [PMID: 32199835 PMCID: PMC7556337 DOI: 10.1016/j.jcmg.2019.12.018] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 08/02/2019] [Revised: 10/31/2019] [Accepted: 12/19/2019] [Indexed: 12/20/2022]
Abstract
OBJECTIVES: The authors applied unsupervised machine-learning techniques for integrating echocardiographic features of left ventricular (LV) structure and function into a patient similarity network that predicted major adverse cardiac event(s) (MACE) in an individual patient.
BACKGROUND: Patient similarity analysis is an evolving paradigm for precision medicine in which patients are clustered or classified based on their similarities in several clinical features.
METHODS: A retrospective cohort of 866 patients was used to develop a network architecture using 9 echocardiographic features of LV structure and function. The data for 468 patients from 2 prospective cohort registries were then added to test the model's generalizability.
RESULTS: The map of cross-sectional data in the retrospective cohort resulted in a looped patient network that persisted even after the addition of data from the prospective cohort registries. After subdividing the loop into 4 regions, patients in each region showed unique differences in LV function, with Kaplan-Meier curves demonstrating significant differences in MACE-related rehospitalization and death (both p < 0.001). Addition of network information to clinical risk predictors resulted in significant improvements in net reclassification, integrated discrimination, and median risk scores for predicting MACE (p < 0.05 for all). Furthermore, the network predicted the cardiac disease cycle in each of the 96 patients who had second echocardiographic evaluations. An improvement or remaining in low-risk regions was associated with lower MACE-related rehospitalization rates than worsening or remaining in high-risk regions (3% vs. 37%; p < 0.001).
CONCLUSIONS: Patient similarity analysis integrates multiple features of cardiac function to develop a phenotypic network in which patients can be mapped to specific locations associated with specific disease stage and clinical outcomes. The use of patient similarity analysis may have relevance for automated staging of cardiac disease severity, personalized prediction of prognosis, and monitoring progression or response to therapies.
Affiliation(s)
- Márton Tokodi
  - Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
  - Heart and Vascular Center, Semmelweis University, Budapest, Hungary
- Sirish Shrestha
  - Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Christopher Bianco
  - Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Nobuyuki Kagiyama
  - Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Grace Casaclang-Verzosa
  - Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
- Jagat Narula
  - Division of Cardiology, Icahn School of Medicine at Mount Sinai, New York, New York
- Partho P Sengupta
  - Division of Cardiology, West Virginia University Heart & Vascular Institute, Morgantown, West Virginia
12
Amézquita EJ, Quigley MY, Ophelders T, Munch E, Chitwood DH. The shape of things to come: Topological data analysis and biology, from molecules to organisms. Dev Dyn 2020; 249:816-833. [PMID: 32246730 PMCID: PMC7383827 DOI: 10.1002/dvdy.175] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/08/2020] [Revised: 03/29/2020] [Accepted: 03/29/2020] [Indexed: 11/11/2022] Open
Abstract
Shape is data and data is shape. Biologists are accustomed to thinking about how the shape of biomolecules, cells, tissues, and organisms arises from the effects of genetics, development, and the environment. Less often do we consider that data itself has shape and structure, or that it is possible to measure the shape of data and analyze it. Here, we review applications of topological data analysis (TDA) to biology in a way accessible to biologists and applied mathematicians alike. TDA uses principles from algebraic topology to comprehensively measure shape in data sets. Using a function that relates the similarity of data points to each other, we can monitor the evolution of topological features: connected components, loops, and voids. This evolution, a topological signature, concisely summarizes large, complex data sets. We first provide a TDA primer for biologists before exploring the use of TDA across biological sub-disciplines, spanning structural biology, molecular biology, evolution, and development. We end by comparing and contrasting different TDA approaches and the potential for their use in biology. The vision of TDA, that data are shape and shape is data, will be relevant as biology transitions into a data-driven era where the meaningful interpretation of large data sets is a limiting factor.
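The idea of monitoring connected components as a similarity threshold grows can be illustrated with a minimal sketch. It assumes Euclidean distance as the similarity function and tracks only connected components (0-dimensional features) with a union-find structure; the function name and the sample points are hypothetical, and loops and voids would require a full persistent homology library.

```python
import itertools
import math


def h0_persistence(points):
    """Scales at which connected components merge as the threshold grows.

    Every point starts as its own component at scale 0. A component "dies"
    at the length of the first edge that joins it to another component.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two components
            deaths.append(length)
    return deaths  # the last surviving component never dies


# Hypothetical data: two tight pairs of points that sit far apart.
two_clusters = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
merge_scales = h0_persistence(two_clusters)
```

In this example the short merge scales correspond to points joining within each pair, and the single large one corresponds to the two pairs joining; a long-lived component is the kind of feature a topological signature records.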
Affiliation(s)
- Erik J Amézquita
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
- Michelle Y Quigley
  - Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
- Tim Ophelders
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
- Elizabeth Munch
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
  - Department of Mathematics, Michigan State University, East Lansing, Michigan, USA
- Daniel H Chitwood
  - Department of Computational Mathematics, Science & Engineering, Michigan State University, East Lansing, Michigan, USA
  - Department of Horticulture, Michigan State University, East Lansing, Michigan, USA
13
|
Mathews JC, Pouryahya M, Moosmüller C, Kevrekidis YG, Deasy JO, Tannenbaum A. Molecular phenotyping using networks, diffusion, and topology: soft tissue sarcoma. Sci Rep 2019; 9:13982. [PMID: 31562358 PMCID: PMC6764992 DOI: 10.1038/s41598-019-50300-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/27/2018] [Accepted: 09/06/2019] [Indexed: 11/24/2022] Open
Abstract
Many biological datasets are high-dimensional yet manifest an underlying order. In this paper, we describe an unsupervised data analysis methodology that operates in the setting of a multivariate dataset and a network which expresses influence between the variables of the given set. The technique involves network geometry employing the Wasserstein distance, global spectral analysis in the form of diffusion maps, and topological data analysis using the Mapper algorithm. The prototypical application is to gene expression profiles obtained from RNA-Seq experiments on a collection of tissue samples, considering only genes whose protein products participate in a known pathway or network of interest. Employing the technique, we discern several coherent states or signatures displayed by the gene expression profiles of the sarcomas in the Cancer Genome Atlas along the TP53 (p53) signaling network. The signatures substantially recover the leiomyosarcoma, dedifferentiated liposarcoma (DDLPS), and synovial sarcoma histological subtype diagnoses, and they also include a new signature defined by activation and inactivation of about a dozen genes, including activation of serine endopeptidase inhibitor SERPINE1 and inactivation of TP53-family tumor suppressor gene TP73.
Affiliation(s)
- James C Mathews
- Department of Medical Physics, Memorial Sloan-Kettering Cancer Center, New York, USA.
- Maryam Pouryahya
- Department of Medical Physics, Memorial Sloan-Kettering Cancer Center, New York, USA
- Caroline Moosmüller
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, USA
- Yannis G Kevrekidis
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, USA
- Joseph O Deasy
- Department of Medical Physics, Memorial Sloan-Kettering Cancer Center, New York, USA
- Allen Tannenbaum
- Departments of Computer Science and Applied Mathematics & Statistics, Stony Brook University, Stony Brook, USA
|
14
|
A Primer on Persistent Homology of Finite Metric Spaces. Bull Math Biol 2019; 81:2074-2116. [PMID: 31140053 DOI: 10.1007/s11538-019-00614-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/13/2018] [Accepted: 05/10/2019] [Indexed: 10/26/2022]
Abstract
Topological data analysis (TDA) is a relatively new area of research concerned with importing classical ideas from topology into the realm of data analysis. Under the umbrella term TDA falls, in particular, the notion of persistent homology (PH), which can be described, in a nutshell, as the study of scale-dependent homological invariants of datasets. In these notes, we provide a terse, self-contained description of the main ideas behind the construction of persistent homology as an invariant feature of datasets, and its stability to perturbations.
|
15
|
Seetharam K, Shrestha S, Sengupta PP. Artificial Intelligence in Cardiovascular Medicine. Current Treatment Options in Cardiovascular Medicine 2019; 21:25. [PMID: 31089906 PMCID: PMC7561035 DOI: 10.1007/s11936-019-0728-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Indexed: 12/12/2022]
Abstract
PURPOSE OF REVIEW: The ripples of artificial intelligence are being felt in various sectors of human life. Machine learning, a subset of artificial intelligence, extracts information from large databases and is gaining traction in various fields of cardiology. In this review, we highlight noteworthy examples of machine learning utilization in echocardiography, nuclear cardiology, computed tomography, and magnetic resonance imaging over the past year. RECENT FINDINGS: In the past year, machine learning (ML) has expanded its boundaries in cardiology with several positive results. Some studies have integrated clinical and imaging information to further augment the accuracy of these ML algorithms. All the studies mentioned in this review have clearly demonstrated superior results of ML relative to conventional approaches for identifying obstructions or predicting major adverse events. As the influx of data arriving from gradually evolving technologies in health care and wearable devices continues to grow more complex, ML may serve as the bridge that closes the gap between health care and patients in the future. To facilitate a seamless transition between both, a few issues must be resolved for a successful implementation of ML in health care.
Affiliation(s)
- Karthik Seetharam
- WVU Heart & Vascular Institute, 1 Medical Center Drive, Morgantown, WV, 26506, USA
- Sirish Shrestha
- WVU Heart & Vascular Institute, 1 Medical Center Drive, Morgantown, WV, 26506, USA
- Partho P Sengupta
- WVU Heart & Vascular Institute, 1 Medical Center Drive, Morgantown, WV, 26506, USA.
|
16
|
Fast Estimation of Recombination Rates Using Topological Data Analysis. Genetics 2019; 211:1191-1204. [PMID: 30787042 DOI: 10.1534/genetics.118.301565] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 08/30/2018] [Accepted: 02/13/2019] [Indexed: 01/26/2023] Open
Abstract
Accurate estimation of recombination rates is critical for studying the origins and maintenance of genetic diversity. Because the inference of recombination rates under a full evolutionary model is computationally expensive, we developed an alternative approach using topological data analysis (TDA) on genome sequences. We find that this method can analyze datasets larger than what can be handled by any existing recombination inference software, and has accuracy comparable to commonly used model-based methods with significantly less processing time. Previous TDA methods used information contained solely in the first Betti number ($b_1$) of a set of genomes, which aims to capture the number of loops that can be detected within a genealogy. These explorations have proven difficult to connect to the theory of the underlying biological process of recombination, and, consequently, have unpredictable behavior under perturbations of the data. We introduce a new topological feature, which we call ψ, with a natural connection to coalescent models, and present novel arguments relating $b_1$ to population genetic models. Using simulations, we show that ψ and $b_1$ are differentially affected by missing data, and package our approach as TREE (Topological Recombination Estimator). TREE's efficiency and accuracy make it well suited as a first-pass estimator of recombination rate heterogeneity or hotspots throughout the genome. Our work empirically and theoretically justifies the use of topological statistics as summaries of genome sequences and describes a new, unintuitive relationship between topological features of the distribution of sequence data and the footprint of recombination on genomes.
|
17
|
Gene Coexpression Network Comparison via Persistent Homology. Int J Genomics 2018; 2018:7329576. [PMID: 30327773 PMCID: PMC6169238 DOI: 10.1155/2018/7329576] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/02/2018] [Revised: 07/21/2018] [Accepted: 07/26/2018] [Indexed: 11/17/2022] Open
Abstract
Persistent homology, a topological data analysis (TDA) method, is applied to microarray data sets. Although a few papers refer to TDA methods in microarray analysis, persistent homology has not, to the best of our knowledge, previously been used to compare several weighted gene coexpression networks (WGCNs). We calculate the persistent homology of weighted networks constructed from 38 Arabidopsis microarray data sets to test the relevance and success of this approach in distinguishing the stress factors. We quantify multiscale topological features of each network using persistent homology and apply a hierarchical clustering algorithm to the distance matrix whose entries are the pairwise bottleneck distances between the networks. The immunoresponses to different stress factors are distinguishable by our method. The networks of similar immunoresponses are found to be close with respect to bottleneck distance, indicating similar topological features of the WGCNs. This computationally efficient technique for analyzing networks provides a quick test for advanced studies.
|
18
|
Pharmacogenomic landscape of patient-derived tumor cells informs precision oncology therapy. Nat Genet 2018; 50:1399-1411. [DOI: 10.1038/s41588-018-0209-6] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Received: 06/14/2017] [Accepted: 07/27/2018] [Indexed: 02/07/2023]
|
19
|
Duponchel L. Exploring hyperspectral imaging data sets with topological data analysis. Anal Chim Acta 2018; 1000:123-131. [PMID: 29289301 DOI: 10.1016/j.aca.2017.11.029] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/02/2017] [Revised: 11/16/2017] [Accepted: 11/17/2017] [Indexed: 11/15/2022]
Affiliation(s)
- Ludovic Duponchel
- LASIR CNRS UMR 8516, Université Lille 1, Sciences et Technologies, 59655 Villeneuve d'Ascq Cedex, France.
|
20
|
Johnson KW, Shameer K, Glicksberg BS, Readhead B, Sengupta PP, Björkegren JLM, Kovacic JC, Dudley JT. Enabling Precision Cardiology Through Multiscale Biology and Systems Medicine. JACC Basic Transl Sci 2017; 2:311-327. [PMID: 30062151 PMCID: PMC6034501 DOI: 10.1016/j.jacbts.2016.11.010] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 11/01/2016] [Revised: 11/29/2016] [Accepted: 11/30/2016] [Indexed: 12/20/2022]
Abstract
The traditional paradigm of cardiovascular disease research derives insight from large-scale, broadly inclusive clinical studies of well-characterized pathologies. These insights are then put into practice according to standardized clinical guidelines. However, stagnation in the development of new cardiovascular therapies and variability in therapeutic response imply that this paradigm is insufficient for reducing the cardiovascular disease burden. In this state-of-the-art review, we examine 3 interconnected ideas we put forth as key concepts for enabling a transition to precision cardiology: 1) precision characterization of cardiovascular disease with machine learning methods; 2) the application of network models of disease to embrace disease complexity; and 3) using insights from the previous 2 ideas to enable pharmacology and polypharmacology systems for more precise drug-to-patient matching and patient-disease stratification. We conclude by exploring the challenges of applying a precision approach to cardiology, which arise from a deficit of the required resources and infrastructure, and emerging evidence for the clinical effectiveness of this nascent approach.
Affiliation(s)
- Kipp W Johnson
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Khader Shameer
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Benjamin S Glicksberg
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Ben Readhead
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York
- Partho P Sengupta
- The Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York
- Johan L M Björkegren
- Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Medical Biochemistry and Biophysics Vascular Biology Unit, Karolinska Institutet, Stockholm, Sweden
- Jason C Kovacic
- The Zena and Michael A. Wiener Cardiovascular Institute, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York
- Joel T Dudley
- Institute for Next Generation Healthcare, Mount Sinai Health System, New York, New York; Department of Genetics and Genomic Sciences, Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York; Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York
|
21
|
Single-cell topological RNA-seq analysis reveals insights into cellular differentiation and development. Nat Biotechnol 2017; 35:551-560. [PMID: 28459448 PMCID: PMC5569300 DOI: 10.1038/nbt.3854] [Citation(s) in RCA: 148] [Impact Index Per Article: 18.5] [Received: 09/25/2016] [Accepted: 03/20/2017] [Indexed: 12/29/2022]
Abstract
Transcriptional programs control cellular lineage commitment and differentiation during development. Understanding of cell fate has been advanced by studying single-cell RNA-sequencing (RNA-seq) but is limited by the assumptions of current analytic methods regarding the structure of data. We present single-cell topological data analysis (scTDA), an algorithm for topology-based computational analyses to study temporal, unbiased transcriptional regulation. Unlike other methods, scTDA is a nonlinear, model-independent, unsupervised statistical framework that can characterize transient cellular states. We applied scTDA to the analysis of murine embryonic stem cell (mESC) differentiation in vitro in response to inducers of motor neuron differentiation. scTDA resolved asynchrony and continuity in cellular identity over time and identified four transient states (pluripotent, precursor, progenitor, and fully differentiated cells) based on changes in stage-dependent combinations of transcription factors, RNA-binding proteins, and long noncoding RNAs (lncRNAs). scTDA can be applied to study asynchronous cellular responses to either developmental cues or environmental perturbations.
|
22
|
Sardiu ME, Gilmore JM, Groppe B, Florens L, Washburn MP. Identification of Topological Network Modules in Perturbed Protein Interaction Networks. Sci Rep 2017; 7:43845. [PMID: 28272416 PMCID: PMC5341041 DOI: 10.1038/srep43845] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 09/28/2016] [Accepted: 01/30/2017] [Indexed: 12/31/2022] Open
Abstract
Biological networks consist of functional modules; however, detecting and characterizing such modules in networks remains challenging. Perturbing networks is one strategy for identifying modules. Here we used an advanced mathematical approach named topological data analysis (TDA) to interrogate two perturbed networks. In one, we disrupted the S. cerevisiae INO80 protein interaction network by isolating complexes after protein complex components were deleted from the genome. In the second, we reanalyzed previously published data demonstrating the disruption of the human Sin3 network with a histone deacetylase inhibitor. Here we show that disrupted networks contained topological network modules (TNMs) with shared properties that mapped onto distinct locations in networks. We define TNMs as proteins that occupy close network positions depending on their coordinates in a topological space. TNMs provide new insight into networks by capturing proteins from different categories, including proteins within a complex, proteins with shared biological functions, and proteins disrupted across networks.
Affiliation(s)
- Mihaela E Sardiu
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Joshua M Gilmore
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Brad Groppe
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Laurence Florens
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA
- Michael P Washburn
- Stowers Institute for Medical Research, Kansas City, MO 64110, USA; Department of Pathology and Laboratory Medicine, The University of Kansas Medical Center, 3901 Rainbow Boulevard, Kansas City, Kansas 66160, USA
|
23
|
Abstract
Topological methods are emerging as a new set of tools for the analysis of large genomic datasets. They are mathematically grounded methods that extract information from the geometric structure of data. In the last few years, applications to evolutionary biology, cancer genomics, and the analysis of complex diseases have uncovered significant biological results, highlighting their utility for fulfilling some of the current analytic needs of genomics. In this review, the state of the art in the application of topological methods to genomics is summarized, and some of the present limitations and possible future developments are reviewed.
|