1
Desai N, Morris JS, Baladandayuthapani V. NetCellMatch: Multiscale Network-Based Matching of Cancer Cell Lines to Patients Using Graphical Wavelets. Chem Biodivers 2022; 19:e202200746. [PMID: 36279370 PMCID: PMC10066864 DOI: 10.1002/cbdv.202200746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 10/21/2022] [Indexed: 12/27/2022]
Abstract
Cancer cell lines serve as model in vitro systems for investigating therapeutic interventions. Recent advances in high-throughput genomic profiling have enabled the systematic comparison between cell lines and patient tumor samples. The highly interconnected nature of biological data, however, presents a challenge when mapping patient tumors to cell lines. Standard clustering methods can be particularly susceptible to the high level of noise present in these datasets and only output clusters at one unknown scale of the data. In light of these challenges, we present NetCellMatch, a robust framework for network-based matching of cell lines to patient tumors. NetCellMatch first constructs a global network across all cell line-patient samples using their genomic similarity. Then, a multi-scale community detection algorithm integrates information across topologically meaningful (clustering) scales to obtain Network-Based Matching Scores (NBMS). NBMS are measures of cluster robustness which map patient tumors to cell lines. We use NBMS to determine representative "avatar" cell lines for subgroups of patients. We apply NetCellMatch to reverse-phase protein array data obtained from The Cancer Genome Atlas for patients and the MD Anderson Cell Line Project for cell lines. Along with avatar cell line identification, we evaluate connectivity patterns for breast, lung, and colon cancer and explore the proteomic profiles of avatars and their corresponding top matching patients. Our results demonstrate our framework's ability to identify both patient-cell line matches and potential proteomic drivers of similarity. Our methods are general and can be easily adapted to other 'omic datasets.
Affiliation(s)
- Neel Desai
- Division of Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
- Jeffrey S Morris
- Division of Biostatistics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA 19104, USA
2
Pancaldi V. Chromatin Network Analyses: Towards Structure-Function Relationships in Epigenomics. Frontiers in Bioinformatics 2021; 1:742216. [PMID: 36303769 PMCID: PMC9581029 DOI: 10.3389/fbinf.2021.742216] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2021] [Accepted: 10/04/2021] [Indexed: 01/16/2023] [Open Access]
Abstract
Recent technological advances have allowed us to map chromatin conformation and uncover the spatial organization of the genome inside the nucleus. These experiments have revealed the complexities of genome folding, characterized by the presence of loops and domains at different scales, which can change across development and in different cell types. There is strong evidence for a relationship between the topological properties of chromatin contacts and cellular phenotype. Chromatin can be represented as a network, in which genomic fragments are the nodes and connections represent experimentally observed spatial proximity of two genomically distant regions in a specific cell type or biological condition. With this approach we can consider a variety of chromatin features in association with the 3D structure, investigating how nuclear chromatin organization can be related to gene regulation, replication, malignancy, phenotypic variability and plasticity. We briefly review the results obtained on genome architecture through network theoretic approaches. As previously observed in protein-protein interaction networks and many types of non-biological networks, external conditions could shape network topology through a yet unidentified structure-function relationship. Similar to scientists studying the brain, we are confronted with a duality between a spatially embedded network of physical contacts, a related network of correlation in the dynamics of network nodes and, finally, an abstract definition of function of this network, related to phenotype. We summarise major developments in the study of networks in other fields, which we think can suggest a path towards better understanding how 3D genome configuration can impact biological function and adaptation to the environment.
Affiliation(s)
- Vera Pancaldi
- Centre de Recherches en Cancérologie de Toulouse (CRCT), Institut National de la Santé et de la Recherche Médicale (Inserm) U1037, Centre National de la Recherche Scientifique (CNRS) U5071, Université Paul Sabatier, Toulouse, France
- Barcelona Supercomputing Center, Barcelona, Spain
3
Cresswell KG, Stansfield JC, Dozmorov MG. SpectralTAD: an R package for defining a hierarchy of topologically associated domains using spectral clustering. BMC Bioinformatics 2020; 21:319. [PMID: 32689928 PMCID: PMC7372752 DOI: 10.1186/s12859-020-03652-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 12/19/2019] [Accepted: 07/10/2020] [Indexed: 01/08/2023] [Open Access]
Abstract
BACKGROUND The three-dimensional (3D) structure of the genome plays a crucial role in gene expression regulation. Chromatin conformation capture technologies (Hi-C) have revealed that the genome is organized in a hierarchy of topologically associated domains (TADs), sub-TADs, and chromatin loops. Identifying such hierarchical structures is a critical step in understanding genome regulation. Existing tools for TAD calling are frequently sensitive to biases in Hi-C data, depend on tunable parameters, and are computationally inefficient. METHODS To address these challenges, we developed a novel sliding window-based spectral clustering framework that uses gaps between consecutive eigenvectors for TAD boundary identification. RESULTS Our method, implemented in an R package, SpectralTAD, detects hierarchical, biologically relevant TADs, has automatic parameter selection, and is robust to sequencing depth, resolution, and sparsity of Hi-C data. SpectralTAD outperforms four state-of-the-art TAD callers in simulated and experimental settings. We demonstrate that TAD boundaries shared among multiple levels of the TAD hierarchy were more enriched in classical boundary marks and more conserved across cell lines and tissues. In contrast, boundaries of TADs that cannot be split into sub-TADs showed less enrichment and conservation, suggesting their more dynamic role in genome regulation. CONCLUSION SpectralTAD is available on Bioconductor, http://bioconductor.org/packages/SpectralTAD/.
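The eigenvector-gap idea in the METHODS sentence can be illustrated with a minimal sketch. This is not the SpectralTAD implementation (which is an R package with a sliding window and automatic parameter selection); it is a toy Python version that assumes a small symmetric contact matrix, the symmetric normalized Laplacian, and two leading eigenvectors. The function name `boundary_from_eigengap` and the toy matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def boundary_from_eigengap(contacts, n_vectors=2):
    """Return index i such that the largest gap lies between bins i and i + 1.

    Illustrative only: a single boundary on a toy matrix, no sliding window.
    """
    contacts = np.asarray(contacts, dtype=float)
    degree = contacts.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(len(contacts)) - d_inv_sqrt[:, None] * contacts * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the first columns
    # are the eigenvectors with the smallest eigenvalues.
    _, vectors = np.linalg.eigh(laplacian)
    embedding = vectors[:, :n_vectors]
    # Put every bin on the unit sphere before comparing neighbours.
    embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
    # Distance between consecutive bins in eigenvector space.
    gaps = np.linalg.norm(np.diff(embedding, axis=0), axis=1)
    return int(np.argmax(gaps))

# Toy contact matrix (assumption): two dense 4-bin blocks with weak background.
toy = np.ones((8, 8))
toy[:4, :4] = 10.0
toy[4:, 4:] = 10.0
print(boundary_from_eigengap(toy))  # 3, i.e. a boundary between bins 3 and 4
```

On real Hi-C data the same distance profile would have many peaks, and choosing which of them count as boundaries is exactly the part the package automates.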
Affiliation(s)
- Kellen G. Cresswell
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
- John C. Stansfield
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
- Mikhail G. Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA USA
4
Kumar V, Leclerc S, Taniguchi Y. BHi-Cect: a top-down algorithm for identifying the multi-scale hierarchical structure of chromosomes. Nucleic Acids Res 2020; 48:e26. [PMID: 32009153 PMCID: PMC7049727 DOI: 10.1093/nar/gkaa004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/10/2019] [Revised: 12/23/2019] [Accepted: 01/05/2020] [Indexed: 01/15/2023] [Open Access]
Abstract
High-throughput chromosome conformation capture (Hi-C) technology enables the investigation of genome-wide interactions among chromosome loci. Current algorithms focus on topologically associating domains (TADs), which are contiguous clusters along the genome coordinate, to describe the hierarchical structure of chromosomes. However, high-resolution Hi-C displays a variety of interaction patterns beyond what current TAD detection methods can capture. Here, we present BHi-Cect, a novel top-down algorithm that uses spectral clustering to find clusters by considering every locus, with no assumption of genomic contiguity. Our results reveal that the hierarchical structure of chromosomes is organized as 'enclaves', which are complex interwoven clusters at both local and global scales. We show that the nesting of local clusters within global clusters, which characterizes enclaves, is associated with the epigenomic activity found on the underlying DNA. Furthermore, we show that the hierarchical nesting that links different enclaves integrates their respective functions. BHi-Cect provides a means to uncover the general principles guiding chromatin architecture.
Affiliation(s)
- Vipin Kumar
- Laboratory for Cell Systems Control, RIKEN Center for Biosystems Dynamics Research, Suita, Osaka 5650874, Japan
- Simon Leclerc
- Laboratory for Cell Systems Control, RIKEN Center for Biosystems Dynamics Research, Suita, Osaka 5650874, Japan
- Yuichi Taniguchi
- Laboratory for Cell Systems Control, RIKEN Center for Biosystems Dynamics Research, Suita, Osaka 5650874, Japan
- PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama 3320012, Japan
5
Tran QH, Vo VT, Hasegawa Y. Scale-variant topological information for characterizing the structure of complex networks. Phys Rev E 2019; 100:032308. [PMID: 31640058 DOI: 10.1103/physreve.100.032308] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/11/2019] [Indexed: 06/10/2023]
Abstract
The structure of real-world networks is usually difficult to characterize owing to the variation of topological scales, the nondyadic complex interactions, and the fluctuations in the network. We aim to address these problems by introducing a general framework using a method based on topological data analysis. By considering the diffusion process at a single specified timescale in a network, we map the network nodes to a finite set of points that contains the topological information of the network at a single scale. Subsequently, we study the shape of these point sets over variable timescales that provide scale-variant topological information, to understand the varying topological scales and the complex interactions in the network. We conduct experiments on synthetic and real-world data to demonstrate the effectiveness of the proposed framework in identifying network models, classifying real-world networks, and detecting transition points in time-evolving networks. Overall, our study presents a unified analysis that can be applied to more complex network structures, as in the case of multilayer and multiplex networks.
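The step of mapping nodes to points through a diffusion process at one timescale can be sketched as follows. This is a hedged illustration, not the authors' construction: it assumes the combinatorial graph Laplacian and the heat kernel exp(-tau L), with each row of the kernel taken as the point for one node. The function name `diffusion_points` and the toy path graph are hypothetical, and the subsequent topological analysis of the point sets is not reproduced.

```python
import numpy as np

def diffusion_points(adjacency, tau):
    """Map each node to a point: row i of the heat kernel exp(-tau * L).

    Assumes a symmetric adjacency matrix and the combinatorial Laplacian.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    # exp(-tau * L) = V diag(exp(-tau * lambda)) V^T for symmetric L
    return (eigenvectors * np.exp(-tau * eigenvalues)) @ eigenvectors.T

# Toy network (assumption): a path graph on four nodes, 0 - 1 - 2 - 3.
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# Sweep the timescale: small tau keeps nodes far apart,
# large tau collapses all nodes toward the same point.
for tau in (0.1, 1.0, 10.0):
    points = diffusion_points(path, tau)
    spread = np.linalg.norm(points[0] - points[3])
    print(f"tau={tau}: distance between end nodes = {spread:.4f}")
```

Studying how the shape of the point set changes as `tau` varies is the scale-variant part of the abstract; the sketch only shows how one point set per timescale could be produced.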
Affiliation(s)
- Quoc Hoan Tran
- Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8656, Japan
- Van Tuan Vo
- Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8656, Japan
- Yoshihiko Hasegawa
- Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-8656, Japan
6
Malik L, Patro R. Rich Chromatin Structure Prediction from Hi-C Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1448-1458. [PMID: 29994683 DOI: 10.1109/tcbb.2018.2851200] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 06/08/2023]
Abstract
Recent studies involving the 3-dimensional conformation of chromatin have revealed the important role it has to play in different processes within the cell. These studies have also led to the discovery of densely interacting segments of the chromosome, called topologically associating domains. The accurate identification of these domains from Hi-C interaction data is an interesting and important computational problem for which numerous methods have been proposed. Unfortunately, most existing algorithms designed to identify these domains assume that they are non-overlapping whereas there is substantial evidence to believe a nested structure exists. We present a methodology to predict hierarchical chromatin domains using chromatin conformation capture data. Our method predicts domains at different resolutions, calculated using intrinsic properties of the chromatin data, and effectively clusters these to construct the hierarchy. At each individual level, the domains are non-overlapping in such a way that the intra-domain interaction frequencies are maximized. We show that our predicted structure is highly enriched for actively transcribing housekeeping genes and various chromatin markers, including CTCF, around the domain boundaries. We also show that large-scale domains, at multiple resolutions within our hierarchy, are conserved across cell types and species. We also provide comparisons against existing tools for extracting hierarchical domains. Our software, Matryoshka, is written in C++11 and licensed under GPL v3; it is available at https://github.com/COMBINE-lab/matryoshka.
7
Tan ZW, Guarnera E, Berezovsky IN. Exploring chromatin hierarchical organization via Markov State Modelling. PLoS Comput Biol 2018; 14:e1006686. [PMID: 30596637 PMCID: PMC6355033 DOI: 10.1371/journal.pcbi.1006686] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 05/15/2018] [Revised: 01/31/2019] [Accepted: 11/27/2018] [Indexed: 01/02/2023] [Open Access]
Abstract
We propose a new computational method for exploring chromatin structural organization based on Markov State Modelling of Hi-C data represented as an interaction network between genomic loci. A Markov process describes the random walk of a traveling probe in the corresponding energy landscape, mimicking the motion of a biomolecule involved in chromatin function. By studying the metastability of the associated Markov State Model upon annealing, the hierarchical structure of individual chromosomes is observed, and a corresponding set of structural partitions is identified at each level of hierarchy. Then, the notion of effective interaction between partitions is derived, delineating the overall topology and architecture of chromosomes. Mapping epigenetic data on the graphs of intra-chromosomal effective interactions helps in understanding how chromosome organization facilitates its function. A sketch of whole-genome interactions obtained from the analysis of 539 partitions from all 23 chromosomes, complemented by distributions of gene expression regulators and epigenetic factors, sheds light on the structure-function relationships in chromatin, delineating chromosomal territories, as well as structural partitions analogous to topologically associating domains and active/passive epigenomic compartments. In addition to the overall genome architecture shown by effective interactions, the affinity between partitions of different chromosomes was analyzed as an indicator of the degree of association between partitions in functionally relevant genomic interactions. The overall static picture of whole-genome interactions obtained with the method presented in this work provides a foundation for chromatin structural reconstruction, for the modelling of chromatin dynamics, and for exploring the regulation of genome function. The algorithms used in this study are implemented in a freely available Python package ChromaWalker (https://bitbucket.org/ZhenWahTan/chromawalker).
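The random walk on an interaction network that underlies this kind of model can be sketched in a few lines. This is an illustration under stated assumptions, not ChromaWalker's API: it assumes a small symmetric interaction matrix, row-normalizes it into transition probabilities, and uses the standard fact that for a symmetric matrix the stationary distribution is proportional to the row sums. The function names and the toy matrix are hypothetical, and the metastability analysis under annealing is not reproduced.

```python
def transition_matrix(interactions):
    """Row-normalize interaction counts into random-walk transition probabilities."""
    return [[value / sum(row) for value in row] for row in interactions]

def stationary_distribution(interactions):
    """Stationary distribution of the walk, assuming a symmetric interaction matrix.

    For symmetric weights it is proportional to each locus's total interaction.
    """
    row_sums = [sum(row) for row in interactions]
    total = sum(row_sums)
    return [s / total for s in row_sums]

# Toy interaction matrix (assumption): loci 0 and 1 interact strongly,
# locus 2 is only weakly connected to both.
toy_interactions = [
    [0, 8, 1],
    [8, 0, 1],
    [1, 1, 0],
]

P = transition_matrix(toy_interactions)
pi = stationary_distribution(toy_interactions)

# One step of the walk leaves the stationary distribution unchanged.
pi_next = [sum(pi[i] * P[i][j] for i in range(3)) for j in range(3)]
print(pi)
print(pi_next)
```

In this toy case the probe spends most of its time hopping between loci 0 and 1, which is the kind of long-lived (metastable) grouping that a partition at one level of the hierarchy would capture.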
Affiliation(s)
- Zhen Wah Tan
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, Singapore
- Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, Singapore
- Igor N. Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), Singapore
8
Granero-Belinchón C, Roux SG, Garnier NB. Kullback-Leibler divergence measure of intermittency: Application to turbulence. Phys Rev E 2018; 97:013107. [PMID: 29448390 DOI: 10.1103/physreve.97.013107] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 07/19/2017] [Indexed: 11/07/2022]
Abstract
For generic systems exhibiting power law behaviors, and hence multiscale dependencies, we propose a simple tool to analyze multifractality and intermittency, after noticing that these concepts are directly related to the deformation of a probability density function from Gaussian at large scales to non-Gaussian at smaller scales. Our framework is based on information theory and uses Shannon entropy and Kullback-Leibler divergence. We provide an extensive application to three-dimensional fully developed turbulence, seen here as a paradigmatic complex system where intermittency was historically defined and the concepts of scale invariance and multifractality were extensively studied and benchmarked. We compute our quantity on experimental Eulerian velocity measurements, as well as on synthetic processes and phenomenological models of fluid turbulence. Our approach is very general and does not require any underlying model of the system, although it can probe the relevance of such a model.
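The quantity at the core of this framework, the Kullback-Leibler divergence between an observed distribution and a reference, can be sketched for discrete (binned) distributions. This is a minimal illustration, not the authors' estimator: it assumes both distributions are already binned on the same support and that the reference has no empty bin where the observed one is non-zero. The function name `kl_divergence` and the toy histograms are hypothetical.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions on the same bins.

    Bins where p is zero contribute nothing; q must be positive wherever p is.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy binned distributions (assumption): a reference histogram and an
# observed histogram with a sharper peak and heavier tails.
reference = [0.05, 0.25, 0.40, 0.25, 0.05]
observed = [0.10, 0.15, 0.50, 0.15, 0.10]

print(kl_divergence(reference, reference))  # identical distributions give 0.0
print(kl_divergence(observed, reference))   # positive when the shapes differ
```

Evaluating such a divergence from a Gaussian reference at each scale would give one number per scale, which is the sense in which the abstract speaks of a deformation from Gaussian at large scales to non-Gaussian at small scales.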
Affiliation(s)
- Carlos Granero-Belinchón
- Univ Lyon, Ens de Lyon, Univ Claude Bernard, CNRS UMR 5672, Laboratoire de Physique, F-69342 Lyon, France
- Stéphane G Roux
- Univ Lyon, Ens de Lyon, Univ Claude Bernard, CNRS UMR 5672, Laboratoire de Physique, F-69342 Lyon, France
- Nicolas B Garnier
- Univ Lyon, Ens de Lyon, Univ Claude Bernard, CNRS UMR 5672, Laboratoire de Physique, F-69342 Lyon, France