1
Zhu X, Ma X, Wu C. A methylomics-correlated nomogram predicts the recurrence free survival risk of kidney renal clear cell carcinoma. Mathematical Biosciences and Engineering: MBE 2021; 18:8559-8576. [PMID: 34814313 DOI: 10.3934/mbe.2021424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
BACKGROUND Various studies have suggested that DNA methylation signatures are promising for identifying novel hallmarks that predict cancer prognosis. However, few studies have explored the capacity of DNA methylation for prognostic prediction in patients with kidney renal clear cell carcinoma (KIRC). Developing a methylomics-related signature for predicting the prognosis of KIRC is therefore promising. METHODS A total of 282 patients with complete DNA methylation data and corresponding clinical information were selected to construct the prognostic model. These patients were split into a training set (70%, n = 198 samples) and an internal validation set (30%, n = 84). The training set was used to determine a prognostic predictor by univariate Cox proportional hazards analysis, least absolute shrinkage and selection operator (LASSO) and multivariate Cox regression analysis. The internal validation set and an external validation set (E-MTAB-3274) were used to validate the predictive value of the predictor by receiver operating characteristic (ROC) analysis and Kaplan-Meier survival analysis. RESULTS We identified a 9-DNA methylation signature for recurrence-free survival (RFS) of KIRC patients. ROC analysis supported the robustness of the signature for predicting RFS: the AUC at 1, 3 and 5 years was 0.859, 0.840 and 0.817 in the internal dataset, 0.674, 0.739 and 0.793 in the external validation dataset, and 0.834, 0.862 and 0.842 in the entire TCGA dataset. In addition, a nomogram combining the methylation risk score with conventional clinical covariates was constructed to improve prognostic prediction for KIRC patients. The results indicated good performance of the nomogram. CONCLUSIONS We identified a DNA methylation-associated nomogram that helps improve prognostic prediction for KIRC patients.
Affiliation(s)
- Xiuxian Zhu
  - Department of Gastrointestinal Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Xianxiong Ma
  - Department of Gastrointestinal Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Chuanqing Wu
  - Department of Gastrointestinal Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
2
D’Agostino D, Liò P, Aldinucci M, Merelli I. Advantages of using graph databases to explore chromatin conformation capture experiments. BMC Bioinformatics 2021; 22:43. [PMID: 33902433 PMCID: PMC8073886 DOI: 10.1186/s12859-020-03937-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Accepted: 12/15/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND High-throughput sequencing Chromosome Conformation Capture (Hi-C) allows the study of DNA interactions and 3D chromosome folding at the genome-wide scale. Usually, these data are represented as matrices describing the binary contacts among the different chromosome regions. On the other hand, a graph-based representation can be advantageous to describe the complex topology achieved by the DNA in the nucleus of eukaryotic cells. METHODS Here we discuss the use of a graph database for storing and analysing data achieved by performing Hi-C experiments. The main issue is the size of the produced data and, working with a graph-based representation, the consequent necessity of adequately managing a large number of edges (contacts) connecting nodes (genes), which represents the sources of information. For this, currently available graph visualisation tools and libraries fall short with Hi-C data. The use of graph databases, instead, supports both the analysis and the visualisation of the spatial pattern present in Hi-C data, in particular for comparing different experiments or for re-mapping omics data in a space-aware context efficiently. In particular, the possibility of describing graphs through statistical indicators and, even more, the capability of correlating them through statistical distributions allows highlighting similarities and differences among different Hi-C experiments, in different cell conditions or different cell types. RESULTS These concepts have been implemented in NeoHiC, an open-source and user-friendly web application for the progressive visualisation and analysis of Hi-C networks based on the use of the Neo4j graph database (version 3.5). CONCLUSION With the accumulation of more experiments, the tool will provide invaluable support to compare neighbours of genes across experiments and conditions, helping in highlighting changes in functional domains and identifying new co-organised genomic compartments.
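The shift from a contact matrix to a graph described in this abstract can be illustrated with a minimal sketch. It is not NeoHiC or Neo4j code: the function names, the `min_count` threshold and the toy matrix are assumptions made for illustration, and nodes are plain bin indices rather than genes.

```python
from collections import defaultdict


def contacts_to_edges(matrix, min_count=1):
    """Turn a symmetric contact matrix into weighted edges (i, j, count), i < j.

    `min_count` is a hypothetical filter that drops weak contacts.
    """
    edges = []
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: the matrix is symmetric
            count = matrix[i][j]
            if count >= min_count:
                edges.append((i, j, count))
    return edges


def adjacency(edges):
    """Build a neighbour lookup (node -> set of contacted nodes) from the edge list."""
    neighbours = defaultdict(set)
    for i, j, _count in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    return neighbours


# Toy 4-bin contact matrix (invented values, for illustration only).
toy_matrix = [
    [0, 5, 0, 2],
    [5, 0, 3, 0],
    [0, 3, 0, 4],
    [2, 0, 4, 0],
]
toy_edges = contacts_to_edges(toy_matrix, min_count=3)
toy_neighbours = adjacency(toy_edges)
```

An edge list of this kind is the shape of data a graph database typically ingests as node and relationship records; comparing `toy_neighbours` across experiments is one simple way to look at neighbourhood changes.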
Affiliation(s)
- Daniele D’Agostino
  - Institute of Electronics, Computer and Telecommunication Engineering, National Research Council of Italy, Genoa, Italy
- Pietro Liò
  - Computer Laboratory, University of Cambridge, Cambridge, UK
- Marco Aldinucci
  - Computer Science Department, University of Turin, Turin, Italy
- Ivan Merelli
  - Institute for Biomedical Technologies, National Research Council of Italy, Segrate (MI), Italy
3
Cresswell KG, Dozmorov MG. TADCompare: An R Package for Differential and Temporal Analysis of Topologically Associated Domains. Front Genet 2020; 11:158. [PMID: 32211023 PMCID: PMC7076128 DOI: 10.3389/fgene.2020.00158] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 09/08/2019] [Accepted: 02/11/2020] [Indexed: 12/02/2022]
Abstract
Recent research using chromatin conformation capture technologies, such as Hi-C, has demonstrated the importance of topologically associated domains (TADs) and smaller chromatin loops, collectively referred to hereafter as "interacting domains." Many such domains change during development or disease, and exhibit cell- and condition-specific differences. Quantifying the dynamic behavior of interacting domains will help to better understand genome regulation, but methods for comparing interacting domains between cells and conditions are highly limited. We developed TADCompare, a method for differential analysis of boundaries of interacting domains between two or more Hi-C datasets. TADCompare is based on a spectral clustering-derived measure called the eigenvector gap, which enables a locus-by-locus comparison of boundary differences. Using this measure, we introduce methods for identifying differential and consensus boundaries of interacting domains and tracking boundary changes over time. We further propose a novel framework for the systematic classification of boundary changes. Colocalization and gene enrichment analyses of different types of boundary changes demonstrated distinct biological functionality associated with them. TADCompare is available on https://github.com/dozmorovlab/TADCompare and Bioconductor (submitted).
4
Di Filippo L, Righelli D, Gagliardi M, Matarazzo MR, Angelini C. HiCeekR: A Novel Shiny App for Hi-C Data Analysis. Front Genet 2019; 10:1079. [PMID: 31749839 PMCID: PMC6844183 DOI: 10.3389/fgene.2019.01079] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 07/19/2019] [Accepted: 10/09/2019] [Indexed: 01/14/2023]
Abstract
The High-throughput Chromosome Conformation Capture (Hi-C) technique combines the power of Next Generation Sequencing technologies with the chromosome conformation capture approach to study 3D chromatin organization at the genome-wide scale. Although the technique is quite recent, many tools are already available for pre-processing and analyzing Hi-C data, allowing the identification of chromatin loops, topologically associating domains and A/B compartments. However, only a few of them provide an exhaustive analysis pipeline or allow other omic layers to be easily integrated and visualized. Moreover, most of the available tools are designed for expert users who are comfortable with command-line applications. In this paper, we present HiCeekR (https://github.com/lucidif/HiCeekR), a novel R Graphical User Interface (GUI) that allows researchers to easily perform a complete Hi-C data analysis. With the aid of the Shiny libraries, it integrates several R/Bioconductor packages for Hi-C data analysis and visualization, guiding the user during the entire process. Here, we describe its architecture and functionalities, then illustrate its capabilities using a publicly available dataset.
Affiliation(s)
- Lucio Di Filippo
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Dario Righelli
  - Istituto per le Applicazioni del Calcolo "Mauro Picone," Consiglio Nazionale delle Ricerche, Napoli, Italy
- Miriam Gagliardi
  - Max Planck Institute for Psychiatry, Munich, Germany; Institute of Genetics and Biophysics "A. Buzzati A. Traverso," Consiglio Nazionale delle Ricerche, Napoli, Italy
- Maria Rosaria Matarazzo
  - Institute of Genetics and Biophysics "A. Buzzati A. Traverso," Consiglio Nazionale delle Ricerche, Napoli, Italy
- Claudia Angelini
  - Istituto per le Applicazioni del Calcolo "Mauro Picone," Consiglio Nazionale delle Ricerche, Napoli, Italy
5
Tangherloni A, Spolaor S, Rundo L, Nobile MS, Cazzaniga P, Mauri G, Liò P, Merelli I, Besozzi D. GenHap: a novel computational method based on genetic algorithms for haplotype assembly. BMC Bioinformatics 2019; 20:172. [PMID: 30999845 PMCID: PMC6471693 DOI: 10.1186/s12859-019-2691-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 02/06/2023]
Abstract
Background In order to fully characterize the genome of an individual, the reconstruction of the two distinct copies of each chromosome, called haplotypes, is essential. The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists in assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the two chromosomes. Indeed, the knowledge of complete haplotypes is generally more informative than analyzing single SNPs and plays a fundamental role in many medical applications. Results To reconstruct the two haplotypes, we addressed the weighted Minimum Error Correction (wMEC) problem, which is a successful approach for haplotype assembly. This NP-hard problem consists in computing the two haplotypes that partition the sequencing reads into two disjoint sub-sets, with the least number of corrections to the SNP values. To this aim, we propose here GenHap, a novel computational method for haplotype assembly based on Genetic Algorithms, yielding optimal solutions by means of a global search process. In order to evaluate the effectiveness of our approach, we run GenHap on two synthetic (yet realistic) datasets, based on the Roche/454 and PacBio RS II sequencing technologies. We compared the performance of GenHap against HapCol, an efficient state-of-the-art algorithm for haplotype phasing. Our results show that GenHap always obtains high accuracy solutions (in terms of haplotype error rate), and is up to 4× faster than HapCol in the case of Roche/454 instances and up to 20× faster when compared on the PacBio RS II dataset. Finally, we assessed the performance of GenHap on two different real datasets. Conclusions Future-generation sequencing technologies, producing longer reads with higher coverage, can highly benefit from GenHap, thanks to its capability of efficiently solving large instances of the haplotype assembly problem. 
Moreover, the optimization approach proposed in GenHap can be extended to the study of allele-specific genomic features, such as expression, methylation and chromatin conformation, by exploiting multi-objective optimization techniques. The source code and the full documentation are available at the following GitHub repository: https://github.com/andrea-tango/GenHap.
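The objective behind the (weighted) Minimum Error Correction problem mentioned in this abstract can be sketched as a cost function. This is not GenHap's implementation, and it contains no genetic algorithm: it only scores one candidate haplotype. The read encoding (`'0'`, `'1'`, `'-'` for uncovered sites), the per-position weights and the function names are assumptions for illustration, and the second haplotype is taken to be the complement of the first because only heterozygous SNPs are considered.

```python
def mismatch_cost(read, haplotype, weights=None):
    """Sum the (optionally weighted) mismatches between a read and a haplotype.

    Positions where the read has '-' (no coverage) are ignored.
    """
    total = 0
    for k, (allele, expected) in enumerate(zip(read, haplotype)):
        if allele != '-' and allele != expected:
            total += 1 if weights is None else weights[k]
    return total


def mec_cost(reads, haplotype, weights=None):
    """Cost of a candidate haplotype pair: each read is assigned to the closer
    of the two haplotypes and contributes its corrections to the total."""
    complement = ''.join('1' if a == '0' else '0' for a in haplotype)
    cost = 0
    for idx, read in enumerate(reads):
        w = None if weights is None else weights[idx]
        cost += min(mismatch_cost(read, haplotype, w),
                    mismatch_cost(read, complement, w))
    return cost


# Toy fragment matrix over four heterozygous SNPs (invented for illustration).
toy_reads = ["0101", "1010", "01-1", "0001"]
toy_unweighted = mec_cost(toy_reads, "0101")

# Hypothetical per-position confidence weights, one list per read.
toy_weights = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 0.5, 1, 1]]
toy_weighted = mec_cost(toy_reads, "0101", toy_weights)
```

A search method such as a genetic algorithm would call a function like `mec_cost` as its fitness measure while exploring candidate haplotypes.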
Affiliation(s)
- Andrea Tangherloni
  - Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
- Simone Spolaor
  - Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
- Leonardo Rundo
  - Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy; Institute of Molecular Bioimaging and Physiology, Italian National Research Council, Contrada Pietrapollastra-Pisciotto, Cefalù (PA), 90015, Italy
- Marco S Nobile
  - Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy; SYSBIO.IT Centre of Systems Biology, Piazza della Scienza 2, Milan, 20126, Italy
- Paolo Cazzaniga
  - Department of Human and Social Sciences, University of Bergamo, Piazzale Sant'Agostino 2, Bergamo, 24129, Italy; SYSBIO.IT Centre of Systems Biology, Piazza della Scienza 2, Milan, 20126, Italy
- Giancarlo Mauri
  - Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy; SYSBIO.IT Centre of Systems Biology, Piazza della Scienza 2, Milan, 20126, Italy
- Pietro Liò
  - Computer Laboratory, University of Cambridge, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
- Ivan Merelli
  - Institute of Biomedical Technologies, Italian National Research Council, Via Fratelli Cervi 93, Segrate (MI), 20090, Italy
- Daniela Besozzi
  - Department of Informatics, Systems and Communication (DISCo), University of Milano-Bicocca, Viale Sarca 336, U14 Building, Milan, 20126, Italy
6
Misra BB, Langefeld CD, Olivier M, Cox LA. Integrated Omics: Tools, Advances, and Future Approaches. J Mol Endocrinol 2018; 62:JME-18-0055. [PMID: 30006342 DOI: 10.1530/jme-18-0055] [Citation(s) in RCA: 214] [Impact Index Per Article: 35.7] [Received: 02/24/2018] [Revised: 07/02/2018] [Accepted: 07/12/2018] [Indexed: 12/13/2022]
Abstract
With the rapid adoption of high-throughput omic approaches to analyze biological samples such as genomics, transcriptomics, proteomics, and metabolomics, each analysis can generate tera- to peta-byte sized data files on a daily basis. These data file sizes, together with differences in nomenclature among these data types, make the integration of these multi-dimensional omics data into biologically meaningful context challenging. Variously named as integrated omics, multi-omics, poly-omics, trans-omics, pan-omics, or shortened to just 'omics', the challenges include differences in data cleaning, normalization, biomolecule identification, data dimensionality reduction, biological contextualization, statistical validation, data storage and handling, sharing, and data archiving. The ultimate goal is towards the holistic realization of a 'systems biology' understanding of the biological question in hand. Commonly used approaches in these efforts are currently limited by the 3 i's - integration, interpretation, and insights. Post integration, these very large datasets aim to yield unprecedented views of cellular systems at exquisite resolution for transformative insights into processes, events, and diseases through various computational and informatics frameworks. With the continued reduction in costs and processing time for sample analyses, and increasing types of omics datasets generated such as glycomics, lipidomics, microbiomics, and phenomics, an increasing number of scientists in this interdisciplinary domain of bioinformatics face these challenges. We discuss recent approaches, existing tools, and potential caveats in the integration of omics datasets for development of standardized analytical pipelines that could be adopted by the global omics research community.
Affiliation(s)
- Biswapriya B Misra
  - B Misra, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Carl D Langefeld
  - C Langefeld, Biostatistical Sciences, Wake Forest University School of Medicine, Winston-Salem, United States
- Michael Olivier
  - M Olivier, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
- Laura A Cox
  - L Cox, Internal Medicine, Wake Forest University School of Medicine, Winston-Salem, United States
7
Kel I, Chang Z, Galluccio N, Romeo M, Beretta S, Diomede L, Mezzelani A, Milanesi L, Dieterich C, Merelli I. SPIRE, a modular pipeline for eQTL analysis of RNA-Seq data, reveals a regulatory hotspot controlling miRNA expression in C. elegans. Molecular BioSystems 2017; 12:3447-3458. [PMID: 27722582 DOI: 10.1039/c6mb00453a] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 12/27/2022]
Abstract
The interpretation of genome-wide association study is difficult, as it is hard to understand how polymorphisms can affect gene regulation, in particular for trans-regulatory elements located far from their controlling gene. Using RNA or protein expression data as phenotypes, it is possible to correlate their variations with specific genotypes. This technique is usually referred to as expression Quantitative Trait Loci (eQTLs) analysis and only few packages exist for the integration of genotype patterns and expression profiles. In particular, tools are needed for the analysis of next-generation sequencing (NGS) data on a genome-wide scale, which is essential to identify eQTLs able to control a large number of genes (hotspots). Here we present SPIRE (Software for Polymorphism Identification Regulating Expression), a generic, modular and functionally highly flexible pipeline for eQTL processing. SPIRE integrates different univariate and multivariate approaches for eQTL analysis, paying particular attention to the scalability of the procedure in order to support cis- as well as trans-mapping, thus allowing the identification of hotspots in NGS data. In particular, we demonstrated how SPIRE can handle big association study datasets, reproducing published results and improving the identification of trans-eQTLs. Furthermore, we employed the pipeline to analyse novel data concerning the genotypes of two different C. elegans strains (N2 and Hawaii) and related miRNA expression data, obtained using RNA-Seq. A miRNA regulatory hotspot was identified in chromosome 1, overlapping the transcription factor grh-1, known to be involved in the early phases of embryonic development of C. elegans. In a follow-up qPCR experiment we were able to verify most of the predicted eQTLs, as well as to show, for a novel miRNA, a significant difference in the sequences of the two analysed strains of C. elegans. 
SPIRE is publicly available as open source software, together with some example data, a readme file, supplementary material and a short tutorial.
Affiliation(s)
- Ivan Kel
  - Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
- Zisong Chang
  - Max Delbrück Center for Molecular Medicine, Berlin Institute for Medical Systems Biology, Robert-Rössle-Straße 10, 13125, Berlin, Germany.
- Nadia Galluccio
  - Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
- Margherita Romeo
  - Dipartimento di Biochimica e Farmacologia Molecolare, IRCCS - Istituto di Ricerche Farmacologiche "Mario Negri", Via Giuseppe La Masa 19, Milan, Italy.
- Stefano Beretta
  - Dipartimento di Informatica Sistemistica e Comunicazione, Università degli studi di Milano-Bicocca, Viale Sarca 336, 20125 Milano, Italy.
- Luisa Diomede
  - Dipartimento di Biochimica e Farmacologia Molecolare, IRCCS - Istituto di Ricerche Farmacologiche "Mario Negri", Via Giuseppe La Masa 19, Milan, Italy.
- Alessandra Mezzelani
  - Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
- Luciano Milanesi
  - Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
- Christoph Dieterich
  - Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology and Department of Internal Medicine III, University of Heidelberg, Grabengasse 1, 69117 Heidelberg, Germany.
- Ivan Merelli
  - Istituto di Tecnologie Biomediche - Consiglio Nazionale delle Ricerche, via F.lli Cervi 93, 20090, Segrate, Milano, Italy.
8
Tordini F, Aldinucci M, Milanesi L, Liò P, Merelli I. The Genome Conformation As an Integrator of Multi-Omic Data: The Example of Damage Spreading in Cancer. Front Genet 2016; 7:194. [PMID: 27895661 PMCID: PMC5108817 DOI: 10.3389/fgene.2016.00194] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/08/2016] [Accepted: 10/24/2016] [Indexed: 12/17/2022]
Abstract
Publicly available multi-omic databases, in particular if associated with medical annotations, are rich resources with the potential to lead a rapid transition from high-throughput molecular biology experiments to better clinical outcomes for patients. In this work, we propose a model for multi-omic data integration (i.e., genetic variations, gene expression, genome conformation, and epigenetic patterns), which exploits a multi-layer network approach to analyse, visualize, and obtain insights from such biological information, in order to use achieved results at a macroscopic level. Using this representation, we can describe how driver and passenger mutations accumulate during the development of diseases providing, for example, a tool able to characterize the evolution of cancer. Indeed, our test case concerns the MCF-7 breast cancer cell line, before and after the stimulation with estrogen, since many datasets are available for this case study. In particular, the integration of data about cancer mutations, gene functional annotations, genome conformation, epigenetic patterns, gene expression, and metabolic pathways in our multi-layer representation will allow a better interpretation of the mechanisms behind a complex disease such as cancer. Thanks to this multi-layer approach, we focus on the interplay of chromatin conformation and cancer mutations in different pathways, such as metabolic processes, that are very important for tumor development. Working on this model, a variance analysis can be implemented to identify normal variations within each omics and to characterize, by contrast, variations that can be accounted to pathological samples compared to normal ones. This integrative model can be used to identify novel biomarkers and to provide innovative omic-based guidelines for treating many diseases, improving the efficacy of decision trees currently used in clinic.
Affiliation(s)
- Fabio Tordini
  - Computer Science Department, University of Torino, Torino, Italy
- Marco Aldinucci
  - Computer Science Department, University of Torino, Torino, Italy
- Luciano Milanesi
  - Institute of Biomedical Technologies, Italian National Research Council, Milan, Italy
- Pietro Liò
  - Computer Laboratory, University of Cambridge, Cambridge, UK
- Ivan Merelli
  - Institute of Biomedical Technologies, Italian National Research Council, Milan, Italy
9
Pancaldi V, Carrillo-de-Santa-Pau E, Javierre BM, Juan D, Fraser P, Spivakov M, Valencia A, Rico D. Integrating epigenomic data and 3D genomic structure with a new measure of chromatin assortativity. Genome Biol 2016; 17:152. [PMID: 27391817 PMCID: PMC4939006 DOI: 10.1186/s13059-016-1003-3] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 05/10/2016] [Accepted: 06/07/2016] [Indexed: 01/09/2023]
Abstract
BACKGROUND Network analysis is a powerful way of modeling chromatin interactions. Assortativity is a network property used in social sciences to identify factors affecting how people establish social ties. We propose a new approach, using chromatin assortativity, to integrate the epigenomic landscape of a specific cell type with its chromatin interaction network and thus investigate which proteins or chromatin marks mediate genomic contacts. RESULTS We use high-resolution promoter capture Hi-C and Hi-Cap data as well as ChIA-PET data from mouse embryonic stem cells to investigate promoter-centered chromatin interaction networks and calculate the presence of specific epigenomic features in the chromatin fragments constituting the nodes of the network. We estimate the association of these features with the topology of four chromatin interaction networks and identify features localized in connected areas of the network. Polycomb group proteins and associated histone marks are the features with the highest chromatin assortativity in promoter-centered networks. We then ask which features distinguish contacts amongst promoters from contacts between promoters and other genomic elements. We observe higher chromatin assortativity of the actively elongating form of RNA polymerase 2 (RNAPII) compared with inactive forms only in interactions between promoters and other elements. CONCLUSIONS Contacts among promoters and between promoters and other elements have different characteristic epigenomic features. We identify a possible role for the elongating form of RNAPII in mediating interactions among promoters, enhancers, and transcribed gene bodies. Our approach facilitates the study of multiple genome-wide epigenomic profiles, considering network topology and allowing the comparison of chromatin interaction networks.
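The idea of assortativity used in this abstract can be illustrated with a generic sketch: the Pearson correlation of a node feature across the two ends of every edge. This is the textbook network measure, not necessarily the exact statistic defined in the paper, and the fragment names and feature values below are invented for illustration.

```python
import math


def feature_assortativity(edges, feature):
    """Pearson correlation of a node feature across edge endpoints.

    Each undirected edge is counted in both directions so the measure is symmetric.
    Returns NaN when the feature does not vary over the edge endpoints.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [feature[u], feature[v]]
        ys += [feature[v], feature[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical fragments with the fraction covered by some chromatin mark.
toy_feature = {"fragA": 1.0, "fragB": 1.0, "fragC": 0.0, "fragD": 0.0}

# Marked fragments contact marked fragments: perfectly assortative.
toy_assortative = feature_assortativity([("fragA", "fragB"), ("fragC", "fragD")], toy_feature)

# Marked fragments contact only unmarked ones: perfectly disassortative.
toy_disassortative = feature_assortativity([("fragA", "fragC"), ("fragB", "fragD")], toy_feature)
```

Values near 1 indicate that fragments carrying the feature preferentially contact each other, values near -1 that they preferentially contact fragments lacking it.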
Affiliation(s)
- Vera Pancaldi
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- David Juan
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Peter Fraser
  - Nuclear Dynamics Programme, The Babraham Institute, Cambridge, UK
- Mikhail Spivakov
  - Nuclear Dynamics Programme, The Babraham Institute, Cambridge, UK
- Alfonso Valencia
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
- Daniel Rico
  - Structural Biology and BioComputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
10
Bersanelli M, Mosca E, Remondini D, Giampieri E, Sala C, Castellani G, Milanesi L. Methods for the integration of multi-omics data: mathematical aspects. BMC Bioinformatics 2016; 17 Suppl 2:15. [PMID: 26821531 PMCID: PMC4959355 DOI: 10.1186/s12859-015-0857-9] [Citation(s) in RCA: 221] [Impact Index Per Article: 27.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND Methods for the integrative analysis of multi-omics data are required to draw a more complete and accurate picture of the dynamics of molecular systems. The complexity of biological systems, the technological limits, the large number of biological variables and the relatively low number of biological samples make the analysis of multi-omics datasets a non-trivial problem. RESULTS AND CONCLUSIONS We review the most advanced strategies for integrating multi-omics datasets, focusing on mathematical and methodological aspects.
Affiliation(s)
- Matteo Bersanelli
  - Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy; Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy.
- Ettore Mosca
  - Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy.
- Daniel Remondini
  - Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Enrico Giampieri
  - Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Claudia Sala
  - Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Gastone Castellani
  - Department of Physics and Astronomy, Universita' di Bologna, Via B. Pichat 6/2, Bologna, 40127, Italy.
- Luciano Milanesi
  - Institute of Biomedical Technologies - CNR, Via Fratelli Cervi 93, Segrate MI, 20090, Italy.
11
Wang J, Meng X, Chen H, Yuan C, Li X, Zhou Y, Chen M. Exploring the mechanisms of genome-wide long-range interactions: interpreting chromosome organization. Brief Funct Genomics 2016; 15:385-95. [PMID: 26769147 DOI: 10.1093/bfgp/elv062] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Developments in chromosome conformation capture (3C) technologies have revealed that the three-dimensional organization of a genome leads widely separated functional elements to reside in close proximity. However, the mechanisms responsible for mediating long-range interactions are still not completely known. In this review, we first evaluate and compare the seven current 3C-based methods, summarize their advantages and discuss their limitations for our current understanding of genome structure. Then, we describe the software packages available for analyzing 3C-based data. Moreover, we review insights into the two main mechanisms of long-range interactions, which regulate gene expression by bringing together promoters and distal regulatory elements and by creating structural domains that contain functionally related genes with similar expression landscapes. Finally, we summarize what is known about the mediating factors involved in stimulation/repression of long-range interactions, such as transcription factors and noncoding RNAs.
12
Shavit Y, Merelli I, Milanesi L, Liò P. How computer science can help in understanding the 3D genome architecture. Brief Bioinform 2015; 17:733-44. [DOI: 10.1093/bib/bbv085] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/10/2015] [Indexed: 01/20/2023]
13
Dobigny G, Britton-Davidian J, Robinson TJ. Chromosomal polymorphism in mammals: an evolutionary perspective. Biol Rev Camb Philos Soc 2015; 92:1-21. [PMID: 26234165 DOI: 10.1111/brv.12213] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 09/30/2014] [Revised: 06/23/2015] [Accepted: 07/09/2015] [Indexed: 12/28/2022]
Abstract
Although chromosome rearrangements (CRs) are central to studies of genome evolution, our understanding of the evolutionary consequences of the early stages of karyotypic differentiation (i.e. polymorphism), especially the non-meiotic impacts, is surprisingly limited. We review the available data on chromosomal polymorphisms in mammals so as to identify taxa that hold promise for developing a more comprehensive understanding of chromosomal change. In doing so, we address several key questions: (i) to what extent are mammalian karyotypes polymorphic, and what types of rearrangements are principally involved? (ii) Are some mammalian lineages more prone to chromosomal polymorphism than others? More specifically, do (karyotypically) polymorphic mammalian species belong to lineages that are also characterized by past, extensive karyotype repatterning? (iii) How long can chromosomal polymorphisms persist in mammals? We discuss the evolutionary implications of these questions and propose several research avenues that may shed light on the role of chromosome change in the diversification of mammalian populations and species.
Affiliation(s)
- Gauthier Dobigny: Institut de Recherche pour le Développement, Centre de Biologie pour la Gestion des Populations (UMR IRD-INRA-Cirad-Montpellier SupAgro), Campus International de Baillarguet, CS30016, 34988, Montferrier-sur-Lez, France
- Janice Britton-Davidian: Institut des Sciences de l'Evolution, Université de Montpellier, CNRS, IRD, EPHE, Cc065, Place Eugène Bataillon, 34095, Montpellier Cedex 5, France
- Terence J Robinson: Evolutionary Genomics Group, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland, Stellenbosch, 7062, South Africa
|
14
|
Merelli I, Tordini F, Drocco M, Aldinucci M, Liò P, Milanesi L. Integrating multi-omic features exploiting Chromosome Conformation Capture data. Front Genet 2015; 6:40. [PMID: 25717338 PMCID: PMC4324155 DOI: 10.3389/fgene.2015.00040] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/30/2014] [Accepted: 01/27/2015] [Indexed: 02/02/2023]
Abstract
The representation, integration, and interpretation of omic data is a complex task, particularly given the huge amount of information produced daily in molecular biology laboratories around the world. Sequencing data on expression profiles, methylation patterns, and chromatin domains are difficult to harmonize in a systems biology view, because genome browsers only allow coordinate-based representations and discard the functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture allows the analysis of chromosome organization in the cell’s natural state. When performed genome-wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighborhood starting from information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information for understanding why genes occupy specific positions inside the nucleus and how epigenetic patterns correlate with their expression.
Affiliation(s)
- Ivan Merelli: Bioinformatics Unit, Institute of Biomedical Technologies, Italian National Research Council, Milan, Italy
- Fabio Tordini: Computer Science Department, University of Torino, Torino, Italy
- Maurizio Drocco: Computer Science Department, University of Torino, Torino, Italy
- Marco Aldinucci: Computer Science Department, University of Torino, Torino, Italy
- Pietro Liò: Computer Laboratory, University of Cambridge, Cambridge, UK
- Luciano Milanesi: Bioinformatics Unit, Institute of Biomedical Technologies, Italian National Research Council, Milan, Italy
|
15
|
Fondi M, Liò P. Multi-omics and metabolic modelling pipelines: challenges and tools for systems microbiology. Microbiol Res 2015; 171:52-64. [PMID: 25644953 DOI: 10.1016/j.micres.2015.01.003] [Citation(s) in RCA: 86] [Impact Index Per Article: 9.6] [Received: 10/03/2014] [Revised: 01/02/2015] [Accepted: 01/03/2015] [Indexed: 12/27/2022]
Abstract
Integrated -omics approaches are quickly spreading across microbiology research labs, making it possible (i) to detect previously hidden features of microbial cells, such as multi-scale spatial organization, and (ii) to trace molecular components across multiple cellular functional states. This promises to reduce the knowledge gap between genotype and phenotype and poses new challenges for computational microbiologists. We underline how the capability to unravel the complexity of microbial life will strongly depend on the integration of the huge and diverse amount of information that can be derived today from -omics experiments. In this work, we present opportunities and challenges of multi-omics data integration in current systems biology pipelines. We discuss which layers of biological information are important for biotechnological and clinical purposes, with a special focus on bacterial metabolism and modelling procedures. A general review of the most recent computational tools for integrating large-scale datasets is also presented, together with a possible framework to guide the design of systems biology experiments by microbiologists.
Affiliation(s)
- Marco Fondi: Florence Computational Biology Group (ComBo), University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy; Laboratory of Microbial and Molecular Evolution, Department of Biology, University of Florence, Via Madonna del Piano 6, Sesto Fiorentino, Florence 50019, Italy
- Pietro Liò: University of Cambridge, Computer Laboratory, 15 JJ Thomson Avenue, CB3 0FD Cambridge, UK
|
16
|
Shavit Y, Liò P. Combining a wavelet change point and the Bayes factor for analysing chromosomal interaction data. MOLECULAR BIOSYSTEMS 2014; 10:1576-85. [PMID: 24710657 DOI: 10.1039/c4mb00142g] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 02/02/2023]
Abstract
Over the past few decades we have witnessed great efforts to understand the cellular function at the cytoplasm level. Nowadays there is a growing interest in understanding the relationship between function and structure at the nuclear, chromosomal and sub-chromosomal levels. Data on chromosomal interactions that are now becoming available in unprecedented resolution and scale open the way to address this challenge. Consequently, there is a growing need for new methods and tools that will transform these data into knowledge and insights. Here, we have developed all the steps required for the analysis of chromosomal interaction data (Hi-C data). The result is a methodology which combines a wavelet change point with the Bayes factor for useful correction, segmentation and comparison of Hi-C data. We further developed chromoR, an R package that implements the methods presented here. The chromoR package provides researchers with a means to analyse chromosomal interaction data using statistical bioinformatics, offering a new and comprehensive solution to this task.
Affiliation(s)
- Yoli Shavit: Computer Laboratory, University of Cambridge, Cambridge, CB3 0FD, UK
|