1. Kumar Halder A, Agarwal A, Jodkowska K, Plewczynski D. A systematic analyses of different bioinformatics pipelines for genomic data and its impact on deep learning models for chromatin loop prediction. Brief Funct Genomics 2024; 23:538-548. [PMID: 38555493 DOI: 10.1093/bfgp/elae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Revised: 02/07/2024] [Accepted: 03/04/2024] [Indexed: 04/02/2024] Open
Abstract
Genomic data analysis has witnessed a surge in complexity and volume, primarily driven by the advent of high-throughput technologies. In particular, studying chromatin loops and structures has become pivotal in understanding gene regulation and genome organization. This systematic investigation explores the realm of specialized bioinformatics pipelines designed specifically for the analysis of chromatin loops and structures. Our investigation incorporates two protein (CTCF and Cohesin) factor-specific loop interaction datasets from six distinct pipelines, amassing a comprehensive collection of 36 diverse datasets. Through a meticulous review of existing literature, we offer a holistic perspective on the methodologies, tools and algorithms underpinning the analysis of this multifaceted genomic feature. We illuminate the vast array of approaches deployed, encompassing pivotal aspects such as data preparation pipeline, preprocessing, statistical features and modelling techniques. Beyond this, we rigorously assess the strengths and limitations inherent in these bioinformatics pipelines, shedding light on the interplay between data quality and the performance of deep learning models, ultimately advancing our comprehension of genomic intricacies.
Affiliation(s)
- Anup Kumar Halder
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Abhishek Agarwal
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Karolina Jodkowska
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
- Dariusz Plewczynski
  - Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
  - Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
2. Tang L, Liao J, Hill MC, Hu J, Zhao Y, Ellinor P, Li M. MMCT-Loop: a mix model-based pipeline for calling targeted 3D chromatin loops. Nucleic Acids Res 2024; 52:e25. [PMID: 38281134 PMCID: PMC10954456 DOI: 10.1093/nar/gkae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/02/2023] [Revised: 12/03/2023] [Accepted: 01/12/2024] [Indexed: 01/30/2024] Open
Abstract
Protein-specific Chromatin Conformation Capture (3C)-based technologies have become essential for identifying distal genomic interactions with critical roles in gene regulation. The standard techniques include Chromatin Interaction Analysis by Paired-End Tag (ChIA-PET) and in situ Hi-C followed by chromatin immunoprecipitation (HiChIP), also known as PLAC-seq. To identify chromatin interactions from these data, a variety of computational methods have emerged. Although these state-of-the-art methods address many issues with loop calling, only a few can fit different data types simultaneously, and the accuracy and efficiency of these approaches remain limited. Here we have generated a pipeline, MMCT-Loop, which ensures the accurate identification of strong loops as well as dynamic or weak loops through a mixed model. MMCT-Loop outperforms existing methods in accuracy, and the detected loops show higher activation functionality. To highlight the utility of MMCT-Loop, we applied it to conformational data derived from neural stem cells (NSCs) and uncovered several previously unidentified regulatory regions for key master regulators of stem cell identity. MMCT-Loop is an accurate and efficient loop caller for targeted conformation capture data that supports raw data or pre-processed valid pairs as input; the output interactions are formatted for easy upload to a genome browser for visualization.
Affiliation(s)
- Li Tang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Jiaqi Liao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Matthew C Hill
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02129, USA
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Jiaxin Hu
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Patrick T Ellinor
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02129, USA
  - Cardiovascular Disease Initiative, The Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
3. Agarwal A, Korsak S, Choudhury A, Plewczynski D. The dynamic role of cohesin in maintaining human genome architecture. Bioessays 2023; 45:e2200240. [PMID: 37603403 DOI: 10.1002/bies.202200240] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/09/2022] [Revised: 08/03/2023] [Accepted: 08/07/2023] [Indexed: 08/22/2023]
Abstract
Recent advances in genomic and imaging techniques have revealed the complex manner of organizing billions of base pairs of DNA necessary for maintaining their functionality and ensuring the proper expression of genetic information. The SMC proteins and cohesin complex primarily contribute to forming higher-order chromatin structures, such as chromosomal territories, compartments, topologically associating domains (TADs) and chromatin loops anchored by CCCTC-binding factor (CTCF) protein or other genome organizers. Cohesin plays a fundamental role in chromatin organization, gene expression and regulation. This review aims to describe the current understanding of the dynamic nature of the cohesin-DNA complex and its dependence on cohesin for genome maintenance. We discuss the current 3C technique and numerous bioinformatics pipelines used to comprehend structural genomics and epigenetics focusing on the analysis of Cohesin-centred interactions. We also incorporate our present comprehension of Loop Extrusion (LE) and insights from stochastic modelling.
Affiliation(s)
- Abhishek Agarwal
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Sevastianos Korsak
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Dariusz Plewczynski
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
4. Deng S, Feng Y, Pauklin S. 3D chromatin architecture and transcription regulation in cancer. J Hematol Oncol 2022; 15:49. [PMID: 35509102 PMCID: PMC9069733 DOI: 10.1186/s13045-022-01271-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 01/11/2022] [Accepted: 04/21/2022] [Indexed: 12/18/2022] Open
Abstract
Chromatin has distinct three-dimensional (3D) architectures important in key biological processes, such as cell cycle, replication, differentiation, and transcription regulation. In turn, aberrant 3D structures play a vital role in developing abnormalities and diseases such as cancer. This review discusses key 3D chromatin structures (topologically associating domain, lamina-associated domain, and enhancer-promoter interactions) and corresponding structural protein elements mediating 3D chromatin interactions [CCCTC-binding factor, polycomb group protein, cohesin, and Brother of the Regulator of Imprinted Sites (BORIS) protein] with a highlight of their associations with cancer. We also summarise the recent development of technologies and bioinformatics approaches to study the 3D chromatin interactions in gene expression regulation, including crosslinking and proximity ligation methods in the bulk cell population (ChIA-PET and HiChIP) or single-molecule resolution (ChIA-drop), and methods other than proximity ligation, such as GAM, SPRITE, and super-resolution microscopy techniques.
Affiliation(s)
- Siwei Deng
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
- Yuliang Feng
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
- Siim Pauklin
  - Nuffield Department of Orthopaedics, Rheumatology and Musculoskeletal Sciences, Botnar Research Centre, University of Oxford, Old Road, Headington, Oxford, OX3 7LD, UK
5. Arega Y, Jiang H, Wang S, Zhang J, Niu X, Li G. ChIAMM: A Mixture Model for Statistical Analysis of Long-Range Chromatin Interactions From ChIA-PET Experiments. Front Genet 2021; 11:616160. [PMID: 33381154 PMCID: PMC7767989 DOI: 10.3389/fgene.2020.616160] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/11/2020] [Accepted: 11/11/2020] [Indexed: 11/13/2022] Open
Abstract
Chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) is an important experimental method for detecting specific protein-mediated chromatin loops genome-wide at high resolution. Here, we proposed a new statistical approach with a mixture model, chromatin interaction analysis using mixture model (ChIAMM), to detect significant chromatin interactions from ChIA-PET data. The statistical model is cast into a Bayesian framework to consider more systematic biases: the genomic distance, local enrichment, mappability, and GC content. Using different ChIA-PET datasets, we evaluated the performance of ChIAMM and compared it with the existing methods, including ChIA-PET Tool, ChiaSig, Mango, ChIA-PET2, and ChIAPoP. The result showed that the new approach performed better than most top existing methods in detecting significant chromatin interactions in ChIA-PET experiments.
Affiliation(s)
- Yibeltal Arega
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Hao Jiang
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Shuangqi Wang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
- Jingwen Zhang
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
- Xiaohui Niu
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
- Guoliang Li
  - Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan, China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China
6. Hernández-Lemus E, Reyes-Gopar H, Espinal-Enríquez J, Ochoa S. The Many Faces of Gene Regulation in Cancer: A Computational Oncogenomics Outlook. Genes (Basel) 2019; 10:E865. [PMID: 31671657 PMCID: PMC6896122 DOI: 10.3390/genes10110865] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 07/31/2019] [Revised: 10/16/2019] [Accepted: 10/24/2019] [Indexed: 12/16/2022] Open
Abstract
Cancer is a complex disease at many different levels. The molecular phenomenology of cancer is also quite rich. The mutational and genomic origins of cancer and their downstream effects on processes such as the reprogramming of the gene regulatory control and the molecular pathways depending on such control have been recognized as central to the characterization of the disease. More important, though, is the understanding of their causes, prognosis, and therapeutics. There is a multitude of factors associated with anomalous control of gene expression in cancer. Many of these factors can now be studied comprehensively by means of experiments based on diverse omic technologies. However, characterizing each dimension of the phenomenon individually has proven to fall short in presenting a clear picture of expression regulation as a whole. In this review article, we discuss some of the more relevant factors affecting gene expression control, both under normal conditions and in tumor settings. We describe the different omic approaches that we can use as well as the computational genomic analysis needed to track down these factors. Then we present theoretical and computational frameworks developed to integrate the amount of diverse information provided by such single-omic analyses. We contextualize this within a systems biology-based multi-omic regulation setting, aimed at better understanding the complex interplay of gene expression deregulation in cancer.
Affiliation(s)
- Enrique Hernández-Lemus
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Helena Reyes-Gopar
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
- Jesús Espinal-Enríquez
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico
  - Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Soledad Ochoa
  - Computational Genomics Division, National Institute of Genomic Medicine, Mexico City 14610, Mexico