1
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Indexed: 09/25/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent developments in chromatin conformation capture technologies have enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative approach to infer missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequence information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P G Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
2
Conte M, Abraham A, Esposito A, Yang L, Gibcus JH, Parsi KM, Vercellone F, Fontana A, Di Pierno F, Dekker J, Nicodemi M. Polymer Physics Models Reveal Structural Folding Features of Single-Molecule Gene Chromatin Conformations. Int J Mol Sci 2024; 25:10215. [PMID: 39337699 PMCID: PMC11432541 DOI: 10.3390/ijms251810215] [Received: 07/14/2024] [Revised: 09/17/2024] [Accepted: 09/22/2024] [Indexed: 09/30/2024]
Abstract
Here, we employ polymer physics models of chromatin to investigate the 3D folding of a 2-Mb genomic region encompassing the human LTN1 gene, a crucial DNA locus involved in key cellular functions. Through extensive molecular dynamics simulations, we reconstruct in silico the ensemble of single-molecule LTN1 3D structures, which we benchmark against recent in situ Hi-C 2.0 data. The model-derived single molecules are then used to predict structural folding features at the single-cell level, providing testable predictions for super-resolution microscopy experiments.
Affiliation(s)
- Mattia Conte
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant'Angelo, 80126 Naples, Italy
- Alex Abraham
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant'Angelo, 80126 Naples, Italy
- Andrea Esposito
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant'Angelo, 80126 Naples, Italy
- Liyan Yang
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Johan H Gibcus
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Krishna M Parsi
  - Diabetes Center of Excellence and Program in Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Francesca Vercellone
  - DIETI, Università di Napoli Federico II, Via Claudio 21, 80125 Naples, Italy
  - INFN Napoli, Complesso Universitario di Monte Sant'Angelo, 80126 Naples, Italy
- Andrea Fontana
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant'Angelo, 80126 Naples, Italy
- Florinda Di Pierno
  - DIETI, Università di Napoli Federico II, Via Claudio 21, 80125 Naples, Italy
  - INFN Napoli, Complesso Universitario di Monte Sant'Angelo, 80126 Naples, Italy
- Job Dekker
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Mario Nicodemi
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant'Angelo, 80126 Naples, Italy
3
Ozcelik F, Dundar MS, Yildirim AB, Henehan G, Vicente O, Sánchez-Alcázar JA, Gokce N, Yildirim DT, Bingol NN, Karanfilska DP, Bertelli M, Pojskic L, Ercan M, Kellermayer M, Sahin IO, Greiner-Tollersrud OK, Tan B, Martin D, Marks R, Prakash S, Yakubi M, Beccari T, Lal R, Temel SG, Fournier I, Ergoren MC, Mechler A, Salzet M, Maffia M, Danalev D, Sun Q, Nei L, Matulis D, Tapaloaga D, Janecke A, Bown J, Cruz KS, Radecka I, Ozturk C, Nalbantoglu OU, Sag SO, Ko K, Arngrimsson R, Belo I, Akalin H, Dundar M. The impact and future of artificial intelligence in medical genetics and molecular medicine: an ongoing revolution. Funct Integr Genomics 2024; 24:138. [PMID: 39147901 DOI: 10.1007/s10142-024-01417-9] [Received: 07/02/2024] [Revised: 08/01/2024] [Accepted: 08/05/2024] [Indexed: 08/17/2024]
Abstract
Artificial intelligence (AI) platforms have emerged as pivotal tools in genetics and molecular medicine, as in many other fields. The growth in patient data, the identification of new diseases and phenotypes, the discovery of new intracellular pathways, the availability of larger omics datasets, and the need to analyse them continuously have led to the development of new AI platforms. AI continues to weave its way into the fabric of genetics, with the potential to unlock new discoveries and enhance patient care. This technology is setting the stage for breakthroughs across various domains, including dysmorphology, rare hereditary diseases, cancers, clinical microbiomics, the investigation of zoonotic diseases, and omics studies in all medical disciplines. AI's role in facilitating a deeper understanding of these areas heralds a new era of personalised medicine, in which treatments and diagnoses are tailored to the individual's molecular features, offering a more precise approach to combating genetic or acquired disorders. The significance of these AI platforms is growing as they assist healthcare professionals in diagnostic and treatment processes, marking a pivotal shift towards more informed, efficient, and effective medical practice. In this review, we explore the range of available AI tools and show how they have become vital in various sectors of genomic research supporting clinical decisions.
Affiliation(s)
- Firat Ozcelik
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Mehmet Sait Dundar
  - Department of Electrical and Computer Engineering, Graduate School of Engineering and Sciences, Abdullah Gul University, Kayseri, Turkey
- A Baki Yildirim
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Gary Henehan
  - School of Food Science and Environmental Health, Technological University of Dublin, Dublin, Ireland
- Oscar Vicente
  - Institute for the Conservation and Improvement of Valencian Agrodiversity (COMAV), Universitat Politècnica de València, Valencia, Spain
- José A Sánchez-Alcázar
  - Centro de Investigación Biomédica en Red: Enfermedades Raras, Centro Andaluz de Biología del Desarrollo (CABD-CSIC-Universidad Pablo de Olavide), Instituto de Salud Carlos III, Sevilla, Spain
- Nuriye Gokce
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Duygu T Yildirim
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Nurdeniz Nalbant Bingol
  - Department of Translational Medicine, Institute of Health Sciences, Bursa Uludag University, Bursa, Turkey
- Dijana Plaseska Karanfilska
  - Research Centre for Genetic Engineering and Biotechnology, Macedonian Academy of Sciences and Arts, Skopje, Macedonia
- Lejla Pojskic
  - Institute for Genetic Engineering and Biotechnology, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
- Mehmet Ercan
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Miklos Kellermayer
  - Department of Biophysics and Radiation Biology, Faculty of Medicine, Semmelweis University, Budapest, Hungary
- Izem Olcay Sahin
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Busra Tan
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Donald Martin
  - University Grenoble Alpes, CNRS, TIMC-IMAG/SyNaBi (UMR 5525), Grenoble, France
- Robert Marks
  - Avram and Stella Goldstein-Goren Department of Biotechnology Engineering, Ben-Gurion University of the Negev, Be'er Sheva, Israel
- Satya Prakash
  - Department of Biomedical Engineering, McGill University, Montreal, QC, Canada
- Mustafa Yakubi
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Tommaso Beccari
  - Department of Pharmaceutical Sciences, University of Perugia, Perugia, Italy
- Ratnesh Lal
  - Neuroscience Research Institute, University of California, Santa Barbara, USA
- Sehime G Temel
  - Department of Translational Medicine, Institute of Health Sciences, Bursa Uludag University, Bursa, Turkey
  - Department of Medical Genetics, Bursa Uludag University Faculty of Medicine, Bursa, Turkey
  - Department of Histology and Embryology, Faculty of Medicine, Bursa Uludag University, Bursa, Turkey
- Isabelle Fournier
  - Réponse Inflammatoire et Spectrométrie de Masse-PRISM, University of Lille, Lille, France
- M Cerkez Ergoren
  - Department of Medical Genetics, Near East University Faculty of Medicine, Nicosia, Cyprus
- Adam Mechler
  - Department of Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, VIC, Australia
- Michel Salzet
  - Réponse Inflammatoire et Spectrométrie de Masse-PRISM, University of Lille, Lille, France
- Michele Maffia
  - Department of Experimental Medicine, University of Salento, Via Lecce-Monteroni, Lecce, 73100, Italy
- Dancho Danalev
  - University of Chemical Technology and Metallurgy, Sofia, Bulgaria
- Qun Sun
  - Department of Food Science and Technology, Sichuan University, Chengdu, China
- Lembit Nei
  - School of Engineering, Tallinn University of Technology, Tartu College, Tartu, Estonia
- Daumantas Matulis
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Dana Tapaloaga
  - Faculty of Veterinary Medicine, University of Agronomic Sciences and Veterinary Medicine of Bucharest, Bucharest, Romania
- Andres Janecke
  - Department of Paediatrics I, Medical University of Innsbruck, Innsbruck, Austria
  - Division of Human Genetics, Medical University of Innsbruck, Innsbruck, Austria
- James Bown
  - School of Science, Engineering and Technology, Abertay University, Dundee, UK
- Iza Radecka
  - School of Science, Faculty of Science and Engineering, University of Wolverhampton, Wolverhampton, UK
- Celal Ozturk
  - Department of Software Engineering, Erciyes University, Kayseri, Turkey
- Ozkan Ufuk Nalbantoglu
  - Department of Computer Engineering, Engineering Faculty, Erciyes University, Kayseri, Turkey
- Sebnem Ozemri Sag
  - Department of Medical Genetics, Bursa Uludag University Faculty of Medicine, Bursa, Turkey
- Kisung Ko
  - Department of Medicine, College of Medicine, Chung-Ang University, Seoul, Korea
- Reynir Arngrimsson
  - Iceland Landspitali University Hospital, University of Iceland, Reykjavik, Iceland
- Isabel Belo
  - Centre of Biological Engineering, University of Minho, Braga, Portugal
- Hilal Akalin
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
- Munis Dundar
  - Department of Medical Genetics, Faculty of Medicine, Erciyes University, Kayseri, Turkey
4
Liu B, Zhang W, Zeng X, Loza M, Park SJ, Nakai K. TF-EPI: an interpretable enhancer-promoter interaction detection method based on Transformer. Front Genet 2024; 15:1444459. [PMID: 39184348 PMCID: PMC11341371 DOI: 10.3389/fgene.2024.1444459] [Received: 06/05/2024] [Accepted: 07/24/2024] [Indexed: 08/27/2024]
Abstract
The detection of enhancer-promoter interactions (EPIs) is crucial for understanding gene expression regulation, disease mechanisms, and more. In this study, we developed TF-EPI, a deep learning model based on Transformer designed to detect these interactions solely from DNA sequences. The performance of TF-EPI surpassed that of other state-of-the-art methods on multiple benchmark datasets. Importantly, by utilizing the attention mechanism of the Transformer, we identified distinct cell type-specific motifs and sequences in enhancers and promoters, which were validated against databases such as JASPAR and UniBind, highlighting the potential of our method in discovering new biological insights. Moreover, our analysis of the transcription factors (TFs) corresponding to these motifs and short sequence pairs revealed the heterogeneity and commonality of gene regulatory mechanisms and demonstrated the ability to identify TFs relevant to the source information of the cell line. Finally, the introduction of transfer learning can mitigate the challenges posed by cell type-specific gene regulation, yielding enhanced accuracy in cross-cell line EPI detection. Overall, our work unveils important sequence information for the investigation of enhancer-promoter pairs based on the attention mechanism of the Transformer, providing an important milestone in the investigation of cis-regulatory grammar.
Affiliation(s)
- Bowen Liu
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Weihang Zhang
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Xin Zeng
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
- Martin Loza
  - Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Sung-Joon Park
  - Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
- Kenta Nakai
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, University of Tokyo, Tokyo, Japan
  - Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo, Japan
5
Sokolova K, Chen KM, Hao Y, Zhou J, Troyanskaya OG. Deep Learning Sequence Models for Transcriptional Regulation. Annu Rev Genomics Hum Genet 2024; 25:105-122. [PMID: 38594933 DOI: 10.1146/annurev-genom-021623-024727] [Indexed: 04/11/2024]
Abstract
Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.
Affiliation(s)
- Ksenia Sokolova
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Kathleen M Chen
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Yun Hao
  - Flatiron Institute, Simons Foundation, New York, NY, USA
- Jian Zhou
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Olga G Troyanskaya
  - Princeton Precision Health, Princeton University, Princeton, New Jersey, USA
  - Flatiron Institute, Simons Foundation, New York, NY, USA
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
6
Ng YB, Akincilar SC. Shaping DNA damage responses: Therapeutic potential of targeting telomeric proteins and DNA repair factors in cancer. Curr Opin Pharmacol 2024; 76:102460. [PMID: 38776747 DOI: 10.1016/j.coph.2024.102460] [Received: 12/10/2022] [Revised: 10/06/2023] [Accepted: 10/11/2023] [Indexed: 05/25/2024]
Abstract
Shelterin proteins regulate genomic stability by preventing inappropriate DNA damage responses (DDRs) at telomeres. Unprotected telomeres trigger a persistent DDR, causing cell cycle inhibition, growth arrest, and apoptosis. Cancer cells rely on the DDR to protect themselves from DNA lesions and exogenous DNA-damaging agents such as chemotherapy and radiotherapy. Therefore, targeting the DDR machinery is a promising strategy to increase the sensitivity of cancer cells to existing cancer therapies. However, the success of these DDR inhibitors depends on other mutations, and over time, patients develop resistance to these therapies. This suggests the need for alternative approaches. One promising strategy is co-inhibiting shelterin proteins together with DDR molecules, which would compromise cellular fitness in DNA repair in a mutation-independent manner. This review highlights the associations and dependencies of the shelterin complex with the DDR proteins and discusses potential co-inhibition strategies that might improve the therapeutic potential of current inhibitors.
Affiliation(s)
- Yu Bin Ng
  - Laboratory of NFκB Signalling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A∗STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
- Semih Can Akincilar
  - Laboratory of NFκB Signalling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A∗STAR), 61 Biopolis Drive, Proteos, Singapore 138673, Republic of Singapore
7
Wu C, Huang J. Enhancer selectivity across cell types delineates three functionally distinct enhancer-promoter regulation patterns. BMC Genomics 2024; 25:483. [PMID: 38750461 PMCID: PMC11097474 DOI: 10.1186/s12864-024-10408-w] [Received: 01/31/2024] [Accepted: 05/13/2024] [Indexed: 05/18/2024]
Abstract
BACKGROUND: Co-regulation of the same gene by multiple enhancers is prevalent and plays a crucial role during development and disease. However, how multiple enhancers coordinate expression of the same gene across various cell types remains largely unexplored at genome scale.
RESULTS: We develop a computational approach that enables the quantitative assessment of enhancer specificity and selectivity across diverse cell types, leveraging enhancer-promoter (E-P) interaction data. We observe two well-known gene regulation patterns controlled by enhancer clusters, which regulate the same gene either in a limited number of cell types (Specific pattern, Spe) or in the majority of cell types (Conserved pattern, Con), both of which are enriched for super-enhancers (SEs). We identify a previously overlooked pattern (Variable pattern, Var) in which multiple enhancers link to the same gene but rarely coexist in the same cell type. These three patterns control genes associated with distinct biological functions and exhibit unique epigenetic features. Specifically, we discover that a subset of Var patterns contains Shared enhancers with stable enhancer-promoter interactions in the majority of cell types, which might contribute to maintaining gene expression by recruiting abundant CTCF.
CONCLUSIONS: Together, our findings reveal three distinct E-P regulation patterns across different cell types, providing insights into deciphering the complexity of gene transcriptional regulation.
Affiliation(s)
- Chengyi Wu
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
- Jialiang Huang
  - State Key Laboratory of Cellular Stress Biology, School of Life Sciences, Faculty of Medicine and Life Sciences, Xiamen University, Xiamen, 361102, Fujian, China
  - National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, 361102, Fujian, China
8
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. ARXIV 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Indexed: 03/19/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent developments in chromatin conformation capture technologies have enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative approach to infer missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequence information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
9
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
  - Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA
- Frank Alber
  - Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
10
Abbas A, Chandratre K, Gao Y, Yuan J, Zhang MQ, Mani RS. ChIPr: accurate prediction of cohesin-mediated 3D genome organization from 2D chromatin features. Genome Biol 2024; 25:15. [PMID: 38217027 PMCID: PMC10785520 DOI: 10.1186/s13059-023-03158-7] [Received: 06/09/2022] [Accepted: 12/22/2023] [Indexed: 01/14/2024]
Abstract
Three-dimensional genome organization influences diverse nuclear processes. Here we present Chromatin Interaction Predictor (ChIPr), a suite of regression models based on deep neural networks, random forest, and gradient boosting to predict cohesin-mediated chromatin interaction strength between any two loci in the genome. The predictions of ChIPr correlate well with ChIA-PET data in four cell lines. The standard ChIPr model requires three experimental inputs (ChIP-seq signals for RAD21, H3K27ac, and H3K27me3) but also works well with the RAD21 signal alone. Integrative analysis reveals novel insights into the role of the CTCF motif, its orientation, and CTCF binding in cohesin-mediated chromatin interactions.
Affiliation(s)
- Ahmed Abbas
  - Department of Pathology, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Khyati Chandratre
  - Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX, 75080, USA
- Yunpeng Gao
  - Department of Pathology, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Jiapei Yuan
  - State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China
- Michael Q Zhang
  - Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX, 75080, USA
- Ram S Mani
  - Department of Pathology, UT Southwestern Medical Center, Dallas, TX, 75390, USA
  - Department of Urology, UT Southwestern Medical Center, Dallas, TX, 75390, USA
  - Harold C. Simmons Comprehensive Cancer Center, UT Southwestern Medical Center, Dallas, TX, 75390, USA
11
Arora S, Satija S, Mittal A, Solanki S, Mohanty SK, Srivastava V, Sengupta D, Rout D, Arul Murugan N, Borkar RM, Ahuja G. Unlocking The Mysteries of DNA Adducts with Artificial Intelligence. Chembiochem 2024; 25:e202300577. [PMID: 37874183 DOI: 10.1002/cbic.202300577] [Received: 08/16/2023] [Revised: 10/18/2023] [Accepted: 10/23/2023] [Indexed: 10/25/2023]
Abstract
The cellular genome is considered a dynamic blueprint of a cell, since it encodes genetic information that is temporally altered by various endogenous and exogenous insults. Largely, the extent of genomic dynamicity is controlled by the trade-off between DNA repair processes and the genotoxic potential of the causative agent (genotoxins or potential carcinogens). A subset of genotoxins forms DNA adducts by covalently binding to cellular DNA, triggering structural or functional changes that lead to significant alterations in cellular processes via genetic (e.g., mutations) or non-genetic (e.g., epigenome) routes. Identification, quantification, and characterization of DNA adducts are indispensable for their comprehensive understanding and could expedite ongoing efforts to predict their carcinogenicity and mode of action. In this review, we elaborate on the use of artificial intelligence (AI)-based modeling in adduct biology and present multiple computational strategies to advance the decoding of DNA adducts. The proposed AI-based strategies encompass predictive modeling of adduct formation via metabolic activation, identification of novel adducts, prediction of biochemical routes for adduct formation, prediction of adduct half-lives within biological ecosystems, and methods to predict the link between adduct chemistry and its location within genomic DNA. In summary, we discuss some futuristic AI-based approaches in DNA adduct biology.
Affiliation(s)
- Sakshi Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Shiva Satija
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Aayushi Mittal
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Saveena Solanki
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Sanjay Kumar Mohanty
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Vaibhav Srivastava
- Division of Glycoscience, Department of Chemistry CBH School, Royal Institute of Technology (KTH) AlbaNova University Center, 10691, Stockholm, Sweden
- Debarka Sengupta
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Diptiranjan Rout
- Department of Transfusion Medicine National Cancer Institute, AIIMS, New Delhi, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110608, India
- Natarajan Arul Murugan
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
- Roshan M Borkar
- Department of Pharmaceutical Analysis, National Institute of Pharmaceutical Education and Research (NIPER)-Guwahati, Sila Katamur Halugurisuk P.O.: Changsari, Dist, Guwahati, Assam, 781101, India
- Gaurav Ahuja
- Department of Computational Biology, Indraprastha Institute of Information Technology (IIIT-Delhi) Okhla, Phase III, New Delhi, 110020, India
12
Zhang P, Wu H. IChrom-Deep: An Attention-Based Deep Learning Model for Identifying Chromatin Interactions. IEEE J Biomed Health Inform 2023; 27:4559-4568. [PMID: 37402191 DOI: 10.1109/jbhi.2023.3292299] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 07/06/2023]
Abstract
Identification of chromatin interactions is crucial for advancing our knowledge of gene regulation. However, due to the limitations of high-throughput experimental techniques, there is an urgent need to develop computational methods for predicting chromatin interactions. In this study, we propose a novel attention-based deep learning model, termed IChrom-Deep, to identify chromatin interactions using sequence features and genomic features. Experimental results on datasets from three cell lines demonstrate that IChrom-Deep achieves satisfactory performance and is superior to previous methods. We also investigate the effects of DNA sequence and associated features, as well as genomic features, on chromatin interactions, and highlight the scenarios in which some features, such as sequence conservation and distance, are applicable. Moreover, we identify a few genomic features that are extremely important across different cell lines; using only these features, IChrom-Deep achieves performance comparable to that obtained with all genomic features. We believe that IChrom-Deep can serve as a useful tool for future studies that seek to identify chromatin interactions.
13
Xu H, Yi X, Fan X, Wu C, Wang W, Chu X, Zhang S, Dong X, Wang Z, Wang J, Zhou Y, Zhao K, Yao H, Zheng N, Wang J, Chen Y, Plewczynski D, Sham PC, Chen K, Huang D, Li MJ. Inferring CTCF-binding patterns and anchored loops across human tissues and cell types. PATTERNS (NEW YORK, N.Y.) 2023; 4:100798. [PMID: 37602215 PMCID: PMC10436006 DOI: 10.1016/j.patter.2023.100798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 01/25/2023] [Accepted: 06/20/2023] [Indexed: 08/22/2023]
Abstract
CCCTC-binding factor (CTCF) is a transcription regulator with a complex role in gene regulation. The recognition and effects of CTCF on DNA sequences, chromosome barriers, and enhancer blocking are not well understood. Existing computational tools struggle to assess the regulatory potential of CTCF-binding sites and their impact on chromatin loop formation. Here we have developed a deep-learning model, DeepAnchor, to accurately characterize CTCF binding using high-resolution genomic/epigenomic features. This has revealed distinct chromatin and sequence patterns for CTCF-mediated insulation and looping. An optimized implementation of a previous loop model based on DeepAnchor score excels in predicting CTCF-anchored loops. We have established a compendium of CTCF-anchored loops across 52 human tissue/cell types, and this suggests that genomic disruption of these loops could be a general mechanism of disease pathogenesis. These computational models and resources can help investigate how CTCF-mediated cis-regulatory elements shape context-specific gene regulation in cell development and disease progression.
Affiliation(s)
- Hang Xu
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Singapore Immunology Network (SIgN), Agency for Science, Technology and Research (A*STAR), Singapore 138648, Singapore
- Xianfu Yi
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xutong Fan
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Chengyue Wu
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Wei Wang
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Xinlei Chu
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Shijie Zhang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Xiaobao Dong
- Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Jianhua Wang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Yao Zhou
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Ke Zhao
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Hongcheng Yao
- Centre for PanorOmic Sciences-Genomics and Bioinformatics Cores, The University of Hong Kong, Hong Kong 999077, China
- Nan Zheng
- Department of Network Security and Informatization, Tianjin Medical University, Tianjin 300070, China
- Junwen Wang
- Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ 85259, USA
- Yupeng Chen
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
- Dariusz Plewczynski
- Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Pak Chung Sham
- Centre for PanorOmic Sciences-Genomics and Bioinformatics Cores, The University of Hong Kong, Hong Kong 999077, China
- Kexin Chen
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Dandan Huang
- Wuxi School of Medicine, Jiangnan University, Wuxi 214122, China
- Mulin Jun Li
- Department of Epidemiology and Biostatistics, Key Laboratory of Prevention and Control of Human Major Diseases (Ministry of Education), National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
14
Tan J, Shenker-Tauris N, Rodriguez-Hernaez J, Wang E, Sakellaropoulos T, Boccalatte F, Thandapani P, Skok J, Aifantis I, Fenyö D, Xia B, Tsirigos A. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat Biotechnol 2023; 41:1140-1150. [PMID: 36624151 PMCID: PMC10329734 DOI: 10.1038/s41587-022-01612-8] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Received: 05/05/2022] [Accepted: 11/14/2022] [Indexed: 01/11/2023]
Abstract
Investigating how chromatin organization determines cell-type-specific gene expression remains challenging. Experimental methods for measuring three-dimensional chromatin organization, such as Hi-C, are costly and have technical limitations, restricting their broad application, particularly in high-throughput genetic perturbations. We present C.Origami, a multimodal deep neural network that performs de novo prediction of cell-type-specific chromatin organization using DNA sequence and two cell-type-specific genomic features: CTCF binding and chromatin accessibility. C.Origami enables in silico experiments to examine the impact of genetic changes on chromatin interactions. We further developed an in silico genetic screening approach to assess how individual DNA elements may contribute to chromatin organization and to identify putative cell-type-specific trans-acting regulators that collectively determine chromatin architecture. Applying this approach to leukemia cells and normal T cells, we demonstrate that cell-type-specific in silico genetic screening, enabled by C.Origami, can be used to systematically discover novel chromatin regulation circuits in both normal and disease-related biological systems.
Affiliation(s)
- Jimin Tan
- Institute for Systems Genetics, New York University Grossman School of Medicine, New York, NY, USA
- Nina Shenker-Tauris
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Applied Bioinformatics Laboratories, New York University Grossman School of Medicine, New York, NY, USA
- Javier Rodriguez-Hernaez
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Applied Bioinformatics Laboratories, New York University Grossman School of Medicine, New York, NY, USA
- Eric Wang
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- The Jackson Laboratory for Genomics Medicine, Farmington, CT, USA
- Francesco Boccalatte
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Department of Women's and Children's Health, University of Padua, Padua, Italy
- Palaniraja Thandapani
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Jane Skok
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Iannis Aifantis
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- David Fenyö
- Institute for Systems Genetics, New York University Grossman School of Medicine, New York, NY, USA
- Department of Biochemistry and Molecular Pharmacology, New York University Grossman School of Medicine, New York, NY, USA
- Bo Xia
- Institute for Systems Genetics, New York University Grossman School of Medicine, New York, NY, USA
- Society of Fellows, Harvard University, Cambridge, MA, USA
- Gene Regulation Observatory, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Aristotelis Tsirigos
- Department of Pathology, New York University Grossman School of Medicine, New York, NY, USA
- Applied Bioinformatics Laboratories, New York University Grossman School of Medicine, New York, NY, USA
- Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
15
Yeo SJ, Ying C, Fullwood MJ, Tergaonkar V. Emerging regulatory mechanisms of noncoding RNAs in topologically associating domains. Trends Genet 2023; 39:217-232. [PMID: 36642680 DOI: 10.1016/j.tig.2022.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2022] [Revised: 12/17/2022] [Accepted: 12/20/2022] [Indexed: 01/15/2023]
Abstract
Topologically associating domains (TADs) are integral to spatial genome organization, instructing gene expression and cell fate. Recently, several advances have uncovered roles for noncoding RNAs (ncRNAs) in the regulation of the form and function of mammalian TADs. Phase separation has also emerged as a potential arbiter of ncRNAs in the regulation of TADs. In this review we discuss the implications of these novel findings for how ncRNAs might structurally and functionally regulate TADs from two perspectives: moderating loop extrusion through interactions with architectural proteins, and facilitating TAD phase separation. Additionally, we propose future studies and directions to investigate these phenomena.
Affiliation(s)
- Samuel Jianjie Yeo
- Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore 138673, Singapore; Lee Kong Chian School of Medicine, Nanyang Technological University (NTU), Singapore 308232, Singapore
- Chen Ying
- Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore 138673, Singapore
- Melissa Jane Fullwood
- Cancer Science Institute of Singapore, Centre for Translational Medicine, National University of Singapore, Singapore 117599, Singapore; School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Vinay Tergaonkar
- Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), A*STAR (Agency for Science, Technology and Research), Singapore 138673, Singapore; Department of Pathology and the Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore 117597, Singapore
16
Multiomics characteristics and immunotherapeutic potential of EZH2 in pan-cancer. Biosci Rep 2023; 43:232355. [PMID: 36545914 PMCID: PMC9842950 DOI: 10.1042/bsr20222230] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 11/29/2022] [Accepted: 12/13/2022] [Indexed: 12/24/2022]
Abstract
Enhancer of zeste homolog 2 (EZH2) is a significant epigenetic regulator that plays a critical role in the development and progression of cancer. However, the multiomics features and immunological effects of EZH2 in pan-cancer remain unclear. Transcriptome and clinical raw data of pan-cancer samples were acquired from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, and subsequent data analyses were conducted using R software (version 4.1.0). Furthermore, numerous bioinformatics analysis databases were also applied to comprehensively explore and elucidate the oncogenic mechanism and therapeutic potential of EZH2 from a pan-cancer perspective. Finally, quantitative reverse transcription polymerase chain reaction and immunohistochemical assays were performed to verify the differential expression of the EZH2 gene in various cancers at the mRNA and protein levels. EZH2 was widely expressed in multiple normal and tumor tissues, predominantly located in the nucleoplasm. Compared with matched normal tissues, EZH2 was aberrantly expressed in most cancers at either the mRNA or protein level, which might be caused by genetic mutations, DNA methylation, and protein phosphorylation. Additionally, EZH2 expression was correlated with clinical prognosis, and its up-regulation usually indicated poor survival outcomes in cancer patients. Subsequent analysis revealed that EZH2 could promote tumor immune evasion through T-cell dysfunction and T-cell exclusion. Furthermore, expression of EZH2 exhibited a strong correlation with several immunotherapy-associated responses (i.e., immune checkpoint molecules, tumor mutation burden (TMB), microsatellite instability (MSI), mismatch repair (MMR) status, and neoantigens), suggesting that EZH2 may be a novel target for evaluating the therapeutic efficacy of immunotherapy.
17
Agarwal A, Chen L. DeepPHiC: predicting promoter-centered chromatin interactions using a novel deep learning approach. Bioinformatics 2023; 39:6887158. [PMID: 36495179 PMCID: PMC9825766 DOI: 10.1093/bioinformatics/btac801] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/24/2022] [Revised: 11/23/2022] [Accepted: 12/09/2022] [Indexed: 12/14/2022]
Abstract
MOTIVATION: Promoter-centered chromatin interactions, which include promoter-enhancer (PE) and promoter-promoter (PP) interactions, are important for deciphering gene regulation and disease mechanisms. The development of next-generation sequencing technologies such as promoter capture Hi-C (pcHi-C) has led to the discovery of promoter-centered chromatin interactions. However, pcHi-C experiments are expensive and thus may be unavailable for tissues/cell types of interest. In addition, these experiments may be underpowered due to insufficient sequencing depth or various artifacts, so that only a limited number of interactions are found. Most existing computational methods for predicting chromatin interactions are based on in situ Hi-C and can detect chromatin interactions across the entire genome. However, they may not be optimal for predicting promoter-centered chromatin interactions.
RESULTS: We develop a supervised multi-modal deep learning model that utilizes a comprehensive set of features, such as genomic sequence, epigenetic signal, anchor distance, evolutionary features and DNA structural features, to predict tissue/cell type-specific PE and PP interactions. We further extend the deep learning model in a multi-task learning and a transfer learning framework and demonstrate that the proposed approach outperforms state-of-the-art deep learning methods. Moreover, with predefined biologically relevant tissues/cell types in the pretraining, the proposed approach achieves prediction performance comparable to that obtained with all tissues/cell types, especially for predicting PE interactions. The prediction performance can be further improved by using computationally inferred biologically relevant tissues/cell types in the pretraining, which are defined based on the common genes in the proximity of the two anchors of the chromatin interactions.
AVAILABILITY AND IMPLEMENTATION: https://github.com/lichen-lab/DeepPHiC.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Aman Agarwal
- Department of Computer Science, Indiana University, Bloomington, IN 47405, USA
- Li Chen
- To whom correspondence should be addressed.
18
Yang L, Yang Y, Huang L, Cui X, Liu Y. From single- to multi-omics: future research trends in medicinal plants. Brief Bioinform 2022; 24:6840072. [PMID: 36416120 PMCID: PMC9851310 DOI: 10.1093/bib/bbac485] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/15/2022] [Revised: 10/13/2022] [Accepted: 10/14/2022] [Indexed: 11/25/2022]
Abstract
Medicinal plants are the main source of natural metabolites with specialised pharmacological activities and have been widely examined by plant researchers. Numerous omics studies of medicinal plants have been performed to identify molecular markers of species and functional genes controlling key biological traits, as well as to understand biosynthetic pathways of bioactive metabolites and the regulatory mechanisms of environmental responses. Omics technologies have been widely applied to medicinal plants, including taxonomics, transcriptomics, metabolomics, proteomics, genomics, pangenomics, epigenomics and mutagenomics. However, because of the complex biological regulatory network, single-omics approaches usually fail to explain specific biological phenomena. In recent years, reports of integrated multi-omics studies of medicinal plants have increased. Until now, there have been few assessments of recent developments and upcoming trends in omics studies of medicinal plants. We highlight recent developments in omics research of medicinal plants, summarise the typical bioinformatics resources available for analysing omics datasets, and discuss related future directions and challenges. This information facilitates further studies of medicinal plants and the refinement of current approaches, and leads to new ideas.
Affiliation(s)
- Lifang Yang
- Kunming University of Science and Technology, China
- Ye Yang
- Kunming University of Science and Technology, China
- Luqi Huang
- the academician of the Chinese Academy of Engineering, studies the development of traditional Chinese medicine, Chinese Academy of Chinese Medical Sciences, China
- Xiuming Cui
- Corresponding authors. X. M. Cui, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail: ; Y. Liu, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail:
- Yuan Liu
- Corresponding authors. X. M. Cui, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail: ; Y. Liu, Yunnan Provincial Key Laboratory of Panax notoginseng, Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China. E-mail:
19
Lan AY, Corces MR. Deep learning approaches for noncoding variant prioritization in neurodegenerative diseases. Front Aging Neurosci 2022; 14:1027224. [PMID: 36466610 PMCID: PMC9716280 DOI: 10.3389/fnagi.2022.1027224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Accepted: 10/24/2022] [Indexed: 11/19/2022]
Abstract
Determining how noncoding genetic variants contribute to neurodegenerative dementias is fundamental to understanding disease pathogenesis, improving patient prognostication, and developing new clinical treatments. Next-generation sequencing technologies have produced vast amounts of genomic data on cell type-specific transcription factor binding, gene expression, and three-dimensional chromatin interactions, with the promise of providing key insights into the biological mechanisms underlying disease. However, these data are highly complex, making them challenging for researchers to interpret, assimilate, and dissect. To this end, deep learning has emerged as a powerful tool for genome analysis that can capture the intricate patterns and dependencies within these large datasets. In this review, we organize and discuss the many unique model architectures, development philosophies, and interpretation methods that have emerged in the last few years with a focus on using deep learning to predict the impact of genetic variants on disease pathogenesis. We highlight both broadly-applicable genomic deep learning methods that can be fine-tuned to disease-specific contexts as well as existing neurodegenerative disease research, with an emphasis on Alzheimer's-specific literature. We conclude with an overview of the future of the field at the intersection of neurodegeneration, genomics, and deep learning.
Affiliation(s)
- Alexander Y. Lan
- Gladstone Institute of Neurological Disease, San Francisco, CA, United States
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, United States
- Department of Neurology, University of California San Francisco, San Francisco, CA, United States
- M. Ryan Corces
- Gladstone Institute of Neurological Disease, San Francisco, CA, United States
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, United States
- Department of Neurology, University of California San Francisco, San Francisco, CA, United States
20
Yang M, Ma J. Machine Learning Methods for Exploring Sequence Determinants of 3D Genome Organization. J Mol Biol 2022; 434:167666. [PMID: 35659533 DOI: 10.1016/j.jmb.2022.167666] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/27/2021] [Revised: 05/23/2022] [Accepted: 05/27/2022] [Indexed: 01/25/2023]
Abstract
In higher eukaryotic cells, chromosomes are folded inside the nucleus. Recent advances in whole-genome mapping technologies have revealed the multiscale features of 3D genome organization that are intertwined with fundamental genome functions. However, DNA sequence determinants that modulate the formation of 3D genome organization remain poorly characterized. In the past few years, predicting 3D genome organization based on DNA sequence features has become an active area of research. Here, we review the recent progress in computational approaches to unraveling important sequence elements for 3D genome organization. In particular, we discuss the rapid development of machine learning-based methods that facilitate the connections between DNA sequence features and 3D genome architectures at different scales. While much progress has been made in developing predictive models for revealing important sequence features for 3D genome organization, new research is urgently needed to incorporate multi-omic data and enhance model interpretability, further advancing our understanding of gene regulation mechanisms through the lens of 3D genome organization.
Affiliation(s)
- Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, United States. https://twitter.com/muyu_wendy_yang
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, United States.
21
Piecyk RS, Schlegel L, Johannes F. Predicting 3D chromatin interactions from DNA sequence using Deep Learning. Comput Struct Biotechnol J 2022; 20:3439-3448. [PMID: 35832620 PMCID: PMC9271978 DOI: 10.1016/j.csbj.2022.06.047] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/13/2022] [Revised: 06/21/2022] [Accepted: 06/21/2022] [Indexed: 11/22/2022]
Abstract
Gene regulation in eukaryotes is profoundly shaped by the 3D organization of chromatin within the cell nucleus. Distal regulatory interactions between enhancers and their target genes are widespread, and many causal loci underlying heritable agricultural or clinical traits have been mapped to distal cis-regulatory elements. Dissecting the sequence features that mediate such distal interactions is key to understanding their underlying biology. Deep Learning (DL) models coupled with genome-wide 3C-based sequencing data have emerged as powerful tools to infer the DNA sequence grammar underlying such distal interactions. In this review we show that most DL models have remarkably high prediction accuracy, which indicates that DNA sequence features are important determinants of chromatin looping. However, DL model training has so far been limited to a small set of human cell lines, raising questions about the generalization of these predictions to other tissue types and species. Furthermore, we find that the model architecture seems less relevant for model performance than the training strategy and the data preparation step. Transfer learning, coupled with functionally curated interactions, appears to be the most promising approach to learn cell-type-specific and possibly species-specific sequence features in future applications.
Affiliation(s)
- Robert S. Piecyk
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Luca Schlegel
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- Frank Johannes
- Department of Molecular Life Sciences, Technical University of Munich, Freising, Germany
- TUM Institute for Advanced Study, Garching, Germany
22
Akıncılar S, Chua J, Ng Q, Chan C, Eslami-S Z, Chen K, Low JL, Arumugam S, Aswad L, Chua C, Tan I, DasGupta R, Fullwood M, Tergaonkar V. Identification of mechanism of cancer-cell-specific reactivation of hTERT offers therapeutic opportunities for blocking telomerase specifically in human colorectal cancer. Nucleic Acids Res 2022; 51:1-16. [PMID: 35697349 PMCID: PMC9841410 DOI: 10.1093/nar/gkac479] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/20/2022] [Revised: 04/18/2022] [Accepted: 05/26/2022] [Indexed: 01/29/2023]
Abstract
Transcriptional reactivation of hTERT is the limiting step in tumorigenesis. While mutations in the hTERT promoter, present in 19% of cancers, are recognized as key drivers of hTERT reactivation, the mechanisms by which the wild-type hTERT (WT-hTERT) promoter is reactivated in the majority of human cancers remain unknown. Using primary colorectal cancers (CRCs), we identified Tert INTeracting region 2 (T-INT2), the critical chromatin region essential for reactivating the WT-hTERT promoter in CRCs. Elevated β-catenin and JunD levels in CRC facilitate the chromatin interaction between the hTERT promoter and T-INT2 that is necessary to turn on hTERT expression. Pharmacological screens uncovered salinomycin, which inhibits the JunD-mediated hTERT-T-INT2 interaction required for the formation of a stable transcription complex on the hTERT promoter. Our results show for the first time how known CRC alterations, such as those in APC, lead to WT-hTERT promoter reactivation during stepwise tumorigenesis, and they provide a new perspective for developing cancer-specific drugs.
Affiliation(s)
- Semih Can Akıncılar
- Division of Cancer Genetics and Therapeutics, Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 138673, Singapore
- Joelle Yi Heng Chua
- Division of Cancer Genetics and Therapeutics, Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 138673, Singapore
- Qin Feng Ng
- Division of Cancer Genetics and Therapeutics, Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 138673, Singapore
- Claire Hian Tzer Chan
- Division of Cancer Genetics and Therapeutics, Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 138673, Singapore
- Zahra Eslami-S
- Division of Cancer Genetics and Therapeutics, Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 138673, Singapore
- Kaijing Chen
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
- Joo-Leng Low
- Laboratory of Precision Oncology and Cancer Evolution, Genome Institute of Singapore, A*STAR, 138672, Singapore
- Surendar Arumugam
- Division of Cancer Genetics and Therapeutics, Laboratory of NFκB Signaling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), 138673, Singapore
- Luay Aswad
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
- Clarinda Chua
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), 138672, Singapore; Department of Medical Oncology, National Cancer Centre Singapore, 169610, Singapore
- Iain Beehuat Tan
- Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), 138672, Singapore; Department of Medical Oncology, National Cancer Centre Singapore, 169610, Singapore
- Ramanuj DasGupta
- Laboratory of Precision Oncology and Cancer Evolution, Genome Institute of Singapore, A*STAR, 138672, Singapore
- Melissa Jane Fullwood
- Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore; School of Biological Sciences, Nanyang Technological University, 637551, Singapore
- Vinay Tergaonkar
- To whom correspondence should be addressed. Tel: +65 65869836; Fax: +65 67791117

23
Avdeyev P, Zhou J. Computational Approaches for Understanding Sequence Variation Effects on the 3D Genome Architecture. Annu Rev Biomed Data Sci 2022; 5:183-204. [PMID: 35537461 DOI: 10.1146/annurev-biodatasci-102521-012018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Decoding how genomic sequence and its variations affect 3D genome architecture is indispensable for understanding the genetic architecture of various traits and diseases. The 3D genome organization can be significantly altered by genome variations and, in turn, impact the function of the genomic sequence. Techniques for measuring 3D genome architecture across spatial scales have opened up new possibilities for understanding how the 3D genome depends on the genomic sequence and how it can be altered by sequence variations. Computational methods have become instrumental in analyzing and modeling sequence effects on 3D genome architecture, and recent developments in deep learning sequence models have opened up new opportunities for studying the interplay between sequence variations and the 3D genome. In this review, we focus on computational approaches for both the detection and the modeling of sequence variation effects on the 3D genome, and we discuss the opportunities presented by these approaches.
Affiliation(s)
- Pavel Avdeyev
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Jian Zhou
- Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA

24
Chen K, Zhao H, Yang Y. Capturing large genomic contexts for accurately predicting enhancer-promoter interactions. Brief Bioinform 2022; 23:6513727. [DOI: 10.1093/bib/bbab577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2021] [Revised: 12/13/2021] [Accepted: 12/15/2021] [Indexed: 11/14/2022]
Abstract
Enhancer-promoter interaction (EPI) is a key mechanism underlying gene regulation. EPI prediction has always been a challenging task because enhancers can regulate the promoters of distant target genes. Although many machine learning models have been developed, they leverage only the features in enhancers and promoters, or simply add the average genomic signals in the regions between enhancers and promoters, without utilizing detailed features between or outside enhancers and promoters. Due to a lack of large-scale features, existing methods could achieve only moderate performance, especially when predicting EPIs in different cell types. Here, we present a Transformer-based model, TransEPI, for EPI prediction by capturing large genomic contexts. TransEPI was developed based on EPI datasets derived from Hi-C or ChIA-PET data in six cell lines. To avoid overfitting, we evaluated the TransEPI model on independent test datasets in which the cell line and chromosome differ from those in the training data. TransEPI not only achieved consistent performance across the cross-validation and test datasets from different cell types but also outperformed state-of-the-art machine learning and deep learning models. In addition, we found that the improved performance of TransEPI was attributable to the integration of large genomic contexts. Lastly, TransEPI was extended to study non-coding mutations associated with brain disorders or neural diseases, and we found that TransEPI was also useful for predicting the target genes of non-coding mutations.
25
Li R, Li L, Xu Y, Yang J. Machine learning meets omics: applications and perspectives. Brief Bioinform 2021; 23:6425809. [PMID: 34791021 DOI: 10.1093/bib/bbab460] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Received: 08/23/2021] [Revised: 09/29/2021] [Accepted: 10/07/2021] [Indexed: 02/07/2023]
Abstract
Innovations in biotechnology have allowed omics data to accumulate at an alarming rate, ushering in the era of 'big data'. Extracting valuable knowledge from diverse omics data remains a daunting problem in bioinformatics. Better solutions often require more innovative methods for efficient handling and effective results. Recent advances in the integrated analysis and computational modeling of multi-omics data have helped address such needs in an increasingly harmonious manner. The development and application of machine learning have greatly advanced our insights into biology and biomedicine and promoted the development of therapeutic strategies, especially for precision medicine. Here, we present a comprehensive survey and discussion of what happened, is happening and will happen when machine learning meets omics. Specifically, we describe how artificial intelligence can be applied to omics studies and review recent advances at the interface between machine learning and the ever-widening range of omics, including genomics, transcriptomics, proteomics, metabolomics and radiomics, as well as omics at single-cell resolution. We also discuss and synthesize ideas, new insights, current challenges and perspectives of machine learning in omics.
Affiliation(s)
- Rufeng Li
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Lixin Li
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China
- Yungang Xu
- School of Electronics and Information, Northwestern Polytechnical University, Xi'an, 710129, China
- Juan Yang
- Department of Cell Biology and Genetics, School of Basic Medical Sciences, Xi'an Jiaotong University Health Science Center, Xi'an 710061, P. R. China; Key Laboratory of Environment and Genes Related to Diseases (Xi'an Jiaotong University), Ministry of Education of China, Xi'an 710061, P. R. China