1. Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025; 2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technologies and tools and to low data resolution. Machine learning methods have emerged as an alternative way to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequence information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
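The k-mer sequence features mentioned above can be illustrated with a minimal sketch (not code from the review): a DNA sequence is turned into a normalized k-mer frequency vector. The function name `kmer_features`, the choice of k, skipping k-mers that contain non-ACGT characters, and concatenating the vectors of the two anchors of a hypothetical enhancer-promoter pair are all assumptions made for illustration.

```python
from itertools import product


def kmer_features(seq, k=3):
    """Return normalized counts of every possible DNA k-mer, in lexicographic order.

    k-mers containing characters other than A, C, G, T (e.g. N) are skipped.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        i = index.get(seq[start:start + k])
        if i is not None:
            counts[i] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(vocab)
    return [c / total for c in counts]


# Hypothetical enhancer-promoter pair: concatenate the two anchors' vectors
# to form one feature vector for a downstream classifier.
enhancer = "ACGTACGTNNACGT"
promoter = "TTTTGGGGCCCC"
pair_features = kmer_features(enhancer, k=2) + kmer_features(promoter, k=2)
```

The vector length grows as 4^k, which is why small k values are typical for such features; the classifier trained on them is left open here.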
Affiliation(s)
- Brydon P G Wall
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, USA
- My Nguyen
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
- J Chuck Harrell
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA
- Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA, USA
- Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA, USA
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA.
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, USA.

2. Lee Y, Park SH, Lee H. Prediction of the 3D cancer genome from whole-genome sequencing using InfoHiC. Mol Syst Biol 2024; 20:1156-1172. [PMID: 39322849 PMCID: PMC11535030 DOI: 10.1038/s44320-024-00065-2] [Received: 07/22/2023] [Revised: 09/03/2024] [Accepted: 09/09/2024]
Abstract
Predicting the 3D cancer genome is crucial for uncovering the impact of structural variations (SVs) on tumorigenesis, especially when the SVs lie in noncoding regions. We present InfoHiC, a systematic framework for predicting the 3D cancer genome directly from whole-genome sequencing (WGS). InfoHiC applies contig-specific copy number encoding to the SV contig assembly and performs a contig-to-total Hi-C conversion to predict cancer Hi-C from multiple SV contigs. We showed that InfoHiC can predict 3D genome folding from all types of SVs using breast cancer cell line data. We applied it to WGS data from patients with breast cancer and from pediatric patients with medulloblastoma, and identified neo-topologically associating domains. In breast cancer, we discovered super-enhancer hijacking events associated with oncogenic overexpression and poor survival outcomes. In medulloblastoma, we found SVs in noncoding regions that caused super-enhancer hijacking of medulloblastoma driver genes (GFI1, GFI1B, and PRDM6). In addition, we provide trained models for cancer Hi-C prediction from WGS at https://github.com/dmcb-gist/InfoHiC, which can be used to uncover the impacts of SVs in cancer patients and to reveal novel therapeutic targets.
Affiliation(s)
- Yeonghun Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju, 61005, Republic of Korea
- Sung-Hye Park
- Department of Pathology, Seoul National University Hospital, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea
- Neuroscience Research Institute, Seoul National University College of Medicine, 103 Daehak-ro, Jongno-gu, Seoul, 03080, Republic of Korea
- Hyunju Lee
- School of Electrical Engineering and Computer Science, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju, 61005, Republic of Korea.
- AI Graduate School, Gwangju Institute of Science and Technology, 123 Cheomdangwagi-ro, Buk-gu, Gwangju, 61005, Republic of Korea.

3. Fan S, Dang D, Gao L, Zhang S. ImputeHiFI: An Imputation Method for Multiplexed DNA FISH Data by Utilizing Single-Cell Hi-C and RNA FISH Data. Adv Sci (Weinh) 2024; 11:e2406364. [PMID: 39264290 PMCID: PMC11558076 DOI: 10.1002/advs.202406364] [Received: 06/09/2024] [Revised: 08/03/2024]
Abstract
Although multiplexed DNA fluorescence in situ hybridization (FISH) enables tracking the spatial localization of thousands of genomic loci within individual cells using probes, the high rates of undetected probes impede the depiction of 3D chromosome structures. Current data imputation methods neither utilize single-cell Hi-C data, which elucidate 3D genome architectures by sequencing, nor leverage multimodal RNA FISH data that reflect cell-type information, which limits their effectiveness in complex tissues such as the mouse brain. To address this, a novel multiplexed DNA FISH imputation method named ImputeHiFI is proposed, which fully utilizes the complementary structural information from single-cell Hi-C data and the cell-type signature from RNA FISH data to obtain high-fidelity, complete spatial locations of chromatin loci. ImputeHiFI enhances cell clustering, compartment identification, and cell subtype detection at the single-cell level in the mouse brain, and it improves the recognition of cell-type-specific loops in three high-resolution datasets. In short, ImputeHiFI is a powerful tool capable of imputing multiplexed DNA FISH data from various resolutions and imaging protocols, facilitating studies of 3D genome structures and functions.
Affiliation(s)
- Shichen Fan
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Dachang Dang
- School of Automation, Northwestern Polytechnical University, Xi'an, 710072, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, 710071, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, 100190, China
- School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, 100049, China
- Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Hangzhou, 310024, China

4. Hristov BH, Noble WS, Bertero A. Systematic identification of interchromosomal interaction networks supports the existence of specialized RNA factories. Genome Res 2024; 34:1610-1623. [PMID: 39322282 PMCID: PMC11529845 DOI: 10.1101/gr.278327.123] [Received: 08/08/2023] [Accepted: 08/30/2024]
Abstract
Most studies of genome organization have focused on intrachromosomal (cis) contacts because they harbor key features such as DNA loops and topologically associating domains. Interchromosomal (trans) contacts have received much less attention, and tools for interrogating potentially biologically relevant trans structures are lacking. Here, we develop a computational framework that uses Hi-C data to identify sets of loci that jointly interact in trans. This method, trans-C, initiates probabilistic random walks with restarts from a set of seed loci to traverse an input Hi-C contact network, thereby identifying sets of trans-contacting loci. We validate trans-C in three increasingly complex models of established trans contacts: the Plasmodium falciparum var genes, the mouse olfactory receptor "Greek islands," and the human RBM20 cardiac splicing factory. We then apply trans-C to systematically test the hypothesis that genes coregulated by the same trans-acting element (i.e., a transcription or splicing factor) colocalize in three dimensions to form "RNA factories" that maximize the efficiency and accuracy of RNA biogenesis. We find that many loci with multiple binding sites of the same DNA-binding proteins interact with one another in trans, especially those bound by factors with intrinsically disordered domains. Similarly, clustered binding of a subset of RNA-binding proteins correlates with trans interaction of the encoding loci. We observe that these trans-interacting loci are close to nuclear speckles. These findings support the existence of trans-interacting chromatin domains (TIDs) driven by RNA biogenesis. Trans-C provides an efficient computational framework for studying these and other types of trans interactions, empowering studies of a poorly understood aspect of genome architecture.
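The random walk with restarts that trans-C is described as running can be sketched generically. This is a minimal illustration under stated assumptions, not the trans-C implementation: the contact network is a dict of dicts of nonnegative contact weights, the walker restarts with probability `restart_prob` at a seed locus chosen uniformly, and the stationary visiting probabilities are found by power iteration. The function name, the default restart probability of 0.3, the handling of loci without contacts, and the toy network are illustrative choices; trans-C's subsequent selection of trans-contacting loci sets is not reproduced.

```python
def random_walk_with_restart(contacts, seeds, restart_prob=0.3, tol=1e-10, max_iter=1000):
    """Return stationary visiting probabilities of a random walk with restarts.

    contacts: dict mapping locus -> {neighbor locus: contact weight}.
    seeds: iterable of seed loci; on restart the walker jumps to a seed chosen uniformly.
    """
    seed_set = set(seeds)
    if not seed_set:
        raise ValueError("at least one seed locus is required")
    nodes = set(contacts) | {v for nbrs in contacts.values() for v in nbrs} | seed_set
    restart = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    # Total contact weight leaving each locus, used to row-normalize transitions.
    out_weight = {n: sum(contacts.get(n, {}).values()) for n in nodes}
    p = dict(restart)
    for _ in range(max_iter):
        nxt = {n: restart_prob * restart[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Locus with no contacts: return its walking mass to the seeds.
                for s in seed_set:
                    nxt[s] += (1 - restart_prob) * p[u] * restart[s]
                continue
            for v, w in contacts[u].items():
                nxt[v] += (1 - restart_prob) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy contact network: A and B contact strongly, D is weakly attached.
toy = {
    "A": {"B": 5.0, "C": 1.0},
    "B": {"A": 5.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 0.1},
    "D": {"C": 0.1},
}
scores = random_walk_with_restart(toy, seeds=["A"])
```

Loci that receive high visiting probability from a seed are the ones most strongly connected to it through the contact network, which is the intuition behind using such walks to collect jointly interacting loci.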
Affiliation(s)
- William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA
- Alessandro Bertero
- Molecular Biotechnology Center "Guido Tarone," Department of Molecular Biotechnology and Health Sciences, University of Turin, 10126 Torino, Italy

5. Dekker J, Oksuz BA, Zhang Y, Wang Y, Minsk MK, Kuang S, Yang L, Gibcus JH, Krietenstein N, Rando OJ, Xu J, Janssens DH, Henikoff S, Kukalev A, Willemin A, Winick-Ng W, Kempfer R, Pombo A, Yu M, Kumar P, Zhang L, Belmont AS, Sasaki T, van Schaik T, Brueckner L, Peric-Hupkes D, van Steensel B, Wang P, Chai H, Kim M, Ruan Y, Zhang R, Quinodoz SA, Bhat P, Guttman M, Zhao W, Chien S, Liu Y, Venev SV, Plewczynski D, Azcarate II, Szabó D, Thieme CJ, Szczepińska T, Chiliński M, Sengupta K, Conte M, Esposito A, Abraham A, Zhang R, Wang Y, Wen X, Wu Q, Yang Y, Liu J, Boninsegna L, Yildirim A, Zhan Y, Chiariello AM, Bianco S, Lee L, Hu M, Li Y, Barnett RJ, Cook AL, Emerson DJ, Marchal C, Zhao P, Park P, Alver BH, Schroeder A, Navelkar R, Bakker C, Ronchetti W, Ehmsen S, Veit A, Gehlenborg N, Wang T, Li D, Wang X, Nicodemi M, Ren B, Zhong S, Phillips-Cremins JE, Gilbert DM, Pollard KS, Alber F, Ma J, Noble WS, Yue F. An integrated view of the structure and function of the human 4D nucleome. bioRxiv 2024:2024.09.17.613111. [PMID: 39484446 PMCID: PMC11526861 DOI: 10.1101/2024.09.17.613111]
Abstract
The dynamic three-dimensional (3D) organization of the human genome (the "4D Nucleome") is closely linked to genome function. Here, we integrate a wide variety of genomic data generated by the 4D Nucleome Project to provide a detailed view of human 3D genome organization in widely used embryonic stem cells (H1-hESCs) and immortalized fibroblasts (HFFc6). We provide extensive benchmarking of 3D genome mapping assays and integrate these diverse datasets to annotate spatial genomic features across scales. The data reveal a rich complexity of chromatin domains and their sub-nuclear positions, and over one hundred thousand structural loops and promoter-enhancer interactions. We developed 3D models of population-based and individual cell-to-cell variation in genome structure, establishing connections between chromosome folding, nuclear organization, chromatin looping, gene transcription, and DNA replication. We demonstrate the use of computational methods to predict genome folding from DNA sequence, uncovering potential effects of genetic variants on genome structure and function. Together, this comprehensive analysis contributes insights into human genome organization and enhances our understanding of connections between the regulation of genome function and 3D genome organization in general.
Affiliation(s)
- Job Dekker
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Betul Akgol Oksuz
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Yang Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Ye Wang
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Miriam K. Minsk
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Liyan Yang
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Johan H. Gibcus
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Nils Krietenstein
- Novo Nordisk Foundation Center for Protein Research, University of Copenhagen
- Oliver J. Rando
- Department of Biochemistry and Molecular Biotechnology, University of Massachusetts Chan Medical School, Worcester, Massachusetts 01605, USA
- Jie Xu
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Derek H. Janssens
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Department of Epigenetics, Van Andel Institute, Grand Rapids, MI, USA
- Steven Henikoff
- Howard Hughes Medical Institute, Chevy Chase, MD, USA
- Basic Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Alexander Kukalev
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Andréa Willemin
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Warren Winick-Ng
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Rieke Kempfer
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Ana Pombo
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Miao Yu
- University of California, San Diego School of Medicine, Department of Cellular and Molecular Medicine, La Jolla, CA, USA
- State Key Laboratory of Genetic Engineering, School of Life Sciences, Fudan University, Shanghai, China
- Pradeep Kumar
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Liguo Zhang
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Andrew S Belmont
- Department of Cell and Developmental Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Takayo Sasaki
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Tom van Schaik
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Oncode Institute, the Netherlands
- Laura Brueckner
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Daan Peric-Hupkes
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Oncode Institute, the Netherlands
- Bas van Steensel
- Division of Gene Regulation, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Oncode Institute, the Netherlands
- Ping Wang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Haoxi Chai
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang Province, 310058, P.R. China
- Minji Kim
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yijun Ruan
- Life Sciences Institute, Zhejiang University, Hangzhou, Zhejiang Province, 310058, P.R. China
- Ran Zhang
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Sofia A. Quinodoz
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Prashant Bhat
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- David Geffen School of Medicine at UCLA, Los Angeles, USA
- Mitchell Guttman
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA 91125, USA
- Wenxin Zhao
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Shu Chien
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yuan Liu
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Sergey V. Venev
- Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, ul. Koszykowa 75, 00-662 Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c Street, 02-097 Warsaw, Poland
- Ibai Irastorza Azcarate
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Dominik Szabó
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Christoph J. Thieme
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Teresa Szczepińska
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, 10115 Berlin, Germany
- Centre for Advanced Materials and Technologies CEZAMAT, Warsaw University of Technology, Poleczki 19, 02-822 Warsaw, Poland
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Banacha 2c Street, 02-097 Warsaw, Poland
- Mateusz Chiliński
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, ul. Koszykowa 75, 00-662 Warsaw, Poland
- Kaustav Sengupta
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, ul. Koszykowa 75, 00-662 Warsaw, Poland
- Mattia Conte
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Andrea Esposito
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Alex Abraham
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Ruochi Zhang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Yuchuan Wang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Xingzhao Wen
- Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, USA
- Qiuyang Wu
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Yang Yang
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- Jie Liu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Lorenzo Boninsegna
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Asli Yildirim
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Yuxiang Zhan
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Andrea Maria Chiariello
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Simona Bianco
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Lindsay Lee
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, USA
- Yun Li
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599, USA
- R. Jordan Barnett
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Ashley L. Cook
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Daniel J. Emerson
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Peiyao Zhao
- San Diego Biomedical Research Institute, San Diego, CA, USA
- Peter Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Burak H. Alver
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Andrew Schroeder
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Rahi Navelkar
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Clara Bakker
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- William Ronchetti
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Shannon Ehmsen
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Alexander Veit
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Nils Gehlenborg
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115
- Ting Wang
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Daofeng Li
- Department of Genetics, Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Xiaotao Wang
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai, China
- Mario Nicodemi
- Department of Physics, University of Naples “Federico II”, Naples, Italy; INFN, Naples, Italy
- Bing Ren
- University of California, San Diego School of Medicine, Department of Cellular and Molecular Medicine, La Jolla, CA, USA
- Sheng Zhong
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Jennifer E. Phillips-Cremins
- Department of Genetics, Department of Bioengineering, Epigenetics Institute, University of Pennsylvania, Philadelphia, PA, USA
- Frank Alber
- Department of Microbiology, Immunology, and Molecular Genetics; Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Jian Ma
- Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University
- William S. Noble
- Department of Genome Sciences, University of Washington, Seattle, WA 98195
- Feng Yue
- Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, Illinois, USA
- Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, Illinois, USA

6. Chiliński M, Plewczynski D. HiCDiffusion - diffusion-enhanced, transformer-based prediction of chromatin interactions from DNA sequences. BMC Genomics 2024; 25:964. [PMID: 39407104 PMCID: PMC11481779 DOI: 10.1186/s12864-024-10885-z] [Received: 07/21/2024] [Accepted: 10/09/2024]
Abstract
Prediction of chromatin interactions from DNA sequence has been a significant research challenge in the last couple of years. Several solutions have been proposed, most of which are based on an encoder-decoder architecture: the 1D sequence is convolved, encoded into a latent representation, and then decoded with 2D convolutions into the Hi-C matrix of pairwise chromatin spatial proximity. Although these methods obtain high correlation scores and improved metrics, the Hi-C matrices they produce look artificial: they are blurred as a consequence of the deep learning model architecture. In our study, we propose HiCDiffusion, a sequence-only model that addresses this problem. We first train the encoder-decoder neural network and then use it as a component of a diffusion model, guiding the diffusion with a latent representation of the sequence as well as the final output of the encoder-decoder. In this way, we obtain high-resolution Hi-C matrices that not only better resemble the experimental results (improving the Fréchet inception distance by an average of 11-fold, with a maximum improvement of 56-fold) but also achieve classic metrics similar to those of current state-of-the-art encoder-decoder architectures used for the task.
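The Fréchet inception distance (FID) cited above compares two Gaussian distributions fitted to deep image features of real and generated samples: ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2(S_a S_b)^(1/2)), where mu and S are the feature means and covariances. The sketch below is a simplified stand-in that assumes diagonal covariances (independent feature dimensions), in which case each dimension contributes (mu_a - mu_b)^2 + (sigma_a - sigma_b)^2. It takes precomputed feature vectors as input; the Inception network that standard FID uses to extract features, and HiCDiffusion's exact evaluation protocol, are not reproduced, and the name `frechet_distance_diag` is hypothetical.

```python
import math
from statistics import fmean, pvariance


def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two sets of feature vectors,
    assuming diagonal covariance (independent feature dimensions).

    feats_a, feats_b: lists of equal-length feature vectors, one per sample.
    """
    total = 0.0
    for d in range(len(feats_a[0])):
        xa = [f[d] for f in feats_a]
        xb = [f[d] for f in feats_b]
        mu_a, mu_b = fmean(xa), fmean(xb)
        sd_a = math.sqrt(pvariance(xa, mu_a))
        sd_b = math.sqrt(pvariance(xb, mu_b))
        # var_a + var_b - 2*sqrt(var_a*var_b) equals (sd_a - sd_b)^2,
        # which avoids tiny negative values from floating-point cancellation.
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total
```

Lower values mean the two feature distributions are closer, so an improvement in FID corresponds to a decrease of this quantity.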
Affiliation(s)
- Mateusz Chiliński
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, 00-662, Poland
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland
- Dariusz Plewczynski
- Laboratory of Bioinformatics and Computational Genomics, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, 00-662, Poland.
- Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, 02-097, Poland.

7. Jha A, Hristov B, Wang X, Wang S, Greenleaf WJ, Kundaje A, Aiden EL, Bertero A, Noble WS. Prediction and functional interpretation of inter-chromosomal genome architecture from DNA sequence with TwinC. bioRxiv 2024:2024.09.16.613355. [PMID: 39345598 PMCID: PMC11429679 DOI: 10.1101/2024.09.16.613355]
Abstract
Three-dimensional nuclear DNA architecture comprises well-studied intra-chromosomal (cis) folding and less characterized inter-chromosomal (trans) interfaces. Current predictive models of 3D genome folding can effectively infer pairwise cis-chromatin interactions from the primary DNA sequence but generally ignore trans contacts. There is an unmet need for robust models of trans-genome organization that provide insights into their underlying principles and functional relevance. We present TwinC, an interpretable convolutional neural network model that reliably predicts trans contacts measurable through genome-wide chromatin conformation capture (Hi-C). TwinC uses a paired sequence design from replicate Hi-C experiments to learn single base pair relevance in trans interactions across two stretches of DNA. The method achieves high predictive accuracy (AUROC=0.80) on a cross-chromosomal test set from Hi-C experiments in heart tissue. Mechanistically, the neural network learns the importance of compartments, chromatin accessibility, clustered transcription factor binding and G-quadruplexes in forming trans contacts. In summary, TwinC models and interprets trans genome architecture, shedding light on this poorly understood aspect of gene regulation.
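An AUROC such as the 0.80 reported for TwinC has a simple probabilistic reading: the chance that a randomly chosen positive example (here, a trans contact) receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition (the Mann-Whitney formulation). It is quadratic in the number of examples, is meant for illustration rather than genome-wide evaluation, and is not TwinC's evaluation code; the function name `auroc` is hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 1 (positive) or 0 (negative); scores: matching model scores.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfectly separating contacts from non-contacts.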
Affiliation(s)
- Anupama Jha
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Borislav Hristov
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Xiao Wang
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen Center for Computer Science & Engineering, University of Washington, Seattle, WA, USA
- Sheng Wang
- Paul G. Allen Center for Computer Science & Engineering, University of Washington, Seattle, WA, USA
- William J Greenleaf
- Department of Genetics, Stanford University, Stanford, CA, USA
- Center for Personal Dynamic Regulomes, Stanford University, Stanford, CA, USA
- Department of Applied Physics, Stanford University, Stanford, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Anshul Kundaje
- Department of Genetics, Stanford University, Stanford, CA, USA
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Erez Lieberman Aiden
- The Center for Genome Architecture, Baylor College of Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Computer Science, Rice University, Houston, TX, USA
- Department of Computational and Applied Mathematics, Rice University, Houston, TX, USA
| | - Alessandro Bertero
- Molecular Biotechnology Center "Guido Tarone," Department of Molecular Biotechnology and Health Sciences, University of Turin, Torino, Italy
| | - William Stafford Noble
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Paul G. Allen Center for Computer Science & Engineering, University of Washington, Seattle, WA, USA
| |
Collapse
|
8
|
Conte M, Abraham A, Esposito A, Yang L, Gibcus JH, Parsi KM, Vercellone F, Fontana A, Di Pierno F, Dekker J, Nicodemi M. Polymer Physics Models Reveal Structural Folding Features of Single-Molecule Gene Chromatin Conformations. Int J Mol Sci 2024; 25:10215. [PMID: 39337699 PMCID: PMC11432541 DOI: 10.3390/ijms251810215] [Received: 07/14/2024] [Revised: 09/17/2024] [Accepted: 09/22/2024] [Indexed: 09/30/2024]
Abstract
Here, we employ polymer physics models of chromatin to investigate the 3D folding of a 2 Mb wide genomic region encompassing the human LTN1 gene, a crucial DNA locus involved in key cellular functions. Through extensive Molecular Dynamics simulations, we reconstruct in silico the ensemble of single-molecule LTN1 3D structures, which we benchmark against recent in situ Hi-C 2.0 data. The model-derived single molecules are then used to predict structural folding features at the single-cell level, providing testable predictions for super-resolution microscopy experiments.
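Benchmarking a simulated ensemble of single-molecule structures against Hi-C generally requires converting 3D coordinates into a population-averaged contact-frequency matrix. The snippet below is a minimal sketch of that step. It assumes each structure is a list of (x, y, z) bead positions of equal length and that two beads count as "in contact" when their Euclidean distance falls below a chosen cutoff. The cutoff and toy coordinates are hypothetical, and the paper's own contact definition may differ.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_frequency(ensemble: Sequence[Sequence[Coord]], cutoff: float) -> List[List[float]]:
    """Fraction of structures in which each bead pair lies within `cutoff`."""
    if not ensemble:
        raise ValueError("ensemble must contain at least one structure")
    n_beads = len(ensemble[0])
    freq = [[0.0] * n_beads for _ in range(n_beads)]
    for structure in ensemble:
        if len(structure) != n_beads:
            raise ValueError("all structures must have the same number of beads")
        for i in range(n_beads):
            for j in range(i, n_beads):
                if math.dist(structure[i], structure[j]) < cutoff:
                    freq[i][j] += 1.0
                    if i != j:
                        freq[j][i] += 1.0  # keep the matrix symmetric
    n_structures = len(ensemble)
    return [[count / n_structures for count in row] for row in freq]


# Two hypothetical 3-bead conformations: one extended, one folded back on itself.
extended = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.2, 0.3, 0.0)]
matrix = contact_frequency([extended, folded], cutoff=1.5)
print(matrix[0][2])  # 0.5: beads 0 and 2 touch only in the folded conformation
```

Averaging over many conformations is what lets single-molecule models reproduce a population Hi-C map while still retaining cell-to-cell variability in the individual structures.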
Affiliation(s)
- Mattia Conte
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
- Alex Abraham
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
- Andrea Esposito
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
- Liyan Yang
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Johan H. Gibcus
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
- Krishna M. Parsi
  - Diabetes Center of Excellence and Program in Molecular Medicine, University of Massachusetts Chan Medical School, Worcester, MA 01655, USA
- Francesca Vercellone
  - DIETI, Università di Napoli Federico II, Via Claudio 21, 80125 Naples, Italy
  - INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
- Andrea Fontana
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
- Florinda Di Pierno
  - DIETI, Università di Napoli Federico II, Via Claudio 21, 80125 Naples, Italy
  - INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy
- Job Dekker
  - Department of Systems Biology, University of Massachusetts Chan Medical School, Worcester, MA 01605, USA
  - Howard Hughes Medical Institute, Chevy Chase, MD 20815, USA
- Mario Nicodemi
  - Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant’Angelo, 80126 Naples, Italy

9
Zhang Y, Chen K, Tang SC, Cai Y, Nambu A, See YX, Fu C, Raju A, Lebeau B, Ling Z, Chan JJ, Tay Y, Mutwil M, Lakshmanan M, Tucker-Kellogg G, Chng WJ, Tenen DG, Osato M, Tergaonkar V, Fullwood MJ. Super-silencer perturbation by EZH2 and REST inhibition leads to large loss of chromatin interactions and reduction in cancer growth. Nat Struct Mol Biol 2024:10.1038/s41594-024-01391-7. [PMID: 39304765 DOI: 10.1038/s41594-024-01391-7] [Received: 04/03/2022] [Accepted: 06/28/2024] [Indexed: 09/22/2024]
Abstract
Human silencers have been shown to regulate developmental gene expression. However, the functional importance of human silencers remains to be elucidated, such as whether they can form 'super-silencers' and whether they are linked to cancer progression. Here, we show that two silencer components of the FGF18 gene can cooperate through compensatory chromatin interactions to form a super-silencer. Double knockout of the two silencers resulted in synergistic upregulation of FGF18 expression and changes in cell identity. To perturb super-silencers, we applied a combination treatment of the enhancer of zeste homolog 2 (EZH2) inhibitor GSK343 and the repressor element 1-silencing transcription factor (REST) inhibitor X5050 ('GR'). Interestingly, GR led to severe loss of topologically associating domains and loops, which was associated with reduced CTCF and TOP2A mRNA levels. Moreover, GR synergistically upregulated super-silencer-controlled genes related to the cell cycle, apoptosis and DNA damage, leading to anticancer effects in vivo. Overall, our data provide an example of a super-silencer and show that GR can disrupt super-silencers, potentially leading to cancer ablation.
Affiliation(s)
- Ying Zhang
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Genome Institute of Singapore, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Kaijing Chen
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Seng Chuan Tang
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Yichao Cai
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Akiko Nambu
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Yi Xiang See
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Chaoyu Fu
  - Mechanobiology Institute, National University of Singapore, Singapore, Singapore
- Anandhkumar Raju
  - Laboratory of NF-κB Signalling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Benjamin Lebeau
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Zixun Ling
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Jia Jia Chan
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Yvonne Tay
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore, Singapore
- Marek Mutwil
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Manikandan Lakshmanan
  - Laboratory of NF-κB Signalling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Greg Tucker-Kellogg
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
  - Computational Biology Programme, Faculty of Science, National University of Singapore, Singapore, Singapore
- Wee Joo Chng
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Department of Medicine, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
  - NUS Centre for Cancer Research (N2CR), Centre for Translational Medicine, Singapore, Singapore
  - Department of Hematology-Oncology, National University Cancer Institute of Singapore (NCIS), National University Health System (NUHS), Singapore, Singapore
- Daniel G Tenen
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - Harvard Stem Cell Institute, Harvard Medical School, Boston, MA, USA
- Motomi Osato
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
- Vinay Tergaonkar
  - Laboratory of NF-κB Signalling, Institute of Molecular and Cell Biology (IMCB), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore (NUS), Singapore, Singapore
- Melissa Jane Fullwood
  - Cancer Science Institute of Singapore, National University of Singapore, Singapore, Singapore
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
  - Institute of Molecular and Cell Biology, Agency for Science, Technology and Research (A*STAR), Singapore, Singapore

10
Bhattacharyya S, Ay F. Identifying genetic variants associated with chromatin looping and genome function. Nat Commun 2024; 15:8174. [PMID: 39289357 PMCID: PMC11408621 DOI: 10.1038/s41467-024-52296-4] [Received: 09/08/2023] [Accepted: 08/30/2024] [Indexed: 09/19/2024]
Abstract
Here we present a comprehensive HiChIP dataset on naïve CD4 T cells (nCD4) from 30 donors and identify QTLs that associate with genotype-dependent and/or allele-specific variation of HiChIP contacts defining loops between active regulatory regions (iQTLs). We observe a substantial overlap between iQTLs and previously defined eQTLs and histone QTLs, and an enrichment for fine-mapped QTLs and GWAS variants. Furthermore, we describe a distinct subset of nCD4 iQTLs for which significant variation in chromatin contacts in nCD4 cells translates into significant eQTL trends in CD4 T cell memory subsets. Finally, we define connectivity-QTLs as iQTLs that are significantly associated with concordant genotype-dependent changes in chromatin contacts over a broad genomic region (e.g., a GWAS SNP in the RNASET2 locus). Our results demonstrate the importance of chromatin contacts as a complementary modality for QTL mapping and their power to identify previously uncharacterized QTLs linked to cell-specific gene expression and connectivity.
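In its simplest form, testing whether a variant is associated with loop strength regresses a normalized contact value on genotype dosage (0, 1 or 2 copies of the alternative allele) across donors. The snippet below is a minimal ordinary-least-squares sketch that assumes one normalized HiChIP contact value per donor. The numbers are hypothetical, and the study's actual iQTL model (covariates, allele-specific tests, significance thresholds) is not reproduced here.

```python
from typing import Sequence, Tuple


def dosage_regression(dosage: Sequence[int], contacts: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of contacts ~ intercept + slope * dosage.

    Returns (slope, intercept); the slope estimates the change in contact
    strength per copy of the alternative allele.
    """
    n = len(dosage)
    if n != len(contacts) or n < 2:
        raise ValueError("need matched dosage and contact values for at least two donors")
    mean_x = sum(dosage) / n
    mean_y = sum(contacts) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("dosage is constant; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, contacts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical normalized contact values at one loop for six donors.
dosage = [0, 0, 1, 1, 2, 2]
contacts = [10.0, 12.0, 15.0, 17.0, 20.0, 22.0]
slope, intercept = dosage_regression(dosage, contacts)
print(slope, intercept)  # 5.0 11.0
```

A real analysis would also attach a p-value to the slope (for example, from a t-test) and correct for multiple testing across all loops tested.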
Affiliation(s)
- Ferhat Ay
  - La Jolla Institute for Immunology, La Jolla, CA, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA

11
Perlman BS, Burget N, Zhou Y, Schwartz GW, Petrovic J, Modrusan Z, Faryabi RB. Enhancer-promoter hubs organize transcriptional networks promoting oncogenesis and drug resistance. Nat Commun 2024; 15:8070. [PMID: 39277592 PMCID: PMC11401928 DOI: 10.1038/s41467-024-52375-6] [Received: 11/24/2023] [Accepted: 09/04/2024] [Indexed: 09/17/2024]
Abstract
Recent advances in high-resolution mapping of spatial interactions among regulatory elements support the existence of complex topological assemblies of enhancers and promoters known as enhancer-promoter hubs or cliques. Yet, organization principles of these multi-interacting enhancer-promoter hubs and their potential role in regulating gene expression in cancer remain unclear. Here, we systematically identify enhancer-promoter hubs in breast cancer, lymphoma, and leukemia. We find that highly interacting enhancer-promoter hubs form at key oncogenes and lineage-associated transcription factors potentially promoting oncogenesis of these diverse cancer types. Genomic and optical mapping of interactions among enhancer and promoter elements further show that topological alterations in hubs coincide with transcriptional changes underlying acquired resistance to targeted therapy in T cell leukemia and B cell lymphoma. Together, our findings suggest that enhancer-promoter hubs are dynamic and heterogeneous topological assemblies with the potential to control gene expression circuits promoting oncogenesis and drug resistance.
Affiliation(s)
- Brent S Perlman
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, USA
  - Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, USA
  - Abramson Family Cancer Research Institute, University of Pennsylvania, Philadelphia, USA
- Noah Burget
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, USA
  - Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, USA
  - Abramson Family Cancer Research Institute, University of Pennsylvania, Philadelphia, USA
- Yeqiao Zhou
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, USA
  - Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, USA
  - Abramson Family Cancer Research Institute, University of Pennsylvania, Philadelphia, USA
- Gregory W Schwartz
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
  - Department of Medical Biophysics, University of Toronto, Toronto, Canada
- Jelena Petrovic
  - Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, USA
- Zora Modrusan
  - Department of Proteomic and Genomic Technologies, Genentech, South San Francisco, USA
- Robert B Faryabi
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, USA
  - Penn Epigenetics Institute, University of Pennsylvania, Philadelphia, USA
  - Abramson Family Cancer Research Institute, University of Pennsylvania, Philadelphia, USA

12
Lin J, Luo R, Pinello L. EPInformer: a scalable deep learning framework for gene expression prediction by integrating promoter-enhancer sequences with multimodal epigenomic data. bioRxiv 2024:2024.08.01.606099. [PMID: 39131276 PMCID: PMC11312614 DOI: 10.1101/2024.08.01.606099] [Indexed: 08/13/2024]
Abstract
Transcriptional regulation, critical for cellular differentiation and adaptation to environmental changes, involves coordinated interactions among DNA sequences, regulatory proteins, and chromatin architecture. Despite extensive data from consortia like ENCODE, understanding the dynamics of cis-regulatory elements (CREs) in gene expression remains challenging. Deep learning is a powerful tool for learning gene expression and epigenomic signals from DNA sequences, exhibiting superior performance compared to conventional machine learning approaches. However, even the most advanced deep learning-based methods may fall short in capturing the regulatory effects of distal elements such as enhancers, limiting their predictive accuracy. In addition, these methods may require significant resources to train or to adapt to newly generated data. To address these challenges, we present EPInformer, a scalable deep-learning framework for predicting gene expression by integrating promoter-enhancer interactions with their sequences, epigenomic signals, and chromatin contacts. Our model outperforms existing gene expression prediction models in rigorous cross-chromosome validation, accurately recapitulates enhancer-gene interactions validated by CRISPR perturbation experiments, and identifies crucial transcription factor motifs within regulatory sequences. EPInformer is available as open-source software at https://github.com/pinellolab/EPInformer.
Affiliation(s)
- Jiecong Lin
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, Massachusetts 02129, USA
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Ruibang Luo
  - Department of Computer Science, The University of Hong Kong, Hong Kong, China
- Luca Pinello
  - Molecular Pathology Unit, Center for Cancer Research, Massachusetts General Hospital, Department of Pathology, Harvard Medical School, Boston, Massachusetts 02129, USA

13
Nikjoo H, Rahmanian S, Taleei R. Modelling DNA damage-repair and beyond. Prog Biophys Mol Biol 2024; 190:1-18. [PMID: 38754703 DOI: 10.1016/j.pbiomolbio.2024.05.002] [Received: 11/15/2023] [Revised: 03/27/2024] [Accepted: 05/10/2024] [Indexed: 05/18/2024]
Abstract
The paper presents a review of mechanistic modelling studies of DNA damage and DNA repair, and of their downstream consequences in the mammalian cell nucleus. We hypothesize that DNA deletions arise from the repair of double-strand breaks, producing genome modifications that play a crucial role in the long-term development of genetic inheritance and disease. The aim of the paper is to review the formation mechanisms underlying naturally occurring DNA deletions in the human genome and their potential relevance for bridging the gap between induced DNA double-strand breaks and deletions in the human genome damaged by endogenous and exogenous events. The model of the cell nucleus presented here enables simulation of DNA damage at the molecular level, identifying the spectrum of damage induced in all chromosomal territories and loops. Our mechanistic modelling of the repair of double-strand breaks (DSB), single-strand breaks (SSB) and base damage (BD) shows that the complexity of DNA damage accounts for longer repair times and for the biphasic shape of mammalian cell repair curves. In the absence of experimentally determined data, the mechanistic repair model predicts the in vivo rate constants of the proteins involved in repairing DSB, SSB and BD.
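A biphasic repair curve is commonly summarized phenomenologically as the sum of a fast and a slow exponential component. The snippet below is a minimal sketch of that descriptive form. It assumes a fraction `f_fast` of breaks is repaired at rate `k_fast` and the remainder at rate `k_slow`. The parameter values are hypothetical, and this curve is a generic fit function, not the authors' mechanistic, protein-level repair model.

```python
import math


def unrepaired_fraction(t: float, f_fast: float, k_fast: float, k_slow: float) -> float:
    """Fraction of initial double-strand breaks still unrepaired at time t.

    Biexponential form: a fast-repairing subpopulation (fraction f_fast, rate
    k_fast) plus a slow one (fraction 1 - f_fast, rate k_slow). Rates are per
    unit of t.
    """
    if not 0.0 <= f_fast <= 1.0:
        raise ValueError("f_fast must lie in [0, 1]")
    return f_fast * math.exp(-k_fast * t) + (1.0 - f_fast) * math.exp(-k_slow * t)


# Hypothetical parameters: 70% of breaks repaired quickly, 30% slowly (t in hours).
for hours in (0.0, 0.5, 2.0, 8.0, 24.0):
    print(hours, round(unrepaired_fraction(hours, f_fast=0.7, k_fast=2.0, k_slow=0.1), 3))
```

On a log scale, the curve shows a steep initial drop dominated by the fast term, followed by a shallower tail set by `k_slow`. This is the shape described as biphasic. Consistent with the abstract's argument, the slow component can loosely be read as the more complex lesions, although that mapping is an interpretation rather than part of this formula.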
Affiliation(s)
- Hooshang Nikjoo
  - Department of Physiology, Anatomy and Genetics (DPAG), Oxford University, Oxford, OX1 3PT, UK
- Reza Taleei
  - Medical Physics Division, Department of Radiation Oncology, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, PA, 19107, USA

14
Chen V, Yang M, Cui W, Kim JS, Talwalkar A, Ma J. Applying interpretable machine learning in computational biology-pitfalls, recommendations and opportunities for new developments. Nat Methods 2024; 21:1454-1461. [PMID: 39122941 PMCID: PMC11348280 DOI: 10.1038/s41592-024-02359-7] [Received: 10/27/2022] [Accepted: 06/24/2024] [Indexed: 08/12/2024]
Abstract
Recent advances in machine learning have enabled the development of next-generation predictive models for complex computational biology problems, thereby spurring the use of interpretable machine learning (IML) to unveil biological insights. However, guidelines for using IML in computational biology are generally underdeveloped. We provide an overview of IML methods and evaluation techniques and discuss common pitfalls encountered when applying IML methods to computational biology problems. We also highlight open questions, especially in the era of large language models, and call for collaboration between IML and computational biology researchers.
Affiliation(s)
- Valerie Chen
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Muyu Yang
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Wenbo Cui
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Joon Sik Kim
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Ameet Talwalkar
  - Machine Learning Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Jian Ma
  - Ray and Stephanie Lane Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

15
Sokolova K, Chen KM, Hao Y, Zhou J, Troyanskaya OG. Deep Learning Sequence Models for Transcriptional Regulation. Annu Rev Genomics Hum Genet 2024; 25:105-122. [PMID: 38594933 DOI: 10.1146/annurev-genom-021623-024727] [Indexed: 04/11/2024]
Abstract
Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.
Affiliation(s)
- Ksenia Sokolova
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Kathleen M Chen
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
- Yun Hao
  - Flatiron Institute, Simons Foundation, New York, NY, USA
- Jian Zhou
  - Lyda Hill Department of Bioinformatics, University of Texas Southwestern Medical Center, Dallas, Texas, USA
- Olga G Troyanskaya
  - Princeton Precision Health, Princeton University, Princeton, New Jersey, USA
  - Flatiron Institute, Simons Foundation, New York, NY, USA
  - Department of Computer Science and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA

16
Perlman BS, Burget N, Zhou Y, Schwartz GW, Petrovic J, Modrusan Z, Faryabi RB. Enhancer-promoter hubs organize transcriptional networks promoting oncogenesis and drug resistance. bioRxiv 2024:2024.07.02.601745. [PMID: 39005446 PMCID: PMC11244972 DOI: 10.1101/2024.07.02.601745] [Indexed: 07/16/2024]
Abstract
Recent advances in high-resolution mapping of spatial interactions among regulatory elements support the existence of complex topological assemblies of enhancers and promoters known as enhancer-promoter hubs or cliques. Yet, the organization principles of these multi-interacting enhancer-promoter hubs and their potential role in regulating gene expression in cancer remain unclear. Here, we systematically identified enhancer-promoter hubs in breast cancer, lymphoma, and leukemia. We found that highly interacting enhancer-promoter hubs form at key oncogenes and lineage-associated transcription factors potentially promoting oncogenesis of these diverse cancer types. Genomic and optical mapping of interactions among enhancer and promoter elements further showed that topological alterations in hubs coincide with transcriptional changes underlying acquired resistance to targeted therapy in T cell leukemia and B cell lymphoma. Together, our findings suggest that enhancer-promoter hubs are dynamic and heterogeneous topological assemblies with the potential to control gene expression circuits promoting oncogenesis and drug resistance.

17
Murtaza G, Butaney B, Wagner J, Singh R. scGrapHiC: deep learning-based graph deconvolution for Hi-C using single cell gene expression. Bioinformatics 2024; 40:i490-i500. [PMID: 38940151 PMCID: PMC11256916 DOI: 10.1093/bioinformatics/btae223] [Indexed: 06/29/2024]
Abstract
SUMMARY The single-cell Hi-C (scHi-C) protocol helps identify cell-type-specific chromatin interactions and sheds light on cell differentiation and disease progression. Despite providing crucial insights, scHi-C data are often underutilized due to the high cost and complexity of the experimental protocol. We present a deep learning framework, scGrapHiC, that predicts pseudo-bulk scHi-C contact maps using pseudo-bulk scRNA-seq data. Specifically, scGrapHiC performs graph deconvolution to extract genome-wide single-cell interactions from a bulk Hi-C contact map using scRNA-seq as a guiding signal. Our evaluations show that scGrapHiC, trained on seven cell-type co-assay datasets, outperforms typical sequence encoder approaches. For example, scGrapHiC achieves a substantial improvement of 23.2% over the baselines in recovering cell-type-specific Topologically Associating Domains. It also generalizes to unseen embryo and brain tissue samples. scGrapHiC is a novel method to generate cell-type-specific scHi-C contact maps from widely available genomic signals, enabling the study of cell-type-specific chromatin interactions. AVAILABILITY AND IMPLEMENTATION The GitHub repository https://github.com/rsinghlab/scGrapHiC contains the source code of scGrapHiC and associated scripts to preprocess publicly available datasets and to produce the results and visualizations discussed in this manuscript.
Affiliation(s)
- Ghulam Murtaza
  - Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
- Byron Butaney
  - Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
- Justin Wagner
  - Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, 20899, United States
- Ritambhara Singh
  - Department of Computer Science, Brown University, 115 Waterman Street, Providence, RI, 02912, United States
  - Center for Computational Molecular Biology, Brown University, 164 Angell Street, Providence, RI, 02912, United States

18
Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. Bioinformatics 2024; 40:btae340. [PMID: 38796686 PMCID: PMC11153836 DOI: 10.1093/bioinformatics/btae340] [Received: 10/27/2023] [Revised: 05/04/2024] [Accepted: 05/24/2024] [Indexed: 05/28/2024]
Abstract
SUMMARY The increasing development of sequence-based machine learning models has raised the demand for tools that manipulate input sequences for these models. However, existing approaches to editing and evaluating genome sequences with such models have limitations, such as incompatibility with structural variants, challenges in identifying responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing and supporting in silico mutagenesis experiments. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences. AVAILABILITY AND IMPLEMENTATION SuPreMo is written in Python and can be run using only one line of code to generate both sequences and 3D genome disruption scores. The codebase, instructions for installation and use, and tutorials are on the GitHub page: https://github.com/ketringjoni/SuPreMo.
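The reference/perturbed sequence pairs that such a tool produces can be illustrated with a minimal sketch. The approach is to apply a substitution, deletion or insertion to a reference window while keeping the window length fixed, because sequence-based models typically expect fixed-length input. Everything here is a hypothetical illustration and not SuPreMo's API or its actual handling of structural variants: the function name `perturb`, the strategy of padding deletions with supplied downstream sequence, trimming insertions at the 3' end, and the toy sequence.

```python
def perturb(reference: str, pos: int, ref_allele: str, alt_allele: str, flank: str = "") -> str:
    """Return a copy of `reference` with ref_allele at 0-based `pos` replaced
    by alt_allele, keeping the output the same length as the input.

    - Deletions shorten the sequence, so bases from `flank` (hypothetical
      downstream context) are appended to restore the length; if flank is too
      short, 'N' is used as padding.
    - Insertions lengthen the sequence, so the 3' end is trimmed.
    """
    if reference[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("reference allele does not match the sequence at pos")
    edited = reference[:pos] + alt_allele + reference[pos + len(ref_allele):]
    target = len(reference)
    if len(edited) < target:
        edited += (flank + "N" * target)[: target - len(edited)]
    return edited[:target]


ref = "ACGTACGTAC"
print(perturb(ref, 2, "G", "T"))                  # SNV: ACTTACGTAC
print(perturb(ref, 2, "GTA", "G", flank="GGGG"))  # deletion: ACGCGTACGG
print(perturb(ref, 2, "G", "GTTT"))               # insertion: ACGTTTTACG
```

Each (reference, perturbed) pair would then be scored with the same predictive model, and a difference between the two predicted contact maps would serve as a disruption score. The specific scores SuPreMo reports are not reproduced here. Other alignment choices, such as centering the variant in the window, are equally reasonable.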
Affiliation(s)
- Ketrin Gjoni
  - Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
- Katherine S Pollard
  - Institute of Data Science and Biotechnology, Gladstone Institutes, 1650 Owens Street, San Francisco, CA 94158, United States
  - Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, United States
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, United States

19
Wang X, Yue F. Hijacked enhancer-promoter and silencer-promoter loops in cancer. Curr Opin Genet Dev 2024; 86:102199. [PMID: 38669773 DOI: 10.1016/j.gde.2024.102199] [Received: 12/29/2023] [Revised: 03/19/2024] [Accepted: 04/07/2024] [Indexed: 04/28/2024]
Abstract
Recent work has shown that, besides inducing fusion genes, structural variations (SVs) can also contribute to oncogenesis by disrupting three-dimensional genome organization and dysregulating gene expression. At the chromatin-loop level, SVs can relocate enhancers or silencers from their original genomic loci to activate oncogenes or repress tumor suppressor genes. On a larger scale, different types of alterations in topologically associating domains (TADs) have been reported in cancer, such as TAD expansion, shuffling, and SV-induced neo-TADs. Furthermore, the transformation of normal cells into cancerous cells is usually coupled with active or repressive compartmental switches, and cancer-specific compartments have been proposed. This review discusses these and other recent advances in understanding how SVs disrupt higher-order genome structure in cancer, which in turn leads to oncogene dysregulation. We also highlight the clinical implications of these changes and the challenges ahead in this field.
Affiliation(s)
- Xiaotao Wang
  - Obstetrics and Gynecology Hospital, Institute of Reproduction and Development, Fudan University, Shanghai, China
  - Shanghai Key Laboratory of Reproduction and Development, Shanghai, China
- Feng Yue
  - Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine, Northwestern University, Chicago, Illinois, USA
  - Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, Illinois, USA

20
Zhang L, Bartosovic M. Single-cell mapping of cell-type specific chromatin architecture in the central nervous system. Curr Opin Struct Biol 2024; 86:102824. [PMID: 38723561 DOI: 10.1016/j.sbi.2024.102824] [Received: 12/13/2023] [Revised: 03/22/2024] [Accepted: 04/08/2024] [Indexed: 05/19/2024]
Abstract
Determining how chromatin is structured in the nucleus is critical to studying its role in gene regulation. Recent advances in the analysis of single-cell chromatin architecture have considerably improved our understanding of cell-type-specific chromosome conformation and nuclear architecture. In this review, we discuss the methods used for analysis of 3D chromatin conformation, including sequencing-based methods, imaging-based techniques, and computational approaches. We further review the application of these methods in the study of the role of chromatin topology in neural development and disorders.
Affiliation(s)
- Letian Zhang
- Department of Biochemistry and Biophysics, Svante Arrhenius väg 16C, 162 53, Stockholm, Sweden. https://twitter.com/LetianZHANG_
- Marek Bartosovic
- Department of Biochemistry and Biophysics, Svante Arrhenius väg 16C, 162 53, Stockholm, Sweden.

21
Liu T, Zhu H, Wang Z. Learning Micro-C from Hi-C with diffusion models. PLoS Comput Biol 2024; 20:e1012136. [PMID: 38758956 PMCID: PMC11139321 DOI: 10.1371/journal.pcbi.1012136] [Citation(s) in RCA: 0] [Received: 11/11/2023] [Revised: 05/30/2024] [Accepted: 05/05/2024] [Indexed: 05/19/2024] [Open access]
Abstract
In the last few years, Micro-C has emerged as an improved alternative to Hi-C. It replaces the restriction enzymes used in Hi-C assays with micrococcal nuclease (MNase), capturing chromatin interactions at nucleosome resolution. The improved signal-to-noise ratio of Micro-C allows it to detect more chromatin loops than high-resolution Hi-C. However, compared with the many Hi-C datasets available in the literature, only a limited number of Micro-C datasets exist. To take full advantage of these Hi-C datasets, we present HiC2MicroC, a computational method that learns to predict Micro-C from Hi-C using denoising diffusion probabilistic models (DDPM). We trained our DDPM and other regression models on the human foreskin fibroblast (HFFc6) cell line and evaluated these methods in six different cell types at 5-kb and 1-kb resolution. Our evaluations demonstrate that both HiC2MicroC and regression methods can markedly improve Hi-C towards Micro-C, and that our DDPM-based HiC2MicroC outperforms regression in several respects. First, HiC2MicroC successfully recovers most of the Micro-C loops, even those not detected in Hi-C maps. Second, a majority of the HiC2MicroC-recovered loops are anchored at CTCF binding sites in a convergent orientation. Third, HiC2MicroC loops share genomic and epigenetic properties with Micro-C loops, including linking promoters and enhancers, and their anchors are enriched for structural proteins (CTCF and cohesin) and histone modifications. Lastly, we find that our recovered loops are also consistent with the loops identified from promoter capture Micro-C (PCMicro-C) and Chromatin Interaction Analysis by Paired-End Tag Sequencing (ChIA-PET). Overall, HiC2MicroC is an effective tool for further studying Hi-C data with Micro-C as a template. HiC2MicroC is publicly available at https://github.com/zwang-bioinformatics/HiC2MicroC/.
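For readers unfamiliar with DDPMs, the forward (noising) half of the process can be sketched in a few lines. This is a minimal illustration of the standard DDPM formulation. It assumes a linear variance schedule, with β running from 1e-4 to 0.02 over 1,000 steps, which is the common default from the original DDPM work. HiC2MicroC's actual schedule, patch size, and Hi-C conditioning may differ, and the 2×2 Micro-C `patch` below is hypothetical data. A trained network would learn to reverse these steps, conditioned on the matching Hi-C patch; that part is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T spaced linearly (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s); shrinks toward 0 as t grows."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in closed form (t is 0-based)."""
    a = alpha_bar[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]


rng = random.Random(0)
patch = [0.8, 0.1, 0.1, 0.9]  # hypothetical normalized 2x2 Micro-C target patch, flattened
alpha_bar = cumulative_alphas(linear_beta_schedule(1000))
noisy_early = q_sample(patch, 10, alpha_bar, rng)  # still close to the patch
noisy_late = q_sample(patch, 999, alpha_bar, rng)  # almost pure Gaussian noise
```

The closed-form `q_sample` is what makes DDPM training cheap. Any noise level can be drawn directly from the clean target without simulating the intermediate steps.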
Affiliation(s)
- Tong Liu
- Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Hao Zhu
- Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America
- Zheng Wang
- Department of Computer Science, University of Miami, Coral Gables, Florida, United States of America

22
Camerino M, Chang W, Cvekl A. Analysis of long-range chromatin contacts, compartments and looping between mouse embryonic stem cells, lens epithelium and lens fibers. Epigenetics Chromatin 2024; 17:10. [PMID: 38643244 PMCID: PMC11031936 DOI: 10.1186/s13072-024-00533-x] [Citation(s) in RCA: 0] [Received: 11/15/2023] [Accepted: 03/08/2024] [Indexed: 04/22/2024] [Open access]
Abstract
BACKGROUND Nuclear organization of interphase chromosomes involves individual chromosome territories, "open" and "closed" chromatin compartments, topologically associated domains (TADs) and chromatin loops. The DNA- and RNA-binding transcription factor CTCF, together with the cohesin complex, serves as a major organizer of chromatin architecture. Cellular differentiation is driven by temporally and spatially coordinated gene expression that requires chromatin changes at individual loci of varying complexity. Lens differentiation represents an advantageous system to probe transcriptional mechanisms underlying tissue-specific gene expression, including the high transcriptional output of individual crystallin genes until the mature lens fiber cells degrade their nuclei. RESULTS Chromatin organization in mouse embryonic stem (ES) cells and in newborn (P0.5) lens epithelium and fiber cells was analyzed using Hi-C. Localization of CTCF in both lens chromatins was determined by ChIP-seq and compared with ES cells. Quantitative analyses show major differences in the number and size of TADs and in chromatin loop size among these three cell types. In-depth analyses show similarities between the lens samples, exemplified by overlaps between compartments A and B. Lens epithelium-specific CTCF peaks are found mostly in methylated genomic regions, while lens fiber-specific and shared peaks occur mostly within unmethylated DNA regions. Major differences in TADs and loops are illustrated at the ~500 kb Pax6 locus, which encodes the critical lens regulatory transcription factor, and within a larger ~15 Mb WAGR locus containing Pax6 and other loci linked to human congenital diseases. Lens and ES cell Hi-C data (TADs and loops), together with ATAC-seq, CTCF, H3K27ac, H3K27me3 and ENCODE cis-regulatory sites, are shown in detail for the Pax6, Sox1 and Hif1a loci, multiple crystallin genes and other important loci required for lens morphogenesis.
The majority of crystallin loci are marked by unexpectedly high CTCF binding across their transcribed regions. CONCLUSIONS Our study has generated the first data on three-dimensional (3D) nuclear organization in lens epithelium and lens fibers and directly compared these data with ES cells. These findings generate novel insights into lens-specific transcriptional gene control, open new research avenues to study transcriptional condensates in lens fiber cells, and enable studies of non-coding genetic variants linked to cataract and other lens and ocular abnormalities.
Affiliation(s)
- Michael Camerino
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- William Chang
- Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY 10461, USA
- Ales Cvekl
- Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
- Ophthalmology and Visual Sciences, Albert Einstein College of Medicine, Bronx, NY 10461, USA.

23
Bell CG. Epigenomic insights into common human disease pathology. Cell Mol Life Sci 2024; 81:178. [PMID: 38602535 PMCID: PMC11008083 DOI: 10.1007/s00018-024-05206-2] [Citation(s) in RCA: 0] [Received: 01/19/2024] [Revised: 03/11/2024] [Accepted: 03/13/2024] [Indexed: 04/12/2024]
Abstract
The epigenome (the chemical modifications and chromatin-related packaging of the genome) enables the same genetic template to be activated or repressed in different cellular settings. This multi-layered mechanism facilitates cell-type specific function by setting the local sequence and 3D interactive activity level. Gene transcription is further modulated through the interplay with transcription factors and co-regulators. The human body requires this epigenomic apparatus to be precisely installed throughout development and then adequately maintained during the lifespan. The causal role of the epigenome in human pathology, beyond imprinting disorders and specific tumour suppressor genes, was further brought into the spotlight by large-scale sequencing projects identifying that mutations in epigenomic machinery genes could be critical drivers in both cancer and developmental disorders. Abrogation of this cellular mechanism is providing new molecular insights into pathogenesis. However, deciphering the full breadth and implications of these epigenomic changes remains challenging. Knowledge is accruing regarding disease mechanisms and clinical biomarkers, through pathogenically relevant and surrogate tissue analyses, respectively. Advances include consortium-generated cell-type specific reference epigenomes, high-throughput DNA methylome association studies, as well as insights into ageing-related diseases from biological 'clocks' constructed by machine learning algorithms. Also, third-generation sequencing is beginning to disentangle the complexity of genetic and DNA modification haplotypes. Cell-free DNA methylation as a cancer biomarker has clear clinical utility and further potential to assess organ damage across many disorders. Finally, molecular understanding of disease aetiology brings with it the opportunity for exact therapeutic alteration of the epigenome through CRISPR activation or inhibition.
Affiliation(s)
- Christopher G Bell
- William Harvey Research Institute, Barts & The London Faculty of Medicine, Queen Mary University of London, Charterhouse Square, London, EC1M 6BQ, UK.

24
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Indexed: 03/19/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNAse-seq, etc.), DNA sequencing information (k-mers and transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, and TAD boundaries) and analyze their pros and cons. We also point out obstacles to the computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
- Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
- Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA

25
Zhang Y, Boninsegna L, Yang M, Misteli T, Alber F, Ma J. Computational methods for analysing multiscale 3D genome organization. Nat Rev Genet 2024; 25:123-141. [PMID: 37673975 PMCID: PMC11127719 DOI: 10.1038/s41576-023-00638-1] [Citation(s) in RCA: 0] [Accepted: 07/12/2023] [Indexed: 09/08/2023]
Abstract
Recent progress in whole-genome mapping and imaging technologies has enabled the characterization of the spatial organization and folding of the genome in the nucleus. In parallel, advanced computational methods have been developed to leverage these mapping data to reveal multiscale three-dimensional (3D) genome features and to provide a more complete view of genome structure and its connections to genome functions such as transcription. Here, we discuss how recently developed computational tools, including machine-learning-based methods and integrative structure-modelling frameworks, have led to a systematic, multiscale delineation of the connections among different scales of 3D genome organization, genomic and epigenomic features, functional nuclear components and genome function. However, approaches that more comprehensively integrate a wide variety of genomic and imaging datasets are still needed to uncover the functional role of 3D genome structure in defining cellular phenotypes in health and disease.
Affiliation(s)
- Yang Zhang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Lorenzo Boninsegna
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA
- Muyu Yang
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
- Tom Misteli
- Center for Cancer Research, National Cancer Institute, Bethesda, MD, USA.
- Frank Alber
- Department of Microbiology, Immunology and Molecular Genetics and Institute for Quantitative and Computational Biosciences, University of California Los Angeles, Los Angeles, CA, USA.
- Jian Ma
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

26
Willemin A, Szabó D, Pombo A. Epigenetic regulatory layers in the 3D nucleus. Mol Cell 2024; 84:415-428. [PMID: 38242127 PMCID: PMC10872226 DOI: 10.1016/j.molcel.2023.12.032] [Citation(s) in RCA: 0] [Received: 07/26/2023] [Revised: 11/21/2023] [Accepted: 12/15/2023] [Indexed: 01/21/2024]
Abstract
Nearly seven decades have elapsed since Francis Crick introduced the central dogma of molecular biology, as part of his ideas on protein synthesis, setting the fundamental rules of sequence information transfer from DNA to RNAs and proteins. We have since learned that gene expression is finely tuned in time and space, due to the activities of RNAs and proteins on regulatory DNA elements, and through cell-type-specific three-dimensional conformations of the genome. Here, we review major advances in genome biology, discuss a set of ideas on gene regulation, and highlight how various biomolecular assemblies lead to the formation of structural and regulatory features within the nucleus, with roles in transcriptional control. We conclude by suggesting further developments that will help capture the complex, dynamic, and often spatially restricted events that govern gene expression in mammalian cells.
Affiliation(s)
- Andréa Willemin
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, Berlin, Germany; Humboldt-Universität zu Berlin, Institute for Biology, Berlin, Germany.
- Dominik Szabó
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, Berlin, Germany; Humboldt-Universität zu Berlin, Institute for Biology, Berlin, Germany
- Ana Pombo
- Max-Delbrück-Center for Molecular Medicine in the Helmholtz Association (MDC), Berlin Institute for Medical Systems Biology (BIMSB), Epigenetic Regulation and Chromatin Architecture Group, Berlin, Germany; Humboldt-Universität zu Berlin, Institute for Biology, Berlin, Germany.

27
Abbas A, Chandratre K, Gao Y, Yuan J, Zhang MQ, Mani RS. ChIPr: accurate prediction of cohesin-mediated 3D genome organization from 2D chromatin features. Genome Biol 2024; 25:15. [PMID: 38217027 PMCID: PMC10785520 DOI: 10.1186/s13059-023-03158-7] [Citation(s) in RCA: 0] [Received: 06/09/2022] [Accepted: 12/22/2023] [Indexed: 01/14/2024] [Open access]
Abstract
The three-dimensional genome organization influences diverse nuclear processes. Here we present Chromatin Interaction Predictor (ChIPr), a suite of regression models based on deep neural networks, random forest, and gradient boosting to predict cohesin-mediated chromatin interaction strength between any two loci in the genome. ChIPr predictions correlate well with ChIA-PET data in four cell lines. The standard ChIPr model requires three experimental inputs (ChIP-seq signals for RAD21, H3K27ac, and H3K27me3) but also works well with the RAD21 signal alone. Integrative analysis reveals novel insights into the roles of the CTCF motif, its orientation, and CTCF binding in cohesin-mediated chromatin interactions.
Affiliation(s)
- Ahmed Abbas
- Department of Pathology, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Khyati Chandratre
- Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX, 75080, USA
- Yunpeng Gao
- Department of Pathology, UT Southwestern Medical Center, Dallas, TX, 75390, USA
- Jiapei Yuan
- State Key Laboratory of Experimental Hematology, National Clinical Research Center for Blood Diseases, Haihe Laboratory of Cell Ecosystem, Institute of Hematology and Blood Diseases Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin, 300020, China
- Michael Q Zhang
- Department of Biological Sciences, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX, 75080, USA.
- Ram S Mani
- Department of Pathology, UT Southwestern Medical Center, Dallas, TX, 75390, USA.
- Department of Urology, UT Southwestern Medical Center, Dallas, TX, 75390, USA.
- Harold C. Simmons Comprehensive Cancer Center, UT Southwestern Medical Center, Dallas, TX, 75390, USA.

28
Gjoni K, Pollard KS. SuPreMo: a computational tool for streamlining in silico perturbation using sequence-based predictive models. bioRxiv 2023:2023.11.03.565556. [PMID: 37961123 PMCID: PMC10635135 DOI: 10.1101/2023.11.03.565556] [Citation(s) in RCA: 0] [Indexed: 11/15/2023]
Abstract
Computationally editing genome sequences is a common bioinformatics task, but current approaches have limitations, such as incompatibility with structural variants, challenges in identifying the responsible sequence perturbations, and the need for VCF file inputs and phased data. To address these bottlenecks, we present Sequence Mutator for Predictive Models (SuPreMo), a scalable and comprehensive tool for performing in silico mutagenesis. We then demonstrate how pairs of reference and perturbed sequences can be used with machine learning models to prioritize pathogenic variants or discover new functional sequences.
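The sketch below makes the idea of reference/perturbed sequence pairs concrete. It is not SuPreMo's code or interface. It assumes three things:

- the model takes a fixed-length input window centered on the variant;
- variant positions are 0-based;
- length is corrected by one simple, hypothetical rule: the upstream flank is kept fixed, and downstream sequence is trimmed after insertions or pulled in after deletions, so both sequences have the same length.

The function name `make_pair` is illustrative only.

```python
def make_pair(chrom_seq, pos, ref, alt, window):
    """Build equal-length reference and perturbed sequences around one variant.

    chrom_seq: reference sequence (string); pos: 0-based start of the REF allele;
    ref/alt: allele strings (SNV, insertion, or deletion); window: model input length.
    """
    if chrom_seq[pos:pos + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    start = pos - window // 2  # variant sits at the window center
    end = start + window
    if start < 0 or end > len(chrom_seq):
        raise ValueError("window extends past the sequence boundaries")
    ref_seq = chrom_seq[start:end]

    # Keep the upstream flank fixed; fill the rest with downstream sequence,
    # pulling in extra bases after a deletion and trimming after an insertion.
    left = chrom_seq[start:pos]
    right_start = pos + len(ref)
    right_needed = max(0, window - len(left) - len(alt))
    right = chrom_seq[right_start:right_start + right_needed]
    alt_seq = (left + alt + right)[:window]
    if len(alt_seq) != window:
        raise ValueError("not enough downstream sequence to fill the window")
    return ref_seq, alt_seq


# Example: a G>C substitution at 0-based position 10 in a toy 20-bp sequence.
ref_seq, alt_seq = make_pair("ACGTACGTACGTACGTACGT", 10, "G", "C", 8)
```

Each pair can then be scored by running both sequences through the same predictive model and comparing the two outputs.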
Affiliation(s)
- Ketrin Gjoni
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, USA
- Katherine S Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, San Francisco, CA 94158, USA

29
Klie A, Laub D, Talwar JV, Stites H, Jores T, Solvason JJ, Farley EK, Carter H. Predictive analyses of regulatory sequences with EUGENe. Nat Comput Sci 2023; 3:946-956. [PMID: 38177592 PMCID: PMC10768637 DOI: 10.1038/s43588-023-00544-w] [Citation(s) in RCA: 0] [Received: 01/12/2023] [Accepted: 09/27/2023] [Indexed: 01/06/2024]
Abstract
Deep learning has become a popular tool to study cis-regulatory function. Yet efforts to design software for deep-learning analyses in regulatory genomics that are findable, accessible, interoperable and reusable (FAIR) have fallen short of fully meeting these criteria. Here we present elucidating the utility of genomic elements with neural nets (EUGENe), a FAIR toolkit for the analysis of genomic sequences with deep learning. EUGENe consists of a set of modules and subpackages for executing the key functionality of a genomics deep learning workflow: (1) extracting, transforming and loading sequence data from many common file formats; (2) instantiating, initializing and training diverse model architectures; and (3) evaluating and interpreting model behavior. We designed EUGENe as a simple, flexible and extensible interface for streamlining and customizing end-to-end deep-learning sequence analyses, and illustrate these principles through application of the toolkit to three predictive modeling tasks. We hope that EUGENe represents a springboard towards a collaborative ecosystem for deep-learning applications in genomics research.
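Most genomics deep-learning workflows of this kind one-hot encode DNA before it reaches a model, as part of the data-transformation step. The sketch below is a generic standard-library illustration, not EUGENe's API. The A/C/G/T column order and the all-zero rows for ambiguous bases such as N are assumptions, since conventions vary between tools. Reverse complementation is included because it is a common augmentation for strand-symmetric models.

```python
# Column order of the one-hot matrix (an assumed convention; tools differ).
BASES = "ACGT"
_INDEX = {base: i for i, base in enumerate(BASES)}
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def one_hot(seq):
    """Encode a DNA string as a list of length-4 rows; non-ACGT characters become all-zero rows."""
    rows = []
    for ch in seq.upper():
        row = [0, 0, 0, 0]
        idx = _INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    return rows


def reverse_complement(seq):
    """Return the reverse complement, preserving case; unmapped characters are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]
```

In practice the nested lists would be converted to an array of shape (length, 4), or transposed to (4, length) for convolutional layers that expect channels first.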
Affiliation(s)
- Adam Klie
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- David Laub
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- James V Talwar
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Tobias Jores
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Joe J Solvason
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Emma K Farley
- Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Molecular Biology, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
- Department of Medicine, University of California San Diego, La Jolla, CA, USA.
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA.

30
Badia-I-Mompel P, Wessels L, Müller-Dott S, Trimbour R, Ramirez Flores RO, Argelaguet R, Saez-Rodriguez J. Gene regulatory network inference in the era of single-cell multi-omics. Nat Rev Genet 2023; 24:739-754. [PMID: 37365273 DOI: 10.1038/s41576-023-00618-5] [Citation(s) in RCA: 74] [Accepted: 05/12/2023] [Indexed: 06/28/2023]
Abstract
The interplay between chromatin, transcription factors and genes generates complex regulatory circuits that can be represented as gene regulatory networks (GRNs). The study of GRNs is useful to understand how cellular identity is established, maintained and disrupted in disease. GRNs can be inferred from experimental data - historically, bulk omics data - and/or from the literature. The advent of single-cell multi-omics technologies has led to the development of novel computational methods that leverage genomic, transcriptomic and chromatin accessibility information to infer GRNs at an unprecedented resolution. Here, we review the key principles of inferring GRNs that encompass transcription factor-gene interactions from transcriptomics and chromatin accessibility data. We focus on the comparison and classification of methods that use single-cell multimodal data. We highlight challenges in GRN inference, in particular with respect to benchmarking, and potential further developments using additional data modalities.
Affiliation(s)
- Pau Badia-I-Mompel
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Lorna Wessels
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Department of Vascular Biology and Tumor Angiogenesis, European Center for Angioscience, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
- Sophia Müller-Dott
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Rémi Trimbour
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Institut Pasteur, Université Paris Cité, CNRS UMR 3738, Machine Learning for Integrative Genomics Group, Paris, France
- Ricardo O Ramirez Flores
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany
- Julio Saez-Rodriguez
- Heidelberg University, Faculty of Medicine, Heidelberg University Hospital, Institute for Computational Biomedicine, Bioquant, Heidelberg, Germany.

31
Brand CM, Kuang S, Gilbertson EN, McArthur E, Pollard KS, Webster TH, Capra JA. Sequence-based machine learning reveals 3D genome differences between bonobos and chimpanzees. bioRxiv 2023:2023.10.26.564272. [PMID: 37961120 PMCID: PMC10634871 DOI: 10.1101/2023.10.26.564272] [Citation(s) in RCA: 1] [Indexed: 11/15/2023]
Abstract
Phenotypic divergence between closely related species, including bonobos and chimpanzees (genus Pan), is largely driven by variation in gene regulation. The 3D structure of the genome mediates gene expression; however, genome folding differences in Pan are not well understood. Here, we apply machine learning to predict genome-wide 3D genome contact maps from DNA sequence for 56 bonobos and chimpanzees, encompassing all five extant lineages. We use a pairwise approach to estimate 3D divergence between individuals from the resulting contact maps in 4,420 genomic windows of 1 Mb each. While most pairs were similar, ∼17% were predicted to be substantially divergent in genome folding. The most dissimilar maps were largely driven by single individuals with rare variants that produce unique 3D genome folding in a region. We also identified 89 genomic windows where bonobo and chimpanzee contact maps substantially diverged, including several windows harboring genes associated with traits implicated in Pan phenotypic divergence. We used in silico mutagenesis to identify 51 3D-modifying variants in these bonobo-chimpanzee divergent windows, finding that 34 (66.67%) induce genome folding changes via CTCF binding motif disruption. Our results reveal 3D genome variation at the population level and identify genomic regions where changes in 3D folding may contribute to phenotypic differences in our closest living relatives.
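A small sketch can illustrate the pairwise comparison step. It assumes that each individual has a predicted square contact map for the same genomic window, and it scores divergence as the mean squared error over the upper triangle. Several details are assumptions for illustration, not necessarily what Brand et al. used: the metric, the `offset` that skips the main diagonal and its nearest neighbors, and the function names.

```python
from itertools import combinations


def upper_triangle(contact_map, offset=2):
    """Flatten entries (i, j) with j >= i + offset, skipping the main diagonal and near-diagonal bins."""
    n = len(contact_map)
    return [contact_map[i][j] for i in range(n) for j in range(i + offset, n)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pairwise_divergence(maps, offset=2):
    """Score every unordered pair of individuals for one genomic window.

    maps: dict mapping individual ID -> square contact map (list of lists) for that window.
    Returns a dict keyed by (id_a, id_b) with id_a < id_b.
    """
    vectors = {name: upper_triangle(m, offset) for name, m in maps.items()}
    return {
        (a, b): mse(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }
```

Repeating this over every window gives a genome-wide table of pairwise scores. How a pair is then called "substantially divergent" is a separate thresholding choice that is not shown here.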
Affiliation(s)
- Colin M. Brand
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
- Shuzhen Kuang
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Erin N. Gilbertson
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Biomedical Informatics Graduate Program, University of California San Francisco, San Francisco, CA
- Evonne McArthur
- Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN
- Katherine S. Pollard
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Biomedical Informatics Graduate Program, University of California San Francisco, San Francisco, CA
- Chan Zuckerberg Biohub, San Francisco, CA
- John A. Capra
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Department of Epidemiology and Biostatistics, University of California, San Francisco, CA
- Biomedical Informatics Graduate Program, University of California San Francisco, San Francisco, CA

32
Gunsalus LM, Keiser MJ, Pollard KS. In silico discovery of repetitive elements as key sequence determinants of 3D genome folding. Cell Genom 2023; 3:100410. [PMID: 37868032 PMCID: PMC10589630 DOI: 10.1016/j.xgen.2023.100410] [Citation(s) in RCA: 4] [Received: 10/12/2022] [Revised: 11/08/2022] [Accepted: 08/31/2023] [Indexed: 10/24/2023]
Abstract
Natural and experimental genetic variants can modify DNA loops and insulating boundaries to tune transcription, but it is unknown how sequence perturbations affect chromatin organization genome wide. We developed a deep-learning strategy to quantify the effect of any insertion, deletion, or substitution on chromatin contacts and systematically scored millions of synthetic variants. While most genetic manipulations have little impact, regions with CTCF motifs and active transcription are highly sensitive, as expected. Our unbiased screen and subsequent targeted experiments also point to noncoding RNA genes and several families of repetitive elements as CTCF-motif-free DNA sequences with particularly large effects on nearby chromatin interactions, sometimes exceeding the effects of CTCF sites and explaining interactions that lack CTCF. We anticipate that our disruption tracks may be of broad interest and utility as a measure of 3D genome sensitivity, and our computational strategies may serve as a template for biological inquiry with deep learning.
Collapse
Affiliation(s)
- Laura M. Gunsalus
- Gladstone Institutes, San Francisco, CA, USA
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Michael J. Keiser
- Institute for Neurodegenerative Diseases, University of California, San Francisco, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, USA
- Kavli Institute for Fundamental Neuroscience, University of California, San Francisco, San Francisco, CA, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, CA, USA
- Katherine S. Pollard
- Gladstone Institutes, San Francisco, CA, USA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, CA, USA
- Chan Zuckerberg Biohub, San Francisco, CA, USA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA
33
Baur B, Roy S. Predicting patient-specific enhancer-promoter interactions. Cell Rep Methods 2023; 3:100594. [PMID: 37751694 PMCID: PMC10545932 DOI: 10.1016/j.crmeth.2023.100594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Revised: 08/30/2023] [Accepted: 08/30/2023] [Indexed: 09/28/2023]
Abstract
Computational methods that can predict hard-to-measure modalities from those that are easier to measure, in a patient-specific manner, play a critical role in personalized medicine. In this issue of Cell Reports Methods, Khurana et al. present differential gene targets of accessible chromatin (DGTAC), an approach which predicts patient-specific enhancer-promoter interactions.
Affiliation(s)
- Brittany Baur
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI 53715, USA; The Max Harry Weil Institute of Critical Care Research & Innovation, University of Michigan, Ann Arbor, MI, USA; Department of Emergency Medicine, University of Michigan Medical School, Ann Arbor, MI, USA
- Sushmita Roy
- Wisconsin Institute for Discovery, 330 N. Orchard Street, Madison, WI 53715, USA; Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53715, USA
34
Tan W, Shen Y. Multimodal learning of noncoding variant effects using genome sequence and chromatin structure. Bioinformatics 2023; 39:btad541. [PMID: 37669132 PMCID: PMC10502240 DOI: 10.1093/bioinformatics/btad541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Revised: 08/28/2023] [Accepted: 09/04/2023] [Indexed: 09/07/2023]
Abstract
MOTIVATION A growing number of noncoding genetic variants, including single-nucleotide polymorphisms, are found to be associated with complex human traits and diseases. Their mechanistic interpretation is relatively limited and could benefit from computational prediction of their effects on epigenetic profiles. However, current models often focus on local, 1D genome sequence determinants and disregard global, 3D chromatin structure that critically affects epigenetic events. RESULTS We find that noncoding variants with unexpectedly high similarity in epigenetic profiles, given their relatively low similarity in local sequences, can be largely attributed to their proximity in chromatin structure. Accordingly, we have developed a multimodal deep learning scheme that incorporates both 1D genome sequence and 3D chromatin structure data for predicting noncoding variant effects. Specifically, we have integrated convolutional and recurrent neural networks for sequence embedding and graph neural networks for structure embedding despite the resolution gap between the two types of data, while utilizing recent DNA language models. Numerical results show that our models outperform competing sequence-only models in predicting epigenetic profiles, and their use of long-range interactions complements sequence-only models in extracting regulatory motifs. They prove to be excellent predictors of noncoding variant effects on gene expression and pathogenicity, whether in unsupervised "zero-shot" learning or supervised "few-shot" learning. AVAILABILITY AND IMPLEMENTATION Code and data can be accessed at https://github.com/Shen-Lab/ncVarPred-1D3D and https://zenodo.org/record/7975777.
Affiliation(s)
- Wuwei Tan
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Yang Shen
- Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, United States
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, United States
- Institute of Biosciences and Technology and Department of Translational Medical Sciences, College of Medicine, Texas A&M University, Houston, TX 77030, United States
35
Gao VR, Yang R, Das A, Luo R, Luo H, McNally DR, Karagiannidis I, Rivas MA, Wang ZM, Barisic D, Karbalayghareh A, Wong W, Zhan YA, Chin CR, Noble W, Bilmes JA, Apostolou E, Kharas MG, Béguelin W, Viny AD, Huangfu D, Rudensky AY, Melnick AM, Leslie CS. ChromaFold predicts the 3D contact map from single-cell chromatin accessibility. bioRxiv 2023:2023.07.27.550836. [PMID: 37546906 PMCID: PMC10402156 DOI: 10.1101/2023.07.27.550836] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
The identification of cell-type-specific 3D chromatin interactions between regulatory elements can help to decipher gene regulation and to interpret the function of disease-associated non-coding variants. However, current chromosome conformation capture (3C) technologies are unable to resolve interactions at this resolution when only small numbers of cells are available as input. We therefore present ChromaFold, a deep learning model that predicts 3D contact maps and regulatory interactions from single-cell ATAC sequencing (scATAC-seq) data alone. ChromaFold uses pseudobulk chromatin accessibility, co-accessibility profiles across metacells, and predicted CTCF motif tracks as input features and employs a lightweight architecture to enable training on standard GPUs. Once trained on paired scATAC-seq and Hi-C data in human cell lines and tissues, ChromaFold can accurately predict both the 3D contact map and peak-level interactions across diverse human and mouse test cell types. In benchmarking against a recent deep learning method that uses bulk ATAC-seq, DNA sequence, and CTCF ChIP-seq to make cell-type-specific predictions, ChromaFold yields superior prediction performance when including CTCF ChIP-seq data as an input and comparable performance without. Finally, fine-tuning ChromaFold on paired scATAC-seq and Hi-C in a complex tissue enables deconvolution of chromatin interactions across cell subpopulations. ChromaFold thus achieves state-of-the-art prediction of 3D contact maps and regulatory interactions using scATAC-seq alone as input data, enabling accurate inference of cell-type-specific interactions in settings where 3C-based assays are infeasible.
Affiliation(s)
- Vianne R. Gao
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Rui Yang
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Arnav Das
- University of Washington, Seattle, WA, USA
- Renhe Luo
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Hanzhi Luo
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Dylan R. McNally
- Caryl and Israel Englander Institute for Precision Medicine, Institute for Computational Biomedicine, Weill Cornell Medicine, Cornell University, New York, NY, USA
- Ioannis Karagiannidis
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Martin A. Rivas
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Zhong-Min Wang
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Darko Barisic
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Alireza Karbalayghareh
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wilfred Wong
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA
- Yingqian A. Zhan
- Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Christopher R. Chin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Effie Apostolou
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
- Michael G. Kharas
- Molecular Pharmacology Program, Experimental Therapeutics Center and Center for Stem Cell Biology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Wendy Béguelin
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Aaron D. Viny
- Departments of Medicine, Division of Hematology & Oncology, and of Genetics & Development, Columbia Stem Cell Initiative, Herbert Irving Comprehensive Cancer Center, Columbia University Irving Medical Center, New York, NY, USA
- Danwei Huangfu
- Developmental Biology Program, Sloan Kettering Institute, New York, NY, USA
- Alexander Y. Rudensky
- Howard Hughes Medical Institute and Immunology Program, Sloan Kettering Institute and Ludwig Center at Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Ari M. Melnick
- Division of Hematology and Medical Oncology, Department of Medicine, Weill Cornell Medical College, New York, NY, USA
- Christina S. Leslie
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
36
Murphy D, Salataj E, Di Giammartino DC, Rodriguez-Hernaez J, Kloetgen A, Garg V, Char E, Uyehara CM, Ee LS, Lee U, Stadtfeld M, Hadjantonakis AK, Tsirigos A, Polyzos A, Apostolou E. Systematic mapping and modeling of 3D enhancer-promoter interactions in early mouse embryonic lineages reveal regulatory principles that determine the levels and cell-type specificity of gene expression. bioRxiv 2023:2023.07.19.549714. [PMID: 37577543 PMCID: PMC10422694 DOI: 10.1101/2023.07.19.549714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Mammalian embryogenesis commences with two pivotal and binary cell fate decisions that give rise to three essential lineages, the trophectoderm (TE), the epiblast (EPI) and the primitive endoderm (PrE). Although key signaling pathways and transcription factors that control these early embryonic decisions have been identified, the non-coding regulatory elements via which transcriptional regulators enact these fates remain understudied. To address this gap, we have characterized, at a genome-wide scale, enhancer activity and 3D connectivity in embryo-derived stem cell lines that represent each of the early developmental fates. We observed extensive enhancer remodeling and fine-scale 3D chromatin rewiring among the three lineages, which strongly associate with transcriptional changes, although there are distinct groups of genes that are unresponsive to topological changes. In each lineage, a high degree of connectivity or "hubness" positively correlates with levels of gene expression and enriches for cell-type specific and essential genes. Genes within 3D hubs also show a significantly stronger probability of coregulation across lineages, compared to genes in linear proximity or within the same contact domains. By incorporating 3D chromatin features, we build a novel predictive model for transcriptional regulation (3D-HiChAT), which outperformed models that use only 1D promoter or proximal variables in predicting levels and cell-type specificity of gene expression. Using 3D-HiChAT, we performed genome-wide in silico perturbations to nominate candidate functional enhancers and hubs in each cell lineage, and with CRISPRi experiments we validated several novel enhancers that control expression of one or more genes in their respective lineages.
Our study comprehensively identifies 3D regulatory hubs associated with the earliest mammalian lineages and describes their relationship to gene expression and cell identity, providing a framework to understand lineage-specific transcriptional behaviors.
Affiliation(s)
- Dylan Murphy
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Eralda Salataj
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Dafne Campigli Di Giammartino
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- 3D Chromatin Conformation and RNA genomics laboratory, Instituto Italiano di Tecnologia (IIT), Center for Human Technologies (CHT), Genova, Italy (current affiliation)
- Javier Rodriguez-Hernaez
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
- Andreas Kloetgen
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
- Vidur Garg
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
- Erin Char
- Tri-Institutional Training Program in Computational Biology and Medicine, Weill Cornell Medical College, New York, NY 10065, USA
- Christopher M. Uyehara
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Ly-sha Ee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- UkJin Lee
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Matthias Stadtfeld
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Anna-Katerina Hadjantonakis
- Developmental Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Biochemistry Cell and Molecular Biology Program, Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
- Aristotelis Tsirigos
- Department of Pathology, New York University Langone Health, New York, NY 10016, USA
- Applied Bioinformatics Laboratory, New York University Langone Health, New York, NY 10016, USA
- Alexander Polyzos
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
- Effie Apostolou
- Sanford I. Weill Department of Medicine, Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York, NY, United States
37
Joo J, Cho S, Hong S, Min S, Kim K, Kumar R, Choi JM, Shin Y, Jung I. Probabilistic establishment of speckle-associated inter-chromosomal interactions. Nucleic Acids Res 2023; 51:5377-5395. [PMID: 37013988 PMCID: PMC10287923 DOI: 10.1093/nar/gkad211] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/19/2022] [Revised: 03/08/2023] [Accepted: 03/25/2023] [Indexed: 04/05/2023]
Abstract
Inter-chromosomal interactions play a crucial role in genome organization, yet the organizational principles remain elusive. Here, we introduce a novel computational method to systematically characterize inter-chromosomal interactions using in situ Hi-C results from various cell types. Our method successfully identifies two apparently hub-like inter-chromosomal contacts associated with nuclear speckles and nucleoli, respectively. Interestingly, we discover that nuclear speckle-associated inter-chromosomal interactions are highly cell-type invariant with a marked enrichment of cell-type common super-enhancers (CSEs). Validation using DNA Oligopaint fluorescence in situ hybridization (FISH) shows a strong but probabilistic interaction behavior between nuclear speckles and CSE-harboring genomic regions. Strikingly, we find that the likelihood of speckle-CSE associations can accurately predict two experimentally measured inter-chromosomal contacts from Hi-C and Oligopaint DNA FISH. Our probabilistic establishment model describes well the hub-like structure observed at the population level as a cumulative effect of summing individual stochastic chromatin-speckle interactions. Lastly, we observe that CSEs are highly co-occupied by MAZ, and MAZ depletion leads to significant disorganization of speckle-associated inter-chromosomal contacts. Taken together, our results suggest a simple organizational principle of inter-chromosomal interactions mediated by MAZ-occupied CSEs.
Affiliation(s)
- Jaegeon Joo
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Sunghyun Cho
- Department of Mechanical Engineering, Seoul National University, Seoul 08826, Republic of Korea
- Sukbum Hong
- Department of Mechanical Engineering, Seoul National University, Seoul 08826, Republic of Korea
- Sunwoo Min
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Kyukwang Kim
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
- Rajeev Kumar
- Department of Chemistry and Chemistry Institute for Functional Materials, Pusan National University, Busan 46241, Republic of Korea
- Jeong-Mo Choi
- Department of Chemistry and Chemistry Institute for Functional Materials, Pusan National University, Busan 46241, Republic of Korea
- Yongdae Shin
- Department of Mechanical Engineering, Seoul National University, Seoul 08826, Republic of Korea
- Interdisciplinary Program in Bioengineering, Seoul National University, Seoul 08826, Republic of Korea
- Inkyung Jung
- Department of Biological Sciences, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea
38
Kalluchi A, Harris HL, Reznicek TE, Rowley MJ. Considerations and caveats for analyzing chromatin compartments. Front Mol Biosci 2023; 10:1168562. [PMID: 37091873 PMCID: PMC10113542 DOI: 10.3389/fmolb.2023.1168562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 03/27/2023] [Indexed: 04/08/2023]
Abstract
Genomes are organized into nuclear compartments, separating active from inactive chromatin. Chromatin compartments are readily visible in a large number of species by experiments that map chromatin conformation genome-wide. When analyzing these maps, a common step is the identification of genomic intervals that interact within A (active) and B (inactive) compartments. It has also become increasingly common to identify and analyze subcompartments. We review different strategies to identify A/B and subcompartment intervals, including a discussion of various machine-learning approaches to predict these features. We then discuss the strengths and limitations of current strategies and examine how these aspects of analysis may have impacted our understanding of chromatin compartments.
Affiliation(s)
- M. Jordan Rowley
- Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, United States
39
Gunsalus LM, McArthur E, Gjoni K, Kuang S, Pittman M, Capra JA, Pollard KS. Comparing chromatin contact maps at scale: methods and insights. bioRxiv 2023:2023.04.04.535480. [PMID: 37066196 PMCID: PMC10104037 DOI: 10.1101/2023.04.04.535480] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 04/18/2023]
Abstract
Comparing chromatin contact maps is an essential step in quantifying how three-dimensional (3D) genome organization shapes development, evolution, and disease. However, no gold standard exists for comparing contact maps, and even simple methods often disagree. In this study, we propose novel comparison methods and evaluate them alongside existing approaches using genome-wide Hi-C data and 22,500 in silico predicted contact maps. We also quantify the robustness of methods to common sources of biological and technical variation, such as boundary size and noise. We find that simple difference-based methods such as mean squared error are suitable for initial screening, but biologically informed methods are necessary to identify why maps diverge and propose specific functional hypotheses. We provide a reference guide, codebase, and benchmark for rapidly comparing chromatin contact maps at scale to enable biological insights into the 3D organization of the genome.
Affiliation(s)
- Laura M. Gunsalus
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Evonne McArthur
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN
- Ketrin Gjoni
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Shuzhen Kuang
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Maureen Pittman
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- John A. Capra
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Katherine S. Pollard
- Gladstone Institute of Data Science and Biotechnology, San Francisco, CA
- Department of Epidemiology & Biostatistics, University of California, San Francisco, CA
- Bakar Computational Health Sciences Institute, University of California, San Francisco, CA
- Chan Zuckerberg Biohub, San Francisco, CA, USA