1
Zheng S, Thakkar N, Harris HL, Liu S, Zhang M, Gerstein M, Aiden EL, Rowley MJ, Noble WS, Gürsoy G, Singh R. Predicting A/B compartments from histone modifications using deep learning. iScience 2024; 27:109570. [PMID: 38646172 PMCID: PMC11031843 DOI: 10.1016/j.isci.2024.109570] [Received: 02/04/2024] [Revised: 02/28/2024] [Accepted: 03/22/2024] [Indexed: 04/23/2024]
Abstract
The three-dimensional organization of genomes plays a crucial role in essential biological processes. The segregation of chromatin into A and B compartments highlights regions of activity and inactivity, providing a window into the genomic activities specific to each cell type. Yet the steep cost of acquiring Hi-C data, which is necessary for studying this compartmentalization across cell types, poses a significant barrier to studying cell-type-specific genome organization. To address this, we present a prediction tool called compartment prediction using recurrent neural networks (CoRNN), which predicts compartmentalization of the 3D genome from histone modification enrichment. CoRNN demonstrates robust cross-cell-type prediction of A/B compartments with an average AuROC of 90.9%. Cell-type-specific predictions align well with known functional elements, with H3K27ac and H3K36me3 identified as highly predictive histone marks. We further investigated our mispredictions and found that they are located in regions with ambiguous compartmental status. Furthermore, our model's generalizability is validated by predicting compartments in independent tissue samples, which underscores its broad applicability.
Affiliation(s)
- Suchen Zheng
  - Department of Computer Science, Brown University, Providence, RI, USA
- Nitya Thakkar
  - Department of Computer Science, Brown University, Providence, RI, USA
- Hannah L. Harris
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, USA
- Susanna Liu
  - Data Science and Statistics, Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, USA
- Megan Zhang
  - Data Science and Statistics, Molecular, Cellular, and Developmental Biology, Yale University, New Haven, CT, USA
- Mark Gerstein
  - Computational Biology and Bioinformatics, Molecular Biophysics & Biochemistry, Data Science and Statistics, Computer Science, Yale University, New Haven, CT, USA
- Erez Lieberman Aiden
  - Department of Genetics, Baylor College of Medicine, Department of Computer Science, Computational and Applied Mathematics, Rice University, Houston, TX, USA
- M. Jordan Rowley
  - Department of Genetics, Cell Biology and Anatomy, University of Nebraska Medical Center, Omaha, NE, USA
- William Stafford Noble
  - Department of Genome Sciences, Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
- Gamze Gürsoy
  - Department of Biomedical Informatics, Columbia University, New York Genome Center, New York, NY, USA
- Ritambhara Singh
  - Department of Computer Science, Center for Computational Molecular Biology, Brown University, Providence, RI, USA
2
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024; arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Indexed: 03/19/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, topologically associating domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even in single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers, transcription factor binding site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, USA
3
Li Y, Ju F, Chen Z, Qu Y, Xia H, He L, Wu L, Zhu J, Shao B, Deng P. CREaTor: zero-shot cis-regulatory pattern modeling with attention mechanisms. Genome Biol 2023; 24:266. [PMID: 37996959 PMCID: PMC10666311 DOI: 10.1186/s13059-023-03103-8] [Received: 05/15/2023] [Accepted: 11/03/2023] [Indexed: 11/25/2023]
Abstract
Linking cis-regulatory sequences to target genes has been a long-standing challenge. In this study, we introduce CREaTor, an attention-based deep neural network designed to model cis-regulatory patterns for genomic elements up to 2 Mb from target genes. Coupled with a training strategy that predicts gene expression from flanking candidate cis-regulatory elements (cCREs), CREaTor can model cell type-specific cis-regulatory patterns in new cell types without prior knowledge of cCRE-gene interactions or additional training. The zero-shot modeling capability, combined with the use of only RNA-seq and ChIP-seq data, allows for the ready generalization of CREaTor to a broad range of cell types.
Affiliation(s)
- Yongge Li
  - Microsoft Research AI4Science, Beijing, China
  - School of Medicine, Tsinghua University, Beijing, China
- Fusong Ju
  - Microsoft Research AI4Science, Beijing, China
- Zhiyuan Chen
  - Microsoft Research AI4Science, Beijing, China
  - School of Computing, Australian National University, Canberra, Australia
- Yiming Qu
  - Microsoft Research AI4Science, Beijing, China
  - School of Life Sciences, Tsinghua University, Beijing, China
- Liang He
  - Microsoft Research AI4Science, Beijing, China
- Lijun Wu
  - Microsoft Research AI4Science, Beijing, China
- Jianwei Zhu
  - Microsoft Research AI4Science, Beijing, China
- Bin Shao
  - Microsoft Research AI4Science, Beijing, China
- Pan Deng
  - Microsoft Research AI4Science, Beijing, China
4
Xu J, Zhang P, Sun W, Zhang J, Zhang W, Hou C, Li L. EpiMCI: Predicting Multi-Way Chromatin Interactions from Epigenomic Signals. Biology 2023; 12:1203. [PMID: 37759602 PMCID: PMC10525350 DOI: 10.3390/biology12091203] [Received: 07/27/2023] [Revised: 08/31/2023] [Accepted: 08/31/2023] [Indexed: 09/29/2023]
Abstract
The recently emerging high-throughput Pore-C (HiPore-C) can identify whole-genome high-order chromatin multi-way interactions with an ultra-high output, contributing to deciphering three-dimensional (3D) genome organization. However, it also brings new challenges to the relevant data analysis. To address these challenges, we propose EpiMCI, a model for multi-way chromatin interaction prediction based on a hypergraph neural network with epigenomic signals as the input. EpiMCI integrates separate hyperedge representations with coupling hyperedge information and obtains AUCs of 0.981 and 0.984 on the GM12878 and K562 datasets, respectively, outperforming the currently available method. Moreover, EpiMCI can be applied to denoise HiPore-C data and improve data quality efficiently. Furthermore, the vertex embeddings extracted from EpiMCI accurately reflect the global chromatin architecture: principal component analysis suggests that they are well aligned with the activities of genomic regions at the chromatin compartment level. Taken together, EpiMCI can accurately predict multi-way chromatin interactions and can be applied to studies relying on chromatin architecture.
Affiliation(s)
- Jinsheng Xu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Ping Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Weicheng Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Junying Zhang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wenxue Zhang
  - Food Science Program, Division of Food, Nutrition and Exercise Sciences, University of Missouri, 1406 E Rollins Street, Columbia, MO 65211, USA
- Chunhui Hou
  - China State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Li Li
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
  - Hubei Hongshan Laboratory, Huazhong Agricultural University, Wuhan 430074, China
5
Jiang M, Zhang R, Xia Y, Jia G, Yin Y, Wang P, Wu J, Ge R. i2APP: A Two-Step Machine Learning Framework For Antiparasitic Peptides Identification. Front Genet 2022; 13:884589. [PMID: 35571057 PMCID: PMC9091563 DOI: 10.3389/fgene.2022.884589] [Received: 02/26/2022] [Accepted: 04/11/2022] [Indexed: 11/18/2022]
Abstract
Parasites can cause enormous damage to their hosts. Studies have shown that antiparasitic peptides (APPs) can inhibit the growth and development of parasites and even kill them. Because traditional biological methods to determine the activity of antiparasitic peptides are time-consuming and costly, a method for large-scale prediction of antiparasitic peptides is urgently needed. We propose a computational approach called i2APP that can efficiently identify APPs using a two-step machine learning (ML) framework. First, to address the imbalance of positive and negative samples in the training set, random undersampling is used to generate a balanced training data set. Then, physicochemical features and terminus-based features are extracted, and the first classification is performed by Light Gradient Boosting Machine (LGBM) and Support Vector Machine (SVM) to obtain 264-dimensional higher-level features. These features are ranked by Maximal Information Coefficient (MIC), and the features with large MIC values are retained. Finally, the SVM algorithm is used for the second classification in the optimized feature space. Thus the prediction model i2APP is fully constructed. On independent datasets, the accuracy and AUC of i2APP are 0.913 and 0.935, respectively, which are better than those of state-of-the-art methods. The key idea of the proposed method is that multi-level features are extracted from peptide sequences and the higher-level features can distinguish APPs from non-APPs well.
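The class-balancing step mentioned in this abstract can be illustrated with a minimal sketch of random undersampling. This is not the authors' implementation: the function name `random_undersample`, the toy peptide strings, and the seed are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class samples."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    # Every class is reduced to the size of the smallest class.
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        for sample in rng.sample(group, target):
            balanced.append((sample, label))
    rng.shuffle(balanced)
    return [s for s, _ in balanced], [l for _, l in balanced]


# Hypothetical toy data: 2 positive peptides and 6 negative peptides.
peptides = ["AKLV", "GRRW", "LLSA", "PPGE", "TTYK", "MNQD", "CCHH", "VVIA"]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = random_undersample(peptides, labels, seed=42)
```

After the call, `X_bal` holds both positives plus two randomly chosen negatives, so a downstream classifier sees equal class counts. Which negatives are kept depends on the seed.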
Affiliation(s)
- Minchao Jiang
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Renfeng Zhang
  - Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, China
- Yixiao Xia
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Gangyong Jia
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Yuyu Yin
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China
- Pu Wang
  - Computer School, Hubei University of Arts and Science, Xiangyang, China
- Jian Wu
  - MyGenostics Inc., Beijing, China
- Ruiquan Ge
  - School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou, China

*Correspondence: Pu Wang; Jian Wu; Ruiquan Ge