1
Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024; arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]
Abstract
Three-dimensional (3D) chromatin interactions, such as enhancer-promoter interactions (EPIs), loops, Topologically Associating Domains (TADs), and A/B compartments, play critical roles in a wide range of cellular processes by regulating gene expression. Recent development of chromatin conformation capture technologies has enabled genome-wide profiling of various 3D structures, even with single cells. However, current catalogs of 3D structures remain incomplete and unreliable due to differences in technology, tools, and low data resolution. Machine learning methods have emerged as an alternative to obtain missing 3D interactions and/or improve resolution. Such methods frequently use genome annotation data (ChIP-seq, DNase-seq, etc.), DNA sequencing information (k-mers, Transcription Factor Binding Site (TFBS) motifs), and other genomic properties to learn the associations between genomic features and chromatin interactions. In this review, we discuss computational tools for predicting three types of 3D interactions (EPIs, chromatin interactions, TAD boundaries) and analyze their pros and cons. We also point out obstacles to computational prediction of 3D interactions and suggest future research directions.
Affiliation(s)
- Brydon P. G. Wall
  - Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- My Nguyen
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
- J. Chuck Harrell
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
  - Massey Comprehensive Cancer Center, Virginia Commonwealth University, Richmond, VA 23298, USA
  - Center for Pharmaceutical Engineering, Virginia Commonwealth University, Richmond, VA 23298, USA
- Mikhail G. Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, 23298, USA
  - Department of Pathology, Virginia Commonwealth University, Richmond, VA, 23284, USA
2
Belokopytova P, Fishman V. Predicting Genome Architecture: Challenges and Solutions. Front Genet 2021; 11:617202. [PMID: 33552135 PMCID: PMC7862721 DOI: 10.3389/fgene.2020.617202] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/14/2020] [Accepted: 12/15/2020] [Indexed: 12/22/2022] [Open Access]
Abstract
Genome architecture plays a pivotal role in gene regulation. The use of high-throughput methods for chromatin profiling and 3-D interaction mapping provides rich experimental data sets describing genome organization and dynamics. These data challenge the development of new models and algorithms connecting genome architecture with epigenetic marks. In this review, we describe how chromatin architecture could be reconstructed from epigenetic data using biophysical or statistical approaches. We discuss the applicability and limitations of these methods for understanding the mechanisms of chromatin organization. We also highlight the emergence of new predictive approaches for scoring effects of structural variations in human cells.
Affiliation(s)
- Polina Belokopytova
  - Natural Sciences Department, Novosibirsk State University, Novosibirsk, Russia
  - Institute of Cytology and Genetics Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
- Veniamin Fishman
  - Natural Sciences Department, Novosibirsk State University, Novosibirsk, Russia
  - Institute of Cytology and Genetics Siberian Branch of Russian Academy of Sciences (SB RAS), Novosibirsk, Russia
3
Ma A, McDermaid A, Xu J, Chang Y, Ma Q. Integrative Methods and Practical Challenges for Single-Cell Multi-omics. Trends Biotechnol 2020; 38:1007-1022. [PMID: 32818441 PMCID: PMC7442857 DOI: 10.1016/j.tibtech.2020.02.013] [Citation(s) in RCA: 113] [Impact Index Per Article: 28.3] [Received: 12/19/2019] [Revised: 02/27/2020] [Accepted: 02/28/2020] [Indexed: 12/19/2022]
Abstract
Fast-developing single-cell multimodal omics (scMulti-omics) technologies enable the measurement of multiple modalities, such as DNA methylation, chromatin accessibility, RNA expression, protein abundance, gene perturbation, and spatial information, from the same cell. scMulti-omics can comprehensively explore and identify cell characteristics, while also presenting challenges to the development of computational methods and tools for integrative analyses. Here, we review these integrative methods and summarize the existing tools for studying a variety of scMulti-omics data. The various functionalities and practical challenges in using the available tools in the public domain are explored through several case studies. Finally, we identify remaining challenges and future trends in scMulti-omics modeling and analyses.
Affiliation(s)
- Anjun Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
- Adam McDermaid
  - Imagenetics, Sanford Health, Sioux Falls, SD 57104, USA
  - Department of Internal Medicine, University of South Dakota, Vermillion, SD 57069, USA
- Jennifer Xu
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Yuzhou Chang
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
- Qin Ma
  - Department of Biomedical Informatics, The Ohio State University, Columbus, OH 43235, USA
4
Xue H, Wei Z, Chen K, Tang Y, Wu X, Su J, Meng J. Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods. Evol Bioinform Online 2020; 16:1176934320915707. [PMID: 32733123 PMCID: PMC7372605 DOI: 10.1177/1176934320915707] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/27/2019] [Accepted: 02/28/2020] [Indexed: 12/18/2022] [Open Access]
Abstract
RNA N6-methyladenosine (m6A) has emerged as an important epigenetic modification for its role in regulating the stability, structure, processing, and translation of RNA. Instability of m6A homeostasis may result in flaws in stem cell regulation, decrease in fertility, and risk of cancer. To this day, experimental detection and quantification of RNA m6A modification are still time-consuming and labor-intensive. There is only a limited number of epitranscriptome samples in existing databases, and a matched RNA methylation profile is not often available for a biological problem of interest. As gene expression data are usually readily available for most biological problems, it would be appealing to estimate the RNA methylation status from gene expression data using in silico methods. In this study, we explored the possibility of computational prediction of RNA methylation status from gene expression data using classification and regression methods based on mouse RNA methylation data collected from 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) were constructed for classification. Both SVM and RF achieved the best performance with the mean area under the curve (AUC) = 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on those sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were found statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum. All 3 terms were found to be closely related to the m6A pathway. For regression analysis, Elastic Net was implemented, which yielded a mean Pearson correlation coefficient = 0.68 and a mean Spearman correlation coefficient = 0.64. Our exploratory study suggested that gene expression data could be used to construct predictors for m6A methylation status with adequate accuracy. Our work showed for the first time that RNA methylation status may be predicted from the matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially in the very early stage of the study.
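The abstract above reports three evaluation metrics: AUC for the classifiers, and Pearson and Spearman correlation for the Elastic Net regression. As a reading aid, the following minimal Python sketch shows how these metrics can be computed from a set of predictions. The function names and the toy inputs are illustrative assumptions; this is not the authors' code or data.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic (labels are 0/1)."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy example (made-up numbers): two negatives and two positives.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Spearman correlation only depends on ranks, so it equals 1 for any strictly increasing relationship, whereas Pearson correlation equals 1 only for a linear one.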
Affiliation(s)
- Hao Xue
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Harvard T.H. Chan School of Public Health, Harvard University, Boston, MA, USA
- Zhen Wei
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Kunqi Chen
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Yujiao Tang
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Xiangyu Wu
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Jionglong Su
  - Department of Mathematical Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Jia Meng
  - Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Integrative Biology, University of Liverpool, Liverpool, UK
5
Peng LH, Zhou LQ, Chen X, Piao X. A Computational Study of Potential miRNA-Disease Association Inference Based on Ensemble Learning and Kernel Ridge Regression. Front Bioeng Biotechnol 2020; 8:40. [PMID: 32117922 PMCID: PMC7015868 DOI: 10.3389/fbioe.2020.00040] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/20/2019] [Accepted: 01/17/2020] [Indexed: 12/11/2022] [Open Access]
Abstract
As a growing number of experimental studies have shown that microRNAs (miRNAs) are closely related to multiple biological processes and to the prevention, diagnosis and treatment of human diseases, more researchers are focusing on the identification of associations between miRNAs and diseases. Identifying such associations purely via experiments is costly and demanding, which prompts researchers to develop computational methods to complement the experiments. In this paper, a novel prediction model named Ensemble of Kernel Ridge Regression based MiRNA-Disease Association prediction (EKRRMDA) was developed. EKRRMDA obtained features of miRNAs and diseases by integrating the disease semantic similarity, the miRNA functional similarity and the Gaussian interaction profile kernel similarity for diseases and miRNAs. Under a computational framework that utilized ensemble learning and feature dimensionality reduction, multiple base classifiers, each combining two Kernel Ridge Regression classifiers from the miRNA side and the disease side, respectively, were obtained based on random selection of features. An averaging strategy over these base classifiers was then adopted to obtain the final association scores of miRNA-disease pairs. In the global and local leave-one-out cross validation, EKRRMDA attained AUCs of 0.9314 and 0.8618, respectively. Moreover, the model's average AUC with standard deviation in 5-fold cross validation was 0.9275 ± 0.0008. In addition, we implemented three different types of case studies on predicting miRNAs associated with five important diseases. As a result, 90% (Esophageal Neoplasms), 86% (Kidney Neoplasms), 86% (Lymphoma), 98% (Lung Neoplasms), and 96% (Breast Neoplasms) of the top 50 predicted miRNAs were verified to have associations with these diseases.
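To make the ensemble idea in this abstract concrete, here is a simplified Python sketch: several kernel ridge regression base learners are each fitted on a random subset of features, and their predictions are averaged. It is only an illustration under stated assumptions. It uses a single Gaussian (RBF) kernel on generic feature vectors rather than the separate miRNA-side and disease-side models described above, and all names, hyperparameters (`gamma`, `lam`, `n_base`, `n_feat`) and data are hypothetical.

```python
import math
import random

def rbf_kernel(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_krr(X, y, gamma, lam):
    """Kernel ridge regression: dual coefficients alpha = (K + lam * I)^-1 y."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def predict_krr(X_train, alpha, x_new, gamma):
    """Prediction is the alpha-weighted sum of kernels to the training points."""
    return sum(a * rbf_kernel(xi, x_new, gamma) for a, xi in zip(alpha, X_train))

def ensemble_krr(X, y, X_new, n_base=10, n_feat=2, gamma=1.0, lam=0.1, seed=0):
    """Average the predictions of base learners fitted on random feature subsets."""
    rng = random.Random(seed)
    totals = [0.0] * len(X_new)
    for _ in range(n_base):
        cols = rng.sample(range(len(X[0])), n_feat)  # random feature selection
        Xs = [[row[c] for c in cols] for row in X]
        alpha = fit_krr(Xs, y, gamma, lam)
        for k, row in enumerate(X_new):
            totals[k] += predict_krr(Xs, alpha, [row[c] for c in cols], gamma)
    return [t / n_base for t in totals]

# Toy data (made up): three unassociated pairs near the origin, three associated near (1, 1, 1).
X = [[0, 0, 0], [0.1, 0, 0.1], [0, 0.1, 0], [1, 1, 1], [0.9, 1, 1], [1, 0.9, 1.0]]
y = [0, 0, 0, 1, 1, 1]
scores = ensemble_krr(X, y, [[0.05, 0.05, 0.05], [0.95, 0.95, 0.95]])
```

The ridge term `lam` keeps the linear system well conditioned, and averaging over feature subsets reduces the variance of any single base learner.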
Affiliation(s)
- Li-Hong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Li-Qian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xue Piao
  - School of Medical Informatics, Xuzhou Medical University, Xuzhou, China
6
Belokopytova PS, Nuriddinov MA, Mozheiko EA, Fishman D, Fishman V. Quantitative prediction of enhancer-promoter interactions. Genome Res 2019; 30:72-84. [PMID: 31804952 PMCID: PMC6961579 DOI: 10.1101/gr.249367.119] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 02/13/2019] [Accepted: 11/25/2019] [Indexed: 11/24/2022]
Abstract
Recent experimental and computational efforts have provided large data sets describing three-dimensional organization of mouse and human genomes and showed the interconnection between the expression profile, epigenetic state, and spatial interactions of loci. These interconnections were utilized to infer the spatial organization of chromatin, including enhancer–promoter contacts, from one-dimensional epigenetic marks. Here, we show that the predictive power of some of these algorithms is overestimated due to peculiar properties of the biological data. We propose an alternative approach, which provides high-quality predictions of chromatin interactions using information on gene expression and CTCF-binding alone. Using multiple metrics, we confirmed that our algorithm could efficiently predict the three-dimensional architecture of both normal and rearranged genomes.
Affiliation(s)
- Polina S Belokopytova
  - Institute of Cytology and Genetics SB RAS 630090, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia 630090
- Daniil Fishman
  - Novosibirsk State University, Novosibirsk, Russia 630090
- Veniamin Fishman
  - Institute of Cytology and Genetics SB RAS 630090, Novosibirsk, Russia
  - Novosibirsk State University, Novosibirsk, Russia 630090
7
Song Y, Xu Q, Wei Z, Zhen D, Su J, Chen K, Meng J. Predict Epitranscriptome Targets and Regulatory Functions of N6-Methyladenosine (m6A) Writers and Erasers. Evol Bioinform Online 2019; 15:1176934319871290. [PMID: 31523126 PMCID: PMC6728658 DOI: 10.1177/1176934319871290] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 07/19/2019] [Accepted: 07/31/2019] [Indexed: 12/13/2022] [Open Access]
Abstract
Currently, although many successful bioinformatics efforts have been reported in the epitranscriptomics field for N6-methyladenosine (m6A) site identification, none is focused on the substrate specificity of different m6A-related enzymes, i.e., the methyltransferases (writers) and demethylases (erasers). In this work, to untangle the target specificity and the regulatory functions of different RNA m6A writers (METTL3-METTL14 and METTL16) and erasers (ALKBH5 and FTO), we extracted 49 genomic features along with the conventional sequence features and used the machine learning approach of random forest to predict their epitranscriptome substrates. Our method achieved reasonable performance on both the writer target prediction (as high as 0.918) and the eraser target prediction (as high as 0.888) in a 5-fold cross-validation, and results of the gene ontology analysis of their preferential targets further revealed the functional relevance of different RNA methylation writers and erasers.
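The performance figures above come from 5-fold cross-validation. As a brief illustration of that protocol, the sketch below splits sample indices into k folds so that each sample is used for testing exactly once. The function name, the shuffling seed and the sample count are assumptions for illustration only; the authors' actual splitting procedure is not described in the abstract.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is a test sample exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # shuffle once, reproducibly
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Usage: fit a model on `train`, evaluate on `test`, then average the k scores.
for train, test in k_fold_indices(23, k=5):
    pass  # model fitting and scoring would go here
```

Leave-one-out cross-validation, used by several other entries in this list, is the special case where k equals the number of samples.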
Affiliation(s)
- Yiyou Song
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Qingru Xu
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Zhen Wei
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Di Zhen
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Jionglong Su
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
  - Research Center for Precision Medicine, Xi’an Jiaotong-Liverpool University, Suzhou, China
- Kunqi Chen
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
- Jia Meng
  - Research Center for Precision Medicine, Xi’an Jiaotong-Liverpool University, Suzhou, China
  - Institute of Integrative Biology, University of Liverpool, Liverpool, UK
8
Ensemble of decision tree reveals potential miRNA-disease associations. PLoS Comput Biol 2019; 15:e1007209. [PMID: 31329575 PMCID: PMC6675125 DOI: 10.1371/journal.pcbi.1007209] [Citation(s) in RCA: 140] [Impact Index Per Article: 28.0] [Received: 11/30/2018] [Revised: 08/01/2019] [Accepted: 06/24/2019] [Indexed: 12/14/2022] [Open Access]
Abstract
In recent years, an increasing number of associations between microRNAs (miRNAs) and human diseases have been identified. Based on accumulating biological data, many computational models for inferring potential miRNA-disease associations have been developed, which save time and expenditure on experimental studies and contribute greatly to research on the molecular mechanisms of human diseases and to the development of new drugs for disease treatment. In this paper, we proposed a novel computational method named Ensemble of Decision Tree based MiRNA-Disease Association prediction (EDTMDA), which innovatively built a computational framework integrating ensemble learning and dimensionality reduction. For each miRNA-disease pair, the feature vector was extracted by calculating the statistical measures, graph theoretical measures, and matrix factorization results for the miRNA and disease, respectively. Then multiple base learners were built to yield many decision trees (DTs) based on random selection of negative samples and miRNA/disease features. In particular, Principal Component Analysis was applied to each base learner to reduce feature dimensionality and hence remove noise and redundancy. An averaging strategy was adopted for these DTs to obtain the final association scores between miRNAs and diseases. In model performance evaluation, EDTMDA showed an AUC of 0.9309 in global leave-one-out cross validation (LOOCV) and an AUC of 0.8524 in local LOOCV. Additionally, an AUC of 0.9192 ± 0.0009 in 5-fold cross validation supported the model's reliability and stability. Furthermore, three types of case studies for four human diseases were implemented. As a result, 94% (Esophageal Neoplasms), 86% (Kidney Neoplasms), 96% (Breast Neoplasms) and 88% (Carcinoma Hepatocellular) of the top 50 predicted miRNAs were confirmed by experimental evidence in the literature.
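The pipeline described in this abstract (random negative sampling, random feature selection, PCA, a decision tree per base learner, then score averaging) can be sketched in a few lines of Python. The version below is deliberately reduced and rests on several assumptions: each base learner projects onto only the first principal component (found by power iteration) and uses a depth-1 decision tree (a stump) instead of a full tree. All function names, parameters and data are hypothetical and are not taken from the EDTMDA implementation.

```python
import random

def first_pc(rows, iters=100):
    """Mean vector and unit first principal component, via power iteration."""
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / d ** 0.5] * d
    for _ in range(iters):
        # w = (C^T C) v, computed without forming the covariance matrix
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(p * c[j] for p, c in zip(proj, centered)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def fit_stump(z, y):
    """Depth-1 decision tree on one feature: (threshold, label predicted above it)."""
    vals = sorted(set(z))
    cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or [vals[0]]
    best = None
    for t in cuts:
        for above in (0, 1):
            errors = sum((above if zi > t else 1 - above) != yi for zi, yi in zip(z, y))
            if best is None or errors < best[0]:
                best = (errors, t, above)
    return best[1], best[2]

def edt_scores(pos, neg_pool, X_new, n_base=15, n_feat=2, seed=0):
    """Average the 0/1 votes of base learners built on random negatives and features."""
    rng = random.Random(seed)
    totals = [0.0] * len(X_new)
    for _ in range(n_base):
        cols = rng.sample(range(len(pos[0])), n_feat)  # random feature subset
        train = pos + rng.sample(neg_pool, len(pos))   # balanced random negatives
        labels = [1] * len(pos) + [0] * len(pos)
        mean, v = first_pc([[r[c] for c in cols] for r in train])

        def project(r):
            return sum((r[c] - m) * vj for c, m, vj in zip(cols, mean, v))

        t, above = fit_stump([project(r) for r in train], labels)
        for k, r in enumerate(X_new):
            totals[k] += above if project(r) > t else 1 - above
    return [s / n_base for s in totals]

# Toy data (made up): known associations near (1, 1, 1), unlabeled pairs near the origin.
pos = [[1, 1, 1], [0.9, 1.1, 1], [1.1, 0.9, 0.95], [1.0, 1.05, 0.9]]
neg_pool = [[0, 0, 0], [0.1, 0, 0.05], [0, 0.1, 0.1],
            [0.05, 0.05, 0], [0.1, 0.1, 0.1], [0, 0.05, 0.1]]
scores = edt_scores(pos, neg_pool, [[0.05, 0.0, 0.1], [1.0, 0.95, 1.05]])
```

Because unlabeled pairs greatly outnumber known associations in this kind of problem, drawing a fresh balanced negative sample for each base learner lets the ensemble see many different negatives without any single learner being swamped by them.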
9
Zhao Y, Chen X, Yin J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics 2019; 35:4730-4738. [DOI: 10.1093/bioinformatics/btz297] [Citation(s) in RCA: 87] [Impact Index Per Article: 17.4] [Received: 09/03/2018] [Revised: 03/19/2019] [Accepted: 04/18/2019] [Indexed: 12/24/2022] [Open Access]
Abstract
Motivation: Recent studies have shown that microRNAs (miRNAs) play a critical part in several biological processes and that dysregulation of miRNAs is related to numerous complex human diseases. Thus, in-depth research of miRNAs and their association with human diseases can help us to solve many problems.
Results: Due to the high cost of traditional experimental methods, revealing disease-related miRNAs through computational models is a more economical and efficient way. Considering the disadvantages of previous models, in this paper, we developed adaptive boosting for miRNA-disease association prediction (ABMDA) to predict potential associations between diseases and miRNAs. We balanced the positive and negative samples by performing random sampling based on k-means clustering on negative samples, a process that was quick and easy, and our model had higher efficiency and scalability for large datasets than previous methods. As a boosting technology, ABMDA was able to improve the accuracy of a given learning algorithm by integrating weak classifiers that could score samples to form a strong classifier based on corresponding weights. Here, we used a decision tree as our weak classifier. As a result, the area under the curve (AUC) of global and local leave-one-out cross validation reached 0.9170 and 0.8220, respectively. Moreover, the mean and the standard deviation of AUCs were 0.9023 and 0.0016, respectively, in 5-fold cross validation. In addition, in the case studies of three important human cancers, 49, 50 and 50 out of the top 50 predicted miRNAs for colon neoplasms, hepatocellular carcinoma and breast neoplasms were confirmed by the databases and experimental literature.
Availability and implementation: The code and dataset of ABMDA are freely available at https://github.com/githubcode007/ABMDA.
Supplementary information: Supplementary data are available at Bioinformatics online.
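The boosting step described in the Results can be illustrated with a generic discrete AdaBoost sketch that uses decision stumps (depth-1 decision trees) as weak classifiers. This is not the ABMDA code from the repository above: the k-means-based negative sampling is omitted, and the function names and the one-dimensional toy data are assumptions chosen so that no single stump can separate the classes.

```python
import math

def best_stump(X, y, w):
    """Stump with the lowest weighted error: (error, feature, threshold, sign)."""
    best = None
    for j in range(len(X[0])):
        vals = sorted({row[j] for row in X})
        for t in [(a + b) / 2 for a, b in zip(vals, vals[1:])]:
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if (sign if row[j] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def stump_predict(stump, row):
    """Predict +1 or -1: `sign` above the threshold, `-sign` at or below it."""
    _, j, t, sign = stump
    return sign if row[j] > t else -sign

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost; labels must be +1 or -1. Returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # weight of this weak classifier
        model.append((alpha, stump))
        # Increase the weights of misclassified samples, decrease the others.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def score(model, row):
    """Weighted vote of all weak classifiers; the sign gives the predicted class."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in model)

# Toy data (made up): the label pattern cannot be captured by any single threshold.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost(X, y, rounds=3)
predictions = [1 if score(model, row) > 0 else -1 for row in X]
```

On this toy set each individual stump misclassifies at least three points, yet the weighted vote of three stumps reproduces every training label, which is the sense in which boosting turns weak classifiers into a strong one.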
Affiliation(s)
- Yan Zhao
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xing Chen
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jun Yin
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China