Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rajagopal N, Xie W, Li Y, Wagner U, Wang W, Stamatoyannopoulos J, Ernst J, Kellis M, Ren B. RFECS: a random-forest based algorithm for enhancer identification from chromatin state. PLoS Comput Biol 2013;9:e1002968. [PMID: 23526891 PMCID: PMC3597546 DOI: 10.1371/journal.pcbi.1002968] [Citation(s) in RCA: 157] [Impact Index Per Article: 14.3] [Received: 08/16/2012] [Accepted: 01/20/2013] [Indexed: 01/08/2023]

Cited by Other Article(s)

Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and Deep Learning Methods for Predicting 3D Genome Organization. Methods Mol Biol 2025;2856:357-400. [PMID: 39283464 DOI: 10.1007/978-1-0716-4136-1_22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/25/2024]

Xie W, Yao Z, Yuan Y, Too J, Li F, Wang H, Zhan Y, Wu X, Wang Z, Zhang G. W2V-repeated index: Prediction of enhancers and their strength based on repeated fragments. Genomics 2024;116:110906. [PMID: 39084477 DOI: 10.1016/j.ygeno.2024.110906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Revised: 07/10/2024] [Accepted: 07/24/2024] [Indexed: 08/02/2024]

Affiliation(s)

Weiming Xie: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
Zhaomin Yao: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
Yizhe Yuan: China Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200240, China
Jingwei Too: Faculty of Electrical Engineering, Universiti Teknikal Malaysia Melaka, Hang Tuah Jaya, Durian Tunggal, 76100 Melaka, Malaysia
Fei Li: College of Computer Science and Technology, Jilin University, Changchun, Jilin 130012, China
Hongyu Wang: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
Ying Zhan: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
Xiaodan Wu: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China
Zhiguo Wang: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China
Guoxu Zhang: Department of Nuclear Medicine, General Hospital of Northern Theater Command, Shenyang, Liaoning 110016, China; College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning 110167, China

Etoh K, Araki H, Koga T, Hino Y, Kuribayashi K, Hino S, Nakao M. Citrate metabolism controls the senescent microenvironment via the remodeling of pro-inflammatory enhancers. Cell Rep 2024;43:114496. [PMID: 39043191 DOI: 10.1016/j.celrep.2024.114496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 05/22/2024] [Accepted: 06/27/2024] [Indexed: 07/25/2024]

Yang Y, Zhang J. Ascites-derived hsa-miR-181a-5p serves as a prognostic marker for gastric cancer-associated malignant ascites. BMC Genomics 2024;25:628. [PMID: 38914980 PMCID: PMC11194912 DOI: 10.1186/s12864-024-10359-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Accepted: 04/29/2024] [Indexed: 06/26/2024]

Abstract

BACKGROUND

Peritoneal carcinomatosis is the main cause of gastric cancer (GC)-related death. We aimed to explore the roles of dysregulated microRNAs (miRNAs) and related immune regulation activities in GC-associated malignant ascites.

METHODS

The GSE126399 dataset was downloaded from the GEO database. Differentially expressed miRNAs in GC ascites samples were first screened, and critical miRNAs were further investigated by LASSO (least absolute shrinkage and selection operator) logistic regression and the random forest (RF) algorithm. Receiver operating characteristic curves for the critical miRNAs were also constructed. Moreover, functional analysis and immune cell infiltration associated with differentially expressed mRNAs were analyzed. After key modules were selected by weighted gene co-expression network analysis, mRNAs related to survival were identified and a transcription factor (TF)-miRNA-mRNA network was constructed.

RESULTS

Hsa-miR-181b-5p was confirmed as the critical differentially expressed miRNA in GC ascites. The tumor samples were then divided into high- and low-expression groups by the mean expression level of hsa-miR-181b-5p, and subjects with high hsa-miR-181b-5p levels had better survival outcomes. In total, 197 differentially expressed mRNAs associated with hsa-miR-181b-5p levels were obtained, and these mRNAs were mainly enriched in muscle activity and vascular smooth muscle contraction. Hsa-miR-181b-5p was positively related to activated CD4 T cells and negatively related to eosinophils. Seventeen mRNAs, such as PDK4 and RAMP1, were significantly related to the prognosis of GC. Finally, 75 TF-miRNA-mRNA relationships were obtained, including 15 TFs, hsa-miR-181b-5p, and five mRNAs.

CONCLUSION

Our data suggest that the differentially expressed hsa-miR-181b-5p in ascites samples of GC patients may be a valuable prognostic marker and a potential target for therapeutic intervention, which should be validated in the near future.

Hu W, Li Y, Wu Y, Guan L, Li M. A deep learning model for DNA enhancer prediction based on nucleotide position aware feature encoding. iScience 2024;27:110030. [PMID: 38868182 PMCID: PMC11167433 DOI: 10.1016/j.isci.2024.110030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 04/23/2024] [Accepted: 05/16/2024] [Indexed: 06/14/2024]

Wall BPG, Nguyen M, Harrell JC, Dozmorov MG. Machine and deep learning methods for predicting 3D genome organization. arXiv 2024:arXiv:2403.03231v1. [PMID: 38495565 PMCID: PMC10942493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2024]

Mehmood F, Arshad S, Shoaib M. ADH-Enhancer: an attention-based deep hybrid framework for enhancer identification and strength prediction. Brief Bioinform 2024;25:bbae030. [PMID: 38385876 PMCID: PMC10885011 DOI: 10.1093/bib/bbae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/30/2023] [Accepted: 01/11/2024] [Indexed: 02/23/2024]

Ramakrishnan A, Wangensteen G, Kim S, Nestler EJ, Shen L. DeepRegFinder: deep learning-based regulatory elements finder. Bioinformatics Advances 2024;4:vbae007. [PMID: 38343388 PMCID: PMC10858349 DOI: 10.1093/bioadv/vbae007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 12/06/2023] [Accepted: 01/12/2024] [Indexed: 06/15/2024]

Garza AB, Garcia R, Solis LM, Halfon MS, Girgis HZ. EnhancerTracker: Comparing cell-type-specific enhancer activity of DNA sequence triplets via an ensemble of deep convolutional neural networks. bioRxiv: The Preprint Server for Biology 2023:2023.12.23.573198. [PMID: 38187673 PMCID: PMC10769370 DOI: 10.1101/2023.12.23.573198] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]

Luo X, Li Q, Tang Y, Liu Y, Zou Q, Zheng J, Zhang Y, Xu L. Predicting active enhancers with DNA methylation and histone modification. BMC Bioinformatics 2023;24:414. [PMID: 37919681 PMCID: PMC10621108 DOI: 10.1186/s12859-023-05547-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2023] [Accepted: 10/27/2023] [Indexed: 11/04/2023]

Liu Y, Wang Z, Yuan H, Zhu G, Zhang Y. HEAP: a task adaptive-based explainable deep learning framework for enhancer activity prediction. Brief Bioinform 2023;24:bbad286. [PMID: 37539835 DOI: 10.1093/bib/bbad286] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/15/2023] [Revised: 07/05/2023] [Accepted: 07/21/2023] [Indexed: 08/05/2023]

Wang J, Zhang H, Chen N, Zeng T, Ai X, Wu K. PorcineAI-Enhancer: Prediction of Pig Enhancer Sequences Using Convolutional Neural Networks. Animals (Basel) 2023;13:2935. [PMID: 37760334 PMCID: PMC10526013 DOI: 10.3390/ani13182935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/21/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023]

Wang W, Wu Q, Li C. iEnhancer-DCSA: identifying enhancers via dual-scale convolution and spatial attention. BMC Genomics 2023;24:393. [PMID: 37442977 DOI: 10.1186/s12864-023-09468-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 06/20/2023] [Indexed: 07/15/2023]

Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023;23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]

Alakuş TB. A Novel Repetition Frequency-Based DNA Encoding Scheme to Predict Human and Mouse DNA Enhancers with Deep Learning. Biomimetics (Basel) 2023;8:218. [PMID: 37366813 DOI: 10.3390/biomimetics8020218] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 05/18/2023] [Accepted: 05/22/2023] [Indexed: 06/28/2023]

Abstract

Recent studies have shown that DNA enhancers play an important role in the regulation of gene expression. They are involved in important biological processes such as development, homeostasis, and embryogenesis. However, experimental identification of these DNA enhancers is time-consuming and costly because it requires laboratory work. Researchers have therefore begun to apply computation-based deep learning algorithms to this field. Yet the inconsistent and poor prediction performance of computational approaches across cell lines has prompted further investigation of these approaches. In this study, a novel DNA encoding scheme was proposed to address these problems, and DNA enhancers were predicted with BiLSTM. The study consisted of four stages for two scenarios. In the first stage, DNA enhancer data were obtained. In the second stage, DNA sequences were converted to numerical representations by both the proposed encoding scheme and various DNA encoding schemes including EIIP, integer number, and atomic number. In the third stage, the BiLSTM model was designed, and the data were classified. In the final stage, the performance of the DNA encoding schemes was determined by accuracy, precision, recall, F1-score, CSI, MCC, G-mean, Kappa coefficient, and AUC scores. In the first scenario, it was determined whether the DNA enhancers belonged to humans or mice. The highest performance was achieved with the proposed DNA encoding scheme, with an accuracy of 92.16% and an AUC score of 0.85. The closest accuracy, 89.14%, was obtained with the EIIP DNA encoding scheme, whose AUC score was 0.87. Among the remaining DNA encoding schemes, the atomic number showed an accuracy score of 86.61%, while this rate decreased to 76.96% with the integer scheme. The AUC values of these schemes were 0.84 and 0.82, respectively. In the second scenario, the task was to determine whether a sequence was a DNA enhancer and, if so, to which species it belonged. In this scenario, the highest accuracy score, 84.59%, was obtained with the proposed DNA encoding scheme, and the AUC score of the proposed scheme was 0.92. EIIP and integer DNA encoding schemes showed accuracy scores of 77.80% and 73.68%, respectively, while their AUC scores were close to 0.90. The least effective prediction was obtained with the atomic number scheme, with an accuracy score of 68.27% and an AUC score of 0.81. Overall, the proposed DNA encoding scheme was observed to be successful and effective in predicting DNA enhancers.
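
Several of the evaluation scores named in this abstract can be derived from the four counts of a binary confusion matrix. The sketch below is illustrative only and is not taken from the cited study: the function name `binary_metrics` is hypothetical, G-mean is assumed to mean the geometric mean of sensitivity and specificity, Kappa is assumed to be Cohen's kappa, and every denominator is assumed to be non-zero. AUC is omitted because it needs ranked scores rather than counts.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common binary-classification scores from confusion-matrix counts.

    Hypothetical helper; assumes all denominators are non-zero.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    csi = tp / (tp + fp + fn)          # critical success index
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(recall * specificity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / total**2
    kappa = (accuracy - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "csi": csi,
        "mcc": mcc,
        "g_mean": g_mean,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, fn=10, tn=40).items():
        print(f"{name}: {value:.4f}")
```

Reporting several of these together is useful because accuracy alone can look high on imbalanced data, whereas MCC and Kappa account for all four cells of the matrix.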

Smith GD, Ching WH, Cornejo-Páramo P, Wong ES. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol 2023;24:116. [PMID: 37173718 PMCID: PMC10176946 DOI: 10.1186/s13059-023-02955-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Accepted: 04/28/2023] [Indexed: 05/15/2023]

Hong W, Zhao Y, Weng YL, Cheng C. Random Forest model reveals the interaction between N6-methyladenosine modifications and RNA-binding proteins. iScience 2023;26:106250. [PMID: 36922995 PMCID: PMC10009289 DOI: 10.1016/j.isci.2023.106250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 12/16/2022] [Accepted: 02/15/2023] [Indexed: 02/22/2023]

Wang C, Zou Q, Ju Y, Shi H. Enhancer-FRL: Improved and Robust Identification of Enhancers and Their Activities Using Feature Representation Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:967-975. [PMID: 36063523 DOI: 10.1109/tcbb.2022.3204365] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 05/04/2023]

Li Y, Kong F, Cui H, Wang F, Li C, Ma J. SENIES: DNA Shape Enhanced Two-Layer Deep Learning Predictor for the Identification of Enhancers and Their Strength. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023;20:637-645. [PMID: 35015646 DOI: 10.1109/tcbb.2022.3142019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Jia J, Lei R, Qin L, Wu G, Wei X. iEnhancer-DCSV: Predicting enhancers and their strength based on DenseNet and improved convolutional block attention module. Front Genet 2023;14:1132018. [PMID: 36936423 PMCID: PMC10014624 DOI: 10.3389/fgene.2023.1132018] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/26/2022] [Accepted: 02/13/2023] [Indexed: 03/06/2023]

Aladhadh S, Almatroodi SA, Habib S, Alabdulatif A, Khattak SU, Islam M. An Efficient Lightweight Hybrid Model with Attention Mechanism for Enhancer Sequence Recognition. Biomolecules 2022;13:biom13010070. [PMID: 36671456 PMCID: PMC9855522 DOI: 10.3390/biom13010070] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/04/2022] [Revised: 12/22/2022] [Accepted: 12/26/2022] [Indexed: 12/31/2022]

Abstract

Enhancers are sequences with short motifs that exhibit high positional variability and free scattering properties. Identifying these noncoding DNA fragments and their strength is extremely important because they play a key role in controlling gene regulation on a cellular basis. The identification of enhancers is more complex than that of other factors in the genome because they are freely scattered and their location varies widely. In recent years, bioinformatics tools have enabled significant progress on this problem. However, cell line-specific screening is not possible with existing computational methods based solely on DNA sequences. The chromatin accessibility of a DNA segment may provide useful information about its potential regulatory function, allowing regulatory elements to be identified from chromatin accessibility. In chromatin, the entanglement structure allows positions far apart in the sequence to encounter each other, regardless of their proximity to the gene to be acted upon. Thus, identifying enhancers and assessing their strength is difficult and time-consuming. The goal of our work was to overcome these limitations by presenting a deep learning model that combines a convolutional neural network (CNN) with attention-gated recurrent units (AttGRU). It used a CNN and one-hot coding to build models, primarily to identify enhancers and secondarily to classify their strength. To test the performance of the proposed model, enhancer-CNNAttGRU was compared with existing state-of-the-art methods. The proposed model performed the best for predicting stage one and stage two enhancer sequences, as well as their strengths, in a cross-species analysis, achieving best accuracy values of 87.39% and 84.46%, respectively. Overall, the results showed that the proposed model provided comparable results to state-of-the-art models, highlighting its usefulness.
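
The one-hot coding mentioned in this abstract turns each nucleotide into a four-element indicator vector before the sequence is passed to a CNN. The sketch below is a minimal illustration, not the cited authors' code: the function name `one_hot_encode`, the A/C/G/T column order, and the all-zero vector for ambiguous bases such as N are all assumptions.

```python
# Assumed column order: A, C, G, T.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(sequence):
    """Return a list of 4-element rows, one per base (hypothetical helper).

    Bases outside A/C/G/T (for example N) map to an all-zero row,
    which is an assumption made for this sketch.
    """
    return [list(ONE_HOT.get(base, (0, 0, 0, 0))) for base in sequence.upper()]


if __name__ == "__main__":
    for base, row in zip("ACGTN", one_hot_encode("ACGTN")):
        print(base, row)
```

The resulting matrix has shape (sequence length, 4), the usual input layout for a 1D convolution over DNA.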

Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework. PLoS Comput Biol 2022;18:e1010779. [PMID: 36520922 PMCID: PMC9836277 DOI: 10.1371/journal.pcbi.1010779] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Revised: 01/12/2023] [Accepted: 11/29/2022] [Indexed: 12/23/2022]

Abstract

Enhancers are short non-coding DNA sequences outside of the target promoter regions that can be bound by specific proteins to increase a gene's transcriptional activity, and they have a crucial role in the spatiotemporal and quantitative regulation of gene expression. However, enhancers do not have specific sequence motifs or structures, and their scattered distribution in the genome makes the identification of enhancers from human cell lines particularly challenging. Here we present a novel, stacked multivariate fusion framework called SMFM, which enables a comprehensive identification and analysis of enhancers from regulatory DNA sequences as well as their interpretation. Specifically, to characterize the hierarchical relationships of enhancer sequences, multi-source biological information and dynamic semantic information are fused to represent regulatory DNA enhancer sequences. Then, we implement a deep learning-based sequence network to learn the feature representation of the enhancer sequences comprehensively and to extract the implicit relationships in the dynamic semantic information. Ultimately, an ensemble machine learning classifier is trained based on the refined multi-source features and dynamic implicit relations obtained from the deep learning-based sequence network. Benchmarking experiments demonstrated that SMFM significantly outperforms other existing methods using several evaluation metrics. In addition, an independent test set was used to validate the generalization performance of SMFM by comparing it to other state-of-the-art enhancer identification methods. Moreover, we performed motif analysis based on the contribution scores of different bases of enhancer sequences to the final identification results. Furthermore, we conducted interpretability analysis of the identified enhancer sequences based on attention weights of EnhancerBERT, a fine-tuned BERT model that provides new insights into exploring the gene semantic information likely to underlie the discovered enhancers in an interpretable manner. Finally, in a human placenta study with 4,562 active distal gene regulatory enhancers, SMFM successfully exposed tissue-related placental development and the differential mechanism, demonstrating the generalizability and stability of our proposed framework.

iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network. Methods 2022;208:1-8. [DOI: 10.1016/j.ymeth.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/26/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022]

Ni P, Wilson D, Su Z. A map of cis-regulatory modules and constituent transcription factor binding sites in 80% of the mouse genome. BMC Genomics 2022;23:714. [PMID: 36261804 PMCID: PMC9583556 DOI: 10.1186/s12864-022-08933-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2022] [Accepted: 10/11/2022] [Indexed: 11/10/2022]

Ni P, Moe J, Su Z. Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice. BMC Biol 2022;20:221. [PMID: 36199141 PMCID: PMC9535988 DOI: 10.1186/s12915-022-01426-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/12/2022] [Accepted: 09/29/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Predicting cis-regulatory modules (CRMs) in a genome and their functional states in various cell/tissue types of the organism are two related, challenging computational tasks. Most current methods attempt to achieve both simultaneously using data of multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer from high false discovery rates and limited applications. To fill the gaps, we proposed a two-step strategy to first predict a map of CRMs in the genome, and then predict functional states of all the CRMs in various cell/tissue types of the organism. We have recently developed an algorithm for the first step that was able to predict CRMs in a genome more accurately and completely than existing methods by integrating numerous transcription factor ChIP-seq datasets in the organism. Here, we present machine-learning methods for the second step.

RESULTS

We showed that functional states in a cell/tissue type of all the CRMs in the genome could be accurately predicted using data of only 1-4 epigenetic marks by a variety of machine-learning classifiers. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on a cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal at least in humans and mice. Moreover, we found that tens to hundreds of thousands of CRMs were active in a human and mouse cell/tissue type, and up to 99.98% of them were reutilized in different cell/tissue types, while as few as 0.02% of them were unique to a cell/tissue type and might define that cell/tissue type.

CONCLUSIONS

Our two-step approach can accurately predict functional states in any cell/tissue type of all the CRMs in the genome using data of only 1-4 epigenetic marks. Our approach is also more cost-effective than existing methods that typically use data of more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.

Butt AH, Alkhalifah T, Alturise F, Khan YD. A machine learning technique for identifying DNA enhancer regions utilizing CIS-regulatory element patterns. Sci Rep 2022;12:15183. [PMID: 36071071 PMCID: PMC9452539 DOI: 10.1038/s41598-022-19099-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/28/2022] [Accepted: 08/24/2022] [Indexed: 11/26/2022]

Huang G, Luo W, Zhang G, Zheng P, Yao Y, Lyu J, Liu Y, Wei DQ. Enhancer-LSTMAtt: A Bi-LSTM and Attention-Based Deep Learning Method for Enhancer Recognition. Biomolecules 2022;12:biom12070995. [PMID: 35883552 PMCID: PMC9313278 DOI: 10.3390/biom12070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/03/2022] [Accepted: 07/07/2022] [Indexed: 01/27/2023]

Gao Y, Chen Y, Feng H, Zhang Y, Yue Z. RicENN: Prediction of Rice Enhancers with Neural Network Based on DNA Sequences. Interdiscip Sci 2022;14:555-565. [PMID: 35190950 DOI: 10.1007/s12539-022-00503-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/01/2021] [Revised: 01/07/2022] [Accepted: 01/18/2022] [Indexed: 01/22/2023]

Abstract

Enhancers are the primary cis-elements of transcriptional regulation and play a vital role in gene expression at different stages of plant growth and development. Because enhancers show high locational variation and are freely scattered across non-coding regions of the genome, identifying them is a crucial but challenging task in understanding the biological mechanisms of model plants. Recently, neural network models have gained increasing popularity for predicting the function of genomic elements. Although several computational models have shown great advantages in tackling this challenge, the identification of rice enhancers from DNA sequences still lacks further study. We present RicENN, a novel deep learning framework capable of accurately identifying enhancers of rice, integrating convolutional neural networks (CNNs), bi-directional recurrent neural networks (RNNs), and attention mechanisms. A combined-feature representation method was designed to extract the sequence features from original DNA sequences using six types of autocorrelation encodings. Moreover, we verified by an ablation study that the integrated model achieves the best performance. Finally, our deep learning framework realized a reliable prediction of the rice enhancers. The results show that RicENN outperforms available alternative approaches in rice species, achieving an area under the receiver operating characteristic curve (AUROC) and an area under the precision-recall curve (AUPRC) of 0.960 and 0.960 on cross-validation, and 0.879 and 0.877 on independent tests, respectively. This study develops a hybrid model that combines the merits of different neural network architectures, which shows the potential of applying deep learning to bioinformatic sequences and contributes to the acceleration of functional genomic studies of rice. RicENN and its code are freely accessible at http://bioinfor.aielab.cc/RicENN/.

Amilpur S, Bhukya R. A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction. J Bioinform Comput Biol 2022;20:2250005. [PMID: 35264081 DOI: 10.1142/s0219720022500056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

iEnhancer-Deep: A Computational Predictor for Enhancer Sites and Their Strength Using Deep Learning. Applied Sciences-Basel 2022. [DOI: 10.3390/app12042120] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 01/27/2023]

Tsuda S, Pipkin ME. Transcriptional Control of Cell Fate Determination in Antigen-Experienced CD8 T Cells. Cold Spring Harb Perspect Biol 2022;14:a037945. [PMID: 34127445 PMCID: PMC8805646 DOI: 10.1101/cshperspect.a037945] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/24/2022]

Glaser LV, Steiger M, Fuchs A, van Bömmel A, Einfeldt E, Chung HR, Vingron M, Meijsing SH. Assessing genome-wide dynamic changes in enhancer activity during early mESC differentiation by FAIRE-STARR-seq. Nucleic Acids Res 2021;49:12178-12195. [PMID: 34850108 PMCID: PMC8643627 DOI: 10.1093/nar/gkab1100] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/17/2021] [Revised: 10/14/2021] [Accepted: 10/22/2021] [Indexed: 11/18/2022]

Jain M, Garg R. Enhancers as potential targets for engineering salinity stress tolerance in crop plants. PHYSIOLOGIA PLANTARUM 2021;173:1382-1391. [PMID: 33837536 DOI: 10.1111/ppl.13421] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/08/2021] [Revised: 03/19/2021] [Accepted: 04/06/2021] [Indexed: 06/12/2023]

Liang Y, Zhang S, Qiao H, Cheng Y. iEnhancer-MFGBDT: Identifying enhancers and their strength by fusing multiple features and gradient boosting decision tree. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2021;18:8797-8814. [PMID: 34814323 DOI: 10.3934/mbe.2021434] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

iEnhancer-RD: Identification of enhancers and their strength using RKPK features and deep neural networks. Anal Biochem 2021;630:114318. [PMID: 34364858 DOI: 10.1016/j.ab.2021.114318] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/20/2021] [Revised: 07/02/2021] [Accepted: 07/27/2021] [Indexed: 11/20/2022]

Basith S, Hasan MM, Lee G, Wei L, Manavalan B. Integrative machine learning framework for the identification of cell-specific enhancers from the human genome. Brief Bioinform 2021;22:6315815. [PMID: 34226917 DOI: 10.1093/bib/bbab252] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2021] [Revised: 06/08/2021] [Accepted: 06/14/2021] [Indexed: 02/06/2023] Open

Abstract

Enhancers are deoxyribonucleic acid (DNA) fragments which, when bound by transcription factors, enhance the transcription of related genes. Due to their sporadic distribution and similar fractions, identifying enhancers in the human genome is a daunting task. Compared to traditional experimental approaches, computational methods with easy-to-use platforms can be applied efficiently to annotate the functions and physiological roles of enhancers. To this end, several bioinformatics tools have been developed to identify enhancers. Despite their strong performance, existing methods have certain drawbacks and limitations, including the use of fixed-length sequences for model development and the neglect of cell specificity. A novel predictor that addresses these issues would benefit genome-wide enhancer prediction. In this study, we constructed new datasets for eight different cell types. Using these data, we proposed an integrative machine learning (ML)-based framework called Enhancer-IF for identifying cell-specific enhancers. Enhancer-IF comprehensively explores a wide range of heterogeneous features with five commonly used ML methods (random forest, extremely randomized tree, multilayer perceptron, support vector machine and extreme gradient boosting). Specifically, these five classifiers were trained with seven encodings, yielding 35 baseline models. The outputs of these baseline models were integrated and fed again into the five classifiers to construct five meta-models. Finally, integrating the five meta-models through ensemble learning improved model robustness. Our proposed approach showed excellent prediction performance compared to the baseline models on both training and independent datasets in different cell types, highlighting its advantage in identifying enhancers. We expect that Enhancer-IF will be a valuable tool for screening and identifying potential enhancers from human DNA sequences.
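The layered structure described in this abstract (5 classifiers × 7 encodings = 35 baseline models, whose outputs feed five meta-models that are then combined) can be sketched as follows. This is only an illustration of the bookkeeping, not the authors' implementation: the encoding names are hypothetical placeholders because the abstract does not list them, the scores are made-up toy values, and combining the meta-models by a plain mean is an assumption.

```python
from itertools import product
from statistics import mean

# The five classifier families named in the abstract.
CLASSIFIERS = ["random_forest", "extra_trees", "mlp", "svm", "xgboost"]
# Hypothetical placeholder names: the abstract does not list the seven encodings.
ENCODINGS = [f"encoding_{i}" for i in range(1, 8)]


def baseline_grid():
    """Every (classifier, encoding) pair corresponds to one baseline model."""
    return list(product(CLASSIFIERS, ENCODINGS))


def meta_features(baseline_scores):
    """Order the baseline scores into one feature vector for a sequence."""
    return [baseline_scores[key] for key in baseline_grid()]


def ensemble(meta_scores):
    """Combine the meta-model scores; a plain mean is an assumption here."""
    return mean(meta_scores)


grid = baseline_grid()
print(len(grid))  # 35

# Toy scores standing in for the probabilities of trained models.
toy_scores = {key: 0.5 for key in grid}
vector = meta_features(toy_scores)
final = ensemble([0.6, 0.7, 0.8, 0.7, 0.7])
```

In a real pipeline each entry of `vector` would be a probability predicted by a trained baseline model, and each of the five meta-models would be fitted on such 35-dimensional vectors.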

Asma H, Halfon MS. Annotating the Insect Regulatory Genome. Insects 2021;12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]

Ni P, Su Z. Accurate prediction of cis-regulatory modules reveals a prevalent regulatory genome of humans. NAR Genom Bioinform 2021;3:lqab052. [PMID: 34159315 PMCID: PMC8210889 DOI: 10.1093/nargab/lqab052] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/26/2021] [Revised: 05/01/2021] [Accepted: 06/14/2021] [Indexed: 02/07/2023] Open

Cai L, Ren X, Fu X, Peng L, Gao M, Zeng X. iEnhancer-XG: interpretable sequence-based enhancers and their strength predictor. Bioinformatics 2021;37:1060-1067. [PMID: 33119044 DOI: 10.1093/bioinformatics/btaa914] [Citation(s) in RCA: 52] [Impact Index Per Article: 17.3] [Received: 08/20/2020] [Revised: 09/30/2020] [Accepted: 10/15/2020] [Indexed: 01/10/2023] Open

Parisi C, Vashisht S, Winata CL. Fish-Ing for Enhancers in the Heart. Int J Mol Sci 2021;22:3914. [PMID: 33920121 PMCID: PMC8069060 DOI: 10.3390/ijms22083914] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/15/2021] [Revised: 04/07/2021] [Accepted: 04/08/2021] [Indexed: 12/19/2022] Open

Mu X, Wang Y, Duan M, Liu S, Li F, Wang X, Zhang K, Huang L, Zhou F. A Novel Position-Specific Encoding Algorithm (SeqPose) of Nucleotide Sequences and Its Application for Detecting Enhancers. Int J Mol Sci 2021;22:3079. [PMID: 33802922 PMCID: PMC8002641 DOI: 10.3390/ijms22063079] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/29/2020] [Revised: 03/04/2021] [Accepted: 03/11/2021] [Indexed: 11/16/2022] Open

Abstract

Enhancers are short genomic regions exerting tissue-specific regulatory roles, usually for remote coding regions. Enhancers are observed in both prokaryotic and eukaryotic genomes, and their detection facilitates a better understanding of the transcriptional regulation mechanism. Accurately detecting enhancers and evaluating their transcriptional regulation strength remain a major bioinformatics challenge. Most current studies use the statistical features of short fixed-length nucleotide sequences. This study introduces the location information of each k-mer (SeqPose) into the encoding strategy of a DNA sequence and employs the attention mechanism in a two-layer bi-directional long short-term memory (BD-LSTM) model (spEnhancer) for the enhancer detection problem. The first layer of the resulting classifier discriminates between enhancers and non-enhancers, and the second layer evaluates the transcriptional regulation strength of each detected enhancer. The SeqPose-encoded features are selected by the Chi-squared test, and 45 positions are removed from further analysis. Existing studies tend to focus on selecting the statistical DNA sequence descriptors that contribute most to the prediction models; this study does not use these descriptors. Instead, the word vector of the SeqPose-encoded features is obtained using a word embedding layer. This study hypothesizes that different word vector features may contribute differently to the enhancer detection model, and assigns different weights to these word vectors through the attention mechanism in the BD-LSTM model. A previous study generously provided the training and independent test datasets, and the proposed spEnhancer is compared with three existing state-of-the-art studies using the same experimental procedure. The leave-one-out validation data on the training dataset show that the proposed spEnhancer achieves detection performance similar to that of the three existing studies. On the independent test dataset, however, spEnhancer achieves the best overall performance metric, MCC, for both binary classification problems. The experimental data show that the strategy of removing redundant positions (SeqPose) may help improve DNA sequence-based prediction models. spEnhancer may serve well as a complementary model to the existing studies, especially for novel query enhancers that are not included in the training dataset.
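The idea of attaching location information to each k-mer can be illustrated with a small sketch. The abstract does not specify the encoding in detail, so the token format below (`k-mer@position`), the function names, and the way removed positions are supplied are all hypothetical; the Chi-squared selection step itself is not implemented here.

```python
def kmers_with_position(sequence, k=3):
    """Return (position, k-mer) pairs for every window of length k."""
    return [(i, sequence[i:i + k]) for i in range(len(sequence) - k + 1)]


def encode(sequence, k=3, dropped_positions=frozenset()):
    """Build position-tagged tokens, skipping positions marked for removal."""
    return [
        f"{kmer}@{pos}"
        for pos, kmer in kmers_with_position(sequence, k)
        if pos not in dropped_positions
    ]


# Toy sequence; position 1 stands in for a position removed by feature selection.
tokens = encode("ACGTAC", k=3, dropped_positions={1})
print(tokens)  # ['ACG@0', 'GTA@2', 'TAC@3']
```

Because the position is part of each token, the same k-mer at two different positions maps to two different vocabulary entries, which is what lets a downstream embedding layer learn position-specific vectors.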

Affiliation(s)

Xuechen Mu: Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China; School of Mathematics, Jilin University, Changchun 130012, China
Yueying Wang: Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China; Department of Epidemiology and Biostatistics, School of Public Health, Jilin University, Changchun 130021, China
Meiyu Duan: Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
Shuai Liu: Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
Fei Li: Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
Xiuli Wang: School of Mathematics, Jilin University, Changchun 130012, China
Kai Zhang: School of Mathematics, Jilin University, Changchun 130012, China
Lan Huang: Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China
Fengfeng Zhou: Health Informatics Lab, College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University, Changchun 130012, China (corresponding author)

Pipkin ME. Runx proteins and transcriptional mechanisms that govern memory CD8 T cell development. Immunol Rev 2021;300:100-124. [PMID: 33682165 DOI: 10.1111/imr.12954] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/06/2020] [Revised: 12/23/2020] [Accepted: 12/28/2020] [Indexed: 12/14/2022]

Chen S, Gan M, Lv H, Jiang R. DeepCAPE: A Deep Convolutional Neural Network for the Accurate Prediction of Enhancers. Genomics Proteomics & Bioinformatics 2021;19:565-577. [PMID: 33581335 PMCID: PMC9040020 DOI: 10.1016/j.gpb.2019.04.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/24/2018] [Revised: 03/15/2019] [Accepted: 04/29/2019] [Indexed: 12/12/2022]

Abstract

The establishment of a landscape of enhancers across human cells is crucial to deciphering the mechanisms of gene regulation, cell differentiation, and disease development. High-throughput experimental approaches, which have successfully reported enhancers in typical cell lines, are still too costly and time-consuming for the systematic identification of enhancers specific to different cell lines. Existing computational methods, which predict regulatory elements purely from DNA sequences, lack the power of cell line-specific screening. Recent studies have suggested that the chromatin accessibility of a DNA segment is closely related to its potential function in regulation, and thus may provide useful information for identifying regulatory elements. Motivated by this understanding, we integrate DNA sequences and chromatin accessibility data to accurately predict enhancers in a cell line-specific manner. We propose DeepCAPE, a deep convolutional neural network that predicts enhancers by integrating DNA sequences and DNase-seq data. Benefiting from a well-designed feature extraction mechanism and a skip connection strategy, our model not only consistently outperforms existing methods in the imbalanced classification of cell line-specific enhancers against background sequences, but can also adapt to datasets of different sizes. In addition, with the adoption of an auto-encoder, our model is capable of making cross-cell line predictions. We further visualize the kernels of the first convolutional layer and show that the identified sequence signatures match known motifs. Finally, we demonstrate the potential of our model to explain the functional implications of putative disease-associated genetic variants and to discriminate disease-related enhancers. The source code and detailed tutorial of DeepCAPE are freely available at https://github.com/ShengquanChen/DeepCAPE.

Zhang TH, Flores M, Huang Y. ES-ARCNN: Predicting enhancer strength by using data augmentation and residual convolutional neural network. Anal Biochem 2021;618:114120. [PMID: 33535061 DOI: 10.1016/j.ab.2021.114120] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 08/17/2020] [Revised: 12/13/2020] [Accepted: 01/21/2021] [Indexed: 02/06/2023]

Xie Y, Xiao L, Chen L, Zheng Y, Zhang C, Wang G. Integrated Analysis of Methylomic and Transcriptomic Data to Identify Potential Diagnostic Biomarkers for Major Depressive Disorder. Genes (Basel) 2021;12:178. [PMID: 33513891 PMCID: PMC7912210 DOI: 10.3390/genes12020178] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/07/2020] [Revised: 01/15/2021] [Accepted: 01/26/2021] [Indexed: 12/12/2022] Open

Abstract

Major depressive disorder (MDD) is a mental illness with high incidence and complex etiology that poses a serious threat to human health and increases the socioeconomic burden. High-accuracy biomarkers for MDD diagnosis are urgently needed. This paper aims to identify novel blood-based diagnostic biomarkers for MDD. Whole blood DNA methylation data and gene expression data were downloaded from the Gene Expression Omnibus database, and differentially expressed/methylated genes (DEGs/DMGs) were identified. In addition, we systematically analyzed DNA methylation at 5′-C-phosphate-G-3′ sites (CpGs) in all of the gene regions, as well as in different gene regions, and then defined a “dominant” region. Subsequently, integrated analysis was employed to identify robust MDD-related blood biomarkers. Finally, a gene expression classifier and a methylation classifier were constructed using the random forest algorithm and the leave-one-out cross-validation method. Our results demonstrate that DEGs are mainly involved in inflammatory response-associated pathways, while DMGs are primarily concentrated in neurodevelopment- and neuroplasticity-associated pathways. Our integrated analysis identified 46 hypo-methylated and up-regulated (hypo-up) genes and 71 hyper-methylated and down-regulated (hyper-down) genes. One gene expression classifier and two DNA methylation classifiers, based on the CpGs in all of the regions or in the dominant regions, were constructed. The gene expression classifier had the best predictive ability, followed by the DNA methylation classifiers based on the CpGs in the dominant regions and in all of the regions. In summary, the integrated analysis of DNA methylation and gene expression identified 46 hypo-up genes and 71 hyper-down genes, which could be used as diagnostic biomarkers for MDD.
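The leave-one-out cross-validation procedure mentioned in this abstract can be sketched in a few lines. To keep the example dependency-free, a simple nearest-centroid rule is swapped in for the random forest used in the study, and the data are made-up toy values, not results from the paper.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class mean is closest to the query value."""
    centroids = {}
    for label in set(train_y):
        values = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = sum(values) / len(values)
    return min(centroids, key=lambda label: abs(centroids[label] - query))


def leave_one_out_accuracy(xs, ys):
    """Hold out each sample once, fit on the rest, and score the held-out prediction."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy one-feature data (made-up values, not taken from the study).
xs = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
ys = ["control", "control", "control", "case", "case", "case"]
accuracy = leave_one_out_accuracy(xs, ys)
print(accuracy)
```

Each sample is used for testing exactly once, which makes this scheme attractive for small cohorts where setting aside a separate test split would leave too little training data.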

Kong N, Jung I. Long-range chromatin interactions in pathogenic gene expression control. Transcription 2020;11:211-216. [PMID: 33151112 DOI: 10.1080/21541264.2020.1843958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022] Open

Tobias IC, Abatti LE, Moorthy SD, Mullany S, Taylor T, Khader N, Filice MA, Mitchell JA. Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 2020;64:426-448. [PMID: 32961076 DOI: 10.1139/gen-2020-0104] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]

Supervised enhancer prediction with epigenetic pattern recognition and targeted validation. Nat Methods 2020;17:807-814. [PMID: 32737473 PMCID: PMC8073243 DOI: 10.1038/s41592-020-0907-8] [Citation(s) in RCA: 52] [Impact Index Per Article: 13.0] [Received: 09/25/2017] [Accepted: 06/18/2020] [Indexed: 12/20/2022]

Osmala M, Lähdesmäki H. Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns. BMC Bioinformatics 2020;21:317. [PMID: 32689977 PMCID: PMC7370432 DOI: 10.1186/s12859-020-03621-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Accepted: 06/19/2020] [Indexed: 12/11/2022] Open

Abstract

Background

The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently.

Results

In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity of the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods.

Conclusion

PREPRINT compares favorably with the state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not utilised for training, and often performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.

Sun C, Zhang N, Yu P, Wu X, Li Q, Li T, Li H, Xiao X, Shalmani A, Li L, Che D, Wang X, Zhang P, Chen Z, Liu T, Zhao J, Hua J, Liao M. Enhancer recognition and prediction during spermatogenesis based on deep convolutional neural networks. Mol Omics 2020;16:455-464. [PMID: 32568326 DOI: 10.1039/d0mo00031k] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022]

Abstract

MOTIVATION

Enhancers play an important role in the regulation of gene expression during spermatogenesis. The development of ChIP-Chip and ChIP-Seq technology has enabled researchers to focus on the relationship between enhancers, DNA sequences, and histone protein modifications. However, how well enhancers can be predicted from locally conserved DNA sequences and similar histone modification features is still unknown. Here, we propose a convolutional neural network (CNN) model to predict enhancers that can regulate gene expression during spermatogenesis.

RESULTS

We obtained a positive set of enhancers using P300 loci verified by experiments, while a negative set was constructed using promoters as non-enhancer loci. The model was trained independently on each specific cell type during spermatogenesis, and a transfer learning strategy was used to fine-tune the model so that it can be trained and adapted to other cells quickly. We visualized the convolution layer of the trained model and aligned the predicted enhancers with the JASPAR database. The results showed that the model closely matched some important transcription factors during spermatogenesis, supporting the reliability of the model. Finally, we compared the CNN algorithm with the gkmSVM (support vector machine) algorithm. It is well known that CNN performs better than the gkmSVM algorithm, especially in generalization ability. Our work demonstrated its strong learning ability and low CPU requirements, with a small number of convolution layers and a simple network structure, while avoiding overfitting the training data. At the end of the experiment, we used the trained model to build an enhancer recognition website for further research and communication.
