Reference Citation Analysis

For: Kleftogiannis D, Kalnis P, Bajic VB. Progress and challenges in bioinformatics approaches for enhancer identification. Brief Bioinform 2015;17:967-979. [PMID: 26634919 PMCID: PMC5142011 DOI: 10.1093/bib/bbv101] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 08/27/2015] [Revised: 10/22/2015] [Indexed: 12/20/2022]

Cited by Other Article(s)

Shireen H, Batool F, Khatoon H, Parveen N, Sehar NU, Hussain I, Ali S, Abbasi AA. Predicting genome-wide tissue-specific enhancers via combinatorial transcription factor genomic occupancy analysis. FEBS Lett 2024. [PMID: 39367524 DOI: 10.1002/1873-3468.15030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 08/27/2024] [Accepted: 09/13/2024] [Indexed: 10/06/2024]

Safaei M, Goodarzi A, Abpeikar Z, Farmani AR, Kouhpayeh SA, Najafipour S, Jafari Najaf Abadi MH. Determination of key hub genes in Leishmaniasis as potential factors in diagnosis and treatment based on a bioinformatics study. Sci Rep 2024;14:22537. [PMID: 39342024 PMCID: PMC11438978 DOI: 10.1038/s41598-024-73779-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Accepted: 09/20/2024] [Indexed: 10/01/2024]

Abstract

Leishmaniasis is an infectious disease caused by protozoan parasites of various Leishmania species. The disease is transmitted by female sandflies that carry these parasites. In this study, leishmaniasis datasets published in the GEO database were analyzed and summarized. All three datasets used in this study (GSE43880, GSE55664, and GSE63931) were generated from biopsies of skin lesions of patients infected with Leishmania braziliensis. To identify differentially expressed genes (DEGs) between leishmaniasis patients and controls, the robust rank aggregation (RRA) procedure was applied. We performed gene functional annotation and protein-protein interaction (PPI) network analysis to characterize the putative functions of the DEGs. Molecular Complex Detection (MCODE) was used to detect molecular complexes within the PPI network, and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were conducted on the identified functional modules. The results of the CytoHubba plugin were combined with the RRA analysis to determine the hub genes. Finally, interactions between miRNAs and hub genes were predicted. The RRA integrated analysis identified 407 DEGs (263 up-regulated and 144 down-regulated). After the PPI network was constructed, the top three modules were identified with the MCODE plugin. Seven hub genes were found using CytoHubba and RRA: CXCL10, GBP1, GNLY, GZMA, GZMB, NKG7, and UBD. According to our enrichment analysis, these functional modules were primarily associated with immune pathways, cytokine activity/signaling pathways, and inflammation pathways. Interestingly, however, the hub gene UBD is involved in the ubiquitination pathways of pathogenesis.
The miRNet database was used to predict interactions between the hub genes and miRNAs, and the results revealed several miRNAs, including miR-146a-5p, that are crucial in fighting pathogenesis. The key hub genes discovered in this work may be considered potential biomarkers for diagnosis, the development of agonists/antagonists, and novel vaccine design, and will greatly contribute to future clinical studies.
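The RRA step above ranks each gene by how consistently it sits near the top of several independently ranked DEG lists. The abstract does not give the study's settings, so the following is only a minimal Python sketch of the rho score from Kolde et al.'s robust rank aggregation (the method implemented in the R package RobustRankAggreg). It assumes each gene appears in every list and that its ranks have already been divided by the list length; the function names and example ranks are hypothetical.

```python
from math import comb


def beta_score(k: int, n: int, r: float) -> float:
    """P(k-th smallest of n independent Uniform(0, 1) draws <= r)."""
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rra_score(normalized_ranks: list[float]) -> float:
    """Robust rank aggregation (rho) score for a single gene.

    normalized_ranks holds the gene's rank in each dataset divided by that
    dataset's list length (values in (0, 1]); smaller scores indicate a gene
    ranked higher than expected by chance across the lists.
    """
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    # rho = minimum, over the sorted ranks, of the beta tail probability
    rho = min(beta_score(k, n, r) for k, r in enumerate(ranks, start=1))
    # Bonferroni-style correction for taking the minimum over n order statistics
    return min(1.0, rho * n)


# Hypothetical gene ranked 1st of 100 in each of three DEG lists
print(rra_score([0.01, 0.01, 0.01]))
# Hypothetical gene ranked 50th of 100 in each list
print(rra_score([0.5, 0.5, 0.5]))
```

A gene near the top of all three lists scores close to zero, while a gene sitting mid-list everywhere does not; a study would then apply its own cutoff to these scores to call robust DEGs.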

Mulero-Hernández J, Mironov V, Miñarro-Giménez JA, Kuiper M, Fernández-Breis J. Integration of chromosome locations and functional aspects of enhancers and topologically associating domains in knowledge graphs enables versatile queries about gene regulation. Nucleic Acids Res 2024;52:e69. [PMID: 38967009 PMCID: PMC11347148 DOI: 10.1093/nar/gkae566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Revised: 06/12/2024] [Accepted: 06/19/2024] [Indexed: 07/06/2024]

Das S, Rai SN. Predicting the Effect of miRNA on Gene Regulation to Foster Translational Multi-Omics Research-A Review on the Role of Super-Enhancers. Noncoding RNA 2024;10:45. [PMID: 39195574 DOI: 10.3390/ncrna10040045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 08/12/2024] [Accepted: 08/13/2024] [Indexed: 08/29/2024]

Ni P, Wu S, Su Z. Validated Negative Regions (VNRs) in the VISTA Database might be Truncated Forms of Bona Fide Enhancers. Advanced Genetics (Hoboken, N.J.) 2024;5:2300209. [PMID: 38884049 PMCID: PMC11170074 DOI: 10.1002/ggn2.202300209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 03/16/2024] [Indexed: 06/18/2024]

Abbasi AF, Asim MN, Ahmed S, Dengel A. Long extrachromosomal circular DNA identification by fusing sequence-derived features of physicochemical properties and nucleotide distribution patterns. Sci Rep 2024;14:9466. [PMID: 38658614 PMCID: PMC11043385 DOI: 10.1038/s41598-024-57457-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Accepted: 03/18/2024] [Indexed: 04/26/2024]

Abstract

Long extrachromosomal circular DNA (leccDNA) regulates several biological processes such as genomic instability, gene amplification, and oncogenesis. Identifying leccDNA is important for investigating its potential associations with cancer and with autoimmune, cardiovascular, and neurological diseases, and understanding these associations can provide valuable insights into disease mechanisms and potential therapeutic approaches. Conventionally, leccDNA is identified with wet-lab methods, which require prior knowledge and resource-intensive processes, potentially limiting their broader applicability. To support leccDNA identification across multiple species, this paper presents the first computational leccDNA predictor. The proposed iLEC-DNA predictor uses an SVM classifier with features derived from sequence nucleotide distribution patterns and physicochemical properties. In addition, the study introduces 12 benchmark leccDNA datasets covering three species: Homo sapiens (HM), Arabidopsis thaliana (AT), and Saccharomyces cerevisiae (SC/YS). Large-scale experiments were performed on these 12 datasets under different experimental settings using the proposed predictor, more than 140 baseline predictors, and 858 encoder ensembles. The proposed predictor outperforms the baseline predictors and encoder ensembles across diverse leccDNA datasets, with average ACC, MCC, and AUC-ROC values of 81.09%, 62.2%, and 81.08%, respectively, across all datasets. The source code of the proposed and baseline predictors is available at https://github.com/FAhtisham/Extrachrosmosomal-DNA-Prediction. To facilitate use by the scientific community, a web application for leccDNA identification is available at https://sds_genetic_analysis.opendfki.de/iLEC_DNA/.
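The abstract does not list iLEC-DNA's exact encoders. As a hedged illustration of what a sequence-derived "nucleotide distribution pattern" feature can look like, the sketch below computes normalized k-mer frequencies for a DNA sequence. The function name and the choice of k are assumptions, and the physicochemical-property encodings are not shown.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Return the relative frequency of every A/C/G/T k-mer in seq.

    The vector has 4**k entries in lexicographic order (AA, AC, ..., TT for
    k = 2). Windows containing other characters (e.g. N) are skipped.
    """
    alphabet_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(alphabet_kmers, 0)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i : i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Normalize so sequences of different lengths give comparable vectors
    return [counts[m] / total if total else 0.0 for m in alphabet_kmers]


features = kmer_frequencies("ACGTTGCAACGT", k=2)
print(len(features))  # 16 features for dinucleotides
```

Vectors like this, possibly concatenated with physicochemical descriptors, could then be passed to an SVM such as scikit-learn's `sklearn.svm.SVC`; whether iLEC-DNA uses this exact representation is not stated in the abstract.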

Yao X, Ouyang S, Lian Y, Peng Q, Zhou X, Huang F, Hu X, Shi F, Xia J. PheSeq, a Bayesian deep learning model to enhance and interpret the gene-disease association studies. Genome Med 2024;16:56. [PMID: 38627848 PMCID: PMC11020195 DOI: 10.1186/s13073-024-01330-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 04/02/2024] [Indexed: 04/19/2024]

Wang Q, Zhang J, Liu Z, Duan Y, Li C. Integrative approaches based on genomic techniques in the functional studies on enhancers. Brief Bioinform 2023;25:bbad442. [PMID: 38048082 PMCID: PMC10694556 DOI: 10.1093/bib/bbad442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 10/22/2023] [Accepted: 11/08/2023] [Indexed: 12/05/2023]

Gonçalves TM, Stewart CL, Baxley SD, Xu J, Li D, Gabel HW, Wang T, Avraham O, Zhao G. Towards a comprehensive regulatory map of Mammalian Genomes. Research Square 2023:rs.3.rs-3294408. [PMID: 37841836 PMCID: PMC10571623 DOI: 10.21203/rs.3.rs-3294408/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/17/2023]

Liu Y, Wang Z, Yuan H, Zhu G, Zhang Y. HEAP: a task adaptive-based explainable deep learning framework for enhancer activity prediction. Brief Bioinform 2023;24:bbad286. [PMID: 37539835 DOI: 10.1093/bib/bbad286] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/15/2023] [Revised: 07/05/2023] [Accepted: 07/21/2023] [Indexed: 08/05/2023]

Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023;23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]

Maytum A, Edginton-White B, Bonifer C. Identification and characterization of enhancer elements controlling cell type-specific and signalling dependent chromatin programming during hematopoietic development. Stem Cell Investig 2023;10:14. [PMID: 37404470 PMCID: PMC10316067 DOI: 10.21037/sci-2023-011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 05/24/2023] [Indexed: 07/06/2023]

Abstract

The development of multicellular organisms from a single fertilized egg requires the differential execution of the information encoded in our DNA. This complex process is regulated by the interplay of transcription factors with the chromatin environment, both of which provide the epigenetic information that maintains cell type-specific gene expression patterns. Moreover, transcription factors and their target genes form vast interacting gene regulatory networks that can be exquisitely stable. However, all developmental processes originate from pluripotent precursor cell types. The production of terminally differentiated cells from such cells therefore requires successive changes of cell fate: genes relevant for the next stage of differentiation must be switched on, and genes no longer relevant must be switched off. The stimulus for a change of cell fate originates from extrinsic signals that set in motion a cascade of intracellular processes, which eventually terminate at the genome, leading to changes in gene expression and the development of alternative gene regulatory networks. How developmental trajectories are encoded in the genome, and how the interplay between intrinsic and extrinsic processes regulates development, are among the major questions in developmental biology. The development of the hematopoietic system has long served as a model for understanding how changes in gene regulatory networks drive the differentiation of the various blood cell types. In this review, we highlight the main signals and transcription factors and how they are integrated at the level of chromatin programming and gene expression control. We also highlight recent studies identifying cis-regulatory elements such as enhancers at the global level and explain how their developmental activity is regulated by the cooperation of cell type-specific and ubiquitous transcription factors with extrinsic signals.

Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework. PLoS Comput Biol 2022;18:e1010779. [PMID: 36520922 PMCID: PMC9836277 DOI: 10.1371/journal.pcbi.1010779] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Revised: 01/12/2023] [Accepted: 11/29/2022] [Indexed: 12/23/2022]

Abstract

Enhancers are short non-coding DNA sequences outside the target promoter regions that can be bound by specific proteins to increase a gene's transcriptional activity; they play a crucial role in the spatiotemporal and quantitative regulation of gene expression. However, enhancers lack specific sequence motifs or structures, and their scattered distribution in the genome makes identifying enhancers in human cell lines particularly challenging. Here we present SMFM, a novel stacked multivariate fusion framework that enables comprehensive identification, analysis, and interpretation of enhancers from regulatory DNA sequences. Specifically, to characterize the hierarchical relationships of enhancer sequences, multi-source biological information and dynamic semantic information are fused to represent regulatory DNA enhancer sequences. We then implement a deep learning-based sequence network to learn feature representations of the enhancer sequences and to extract the implicit relationships in the dynamic semantic information. Ultimately, an ensemble machine learning classifier is trained on the refined multi-source features and the dynamic implicit relations obtained from the deep learning-based sequence network. Benchmarking experiments demonstrated that SMFM significantly outperforms existing methods on several evaluation metrics. In addition, an independent test set was used to validate the generalization performance of SMFM against other state-of-the-art enhancer identification methods. Moreover, we performed motif analysis based on the contribution scores of different bases of enhancer sequences to the final identification results.
We also conducted an interpretability analysis of the identified enhancer sequences based on the attention weights of EnhancerBERT, a fine-tuned BERT model, which provides new insights into the gene semantic information likely to underlie the discovered enhancers. Finally, in a human placenta study with 4,562 active distal gene regulatory enhancers, SMFM successfully revealed tissue-related placental development and the differential mechanism, demonstrating the generalizability and stability of the proposed framework.

Ni P, Moe J, Su Z. Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice. BMC Biol 2022;20:221. [PMID: 36199141 PMCID: PMC9535988 DOI: 10.1186/s12915-022-01426-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 04/12/2022] [Accepted: 09/29/2022] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Predicting cis-regulatory modules (CRMs) in a genome and predicting their functional states in the various cell/tissue types of the organism are two related, challenging computational tasks. Most current methods attempt to achieve both simultaneously using data on multiple epigenetic marks in a cell/tissue type. Though conceptually attractive, they suffer from high false discovery rates and limited applicability. To fill these gaps, we proposed a two-step strategy: first predict a map of CRMs in the genome, and then predict the functional states of all the CRMs in various cell/tissue types of the organism. We recently developed an algorithm for the first step that, by integrating numerous transcription factor ChIP-seq datasets in the organism, predicted CRMs in a genome more accurately and completely than existing methods. Here, we present machine-learning methods for the second step.

RESULTS

We showed that the functional states of all CRMs in the genome in a cell/tissue type could be accurately predicted by a variety of machine-learning classifiers using data on only one to four epigenetic marks. Our predictions are substantially more accurate than the best achieved so far. Interestingly, a model trained on a cell/tissue type in humans can accurately predict functional states of CRMs in different cell/tissue types of humans as well as of mice, and vice versa. Therefore, the epigenetic code that defines functional states of CRMs in various cell/tissue types is universal, at least in humans and mice. Moreover, we found that tens to hundreds of thousands of CRMs were active in a given human or mouse cell/tissue type, and up to 99.98% of them were reused in different cell/tissue types, while as few as 0.02% of them were unique to a cell/tissue type and might define it.

CONCLUSIONS

Our two-step approach can accurately predict the functional states of all CRMs in the genome in any cell/tissue type using data on only one to four epigenetic marks. It is also more cost-effective than existing methods, which typically use data on more epigenetic marks. Our results suggest common epigenetic rules for defining functional states of CRMs in various cell/tissue types in humans and mice.

Huang S, Chen S, Zhang D, Gao J, Liu L. Enhancer-associated regulatory network and gene signature based on transcriptome and methylation data to predict the survival of patients with lung adenocarcinoma. Front Genet 2022;13:fgene-2022-1008602. [PMID: 36212131 PMCID: PMC9538943 DOI: 10.3389/fgene.2022.1008602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 09/08/2022] [Indexed: 11/17/2022]

Abstract

Accumulating evidence has shown that aberrant methylation of enhancers regulates gene expression in various cancers, including lung adenocarcinoma (LUAD). In this study, transcriptome and methylation data from The Cancer Genome Atlas (TCGA)-LUAD cohort were comprehensively analyzed with the five-step Enhancer Linking by Methylation/Expression Relationships (ELMER) process. Step 1: 131,371 distal probes (located more than 2 kb from the transcription start site) were obtained. Step 2: 10,665 distal hypomethylated probes were identified in unsupervised mode with the get.diff.meth function. Step 3: 699 probe-gene pairs with negative correlations were screened in unsupervised mode with the get.pair function. Step 4: after mapping to the probes, 768 motifs were obtained, 24 of which were enriched. Step 5: 127 transcription factors (TFs) with differential expression and negative correlations with methylation levels were screened, corresponding to 21 motifs. After the ELMER process, a prognostic “TFs-motifs-genes” regulatory network was constructed. Least absolute shrinkage and selection operator (LASSO) and stepwise regression analyses were then applied to select variables in the TCGA-LUAD cohort, and an eight-gene signature was constructed to calculate a risk score. The risk score was verified in two independent validation cohorts. The area under the curve (AUC) values of receiver operating characteristic (ROC) curves predicting 1-, 3-, and 5-year survival ranged from 0.633 to 0.764. As the risk score increased, both survival status and clinical traits showed a worsening trend. There were significant differences in the degree of immune cell infiltration, tumor mutational burden (TMB) values, and Tumor Immune Dysfunction and Exclusion (TIDE) scores between the high-risk and low-risk groups. Finally, a better-performing prognostic nomogram was built by integrating the risk score with other clinical traits.
In short, this multi-omics analysis demonstrated the application of ELMER to analyzing enhancer-associated regulatory networks in LUAD, providing promising strategies for epigenetic therapy and prognostic biomarkers.
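Step 3 of the ELMER process keeps probe-gene pairs in which lower methylation at a distal probe goes with higher expression of a nearby gene. ELMER's own `get.pair` function tests this with a Mann-Whitney U test comparing expression between the most and least methylated samples; the Python sketch below substitutes a simpler Pearson-correlation filter purely to illustrate the idea of an inverse methylation/expression relationship. The probe and gene names, the data, and the -0.5 cutoff are all hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant vectors; treated as uncorrelated here
    return cov / (sx * sy)


def negative_pairs(methylation, expression, candidates, cutoff=-0.5):
    """Keep (probe, gene, r) for candidate pairs with r <= cutoff."""
    kept = []
    for probe, gene in candidates:
        r = pearson(methylation[probe], expression[gene])
        if r <= cutoff:
            kept.append((probe, gene, r))
    return kept


# Hypothetical beta values and expression levels across four samples
methylation = {
    "cg_probe_A": [0.9, 0.7, 0.4, 0.1],  # methylation falls as GENE1 rises
    "cg_probe_B": [0.2, 0.4, 0.6, 0.8],  # methylation rises with GENE1
}
expression = {"GENE1": [1.0, 2.0, 3.0, 4.0]}
candidates = [("cg_probe_A", "GENE1"), ("cg_probe_B", "GENE1")]

print(negative_pairs(methylation, expression, candidates))
```

Only the pair whose methylation decreases as expression increases survives the filter; a rank-based test such as ELMER's is less sensitive to outliers than this correlation stand-in.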

Sharov AA, Nakatake Y, Wang W. Atlas of regulated target genes of transcription factors (ART-TF) in human ES cells. BMC Bioinformatics 2022;23:377. [PMID: 36114445 PMCID: PMC9479252 DOI: 10.1186/s12859-022-04924-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/05/2022] [Accepted: 09/12/2022] [Indexed: 12/26/2022]

Abstract

Background

Transcription factors (TFs) play central roles in maintaining the “stemness” of embryonic stem (ES) cells and in their differentiation into several hundred adult cell types. The regulatory competence of TFs is routinely assessed by detecting the target genes to which they bind. However, these data do not indicate which target genes are activated, repressed, or unaffected by changes in TF abundance. There is a lack of large-scale studies comparing the genome binding of TFs with the expression change of target genes after manipulation of each TF.

Results

In this paper, we associated human TFs with their target genes by two criteria: binding to genes, evaluated from published ChIP-seq data (n = 1868), and change in target gene expression shortly after induction of each TF in human ES cells. Lists of direction- and strength-specific regulated target genes were generated for 311 TFs (out of 351 TFs tested) with an expected proportion of false positives less than or equal to 0.30, including 63 new TFs not present in four existing databases of target genes. Our lists of direction-specific targets for 152 TFs (80.0%) are larger than those in the TRRUST database. On average, 30.9% of genes that respond at least twofold to the induction of TFs are regulated targets. Regulated target genes indicate that the majority of TFs are either strong activators or strong repressors, whereas the sets of genes that responded at least twofold to TF induction did not show strong asymmetry in the direction of expression change. The majority of human TFs (82.1%) regulated their target genes primarily via binding to enhancers. Repression of target genes is more often mediated by promoter binding than is activation. Enhancer-promoter loops are more abundant among strong activator and repressor TFs.

Conclusions

We developed an atlas of regulated targets of TFs (ART-TF) in human ES cells by combining data on TF binding with data on gene expression change after manipulation of individual TFs. Sets of regulated gene targets were identified with a controlled rate of false positives. This approach contributes to the understanding of biological functions of TFs and organization of gene regulatory networks. This atlas should be a valuable resource for ES cell-based regenerative medicine studies.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-022-04924-3.

Hu J, Wang J, Li J, Hu H, Wu B, Ren H, Wang J. AHLS-pred: a novel sequence-based predictor of acyl-homoserine-lactone synthases using machine learning algorithms. Environmental Microbiology Reports 2022;14:616-631. [PMID: 35403334 DOI: 10.1111/1758-2229.13068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/15/2021] [Revised: 03/28/2022] [Accepted: 03/30/2022] [Indexed: 06/14/2023]

Huang G, Luo W, Zhang G, Zheng P, Yao Y, Lyu J, Liu Y, Wei DQ. Enhancer-LSTMAtt: A Bi-LSTM and Attention-Based Deep Learning Method for Enhancer Recognition. Biomolecules 2022;12:biom12070995. [PMID: 35883552 PMCID: PMC9313278 DOI: 10.3390/biom12070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 07/03/2022] [Accepted: 07/07/2022] [Indexed: 01/27/2023]

Mulero Hernández J, Fernández-Breis JT. Analysis of the landscape of human enhancer sequences in biological databases. Comput Struct Biotechnol J 2022;20:2728-2744. [PMID: 35685360 PMCID: PMC9168495 DOI: 10.1016/j.csbj.2022.05.045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 05/20/2022] [Accepted: 05/21/2022] [Indexed: 12/01/2022]

DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nat Genet 2022;54:613-624. [PMID: 35551305 DOI: 10.1038/s41588-022-01048-5] [Citation(s) in RCA: 69] [Impact Index Per Article: 34.5] [Received: 09/20/2021] [Accepted: 03/08/2022] [Indexed: 02/06/2023]

Enhancer RNAs (eRNAs) in Cancer: The Jacks of All Trades. Cancers (Basel) 2022;14:cancers14081978. [PMID: 35454885 PMCID: PMC9030334 DOI: 10.3390/cancers14081978] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/12/2022] [Revised: 04/09/2022] [Accepted: 04/12/2022] [Indexed: 02/04/2023]

Jankovic B, Gojobori T. From shallow to deep: some lessons learned from application of machine learning for recognition of functional genomic elements in human genome. Hum Genomics 2022;16:7. [PMID: 35180894 PMCID: PMC8855580 DOI: 10.1186/s40246-022-00376-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/26/2021] [Accepted: 01/02/2022] [Indexed: 11/25/2022]

Abstract

Identification of genomic signals as indicators of functional genomic elements is one of the areas that received early and widespread application of machine learning methods. Over time, the methods applied grew in variety and generally tended to improve in their ability to identify some major genomic and transcriptomic signals. The evolution of machine learning in genomics followed a path similar to that of machine learning applications in other fields. These were impacted in a major way by three dominant developments: an enormous increase in the availability and quality of data, a significant increase in the computational power available to machine learning applications, and new machine learning paradigms, of which deep learning is the best-known example. In general, it is not easy to distinguish the factors leading to improvements in the results of machine learning applications. This is even more so in genomics, where the advent of next-generation sequencing and the increased ability to perform functional analysis of raw data have had a major effect on the applicability of machine learning in OMICS fields. In this paper, we survey the results from a subset of published work applying machine learning to the recognition of genomic signals and regions in the human genome and summarize some lessons learnt from this endeavor. There is no doubt that significant progress has been made in terms of both the accuracy and the reliability of models. Questions remain, however, as to whether the progress has been sufficient and what these developments bring to genomics in general and human genomics in particular. Improving the usability, interpretability and accuracy of models remains an important open challenge for current and future research applying machine learning, and more generally artificial intelligence methods, in genomics.

Holm I, Nardini L, Pain A, Bischoff E, Anderson CE, Zongo S, Guelbeogo WM, Sagnon N, Gohl DM, Nowling RJ, Vernick KD, Riehle MM. Comprehensive Genomic Discovery of Non-Coding Transcriptional Enhancers in the African Malaria Vector Anopheles coluzzii. Front Genet 2022;12:785934. [PMID: 35082832 PMCID: PMC8784733 DOI: 10.3389/fgene.2021.785934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2021] [Accepted: 12/10/2021] [Indexed: 11/24/2022]

Abstract

Almost all regulation of gene expression in eukaryotic genomes is mediated by the action of distant non-coding transcriptional enhancers on proximal gene promoters. Enhancer locations cannot be accurately predicted bioinformatically because there is no defined sequence code, so functional assays are required for their direct detection. Here we used a massively parallel reporter assay, Self-Transcribing Active Regulatory Region sequencing (STARR-seq), to generate the first comprehensive genome-wide map of enhancers in Anopheles coluzzii, a major African malaria vector in the Gambiae species complex. The screen was carried out by transfecting reporter libraries created from the genomic DNA of 60 wild A. coluzzii from Burkina Faso into A. coluzzii 4a3A cells, in order to functionally query the enhancer activity of the natural population within the homologous cellular context. We report a catalog of 3,288 active genomic enhancers that were significant across three biological replicates, 74% of them located in intergenic and intronic regions. The STARR-seq enhancer screen is chromatin-free and thus detects the inherent activity of a comprehensive catalog of enhancers that may be restricted in vivo to specific cell types or developmental stages. Testing a validation panel of enhancer candidates with manual luciferase assays confirmed enhancer function in 26 of 28 candidates (93%) over a wide dynamic range of activity, from two-fold to at least 16-fold above baseline. The enhancers occupy only 0.7% of the genome and display distinct compositional features. The enhancer compartment is significantly enriched for 15 transcription factor binding site signatures and displays divergence in specific dinucleotide repeats compared with matched non-enhancer genomic controls. The genome-wide catalog of A. coluzzii enhancers is publicly available in a simple, searchable graphic format.
This enhancer catalogue will be valuable in linking genetic and phenotypic variation, in identifying regulatory elements that could be employed in vector manipulation, and in better targeting of chromosome editing to minimize extraneous regulation influences on the introduced sequences.

Importance: Understanding the role of the non-coding regulatory genome in complex disease phenotypes is essential, but even in well-characterized model organisms, identifying regulatory regions within the vast non-coding genome remains a challenge. We used a large-scale assay to generate a genome-wide map of transcriptional enhancers. Such a catalog for the important malaria vector Anopheles coluzzii will be a valuable research tool for exploring the role of non-coding regulatory variation in differential susceptibility to malaria infection, and a public resource for research on this important insect vector of disease.


Affiliation(s)

Inge Holm Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
Luisa Nardini Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
Adrien Pain Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France; Institut Pasteur, Université de Paris, Hub de Bioinformatique et Biostatistique, Paris, France
Emmanuel Bischoff Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
Cameron E Anderson Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States
Soumanaba Zongo Centre National de Recherche et de Formation sur le Paludisme (CNRFP), Ministry of Health, Ouagadougou, Burkina Faso
Wamdaogo M Guelbeogo Centre National de Recherche et de Formation sur le Paludisme (CNRFP), Ministry of Health, Ouagadougou, Burkina Faso
N'Fale Sagnon Centre National de Recherche et de Formation sur le Paludisme (CNRFP), Ministry of Health, Ouagadougou, Burkina Faso
Daryl M Gohl University of Minnesota Genomics Center, Minneapolis, MN, United States; Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, MN, United States
Ronald J Nowling Department of Electrical Engineering and Computer Science, Milwaukee School of Engineering (MSOE), Milwaukee, WI, United States
Kenneth D Vernick Institut Pasteur, Université de Paris, CNRS UMR 2000, Unit of Insect Vector Genetics and Genomics, Department of Parasites and Insect Vectors, Paris, France
Michelle M Riehle Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States


Jain M, Garg R. Enhancers as potential targets for engineering salinity stress tolerance in crop plants. Physiol Plant 2021;173:1382-1391. [PMID: 33837536 DOI: 10.1111/ppl.13421] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/08/2021] [Revised: 03/19/2021] [Accepted: 04/06/2021] [Indexed: 06/12/2023]

Yousefi S, Deng R, Lanko K, Salsench EM, Nikoncuk A, van der Linde HC, Perenthaler E, van Ham TJ, Mulugeta E, Barakat TS. Comprehensive multi-omics integration identifies differentially active enhancers during human brain development with clinical relevance. Genome Med 2021;13:162. [PMID: 34663447 PMCID: PMC8524963 DOI: 10.1186/s13073-021-00980-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/28/2021] [Accepted: 09/29/2021] [Indexed: 12/13/2022]

Ferré Q, Chèneby J, Puthier D, Capponi C, Ballester B. Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders. BMC Bioinformatics 2021;22:460. [PMID: 34563116 PMCID: PMC8467021 DOI: 10.1186/s12859-021-04359-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2020] [Revised: 06/04/2021] [Accepted: 08/09/2021] [Indexed: 11/13/2022]

Ni P, Su Z. Accurate prediction of cis-regulatory modules reveals a prevalent regulatory genome of humans. NAR Genom Bioinform 2021;3:lqab052. [PMID: 34159315 PMCID: PMC8210889 DOI: 10.1093/nargab/lqab052] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 02/26/2021] [Revised: 05/01/2021] [Accepted: 06/14/2021] [Indexed: 02/07/2023]

Niu X, Deng K, Liu L, Yang K, Hu X. A statistical framework for predicting critical regions of p53-dependent enhancers. Brief Bioinform 2021;22:bbaa053. [PMID: 32392580 PMCID: PMC8138796 DOI: 10.1093/bib/bbaa053] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/07/2020] [Revised: 02/26/2020] [Indexed: 12/13/2022]

Hong J, Gao R, Yang Y. CrepHAN: Cross-species prediction of enhancers by using hierarchical attention networks. Bioinformatics 2021;37:3436-3443. [PMID: 33978703 DOI: 10.1093/bioinformatics/btab349] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/21/2020] [Revised: 04/21/2021] [Accepted: 05/06/2021] [Indexed: 01/17/2023]

Aboelnour E, Bonev B. Decoding the organization, dynamics, and function of the 4D genome. Dev Cell 2021;56:1562-1573. [PMID: 33984271 DOI: 10.1016/j.devcel.2021.04.023] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/21/2020] [Revised: 02/15/2021] [Accepted: 04/21/2021] [Indexed: 11/15/2022]

Lee JTH, Patikas N, Kiselev VY, Hemberg M. Fast searches of large collections of single-cell data using scfind. Nat Methods 2021;18:262-271. [PMID: 33649586 PMCID: PMC7116898 DOI: 10.1038/s41592-021-01076-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/14/2019] [Accepted: 01/20/2021] [Indexed: 01/30/2023]

Tobias IC, Abatti LE, Moorthy SD, Mullany S, Taylor T, Khader N, Filice MA, Mitchell JA. Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 2020;64:426-448. [PMID: 32961076 DOI: 10.1139/gen-2020-0104] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/20/2022]

Osmala M, Lähdesmäki H. Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns. BMC Bioinformatics 2020;21:317. [PMID: 32689977 PMCID: PMC7370432 DOI: 10.1186/s12859-020-03621-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Accepted: 06/19/2020] [Indexed: 12/11/2022]

Abstract

Background

The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq). The resulting chromatin feature data have been successfully used for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, current methods predict different numbers and different sets of enhancers for the same cell type, and they do not efficiently exploit the shape of the ChIP-seq coverage profiles.

Results

In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity between genomic query regions and the characteristic coverage patterns. The probabilistic scores of enhancer and non-enhancer samples are then used to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated against the binding sites of transcriptional regulatory proteins and compared with the predictions of state-of-the-art methods.

Conclusion

PREPRINT performs favorably compared with state-of-the-art methods, especially when the methods are required to predict a larger set of enhancers. PREPRINT generalises successfully to data from a cell type not used for training, and in this setting it often outperforms the previous methods. PREPRINT predictions are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid genome interpretation in functional genomics and clinical studies.


Malladi VS, Nagari A, Franco HL, Kraus WL. Total Functional Score of Enhancer Elements Identifies Lineage-Specific Enhancers That Drive Differentiation of Pancreatic Cells. Bioinform Biol Insights 2020;14:1177932220938063. [PMID: 32655276 PMCID: PMC7331761 DOI: 10.1177/1177932220938063] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/25/2020] [Accepted: 06/02/2020] [Indexed: 01/10/2023]

Neumayr C, Pagani M, Stark A, Arnold CD. STARR-seq and UMI-STARR-seq: Assessing Enhancer Activities for Genome-Wide-, High-, and Low-Complexity Candidate Libraries. Curr Protoc Mol Biol 2020;128:e105. [PMID: 31503413 PMCID: PMC9286403 DOI: 10.1002/cpmb.105] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/14/2022]

Tomoyasu Y, Halfon MS. How to study enhancers in non-traditional insect models. J Exp Biol 2020;223:jeb212241. [PMID: 32034049 DOI: 10.1242/jeb.212241] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/12/2022]

Pataskar A, Vanderlinden W, Emmerig J, Singh A, Lipfert J, Tiwari VK. Deciphering the Gene Regulatory Landscape Encoded in DNA Biophysical Features. iScience 2019;21:638-649. [PMID: 31731201 PMCID: PMC6889597 DOI: 10.1016/j.isci.2019.10.055] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/11/2019] [Revised: 10/20/2019] [Accepted: 10/24/2019] [Indexed: 01/24/2023]

Enhancer prediction with histone modification marks using a hybrid neural network model. Methods 2019;166:48-56. [DOI: 10.1016/j.ymeth.2019.03.014] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/16/2018] [Revised: 02/28/2019] [Accepted: 03/16/2019] [Indexed: 01/19/2023]

Perenthaler E, Yousefi S, Niggl E, Barakat TS. Beyond the Exome: The Non-coding Genome and Enhancers in Neurodevelopmental Disorders and Malformations of Cortical Development. Front Cell Neurosci 2019;13:352. [PMID: 31417368 PMCID: PMC6685065 DOI: 10.3389/fncel.2019.00352] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 05/31/2019] [Accepted: 07/16/2019] [Indexed: 12/22/2022]

Abstract

The development of the human cerebral cortex is a complex and dynamic process in which neural stem cell proliferation, neuronal migration, and post-migratory neuronal organization need to occur in a well-organized fashion. Alterations at any of these crucial stages can result in malformations of cortical development (MCDs), a group of genetically heterogeneous neurodevelopmental disorders that present with developmental delay, intellectual disability and epilepsy. Recent progress in genetic technologies such as next-generation sequencing, most often focusing on all protein-coding exons (e.g., whole exome sequencing), has allowed the discovery of more than 100 genes associated with various types of MCDs. Although this has considerably increased the diagnostic yield, most MCD cases remain unexplained. As whole exome sequencing investigates only a minor part of the human genome (1-2%), it is likely that patients in whom no disease-causing mutation has been identified could harbor mutations in genomic regions beyond the exome. Even though functional annotation of non-coding regions still lags behind that of protein-coding genes, tremendous progress has been made in the field of gene regulation. Enhancers are one group of non-coding regulatory regions; they can be located far upstream or downstream of genes and can mediate temporal and tissue-specific transcriptional control via long-distance interactions with promoter regions. Although some examples in the literature link alterations of enhancers to genetic disorders, a widespread appreciation of the putative roles of these sequences in MCDs is still lacking. Here, we summarize the current state of knowledge on cis-regulatory regions and discuss novel technologies, such as massively parallel reporter assay systems, CRISPR-Cas9-based screens and computational approaches, that help to further elucidate the emerging role of the non-coding genome in disease. Moreover, we discuss existing literature on mutations or copy number alterations of regulatory regions involved in brain development. We foresee that future implementation of the knowledge obtained through ongoing gene regulation studies will benefit patients and help explain part of the missing heritability of MCDs and other genetic disorders.


Benton ML, Talipineni SC, Kostka D, Capra JA. Genome-wide enhancer annotations differ significantly in genomic distribution, evolution, and function. BMC Genomics 2019;20:511. [PMID: 31221079 PMCID: PMC6585034 DOI: 10.1186/s12864-019-5779-x] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 01/04/2019] [Accepted: 05/07/2019] [Indexed: 12/28/2022]

Abstract

Background

Non-coding gene regulatory enhancers are essential to transcription in mammalian cells. As a result, a wide variety of experimental and computational strategies have been developed to identify cis-regulatory enhancer sequences. Given the differences in the biological signals assayed, some variation in the enhancers identified by different methods is expected; however, the concordance of enhancers identified by different methods has not been comprehensively evaluated. Such an evaluation is critically needed because, in practice, most studies consider enhancers identified by only a single method. Here, we compare enhancer sets from eleven representative strategies in four biological contexts.

Results

All sets we evaluated overlap significantly more than expected by chance; however, within each context there is significant dissimilarity in their genomic, evolutionary, and functional characteristics, at both the element and the base-pair level. The disagreement is sufficient to influence the interpretation of candidate SNPs from GWAS and to lead to disparate conclusions about enhancer and disease mechanisms. Most regions identified as enhancers are supported by only one method, and we find limited evidence that regions identified by multiple methods are better candidates than those identified by a single method. As a result, we cannot recommend the use of any single enhancer identification strategy in all settings.

Conclusions

Our results highlight the inherent complexity of enhancer biology and identify an important challenge to mapping the genetic architecture of complex disease. Greater appreciation of how the diverse enhancer identification strategies in use today relate to the dynamic activity of gene regulatory regions is needed to enable robust and reproducible results.

Electronic supplementary material

The online version of this article (10.1186/s12864-019-5779-x) contains supplementary material, which is available to authorized users.


Hariprakash JM, Ferrari F. Computational Biology Solutions to Identify Enhancers-target Gene Pairs. Comput Struct Biotechnol J 2019;17:821-831. [PMID: 31316726 PMCID: PMC6611831 DOI: 10.1016/j.csbj.2019.06.012] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 03/15/2019] [Revised: 06/04/2019] [Accepted: 06/11/2019] [Indexed: 12/12/2022]

Albalawi F, Chahid A, Guo X, Albaradei S, Magana-Mora A, Jankovic BR, Uludag M, Van Neste C, Essack M, Laleg-Kirati TM, Bajic VB. Hybrid model for efficient prediction of poly(A) signals in human genomic DNA. Methods 2019;166:31-39. [PMID: 30991099 DOI: 10.1016/j.ymeth.2019.04.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 11/15/2018] [Revised: 03/12/2019] [Accepted: 04/01/2019] [Indexed: 12/15/2022]

Wu C, Chen J, Liu Y, Hu X. Improved Prediction of Regulatory Element Using Hybrid Abelian Complexity Features with DNA Sequences. Int J Mol Sci 2019;20:1704. [PMID: 30959806 PMCID: PMC6480087 DOI: 10.3390/ijms20071704] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/26/2019] [Revised: 04/01/2019] [Accepted: 04/02/2019] [Indexed: 12/14/2022]

Asma H, Halfon MS. Computational enhancer prediction: evaluation and improvements. BMC Bioinformatics 2019;20:174. [PMID: 30953451 PMCID: PMC6451241 DOI: 10.1186/s12859-019-2781-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/15/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]

Abstract

BACKGROUND

Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Computational approaches provide a useful complement to empirical methods for CRM discovery, but it is critical that we develop effective means of evaluating their performance by estimating their sensitivity and specificity.

RESULTS

We introduce here pCRMeval, a pipeline for in silico evaluation of any enhancer prediction tool flexible enough to be applied to the Drosophila melanogaster genome. pCRMeval compares predictions with the extensive existing knowledge of experimentally validated Drosophila CRMs in order to estimate the precision and relative sensitivity of the prediction method. For supervised prediction methods (those trained on data composed of validated CRMs), pCRMeval can also assess the sensitivity of specific training sets. We demonstrate the utility of pCRMeval by evaluating our SCRMshaw CRM prediction method and its training data. By measuring the impact of different parameters on SCRMshaw performance, as assessed by pCRMeval, we develop a more robust version of SCRMshaw, SCRMshaw_HD, that increases the number of predictions while maintaining sensitivity and specificity. Our analysis also demonstrates that SCRMshaw_HD, when applied to progressively less well-assembled genomes, maintains its strong predictive power with only a minor drop-off in performance.

CONCLUSION

Our pCRMeval pipeline provides a general framework for evaluation that can be applied to any CRM prediction method, particularly a supervised method. While we use it here primarily to test and improve a particular method for CRM prediction, SCRMshaw, pCRMeval should provide a valuable platform to the research community not only for evaluating individual methods but also for comparing competing methods.


Colbran LL, Chen L, Capra JA. Sequence Characteristics Distinguish Transcribed Enhancers from Promoters and Predict Their Breadth of Activity. Genetics 2019;211:1205-1217. [PMID: 30696717 PMCID: PMC6456323 DOI: 10.1534/genetics.118.301895] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/27/2018] [Accepted: 01/27/2019] [Indexed: 01/08/2023]

Zehnder T, Benner P, Vingron M. Predicting enhancers in mammalian genomes using supervised hidden Markov models. BMC Bioinformatics 2019;20:157. [PMID: 30917778 PMCID: PMC6437899 DOI: 10.1186/s12859-019-2708-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/11/2018] [Accepted: 02/27/2019] [Indexed: 12/24/2022]

Abstract

BACKGROUND

Eukaryotic gene regulation is a complex process involving the dynamic interaction of enhancers and promoters to activate gene expression. In recent years, research in regulatory genomics has contributed to a better understanding of the characteristics of promoter elements, and comprehensive, reliable promoter annotations exist for most sequenced model organism genomes. For enhancers, however, a reliable description of their characteristics and location has so far proven elusive. With the development of high-throughput methods such as ChIP-seq, large amounts of epigenetic data have become available, and many existing methods use information on chromatin accessibility or histone modifications to train classifiers that segment the genome into functional groups such as enhancers and promoters. However, these methods often do not consider prior biological knowledge about enhancers, such as their diverse lengths or molecular structure.

RESULTS

We developed enhancer HMM (eHMM), a supervised hidden Markov model designed to learn the molecular structure of promoters and enhancers, both of which consist of a central stretch of accessible DNA flanked by nucleosomes with distinct histone modification patterns. We evaluated the performance of eHMM within and across cell types and developmental stages and found that eHMM successfully predicts enhancers with high precision and recall, comparable to state-of-the-art methods, and consistently outperforms those methods in terms of accuracy and resolution.

CONCLUSIONS

eHMM predicts active enhancers based on data from chromatin accessibility assays and a minimal set of histone modification ChIP-seq experiments. Unlike other 'black box' methods, its parameters are easy to interpret. eHMM can be used as a stand-alone tool for enhancer prediction without additional training or parameter tuning. The high spatial precision of its enhancer predictions provides valuable targets for potential knockout experiments or downstream analyses such as motif search.


Ho EYK, Cao Q, Gu M, Chan RWL, Wu Q, Gerstein M, Yip KY. Shaping the nebulous enhancer in the era of high-throughput assays and genome editing. Brief Bioinform 2019;21:836-850. [PMID: 30895290 DOI: 10.1093/bib/bbz030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/12/2018] [Revised: 02/15/2019] [Accepted: 02/26/2019] [Indexed: 01/22/2023]

Mejía-Guerra MK, Buckler ES. A k-mer grammar analysis to uncover maize regulatory architecture. BMC Plant Biol 2019;19:103. [PMID: 30876396 PMCID: PMC6419808 DOI: 10.1186/s12870-019-1693-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/25/2018] [Accepted: 02/21/2019] [Indexed: 05/06/2023]

Suzuki N, Hirano K, Ogino H, Ochi H. Arid3a regulates nephric tubule regeneration via evolutionarily conserved regeneration signal-response enhancers. eLife 2019;8:e43186. [PMID: 30616715 PMCID: PMC6324879 DOI: 10.7554/elife.43186] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/27/2018] [Accepted: 12/18/2018] [Indexed: 12/15/2022]

TELS: A Novel Computational Framework for Identifying Motif Signatures of Transcribed Enhancers. Genomics Proteomics Bioinformatics 2018;16:332-341. [PMID: 30578915 PMCID: PMC6364045 DOI: 10.1016/j.gpb.2018.05.003] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/05/2017] [Revised: 04/23/2018] [Accepted: 05/15/2018] [Indexed: 12/31/2022]