Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Liu B, Li K, Huang DS, Chou KC. iEnhancer-EL: identifying enhancers and their strength with ensemble learning approach. Bioinformatics 2019;34:3835-3842. [PMID: 29878118 DOI: 10.1093/bioinformatics/bty458] [Citation(s) in RCA: 130] [Impact Index Per Article: 26.0] [Received: 01/23/2018] [Accepted: 06/06/2018] [Indexed: 11/14/2022] Open

Cited by Other Article(s)

Liang Y, Cao M, Zhang S. NeuroPred-ResSE: Predicting neuropeptides by integrating residual block and squeeze-excitation attention mechanism. Anal Biochem 2024;695:115648. [PMID: 39154878 DOI: 10.1016/j.ab.2024.115648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/31/2024] [Accepted: 08/15/2024] [Indexed: 08/20/2024]

Nerella S, Bandyopadhyay S, Zhang J, Contreras M, Siegel S, Bumin A, Silva B, Sena J, Shickel B, Bihorac A, Khezeli K, Rashidi P. Transformers and large language models in healthcare: A review. Artif Intell Med 2024;154:102900. [PMID: 38878555 DOI: 10.1016/j.artmed.2024.102900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 05/28/2024] [Accepted: 05/30/2024] [Indexed: 08/09/2024]

Yao L, Xie P, Guan J, Chung CR, Huang Y, Pang Y, Wu H, Chiang YC, Lee TY. CapsEnhancer: An Effective Computational Framework for Identifying Enhancers Based on Chaos Game Representation and Capsule Network. J Chem Inf Model 2024;64:5725-5736. [PMID: 38946113 PMCID: PMC11267569 DOI: 10.1021/acs.jcim.4c00546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2024] [Revised: 06/21/2024] [Accepted: 06/21/2024] [Indexed: 07/02/2024]

Hu W, Li Y, Wu Y, Guan L, Li M. A deep learning model for DNA enhancer prediction based on nucleotide position aware feature encoding. iScience 2024;27:110030. [PMID: 38868182 PMCID: PMC11167433 DOI: 10.1016/j.isci.2024.110030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 04/23/2024] [Accepted: 05/16/2024] [Indexed: 06/14/2024] Open

Tenekeci S, Tekir S. Identifying promoter and enhancer sequences by graph convolutional networks. Comput Biol Chem 2024;110:108040. [PMID: 38430611 DOI: 10.1016/j.compbiolchem.2024.108040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 01/09/2024] [Accepted: 02/27/2024] [Indexed: 03/05/2024]

Zou H. iDPPIV-SI: identifying dipeptidyl peptidase IV inhibitory peptides by using multiple sequence information. J Biomol Struct Dyn 2024;42:2144-2152. [PMID: 37125813 DOI: 10.1080/07391102.2023.2203257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 04/10/2023] [Indexed: 05/02/2023]

Zhang Y, Zhang P, Wu H. Enhancer-MDLF: a novel deep learning framework for identifying cell-specific enhancers. Brief Bioinform 2024;25:bbae083. [PMID: 38485768 PMCID: PMC10938904 DOI: 10.1093/bib/bbae083] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2023] [Revised: 01/27/2024] [Accepted: 02/07/2024] [Indexed: 03/18/2024] Open

Mehmood F, Arshad S, Shoaib M. ADH-Enhancer: an attention-based deep hybrid framework for enhancer identification and strength prediction. Brief Bioinform 2024;25:bbae030. [PMID: 38385876 PMCID: PMC10885011 DOI: 10.1093/bib/bbae030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 12/30/2023] [Accepted: 01/11/2024] [Indexed: 02/23/2024] Open

Liu R, Wang Q, Zhang X. Identification of prognostic coagulation-related signatures in clear cell renal cell carcinoma through integrated multi-omics analysis and machine learning. Comput Biol Med 2024;168:107779. [PMID: 38061153 DOI: 10.1016/j.compbiomed.2023.107779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 10/30/2023] [Accepted: 11/28/2023] [Indexed: 01/10/2024]

Li Z, Jin B, Fang J. MetaAc4C: A multi-module deep learning framework for accurate prediction of N4-acetylcytidine sites based on pre-trained bidirectional encoder representation and generative adversarial networks. Genomics 2024;116:110749. [PMID: 38008265 DOI: 10.1016/j.ygeno.2023.110749] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 11/05/2023] [Accepted: 11/21/2023] [Indexed: 11/28/2023]

Feng T, Hu T, Liu W, Zhang Y. Enhancer Recognition: A Transformer Encoder-Based Method with WGAN-GP for Data Augmentation. Int J Mol Sci 2023;24:17548. [PMID: 38139375 PMCID: PMC10743946 DOI: 10.3390/ijms242417548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 11/29/2023] [Accepted: 12/12/2023] [Indexed: 12/24/2023] Open

Abstract

Enhancers are located upstream or downstream of key deoxyribonucleic acid (DNA) sequences in genes and can adjust the transcription activity of neighboring genes. Identifying enhancers and determining their functions are important for understanding gene regulatory networks and expression regulatory mechanisms. However, traditional enhancer recognition relies on manual feature engineering, which is time-consuming and labor-intensive, making it difficult to perform large-scale recognition analysis. In addition, if the original dataset is too small, there is a risk of overfitting. In recent years, emerging methods, such as deep learning, have provided new insights for enhancer identification. However, these methods also present certain challenges. Deep learning models typically require a large amount of high-quality data, and data acquisition demands considerable time and resources. To address these challenges, in this paper, we propose a data-augmentation method based on generative adversarial networks to solve the problem of small datasets. Moreover, we used regularization methods such as weight decay to improve the generalizability of the model and alleviate overfitting. The Transformer encoder was used as the main component to capture the complex relationships and dependencies in enhancer sequences. The encoding layer was designed based on the principle of k-mers to preserve more information from the original DNA sequence. Compared with existing methods, the proposed approach significantly improved the accuracy of enhancer identification and enhancer strength prediction, demonstrating its effectiveness. This paper provides valuable insights for enhancer analysis and is of great significance for understanding gene regulatory mechanisms and studying disease correlations.
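
The abstract above mentions an encoding layer built on the principle of k-mers. As a rough illustration of that general idea only (not the authors' encoding layer), the sketch below splits a DNA sequence into overlapping k-mers and builds a normalized frequency vector. The function names `kmer_tokens` and `kmer_frequency_vector`, the default `k=3`, and the `stride` parameter are assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_tokens(seq, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers (hypothetical helper)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def kmer_frequency_vector(seq, k=3):
    """Return normalized k-mer frequencies over the 4**k vocabulary.

    The vocabulary is ordered lexicographically over A, C, G, T.
    k-mers containing other symbols (for example N) are ignored.
    """
    tokens = kmer_tokens(seq, k)
    counts = Counter(tokens)
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    total = max(sum(counts[km] for km in vocab), 1)  # avoid division by zero
    return [counts[km] / total for km in vocab]


if __name__ == "__main__":
    print(kmer_tokens("ACGTAC", k=3))
    print(len(kmer_frequency_vector("ACGTAC", k=3)))
```

A model such as a Transformer encoder would more typically map each k-mer token to a learned embedding; the frequency vector here is just the simplest fixed-length view of the same tokens.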

Mir BA, Rehman MU, Tayara H, Chong KT. Improving Enhancer Identification with a Multi-Classifier Stacked Ensemble Model. J Mol Biol 2023;435:168314. [PMID: 37852600 DOI: 10.1016/j.jmb.2023.168314] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/13/2023] [Revised: 10/06/2023] [Accepted: 10/11/2023] [Indexed: 10/20/2023]

Wang J, Zhang H, Chen N, Zeng T, Ai X, Wu K. PorcineAI-Enhancer: Prediction of Pig Enhancer Sequences Using Convolutional Neural Networks. Animals (Basel) 2023;13:2935. [PMID: 37760334 PMCID: PMC10526013 DOI: 10.3390/ani13182935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 08/21/2023] [Accepted: 09/05/2023] [Indexed: 09/29/2023] Open

Tang YJ, Yan K, Zhang X, Tian Y, Liu B. Protein intrinsically disordered region prediction by combining neural architecture search and multi-objective genetic algorithm. BMC Biol 2023;21:188. [PMID: 37674132 PMCID: PMC10483879 DOI: 10.1186/s12915-023-01672-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2023] [Accepted: 07/31/2023] [Indexed: 09/08/2023] Open

Wang W, Wu Q, Li C. iEnhancer-DCSA: identifying enhancers via dual-scale convolution and spatial attention. BMC Genomics 2023;24:393. [PMID: 37442977 DOI: 10.1186/s12864-023-09468-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Accepted: 06/20/2023] [Indexed: 07/15/2023] Open

Phan LT, Oh C, He T, Manavalan B. A comprehensive revisit of the machine-learning tools developed for the identification of enhancers in the human genome. Proteomics 2023;23:e2200409. [PMID: 37021401 DOI: 10.1002/pmic.202200409] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/10/2023] [Revised: 03/18/2023] [Accepted: 03/27/2023] [Indexed: 04/07/2023]

Grešová K, Martinek V, Čechák D, Šimeček P, Alexiou P. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genom Data 2023;24:25. [PMID: 37127596 PMCID: PMC10150520 DOI: 10.1186/s12863-023-01123-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Accepted: 03/31/2023] [Indexed: 05/03/2023] Open

Abstract

BACKGROUND

Recently, deep neural networks have been successfully applied in many biological fields. In 2020, a deep learning model, AlphaFold, won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years has been possible only thanks to a carefully curated benchmark of experimentally determined protein structures. In genomics, we have similar challenges (annotation of genomes and identification of functional elements), but we currently lack benchmarks similar to the protein folding competition.

RESULTS

Here we present a collection of curated and easily accessible sequence classification datasets in the field of genomics. The proposed collection is based on a combination of novel datasets constructed from the mining of publicly available databases and existing datasets obtained from published articles. The collection currently contains nine datasets that focus on regulatory elements (promoters, enhancers, open chromatin regions) from three model organisms: human, mouse, and roundworm. A simple convolutional neural network is also included in the repository and can be used as a baseline model. Benchmarks and the baseline model are distributed as the Python package 'genomic-benchmarks', and the code is available at https://github.com/ML-Bioinfo-CEITEC/genomic_benchmarks.

CONCLUSIONS

Deep learning techniques have revolutionized many biological fields, but mainly thanks to carefully curated benchmarks. For the field of genomics, we propose a collection of benchmark datasets for the classification of genomic sequences, with an interface for the most commonly used deep learning libraries, an implementation of a simple neural network, and a training framework that can be used as a starting point for future research. The main aim of this effort is to create a repository for shared datasets that will make machine learning for genomics more comparable and reproducible while reducing the overhead for researchers who want to enter the field, leading to healthy competition and new discoveries.

Ali F, Kumar H, Alghamdi W, Kateb FA, Alarfaj FK. Recent Advances in Machine Learning-Based Models for Prediction of Antiviral Peptides. Arch Comput Methods Eng 2023;30:1-12. [PMID: 37359746 PMCID: PMC10148704 DOI: 10.1007/s11831-023-09933-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/19/2023] [Indexed: 06/28/2023]

Wang C, Zou Q, Ju Y, Shi H. Enhancer-FRL: Improved and Robust Identification of Enhancers and Their Activities Using Feature Representation Learning. IEEE/ACM Trans Comput Biol Bioinform 2023;20:967-975. [PMID: 36063523 DOI: 10.1109/tcbb.2022.3204365] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]

Zhu D, Yang W, Xu D, Li H, Zhao Y, Li D. A deep learning based two-layer predictor to identify enhancers and their strength. Methods 2023;211:23-30. [PMID: 36740001 DOI: 10.1016/j.ymeth.2023.01.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2022] [Revised: 01/03/2023] [Accepted: 01/30/2023] [Indexed: 02/05/2023] Open

Wu H, Liu M, Zhang P, Zhang H. iEnhancer-SKNN: a stacking ensemble learning-based method for enhancer identification and classification using sequence information. Brief Funct Genomics 2023;22:302-311. [PMID: 36715222 DOI: 10.1093/bfgp/elac057] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/10/2022] [Revised: 12/01/2022] [Accepted: 12/13/2022] [Indexed: 01/31/2023] Open

Abstract

Enhancers, a class of distal cis-regulatory elements located in the non-coding region of DNA, play a key role in gene regulation. It is difficult to identify enhancers from DNA sequence data because they are freely distributed in the non-coding region, have no specific sequence features, and lie far from their target promoters. Therefore, this study presents a stacking ensemble learning method to accurately identify enhancers and classify them as strong or weak. Firstly, we obtain the fusion feature matrix by fusing the four features of Kmer, PseDNC, PCPseDNC and Z-Curve9. Secondly, five K-Nearest Neighbor (KNN) models with different parameters are trained as the base models, and the Logistic Regression algorithm is utilized as the meta-model. Thirdly, the stacking ensemble learning strategy is utilized to construct a two-layer model based on the base models and meta-model to train the preprocessed feature sets. The proposed method, named iEnhancer-SKNN, is a two-layer prediction model, in which the first layer predicts whether the given DNA sequences are enhancers or non-enhancers, and the second layer distinguishes whether the predicted enhancers are strong or weak. The performance of iEnhancer-SKNN is evaluated on the independent testing dataset, and the results show that the proposed method has better performance in predicting enhancers and their strength. In enhancer identification, iEnhancer-SKNN achieves an accuracy of 81.75%, an improvement of 1.35% to 8.75% compared with other predictors, and in enhancer classification, iEnhancer-SKNN achieves an accuracy of 80.50%, an improvement of 5.5% to 25.5% compared with other predictors. Moreover, we identify key transcription factor binding site motifs in the enhancer regions and further explore the biological functions of the enhancers and these key motifs. Source code and data can be downloaded from https://github.com/HaoWuLab-Bioinformatics/iEnhancer-SKNN.

Wang C, Zou Q. Prediction of protein solubility based on sequence physicochemical patterns and distributed representation information with DeepSoluE. BMC Biol 2023;21:12. [PMID: 36694239 PMCID: PMC9875434 DOI: 10.1186/s12915-023-01510-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 09/08/2022] [Accepted: 01/05/2023] [Indexed: 01/25/2023] Open

Li J, Wu Z, Lin W, Luo J, Zhang J, Chen Q, Chen J. iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models. Bioinform Adv 2023;3:vbad043. [PMID: 37113248 PMCID: PMC10125906 DOI: 10.1093/bioadv/vbad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 02/04/2023] [Accepted: 03/24/2023] [Indexed: 04/29/2023]

Li Y, Kong F, Cui H, Wang F, Li C, Ma J. SENIES: DNA Shape Enhanced Two-Layer Deep Learning Predictor for the Identification of Enhancers and Their Strength. IEEE/ACM Trans Comput Biol Bioinform 2023;20:637-645. [PMID: 35015646 DOI: 10.1109/tcbb.2022.3142019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Jia J, Lei R, Qin L, Wu G, Wei X. iEnhancer-DCSV: Predicting enhancers and their strength based on DenseNet and improved convolutional block attention module. Front Genet 2023;14:1132018. [PMID: 36936423 PMCID: PMC10014624 DOI: 10.3389/fgene.2023.1132018] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 12/26/2022] [Accepted: 02/13/2023] [Indexed: 03/06/2023] Open

Liang Y, Ma X. iACP-GE: accurate identification of anticancer peptides by using gradient boosting decision tree and extra tree. SAR QSAR Environ Res 2023;34:1-19. [PMID: 36562289 DOI: 10.1080/1062936x.2022.2160011] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/24/2022] [Accepted: 12/12/2022] [Indexed: 06/17/2023]

Yang TH, Yu YH, Wu SH, Zhang FY. CFA: An explainable deep learning model for annotating the transcriptional roles of cis-regulatory modules based on epigenetic codes. Comput Biol Med 2023;152:106375. [PMID: 36502693 DOI: 10.1016/j.compbiomed.2022.106375] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 11/07/2022] [Accepted: 11/27/2022] [Indexed: 11/30/2022]

Aladhadh S, Almatroodi SA, Habib S, Alabdulatif A, Khattak SU, Islam M. An Efficient Lightweight Hybrid Model with Attention Mechanism for Enhancer Sequence Recognition. Biomolecules 2022;13:biom13010070. [PMID: 36671456 PMCID: PMC9855522 DOI: 10.3390/biom13010070] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/04/2022] [Revised: 12/22/2022] [Accepted: 12/26/2022] [Indexed: 12/31/2022] Open

Abstract

Enhancers are sequences with short motifs that exhibit high positional variability and free scattering properties. Identifying these noncoding DNA fragments and their strength is extremely important because they play a key role in controlling gene regulation on a cellular basis. The identification of enhancers is more complex than that of other factors in the genome because they are freely scattered, and their location varies widely. In recent years, bioinformatics tools have enabled significant progress on this problem. However, cell line-specific screening is not possible with existing computational methods that rely solely on DNA sequences. The chromatin accessibility of a DNA segment may provide useful information about its potential regulatory function, allowing regulatory elements to be identified from their accessibility. In chromatin, the entanglement structure allows positions far apart in the sequence to encounter each other, regardless of their proximity to the gene to be acted upon. Thus, identifying enhancers and assessing their strength is difficult and time-consuming. The goal of our work was to overcome these limitations by presenting a deep learning model that combines a convolutional neural network (CNN) with attention-gated recurrent units (AttGRU). It used a CNN and one-hot coding to build models, primarily to identify enhancers and secondarily to classify their strength. To test the performance of the proposed model, enhancer-CNNAttGRU was compared with existing state-of-the-art methods. The proposed model performed the best for predicting stage one and stage two enhancer sequences, as well as their strengths, in a cross-species analysis, achieving best accuracy values of 87.39% and 84.46%, respectively. Overall, the results showed that the proposed model provided comparable results to state-of-the-art models, highlighting its usefulness.
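
The abstract above refers to one-hot coding of DNA sequences as the model input. A minimal sketch of that encoding follows, assuming the common A, C, G, T channel order and an all-zero vector for ambiguous bases such as N; both choices, and the name `one_hot_encode`, are assumptions for illustration and are not taken from the cited paper.

```python
# Channel order A, C, G, T is an assumption; papers differ on this.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}
UNKNOWN = (0, 0, 0, 0)  # used for N or any other non-ACGT symbol


def one_hot_encode(seq):
    """Encode a DNA string as a list of length-4 rows, one row per base."""
    return [list(ONE_HOT.get(base, UNKNOWN)) for base in seq.upper()]


if __name__ == "__main__":
    for base, row in zip("ACGN", one_hot_encode("ACGN")):
        print(base, row)
```

The resulting L x 4 matrix (L being the sequence length) is the usual input shape for a one-dimensional convolution over the sequence axis.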

Genome-wide identification and characterization of DNA enhancers with a stacked multivariate fusion framework. PLoS Comput Biol 2022;18:e1010779. [PMID: 36520922 PMCID: PMC9836277 DOI: 10.1371/journal.pcbi.1010779] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Revised: 01/12/2023] [Accepted: 11/29/2022] [Indexed: 12/23/2022] Open

Abstract

Enhancers are short non-coding DNA sequences outside of the target promoter regions that can be bound by specific proteins to increase a gene's transcriptional activity, which has a crucial role in the spatiotemporal and quantitative regulation of gene expression. However, enhancers do not have specific sequence motifs or structures, and their scattered distribution in the genome makes the identification of enhancers from human cell lines particularly challenging. Here we present a novel, stacked multivariate fusion framework called SMFM, which enables a comprehensive identification and analysis of enhancers from regulatory DNA sequences as well as their interpretation. Specifically, to characterize the hierarchical relationships of enhancer sequences, multi-source biological information and dynamic semantic information are fused to represent regulatory DNA enhancer sequences. Then, we implement a deep learning-based sequence network to learn the feature representation of the enhancer sequences comprehensively and to extract the implicit relationships in the dynamic semantic information. Ultimately, an ensemble machine learning classifier is trained based on the refined multi-source features and dynamic implicit relations obtained from the deep learning-based sequence network. Benchmarking experiments demonstrated that SMFM significantly outperforms other existing methods using several evaluation metrics. In addition, an independent test set was used to validate the generalization performance of SMFM by comparing it to other state-of-the-art enhancer identification methods. Moreover, we performed motif analysis based on the contribution scores of different bases of enhancer sequences to the final identification results. We also conducted interpretability analysis of the identified enhancer sequences based on attention weights of EnhancerBERT, a fine-tuned BERT model that provides new insights into exploring the gene semantic information likely to underlie the discovered enhancers in an interpretable manner. Finally, in a human placenta study with 4,562 active distal gene regulatory enhancers, SMFM successfully exposed tissue-related placental development and the differential mechanism, demonstrating the generalizability and stability of our proposed framework.

iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network. Methods 2022;208:1-8. [DOI: 10.1016/j.ymeth.2022.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/12/2022] [Revised: 09/26/2022] [Accepted: 10/03/2022] [Indexed: 11/07/2022] Open

Liao M, Zhao JP, Tian J, Zheng CH. iEnhancer-DCLA: using the original sequence to identify enhancers and their strength based on a deep learning framework. BMC Bioinformatics 2022;23:480. [PMCID: PMC9664816 DOI: 10.1186/s12859-022-05033-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/29/2022] [Accepted: 11/02/2022] [Indexed: 11/16/2022] Open

Cui Z, Chen ZH, Zhang QH, Gribova V, Filaretov VF, Huang DS. RMSCNN: A Random Multi-Scale Convolutional Neural Network for Marine Microbial Bacteriocins Identification. IEEE/ACM Trans Comput Biol Bioinform 2022;19:3663-3672. [PMID: 34699364 DOI: 10.1109/tcbb.2021.3122183] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/13/2023]

Liu S, Xu X, Yang Z, Zhao X, Liu S, Zhang W. EPIHC: Improving Enhancer-Promoter Interaction Prediction by Using Hybrid Features and Communicative Learning. IEEE/ACM Trans Comput Biol Bioinform 2022;19:3435-3443. [PMID: 34473626 DOI: 10.1109/tcbb.2021.3109488] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/13/2023]

Butt AH, Alkhalifah T, Alturise F, Khan YD. A machine learning technique for identifying DNA enhancer regions utilizing CIS-regulatory element patterns. Sci Rep 2022;12:15183. [PMID: 36071071 PMCID: PMC9452539 DOI: 10.1038/s41598-022-19099-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/28/2022] [Accepted: 08/24/2022] [Indexed: 11/26/2022] Open

Cross-species enhancer prediction using machine learning. Genomics 2022;114:110454. [PMID: 36030022 DOI: 10.1016/j.ygeno.2022.110454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Revised: 07/28/2022] [Accepted: 08/16/2022] [Indexed: 11/21/2022]

Zeng L, Liu Y, Yu ZG, Liu Y. iEnhancer-DLRA: identification of enhancers and their strengths by a self-attention fusion strategy for local and global features. Brief Funct Genomics 2022;21:399-407. [PMID: 35942693 DOI: 10.1093/bfgp/elac023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2022] [Revised: 06/30/2022] [Accepted: 07/12/2022] [Indexed: 11/14/2022] Open

Njoroge H, van't Hof A, Oruni A, Pipini D, Nagi S, Lynd A, Lucas ER, Tomlinson S, Grau‐Bove X, McDermott D, Wat'senga FT, Manzambi EZ, Agossa FR, Mokuba A, Irish S, Kabula B, Mbogo C, Bargul J, Paine MJI, Weetman D, Donnelly MJ. Identification of a rapidly-spreading triple mutant for high-level metabolic insecticide resistance in Anopheles gambiae provides a real-time molecular diagnostic for antimalarial intervention deployment. Mol Ecol 2022;31:4307-4318. [PMID: 35775282 PMCID: PMC9424592 DOI: 10.1111/mec.16591] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Revised: 06/07/2022] [Accepted: 06/27/2022] [Indexed: 12/01/2022]

Abstract

Studies of insecticide resistance provide insights into the capacity of populations to show rapid evolutionary responses to contemporary selection. Malaria control remains heavily dependent on pyrethroid insecticides, primarily in long lasting insecticidal nets (LLINs). Resistance in the major malaria vectors has increased in concert with the expansion of LLIN distributions. Identifying genetic mechanisms underlying high-level resistance is crucial for the development and deployment of resistance-breaking tools. Using the Anopheles gambiae 1000 genomes (Ag1000g) data we identified a very recent selective sweep in mosquitoes from Uganda which localized to a cluster of cytochrome P450 genes. Further interrogation revealed a haplotype involving a trio of mutations, a nonsynonymous point mutation in Cyp6p4 (I236M), an upstream insertion of a partial Zanzibar-like transposable element (TE) and a duplication of the Cyp6aa1 gene. The mutations appear to have originated recently in An. gambiae from the Kenya-Uganda border, with stepwise replacement of the double-mutant (Zanzibar-like TE and Cyp6p4-236 M) with the triple-mutant haplotype (including Cyp6aa1 duplication), which has spread into the Democratic Republic of Congo and Tanzania. The triple-mutant haplotype is strongly associated with increased expression of genes able to metabolize pyrethroids and is strongly predictive of resistance to pyrethroids most notably deltamethrin. Importantly, there was increased mortality in mosquitoes carrying the triple-mutation when exposed to nets cotreated with the synergist piperonyl butoxide (PBO). Frequencies of the triple-mutant haplotype remain spatially variable within countries, suggesting an effective marker system to guide deployment decisions for limited supplies of PBO-pyrethroid cotreated LLINs across African countries.

Affiliation(s)

Harun Njoroge Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK; Kenya Medical Research Institute (KEMRI) Centre for Geographic Medicine Coast, KEMRI‐Wellcome Trust Research Programme, Kilifi, Kenya
Arjen van't Hof Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Ambrose Oruni Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK; College of Veterinary Medicine, Animal Resources and Bio‐security, Makerere University, Kampala, Uganda
Dimitra Pipini Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Sanjay C. Nagi Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Amy Lynd Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Eric R. Lucas Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Sean Tomlinson Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Xavi Grau‐Bove Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Daniel McDermott Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Francis T. Wat'senga Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of Congo
Emile Z. Manzambi Institut National de Recherche Biomédicale, Kinshasa, Democratic Republic of Congo
Fiacre R. Agossa USAID President's Malaria Initiative, VectorLink Project, Abt Associates, Rockville, Maryland, USA
Arlette Mokuba USAID President's Malaria Initiative, VectorLink Project, Abt Associates, Rockville, Maryland, USA
Seth Irish U.S. President's Malaria Initiative and Centers for Disease Control and Prevention, Atlanta, Georgia, USA
Bilali Kabula Amani Research Centre, National Institute for Medical Research, Tanzania
Charles Mbogo Population Health Unit, KEMRI‐Wellcome Trust Research Programme, Nairobi, Kenya; KEMRI‐Centre for Geographic Medicine Research Coast, Kilifi, Kenya
Joel Bargul Department of Biochemistry, Jomo Kenyatta University of Agriculture and Technology, Juja, Kenya; The Animal Health Department, International Centre of Insect Physiology and Ecology, Nairobi, Kenya
Mark J. I. Paine Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
David Weetman Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK
Martin J. Donnelly Department of Vector Biology, Liverpool School of Tropical Medicine, Liverpool, UK; Parasites and Microbes Programme, Wellcome Sanger Institute, Cambridge, UK

Huang G, Luo W, Zhang G, Zheng P, Yao Y, Lyu J, Liu Y, Wei DQ. Enhancer-LSTMAtt: A Bi-LSTM and Attention-Based Deep Learning Method for Enhancer Recognition. Biomolecules 2022;12:biom12070995. [PMID: 35883552 PMCID: PMC9313278 DOI: 10.3390/biom12070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2022] [Revised: 07/03/2022] [Accepted: 07/07/2022] [Indexed: 01/27/2023] Open

An Effective Deep Learning-Based Architecture for Prediction of N7-Methylguanosine Sites in Health Systems. ELECTRONICS 2022. [DOI: 10.3390/electronics11121917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]

Gao Y, Chen Y, Feng H, Zhang Y, Yue Z. RicENN: Prediction of Rice Enhancers with Neural Network Based on DNA Sequences. Interdiscip Sci 2022;14:555-565. [PMID: 35190950 DOI: 10.1007/s12539-022-00503-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2021] [Revised: 01/07/2022] [Accepted: 01/18/2022] [Indexed: 01/22/2023]

Abstract

Enhancers are the primary cis-elements of transcriptional regulation and play a vital role in gene expression at different stages of plant growth and development. Having high locational variation and free scattering in non-encoding genomes, identification of enhancers is a crucial but challenging task in understanding the biological mechanism of model plants. Recently, applications of neural network models are gaining increasing popularity in predicting the function of genomic elements. Although several computational models have shown great advantages in tackling this challenge, a further study of the identification of rice enhancers from DNA sequences is still lacking. We present RicENN, a novel deep learning framework capable of accurately identifying enhancers of rice, integrating convolution neural networks (CNNs), bi-directional recurrent neural networks (RNNs), and attention mechanisms. A combined-feature representation method was designed to extract the sequence features from original DNA sequences using six types of autocorrelation encodings. Moreover, we verified that the integrated model achieves the best performance by an ablation study. Finally, our deep learning framework realized a reliable prediction of the rice enhancers. The results show RicENN outperforms available alternative approaches in rice species, achieving the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) of 0.960 and 0.960 on cross-validation, and 0.879 and 0.877 during independent tests, respectively. This study develops a hybrid model to combine the merits of different neural network architectures, which shows the potential ability to apply deep learning in bioinformatic sequences and contributes to the acceleration of functional genomic studies of rice. RicENN and its code are freely accessible at http://bioinfor.aielab.cc/RicENN/.

Geng Q, Yang R, Zhang L. A deep learning framework for enhancer prediction using word embedding and sequence generation. Biophys Chem 2022;286:106822. [DOI: 10.1016/j.bpc.2022.106822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2022] [Revised: 04/21/2022] [Accepted: 04/29/2022] [Indexed: 11/28/2022]

Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022;2022:7518779. [PMID: 35422876 PMCID: PMC9005296 DOI: 10.1155/2022/7518779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Accepted: 03/12/2022] [Indexed: 11/17/2022]

Amilpur S, Bhukya R. A sequence-based two-layer predictor for identifying enhancers and their strength through enhanced feature extraction. J Bioinform Comput Biol 2022;20:2250005. [PMID: 35264081 DOI: 10.1142/s0219720022500056] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Shen Z, Zhang Q, Han K, Huang DS. A Deep Learning Model for RNA-Protein Binding Preference Prediction Based on Hierarchical LSTM and Attention Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022;19:753-762. [PMID: 32750884 DOI: 10.1109/tcbb.2020.3007544] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]

iEnhancer-Deep: A Computational Predictor for Enhancer Sites and Their Strength Using Deep Learning. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12042120] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]

Zhu Z, Ge S, Cai Z, Wu Y, Lu C, Zhang Z, Fu P, Mao L, Wu X, Peng Y. Systematic identification and characterization of repeat sequences in African swine fever virus genomes. Vet Res 2022;53:101. [PMID: 36461107 PMCID: PMC9717548 DOI: 10.1186/s13567-022-01119-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2022] [Accepted: 07/19/2022] [Indexed: 12/03/2022] Open

Affiliation(s)

Zhaozhong Zhu Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China
Shengqiang Ge China Animal Health and Epidemiology Center, Qingdao, 266032 China; Key Laboratory of Animal Biosafety Risk Prevention and Control (South), Ministry of Agriculture and Rural Affairs, Qingdao, China
Zena Cai Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China
Yifan Wu Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China
Congyu Lu Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China
Zheng Zhang Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China
Ping Fu Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China
Longfei Mao Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China
Xiaodong Wu China Animal Health and Epidemiology Center, Qingdao, 266032 China; Key Laboratory of Animal Biosafety Risk Prevention and Control (South), Ministry of Agriculture and Rural Affairs, Qingdao, China
Yousong Peng Bioinformatics Center, College of Biology, Hunan Provincial Key Laboratory of Medical Virology, Hunan University, Changsha, 410082 China

Wang C, Ju Y, Zou Q, Lin C. DeepAc4C: a convolutional neural network model with hybrid features composed of physicochemical patterns and distributed representation information for identification of N4-acetylcytidine in mRNA. Bioinformatics 2021;38:52-57. [PMID: 34427581 DOI: 10.1093/bioinformatics/btab611] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2021] [Revised: 08/17/2021] [Accepted: 08/20/2021] [Indexed: 02/03/2023] Open

iDHS-DT: Identifying DNase I hypersensitive sites by integrating DNA dinucleotide and trinucleotide information. Biophys Chem 2021;281:106717. [PMID: 34798459 DOI: 10.1016/j.bpc.2021.106717] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Revised: 11/10/2021] [Accepted: 11/10/2021] [Indexed: 01/02/2023]

Lyu Y, Zhang Z, Li J, He W, Ding Y, Guo F. iEnhancer-KL: A Novel Two-Layer Predictor for Identifying Enhancers by Position Specific of Nucleotide Composition. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021;18:2809-2815. [PMID: 33481715 DOI: 10.1109/tcbb.2021.3053608] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]

Kang Q, Meng J, Su C, Luan Y. Mining plant endogenous target mimics from miRNA-lncRNA interactions based on dual-path parallel ensemble pruning method. Brief Bioinform 2021;23:6399881. [PMID: 34662389 DOI: 10.1093/bib/bbab440] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Revised: 09/07/2021] [Accepted: 09/24/2021] [Indexed: 12/14/2022] Open