1
Ahmed F, Sharma A, Shatabda S, Dehzangi I. DeepPhoPred: Accurate Deep Learning Model to Predict Microbial Phosphorylation. Proteins 2025; 93:465-481. [PMID: 39239684 DOI: 10.1002/prot.26734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 06/27/2024] [Accepted: 07/15/2024] [Indexed: 09/07/2024]
Abstract
Phosphorylation is a major posttranslational modification of proteins in which a phosphate group is added to an amino acid side chain after translation in the ribosome. It is vital for coordinating cellular functions such as metabolism, proliferation, apoptosis, subcellular trafficking, and other crucial physiological processes. Predicting phosphorylation in microbial organisms can help in understanding pathogenesis and host-pathogen interactions, in drug and antibody design, and in the development of antimicrobial agents. Experimental methods for identifying phosphorylation sites are costly, slow, and tedious; hence, low-cost and high-speed computational approaches are highly desirable. This paper presents a new deep learning tool, DeepPhoPred, for predicting microbial phospho-serine (pS), phospho-threonine (pT), and phospho-tyrosine (pY) sites. DeepPhoPred uses a two-headed convolutional neural network with squeeze-and-excitation blocks, followed by fully connected layers, to jointly learn significant features from the peptide's structural and evolutionary information and predict phosphorylation sites. Our empirical results demonstrate that DeepPhoPred, with its highly efficient deep learning architecture, significantly outperforms existing microbial phosphorylation site predictors. The DeepPhoPred standalone predictor, all of its source code, and the datasets used are publicly available at https://github.com/faisalahm3d/DeepPhoPred.
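The squeeze-and-excitation (SE) blocks mentioned in the abstract reweight the channels of a convolutional feature map. As a rough illustration of the generic SE mechanism only (not DeepPhoPred's code), the plain-Python sketch below assumes a single feature map stored as a list of channel rows, hypothetical weight matrices `w1` and `w2`, and no bias terms:

```python
import math
from typing import List

# Feature map layout (assumed): one row per channel, one column per sequence position.
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def squeeze_excite(features: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """Recalibrate the channels of a (channels x length) feature map.

    w1 has shape (reduced, channels) and w2 has shape (channels, reduced);
    biases are omitted to keep the sketch short.
    """
    # Squeeze: global average pooling over positions gives one descriptor per channel.
    z = [sum(row) / len(row) for row in features]
    # Excitation: bottleneck layer with ReLU, then expansion back to one gate per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(w_row, z))) for w_row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(w_row, hidden))) for w_row in w2]
    # Scale: every position of channel c is multiplied by that channel's gate in (0, 1).
    return [[g * v for v in row] for g, row in zip(gates, features)]


# Hypothetical 2-channel, length-2 feature map with a 1-unit bottleneck.
features = [[1.0, 3.0], [2.0, 2.0]]
print(squeeze_excite(features, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]]))  # → [[0.5, 1.5], [1.0, 1.0]]
```

With all-zero weights every gate is sigmoid(0) = 0.5, so each channel is simply halved; trained weights would instead emphasize informative channels. In a two-headed design, one would expect each input view to pass through its own convolution and SE stack before the heads are merged, but the exact arrangement in DeepPhoPred should be checked in its repository.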
Affiliation(s)
- Faisal Ahmed
- Digital Health Unit, NVISION Systems and Technologies SL, Barcelona, Spain
- Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili, Tarragona, Spain
- Alok Sharma
- Laboratory of Medical Science Mathematics, Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Queensland, Australia
- College of Informatics, Korea University, Seoul, South Korea
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Japan
- Swakkhar Shatabda
- Department of Computer Science and Engineering, BRAC University, Dhaka, Bangladesh
- Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, New Jersey, USA
- Center for Computational and Integrative Biology (CCIB), Rutgers University, Camden, New Jersey, USA
2
Zeng R, Li Z, Li J, Zhang Q. DNA promoter task-oriented dictionary mining and prediction model based on natural language technology. Sci Rep 2025; 15:153. [PMID: 39747934 PMCID: PMC11697570 DOI: 10.1038/s41598-024-84105-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 12/19/2024] [Indexed: 01/04/2025]
Abstract
Promoters are essential DNA sequences that initiate transcription and regulate gene expression. Precisely identifying promoter sites is crucial for deciphering gene expression patterns and the roles of gene regulatory networks. Recent advances in bioinformatics have leveraged deep learning and natural language processing (NLP) to improve the accuracy of promoter prediction. Techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and BERT models have been particularly impactful. However, current approaches often rely on arbitrary DNA sequence segmentation during BERT pre-training, which may not yield optimal results. To overcome this limitation, this article introduces a novel DNA sequence segmentation method. The approach builds a more refined dictionary for DNA sequences, uses it for BERT pre-training, and employs an Inception neural network as the foundational model. This BERT-Inception architecture captures information at multiple granularities. Experimental results show that the model improves performance on several downstream tasks and adds interpretability to the deep learning model, providing new perspectives for interpreting and understanding DNA sequence information. The detailed source code is available at https://github.com/katouMegumiH/Promoter_BERT.
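The abstract contrasts arbitrary segmentation with a dictionary-based one but does not spell out the mining procedure. The sketch below only illustrates the difference in plain Python: overlapping fixed-length k-mers (one common fixed scheme) versus greedy longest-match segmentation against a small vocabulary. The vocabulary and the greedy rule are hypothetical stand-ins, not the paper's dictionary-mining algorithm.

```python
from typing import List, Set


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Overlapping fixed-length k-mers: a segmentation that ignores sequence content."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def greedy_segment(seq: str, vocab: Set[str], max_len: int) -> List[str]:
    """Left-to-right longest-match segmentation against a dictionary.

    Falls back to a single nucleotide when no dictionary word starts at position i,
    so every input is fully segmented.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    tokens: List[str] = []
    i = 0
    while i < len(seq):
        for length in range(min(max_len, len(seq) - i), 0, -1):
            piece = seq[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


vocab = {"TATA", "AAT", "GC"}  # hypothetical mined dictionary
print(kmer_tokens("TATAATGC"))               # ['TAT', 'ATA', 'TAA', 'AAT', 'ATG', 'TGC']
print(greedy_segment("TATAATGC", vocab, 4))  # ['TATA', 'A', 'T', 'GC']
```

Greedy matching can miss a dictionary word (here `AAT`) that overlaps an earlier match; segmenters that score whole segmentations, for example with dynamic programming, are a common alternative.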
Affiliation(s)
- Ruolei Zeng
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
- Zihan Li
- National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China
- Jialu Li
- National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China
- Qingchuan Zhang
- National Engineering Research Centre for Agri-Product Quality Traceability, Beijing Technology and Business University, No.11 Fucheng Road, Beijing, 100048, China
3
Kindel F, Triesch S, Schlüter U, Randarevitch LA, Reichel-Deland V, Weber APM, Denton AK. Predmoter-cross-species prediction of plant promoter and enhancer regions. Bioinform Adv 2024; 4:vbae074. [PMID: 38841126 PMCID: PMC11150885 DOI: 10.1093/bioadv/vbae074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 04/10/2024] [Accepted: 05/22/2024] [Indexed: 06/07/2024]
Abstract
Motivation: Identifying cis-regulatory elements (CREs) is crucial for analyzing gene regulatory networks. Next-generation sequencing methods were developed to identify CREs, but they represent a considerable expenditure for targeted analysis of a few genomic loci. Thus, predicting the outputs of these methods would significantly cut costs and time investment.
Results: We present Predmoter, a deep neural network that predicts base-wise Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) and histone chromatin immunoprecipitation DNA sequencing (ChIP-seq) read coverage for plant genomes. Predmoter uses only the DNA sequence as input. We trained our final model on 21 species; ATAC-seq data were publicly available for 13 of them and ChIP-seq data for 17. We evaluated our models on Arabidopsis thaliana and Oryza sativa. Our best models showed accurate predictions of peak position and pattern for ATAC-seq and histone ChIP-seq. Annotating putatively accessible chromatin regions provides valuable input for the identification of CREs. In conjunction with other in silico data, this can significantly reduce the search space for experimentally verifiable DNA-protein interaction pairs.
Availability and implementation: The source code for Predmoter is available at https://github.com/weberlab-hhu/Predmoter. Predmoter takes a FASTA file as input and outputs h5 files and, optionally, bigWig and bedGraph files.
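Predmoter's optional bedGraph output is a plain-text format of 0-based, half-open intervals, each carrying one value. Assuming predictions for one chromosome are available as a list of per-base floats (a hypothetical input, not Predmoter's h5 layout), base-wise values can be collapsed into bedGraph lines by merging runs of equal values, as in this sketch:

```python
from typing import List


def to_bedgraph(chrom: str, coverage: List[float], decimals: int = 2) -> List[str]:
    """Collapse per-base values for one chromosome into bedGraph lines.

    bedGraph uses 0-based, half-open intervals (chrom, start, end, value);
    consecutive bases whose rounded values are equal are merged into one line.
    """
    lines: List[str] = []
    start = 0
    for i in range(1, len(coverage) + 1):
        # Close the current run at the end of the array or where the rounded value changes.
        if i == len(coverage) or round(coverage[i], decimals) != round(coverage[start], decimals):
            lines.append(f"{chrom}\t{start}\t{i}\t{round(coverage[start], decimals)}")
            start = i
    return lines


# Hypothetical predicted coverage for the first six bases of a chromosome.
for line in to_bedgraph("Chr1", [0.0, 0.0, 1.5, 1.5, 1.5, 0.2]):
    print(line)
```

Rounding before merging trades precision for a much smaller file; Predmoter's own export may use a different rule, so this is only an illustration of the format.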
Affiliation(s)
- Felicitas Kindel
- Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Sebastian Triesch
- Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Germany
- Urte Schlüter
- Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Laura Alexandra Randarevitch
- Cluster of Excellence on Plant Sciences (CEPLAS), Germany
- Institute of Population Genetics, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Vanessa Reichel-Deland
- Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Andreas P M Weber
- Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Germany
- Alisandra K Denton
- Institute of Plant Biochemistry, Math.-Nat. Faculty, Heinrich Heine University, Düsseldorf 40225, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS), Germany
- Valence Labs, Montréal, Québec H2S 3H1, Canada
4
Villao-Uzho L, Chávez-Navarrete T, Pacheco-Coello R, Sánchez-Timm E, Santos-Ordóñez E. Plant Promoters: Their Identification, Characterization, and Role in Gene Regulation. Genes (Basel) 2023; 14:1226. [PMID: 37372407 DOI: 10.3390/genes14061226] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/15/2023] [Revised: 05/17/2023] [Accepted: 05/30/2023] [Indexed: 06/29/2023]
Abstract
One strategy to overcome diseases or abiotic stress in crops is the use of improved varieties. Genetic improvement can be accomplished through different methods, including conventional breeding, induced mutation, genetic transformation, and gene editing. To improve specific traits in transgenic crops, both the function of the introduced gene and its regulated expression through promoters are necessary. The variety of promoter sequences used in generating genetically modified crops has increased because promoters can drive expression of the gene responsible for the improved trait in a specific manner. Therefore, characterizing promoter activity is necessary for the generation of biotechnological crops, and several analyses have focused on identifying and isolating promoters using techniques such as reverse transcriptase-polymerase chain reaction (RT-PCR), genetic libraries, cloning, and sequencing. Promoter analysis relies on plant genetic transformation, a potent tool for determining promoter activity and gene function in plants that contributes to understanding gene regulation and plant development. Furthermore, the study of promoters, which play a fundamental role in gene regulation, is highly relevant. Studies of regulation and development in transgenic organisms have revealed the benefits of directing gene expression temporally, spatially, and even in a controlled manner, confirming the great diversity of promoters discovered and developed. Therefore, promoters are a crucial tool in biotechnological processes to ensure the correct expression of a gene. This review highlights various types of promoters and their functionality in the generation of genetically modified crops.
Affiliation(s)
- Liliana Villao-Uzho
- Biotechnological Research Center of Ecuador, ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Gustavo Galindo Campus Km. 30.5 Vía Perimetral, Guayaquil 090902, Ecuador
- Tatiana Chávez-Navarrete
- Biotechnological Research Center of Ecuador, ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Gustavo Galindo Campus Km. 30.5 Vía Perimetral, Guayaquil 090902, Ecuador
- Ricardo Pacheco-Coello
- Biotechnological Research Center of Ecuador, ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Gustavo Galindo Campus Km. 30.5 Vía Perimetral, Guayaquil 090902, Ecuador
- Eduardo Sánchez-Timm
- Biotechnological Research Center of Ecuador, ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Gustavo Galindo Campus Km. 30.5 Vía Perimetral, Guayaquil 090902, Ecuador
- Faculty of Life Sciences, ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Gustavo Galindo Campus Km. 30.5 Vía Perimetral, Guayaquil 090902, Ecuador
- Efrén Santos-Ordóñez
- Biotechnological Research Center of Ecuador, ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Gustavo Galindo Campus Km. 30.5 Vía Perimetral, Guayaquil 090902, Ecuador
- Faculty of Life Sciences, ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Gustavo Galindo Campus Km. 30.5 Vía Perimetral, Guayaquil 090902, Ecuador
5
Shujaat M, Kim H, Tayara H, Chong KT. iProm-Sigma54: A CNN Base Prediction Tool for σ54 Promoters. Cells 2023; 12:cells12060829. [PMID: 36980170 PMCID: PMC10047130 DOI: 10.3390/cells12060829] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2022] [Revised: 02/23/2023] [Accepted: 02/23/2023] [Indexed: 03/11/2023]
Abstract
The sigma (σ) factor of the RNA polymerase holoenzyme is essential for recognizing and binding to promoter regions during gene transcription in prokaryotes. σ54 promoters are involved in various ancillary and environmentally responsive processes; therefore, accurately identifying σ54 promoter sequences is crucial for understanding the underlying mechanisms of gene regulation. Herein, we present a convolutional neural network (CNN)-based prediction tool named "iProm-Sigma54" for the prediction of σ54 promoters. The CNN consists of two one-dimensional convolutional layers followed by max pooling and dropout layers. A one-hot encoding scheme was used to construct the input matrix. To assess the prediction performance of iProm-Sigma54, we employed four evaluation metrics and five-fold cross-validation, measuring performance on a benchmark dataset and a test dataset. In this comparison, iProm-Sigma54 outperformed existing methods for identifying σ54 promoters. Additionally, a publicly accessible web server was constructed.
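The abstract names one-hot encoding as the input representation and one-dimensional convolution followed by max pooling. A minimal plain-Python sketch of one such step is shown below; the single hypothetical filter, stride 1, no padding, and pool size 2 are assumptions for illustration, not iProm-Sigma54's hyperparameters. It shows how a sequence becomes the input matrix and how a filter scores occurrences of a short motif:

```python
from typing import List

NUCLEOTIDES = "ACGT"
Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode DNA as a (length x 4) matrix in A, C, G, T column order; other symbols give all-zero rows."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows: Matrix = []
    for base in seq.upper():
        row = [0.0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1.0
        rows.append(row)
    return rows


def conv1d(x: Matrix, kernel: Matrix) -> List[float]:
    """One filter, stride 1, no padding; computed as cross-correlation, as CNN libraries do."""
    width, channels = len(kernel), len(kernel[0])
    return [
        sum(x[i + k][c] * kernel[k][c] for k in range(width) for c in range(channels))
        for i in range(len(x) - width + 1)
    ]


def max_pool(values: List[float], size: int = 2) -> List[float]:
    """Non-overlapping max pooling; a shorter final window is kept."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


# Hypothetical width-2 filter that responds to the dinucleotide "TA".
ta_filter = [[0.0, 0.0, 0.0, 1.0],   # position 0 weights T
             [1.0, 0.0, 0.0, 0.0]]   # position 1 weights A
scores = conv1d(one_hot("GTATAC"), ta_filter)
print(scores)            # [0.0, 2.0, 0.0, 2.0, 0.0]
print(max_pool(scores))  # [2.0, 2.0, 0.0]
```

In a trained network the filter weights are learned, many filters run in parallel, and a nonlinearity usually follows each convolution; the sketch keeps a single hand-set filter so the motif response is easy to read.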
Affiliation(s)
- Muhammad Shujaat
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Hoonjoo Kim
- School of Pharmacy, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Corresponding author
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Corresponding author
6
Shujaat M, Jin JS, Tayara H, Chong KT. iProm-phage: A two-layer model to identify phage promoters and their types using a convolutional neural network. Front Microbiol 2022; 13:1061122. [PMID: 36406389 PMCID: PMC9672459 DOI: 10.3389/fmicb.2022.1061122] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/04/2022] [Accepted: 10/18/2022] [Indexed: 04/26/2024]
Abstract
The increased interest in phages as antibacterial agents has led to a rise in the number of sequenced phage genomes, necessitating user-friendly bioinformatics tools for genome annotation. Promoters are DNA sequences that are important in the annotation of phage genomes. In this study, we propose a two-layer model called "iProm-phage" for the prediction and classification of phage promoters. The first layer of the model identifies a query sequence as a promoter or non-promoter; if the query sequence is predicted to be a promoter, the second layer classifies it as a phage or host promoter. Furthermore, rather than using non-coding regions of the genome as the negative set, we created a more challenging negative dataset using promoter sequences. This approach improves discrimination while reducing the rate of false-positive predictions. We investigated 10 distinct feature encoding approaches and used them with several machine learning algorithms and a 1-D convolutional neural network model. We found that the combination of one-hot encoding and the CNN model performed best across the performance metrics. The 5-fold cross-validation results indicate that the proposed predictor has high potential. Furthermore, to make it easier for experimental scientists to obtain the results they require, we set up a freely accessible and user-friendly web server at http://nsclbio.jbnu.ac.kr/tools/iProm-phage/.
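The two-layer decision described above can be written as a small cascade. In this sketch the two classifiers are hypothetical callables that return a probability (toy rules here, standing in for trained CNNs), and the 0.5 threshold is an assumption rather than a value from the paper:

```python
from typing import Callable

# A classifier maps a sequence to the probability of its positive class.
Classifier = Callable[[str], float]


def cascade_predict(seq: str, is_promoter: Classifier, is_phage: Classifier,
                    threshold: float = 0.5) -> str:
    """Two-layer decision: promoter vs non-promoter, then phage vs host promoter."""
    if is_promoter(seq) < threshold:
        return "non-promoter"
    # Only sequences accepted by the first layer are passed to the second layer.
    return "phage promoter" if is_phage(seq) >= threshold else "host promoter"


# Hypothetical toy classifiers standing in for the two trained models.
def toy_promoter(seq: str) -> float:
    return 0.9 if "TATA" in seq else 0.1


def toy_phage(seq: str) -> float:
    return 0.8 if seq.startswith("GG") else 0.2


for s in ["ACGTACGT", "GGTATAAT", "CCTATAAT"]:
    print(s, cascade_predict(s, toy_promoter, toy_phage))
```

A consequence of the cascade design is that first-layer errors propagate: a promoter rejected by layer one is never classified by layer two.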
Affiliation(s)
- Muhammad Shujaat
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
- Joe Sung Jin
- Graduate School of Integrated Energy AI, Jeonbuk National University, Jeonju, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, South Korea
7
ProB-Site: Protein Binding Site Prediction Using Local Features. Cells 2022; 11:cells11132117. [PMID: 35805201 PMCID: PMC9266162 DOI: 10.3390/cells11132117] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/19/2022] [Revised: 06/30/2022] [Accepted: 07/01/2022] [Indexed: 01/16/2023]
Abstract
Protein–protein interactions (PPIs) are responsible for various essential biological processes, and information about them can help in developing new drugs against diseases. Various experimental methods have been employed for this purpose; however, their application is limited by their cost and time consumption. Alternatively, computational methods are considered a viable means of achieving this crucial task. Various techniques, including machine learning and deep learning, have been explored in the literature using the sequential information of amino acids in a protein sequence. However, there is still room to improve the efficiency of interaction-site prediction. Hence, a deep neural network-based model, ProB-site, is proposed. ProB-site uses the sequential information of a protein to predict its binding sites. The proposed model uses evolutionary information and predicted structural information extracted from the protein sequence, generating three unique feature sets for every amino acid. These feature sets are then fed into their respective sub-CNN architectures to acquire complex features. Finally, the acquired features are concatenated and classified using fully connected layers. This methodology performed better than state-of-the-art techniques because of the selection of the best features and the consideration of local information for each amino acid.
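The abstract describes per-residue feature sets that are fed to separate sub-CNNs whose outputs are concatenated, with emphasis on each residue's local context. The sketch below illustrates only that data flow, under stated assumptions: a fixed window of hypothetical half-width around each residue with zero padding at the termini, a simple averaging function standing in for each trained sub-CNN, and two made-up feature sets instead of ProB-site's three.

```python
from typing import List

Vector = List[float]


def residue_window(features: List[Vector], center: int, half_width: int) -> List[Vector]:
    """Feature vectors for residues center-half_width..center+half_width, zero-padded past the ends."""
    dim = len(features[0])
    window: List[Vector] = []
    for pos in range(center - half_width, center + half_width + 1):
        window.append(list(features[pos]) if 0 <= pos < len(features) else [0.0] * dim)
    return window


def mean_branch(window: List[Vector]) -> Vector:
    """Stand-in for a sub-CNN branch: average each feature over the window."""
    return [sum(column) / len(window) for column in zip(*window)]


def residue_representation(feature_sets: List[List[Vector]], center: int, half_width: int) -> Vector:
    """Run one branch per feature set on the local window, then concatenate the outputs."""
    outputs = [mean_branch(residue_window(fs, center, half_width)) for fs in feature_sets]
    return [value for out in outputs for value in out]


# Hypothetical per-residue features for a 3-residue protein (two feature sets shown).
evolutionary = [[1.0], [2.0], [3.0]]
structural = [[0.0, 3.0], [3.0, 0.0], [3.0, 3.0]]
print(residue_representation([evolutionary, structural], center=0, half_width=1))  # [1.0, 1.0, 1.0]
```

The concatenated vector is what a fully connected classifier would then receive for each residue; in ProB-site each branch is a learned sub-CNN rather than an average.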