1. Lei R, Jia J, Qin L, Wei X. iPro2L-DG: Hybrid network based on improved densenet and global attention mechanism for identifying promoter sequences. Heliyon 2024; 10:e27364. [PMID: 38510021 PMCID: PMC10950492 DOI: 10.1016/j.heliyon.2024.e27364] [Received: 10/26/2023] [Revised: 02/24/2024] [Accepted: 02/28/2024] [Indexed: 03/22/2024]
Abstract
The promoter is a key DNA sequence whose primary function is to control when gene transcription starts and how strongly the gene is expressed. Accurate identification of promoters is essential for gene expression studies. Traditional sequencing techniques for identifying promoters are costly and time-consuming, so developing computational methods to identify promoters has become critical. Because deep learning methods show great potential for promoter identification, this study proposes a new promoter prediction model, iPro2L-DG, built on an improved Densely Connected Convolutional Network (DenseNet) and a Global Attention Mechanism (GAM). Each promoter sequence is represented by a combined feature encoding that joins C2 encoding with nucleotide chemical property (NCP) encoding. The improved DenseNet extracts high-level features from this combined encoding. GAM then weights the importance of those features along the channel and spatial dimensions. Finally, a fully connected neural network (FNN) outputs the prediction probabilities. In the first layer (promoter identification), iPro2L-DG achieved an accuracy of 94.10% and a Matthews correlation coefficient (MCC) of 0.8833. In the second layer (promoter strength prediction), it achieved an accuracy of 89.42% and an MCC of 0.7915. The iPro2L-DG predictor significantly outperforms other existing predictors in both promoter identification and promoter strength prediction. Our proposed model, iPro2L-DG, is therefore the most advanced promoter prediction tool. The source code of the iPro2L-DG model is available at https://github.com/leirufeng/iPro2L-DG.
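The abstract names NCP encoding and reports accuracy and MCC, but it gives no formulas for any of them. The following minimal Python sketch assumes the NCP mapping that is widely used in the promoter-prediction literature. That mapping encodes ring structure (purine = 1), functional group (amino = 1) and hydrogen bonding (weak = 1). The sketch also applies the standard accuracy and MCC definitions. It does not reproduce C2 encoding, the DenseNet/GAM network, or the feature layout of the released code. The helper names `ncp_encode` and `accuracy_and_mcc` are hypothetical.

```python
import math

# Commonly used NCP mapping (an assumption; not checked against the iPro2L-DG code).
# Each base -> (ring structure, functional group, hydrogen bonding).
NCP = {
    "A": (1, 1, 1),  # purine, amino, weak (2 H-bonds)
    "C": (0, 1, 0),  # pyrimidine, amino, strong (3 H-bonds)
    "G": (1, 0, 0),  # purine, keto, strong
    "T": (0, 0, 1),  # pyrimidine, keto, weak
}


def ncp_encode(seq):
    """Return an L x 3 list of NCP vectors (hypothetical helper).

    Raises KeyError for characters other than A, C, G, T (case-insensitive).
    """
    return [list(NCP[base]) for base in seq.upper()]


def accuracy_and_mcc(tp, tn, fp, fn):
    """Accuracy and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + tn + fp + fn
    acc = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


if __name__ == "__main__":
    print(ncp_encode("ACGT"))  # [[1, 1, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
    print(accuracy_and_mcc(tp=45, tn=45, fp=5, fn=5))  # (0.9, 0.8)
```

MCC uses all four confusion-matrix cells, which is why it is often reported next to accuracy for binary promoter tasks.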
Affiliation(s)
- Rufeng Lei
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Jianhua Jia
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Lulu Qin
- School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen, 333403, China
- Xin Wei
- Business School, Jiangxi Institute of Fashion Technology, Nanchang, 330044, China

2. Hassan SS, Bhattacharya T, Nawn D, Jha I, Basu P, Redwan EM, Lundstrom K, Barh D, Andrade BS, Tambuwala MM, Aljabali AA, Hromić-Jahjefendić A, Baetas-da-Cruz W, Serrano-Aroca Á, Uversky VN. SARS-CoV-2 NSP14 governs mutational instability and assists in making new SARS-CoV-2 variants. Comput Biol Med 2024; 170:107899. [PMID: 38232455 DOI: 10.1016/j.compbiomed.2023.107899] [Received: 09/26/2023] [Revised: 12/03/2023] [Accepted: 12/23/2023] [Indexed: 01/19/2024]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the rapidly evolving RNA virus behind the COVID-19 pandemic, has spawned numerous variants since its emergence in 2019. The multifunctional nonstructural protein 14 (NSP14) enzyme, which has both exonuclease and messenger RNA (mRNA) capping activities, plays a key role. Notably, single and co-occurring mutations within NSP14 significantly influence replication fidelity and drive variant diversification. This study comprehensively examines 120 co-mutations, 68 unique mutations, and 160 conserved residues across NSP14 homologs, shedding light on their implications for phylogenetic patterns, pathogenicity, and residue interactions. Quantitative physicochemical analysis categorizes 3953 NSP14 variants into three clusters, revealing genetic diversity. This research underscores the dynamic nature of SARS-CoV-2 evolution, primarily governed by NSP14 mutations. Understanding these genetic dynamics provides valuable insights for therapeutic and vaccine development.
Affiliation(s)
- Sk Sarif Hassan
- Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur, 721140, West Bengal, India.
- Tanishta Bhattacharya
- Department of Biological Sciences, Indian Institute of Science Education and Research, Berhampur, IISER Berhampur Transit campus (Govt. ITI Building), Engg. School Junction, Berhampur, 760010, Odisha, India.
- Debaleena Nawn
- Indian Research Institute for Integrated Medicine (IRIIM), Unsani, Howrah, 711302, West Bengal, India.
- Ishana Jha
- Department of Bioinformatics, Pondicherry University, Chinna Kalapet, Kalapet, Puducherry 605014, India.
- Pallab Basu
- School of Physics, University of the Witwatersrand, Johannesburg, Braamfontein 2000, 721140, South Africa; Adjunct Faculty, Woxsen School of Sciences, Woxsen University, Telangana, 500 033, India.
- Elrashdy M Redwan
- Biological Science Department, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia; Therapeutic and Protective Proteins Laboratory, Protein Research Department, Genetic Engineering and Biotechnology Research Institute, City of Scientific Research and Technological Applications, New Borg EL-Arab, 21934, Alexandria, Egypt.
- Debmalya Barh
- Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, 721172, India; Department of Genetics, Ecology and Evolution, Institute of Biological Sciences, Federal University of Minas Gerais, Belo Horizonte, 31270-901, Brazil.
- Bruno Silva Andrade
- Laboratory of Bioinformatics and Computational Chemistry, Department of Biological Sciences, State University of Southwest of Bahia (UESB), Jequié 45083-900, Brazil.
- Murtaza M Tambuwala
- Lincoln Medical School, University of Lincoln, Brayford Pool Campus, Lincoln LN6 7TS, UK; College of Pharmacy, Ras Al Khaimah Medical and Health Sciences University, Ras Al Khaimah, United Arab Emirates.
- Alaa A Aljabali
- Department of Pharmaceutics and Pharmaceutical Technology, Faculty of Pharmacy, Yarmouk University, Irbid 21163, Jordan.
- Altijana Hromić-Jahjefendić
- Department of Genetics and Bioengineering, Faculty of Engineering and Natural Sciences, International University of Sarajevo, Hrasnicka cesta 15, 71000 Sarajevo, Bosnia and Herzegovina.
- Wagner Baetas-da-Cruz
- Centre for Experimental Surgery, Translational Laboratory in Molecular Physiology, College of Medicine, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil.
- Ángel Serrano-Aroca
- Biomaterials and Bioengineering Lab, Centro de Investigación Traslacional San Alberto Magno, Universidad Católica de Valencia San Vicente Mártir, c/Guillem de Castro 94, 46001 Valencia, Spain.
- Vladimir N Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA.

3. Zhang J, Wang R, Wei L. MucLiPred: Multi-Level Contrastive Learning for Predicting Nucleic Acid Binding Residues of Proteins. J Chem Inf Model 2024; 64:1050-1065. [PMID: 38301174 DOI: 10.1021/acs.jcim.3c01471] [Indexed: 02/03/2024]
Abstract
Protein-molecule interactions play a crucial role in various biological functions, with their accurate prediction being pivotal for drug discovery and design processes. Traditional methods for predicting protein-molecule interactions are limited. Some can only predict interactions with a specific molecule, restricting their applicability, while others target multiple molecule types but fail to efficiently process diverse interaction information, leading to complexity and inefficiency. This study presents a novel deep learning model, MucLiPred, equipped with a dual contrastive learning mechanism aimed at improving the prediction of multiple molecule-protein interactions and the identification of potential molecule-binding residues. The residue-level paradigm focuses on differentiating binding from non-binding residues, illuminating detailed local interactions. The type-level paradigm, meanwhile, analyzes overarching contexts of molecule types, like DNA or RNA, ensuring that representations of identical molecule types gravitate closer in the representational space, bolstering the model's proficiency in discerning interaction motifs. This dual approach enables comprehensive multi-molecule predictions, elucidating the relationships among different molecule types and strengthening precise protein-molecule interaction predictions. Empirical evidence demonstrates MucLiPred's superiority over existing models in robustness and prediction accuracy. The integration of dual contrastive learning techniques amplifies its capability to detect potential molecule-binding residues with precision. Further optimization, separating representational and classification tasks, has markedly improved its performance. MucLiPred thus represents a significant advancement in protein-molecule interaction prediction, setting a new precedent for future research in this field.
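The abstract does not give MucLiPred's loss function. The following Python sketch is an illustration only. It swaps in a generic supervised contrastive (SupCon-style) loss to show how one objective can work at two levels. At the residue level, binding and non-binding labels define the positives. At the type level, DNA and RNA labels do. The toy embeddings, the temperature of 0.1, and the names `supervised_contrastive_loss` and `_l2_normalize` are assumptions, not the authors' implementation.

```python
import math


def _l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic SupCon-style loss (illustrative; not MucLiPred's exact objective).

    For each anchor, samples with the same label are positives. All other
    samples appear in the softmax denominator, so minimizing the loss pulls
    same-label embeddings together and pushes different-label ones apart.
    """
    z = [_l2_normalize(e) for e in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without any positive are skipped
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over the denominator.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
    residue_labels = [1, 1, 0, 0]               # binding vs non-binding residues
    type_labels = ["DNA", "DNA", "RNA", "RNA"]  # molecule type of the partner
    aligned = supervised_contrastive_loss(emb, residue_labels)
    shuffled = supervised_contrastive_loss(emb, [1, 0, 1, 0])
    print(aligned < shuffled)  # True: labels that match the geometry give lower loss
    print(supervised_contrastive_loss(emb, type_labels))
```

In a trained model the embeddings would come from the encoder, and each label set would drive its own contrastive term. The abstract does not say how MucLiPred weights and combines the residue-level and type-level terms.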
Affiliation(s)
- Jiashuo Zhang
- School of Software, Shandong University, Jinan 250101, China
- Ruheng Wang
- School of Software, Shandong University, Jinan 250101, China
- Leyi Wei
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China

4. Liu Y, Tian B. Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning. Brief Bioinform 2023; 25:bbad488. [PMID: 38171929 PMCID: PMC10782905 DOI: 10.1093/bib/bbad488] [Received: 06/30/2023] [Revised: 09/28/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
Protein-DNA interaction is critical for life activities such as replication, transcription and splicing. Identifying protein-DNA binding residues is essential for modeling their interaction and downstream studies. However, developing accurate and efficient computational methods for this task remains challenging. Improvements in this area have the potential to drive novel applications in biotechnology and drug design. In this study, we propose a novel approach called Contrastive Learning And Pre-trained Encoder (CLAPE), which combines a pre-trained protein language model and the contrastive learning method to predict DNA binding residues. We trained the CLAPE-DB model on the protein-DNA binding sites dataset and evaluated the model performance and generalization ability through various experiments. The results showed that the area under ROC curve values of the CLAPE-DB model on the two benchmark datasets reached 0.871 and 0.881, respectively, indicating superior performance compared to other existing models. CLAPE-DB showed better generalization ability and was specific to DNA-binding sites. In addition, we trained CLAPE on different protein-ligand binding sites datasets, demonstrating that CLAPE is a general framework for binding sites prediction. To facilitate the scientific community, the benchmark datasets and codes are freely available at https://github.com/YAndrewL/clape.
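CLAPE-DB is evaluated by the area under the ROC curve: 0.871 and 0.881 on the two benchmark datasets. The following minimal Python sketch shows what that number measures. AUC equals the probability that a randomly chosen binding residue gets a higher predicted score than a randomly chosen non-binding residue, with ties counted as one half. The per-residue labels and scores are toy values, and `roc_auc` is a hypothetical helper, not part of the CLAPE code.

```python
def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) formulation.

    labels: 1 for binding residues, 0 for non-binding residues.
    scores: predicted binding probabilities, one per residue.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative residue")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
    print(round(roc_auc(labels, scores), 3))  # 0.889 (8 of 9 pairs ordered correctly)
```

The pairwise loop is O(P·N) and is written for clarity. Library routines such as scikit-learn's `roc_auc_score` compute the same value more efficiently on large residue sets.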
Affiliation(s)
- Yufan Liu
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
- Boxue Tian
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China