1
Moreno-Vargas LM, Prada-Gracia D. Exploring the Chemical Features and Biomedical Relevance of Cell-Penetrating Peptides. Int J Mol Sci 2024; 26:59. [PMID: 39795918 PMCID: PMC11720145 DOI: 10.3390/ijms26010059] [Received: 10/23/2024] [Revised: 11/27/2024] [Accepted: 11/28/2024] [Indexed: 01/13/2025] Open
Abstract
Cell-penetrating peptides (CPPs) are a diverse group of peptides, typically composed of 4 to 40 amino acids, known for their unique ability to transport a wide range of substances (such as small molecules, plasmid DNA, small interfering RNA, proteins, viruses, and nanoparticles) across cellular membranes while preserving the integrity of the cargo. CPPs exhibit passive and non-selective behavior, often requiring functionalization or chemical modification to enhance their specificity and efficacy. The precise mechanisms governing the cellular uptake of CPPs remain ambiguous; however, electrostatic interactions between positively charged amino acids and negatively charged glycosaminoglycans on the membrane, particularly heparan sulfate proteoglycans, are considered the initial crucial step for CPP uptake. Clinical trials have highlighted the potential of CPPs in diagnosing and treating various diseases, including cancer, central nervous system disorders, eye disorders, and diabetes. This review provides a comprehensive overview of CPP classifications, potential applications, transduction mechanisms, and the most relevant algorithms to improve the accuracy and reliability of predictions in CPP development.
2
Niu S, Fan H, Wang F, Yang X, Xia J. Identification of Multi-functional Therapeutic Peptides Based on Prototypical Supervised Contrastive Learning. Interdiscip Sci 2024:10.1007/s12539-024-00674-3. [PMID: 39714581 DOI: 10.1007/s12539-024-00674-3] [Received: 06/16/2024] [Revised: 10/31/2024] [Accepted: 11/04/2024] [Indexed: 12/24/2024]
Abstract
High-throughput sequencing has exponentially increased peptide sequences, necessitating a computational method to identify multi-functional therapeutic peptides (MFTP) from their sequences. However, existing computational methods are challenged by class imbalance, particularly in learning effective sequence representations. To address this, we propose PSCFA, a prototypical supervised contrastive learning with a feature augmentation method for MFTP prediction. We employ a two-stage training scheme to train the feature extractor and the classifier respectively, underpinned by the principle that better feature representation boosts classification accuracy. In the first stage, we utilize a prototypical supervised contrastive learning strategy to enhance the uniformity of feature space distribution, ensuring that the characteristics of samples within the same category are tightly clustered while those from different categories are more dispersed. In the second stage, a feature augmentation strategy that focuses on infrequent labels (tail labels) is used to refine the learning process of the classifier. We use a prototype-based variational autoencoder to capture semantic links among common labels (head labels) and their prototypes. This knowledge is then transferred to tail labels, generating enhanced features for classifier training. The experiments prove that the PSCFA method significantly outperforms existing methods for MFTP prediction, making a significant advancement in therapeutic peptide identification.
Affiliation(s)
- Sitong Niu
- College of Mathematics and System sciences, Xinjiang University, Urumqi, 830046, Xinjiang, China
- Henghui Fan
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Fei Wang
- School of Artificial Intelligence, Anhui University, Hefei, 230601, Anhui, China
- Xiaomei Yang
- College of Mathematics and System sciences, Xinjiang University, Urumqi, 830046, Xinjiang, China.
- Junfeng Xia
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.
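The first training stage in the PSCFA abstract above pulls samples toward their class prototype and away from other prototypes. The following is a minimal single-label sketch of that idea, not the authors' implementation: the function names, the use of class means as prototypes, the cosine similarity, and the temperature value are all assumptions made for illustration.

```python
import math

def class_prototypes(embeddings, labels):
    """Assumed prototype definition: the mean embedding of each class."""
    sums, counts = {}, {}
    for z, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(z))
        for i, v in enumerate(z):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def prototype_contrastive_loss(z, label, prototypes, temperature=0.1):
    """Negative log-softmax of the similarity to the sample's own prototype.

    The loss is small when z is close to its own prototype and far from
    the prototypes of the other classes.
    """
    logits = {c: cosine(z, p) / temperature for c, p in prototypes.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return log_denom - logits[label]

# Toy two-dimensional embeddings with hypothetical class names "a" and "b".
protos = class_prototypes([[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]], ["a", "a", "b"])
loss_own = prototype_contrastive_loss([0.9, 0.1], "a", protos)
loss_other = prototype_contrastive_loss([0.9, 0.1], "b", protos)
```

In this toy case `loss_own` is much smaller than `loss_other`, which is the behavior a prototype-based contrastive objective is meant to reward.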
3
Akbar S, Ullah M, Raza A, Zou Q, Alghamdi W. DeepAIPs-Pred: Predicting Anti-Inflammatory Peptides Using Local Evolutionary Transformation Images and Structural Embedding-Based Optimal Descriptors with Self-Normalized BiTCNs. J Chem Inf Model 2024; 64:9609-9625. [PMID: 39625463 DOI: 10.1021/acs.jcim.4c01758] [Indexed: 12/24/2024]
Abstract
Inflammation is a biological response to harmful stimuli, playing a crucial role in facilitating tissue repair by eradicating pathogenic microorganisms. However, when inflammation becomes chronic, it leads to numerous serious disorders, particularly in autoimmune diseases. Anti-inflammatory peptides (AIPs) have emerged as promising therapeutic agents due to their high specificity, potency, and low toxicity. However, identifying AIPs using traditional in vivo methods is time-consuming and expensive. Recent advancements in computational-based intelligent models for peptides have offered a cost-effective alternative for identifying various inflammatory diseases, owing to their selectivity toward targeted cells with low side effects. In this paper, we propose a novel computational model, namely, DeepAIPs-Pred, for the accurate prediction of AIP sequences. The training samples are represented using LBP-PSSM- and LBP-SMR-based evolutionary image transformation methods. Additionally, to capture contextual semantic features, we employed attention-based ProtBERT-BFD embedding and QLC for structural features. Furthermore, differential evolution (DE)-based weighted feature integration is utilized to produce a multiview feature vector. The SMOTE-Tomek Links are introduced to address the class imbalance problem, and a two-layer feature selection technique is proposed to reduce and select the optimal features. Finally, the novel self-normalized bidirectional temporal convolutional networks (SnBiTCN) are trained using optimal features, achieving a significant predictive accuracy of 94.92% and an AUC of 0.97. The generalization of our proposed model is validated using two independent datasets, demonstrating higher performance with the improvement of ∼2 and ∼10% of accuracies than the existing state-of-the-art model using Ind-I and Ind-II, respectively. 
The efficacy and reliability of DeepAIPs-Pred highlight its potential as a valuable and promising tool for drug development and research academia.
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KP 23200, Pakistan
- Matee Ullah
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Ali Raza
- Department of Computer Science, MY University, Islamabad 45750, Pakistan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
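The DeepAIPs-Pred abstract above mentions SMOTE-Tomek Links for class imbalance. The two ingredients can be sketched in plain Python as follows. This is a simplified illustration, not the paper's pipeline: it uses a single nearest neighbor where standard SMOTE samples among k neighbors, and all function names are hypothetical.

```python
import math
import random

def nearest(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))

def smote(minority, n_new, seed=0):
    """Create synthetic minority samples by interpolating toward a neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = nearest(i, minority)
        lam = rng.random()  # position along the segment between the two points
        synthetic.append([a + lam * (b - a)
                          for a, b in zip(minority[i], minority[j])])
    return synthetic

def tomek_links(points, labels):
    """Pairs of opposite-class points that are each other's nearest neighbor."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links

new_points = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=5)
links = tomek_links([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]], [0, 1, 1, 1])
```

Oversampling adds points inside the minority region, and removing one or both members of each Tomek link then cleans up the class boundary.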
4
Liu X, Luo J, Wang X, Zhang Y, Chen J. Directed evolution of antimicrobial peptides using multi-objective zeroth-order optimization. Brief Bioinform 2024; 26:bbae715. [PMID: 39800873 PMCID: PMC11725395 DOI: 10.1093/bib/bbae715] [Received: 08/01/2024] [Revised: 12/08/2024] [Accepted: 12/27/2024] [Indexed: 01/16/2025] Open
Abstract
Antimicrobial peptides (AMPs) are emerging as promising therapeutic compounds that exhibit broad-spectrum antimicrobial activity with high specificity and good tolerability. Natural AMPs usually need further rational design to improve antimicrobial activity and decrease toxicity to human cells. Although several algorithms have been developed to optimize AMPs with desired properties, they explore variations of AMPs in a discrete amino acid sequence space and usually suffer from low efficiency, a lack of diversity, and local optima. In this work, we propose a novel directed evolution method, named PepZOO, for optimizing multiple properties of AMPs in a continuous representation space guided by multi-objective zeroth-order optimization. PepZOO projects AMPs from a discrete amino acid sequence space into a continuous latent representation space by a variational autoencoder. Subsequently, the latent embeddings of prototype AMPs are taken as starting points and iteratively updated according to the guidance of multi-objective zeroth-order optimization. Experimental results demonstrate that PepZOO outperforms state-of-the-art methods in improving multiple properties in terms of antimicrobial function, activity, toxicity, and binding affinity to the targets. Molecular docking and molecular dynamics simulations are further employed to validate the effectiveness of our method. Moreover, PepZOO can reveal important motifs that are required to maintain a particular property during the evolution by aligning the evolutionary sequences. PepZOO provides a novel research paradigm that optimizes AMPs by exploring property change instead of exploring sequence mutations, accelerating the discovery of potential therapeutic peptides.
Affiliation(s)
- Xianliang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
- Jiawei Luo
- School of Computer Science and Technology, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
- Xinyan Wang
- Core Research Facility, Southern University of Science and Technology, No. 1088 Xueyuan Road, Nanshan District, Shenzhen 518055, Guangdong, China
- Yang Zhang
- School of Science, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
- Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology, HIT Campus, Shenzhen University Town, Nanshan District, Shenzhen 518055, Guangdong, China
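The PepZOO abstract above describes iteratively updating latent embeddings under multi-objective zeroth-order optimization. A rough sketch of a zeroth-order update follows: the gradient is estimated from function values along random directions, and several objectives are combined by a weighted sum. The weighted-sum scalarization, the toy quadratic objectives, and every name and hyperparameter here are assumptions; the variational autoencoder that would encode and decode peptides is omitted.

```python
import random

def zo_gradient(f, z, mu=1e-2, n_dirs=20, rng=None):
    """Estimate the gradient of f at z using only function evaluations."""
    rng = rng or random.Random(0)
    grad = [0.0] * len(z)
    f0 = f(z)
    for _ in range(n_dirs):
        u = [rng.gauss(0.0, 1.0) for _ in z]           # random probe direction
        z_pert = [zi + mu * ui for zi, ui in zip(z, u)]
        slope = (f(z_pert) - f0) / mu                   # directional finite difference
        for i, ui in enumerate(u):
            grad[i] += slope * ui / n_dirs
    return grad

def scalarize(objectives, weights):
    """Combine several loss functions into one by a weighted sum."""
    return lambda z: sum(w * obj(z) for w, obj in zip(weights, objectives))

def optimize(f, z0, steps=200, lr=0.05, seed=0):
    """Plain descent on the estimated gradient, starting from z0."""
    rng = random.Random(seed)
    z = list(z0)
    for _ in range(steps):
        g = zo_gradient(f, z, rng=rng)
        z = [zi - lr * gi for zi, gi in zip(z, g)]
    return z

# Toy stand-ins for property predictors (hypothetical, lower is better).
activity_loss = lambda z: sum((zi - 1.0) ** 2 for zi in z)
toxicity_loss = lambda z: sum((zi + 1.0) ** 2 for zi in z)
loss = scalarize([activity_loss, toxicity_loss], [0.75, 0.25])

z_start = [3.0, -2.0]
z_opt = optimize(loss, z_start)
```

With these weights the combined loss is minimized at 0.5 in each coordinate, and the estimated-gradient descent lands close to that point without ever computing an analytic derivative, which is what makes the approach usable with black-box property predictors.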
5
Kaur D, Arora A, Vigneshwar P, Raghava GPS. Prediction of peptide hormones using an ensemble of machine learning and similarity-based methods. Proteomics 2024; 24:e2400004. [PMID: 38803012 DOI: 10.1002/pmic.202400004] [Received: 01/04/2024] [Revised: 04/29/2024] [Accepted: 05/13/2024] [Indexed: 05/29/2024]
Abstract
Peptide hormones serve as genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. Initially, we developed similarity-based methods utilizing BLAST and MERCI software. Although these similarity-based methods provided a high probability of correct prediction, they had limitations, such as no hits or prediction of limited sequences. To overcome these limitations, we further developed machine and deep learning-based models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent/validation dataset. To harness the power of similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96 with an accuracy of 89.79% and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To facilitate researchers in predicting and designing hormone peptides, we developed a web-based server called HOPPred. This server offers a unique feature that allows the identification of hormone-associated motifs within hormone peptides. The server can be accessed at: https://webs.iiitd.edu.in/raghava/hoppred/.
Affiliation(s)
- Dashleen Kaur
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Akanksha Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Palani Vigneshwar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
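The HOPPred abstract above reports accuracy and a Matthews correlation coefficient (MCC) for an ensemble of similarity-based and machine learning models. The sketch below shows how MCC follows from a confusion matrix, together with one possible way to combine the two predictors. The fallback rule (use the similarity hit when there is one, otherwise threshold the model score) is an assumption, not the published ensemble, and the counts are illustrative rather than taken from the paper.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def ensemble_label(similarity_hit, ml_score, threshold=0.5):
    """Hypothetical combination rule.

    similarity_hit is True or False when a similarity search returned a
    label, and None when there was no hit; in that case the machine
    learning score decides.
    """
    if similarity_hit is not None:
        return similarity_hit
    return ml_score >= threshold

# Illustrative counts for a balanced test set of 200 peptides.
example_mcc = mcc(tp=90, tn=90, fp=10, fn=10)
example_acc = accuracy(tp=90, tn=90, fp=10, fn=10)
```

The fallback design addresses the limitation named in the abstract: similarity searches are precise when they return a hit but give no answer for many sequences, so a learned model is needed to cover the remainder.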
6
Zhang W, Ding Y, Wei L, Guo X, Ni F. Therapeutic peptides identification via kernel risk sensitive loss-based k-nearest neighbor model and multi-Laplacian regularization. Brief Bioinform 2024; 25:bbae534. [PMID: 39438076 PMCID: PMC11495874 DOI: 10.1093/bib/bbae534] [Received: 07/14/2024] [Revised: 08/30/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024] Open
Abstract
Therapeutic peptides are therapeutic agents synthesized from natural amino acids, which can be used as carriers for precisely transporting drugs and can activate the immune system for preventing and treating various diseases. However, screening therapeutic peptides using biochemical assays is expensive, time-consuming, and limited by experimental conditions and biological samples, and there may be ethical considerations in the clinical stage. In contrast, screening therapeutic peptides using machine learning and computational methods is efficient, automated, and can accurately predict potential therapeutic peptides. In this study, a k-nearest neighbor model based on multi-Laplacian and kernel risk sensitive loss was proposed, which introduces a kernel risk loss function derived from the K-local hyperplane distance nearest neighbor model as well as combining the Laplacian regularization method to predict therapeutic peptides. The findings indicated that the suggested approach achieved satisfactory results and could effectively predict therapeutic peptide sequences.
Affiliation(s)
- Wenyu Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006 Xiyuan Avenue, High tech Zone, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Leyi Wei
- Macao Polytechnic University, Gomes Street, Macau Peninsula, Macau 999078, China
- Xiaoyi Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Fengming Ni
- Department of Gastroenterology, The First Hospital of Jilin University, No. 71 Xinmin Street, Chaoyang District, Changchun 130021, China
7
Isaac KS, Combe M, Potter G, Sokolenko S. Machine learning tools for peptide bioactivity evaluation - Implications for cell culture media optimization and the broader cultivated meat industry. Curr Res Food Sci 2024; 9:100842. [PMID: 39435450 PMCID: PMC11491887 DOI: 10.1016/j.crfs.2024.100842] [Received: 05/31/2024] [Accepted: 09/07/2024] [Indexed: 10/23/2024] Open
Abstract
Although bioactive peptides have traditionally been studied for their health-promoting qualities in the context of nutrition and medicine, the past twenty years have seen a steady increase in their application to cell culture media optimization. Complex natural sources of bioactive peptides, such as hydrolysates, offer a sustainable and cost-effective means of promoting cellular growth, making them an essential component of scaling-up cultivated meat production. However, the sheer diversity of hydrolysates makes product selection difficult, highlighting the need for functional characterization. Traditional wet-lab techniques for isolating and estimating peptide bioactivity cannot keep pace with peptide identification using high-throughput tools such as mass spectrometry, requiring the development and use of machine learning-based classifiers. This review provides a comprehensive list of available software tools to evaluate peptide bioactivity, classified and compared based on the algorithm, training set, functionality, and limitations of the underlying models. We curated independent test sets to compare the predictive performance of different models based on specific bioactivity classification relevant to promoting cell culture growth: antioxidant and anti-inflammatory. A comprehensive screening of all bioactivity classifiers revealed that while there are approximately fifty tools to elucidate antimicrobial activity and sixteen that predict anti-inflammatory activity, fewer tools are available for other functionalities related to cell growth - five that predict antioxidant activity and two for growth factor and/or cell signaling prediction. A thorough evaluation of the available tools revealed significant issues with sensitivity, specificity, and overall accuracy. 
Despite the overall interest in estimating peptide bioactivity, our work highlights key gaps in the broader adoption of existing software for the specific application of cell culture media optimization in the context of cultivated meat and beyond.
Affiliation(s)
- Kathy Sharon Isaac
- Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
- Michelle Combe
- Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
- Stanislav Sokolenko
- Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
8
Ge R, Xia Y, Jiang M, Jia G, Jing X, Li Y, Cai Y. HybAVPnet: A Novel Hybrid Network Architecture for Antiviral Peptides Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1358-1365. [PMID: 38587961 DOI: 10.1109/tcbb.2024.3385635] [Indexed: 04/10/2024]
Abstract
Viruses pose a great threat to human production and life, thus the research and development of antiviral drugs is urgently needed. Antiviral peptides play an important role in drug design and development. Compared with the time-consuming and laborious wet chemical experiment methods, it is critical to use computational methods to predict antiviral peptides accurately and rapidly. However, due to limited data, accurate prediction of antiviral peptides is still challenging and extracting effective feature representations from sequences is crucial for creating accurate models. This study introduces a novel two-step approach, named HybAVPnet, to predict antiviral peptides with a hybrid network architecture based on neural networks and traditional machine learning methods. We adopted a stacking-like structure to capture both the long-term dependencies and local evolution information to achieve a comprehensive and diverse prediction using the predicted labels and probabilities. Using an ensemble technique with the different kinds of features can reduce the variance without increasing the bias. The experimental result shows HybAVPnet can achieve better and more robust performance compared with the state-of-the-art methods, which makes it useful for the research and development of antiviral drugs. Meanwhile, it can also be extended to other peptide recognition problems because of its generalization ability.
9
Rathore AS, Choudhury S, Arora A, Tijare P, Raghava GPS. ToxinPred 3.0: An improved method for predicting the toxicity of peptides. Comput Biol Med 2024; 179:108926. [PMID: 39038391 DOI: 10.1016/j.compbiomed.2024.108926] [Received: 08/24/2023] [Revised: 05/17/2024] [Accepted: 07/17/2024] [Indexed: 07/24/2024]
Abstract
Toxicity emerges as a prominent challenge in the design of therapeutic peptides, causing the failure of numerous peptides during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred that showcases improved reliability and accuracy in predicting peptide toxicity. Initially, we utilized a similarity/alignment-based approach employing BLAST to predict toxic peptides, which yielded satisfactory accuracy; however, the method suffered from inadequate coverage. Subsequently, we employed a motif-based approach using MERCI software to uncover specific patterns or motifs that are exclusively observed in toxic peptides. The search for these motifs in peptides allowed us to predict toxic peptides with a high level of specificity with poor sensitivity. To overcome the coverage limitations, we developed alignment-free methods using machine/deep learning techniques to balance sensitivity and specificity of prediction. Deep learning model (ANN - LSTM with fixed sequence length) developed using one-hot encoding achieved a maximum AUROC of 0.93 with MCC of 0.71 on an independent dataset. Machine learning model (extra tree) developed using compositional features of peptides achieved a maximum AUROC of 0.95 with MCC of 0.78. We also developed large language models and achieved maximum AUC of 0.93 using ESM2-t33. Finally, we developed hybrid or ensemble methods combining two or more methods to enhance performance. Our specific hybrid method, which combines a motif-based approach with a machine learning-based model, achieved a maximum AUROC of 0.98 with MCC 0.81 on an independent dataset. In this study, all models were trained and tested on 80 % of data using five-fold cross-validation and evaluated on the remaining 20 % of data called independent dataset. 
The evaluation of all methods on an independent dataset revealed that the method proposed in this study exhibited better performance than existing methods. To cater to the needs of the scientific community, we have developed a standalone software, pip package and web-based server ToxinPred3 (https://github.com/raghavagps/toxinpred3 and https://webs.iiitd.edu.in/raghava/toxinpred3/).
Affiliation(s)
- Anand Singh Rathore
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Akanksha Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Purva Tijare
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
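The ToxinPred 3.0 abstract above mentions two sequence encodings: compositional features for the machine learning models and one-hot encoding with a fixed sequence length for the deep learning model. A minimal sketch of both follows. The function names, the percentage scaling, and the zero-padding and truncation policy are assumptions for illustration and are not taken from the ToxinPred3 package.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def aa_composition(seq):
    """Percentage of each standard amino acid in the peptide (20 values)."""
    seq = seq.upper()
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq:
        if ch in counts:  # non-standard residues are ignored
            counts[ch] += 1
    total = sum(counts.values())
    return [100.0 * counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]

def one_hot(seq, length):
    """Fixed-length one-hot matrix: one row of 20 indicators per position.

    Sequences longer than `length` are truncated; shorter ones are padded
    with all-zero rows.
    """
    seq = seq.upper()[:length]
    rows = []
    for pos in range(length):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(seq) and seq[pos] in AMINO_ACIDS:
            row[AMINO_ACIDS.index(seq[pos])] = 1
        rows.append(row)
    return rows

composition = aa_composition("AAKK")
encoded = one_hot("AK", length=4)
```

Composition discards residue order but gives a vector of the same size for every peptide, while the one-hot matrix keeps order at the cost of needing a fixed length.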
10
Kang Y, Zhang H, Wang X, Yang Y, Jia Q. MMDB: Multimodal dual-branch model for multi-functional bioactive peptide prediction. Anal Biochem 2024; 690:115491. [PMID: 38460901 DOI: 10.1016/j.ab.2024.115491] [Received: 11/12/2023] [Revised: 01/21/2024] [Accepted: 02/19/2024] [Indexed: 03/11/2024]
Abstract
Bioactive peptides can hinder oxidative processes and microbial spoilage in foodstuffs and play important roles in treating diverse diseases and disorders. While most of the methods focus on single-functional bioactive peptides and have obtained promising prediction performance, it is still a significant challenge to accurately detect complex and diverse functions simultaneously with the quick increase of multi-functional bioactive peptides. In contrast to previous research on multi-functional bioactive peptide prediction based solely on sequence, we propose a novel multimodal dual-branch (MMDB) lightweight deep learning model that designs two different branches to effectively capture the complementary information of peptide sequence and structural properties. Specifically, a multi-scale dilated convolution with Bi-LSTM branch is presented to effectively model the different scales sequence properties of peptides while a multi-layer convolution branch is proposed to capture structural information. To the best of our knowledge, this is the first effective extraction of peptide sequence features using multi-scale dilated convolution without parameter increase. Multimodal features from both branches are integrated via a fully connected layer for multi-label classification. Compared to state-of-the-art methods, our MMDB model exhibits competitive results across metrics, with a 9.1% Coverage increase and 5.3% and 3.5% improvements in Precision and Accuracy, respectively.
Affiliation(s)
- Yan Kang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China
- Huadong Zhang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Xinchao Wang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Yun Yang
- National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China.
- Qi Jia
- School of Information Science, Yunnan University, Kunming, 650091, Yunnan, China
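The MMDB abstract above relies on multi-scale dilated convolution to cover different sequence scales without adding parameters. The sketch below illustrates the general mechanism on a one-dimensional signal: the same three-weight kernel is applied at several dilation rates, so the receptive field widens while the number of weights stays fixed. It is a generic illustration with hypothetical names, not the MMDB architecture.

```python
def dilated_conv1d(x, kernel, dilation=1):
    """Valid (unpadded) 1D convolution with gaps of `dilation` between taps."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(w * x[start + j * dilation] for j, w in enumerate(kernel))
        for start in range(len(x) - span)
    ]

def receptive_field(kernel_size, dilation):
    """Number of input positions covered by one output value."""
    return (kernel_size - 1) * dilation + 1

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # three weights, reused at every scale

multi_scale = {d: dilated_conv1d(signal, kernel, dilation=d) for d in (1, 2)}
fields = [receptive_field(len(kernel), d) for d in (1, 2, 4)]
```

Here the kernel spans 3, 5, and 9 input positions at dilation rates 1, 2, and 4, yet it always has three weights, which is the sense in which dilation enlarges context for free.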
11
Chen Y, Zhao YP, Wang S, Chen J, Zhang Z. Partial Tubal Nuclear Norm-Regularized Multiview Subspace Learning. IEEE Transactions on Cybernetics 2024; 54:3777-3790. [PMID: 37058384 DOI: 10.1109/tcyb.2023.3263175] [Indexed: 06/19/2023]
Abstract
In this article, a unified multiview subspace learning model, called partial tubal nuclear norm-regularized multiview subspace learning (PTN2MSL), was proposed for unsupervised multiview subspace clustering (MVSC), semisupervised MVSC, and multiview dimension reduction. Unlike most of the existing methods which treat the above three related tasks independently, PTN2MSL integrates the projection learning and the low-rank tensor representation to promote each other and mine their underlying correlations. Moreover, instead of minimizing the tensor nuclear norm which treats all singular values equally and neglects their differences, PTN2MSL develops the partial tubal nuclear norm (PTNN) as a better alternative solution by minimizing the partial sum of tubal singular values. The PTN2MSL method was applied to the above three multiview subspace learning tasks. It demonstrated that these tasks organically benefited from each other and PTN2MSL has achieved better performance in comparison to state-of-the-art methods.
12
Liu Z, Bai T, Liu B, Yu L. MulStack: An ensemble learning prediction model of multilabel mRNA subcellular localization. Comput Biol Med 2024; 175:108289. [PMID: 38688123 DOI: 10.1016/j.compbiomed.2024.108289] [Received: 01/28/2024] [Revised: 02/28/2024] [Accepted: 03/12/2024] [Indexed: 05/02/2024]
Abstract
Subcellular localization of mRNA is related to protein synthesis, cell polarity, cell movement and other biological regulation mechanisms. The distribution of mRNAs across subcellular compartments is similar to that of proteins, and most mRNAs are distributed in multiple compartments. Recently, some computational methods have been designed to predict the subcellular localization of mRNA. However, these methods only employed a single level of mRNA features and did not employ the position encoding of nucleotides in mRNA. In this paper, an ensemble learning prediction model is proposed, named MulStack, which is based on random forest and deep learning for multilabel mRNA subcellular localization. The proposed method employs two levels of mRNA features, including sequence-level and residue-level features, and position encoding is employed for the first time in the field of subcellular localization of mRNA. Random forest is employed to learn mRNA sequence-level features, while deep learning is employed to learn mRNA sequence-level features and residue-level features combined with position encoding. The outputs of the random forest and deep learning models are combined by a weighted sum to give the prediction probability. Compared with existing methods, the results show that MulStack is the best in the localization of the nucleus, cytosol and exosome. In addition, position weight matrices (PWMs) are extracted by convolutional neural networks (CNNs) that can be matched with known RNA binding protein motifs. Gene ontology (GO) enrichment analysis shows biological processes, molecular functions and cellular components of mRNA genes. The prediction web server of MulStack is freely accessible at http://bliulab.net/MulStack.
Affiliation(s)
- Ziqi Liu
- School of Computer Science and Technology, Xidian University, Xian, 710075, China.
- Tao Bai
- School of Mathematics & Computer Science, Yan'an University, Shaanxi, 716000, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China.
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, 100081, China.
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xian, 710075, China.
| |
|
13
|
Cui Y, Liu H, Ming Y, Zhang Z, Liu L, Liu R. Prediction of strand-specific and cell-type-specific G-quadruplexes based on high-resolution CUT&Tag data. Brief Funct Genomics 2024; 23:265-275. [PMID: 37357985 DOI: 10.1093/bfgp/elad024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2023] [Revised: 05/20/2023] [Accepted: 06/01/2023] [Indexed: 06/27/2023] Open
Abstract
G-quadruplex (G4), a non-classical deoxyribonucleic acid structure, is widely distributed in the genome and involved in various biological processes. In vivo high-throughput sequencing has indicated that G4s are significantly enriched at functional regions in a cell-type-specific manner. Computational prediction of G4s is therefore needed as an alternative to time-consuming and laborious experimental methods. Recently, G4 CUT&Tag has been developed to generate higher-resolution sequencing data than ChIP-seq, which provides more accurate training samples for model construction. In this paper, we present a new dataset construction method based on G4 CUT&Tag sequencing data and a prediction model based on XGBoost, a gradient boosting method. The results show that our model performs well within and across cell types. Furthermore, sequence analysis indicates that the formation of G4 structure is greatly affected by the flanking sequences, and the GC content of the G4 flanking sequences is higher than that of non-G4 sequences. Moreover, we also identified G4 motifs in the high-resolution dataset, among which we found several motifs for known transcription factors (TFs), such as SP2 and BPC. These TFs may directly or indirectly affect the formation of the G4 structure.
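The flanking-sequence GC content discussed in this abstract is a simple quantity to compute. The sketch below is only an illustration: the helper names, the 6-nucleotide flank length and the toy sequence are assumptions, not the windowing used in the paper.

```python
def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


def flanking_regions(sequence: str, start: int, end: int, flank: int = 6) -> tuple[str, str]:
    """Return the upstream and downstream flanks of the half-open interval [start, end)."""
    upstream = sequence[max(0, start - flank):start]
    downstream = sequence[end:end + flank]
    return upstream, downstream


# Toy example: a G-rich core surrounded by two 6-nt flanks.
core = "GGGAGGGTGGGAGGG"
toy_sequence = "ATGCGC" + core + "CCGGAT"
up, down = flanking_regions(toy_sequence, start=6, end=6 + len(core))
up_gc = gc_content(up)
down_gc = gc_content(down)
```

Comparing such values between G4 and non-G4 regions is the kind of analysis the abstract summarizes; real data would use genomic coordinates from the CUT&Tag peaks.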
Affiliation(s)
- Yizhi Cui
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, Zhejiang, China
| | - Hongzhi Liu
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
| | - Yutong Ming
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
| | - Zheng Zhang
- Department of Computer Science and Software Engineering, Auburn University, Auburn, 36830, Alabama, USA
| | - Li Liu
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, Zhejiang, China
| | - Ruijun Liu
- School of Computer Science and Engineering, Beijing Technology and Business University, Beijing, 100048, China
| |
|
14
|
Ullah M, Akbar S, Raza A, Zou Q. DeepAVP-TPPred: identification of antiviral peptides using transformed image-based localized descriptors and binary tree growth algorithm. Bioinformatics 2024; 40:btae305. [PMID: 38710482 PMCID: PMC11256913 DOI: 10.1093/bioinformatics/btae305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 04/08/2024] [Accepted: 05/03/2024] [Indexed: 05/08/2024] Open
Abstract
MOTIVATION Despite the extensive manufacturing of antiviral drugs and vaccination, viral infections continue to be a major human ailment. Antiviral peptides (AVPs) have emerged as potential candidates in the pursuit of novel antiviral drugs. These peptides show vigorous antiviral activity against a diverse range of viruses by targeting different phases of the viral life cycle. Therefore, the accurate prediction of AVPs is an essential yet challenging task. Lately, many machine learning-based approaches have been developed for this purpose; however, their limited capabilities in terms of feature engineering, accuracy, and generalization restrict their usefulness. RESULTS In the present study, we aim to develop an efficient machine learning-based approach for the identification of AVPs, referred to as DeepAVP-TPPred, to address the aforementioned problems. First, we extract two new transformed feature sets using our designed image-based feature extraction algorithms and integrate them with an evolutionary information-based feature. Next, these feature sets were optimized using a novel feature selection approach called the binary tree growth algorithm. Finally, the optimal feature space from the training dataset was fed to the deep neural network to build the final classification model. The proposed model DeepAVP-TPPred was tested using stringent 5-fold cross-validation and two independent dataset testing methods, which achieved the maximum performance and showed enhanced efficiency over existing predictors in terms of both accuracy and generalization capabilities. AVAILABILITY AND IMPLEMENTATION https://github.com/MateeullahKhan/DeepAVP-TPPred.
Affiliation(s)
- Matee Ullah
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
| | - Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
| | - Ali Raza
- Department of Computer Science, MY University, Islamabad 45750, Pakistan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China
| |
|
15
|
Akbar S, Raza A, Zou Q. Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model. BMC Bioinformatics 2024; 25:102. [PMID: 38454333 PMCID: PMC10921744 DOI: 10.1186/s12859-024-05726-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 03/01/2024] [Indexed: 03/09/2024] Open
Abstract
BACKGROUND Viral infections have been the main health issue in the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. However, there has been significant production of antiviral vaccines and medications. Recently, the development of AVPs as an antiviral agent suggests an effective way to treat virus-affected cells. The use of intelligent machine learning techniques for developing peptide-based therapeutic agents is also attracting increasing interest due to its significant outcomes. The existing wet-laboratory-based methods are expensive, time-consuming, and cannot effectively perform in screening and predicting the targeted motif of antiviral peptides. METHODS In this paper, we proposed a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel Tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent the physicochemical properties based on structural features. Apart from these, the fused vector is formed using PSSM-TS features, semantic information, and CTDT descriptors to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set. The selected features are trained using a stacked-ensemble classifier. RESULTS The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 using training samples. In the case of the independent samples, our model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97.
CONCLUSION Our Deepstacked-AVPs model outperformed existing models with a ~4% and ~2% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia.
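Information gain (IG), which this abstract names as its feature selection criterion, can be sketched for discrete features as the reduction in label entropy after splitting on a feature. The code below is a generic textbook version, not the authors' pipeline: it assumes features have already been discretized, and the feature names `charge_bin` and `length_bin` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values: list, labels: list) -> float:
    """Entropy of the labels minus their expected entropy after splitting on the feature."""
    total = len(labels)
    groups: dict = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def top_k_features(features: dict, labels: list, k: int) -> list:
    """Rank feature names by information gain and keep the best k."""
    ranked = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
    return ranked[:k]


# Toy data: 1 = AVP, 0 = non-AVP; feature names and values are made up.
labels = [1, 1, 0, 0]
features = {
    "length_bin": ["hi", "lo", "hi", "lo"],  # uninformative split
    "charge_bin": ["hi", "hi", "lo", "lo"],  # perfectly separates the classes
}
ig_charge = information_gain(features["charge_bin"], labels)
ig_length = information_gain(features["length_bin"], labels)
selected = top_k_features(features, labels, k=1)
```

Continuous descriptors such as PSSM-derived scores would need binning (or a mutual-information estimator) before a criterion like this applies.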
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
| | - Ali Raza
- Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, 25124, KP, Pakistan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China.
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China.
| |
|
16
|
Chang L, Mondal A, Singh B, Martínez-Noa Y, Perez A. Revolutionizing Peptide-Based Drug Discovery: Advances in the Post-AlphaFold Era. Wiley Interdisciplinary Reviews. Computational Molecular Science 2024; 14:e1693. [PMID: 38680429 PMCID: PMC11052547 DOI: 10.1002/wcms.1693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2023] [Accepted: 09/18/2023] [Indexed: 05/01/2024]
Abstract
Peptide-based drugs offer high specificity, potency, and selectivity. However, their inherent flexibility and differences in conformational preferences between their free and bound states create unique challenges that have hindered progress in effective drug discovery pipelines. The emergence of AlphaFold (AF) and Artificial Intelligence (AI) presents new opportunities for enhancing peptide-based drug discovery. We explore recent advancements that facilitate a successful peptide drug discovery pipeline, considering peptides' attractive therapeutic properties and strategies to enhance their stability and bioavailability. AF enables efficient and accurate prediction of peptide-protein structures, addressing a critical requirement in computational drug discovery pipelines. In the post-AF era, we are witnessing rapid progress with the potential to revolutionize peptide-based drug discovery such as the ability to rank peptide binders or classify them as binders/non-binders and the ability to design novel peptide sequences. However, AI-based methods are struggling due to the lack of well-curated datasets, for example to accommodate modified amino acids or unconventional cyclization. Thus, physics-based methods, such as docking or molecular dynamics simulations, continue to hold a complementary role in peptide drug discovery pipelines. Moreover, MD-based tools offer valuable insights into binding mechanisms, as well as the thermodynamic and kinetic properties of complexes. As we navigate this evolving landscape, a synergistic integration of AI and physics-based methods holds the promise of reshaping the landscape of peptide-based drug discovery.
Affiliation(s)
- Liwei Chang
- Department of Chemistry, University of Florida, Gainesville, FL 32611
| | - Arup Mondal
- Department of Chemistry, University of Florida, Gainesville, FL 32611
| | - Bhumika Singh
- Department of Chemistry, University of Florida, Gainesville, FL 32611
| | | | - Alberto Perez
- Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, FL 32611
| |
|
17
|
Chen S, Yan K, Liu B. PDB-BRE: A ligand-protein interaction binding residue extractor based on Protein Data Bank. Proteins 2024; 92:145-153. [PMID: 37750380 DOI: 10.1002/prot.26596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2023] [Revised: 08/13/2023] [Accepted: 09/11/2023] [Indexed: 09/27/2023]
Abstract
Proteins typically exert their biological functions by interacting with other biomolecules or ligands. The study of ligand-protein interactions is crucial in elucidating the biological mechanisms of proteins. Most existing studies have focused on analyzing ligand-protein interactions, and they ignore the additional situations of inserted and modified residues. Besides, the resources often support only a single ligand type and cannot obtain satisfactory results in analyzing novel complexes. Therefore, it is important to develop a general analytical tool to fully extract the binding residues of ligand-protein interactions in complexes. In this study, we propose a ligand-protein interaction binding residue extractor (PDB-BRE), which can be used to automatically extract interacting ligand or protein-binding residues from complex three-dimensional (3D) structures based on the RCSB Protein Data Bank (RCSB PDB). PDB-BRE offers a notable advantage in its comprehensive support for analyzing six distinct types of ligands, including proteins, peptides, DNA, RNA, mixed DNA and RNA entities, and non-polymeric entities. Moreover, it takes into account inserted and modified residues within complexes. Compared to other state-of-the-art methods, PDB-BRE is more suitable for massively parallel batch analysis, and can be directly applied for downstream tasks, such as predicting binding residues of novel complexes. PDB-BRE is freely available at http://bliulab.net/PDB-BRE.
Affiliation(s)
- Shutao Chen
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
| |
|
18
|
Chung CR, Liou JT, Wu LC, Horng JT, Lee TY. Multi-label classification and features investigation of antimicrobial peptides with various functional classes. iScience 2023; 26:108250. [PMID: 38025779 PMCID: PMC10679894 DOI: 10.1016/j.isci.2023.108250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 07/15/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
The challenge of drug-resistant bacteria to global public health has led to increased attention on antimicrobial peptides (AMPs) as a targeted therapeutic alternative with a lower risk of resistance. However, high production costs and limitations in functional class prediction have hindered progress in this field. In this study, we used multi-label classifiers with binary relevance and algorithm adaptation techniques to predict different functions of AMPs across a wide range of pathogen categories, including bacteria, mammalian cells, fungi, viruses, and cancer cells. Our classifiers attained promising AUC scores varying from 0.8492 to 0.9126 on independent testing data. Forward feature selection identified sequence order and charge as critical, with specific amino acids (C and E) as discriminative. These findings provide valuable insights for the design of antimicrobial peptides (AMPs) with multiple functionalities, thus contributing to the broader effort to combat drug-resistant pathogens.
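Binary relevance, one of the multi-label strategies named in this abstract, reduces a multi-label problem to one independent binary problem per class. The sketch below shows only that label transformation; the function name and the toy peptide annotations are assumptions, and the class names are taken from the target categories listed in the abstract.

```python
def binary_relevance_targets(label_sets: list[set[str]], classes: list[str]) -> dict[str, list[int]]:
    """Turn per-sample label sets into one 0/1 target vector per class."""
    return {c: [1 if c in labels else 0 for labels in label_sets] for c in classes}


CLASSES = ["bacteria", "mammalian cells", "fungi", "viruses", "cancer cells"]

# Made-up annotations for three peptides.
peptide_labels = [
    {"bacteria", "fungi"},
    {"viruses"},
    {"bacteria", "cancer cells"},
]
targets = binary_relevance_targets(peptide_labels, CLASSES)
```

Each vector in `targets` would then be used to train a separate binary classifier on the same peptide features, and the per-class predictions are combined into the final label set.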
Affiliation(s)
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
| | - Jhen-Ting Liou
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
| | - Li-Ching Wu
- Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan
| | - Jorng-Tzong Horng
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Department of Bioinformatics and Medical Engineering, Asia University, Taoyuan City, Taiwan
| | - Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan
- Center for Intelligent Drug Systems and Smart Biodevices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu City, Taiwan
| |
|
19
|
Meng C, Yuan Y, Zhao H, Pei Y, Li Z. IIFS: An improved incremental feature selection method for protein sequence processing. Comput Biol Med 2023; 167:107654. [PMID: 37944304 DOI: 10.1016/j.compbiomed.2023.107654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 10/09/2023] [Accepted: 10/31/2023] [Indexed: 11/12/2023]
Abstract
MOTIVATION Discrete features can be obtained from protein sequences using a feature extraction method. These features are the basis of downstream processing of protein data, but because they are generally redundant, it is necessary to screen them and select the important ones. RESULT Here, we report IIFS, an improved incremental feature selection method that exploits a new subset search strategy to find the optimal feature set. IIFS combines nonadjacent sorting features to prevent the drawbacks of data explosion and excessive reliance on feature sorting results. The comparative experimental results on 27 feature sorting data show that IIFS can find more accurate and important features compared to existing methods. The IIFS approach also handles data redundancy more efficiently and finds more representative and discriminatory features while ensuring minimal feature dimensionality and good evaluation metrics. Moreover, we wrap this method and deploy it on a web server for access at http://112.124.26.17:8005/.
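For context, the conventional incremental feature selection (IFS) baseline that IIFS improves on can be sketched as follows: features are taken in ranked order, each prefix of the ranking is scored, and the best-scoring prefix is kept. This is not the IIFS search strategy, which the abstract does not specify in detail; the scoring function and feature names are toy assumptions.

```python
from typing import Callable, Sequence


def incremental_feature_selection(
    ranked_features: Sequence[str],
    score: Callable[[list[str]], float],
) -> tuple[list[str], float]:
    """Evaluate every prefix of a ranked feature list and return the best one."""
    best_subset: list[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        current = score(subset)
        if current > best_score:  # ties keep the smaller subset
            best_subset, best_score = subset, current
    return best_subset, best_score


# Toy scorer standing in for cross-validated accuracy: reward useful
# features, penalise dimensionality. "f2" is redundant by construction.
USEFUL = {"f1", "f3"}


def toy_score(subset: list[str]) -> float:
    return len(USEFUL.intersection(subset)) - 0.1 * len(subset)


best_subset, best_score = incremental_feature_selection(["f1", "f2", "f3", "f4"], toy_score)
```

In this toy run the prefix search has to carry the redundant `f2` along to reach `f3`, which illustrates the reliance on feature sorting results that the abstract says combining nonadjacent features is meant to avoid.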
Affiliation(s)
- Chaolu Meng
- College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot, China; Inner Mongolia Autonomous Region Key Laboratory of Big Data Research and Application of Agriculture and Animal Husbandry, China
| | - Ye Yuan
- Beidahuang Industry Group General Hospital, Harbin, 150001, China
| | - Haiyan Zhao
- College of Integration of Traditional Chinese and Western Medicine to Southwest Medical University, Luzhou, Sichuan, 646000, China
| | - Yue Pei
- Computer Network Information Center, Chinese Academy of Sciences, Beijing, 100190, China
| | - Zhi Li
- Department of Spleen and Stomach Diseases, The Affiliated Traditional Chinese Medicine Hospital of Southwest Medical University, Luzhou, Sichuan, 646000, China.
| |
|
20
|
Guo L, Yu H, Li Y, Zhang C, Kharbach M. Tensor methods in data analysis of chromatography/mass spectroscopy-based plant metabolomics. Plant Methods 2023; 19:130. [PMID: 37990220 PMCID: PMC10662285 DOI: 10.1186/s13007-023-01105-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/06/2023] [Indexed: 11/23/2023]
Abstract
Plant metabolomics is an important research area in plant science. Chemometrics is a useful tool for plant metabolomic data analysis and processing. Among chemometric approaches, high-order chemometrics represented by tensor modeling provides a new and promising technical method for the analysis of complex multi-way plant metabolomics data. This paper systematically reviews different tensor methods widely applied to the analysis of complex plant metabolomic data. The advantages and disadvantages as well as the latest methodological advances of tensor models are reviewed and summarized. Applications of different tensor methods in solving plant science problems are also reviewed and discussed. The reviewed applications of tensor methods in plant metabolomics cover a wide range of important plant science topics including plant gene mutation and phenotype, plant disease and resistance, plant pharmacology and nutrition analysis, and plant product ingredient characterization and quality evaluation. It is evident from the review that tensor methods significantly promote the automated and intelligent process of plant metabolomics analysis and profoundly affect the paradigm of plant science research. To the best of our knowledge, this is the first review to systematically summarize the tensor analysis methods in plant metabolomic data analysis.
Affiliation(s)
- Lili Guo
- Weifang University of Science and Technology, Shouguang, 262700, China
| | - Huiwen Yu
- Shenzhen Hospital, Southern Medical University, Shenzhen, 518005, China.
- Chemometrics Group, Faculty of Science, University of Copenhagen, Frederiksberg, 1958, Denmark.
| | - Yuan Li
- Northwest Land and Resources Research Center, Shaanxi Normal University, Xi'an, 710062, China
| | - Chenxi Zhang
- Weifang University of Science and Technology, Shouguang, 262700, China
| | - Mourad Kharbach
- Department of Food and Nutrition, University of Helsinki, Helsinki, 00014, Finland
- Department of Computer Sciences, University of Helsinki, Helsinki, 00560, Finland
| |
|
21
|
Lv H, Yan K, Liu B. TPpred-LE: therapeutic peptide function prediction based on label embedding. BMC Biol 2023; 21:238. [PMID: 37904157 PMCID: PMC10617231 DOI: 10.1186/s12915-023-01740-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 10/17/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND Therapeutic peptides play an essential role in human physiology, treatment paradigms and bio-pharmacy. Several computational methods have been developed to identify the functions of therapeutic peptides based on binary classification and multi-label classification. However, these methods fail to explicitly exploit the relationship information among different functions, preventing further improvement of the prediction performance. Besides, with the development of peptide detection technology, peptide functions will be more comprehensively discovered. Therefore, it is necessary to explore computational methods for detecting therapeutic peptide functions with limited labeled data. RESULTS In this study, a novel method called TPpred-LE, based on the Transformer framework, was proposed for predicting the multiple functions of therapeutic peptides. It can explicitly extract the function correlation information by using label embedding methodology and exploit the specificity information based on function-specific classifiers. Besides, we incorporated the multi-label classifier retraining approach (MCRT) into TPpred-LE to detect new therapeutic functions with limited labeled data. Experimental results demonstrate that TPpred-LE outperforms the other state-of-the-art methods, and TPpred-LE with MCRT is robust for the limited labeled data. CONCLUSIONS In summary, TPpred-LE is a function-specific classifier for accurate therapeutic peptide function prediction, demonstrating the importance of the relationship information for therapeutic peptide function prediction. MCRT is a simple but effective strategy to detect functions with limited labeled data.
Affiliation(s)
- Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Haidian District, Beijing, 100081, China.
| |
|
22
|
Shao J, Zhang Q, Yan K, Liu B. PreHom-PCLM: protein remote homology detection by combing motifs and protein cubic language model. Brief Bioinform 2023; 24:bbad347. [PMID: 37833837 DOI: 10.1093/bib/bbad347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 08/14/2023] [Accepted: 09/14/2023] [Indexed: 10/15/2023] Open
Abstract
Protein remote homology detection is essential for structure prediction, function prediction, disease mechanism understanding, etc. The remote homology relationship depends on multiple protein properties, such as structural information and local sequence patterns. Previous studies have shown the challenges for predicting remote homology relationship by protein features at sequence level (e.g. position-specific score matrix). Protein motifs have been used in structure and function analysis due to their unique sequence patterns and implied structural information. Therefore, designing a usable architecture to fuse multiple protein properties based on motifs is urgently needed to improve protein remote homology detection performance. To make full use of the characteristics of motifs, we employed the language model called the protein cubic language model (PCLM). It combines multiple properties by constructing a motif-based neural network. Based on the PCLM, we proposed a predictor called PreHom-PCLM by extracting and fusing multiple motif features for protein remote homology detection. PreHom-PCLM outperforms the other state-of-the-art methods on the test set and independent test set. Experimental results further prove the effectiveness of multiple features fused by PreHom-PCLM for remote homology detection. Furthermore, the protein features derived from the PreHom-PCLM show strong discriminative power for proteins from different structural classes in the high-dimensional space. Availability and Implementation: http://bliulab.net/PreHom-PCLM.
Affiliation(s)
- Jiangyi Shao
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Qi Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| |
|
23
|
Tao H, Shan S, Fu H, Zhu C, Liu B. An Augmented Sample Selection Framework for Prediction of Anticancer Peptides. Molecules 2023; 28:6680. [PMID: 37764455 PMCID: PMC10535447 DOI: 10.3390/molecules28186680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 09/14/2023] [Accepted: 09/15/2023] [Indexed: 09/29/2023] Open
Abstract
Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven prediction models rely heavily on extensive training data. Furthermore, the current publicly accessible ACP dataset is limited in size, leading to inadequate model generalization. While data augmentation effectively expands dataset size, existing techniques for augmenting ACP data often generate noisy samples, adversely affecting prediction performance. Therefore, this paper proposes a novel augmented sample selection framework for the prediction of anticancer peptides (ACPs-ASSF). First, the prediction model is trained using raw data. Then, the augmented samples generated using the data augmentation technique are fed into the trained model to compute pseudo-labels and estimate the uncertainty of the model prediction. Finally, samples with low uncertainty, high confidence, and pseudo-labels consistent with the original labels are selected and incorporated into the training set to retrain the model. The evaluation results for the ACP240 and ACP740 datasets show that ACPs-ASSF achieved accuracy improvements of up to 5.41% and 5.68%, respectively, compared to the traditional data augmentation method.
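The selection step described in this abstract (keep augmented samples with low uncertainty, high confidence and a pseudo-label that agrees with the original label) can be sketched as a simple filter. Everything specific here is an assumption: the abstract does not say how uncertainty is estimated, so the sketch uses the spread of several repeated predictions per sample, and the thresholds, field names and numbers are made up.

```python
from statistics import mean, pstdev


def select_augmented(
    samples: list[dict],
    max_uncertainty: float = 0.1,  # assumed threshold
    min_confidence: float = 0.8,   # assumed threshold
) -> list[str]:
    """Return ids of augmented samples that pass all three selection criteria."""
    selected = []
    for sample in samples:
        probs = sample["positive_probs"]      # repeated P(ACP) predictions
        p = mean(probs)
        pseudo_label = 1 if p >= 0.5 else 0
        confidence = p if pseudo_label == 1 else 1.0 - p
        uncertainty = pstdev(probs)           # spread across repeated predictions
        if (
            uncertainty <= max_uncertainty
            and confidence >= min_confidence
            and pseudo_label == sample["original_label"]
        ):
            selected.append(sample["id"])
    return selected


augmented = [
    {"id": "a", "positive_probs": [0.90, 0.92, 0.88], "original_label": 1},  # passes
    {"id": "b", "positive_probs": [0.95, 0.50, 0.90], "original_label": 1},  # too uncertain
    {"id": "c", "positive_probs": [0.10, 0.12, 0.08], "original_label": 1},  # label mismatch
    {"id": "d", "positive_probs": [0.05, 0.07, 0.06], "original_label": 0},  # passes
]
kept = select_augmented(augmented)
```

The kept samples would then be added to the training set before retraining, as the abstract describes.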
Affiliation(s)
- Huawei Tao
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Shuai Shan
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Hongliang Fu
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Chunhua Zhu
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Boye Liu
- College of Food Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
| |
|
24
|
Xin J, Zhan X, Zheng F, Li H, Wang Y, Li C, Jiang J. The effect of low-frequency high-intensity ultrasound combined with aspirin on tooth movement in rats. BMC Oral Health 2023; 23:642. [PMID: 37670292 PMCID: PMC10478369 DOI: 10.1186/s12903-023-03359-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2022] [Accepted: 08/26/2023] [Indexed: 09/07/2023] Open
Abstract
BACKGROUND Given that tooth movement in orthodontic treatment can be difficult or impossible, ways to speed up tooth movement must be investigated. In addition, nonsteroidal anti-inflammatory drugs (NSAIDs) are used to treat pain caused by tooth movement during orthodontic treatment. The purpose of this study is to examine the impact of aspirin and low-frequency high-intensity ultrasound (LFHIU) on orthodontic tooth movement in rats. METHODS Thirty-six male Sprague-Dawley rats were divided into three groups: orthodontic (O), ultrasound-treated orthodontic (OU), and ultrasound-treated orthodontic with aspirin gavage (OUA). In the OU and OUA groups, LFHIU (44 W/cm2, 28 kHz) was applied to the buccal side of the maxillary first molar alveolar bone for 10 s every day. In the OUA group, aspirin was given by gavage every day. The rats were sacrificed on days 1, 3, 7, and 14. RESULTS After ultrasonic treatment, the speed of tooth movement increased by about 1.5 times, and the number of osteoclasts increased by about 2 times. However, both decreased slightly after aspirin gavage. With ultrasound therapy, Receptor Activator for Nuclear Factor-κ B Ligand (RANKL) levels in periodontal tissue were elevated, and aspirin was able to reduce these increases. Results from micro-computed tomography (Micro-CT) revealed that bone mineral density decreased by about 1/5 after ultrasound treatment on the compression side. The rate of bone mineral apposition indicated that bone was forming under tension, and that of the OU group was about 1.3 times that of the O group. CONCLUSIONS Although aspirin slowed this trend, LFHIU still enhanced overall tooth mobility in orthodontic treatment.
Affiliation(s)
- Jiao Xin
- Central Laboratory, Peking University School and Hospital of Stomatology & National Center of Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Beijing, China
| | - Xinxin Zhan
- Department of Orthodontics, Peking University School and Hospital of Stomatology & National Center of Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Beijing, China
| | - Fu Zheng
- Department of Orthodontics, Peking University School and Hospital of Stomatology & National Center of Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Beijing, China
| | - Huazhi Li
- Department of Orthodontics, Peking University School and Hospital of Stomatology & National Center of Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Beijing, China
| | - Yixiang Wang
- Central Laboratory, Department of Oral and Maxillofacial Surgery, Hospital of Stomatology & National Center of Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Peking University School, Beijing, China
| | - Cuiying Li
- Central Laboratory, Peking University School and Hospital of Stomatology & National Center of Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Beijing, China
| | - Jiuhui Jiang
- Department of Orthodontics, Peking University School and Hospital of Stomatology & National Center of Stomatology & National Clinical Research Center for Oral Diseases & National Engineering Research Center of Oral Biomaterials and Digital Medical Devices, Beijing, China.
|
25
|
Cui Z, Wang SG, He Y, Chen ZH, Zhang QH. DeepTPpred: A Deep Learning Approach With Matrix Factorization for Predicting Therapeutic Peptides by Integrating Length Information. IEEE J Biomed Health Inform 2023; 27:4611-4622. [PMID: 37368803 DOI: 10.1109/jbhi.2023.3290014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2023]
Abstract
The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Efficient therapeutic peptide prediction is critical for peptide drug discovery. However, most existing methods make effective predictions for only one class of therapeutic peptides. Notably, no current predictive method treats sequence length information as a distinct feature of therapeutic peptides. In this article, a novel deep learning approach with matrix factorization for predicting therapeutic peptides (DeepTPpred) that integrates length information is proposed. The matrix factorization layer learns the latent features of the encoded sequence through a mechanism of compression followed by restoration. The length features of the therapeutic peptide sequences are embedded together with the encoded amino acid sequences. To automatically learn therapeutic peptide predictions, these latent features are input into neural networks with a self-attention mechanism. On eight therapeutic peptide datasets, DeepTPpred achieved excellent prediction results. Based on these datasets, we first integrated the eight datasets to obtain a full therapeutic peptide integration dataset. Then, we obtained two functional integration datasets based on the functional similarity of the peptides. Finally, we also conducted experiments on the latest versions of the ACP and CPP datasets. Overall, the experimental results show that our work is effective for the identification of therapeutic peptides.
|
26
|
Yan K, Feng J, Huang J, Wu H. iDRPro-SC: identifying DNA-binding proteins and RNA-binding proteins based on subfunction classifiers. Brief Bioinform 2023:bbad251. [PMID: 37405873 DOI: 10.1093/bib/bbad251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2023] [Revised: 06/10/2023] [Accepted: 06/12/2023] [Indexed: 07/07/2023] Open
Abstract
Nucleic acid-binding proteins are proteins that interact with DNA and RNA to regulate gene expression and transcriptional control. The pathogenesis of many human diseases is related to abnormal gene expression. Therefore, recognizing nucleic acid-binding proteins accurately and efficiently has important implications for disease research. To address this problem, some scientists have proposed using sequence information to identify nucleic acid-binding proteins. However, different types of nucleic acid-binding proteins have different subfunctions, and these methods ignore their internal differences, so the performance of the predictors can be further improved. In this study, we propose a new method, called iDRPro-SC, to predict the type of nucleic acid-binding proteins based on sequence information. iDRPro-SC considers the internal differences of nucleic acid-binding proteins and combines their subfunctions to build a complete dataset. Additionally, we used ensemble learning to characterize and predict nucleic acid-binding proteins. Results on the test dataset showed that iDRPro-SC achieved the best prediction performance and was superior to the other existing nucleic acid-binding protein prediction methods. We have established a web server that can be accessed online: http://bliulab.net/iDRPro-SC.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Jiawei Feng
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Jing Huang
- Huajian Yutong Technology (Beijing) Co., Ltd
- State Key Laboratory of Media Convergence Production Technology and Systems, Beijing, China, 100803
- Xinhua New Media Culture Communication Co., Ltd
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
|
27
|
Bai T, Yan K, Liu B. DAmiRLocGNet: miRNA subcellular localization prediction by combining miRNA-disease associations and graph convolutional networks. Brief Bioinform 2023:bbad212. [PMID: 37332057 DOI: 10.1093/bib/bbad212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Revised: 05/17/2023] [Accepted: 05/18/2023] [Indexed: 06/20/2023] Open
Abstract
MicroRNAs (miRNAs) are post-transcriptional regulators in humans that are involved in various physiological processes by regulating gene expression. The subcellular localization of miRNAs plays a crucial role in the discovery of their biological functions. Although several computational methods based on miRNA functional similarity networks have been presented to identify the subcellular localization of miRNAs, it remains difficult for these approaches to effectively extract well-referenced miRNA functional representations due to insufficient miRNA-disease association representation and disease semantic representation. A significant amount of research on miRNA-disease associations now exists, making it possible to address the issue of insufficient miRNA functional representation. In this work, a novel model named DAmiRLocGNet is established, based on a graph convolutional network (GCN) and an autoencoder (AE), for identifying the subcellular localizations of miRNAs. DAmiRLocGNet constructs its features from miRNA sequence information, miRNA-disease association information and disease semantic information. The GCN is used to gather the information of neighboring nodes and capture the implicit information of network structures from miRNA-disease association information and disease semantic information. The AE is employed to capture sequence semantics from sequence similarity networks. The evaluation demonstrates that the performance of DAmiRLocGNet is superior to other competing computational approaches, benefiting from implicit features captured by the GCNs. DAmiRLocGNet has the potential to be applied to the identification of subcellular localization of other non-coding RNAs. Moreover, it can facilitate further investigation into the functional mechanisms underlying miRNA localization. The source code and datasets can be accessed at http://bliulab.net/DAmiRLocGNet.
Affiliation(s)
- Tao Bai
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- School of Mathematics & Computer Science, Yan'an University, Shaanxi 716000, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
|
28
|
Zhang L, Bai T, Wu H. sgRNA-2wPSM: Identify sgRNAs on-target activity by combining two-window-based position specific mismatch and synthetic minority oversampling technique. Comput Biol Med 2023; 155:106489. [PMID: 36841059 DOI: 10.1016/j.compbiomed.2022.106489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 12/27/2022] [Indexed: 12/30/2022]
Abstract
MOTIVATION Prediction of sgRNA on-target activity is a critical step in the CRISPR-Cas9 system. Because of its importance to RNA function research and genome editing applications, several computational methods have been introduced that treat it as either a binary classification task or a regression task. Among these methods, sgRNA-PSM is a state-of-the-art method. In this work, we improved this method by proposing a new feature extraction method called two-window-based PSM, which divides the DNA sequences into two non-overlapping segments so as to extract different patterns from the two segments. The two-window-based PSM features were fed into Support Vector Machines (SVMs), yielding a new method called sgRNA-2wPSM. Furthermore, a new oversampling method called SCORE-SVM-SMOTE, based on the SVM-SMOTE algorithm, was proposed to solve the imbalanced training set problem. Results on the benchmark datasets indicated that sgRNA-2wPSM is superior to other methods.
Affiliation(s)
- Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen, China.
| | - Tao Bai
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; School of Mathematics & Computer Science, Yan'an University, Shaanxi, 716000, China.
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
|
29
|
Constructing discriminative feature space for LncRNA-protein interaction based on deep autoencoder and marginal fisher analysis. Comput Biol Med 2023; 157:106711. [PMID: 36924738 DOI: 10.1016/j.compbiomed.2023.106711] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/16/2022] [Revised: 01/26/2023] [Accepted: 02/26/2023] [Indexed: 03/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles by regulating proteins in many biological processes and life activities. To uncover the molecular mechanisms of lncRNAs, it is necessary to identify their interactions with proteins. Recently, some machine learning methods were proposed to detect lncRNA-protein interactions according to the distribution of known interactions. The performance of these methods depends largely on (1) how exactly the feature space characterizes the distribution of known interactions and (2) how discriminative the feature space is for distinguishing lncRNA-protein interactions. Because the known interactions may follow multiple complex models, it remains a challenge to construct a discriminative feature space for lncRNA-protein interactions. To resolve this problem, a novel method named DFRPI was developed in this paper based on a deep autoencoder and marginal Fisher analysis. Firstly, some initial features of lncRNA-protein interactions were extracted from the primary sequences and secondary structures of lncRNAs and proteins. Secondly, a deep autoencoder was used to learn encoding parameters of the initial features that describe the known interactions precisely. Next, marginal Fisher analysis was employed to optimize the encoding parameters so that the features characterize a discriminative feature space of the lncRNA-protein interactions. Finally, a random forest-based predictor was trained on the discriminative feature space to detect lncRNA-protein interactions. In a series of experiments, our predictor achieved a precision of 0.920, recall of 0.916, accuracy of 0.918, MCC of 0.836, specificity of 0.920, sensitivity of 0.916 and AUC of 0.906, outperforming the compared methods for predicting lncRNA-protein interactions.
These results suggest that the proposed method can generate a reasonable and effective feature space for distinguishing lncRNA-protein interactions accurately. The code and data are available at https://github.com/D0ub1e-D/DFRPI.
|
30
|
Shi H, Wu C, Bai T, Chen J, Li Y, Wu H. Identify essential genes based on clustering based synthetic minority oversampling technique. Comput Biol Med 2023; 153:106523. [PMID: 36652869 DOI: 10.1016/j.compbiomed.2022.106523] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/22/2022] [Revised: 12/13/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
Prediction of essential genes in a living organism is one of the central tasks in synthetic biology. Computational predictors are desired because experimental data is often unavailable. Recently, some sequence-based predictors have been constructed to identify essential genes. However, their predictive performance should be further improved. One key problem is how to effectively extract sequence-based features that can discriminate the essential genes. Another problem is the imbalanced training set: the number of essential genes in human cell lines is lower than that of non-essential genes. Therefore, predictors trained with such an imbalanced training set tend to identify an unseen sequence as a non-essential gene. Here, a new over-sampling strategy called Clustering based Synthetic Minority Oversampling Technique (CSMOTE) was proposed to overcome the imbalanced data issue. Combining CSMOTE with the Z curve, the global features, and Support Vector Machines, a new protocol called iEsGene-CSMOTE was proposed to identify essential genes. The rigorous jackknife cross-validation results indicated that iEsGene-CSMOTE is better than the other competing methods. The proposed method outperformed the λ-interval Z curve by 35.48% and 11.25% in terms of Sn and BACC, respectively.
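To make the oversampling idea concrete, here is a minimal sketch of the interpolation step used by plain SMOTE, on which clustering-based variants build. It is not the CSMOTE algorithm from the paper (the clustering stage is omitted), and the function name `smote_sketch`, the parameter `k` and the toy feature vectors are assumptions for illustration only.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by interpolating between
    a randomly chosen minority sample and one of its k nearest minority
    neighbours (plain SMOTE interpolation, no clustering)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Toy minority-class feature vectors (hypothetical, two features each).
essential = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(essential, n_new=5)
print(len(new_points), "synthetic samples generated")
```

Because each synthetic point lies on a segment between two existing minority samples, the new samples stay inside the region already occupied by the minority class rather than duplicating existing points.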
Affiliation(s)
- Hua Shi
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China.
| | - Chenjin Wu
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China.
| | - Tao Bai
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; School of Mathematics & Computer Science, Yan'an University, Shaanxi, 716000, China.
| | - Jiahai Chen
- Xiamen Sankuai Online Technology Co., Ltd, Xiamen, China.
| | - Yan Li
- School of Opto-electronic and Communication Engineering, Xiamen University of Technology, Xiamen, China.
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China.
|
31
|
Li J, Wu Z, Lin W, Luo J, Zhang J, Chen Q, Chen J. iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models. BIOINFORMATICS ADVANCES 2023; 3:vbad043. [PMID: 37113248 PMCID: PMC10125906 DOI: 10.1093/bioadv/vbad043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Revised: 02/04/2023] [Accepted: 03/24/2023] [Indexed: 04/29/2023]
Abstract
Motivation Enhancers are important cis-regulatory elements that regulate a wide range of biological functions and enhance the transcription of target genes. Although many feature extraction methods have been proposed to improve the performance of enhancer identification, they cannot learn position-related multiscale contextual information from raw DNA sequences. Results In this article, we propose a novel enhancer identification method (iEnhancer-ELM) based on BERT-like enhancer language models. iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts position-related contextual information for k-mers of different scales via a multi-head attention mechanism. We first evaluate the performance of k-mers of different scales, then ensemble them to improve the performance of enhancer identification. The experimental results on two popular benchmark datasets show that our model outperforms state-of-the-art methods. We further illustrate the interpretability of iEnhancer-ELM. In a case study, we discover 30 enhancer motifs via a 3-mer-based model, 12 of which are verified by STREME and JASPAR, demonstrating that our model has the potential to unveil the biological mechanism of enhancers. Availability and implementation The models and associated code are available at https://github.com/chen-bioinfo/iEnhancer-ELM. Supplementary information Supplementary data are available at Bioinformatics Advances online.
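The k-mer tokenization mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming overlapping k-mers with a stride of 1 and scales k = 3 to 6; the abstract confirms only that a 3-mer-based model was used, so the other scales, the stride, and the function names are assumptions rather than the authors' implementation.

```python
def kmer_tokens(sequence, k):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def multiscale_tokens(sequence, scales=(3, 4, 5, 6)):
    """Tokenize one sequence at several k-mer scales.

    Returns a dict mapping each k to its token list; in a language-model
    pipeline each list would be fed to the model trained for that scale.
    """
    return {k: kmer_tokens(sequence, k) for k in scales}


# Toy sequence (hypothetical, far shorter than a real enhancer).
tokens = multiscale_tokens("ACGTAC")
print(tokens[3])  # ['ACG', 'CGT', 'GTA', 'TAC']
```

A sequence of length L yields L - k + 1 tokens at scale k, so larger k gives fewer but more specific tokens, which is one reason for combining several scales in an ensemble.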
Affiliation(s)
| | | | - Wenhao Lin
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
| | - Jiawei Luo
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
| | - Jun Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
| | - Qingcai Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
|
32
|
Yan K, Lv H, Guo Y, Peng W, Liu B. sAMPpred-GAT: prediction of antimicrobial peptide by graph attention network and predicted peptide structure. Bioinformatics 2023; 39:btac715. [PMID: 36342186 PMCID: PMC9805557 DOI: 10.1093/bioinformatics/btac715] [Citation(s) in RCA: 68] [Impact Index Per Article: 34.0] [Received: 05/05/2022] [Revised: 10/24/2022] [Accepted: 11/04/2022] [Indexed: 11/09/2022] Open
Abstract
MOTIVATION Antimicrobial peptides (AMPs) are essential components of therapeutic peptides for innate immunity. Researchers have developed several computational methods to predict potential AMPs from many candidate peptides. With the development of artificial intelligence techniques, protein structures can be accurately predicted, which is useful for protein sequence and function analysis. Unfortunately, predicted peptide structure information has not yet been applied to AMP prediction to improve predictive performance. RESULTS In this study, we proposed a computational predictor called sAMPpred-GAT for AMP identification. To the best of our knowledge, sAMPpred-GAT is the first approach based on predicted peptide structures for AMP prediction. The sAMPpred-GAT predictor constructs graphs based on the predicted peptide structures, sequence information and evolutionary information. The Graph Attention Network (GAT) is then applied to the graphs to learn discriminative features. Finally, fully connected networks are used as the output module to predict whether the peptides are AMPs or not. Experimental results show that sAMPpred-GAT outperforms the other state-of-the-art methods in terms of AUC, and achieves better or highly comparable performance in terms of the other metrics on the eight independent test datasets, demonstrating that predicted peptide structure information is important for AMP prediction. AVAILABILITY AND IMPLEMENTATION A user-friendly webserver of sAMPpred-GAT can be accessed at http://bliulab.net/sAMPpred-GAT and the source code is available at https://github.com/HongWuL/sAMPpred-GAT/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Wei Peng
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
|
33
|
Chamoli T, Khera A, Sharma A, Gupta A, Garg S, Mamgain K, Bansal A, Verma S, Gupta A, Alajangi HK, Singh G, Barnwal RP. Peptide Utility (PU) search server: A new tool for peptide sequence search from multiple databases. Heliyon 2022; 8:e12283. [PMID: 36590540 PMCID: PMC9800339 DOI: 10.1016/j.heliyon.2022.e12283] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/23/2022] [Revised: 10/21/2022] [Accepted: 12/05/2022] [Indexed: 12/14/2022] Open
Abstract
Proteins are essential building blocks in humans that have garnered huge attention from researchers worldwide due to their numerous therapeutic applications. To date, different computational tools have been developed to extract pre-existing information on these biological molecules, but most of these tools suffer from limitations such as non-user-friendly interfaces, redundancy of data, etc. To overcome these limitations, a user-friendly interface, the Peptide Utility (PU) webserver (https://chain-searching.herokuapp.com/), has been developed for searching and analyzing homologous and identical protein/peptide sequences among approximately 0.4 million sequences (structural and sequence information) in both online and offline modes. The PU web server can also be used to study different types of interactions in PDBSum, identifying the most dominant interface residues, the most prevalent interactions, and the interaction preferences of different residues. The webserver would also pave the way for the design of novel therapeutic peptides and folds by identifying conserved residues in the three-dimensional structure space of proteins.
Affiliation(s)
- Tanishq Chamoli
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
| | - Alisha Khera
- Department of Biophysics, Panjab University, Chandigarh 160014, India, National Centre for Cell Science, NCCS Complex, S. P. Pune University Campus, Ganeshkhind, Pune, Maharashtra 411007, India
| | - Akanksha Sharma
- Department of Biophysics, Panjab University, Chandigarh 160014, India, University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India
| | - Anshul Gupta
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
| | - Sonam Garg
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
| | - Kanishk Mamgain
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
| | - Aayushi Bansal
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
| | - Shriya Verma
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
| | - Ankit Gupta
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India
| | - Hema K. Alajangi
- Department of Biophysics, Panjab University, Chandigarh 160014, India, University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India, Corresponding author.
| | - Gurpal Singh
- University Institute of Pharmaceutical Sciences, Panjab University, Chandigarh 160014, India, Corresponding author.
| | - Ravi P. Barnwal
- Department of Computer Science and Engineering, Chandigarh College of Engineering and Technology, Chandigarh, India, Corresponding author.
|
34
|
Bi Y, Li F, Guo X, Wang Z, Pan T, Guo Y, Webb GI, Yao J, Jia C, Song J. Clarion is a multi-label problem transformation method for identifying mRNA subcellular localizations. Brief Bioinform 2022; 23:bbac467. [PMID: 36341591 PMCID: PMC10148739 DOI: 10.1093/bib/bbac467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2022] [Revised: 09/09/2022] [Accepted: 09/29/2022] [Indexed: 11/09/2022] Open
Abstract
Subcellular localization of messenger RNAs (mRNAs) plays a key role in the spatial regulation of gene activity. The functions of mRNAs have been shown to be closely linked with their localizations. As such, understanding the subcellular localizations of mRNAs can help elucidate gene regulatory networks. Although several computational methods have been developed to predict mRNA localizations within cells, there is still much room for improvement in predictive performance, especially for multiple-location prediction. In this study, we proposed a novel multi-label multi-class predictor, termed Clarion, for mRNA subcellular localization prediction. Clarion was developed based on a manually curated benchmark dataset and leveraged the weighted series method for multi-label transformation. Extensive benchmarking tests demonstrated that Clarion achieved competitive predictive performance and that the weighted series method plays a crucial role in securing this performance. In addition, the independent test results indicate that Clarion outperformed the state-of-the-art methods, with accuracies of 81.47, 91.29, 79.77, 92.10, 89.15, 83.74, 80.74, 79.23 and 84.74% for chromatin, cytoplasm, cytosol, exosome, membrane, nucleolus, nucleoplasm, nucleus and ribosome, respectively. The webserver and local stand-alone tool of Clarion are freely available at http://monash.bioweb.cloud.edu.au/Clarion/.
Affiliation(s)
- Yue Bi
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
| | - Fuyi Li
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria 3000, Australia
| | - Xudong Guo
- College of Information Engineering, Northwest A&F University, Yangling, 712100, China
| | - Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Tong Pan
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
| | - Yuming Guo
- Department of Epidemiology and Preventive Medicine, School of Public Health and Preventive Medicine, Monash University, Melbourne, Victoria 3004, Australia
| | - Geoffrey I Webb
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
| | | | - Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria 3800, Australia
- Monash Data Futures Institute, Monash University, Melbourne, Victoria 3800, Australia
|
35
|
Yan W, Tang W, Wang L, Bin Y, Xia J. PrMFTP: Multi-functional therapeutic peptides prediction based on multi-head self-attention mechanism and class weight optimization. PLoS Comput Biol 2022; 18:e1010511. [PMID: 36094961 PMCID: PMC9499272 DOI: 10.1371/journal.pcbi.1010511] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 06/03/2022] [Revised: 09/22/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022] Open
Abstract
Prediction of therapeutic peptides is a significant step in the discovery of promising therapeutic drugs. Most existing studies have focused on mono-functional therapeutic peptide prediction. However, the number of multi-functional therapeutic peptides (MFTP) is growing rapidly, which calls for new computational schemes to facilitate MFTP discovery. In this study, based on a multi-head self-attention mechanism and a class weight optimization algorithm, we propose a novel model called PrMFTP for MFTP prediction. PrMFTP exploits a multi-scale convolutional neural network, bi-directional long short-term memory, and multi-head self-attention mechanisms to fully extract and learn informative features of peptide sequences to predict MFTP. In addition, we design a class weight optimization scheme to address the problem of label-imbalanced data. Comprehensive evaluations demonstrate that PrMFTP is superior to other state-of-the-art computational methods for predicting MFTP. We provide a user-friendly web server of PrMFTP, which is available at http://bioinfo.ahu.edu.cn/PrMFTP.
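To illustrate why class weights help with label imbalance, the sketch below computes simple inverse-frequency weights for a multi-label dataset. It is only a common baseline heuristic and is not the class weight optimization scheme designed in the paper; the function name `balanced_label_weights` and the toy label sets are assumptions for illustration.

```python
from collections import Counter


def balanced_label_weights(label_sets):
    """Return a weight per label that is inversely proportional to how
    often the label occurs: n_samples / (n_labels * label_count).

    label_sets holds one collection of labels per sample (multi-label).
    """
    counts = Counter(label for labels in label_sets for label in set(labels))
    n_samples = len(label_sets)
    n_labels = len(counts)
    return {label: n_samples / (n_labels * count) for label, count in counts.items()}


# Toy multi-label annotations (hypothetical): four peptides, two functions.
annotations = [{"AMP"}, {"AMP", "ACP"}, {"AMP"}, {"AMP"}]
weights = balanced_label_weights(annotations)
print(weights)  # the rare label "ACP" receives the larger weight
```

Multiplying each label's loss term by its weight makes errors on rare functions cost more during training, which counteracts the tendency of a model to favour frequent labels.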
Affiliation(s)
- Wenhui Yan
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- Wending Tang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- Lihua Wang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- Yannan Bin
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- * E-mail: (YB); (JX)
- Junfeng Xia
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- * E-mail: (YB); (JX)
|
36
|
Qiu XY, Wu H, Shao J. TALE-cmap: Protein function prediction based on a TALE-based architecture and the structure information from contact map. Comput Biol Med 2022; 149:105938. [DOI: 10.1016/j.compbiomed.2022.105938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 07/26/2022] [Accepted: 08/06/2022] [Indexed: 11/03/2022]
|
37
|
Wang N, Yan K, Zhang J, Liu B. iDRNA-ITF: identifying DNA- and RNA-binding residues in proteins based on induction and transfer framework. Brief Bioinform 2022; 23:6609520. [PMID: 35709747 DOI: 10.1093/bib/bbac236] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/15/2022] [Revised: 05/06/2022] [Accepted: 05/20/2022] [Indexed: 11/14/2022]
Abstract
Protein-DNA and protein-RNA interactions are involved in many biological activities. In the post-genome era, accurate identification of DNA- and RNA-binding residues in protein sequences is of great significance for studying protein functions and promoting new drug design and development. Several sequence-based computational methods have therefore been proposed for identifying DNA- and RNA-binding residues. However, they fail to fully utilize the functional properties of residues, leading to limited prediction performance. In this paper, a sequence-based method, iDRNA-ITF, is proposed to incorporate the functional properties into the residue representation by using an induction and transfer framework. The properties of nucleic acid-binding residues are induced by the nucleic acid-binding residue feature extraction network, and then transferred into the feature integration modules of the DNA-binding residue prediction network and the RNA-binding residue prediction network for the final prediction. Experimental results on four test sets demonstrate that iDRNA-ITF achieves state-of-the-art performance, outperforming other existing sequence-based methods. The webserver of iDRNA-ITF is freely available at http://bliulab.net/iDRNA-ITF.
Affiliation(s)
- Ning Wang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Jun Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
|