1
Arif R, Kanwal S, Ahmed S, Kabir M. A Computational Predictor for Accurate Identification of Tumor Homing Peptides by Integrating Sequential and Deep BiLSTM Features. Interdiscip Sci 2024:10.1007/s12539-024-00628-9. [PMID: 38733473 DOI: 10.1007/s12539-024-00628-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 03/16/2024] [Accepted: 03/27/2024] [Indexed: 05/13/2024]
Abstract
Cancer remains a severe illness, and current research indicates that tumor homing peptides (THPs) play an important part in cancer therapy. Identifying THPs can provide crucial insights for the drug-discovery and pharmaceutical industries, as these peptides allow medication to be delivered specifically to cancer cells. THPs have a high affinity for particular receptors present on tumor surfaces, allowing for the creation of precision medications that reduce off-target effects and improve treatment outcomes for cancer patients. Wet-lab techniques are essential tools for studying THPs; however, they are labor-intensive and time-consuming, which makes identifying THPs a challenging task for researchers. Computational techniques, on the other hand, are valuable tools for identifying THPs from sequence data. Although many strategies have been presented to predict new THPs, there is still a need for a robust method with higher success rates. In this paper, we developed a novel framework, THP-DF, for accurately identifying THPs on a large scale. First, the peptide sequences are encoded through various sequential features. Second, each feature is passed to BiLSTM and attention layers to extract simplified deep features. Finally, an ensemble framework is formed by integrating the sequential and deep features, which are fed to a support vector machine and evaluated with 10-fold cross-validation. The experimental results showed that THP-DF performed better on both the [Formula: see text] and [Formula: see text] datasets, achieving an accuracy of > 95%, which is higher than that of existing predictors on both datasets. This indicates that the proposed predictor could be a beneficial tool to precisely and rapidly identify THPs and could contribute to cutting-edge cancer treatment strategies and pharmaceuticals.
Affiliation(s)
- Roha Arif
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Sameera Kanwal
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Saeed Ahmed
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Muhammad Kabir
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
2
Pal J, Ghosh S, Maji B, Bhattacharya DK. Use of 2D FFT and DTW in Protein Sequence Comparison. Protein J 2024; 43:1-11. [PMID: 37848727 DOI: 10.1007/s10930-023-10160-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 09/20/2023] [Indexed: 10/19/2023]
Abstract
Protein sequence comparison remains a challenging task for researchers because of the computational complexity that arises from having 20 amino acids, compared with only four nucleotides in genome sequences. Further, protein sequences from different species have different lengths, which poses additional challenges for developing methods, especially alignment-free methods, to compare them. In this work, an efficient technique to compare protein sequences is developed using a graphical representation. First, the 20 amino acids are classified into 4 groups based on polar class, narrowing the representational range from 20 to 4. Then a unit vector technique based on a two-quadrant Cartesian system is proposed to provide a new two-dimensional graphical representation of the protein sequence. Two approaches are then proposed to cope with the varying lengths of protein sequences from various species: one uses Dynamic Time Warping (DTW), while the other uses a two-dimensional Fast Fourier Transform (2D FFT). Next, the effectiveness of these two techniques is analyzed using two evaluation criteria: quantitative measures based on symmetric distance (SD), and computational speed. An analysis is performed on five data sets of 9 ND4, 9 ND5, 9 ND6, 12 Baculovirus, and 24 TF proteins under the two methods. The FFT-based method produces the same results as DTW but in less computational time. The results of the proposed method agree with the known biological reference. Further, the present method produces better clustering than the existing ones.
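As a rough illustration of why DTW copes with sequences of different lengths, the sketch below computes the classic DTW cost between two numeric series. It is not the authors' implementation: the paper works on a two-dimensional graphical representation, whereas this example assumes plain one-dimensional series and an absolute-difference step cost, and the name `dtw_distance` is hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two numeric sequences.

    Assumption for illustration: the local cost is |a[i] - b[j]|.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            cost[i][j] = step + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Series of different lengths can still align perfectly:
# the repeated 2 in the second series is absorbed by warping.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The quadratic table is what makes DTW slower than an FFT-based comparison, which is consistent with the speed difference the abstract reports.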
Affiliation(s)
- Jayanta Pal
  - Department of ECE, National Institute of Technology, Durgapur, India
  - Department of CSE, Narula Institute of Technology, Kolkata, India
- Soumen Ghosh
  - Department of ECE, National Institute of Technology, Durgapur, India
- Bansibadan Maji
  - Department of ECE, National Institute of Technology, Durgapur, India
3
Tao H, Shan S, Fu H, Zhu C, Liu B. An Augmented Sample Selection Framework for Prediction of Anticancer Peptides. Molecules 2023; 28:6680. [PMID: 37764455 PMCID: PMC10535447 DOI: 10.3390/molecules28186680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 09/14/2023] [Accepted: 09/15/2023] [Indexed: 09/29/2023] Open
Abstract
Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven prediction models rely heavily on extensive training data. Furthermore, the current publicly accessible ACP dataset is limited in size, leading to inadequate model generalization. While data augmentation effectively expands dataset size, existing techniques for augmenting ACP data often generate noisy samples, adversely affecting prediction performance. Therefore, this paper proposes a novel augmented sample selection framework for the prediction of anticancer peptides (ACPs-ASSF). First, the prediction model is trained using raw data. Then, the augmented samples generated using the data augmentation technique are fed into the trained model to compute pseudo-labels and estimate the uncertainty of the model prediction. Finally, samples with low uncertainty, high confidence, and pseudo-labels consistent with the original labels are selected and incorporated into the training set to retrain the model. The evaluation results for the ACP240 and ACP740 datasets show that ACPs-ASSF achieved accuracy improvements of up to 5.41% and 5.68%, respectively, compared to the traditional data augmentation method.
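The selection step described in the abstract can be sketched as a simple filter. This is a minimal illustration only, not the ACPs-ASSF code: the dictionary keys, the function name `select_augmented`, and both threshold values are assumptions, and the abstract does not say how the uncertainty is estimated.

```python
def select_augmented(samples, max_uncertainty=0.1, min_confidence=0.9):
    """Keep augmented samples that pass all three checks from the abstract.

    Each sample is a dict with hypothetical keys:
      'label'       - label inherited from the original peptide
      'pseudo'      - pseudo-label predicted by the trained model
      'confidence'  - predicted probability of the pseudo-label
      'uncertainty' - any uncertainty estimate (lower means more certain)
    """
    return [
        s for s in samples
        if s["uncertainty"] <= max_uncertainty   # low uncertainty
        and s["confidence"] >= min_confidence    # high confidence
        and s["pseudo"] == s["label"]            # pseudo-label agrees
    ]


augmented = [
    {"label": 1, "pseudo": 1, "confidence": 0.97, "uncertainty": 0.02},  # passes
    {"label": 1, "pseudo": 0, "confidence": 0.95, "uncertainty": 0.03},  # label mismatch
    {"label": 0, "pseudo": 0, "confidence": 0.60, "uncertainty": 0.05},  # low confidence
    {"label": 0, "pseudo": 0, "confidence": 0.99, "uncertainty": 0.30},  # too uncertain
]
kept = select_augmented(augmented)
print(len(kept))
```

Only the surviving samples would be added to the training set before the model is retrained.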
Affiliation(s)
- Huawei Tao
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Shuai Shan
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Hongliang Fu
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Chunhua Zhu
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Boye Liu
  - College of Food Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
4
Tîrziu A, Avram S, Madă L, Crișan-Vida M, Popovici C, Popovici D, Faur C, Duda-Seiman C, Păunescu V, Vernic C. Design of a Synthetic Long Peptide Vaccine Targeting HPV-16 and -18 Using Immunoinformatic Methods. Pharmaceutics 2023; 15:1798. [PMID: 37513985 PMCID: PMC10384861 DOI: 10.3390/pharmaceutics15071798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/07/2023] [Revised: 06/19/2023] [Accepted: 06/21/2023] [Indexed: 07/30/2023] Open
Abstract
Human papillomavirus types 16 and 18 cause the majority of cervical cancers worldwide. Despite the availability of three prophylactic vaccines based on virus-like particles (VLP) of the major capsid protein (L1), these vaccines are unable to clear an existing infection. Such infected persons experience an increased risk of neoplastic transformation. To overcome this problem, this study proposes an alternative synthetic long peptide (SLP)-based vaccine for persons already infected, including those with precancerous lesions. This new vaccine was designed to stimulate both CD8+ and CD4+ T cells, providing a robust and long-lasting immune response. The SLP construct includes both HLA class I- and class II-restricted epitopes, identified from IEDB or predicted using NetMHCPan and NetMHCIIPan. None of the SLPs were allergenic or toxic, based on in silico studies. Population coverage studies provided 98.18% coverage for class I epitopes and 99.81% coverage for class II peptides in the IEDB world population's allele set. Three-dimensional structure ab initio prediction using Rosetta provided good quality models, which were assessed using PROCHECK and QMEAN4. Molecular docking with toll-like receptor 2 identified potential intrinsic TLR2 agonist activity, while molecular dynamics studies of SLPs in water suggested good stability, with favorable thermodynamic properties.
Affiliation(s)
- Alexandru Tîrziu
  - Department of Functional Sciences, "Victor Babes" University of Medicine and Pharmacy, Eftimie Murgu Square, No. 2, 300041 Timisoara, Romania
- Speranța Avram
  - Department of Anatomy, Animal Physiology and Biophysics, Faculty of Biology, University of Bucharest, 91-95 Splaiul Independentei, 050095 Bucharest, Romania
- Leonard Madă
  - Syonic SRL, Grigore T Popa Street, No. 81, 300254 Timisoara, Romania
- Mihaela Crișan-Vida
  - Department of Automation and Computers, Politehnica University of Timisoara, 300006 Timisoara, Romania
- Casiana Popovici
  - Section of Bioinformatics, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- Dan Popovici
  - Department of Mathematics, University of the West Timişoara, Bd. Vasile Pârvan No. 4, 300223 Timişoara, Romania
- Cosmin Faur
  - Department of Orthopaedic Surgery, University of Medicine and Pharmacy "Victor Babes", Dropiei Street, No. 7, sc B, ap 8, 300661 Timisoara, Romania
- Corina Duda-Seiman
  - Department of Chemistry and Biology, Faculty of Chemistry, Biology, Geography, West University of Timisoara, 16 Pestalozzi, 300115 Timisoara, Romania
- Virgil Păunescu
  - Department of Functional Sciences, "Victor Babes" University of Medicine and Pharmacy, Eftimie Murgu Square, No. 2, 300041 Timisoara, Romania
  - Center for Gene and Cellular Therapies in the Treatment of Cancer Timisoara-OncoGen, Clinical Emergency County Hospital "Pius Brinzeu" Timisoara, No. 156 Liviu Rebreanu, 300723 Timisoara, Romania
  - Immuno-Physiology and Biotechnologies Center, Department of Functional Sciences, "Victor Babes" University of Medicine and Pharmacy, No. 2 Eftimie Murgu Square, 300041 Timisoara, Romania
- Corina Vernic
  - Department of Functional Sciences, "Victor Babes" University of Medicine and Pharmacy, Eftimie Murgu Square, No. 2, 300041 Timisoara, Romania
  - Discipline of Medical Informatics and Biostatistics, "Victor Babes" University of Medicine and Pharmacy, 300041 Timisoara, Romania
5
Yao L, Li W, Zhang Y, Deng J, Pang Y, Huang Y, Chung CR, Yu J, Chiang YC, Lee TY. Accelerating the Discovery of Anticancer Peptides through Deep Forest Architecture with Deep Graphical Representation. Int J Mol Sci 2023; 24:4328. [PMID: 36901759 PMCID: PMC10001941 DOI: 10.3390/ijms24054328] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 01/07/2023] [Revised: 02/02/2023] [Accepted: 02/07/2023] [Indexed: 02/24/2023] Open
Abstract
Cancer is one of the leading diseases threatening human life and health worldwide. Peptide-based therapies have attracted much attention in recent years. Therefore, the precise prediction of anticancer peptides (ACPs) is crucial for discovering and designing novel cancer treatments. In this study, we proposed a novel machine learning framework (GRDF) that incorporates deep graphical representation and deep forest architecture for identifying ACPs. Specifically, GRDF extracts graphical features based on the physicochemical properties of peptides and integrates their evolutionary information along with binary profiles for constructing models. Moreover, we employ the deep forest algorithm, which adopts a layer-by-layer cascade architecture similar to deep neural networks, enabling excellent performance on small datasets but without complicated tuning of hyperparameters. The experiment shows GRDF exhibits state-of-the-art performance on two elaborate datasets (Set 1 and Set 2), achieving 77.12% accuracy and 77.54% F1-score on Set 1, as well as 94.10% accuracy and 94.15% F1-score on Set 2, exceeding existing ACP prediction methods. Our models exhibit greater robustness than the baseline algorithms commonly used for other sequence analysis tasks. In addition, GRDF is well-interpretable, enabling researchers to better understand the features of peptide sequences. The promising results demonstrate that GRDF is remarkably effective in identifying ACPs. Therefore, the framework presented in this study could assist researchers in facilitating the discovery of anticancer peptides and contribute to developing novel cancer treatments.
Affiliation(s)
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Wenshuo Li
  - School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Yuntian Zhang
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Junyang Deng
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Yuxuan Pang
  - School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Yixian Huang
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Chia-Ru Chung
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Jinhan Yu
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
- Ying-Chih Chiang
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Correspondence: (Y.-C.C.); (T.-Y.L.)
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, School of Medicine, The Chinese University of Hong Kong (Shenzhen), 2001 Longxiang Road, Shenzhen 518172, China
  - Correspondence: (Y.-C.C.); (T.-Y.L.)
6
Ali Z, Alturise F, Alkhalifah T, Khan YD. IGPred-HDnet: Prediction of Immunoglobulin Proteins Using Graphical Features and the Hierarchal Deep Learning-Based Approach. Computational Intelligence and Neuroscience 2023; 2023:2465414. [PMID: 36744119 PMCID: PMC9891831 DOI: 10.1155/2023/2465414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/16/2022] [Accepted: 10/12/2022] [Indexed: 01/26/2023]
Abstract
Motivation. Immunoglobulin proteins (IGPs), also called antibodies, are glycoproteins that act as B-cell receptors against external or internal antigens such as viruses and bacteria. IGPs play a significant role in diverse cellular processes ranging from adhesion to cell recognition. Identifying IGPs in silico is faster and more cost-effective than wet-lab methods. Methods. In this study, we developed an intelligent theoretical deep learning framework, "IGPred-HDnet", for the discrimination of IGPs and non-IGPs. Three types of promising descriptors, namely feature extraction based on graphical and statistical features (FEGS), amphiphilic pseudo-amino acid composition (Amp-PseAAC), and dipeptide composition (DPC), are used to extract the graphical, physicochemical, and sequential features. Next, the extracted attributes are evaluated with machine learning classifiers, i.e., decision tree (DT), support vector machine (SVM), k-nearest neighbour (KNN), and hierarchical deep network (HDnet). The proposed predictor IGPred-HDnet was trained and tested using 10-fold cross-validation and an independent test. Results and Conclusion. The success rates of IGPred-HDnet on the training and independent datasets (Dtrain, Dtest) are ACC = 98.00% and 99.10%, and MCC = 0.958 and 0.980, respectively, where ACC is accuracy and MCC is the Matthews correlation coefficient. The empirical outcomes demonstrate that the IGPred-HDnet model, using the novel FEGS feature and HDnet algorithm, achieved superior predictions on both datasets compared with other existing computational models. We hope this research will provide great insights into the large-scale identification of IGPs and assist pharmaceutical companies in new drug design.
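Of the three descriptors named in the abstract, dipeptide composition (DPC) is the simplest to show. The sketch below uses the standard definition (the frequency of each of the 400 ordered amino acid pairs); the function name and the handling of non-standard residues are assumptions, not details taken from IGPred-HDnet.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order so vectors are comparable
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(seq):
    """Return a 400-dimensional DPC vector for a protein sequence.

    Each entry is count(dipeptide) / (len(seq) - 1). Pairs containing a
    non-standard residue are skipped (an assumption for this sketch),
    although they still count towards the denominator.
    """
    seq = seq.upper()
    total = len(seq) - 1
    if total <= 0:
        return [0.0] * len(DIPEPTIDES)
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


vec = dipeptide_composition("ACAC")  # overlapping pairs: AC, CA, AC
print(len(vec), vec[DIPEPTIDES.index("AC")])
```

Because every sequence maps to a vector of the same length, the output can be passed directly to classifiers such as the DT, SVM, and KNN models mentioned above.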
Affiliation(s)
- Zakir Ali
  - Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
7
Preeti P, Nath SK, Arambam N, Sharma T, Choudhury PR, Choudhury A, Khanna V, Strych U, Hotez PJ, Bottazzi ME, Rawal K. Vaxi-DL: An Artificial Intelligence-Enabled Platform for Vaccine Development. Methods Mol Biol 2023; 2673:305-316. [PMID: 37258923 DOI: 10.1007/978-1-0716-3239-0_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Vaccine development is a complex and long process. It involves several steps, including computational studies, experimental analyses, animal model system studies, and clinical trials. This process can be accelerated by using in silico antigen screening to identify potential vaccine candidates. In this chapter, we describe a deep learning-based technique which utilizes 18 biological and 9154 physicochemical properties of proteins for finding potential vaccine candidates. Using this technique, a new web-based system, named Vaxi-DL, was developed which helped in finding new vaccine candidates from bacteria, protozoa, viruses, and fungi. Vaxi-DL is available at: https://vac.kamalrawal.in/vaxidl/ .
Affiliation(s)
- P Preeti
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
- Swarsat Kaushik Nath
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
- Nevidita Arambam
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
- Trapti Sharma
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
- Priyanka Ray Choudhury
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
- Alakto Choudhury
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
- Vrinda Khanna
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
- Ulrich Strych
  - Department of Pediatrics, Division of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
  - Texas Children's Hospital Center for Vaccine Development, Houston, TX, USA
- Peter J Hotez
  - Department of Pediatrics, Division of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
  - Texas Children's Hospital Center for Vaccine Development, Houston, TX, USA
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
  - Department of Biology, Baylor University, Waco, TX, USA
- Maria Elena Bottazzi
  - Department of Pediatrics, Division of Tropical Medicine, Baylor College of Medicine, Houston, TX, USA
  - Texas Children's Hospital Center for Vaccine Development, Houston, TX, USA
  - Department of Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, USA
  - Department of Biology, Baylor University, Waco, TX, USA
- Kamal Rawal
  - Centre for Computational Biology and Bioinformatics, AIB, Amity University, Noida, Uttar Pradesh, India
8
Zhao Z, Gui J, Yao A, Le NQK, Chua MCH. Improved Prediction Model of Protein and Peptide Toxicity by Integrating Channel Attention into a Convolutional Neural Network and Gated Recurrent Units. ACS Omega 2022; 7:40569-40577. [PMID: 36385847 PMCID: PMC9647964 DOI: 10.1021/acsomega.2c05881] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/11/2022] [Accepted: 10/19/2022] [Indexed: 06/16/2023]
Abstract
In recent times, the importance of peptides in the biomedical domain has received increasing attention because of their role in the treatment of multiple diseases. However, accurate identification of peptide toxicity is a vital prerequisite for successful large-scale implementation in industry. The existing computational methods have achieved good results in toxicity prediction, and we present an improved model based on different deep learning architectures. The modifications mainly focus on two aspects: sequence encoding and variational information bottlenecks. Consequently, one of our modified schemes shows an obvious increase in sensitivity, while the rest show good performance and add novelty to the peptide toxicity prediction domain. In detail, our best model achieved accuracies of 97.38% and 95.03% in protein and peptide toxicity predictions, respectively. The performance was superior to previous predictors on the same datasets.
Affiliation(s)
- Zhengyun Zhao
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
- Jingyu Gui
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
- Anqi Yao
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Matthew Chin Heng Chua
  - Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
9
Li W, Yang L, Qiu Y, Yuan Y, Li X, Meng Z. FFP: joint Fast Fourier transform and fractal dimension in amino acid property-aware phylogenetic analysis. BMC Bioinformatics 2022; 23:347. [PMID: 35986255 PMCID: PMC9392226 DOI: 10.1186/s12859-022-04889-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 08/11/2022] [Indexed: 11/10/2022] Open
Abstract
Background
Amino acid property-aware phylogenetic analysis (APPA) refers to the phylogenetic analysis method based on amino acid property encoding, which is used for understanding and inferring evolutionary relationships between species from the molecular perspective. Fast Fourier transform (FFT) and Higuchi’s fractal dimension (HFD) have excellent performance in describing sequences’ structural and complexity information for APPA. However, with the exponential growth of protein sequence data, it is very important to develop a reliable APPA method for protein sequence analysis.
Results
Consequently, we propose a new method named FFP, which combines FFT and HFD. First, FFP encodes protein sequences on the basis of an important physicochemical property of amino acids, the dissociation constant, which determines the acidity and basicity of protein molecules. Second, FFT and HFD are used to generate the feature vectors of the encoded sequences, after which the distance matrix, which describes the degree of similarity between species, is calculated from the cosine function. The smaller the distance between two species, the more similar they are. Finally, the phylogenetic tree is constructed. When FFP is tested for phylogenetic analysis on four groups of protein sequences, the results are clearly better than those of the compared methods, with the highest accuracy exceeding 97%.
Conclusion
FFP has higher accuracy in APPA and multi-sequence alignment. It can also measure protein sequence similarity effectively, and we hope it will play a role in APPA-related research.
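The distance-matrix step in the Results can be sketched as follows. The abstract only says the matrix is "calculated from the cosine function"; this example assumes the common choice of distance = 1 - cosine similarity, and the function names and the toy feature vectors are hypothetical.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    # A zero vector has no direction; this raises ZeroDivisionError
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(vectors):
    """Pairwise cosine distances between all feature vectors."""
    n = len(vectors)
    return [
        [cosine_distance(vectors[i], vectors[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy feature vectors standing in for the FFT/HFD features of three species
features = [[1.0, 2.0, 0.5], [2.0, 4.0, 1.0], [0.0, 1.0, 3.0]]
for row in distance_matrix(features):
    print([round(d, 3) for d in row])
```

A matrix like this is the usual input to tree-building methods; smaller entries correspond to more similar species, as the abstract notes.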
10
Li W, Yang L, Meng Z, Qiu Y, Wang PSP, Li X. Phylogenetic Analysis: A Novel Method of Protein Sequence Similarity Analysis. Int J Pattern Recogn 2022. [DOI: 10.1142/s0218001422580071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Protein sequence similarity analysis (PSSA) is a significant task in bioinformatics that can yield information about unknown sequences, such as protein structures and homology relationships. A protein sequence is a series of amino acids with rich physical and chemical properties and constitutes the basic structure of a protein. However, sequence similarity analysis and phylogenetic analysis between different species with complex amino acid sequences are challenging problems. In this paper, nine properties of amino acids were considered and the sequence was converted into numerical values by principal component analysis (PCA); a new feature vector was then constructed with the Haar wavelet transform and Higuchi fractal dimension (HFD) to represent the sequence; finally, the Spearman distance was used to calculate the distance matrix, and the phylogenetic tree was constructed. Two representative protein sequence sets (9 ND5 (NADH dehydrogenase 5) and 8 ND6 (NADH dehydrogenase 6)) were selected for similarity analysis and phylogenetic analysis, and the results were compared with MEGA software and other existing methods. The extensive results show that our method outperforms the others and that its results are consistent with the known facts.
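For readers unfamiliar with the Haar wavelet transform mentioned above, a single decomposition level can be sketched as below. This is a generic textbook version with orthonormal scaling, not the paper's feature pipeline; the function name `haar_step` and the requirement of an even-length input are assumptions made for this illustration.

```python
from math import sqrt


def haar_step(signal):
    """One level of the orthonormal Haar wavelet transform.

    Returns (approximation, detail) coefficients, each half the input
    length. Assumes an even-length numeric sequence.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = sqrt(2)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / scale for a, b in pairs]  # smoothed trend
    detail = [(a - b) / scale for a, b in pairs]  # local change
    return approx, detail


# A numerically encoded sequence (toy values, not real PCA output)
approx, detail = haar_step([1.0, 1.0, 2.0, 2.0])
print(approx, detail)
```

Applying the step repeatedly to the approximation coefficients gives a multi-level decomposition, from which fixed-size summaries of sequences of different lengths can be derived.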
Affiliation(s)
- Wei Li
  - School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Lina Yang
  - School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Zuqiang Meng
  - School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Yu Qiu
  - School of Computer, Electronics and Information, Guangxi University, Nanning, P. R. China
- Xichun Li
  - Guangxi Normal University for Nationalities, Chongzuo 532200, China
11
Wei L, Ye X, Sakurai T, Mu Z, Wei L. ToxIBTL: prediction of peptide toxicity based on information bottleneck and transfer learning. Bioinformatics 2022; 38:1514-1524. [PMID: 34999757 DOI: 10.1093/bioinformatics/btac006] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Received: 10/29/2021] [Revised: 11/29/2021] [Accepted: 01/04/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Recently, peptides have emerged as a promising class of pharmaceuticals for treating various diseases, poised between traditional small molecule drugs and therapeutic proteins. However, one of the key bottlenecks preventing them from becoming therapeutic peptides is their toxicity toward human cells, and few available algorithms for predicting toxicity are specially designed for short-length peptides. RESULTS We present ToxIBTL, a novel deep learning framework that utilizes the information bottleneck principle and transfer learning to predict the toxicity of peptides as well as proteins. Specifically, we use evolutionary information and physicochemical properties of peptide sequences and integrate the information bottleneck principle into a feature representation learning scheme, by which relevant information is retained and redundant information is minimized in the obtained features. Moreover, transfer learning is introduced to transfer the common knowledge contained in proteins to peptides, which aims to improve the feature representation capability. Extensive experimental results demonstrate that ToxIBTL not only achieves a higher prediction performance than state-of-the-art methods on the peptide dataset, but also has a competitive performance on the protein dataset. Furthermore, a user-friendly online web server is established as the implementation of the proposed ToxIBTL. AVAILABILITY AND IMPLEMENTATION The proposed ToxIBTL and data are freely accessible at http://server.wei-group.net/ToxIBTL. Our source code is available at https://github.com/WLYLab/ToxIBTL. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Lesong Wei
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Zengchao Mu
  - School of Mathematics and Statistics, Shandong University, Weihai, China
- Leyi Wei
  - School of Software, Shandong University, Jinan, China
12
Li C, Dai Q, He PA. A time series representation of protein sequences for similarity comparison. J Theor Biol 2022; 538:111039. [DOI: 10.1016/j.jtbi.2022.111039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2021] [Revised: 01/18/2022] [Accepted: 01/20/2022] [Indexed: 10/19/2022]