1
|
Qi D, Liu T. VotePLMs-AFP: Identification of antifreeze proteins using transformer-embedding features and ensemble learning. Biochim Biophys Acta Gen Subj 2024; 1868:130721. [PMID: 39426757 DOI: 10.1016/j.bbagen.2024.130721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2024] [Revised: 09/24/2024] [Accepted: 10/11/2024] [Indexed: 10/21/2024]
Abstract
Antifreeze proteins (AFPs) are a unique class of biomolecules capable of protecting other proteins, cell membranes, and cellular structures within organisms from damage caused by freezing conditions. Given the significance of AFPs in various domains such as biotechnology, agriculture, and medicine, several machine learning methods have been developed to identify AFPs. However, due to the complexity and diversity of AFPs, the predictive performance of existing methods is limited. Therefore, there is an urgent need to develop an efficient and rapid computational method for accurately predicting AFPs. In this study, we proposed a novel predictor based on transformer-embedding features and ensemble learning for the identification of AFPs, termed VotePLMs-AFP. Firstly, three types of feature descriptors were extracted from pre-trained protein language models (PLMs) during the feature extraction process. Subsequently, we analyzed six combinations generated by these three embeddings to explore the optimal feature set, which was input into the soft voting-based ensemble learning classifier for the identification of AFPs. Finally, we evaluated the model on the two benchmark datasets. The experimental results show that our model achieves high prediction accuracy in 10-fold cross-validation (CV) and independent set testing, outperforming existing state-of-the-art methods. Therefore, our model could serve as an effective tool for predicting AFPs.
Collapse
Affiliation(s)
- Dawei Qi
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
| | - Taigang Liu
- College of Information Technology, Shanghai Ocean University, Shanghai 201306, China.
| |
Collapse
|
2
|
Yadav AK, Gupta PK, Singh TR. PMTPred: machine-learning-based prediction of protein methyltransferases using the composition of k-spaced amino acid pairs. Mol Divers 2024; 28:2301-2315. [PMID: 39033257 DOI: 10.1007/s11030-024-10937-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2024] [Accepted: 07/10/2024] [Indexed: 07/23/2024]
Abstract
Protein methyltransferases (PMTs) are a group of enzymes that help catalyze the transfer of a methyl group to its substrates. These enzymes play an important role in epigenetic regulation and can methylate various substrates with DNA, RNA, protein, and small-molecule secondary metabolites. Dysregulation of methyltransferases is implicated in various human cancers. However, in light of the well-recognized significance of PMTs, reliable and efficient identification methods are essential. In the present work, we propose a machine-learning-based method for the identification of PMTs. Various sequence-based features were calculated, and prediction models were trained using various machine-learning algorithms using a tenfold cross-validation technique. After evaluating each model on the dataset, the SVM-based CKSAAP model achieved the highest prediction accuracy with balanced sensitivity and specificity. Also, this SVM model outperformed deep-learning algorithms for the prediction of PMTs. In addition, cross-database validation was performed to ensure the robustness of the model. Feature importance was assessed using shapley additive explanations (SHAP) values, providing insights into the contributions of different features to the model's predictions. Finally, the SVM-based CKSAAP model was implemented in a standalone tool, PMTPred, due to its consistent performance during independent testing and cross-database evaluation. We believe that PMTPred will be a useful and efficient tool for the identification of PMTs. The PMTPred is freely available for download at https://github.com/ArvindYadav7/PMTPred and http://www.bioinfoindia.org/PMTPred/home.html for research and academic use.
Collapse
Affiliation(s)
- Arvind Kumar Yadav
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India
| | - Pradeep Kumar Gupta
- Department of Computer Science and Engineering, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India
- School of Computing, Department of Data Science and Engineering, Mohan Babu University, Tirupati- 517102, Andhra Pradesh, India
| | - Tiratha Raj Singh
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India.
- Centre of Excellence in Healthcare Technologies and Informatics (CHETI), Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India.
| |
Collapse
|
3
|
Feng C, Wei H, Li X, Feng B, Xu C, Zhu X, Liu R. A stacking-based algorithm for antifreeze protein identification using combined physicochemical, pseudo amino acid composition, and reduction property features. Comput Biol Med 2024; 176:108534. [PMID: 38754217 DOI: 10.1016/j.compbiomed.2024.108534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2024] [Revised: 04/03/2024] [Accepted: 04/28/2024] [Indexed: 05/18/2024]
Abstract
Antifreeze proteins have wide applications in the medical and food industries. In this study, we propose a stacking-based classifier that can effectively identify antifreeze proteins. Initially, feature extraction was performed in three aspects: reduction properties, scalable pseudo amino acid composition, and physicochemical properties. A hybrid feature set comprised of the combined information from these three categories was obtained. Subsequently, we trained the training set based on LightGBM, XGBoost, and RandomForest algorithms, and the training outcomes were passed to the Logistic algorithm for matching, thereby establishing a stacking algorithm. The proposed algorithm was tested on the test set and an independent validation set. Experimental data indicates that the algorithm achieved a recognition accuracy of 98.3 %, and an accuracy of 98.5 % on the validation set. Lastly, we analyzed the reasons why numerical features achieved high recognition capabilities from multiple aspects. Data dimensionality reduction and the analysis from two-dimensional and three-dimensional views revealed separability between positive and negative samples, and the protein three-dimensional structure further demonstrated significant differences in related features between the two samples. Analysis of the classifier revealed that Hr*Hr, HrHr, and Sc-PseAAC_1, 188D(152,116,57,183) were among the seven most important numerical features affecting algorithm recognition. For Hr*Hr and HrHr, supportive sequence level evidence for the reduction dictionary was found in terms of conservation area analysis, multiple sequence alignment, and amino acid conservative substitution. Moreover, the importance of the reduction dictionary was recognized through a comparative analysis of importance before and after the reduction, realizing the effectiveness of the dictionary in improving feature importance. A decision tree model has been utilized to discern the distinctions between dipeptides associated with the physical and chemical properties of His(H), Iso(I), Leu(L), and Lys(K) and other dipeptides. We finally analyzed the other seven features of importance, and data analysis confirmed that hydrophobicity, secondary structure, charge properties, van der Waals forces, and solvent accessibility are also factors affecting the antifreeze capability of proteins.
Collapse
Affiliation(s)
- Changli Feng
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
| | - Haiyan Wei
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
| | - Xin Li
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
| | - Bin Feng
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
| | - Chugui Xu
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
| | - Xiaorong Zhu
- Department of Information Science and Technology, Taishan University, Taian, 271000, China.
| | - Ruijun Liu
- School of Software, Beihang University, Beijing, 100191, China.
| |
Collapse
|
4
|
Box ICH, van der Burg KRL, Marshall KE. Analysis of Ice-Binding Protein Evolution. Methods Mol Biol 2024; 2730:219-229. [PMID: 37943462 DOI: 10.1007/978-1-0716-3503-2_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2023]
Abstract
Discovering novel ice-binding proteins (IBPs) is important for understanding the evolution of IBPs but it is difficult to determine where resources should be directed in the search for novel IBPs. For this reason, we developed a simple bioinformatic approach for aiding in the determination of where to direct efforts in the search for IBPs. First, BLAST is used to obtain a candidate list of putative IBPs. Next, phylogenetic trees are constructed to map the candidate list of putative IBPs to determine if any patterns are forming. These candidate putative IBPs and their patterns are then assessed through the production of ancestral sequences and reverse BLAST searches, in addition to the use of IBP calculators, to determine which sequences should be cut to produce the final putative IBP list. Finally, we explain an avenue to investigate these putative IBPs further for the development of hypotheses on their evolution.
Collapse
Affiliation(s)
- Isaiah C H Box
- Department of Zoology, University of British Columbia, Vancouver, BC, Canada
| | | | - Katie E Marshall
- Department of Zoology, University of British Columbia, Vancouver, BC, Canada.
| |
Collapse
|
5
|
Dhibar S, Jana B. Accurate Prediction of Antifreeze Protein from Sequences through Natural Language Text Processing and Interpretable Machine Learning Approaches. J Phys Chem Lett 2023; 14:10727-10735. [PMID: 38009833 DOI: 10.1021/acs.jpclett.3c02817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2023]
Abstract
Antifreeze proteins (AFPs) bind to growing iceplanes owing to their structural complementarity nature, thereby inhibiting the ice-crystal growth by thermal hysteresis. Classification of AFPs from sequence is a difficult task due to their low sequence similarity, and therefore, the usual sequence similarity algorithms, like Blast and PSI-Blast, are not efficient. Here, a method combining n-gram feature vectors and machine learning models to accelerate the identification of potential AFPs from sequences is proposed. All these n-gram features are extracted from the K-mer counting method. The comparative analysis reveals that, among different machine learning models, Xgboost outperforms others in predicting AFPs from sequence when penta-mers are used as a feature vector. When tested on an independent dataset, our method performed better compared to other existing ones with sensitivity of 97.50%, recall of 98.30%, and f1 score of 99.10%. Further, we used the SHAP method, which provides important insight into the functional activity of AFPs.
Collapse
Affiliation(s)
- Saikat Dhibar
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
| | - Biman Jana
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
| |
Collapse
|
6
|
Ma X, Liang Y, Zhang S. iAVPs-ResBi: Identifying antiviral peptides by using deep residual network and bidirectional gated recurrent unit. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2023; 20:21563-21587. [PMID: 38124610 DOI: 10.3934/mbe.2023954] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/23/2023]
Abstract
Human history is also the history of the fight against viral diseases. From the eradication of viruses to coexistence, advances in biomedicine have led to a more objective understanding of viruses and a corresponding increase in the tools and methods to combat them. More recently, antiviral peptides (AVPs) have been discovered, which due to their superior advantages, have achieved great impact as antiviral drugs. Therefore, it is very necessary to develop a prediction model to accurately identify AVPs. In this paper, we develop the iAVPs-ResBi model using k-spaced amino acid pairs (KSAAP), encoding based on grouped weight (EBGW), enhanced grouped amino acid composition (EGAAC) based on the N5C5 sequence, composition, transition and distribution (CTD) based on physicochemical properties for multi-feature extraction. Then we adopt bidirectional long short-term memory (BiLSTM) to fuse features for obtaining the most differentiated information from multiple original feature sets. Finally, the deep model is built by combining improved residual network and bidirectional gated recurrent unit (BiGRU) to perform classification. The results obtained are better than those of the existing methods, and the accuracies are 95.07, 98.07, 94.29 and 97.50% on the four datasets, which show that iAVPs-ResBi can be used as an effective tool for the identification of antiviral peptides. The datasets and codes are freely available at https://github.com/yunyunliang88/iAVPs-ResBi.
Collapse
Affiliation(s)
- Xinyan Ma
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
| | - Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an 710048, China
| | - Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| |
Collapse
|
7
|
Chen S, Liao Y, Zhao J, Bin Y, Zheng C. PACVP: Prediction of Anti-Coronavirus Peptides Using a Stacking Learning Strategy With Effective Feature Representation. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3106-3116. [PMID: 37022025 DOI: 10.1109/tcbb.2023.3238370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/19/2023]
Abstract
Due to the global outbreak of COVID-19 and its variants, antiviral peptides with anti-coronavirus activity (ACVPs) represent a promising new drug candidate for the treatment of coronavirus infection. At present, several computational tools have been developed to identify ACVPs, but the overall prediction performance is still not enough to meet the actual therapeutic application. In this study, we constructed an efficient and reliable prediction model PACVP (Prediction of Anti-CoronaVirus Peptides) for identifying ACVPs based on effective feature representation and a two-layer stacking learning framework. In the first layer, we use nine feature encoding methods with different feature representation angles to characterize the rich sequence information and fuse them into a feature matrix. Secondly, data normalization and unbalanced data processing are carried out. Next, 12 baseline models are constructed by combining three feature selection methods and four machine learning classification algorithms. In the second layer, we input the optimal probability features into the logistic regression algorithm (LR) to train the final model PACVP. The experiments show that PACVP achieves favorable prediction performance on independent test dataset, with ACC of 0.9208 and AUC of 0.9465. We hope that PACVP will become a useful method for identifying, annotating and characterizing novel ACVPs.
Collapse
|
8
|
Khan A, Uddin J, Ali F, Kumar H, Alghamdi W, Ahmad A. AFP-SPTS: An Accurate Prediction of Antifreeze Proteins Using Sequential and Pseudo-Tri-Slicing Evolutionary Features with an Extremely Randomized Tree. J Chem Inf Model 2023; 63:826-834. [PMID: 36649569 DOI: 10.1021/acs.jcim.2c01417] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
The development of intracellular ice in the bodies of cold-blooded living organisms may cause them to die. These species yield antifreeze proteins (AFPs) to live in subzero temperature environments. Additionally, AFPs are implemented in biotechnological, industrial, agricultural, and medical fields. Machine learning-based predictors were presented for AFP identification. However, more accurate predictors are still highly desirable for boosting the AFP prediction. This work presents a novel approach, named AFP-SPTS, for the correct prediction of AFPs. We explored the discriminative features with four schemes, namely, dipeptide deviation from the expected mean (DDE), reduced amino acid alphabet (RAAA), grouped dipeptide composition (GDPC), and a novel representative method, called pseudo-position-specific scoring matrix tri-slicing (PseTS-PSSM). Considering the advantages of ensemble learning strategy, we fused each feature vector into different combinations and trained the models with five machine learning algorithms, i.e., multilayer perceptron (MLP), extremely randomized tree (ERT), decision tree (DT), random forest (RF), and AdaBoost. Among all models, PseTS-PSSM + RAAA with an extremely randomized tree attained the best outcomes. The proposed predictor (AFP-SPTS) boosted the accuracies of AFPs in the literature by 1.82 and 4.1%.
Collapse
Affiliation(s)
- Adnan Khan
- Qurtuba University of Science and Information Technology, Peshawar5000, Khyber Pakhtunkhwa, Pakistan
| | - Jamal Uddin
- Qurtuba University of Science and Information Technology, Peshawar5000, Khyber Pakhtunkhwa, Pakistan
| | - Farman Ali
- Sarhad University of Science and Information Technology, Mardan Campus, Peshawar23200, Pakistan.,Department of Elementary and Secondary Education Department, Government of Khyber Pakhtunkhwa, Peshawar5000, Khyber Pakhtunkhwa, Pakistan
| | - Harish Kumar
- Department of Computer Science, College of Computer Science, King Khalid University, Abha61421, Saudi Arabia
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King AbdulAziz University, Jeddah21589, Saudi Arabia
| | - Aftab Ahmad
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan23200, Pakistan
| |
Collapse
|
9
|
Khan A, Uddin J, Ali F, Ahmad A, Alghushairy O, Banjar A, Daud A. Prediction of antifreeze proteins using machine learning. Sci Rep 2022; 12:20672. [PMID: 36450775 PMCID: PMC9712683 DOI: 10.1038/s41598-022-24501-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2022] [Accepted: 11/16/2022] [Indexed: 12/03/2022] Open
Abstract
Living organisms including fishes, microbes, and animals can live in extremely cold weather. To stay alive in cold environments, these species generate antifreeze proteins (AFPs), also referred to as ice-binding proteins. Moreover, AFPs are extensively utilized in many important fields including medical, agricultural, industrial, and biotechnological. Several predictors were constructed to identify AFPs. However, due to the sequence and structural heterogeneity of AFPs, correct identification is still a challenging task. It is highly desirable to develop a more promising predictor. In this research, a novel computational method, named AFP-LXGB has been proposed for prediction of AFPs more precisely. The information is explored by Dipeptide Composition (DPC), Grouped Amino Acid Composition (GAAC), Position Specific Scoring Matrix-Segmentation-Autocorrelation Transformation (Sg-PSSM-ACT), and Pseudo Position Specific Scoring Matrix Tri-Slicing (PseTS-PSSM). Keeping the benefits of ensemble learning, these feature sets are concatenated into different combinations. The best feature set is selected by Extremely Randomized Tree-Recursive Feature Elimination (ERT-RFE). The models are trained by Light eXtreme Gradient Boosting (LXGB), Random Forest (RF), and Extremely Randomized Tree (ERT). Among classifiers, LXGB has obtained the best prediction results. The novel method (AFP-LXGB) improved the accuracies by 3.70% and 4.09% than the best methods. These results verified that AFP-LXGB can predict AFPs more accurately and can participate in a significant role in medical, agricultural, industrial, and biotechnological fields.
Collapse
Affiliation(s)
- Adnan Khan
- grid.444994.00000 0004 0609 284XQurtuba University of Science and Technology, Peshawar, Khyber Pakhtunkhwa Pakistan
| | - Jamal Uddin
- grid.444994.00000 0004 0609 284XQurtuba University of Science and Technology, Peshawar, Khyber Pakhtunkhwa Pakistan
| | - Farman Ali
- Department of Elementary and Secondary Education, Peshawar, Khyber Pakhtunkhwa Pakistan ,grid.444996.20000 0004 0609 292XSarhad University of Science and Information Technology, Mardan, Pakistan
| | - Ashfaq Ahmad
- grid.440522.50000 0004 0478 6450Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
| | - Omar Alghushairy
- grid.460099.2Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ameen Banjar
- grid.460099.2Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Ali Daud
- Abu Dhabi School of Management, Abu Dhabi, United Arab Emirates ,grid.460099.2Department of Computer Science and Artificial Intelligence, University of Jeddah, Jeddah, Saudi Arabia
| |
Collapse
|
10
|
Machine learning algorithms identify demographics, dietary features, and blood biomarkers associated with stroke records. J Neurol Sci 2022; 440:120335. [PMID: 35863116 DOI: 10.1016/j.jns.2022.120335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2022] [Revised: 05/26/2022] [Accepted: 07/05/2022] [Indexed: 11/22/2022]
Abstract
OBJECTIVE We conducted a comprehensive evaluation of features associated with stroke records. METHODS We screened the dietary nutrients, blood biomarkers, and clinical information from the National Health and Nutrition Examination Survey (NHANES) 2015-16 database to assess a self-reported history of all strokes (136 strokes, n = 4381). We computed feature importance, built machine learning (ML) models, developed a nomogram, and validated the nomogram on NHANES 2007-08, 2017-18, and the baseline UK Biobank. We calculated the odds ratios with/without adjusting sampling weights (OR/ORw). RESULTS The clinical features have the best predictive power compared to dietary nutrients and blood biomarkers, with 22.8% increased average area under the receiver operating characteristic curves (AUROC) in ML models. We further modeled with ten most important clinical features without compromising the predictive performance. The key features positively associated with stroke include age, cigarette smoking, tobacco smoking, Caucasian or African American race, hypertension, diabetes mellitus, asthma history; the negatively associated feature is the family income. The nomogram based on these key features achieved good performances (AUROC between 0.753 and 0.822) on the test set, the NHANES 2007-08, 2017-18, and the UK Biobank. Key features from the nomogram model include age (OR = 1.05, ORw = 1.06), Caucasian/African American (OR = 2.68, ORw = 2.67), diabetes mellitus (OR = 2.30, ORw = 1.99), asthma (OR = 2.10, ORw = 2.41), hypertension (OR = 1.86, ORw = 2.10), and income (OR = 0.83, ORw = 0.81). CONCLUSIONS We identified clinical key features and built predictive models for assessing stroke records with high performance. A nomogram consisting of questionnaire-based variables would help identify stroke survivors and evaluate the potential risk of stroke.
Collapse
|
11
|
Satyakam, Zinta G, Singh RK, Kumar R. Cold adaptation strategies in plants—An emerging role of epigenetics and antifreeze proteins to engineer cold resilient plants. Front Genet 2022; 13:909007. [PMID: 36092945 PMCID: PMC9459425 DOI: 10.3389/fgene.2022.909007] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2022] [Accepted: 07/21/2022] [Indexed: 11/13/2022] Open
Abstract
Cold stress adversely affects plant growth, development, and yield. Also, the spatial and geographical distribution of plant species is influenced by low temperatures. Cold stress includes chilling and/or freezing temperatures, which trigger entirely different plant responses. Freezing tolerance is acquired via the cold acclimation process, which involves prior exposure to non-lethal low temperatures followed by profound alterations in cell membrane rigidity, transcriptome, compatible solutes, pigments and cold-responsive proteins such as antifreeze proteins. Moreover, epigenetic mechanisms such as DNA methylation, histone modifications, chromatin dynamics and small non-coding RNAs play a crucial role in cold stress adaptation. Here, we provide a recent update on cold-induced signaling and regulatory mechanisms. Emphasis is given to the role of epigenetic mechanisms and antifreeze proteins in imparting cold stress tolerance in plants. Lastly, we discuss genetic manipulation strategies to improve cold tolerance and develop cold-resistant plants.
Collapse
|
12
|
Box ICH, Matthews BJ, Marshall KE. Molecular evidence of intertidal habitats selecting for repeated ice-binding protein evolution in invertebrates. J Exp Biol 2022; 225:274373. [PMID: 35258616 DOI: 10.1242/jeb.243409] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2021] [Accepted: 12/20/2021] [Indexed: 12/21/2022]
Abstract
Ice-binding proteins (IBPs) have evolved independently in multiple taxonomic groups to improve their survival at sub-zero temperatures. Intertidal invertebrates in temperate and polar regions frequently encounter sub-zero temperatures, yet there is little information on IBPs in these organisms. We hypothesized that there are far more IBPs than are currently known and that the occurrence of freezing in the intertidal zone selects for these proteins. We compiled a list of genome-sequenced invertebrates across multiple habitats and a list of known IBP sequences and used BLAST to identify a wide array of putative IBPs in those invertebrates. We found that the probability of an invertebrate species having an IBP was significantly greater in intertidal species than in those primarily found in open ocean or freshwater habitats. These intertidal IBPs had high sequence similarity to fish and tick antifreeze glycoproteins and fish type II antifreeze proteins. Previously established classifiers based on machine learning techniques further predicted ice-binding activity in the majority of our newly identified putative IBPs. We investigated the potential evolutionary origin of one putative IBP from the hard-shelled mussel Mytilus coruscus and suggest that it arose through gene duplication and neofunctionalization. We show that IBPs likely readily evolve in response to freezing risk and that there is an array of uncharacterized IBPs, and highlight the need for broader laboratory-based surveys of the diversity of ice-binding activity across diverse taxonomic and ecological groups.
Collapse
Affiliation(s)
- Isaiah C H Box
- Department of Zoology, University of British Columbia, 6270 University Blvd, Vancouver, BC, CanadaV6T 1Z4
| | - Benjamin J Matthews
- Department of Zoology, University of British Columbia, 6270 University Blvd, Vancouver, BC, CanadaV6T 1Z4
| | - Katie E Marshall
- Department of Zoology, University of British Columbia, 6270 University Blvd, Vancouver, BC, CanadaV6T 1Z4
| |
Collapse
|
13
|
Ekpo MD, Xie J, Hu Y, Liu X, Liu F, Xiang J, Zhao R, Wang B, Tan S. Antifreeze Proteins: Novel Applications and Navigation towards Their Clinical Application in Cryobanking. Int J Mol Sci 2022; 23:2639. [PMID: 35269780 PMCID: PMC8910022 DOI: 10.3390/ijms23052639] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2022] [Revised: 02/16/2022] [Accepted: 02/25/2022] [Indexed: 12/04/2022] Open
Abstract
Antifreeze proteins (AFPs) or thermal hysteresis (TH) proteins are biomolecular gifts of nature to sustain life in extremely cold environments. This family of peptides, glycopeptides and proteins produced by diverse organisms including bacteria, yeast, insects and fish act by non-colligatively depressing the freezing temperature of the water below its melting point in a process termed thermal hysteresis which is then responsible for ice crystal equilibrium and inhibition of ice recrystallisation; the major cause of cell dehydration, membrane rupture and subsequent cryodamage. Scientists on the other hand have been exploring various substances as cryoprotectants. Some of the cryoprotectants in use include trehalose, dimethyl sulfoxide (DMSO), ethylene glycol (EG), sucrose, propylene glycol (PG) and glycerol but their extensive application is limited mostly by toxicity, thus fueling the quest for better cryoprotectants. Hence, extracting or synthesizing antifreeze protein and testing their cryoprotective activity has become a popular topic among researchers. Research concerning AFPs encompasses lots of effort ranging from understanding their sources and mechanism of action, extraction and purification/synthesis to structural elucidation with the aim of achieving better outcomes in cryopreservation. This review explores the potential clinical application of AFPs in the cryopreservation of different cells, tissues and organs. Here, we discuss novel approaches, identify research gaps and propose future research directions in the application of AFPs based on recent studies with the aim of achieving successful clinical and commercial use of AFPs in the future.
Collapse
Affiliation(s)
| | | | | | | | | | | | | | | | - Songwen Tan
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410013, China; (M.D.E.); (J.X.); (Y.H.); (X.L.); (F.L.); (J.X.); (R.Z.); (B.W.)
| |
Collapse
|
14
|
Al-Saggaf UM, Usman M, Naseem I, Moinuddin M, Jiman AA, Alsaggaf MU, Alshoubaki HK, Khan S. ECM-LSE: Prediction of Extracellular Matrix Proteins Using Deep Latent Space Encoding of k-Spaced Amino Acid Pairs. Front Bioeng Biotechnol 2021; 9:752658. [PMID: 34722479 PMCID: PMC8552119 DOI: 10.3389/fbioe.2021.752658] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2021] [Accepted: 09/13/2021] [Indexed: 12/26/2022] Open
Abstract
Extracelluar matrix (ECM) proteins create complex networks of macromolecules which fill-in the extracellular spaces of living tissues. They provide structural support and play an important role in maintaining cellular functions. Identification of ECM proteins can play a vital role in studying various types of diseases. Conventional wet lab-based methods are reliable; however, they are expensive and time consuming and are, therefore, not scalable. In this research, we propose a sequence-based novel machine learning approach for the prediction of ECM proteins. In the proposed method, composition of k-spaced amino acid pair (CKSAAP) features are encoded into a classifiable latent space (LS) with the help of deep latent space encoding (LSE). A comprehensive ablation analysis is conducted for performance evaluation of the proposed method. Results are compared with other state-of-the-art methods on the benchmark dataset, and the proposed ECM-LSE approach has shown to comprehensively outperform the contemporary methods.
Collapse
Affiliation(s)
- Ubaid M. Al-Saggaf
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Muhammad Usman
- Department of Computer Engineering, Chosun University, Gwangju, South Korea
| | - Imran Naseem
- Research and Development, Love For Data, Karachi, Pakistan
- School of Electrical, Electronic and Computer Engineering, The University of Western Australia, Perth, WA, Australia
- College of Engineering, Karachi Institute of Economics and Technology, Korangi Creek, Karachi, Pakistan
| | - Muhammad Moinuddin
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ahmad A. Jiman
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mohammed U. Alsaggaf
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Radiology, Faculty of Medicine, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Hitham K. Alshoubaki
- Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia
- Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Shujaat Khan
- Department of Bio and Brain Engineering, Daejeon, South Korea
| |
Collapse
|
15
|
AoP-LSE: Antioxidant Proteins Classification Using Deep Latent Space Encoding of Sequence Features. Curr Issues Mol Biol 2021; 43:1489-1501. [PMID: 34698113 PMCID: PMC8928959 DOI: 10.3390/cimb43030105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 09/28/2021] [Accepted: 09/29/2021] [Indexed: 11/16/2022] Open
Abstract
It is of utmost importance to develop a computational method for accurate prediction of antioxidants, as they play a vital role in the prevention of several diseases caused by oxidative stress. In this correspondence, we present an effective computational methodology based on the notion of deep latent space encoding. A deep neural network classifier fused with an auto-encoder learns class labels in a pruned latent space. This strategy has eliminated the need to separately develop classifier and the feature selection model, allowing the standalone model to effectively harness discriminating feature space and perform improved predictions. A thorough analytical study has been presented alongwith the PCA/tSNE visualization and PCA-GCNR scores to show the discriminating power of the proposed method. The proposed method showed a high MCC value of 0.43 and a balanced accuracy of 76.2%, which is superior to the existing models. The model has been evaluated on an independent dataset during which it outperformed the contemporary methods by correctly identifying the novel proteins with an accuracy of 95%.
Collapse
|
16
|
Prediction and analysis of antifreeze proteins. Heliyon 2021; 7:e07953. [PMID: 34604556 PMCID: PMC8473546 DOI: 10.1016/j.heliyon.2021.e07953] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2021] [Revised: 07/28/2021] [Accepted: 09/03/2021] [Indexed: 11/20/2022] Open
Abstract
Antifreeze proteins (AFPs) are proteins that protect cellular fluids and body fluids from freezing by inhibiting the nucleation and growth of ice crystals and preventing ice recrystallization, thereby contributing to the maintenance of life in living organisms. They exist in fish, insects, microorganisms, and fungi. However, the number of known AFPs is currently limited, and it is essential to construct a reliable dataset of AFPs and develop a bioinformatics tool to predict AFPs. In this work, we first collected AFPs sequences from UniProtKB considering the reliability of annotations and, based on these datasets, developed a prediction system using random forest. We achieved accuracies of 0.961 and 0.947 for non-redundant sequences with less than 90% and 30% identities and achieved the accuracy of 0.953 for representative sequences for each species. Using the ability of random forest, we identified the sequence features that contributed to the prediction. Some sequence features were common to AFPs from different species. These features include the Cys content, Ala-Ala content, Trp-Gly content, and the amino acids' distribution related to the disorder propensity. The computer program and the dataset developed in this work are available from the GitHub site: https://github.com/ryomiya/Prediction-and-analysis-of-antifreeze-proteins.
Collapse
|
17
|
Akbar S, Ahmad A, Hayat M, Rehman AU, Khan S, Ali F. iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Comput Biol Med 2021; 137:104778. [PMID: 34481183 DOI: 10.1016/j.compbiomed.2021.104778] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 11/26/2022]
Abstract
Tuberculosis (TB) is a worldwide illness caused by the bacteria Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting the normal cells. However, due to the rapid growth of the peptide samples, predicting TB accurately has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve expected results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physiochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods are then combined to generate a heterogeneous vector. Finally, utilizing individual and heterogeneous vectors, five distinct nature classification models were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the suggested model's prediction and training capabilities. Using Training and independent datasets, the proposed ensemble model achieved an accuracy of 94.47% and 92.68%, respectively. It was observed that our proposed "iAtbP-Hyb-EnC" model outperformed and reported ~10% highest training accuracy than existing predictors. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.
Collapse
Affiliation(s)
- Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Ashfaq Ahmad
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Ateeq Ur Rehman
- Department of Information Technology, The University of Haripur, KP, Pakistan.
| | - Salman Khan
- Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan.
| | - Farman Ali
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China.
| |
Collapse
|
18
|
Kozuch DJ, Stillinger FH, Debenedetti PG. Genetic Algorithm Approach for the Optimization of Protein Antifreeze Activity Using Molecular Simulations. J Chem Theory Comput 2020; 16:7866-7873. [PMID: 33201707 DOI: 10.1021/acs.jctc.0c00773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
Antifreeze proteins (AFPs) are of much interest for their ability to inhibit ice growth at low concentrations. In this work, we present a genetic algorithm for the in silico design of AFP mutants with improved antifreeze activity, measured as the predicted thermal hysteresis at a fixed concentration, ΔTC. Central to the algorithm is our recently developed neural network method for predicting ΔTC from molecular simulations [Kozuch et al., PNAS, 115, 13252 (2018)]. Applying the algorithm to three structurally diverse AFPs, wfAFP, rQAE, and RiAFP, we find that significantly improved mutants are discovered for rQAE and RiAFP. Testing of the optimized mutants shows an increase in ΔTC of 0.572 ± 0.11 K (262 ± 50.6%) and 1.33 ± 0.14 K (39.9 ± 4.19%) over the native structures for rQAE and RiAFP, respectively. Structural analysis of the optimized mutants reveals that the algorithm is able to exploit two pathways for enhancing the predicted antifreeze activity of the mutants: (1) increasing the local order of surface waters by encouraging the formation of internal water channels in the protein and (2) increasing the total ice-binding area by improving the planar structure of the ice-binding surface. Additionally, analysis of all mutants explored by the algorithm reveals that a subset of residues, mainly nonpolar, are particularly helpful in improving antifreeze activity at the ice-binding surface.
Collapse
Affiliation(s)
- Daniel J Kozuch
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| | - Frank H Stillinger
- Department of Chemistry, Princeton University, Princeton, New Jersey 08544, United States
| | - Pablo G Debenedetti
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey 08544, United States
| |
Collapse
|