1
Prabhu H, Bhosale H, Sane A, Dhadwal R, Ramakrishnan V, Valadi J. Protein feature engineering framework for AMPylation site prediction. Sci Rep 2024; 14:8695. [PMID: 38622194 PMCID: PMC11369087 DOI: 10.1038/s41598-024-58450-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2023] [Accepted: 03/29/2024] [Indexed: 04/17/2024] Open
Abstract
AMPylation is a biologically significant yet understudied post-translational modification in which an adenosine monophosphate (AMP) group is added, primarily to tyrosine and threonine residues. While recent work has illuminated the prevalence and functional impact of AMPylation, experimental identification of AMPylation sites remains challenging. Computational prediction techniques provide a faster alternative. The predictive performance of machine learning models depends strongly on the features used to represent the raw amino acid sequences. In this work, we introduce a novel feature extraction pipeline to encode the key properties relevant to AMPylation site prediction. We use a recently published dataset of curated AMPylation sites to develop our feature generation framework. We demonstrate the utility of the extracted features by training various machine learning classifiers on various numerical representations of the raw sequences extracted with our framework. Tenfold cross-validation is used to evaluate each model's ability to distinguish between AMPylated and non-AMPylated sites. The top-performing feature set achieved an MCC of 0.58, an accuracy of 0.8, an AUC-ROC of 0.85 and an F1 score of 0.73. Further, we use SHapley Additive exPlanations to elucidate the behaviour of the model on the feature set consisting of monogram and bigram counts for various representations.
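The monogram and bigram count features mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the feature-key format, and the example window are all assumptions made for illustration, and the sketch counts n-grams on the raw residue letters only.

```python
from collections import Counter


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams (n=1: monograms, n=2: bigrams)."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def monogram_bigram_features(window: str) -> dict:
    """Merge monogram and bigram counts into one feature dictionary.

    The key format "<n>gram_<letters>" is a hypothetical naming choice.
    """
    features = {}
    for n in (1, 2):
        for gram, count in ngram_counts(window, n).items():
            features[f"{n}gram_{gram}"] = count
    return features


# Hypothetical sequence window around a candidate site (not from the dataset).
window = "GSTYTAA"
features = monogram_bigram_features(window)
```

The same counting could be applied after mapping each residue to another representation (for example a reduced alphabet), which is one way to read "various representations" in the abstract.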
Affiliation(s)
- Hardik Prabhu
- Computing and Data Sciences, FLAME University, Pune, 412115, India
- Robert Bosch Centre for Cyber Physical Systems, Indian Institute of Science, Bengaluru, 560012, India
- Aamod Sane
- Computing and Data Sciences, FLAME University, Pune, 412115, India
- Renu Dhadwal
- Computing and Data Sciences, FLAME University, Pune, 412115, India
- Vigneshwar Ramakrishnan
- Bioinformatics Center, School of Chemical and Biotechnology, SASTRA Deemed to be University, Thanjavur, 613401, India
- Jayaraman Valadi
- Computing and Data Sciences, FLAME University, Pune, 412115, India.
2
Sun X, Ding L, Zhang L, Lai S, Chen F. Interaction mechanisms of peanut protein isolate and high methoxyl pectin with ultrasound treatment: The effect of ultrasound parameters, biopolymer ratio, and pH. Food Chem 2023; 429:136810. [PMID: 37442086 DOI: 10.1016/j.foodchem.2023.136810] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/13/2022] [Revised: 05/29/2023] [Accepted: 07/03/2023] [Indexed: 07/15/2023]
Abstract
Ultrasound can effectively change the molecular structure of proteins and polysaccharides, as well as their interactions, and was used in this study to treat peanut protein isolate-high methoxyl pectin (PPI-HMP) complexes. The effects of different ultrasound parameters, PPI-HMP mixing ratio (40:1-5:2), and pH (2.0-8.0) on the PPI-HMP interactions were investigated. Turbidity, solution appearance, and zeta-potential analysis revealed an electrostatic interaction between PPI and HMP from pH 2.0 to pH 6.0. Surface hydrophobicity analysis showed that ultrasound changed the tertiary structure conformation of PPI. Fourier transform infrared spectroscopy analysis showed that increased ultrasound power density and pH broke the hydrogen bonds between the complexes. Apparent viscosity and confocal laser scanning microscopy analysis showed that appropriate ultrasound treatment (5.43 W/cm³, 25 min, 25 °C) reduced the viscosity of the complexes and enhanced the electrostatic and hydrophobic interactions between PPI and HMP. These findings will contribute to the application of PPI-HMP complexes in the food industry.
Affiliation(s)
- Xiaoyang Sun
- College of Food and Biological Engineering, Henan University of Animal Husbandry and Economy, Zhengzhou, Henan 450046, PR China
- Ling Ding
- College of Food Science and Technology, Henan University of Technology, Zhengzhou, Henan 450001, PR China
- Lifen Zhang
- College of Food Science and Technology, Henan University of Technology, Zhengzhou, Henan 450001, PR China.
- Shaojuan Lai
- College of Basic Medicine, Guizhou University of Traditional Chinese Medicine, Guiyang, Guizhou 550025, PR China
- Fusheng Chen
- College of Food Science and Technology, Henan University of Technology, Zhengzhou, Henan 450001, PR China.
3
Bhosale H, Ramakrishnan V, Jayaraman VK. Support vector machine-based prediction of pore-forming toxins (PFT) using distributed representation of reduced alphabets. J Bioinform Comput Biol 2021; 19:2150028. [PMID: 34693886 DOI: 10.1142/s0219720021500281] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/20/2022]
Abstract
Bacterial virulence can be attributed to a wide variety of factors, including toxins that harm the host. Pore-forming toxins are one class of toxins that confer virulence on bacteria and are among the promising targets for therapeutic intervention. In this work, we develop a sequence-based machine learning framework for the prediction of pore-forming toxins. As input features to Support Vector Machines (SVMs), we use a distributed representation of the protein sequence encoded by reduced alphabet schemes based on conformational similarity and hydropathy index. The choice of conformational similarity and hydropathy indices is based on the functional mechanism of pore-forming toxins. Our methodology achieves about 81% accuracy, indicating that conformational similarity, an indicator of the flexibility of amino acids, together with the hydropathy index can capture the intrinsic features of pore-forming toxins that distinguish them from other types of transporter proteins. Increased understanding of the mechanisms of pore-forming toxins can further contribute to the use of such "mechanism-informed" features, which may increase the prediction accuracy further.
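As a rough illustration of the reduced-alphabet encoding step described in this abstract, the sketch below maps the 20 amino acids onto a small set of group labels and then splits the reduced sequence into overlapping k-mer "words". The three-group split is a hypothetical, loosely hydropathy-inspired grouping invented for this example, not the scheme used in the paper; the names `HYPOTHETICAL_GROUPS`, `reduce_sequence`, and `kmers` are likewise assumptions. Training the distributed representation (an embedding over such words) and the SVM itself are not shown.

```python
# Hypothetical grouping for illustration only; the paper's actual
# conformational-similarity and hydropathy schemes are not reproduced here.
HYPOTHETICAL_GROUPS = {
    "H": "ACFILMVW",  # treated here as hydrophobic
    "N": "GHPSTY",    # treated here as neutral
    "P": "DEKNQR",    # treated here as polar or charged
}

# Invert the grouping into a residue -> group-label lookup table.
RESIDUE_TO_GROUP = {
    residue: label
    for label, members in HYPOTHETICAL_GROUPS.items()
    for residue in members
}


def reduce_sequence(sequence: str, unknown: str = "X") -> str:
    """Rewrite a protein sequence in the reduced alphabet."""
    return "".join(RESIDUE_TO_GROUP.get(aa, unknown) for aa in sequence.upper())


def kmers(sequence: str, k: int = 3) -> list:
    """Split a sequence into overlapping k-mers."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


reduced = reduce_sequence("MKTLLVA")  # made-up peptide
words = kmers(reduced, k=3)
```

A smaller alphabet shrinks the k-mer vocabulary (3 groups give at most 27 distinct 3-mers instead of 8000), which is one practical reason to reduce the alphabet before learning embeddings.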
Affiliation(s)
- Hrushikesh Bhosale
- Department of Computer Science, FLAME University, Pune, Maharashtra, India
- Vigneshwar Ramakrishnan
- School of Chemical & Biotechnology, SASTRA Deemed-to-be University, Thanjavur, Tamilnadu, India
- Valadi K Jayaraman
- Department of Computer Science, FLAME University, Pune, Maharashtra, India
4
Ou J, Liu H, Nirala NK, Stukalov A, Acharya U, Green MR, Zhu LJ. dagLogo: An R/Bioconductor package for identifying and visualizing differential amino acid group usage in proteomics data. PLoS One 2020; 15:e0242030. [PMID: 33156866 PMCID: PMC7647101 DOI: 10.1371/journal.pone.0242030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/23/2020] [Accepted: 10/23/2020] [Indexed: 11/18/2022] Open
Abstract
Sequence logos have been widely used as graphical representations of conserved nucleic acid and protein motifs. Due to the complexity of the amino acid (AA) alphabet, rich post-translational modification, and diverse subcellular localization of proteins, few versatile tools are available for effective identification and visualization of protein motifs. In addition, various reduced AA alphabets based on physicochemical, structural, or functional properties have been valuable in the study of protein alignment, folding, structure prediction, and evolution. However, there is a lack of tools for applying reduced AA alphabets to the identification and visualization of statistically significant motifs. To fill this gap, we developed an R/Bioconductor package, dagLogo, which has several advantages over existing tools. First, dagLogo accepts various formats for input sets and provides comprehensive options to build optimal background models. Second, it implements different reduced AA alphabets to group AAs with similar properties. Furthermore, dagLogo provides statistical and visual solutions for differential AA (or AA group) usage analysis of both large and small data sets. Case studies showed that dagLogo can better identify and visualize conserved protein sequence patterns from different types of inputs and can potentially reveal biological patterns that could be missed by other logo generators.
Affiliation(s)
- Jianhong Ou
- Department of Molecular, Cell, and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Regeneration NEXT, Duke University School of Medicine, Duke University, Durham, North Carolina, United States of America
- Haibo Liu
- Department of Molecular, Cell, and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Niraj K. Nirala
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Alexey Stukalov
- Institute of Virology, Technical University of Munich, Munich, Germany
- Usha Acharya
- Department of Molecular, Cell, and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Michael R. Green
- Department of Molecular, Cell, and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Lihua Julie Zhu
- Department of Molecular, Cell, and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
5
Li Y, Zhang Y, Lv J. An Effective Cumulative Torsion Angles Model for Prediction of Protein Folding Rates. Protein Pept Lett 2020; 27:321-328. [PMID: 31612815 DOI: 10.2174/0929866526666191014152207] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/24/2019] [Revised: 06/07/2019] [Accepted: 06/29/2019] [Indexed: 02/05/2023]
Abstract
BACKGROUND: Protein folding rate is mainly determined by the size of the conformational space to search, which in turn is dictated by factors such as the size, structure and amino acid sequence of a protein. It is important to integrate these factors effectively to form a more precise description of the conformational space. However, there is no general paradigm for doing so beyond intuition and empirical rules. Therefore, at the present stage, predictions of the folding rate can be improved by finding new factors, which also gives some insight into the above question. OBJECTIVE: To propose a new parameter that describes the size of the conformational space and improves the prediction accuracy of protein folding rate. METHODS: Based on the optimal set of amino acids in a protein, an effective cumulative backbone torsion angle (CBTAeff) was proposed to describe the size of the conformational space. A linear regression model was used to predict protein folding rate with CBTAeff as a parameter. The degree of correlation was described by the coefficient of determination and the mean absolute error (MAE) between the predicted folding rates and experimental observations. RESULTS: A high correlation (coefficient of determination of 0.70 and MAE of 1.88) was achieved between the logarithm of the folding rates and (CBTAeff)^0.5 for 112 two- and multi-state folding proteins with experimental data. CONCLUSION: The remarkable performance of our simple model demonstrates that CBTA based on the optimal set is the major determinant of the conformational space of natural proteins.
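The evaluation described in this abstract (a linear fit of log folding rate against the square root of a descriptor, scored by the coefficient of determination and the MAE) can be sketched as follows. The data points are made-up placeholders, not values from the paper, and the descriptor values merely stand in for CBTAeff, whose actual computation is not given here. All function names are assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def mean_absolute_error(ys, predictions):
    """Mean of the absolute prediction errors."""
    return sum(abs(y - p) for y, p in zip(ys, predictions)) / len(ys)


# Placeholder data: descriptor values and log folding rates are invented.
descriptor = [4.0, 9.0, 16.0, 25.0]
log_rates = [6.1, 3.8, 2.2, -0.1]

# Regress the log rate on the square root of the descriptor.
sqrt_descriptor = [math.sqrt(value) for value in descriptor]
slope, intercept = fit_line(sqrt_descriptor, log_rates)
predictions = [slope * x + intercept for x in sqrt_descriptor]

r2 = r_squared(log_rates, predictions)
mae = mean_absolute_error(log_rates, predictions)
```

Note that the MAE is in the units of the response (here, log folding rate), so it is only comparable across studies that use the same logarithm base and rate units.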
Affiliation(s)
- Yanru Li
- Department of Physics, College of Science, Inner Mongolia University of Technology, Hohhot, China
- Ying Zhang
- Department of Physics, College of Science, Inner Mongolia University of Technology, Hohhot, China
- Jun Lv
- Department of Physics, College of Science, Inner Mongolia University of Technology, Hohhot, China
6
Yan J, Bhadra P, Li A, Sethiya P, Qin L, Tai HK, Wong KH, Siu SWI. Deep-AmPEP30: Improve Short Antimicrobial Peptides Prediction with Deep Learning. Mol Ther Nucleic Acids 2020; 20:882-894. [PMID: 32464552 PMCID: PMC7256447 DOI: 10.1016/j.omtn.2020.05.006] [Citation(s) in RCA: 117] [Impact Index Per Article: 29.3] [Received: 02/21/2020] [Revised: 04/08/2020] [Accepted: 05/06/2020] [Indexed: 12/12/2022]
Abstract
Antimicrobial peptides (AMPs) are a valuable source of antimicrobial agents and a potential solution to the multi-drug resistance problem. In particular, short-length AMPs have been shown to have enhanced antimicrobial activities, higher stability, and lower toxicity to human cells. We present a short-length (≤30 aa) AMP prediction method, Deep-AmPEP30, developed based on an optimal feature set of PseKRAAC reduced amino acid composition and a convolutional neural network. On a balanced benchmark dataset of 188 samples, Deep-AmPEP30 yields an improved performance of 77% in accuracy, 85% in the area under the receiver operating characteristic curve (AUC-ROC), and 85% in the area under the precision-recall curve (AUC-PR) over existing machine learning-based methods. To demonstrate its power, we screened the genome sequence of Candida glabrata (a gut commensal fungus expected to interact with and/or inhibit other microbes in the gut) for potential AMPs and identified a peptide of 20 aa (P3, FWELWKFLKSLWSIFPRRRP) with strong antibacterial activity against Bacillus subtilis and Vibrio parahaemolyticus. The potency of the peptide is remarkably comparable to that of ampicillin. Therefore, Deep-AmPEP30 is a promising prediction tool to identify short-length AMPs from genomic sequences for drug discovery. Our method is available at https://cbbio.cis.um.edu.mo/AxPEP for both individual sequence prediction and genome screening for AMPs.
Affiliation(s)
- Jielu Yan
- Department of Computer and Information Science, University of Macau, Macau, China
- Pratiti Bhadra
- Department of Computer and Information Science, University of Macau, Macau, China
- Ang Li
- Faculty of Health Sciences, University of Macau, Macau, China
- Pooja Sethiya
- Faculty of Health Sciences, University of Macau, Macau, China
- Longguang Qin
- Faculty of Health Sciences, University of Macau, Macau, China
- Hio Kuan Tai
- Department of Computer and Information Science, University of Macau, Macau, China
- Koon Ho Wong
- Faculty of Health Sciences, University of Macau, Macau, China; Institute of Translational Medicines, University of Macau, Macau, China
- Shirley W I Siu
- Department of Computer and Information Science, University of Macau, Macau, China.
7
Mirzaie M. Identification of native protein structures captured by principal interactions. BMC Bioinformatics 2019; 20:604. [PMID: 31752663 PMCID: PMC6873546 DOI: 10.1186/s12859-019-3186-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2019] [Accepted: 11/01/2019] [Indexed: 11/20/2022] Open
Abstract
Background: Evaluation of protein structure relies on a trustworthy potential function. The total potential of a protein structure is approximated as the sum of all pairwise interaction potentials. Knowledge-based potentials (KBPs) are one type of potential function, derived from known, experimentally determined protein structures. Although several KBP functions based on different methods have been introduced, the key interactions that capture the total potential have not yet been studied. Results: In this study, we seek the interaction types that preserve as much of the total potential as possible. We employ a procedure based on principal component analysis (PCA) to extract the significant, key interactions in native protein structures. We call these principal interactions and show that, in protein fold recognition, a model that considers only these interactions gives results very close to those of the full interaction model, which considers all interactions. In fact, the principal interactions maintain the discriminative power of the full interaction model. The method was evaluated on 3 KBPs with different contact definitions and distance thresholds, and their corresponding principal interactions proved very similar. Additionally, the principal interactions made up 20% of the full interactions on average, and they occur between residues that are considered important in protein folding. Conclusions: This work shows that not all interaction types are equally important in discriminating native structures. Because the reduced model based on principal interactions performs very close to the full interaction model, a new strategy is needed to capture the role of the remaining (non-principal) interactions and improve the power of knowledge-based potential functions.
Affiliation(s)
- Mehdi Mirzaie
- Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal Ale Ahmad Highway, P.O.Box: 14115-134, Tehran, Iran.
8
Discrimination power of knowledge-based potential dictated by the dominant energies in native protein structures. Amino Acids 2019; 51:1029-1038. [PMID: 31098784 DOI: 10.1007/s00726-019-02743-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Accepted: 05/08/2019] [Indexed: 01/20/2023]
Abstract
Extracting a well-designed energy function is important for protein structure evaluation. Knowledge-based potential functions are one type of energy function that can be obtained from known protein structures. The pairwise potential between atom types is approximated using Boltzmann's law, which relates the frequency of atom types to their potential. The total energy is approximated as a summation of the pairwise potentials between atomic pairs. In the present study, the performance of a knowledge-based potential function was assessed based on the strength of interaction between groups of amino acids. The dominant energies involved in the pairwise potentials were revealed by eigenvalue analysis of the matrix whose elements represent the energy between amino acids. For this purpose, a matrix containing the mean energies of the residue-residue interaction types was constructed using 500 native protein structures. The matrix has a dominant eigenvalue, and the amino acids LEU, VAL, ILE, PHE, TYR, ALA and TRP have high values along the dominant eigenvector. The results show that this ranking of amino acids is consistent with the power of amino acids in discriminating native structures using the K-alphabet reduced model. In the reduced interactions, only a subset of the 20 amino acids, along with their interactions, is considered when assessing the energy. In the K-alphabet reduced model, the reduced structures are constructed from only K amino acid types. The dominant K-alphabet reduced model, derived from the first K amino acids in the list [LEU, VAL, PHE, ILE, TYR, ALA, TRP], discriminates native structures best among all possible K-alphabet reduced models. Knowledge-based potentials might be improved with a new strategy.
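The eigenvalue analysis described in this abstract can be sketched with power iteration, one standard way to approximate the dominant eigenvalue and eigenvector of a symmetric matrix; the abstract does not say which numerical method the study used. The 3x3 matrix and the labels `res_a`, `res_b`, `res_c` are invented placeholders, not mean interaction energies from the 500 structures, and the ranking by absolute eigenvector component is an assumption about how "high values along the dominant eigenvector" might be read.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]


def power_iteration(matrix, iterations=200):
    """Approximate the dominant eigenpair of a symmetric matrix."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        product = mat_vec(matrix, vec)
        norm = math.sqrt(sum(v * v for v in product))
        vec = [v / norm for v in product]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


# Placeholder symmetric "interaction" matrix; values are invented.
labels = ["res_a", "res_b", "res_c"]
toy_matrix = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

eigenvalue, eigenvector = power_iteration(toy_matrix)

# Rank labels by the magnitude of their dominant-eigenvector component.
ranking = [
    label
    for label, _ in sorted(
        zip(labels, eigenvector), key=lambda pair: abs(pair[1]), reverse=True
    )
]
```

Power iteration converges to the eigenvalue of largest magnitude, so for a real energy matrix with mostly negative entries the dominant eigenvalue may itself be negative; the ranking by absolute component is unaffected by that sign.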