1
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
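As a rough illustration of the contact maps mentioned in this abstract, the Python sketch below marks two residues as being in contact when their C-alpha atoms lie within a distance cutoff. The 8 Å cutoff, the function name `contact_map`, and the toy coordinates are assumptions made for the example; the review itself does not prescribe them.

```python
import math


def contact_map(coords, cutoff=8.0, min_separation=1):
    """Return a symmetric 0/1 matrix with 1 where two residues are in contact.

    coords         -- list of (x, y, z) C-alpha positions, one per residue
    cutoff         -- distance threshold in angstroms (assumed value)
    min_separation -- minimum sequence separation |i - j| to consider
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Toy C-alpha trace: four residues on a straight line, 3.8 angstroms apart.
toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
cmap = contact_map(toy)
for row in cmap:
    print(row)
```

Raising `min_separation` restricts the map to medium- or long-range contacts, which is a common way to separate local from non-local interactions.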
Affiliation(s)
- Balasubramanian Harihar
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
  - Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
  - Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
2
Wu Z, Basu S, Wu X, Kurgan L. qNABpredict: Quick, accurate, and taxonomy-aware sequence-based prediction of content of nucleic acid binding amino acids. Protein Sci 2023; 32:e4544. [PMID: 36519304 PMCID: PMC9798252 DOI: 10.1002/pro.4544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2022] [Revised: 12/07/2022] [Accepted: 12/08/2022] [Indexed: 12/23/2022]
Abstract
Protein sequence-based predictors of nucleic acid (NA)-binding include methods that predict NA-binding proteins and NA-binding residues. The residue-level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA-binding residues, offering more information than the protein-level prediction and much shorter runtime than the residue-level tools. Our first-of-its-kind content predictor, qNABpredict, relies on a small, rationally designed and fast-to-compute feature set that represents relevant characteristics extracted from the input sequence and a well-parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy-agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy-aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low-similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue-level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue-level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein-NA interactions for large protein families and proteomes.
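To make the notion of "content" concrete, the short Python sketch below computes the fraction of binding residues from a per-residue binary annotation, which is how content would be extracted from the output of a residue-level predictor. The function name `binding_content` and the example labels are hypothetical; this is not qNABpredict's code, which predicts content directly from sequence features.

```python
def binding_content(labels):
    """Fraction of residues annotated as binding (1) in a per-residue 0/1 list."""
    if not labels:
        raise ValueError("cannot compute content of an empty sequence")
    return sum(labels) / len(labels)


# Hypothetical residue-level annotation for a 10-residue protein.
residue_labels = [0, 1, 1, 0, 0, 0, 1, 0, 0, 0]
content = binding_content(residue_labels)
print(f"NA-binding content: {content:.2f}")
```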
Affiliation(s)
- Zhonghua Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
- Xuantai Wu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA
3
Wei H, Wang B, Yang J, Gao J. RNA Flexibility Prediction With Sequence Profile and Predicted Solvent Accessibility. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:2017-2022. [PMID: 31794403 DOI: 10.1109/tcbb.2019.2956496] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/10/2023]
Abstract
Structural flexibility plays an essential role in many biological processes. B-factor is an important indicator to measure the flexibility of protein or RNA structures. Many methods were developed to predict protein B-factors, but few studies have been done for RNA B-factor prediction. In this paper, we proposed a new method RNAbval to predict RNA B-factors using random forest. The method was developed using a comprehensive set of features, including the sequence profile and predicted solvent accessibility. RNAbval achieved an improvement of 9.2-20.5 percent over the state-of-the-art method on two benchmark test datasets. The proposed method is available at http://yanglab.nankai.edu.cn/RNAbval/.
4
Maiangwa J, Mohamad Ali MS, Salleh AB, Rahman RNZRA, Normi YM, Mohd Shariff F, Leow TC. Lid opening and conformational stability of T1 Lipase is mediated by increasing chain length polar solvents. PeerJ 2017; 5:e3341. [PMID: 28533982 PMCID: PMC5438581 DOI: 10.7717/peerj.3341] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 07/08/2016] [Accepted: 04/21/2017] [Indexed: 11/20/2022] Open
Abstract
The dynamics and conformational landscape of proteins in organic solvents are events of potential interest in nonaqueous process catalysis. Conformational changes, folding transitions, and stability often correspond to structural rearrangements that alter contacts between solvent molecules and amino acid residues. However, in nonaqueous enzymology, organic solvents limit stability and further application of proteins. In the present study, molecular dynamics (MD) of a thermostable Geobacillus zalihae T1 lipase was performed in different chain length polar organic solvents (methanol, ethanol, propanol, butanol, and pentanol) and water mixture systems to a concentration of 50%. On the basis of the MD results, the structural deviations of the backbone atoms elucidated the dynamic effects of water/organic solvent mixtures on the equilibrium state of the protein simulations in decreasing solvent polarity. The results show that the solvent mixture gives rise to deviations in enzyme structure from the native one simulated in water. The drop in the flexibility in H2O, MtOH, EtOH and PrOH simulation mixtures shows that greater motions of residues were influenced in BtOH and PtOH simulation mixtures. Comparing the root mean square fluctuations value with the accessible solvent area (SASA) for every residue showed an almost correspondingly high SASA value of residues to high flexibility and low SASA value to low flexibility. The study further revealed that the organic solvents influenced the formation of more hydrogen bonds in MtOH, EtOH and PrOH and thus, it is assumed that increased intraprotein hydrogen bonding is ultimately correlated to the stability of the protein. However, the solvent accessibility analysis showed that in all solvent systems, hydrophobic residues were exposed and polar residues tended to be buried away from the solvent. 
Distance variation of the tetrahedral intermediate packing of the active pocket was not conserved in organic solvent systems, which could lead to weaknesses in the catalytic H-bond network and most likely a drop in catalytic activity. The conformational variation of the lid domain caused by the solvent molecules influenced its gradual opening. Formation of additional hydrogen bonds and hydrophobic interactions indicates that the contribution of the cooperative network of interactions could retain the stability of the protein in some solvent systems. Time-correlated atomic motions were used to characterize the correlations between the motions of the atoms from atomic coordinates. The resulting cross-correlation map revealed that the organic solvent mixtures performed functional, concerted, correlated motions in regions of residues of the lid domain to other residues. These observations suggest that varying lengths of polar organic solvents play a significant role in introducing dynamic conformational diversity in proteins in a decreasing order of polarity.
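The two trajectory quantities named in this abstract, per-residue root mean square fluctuation (RMSF) and the cross-correlation of atomic motions, can be sketched in plain Python as follows. The sketch assumes the frames have already been superimposed on a reference structure and uses the standard normalized covariance C_ij = <Δr_i · Δr_j> / sqrt(<Δr_i²><Δr_j²>); the function names and the toy trajectory are hypothetical and are not taken from the study.

```python
import math


def mean_positions(frames):
    """Average position of each atom over all frames."""
    n_frames, n_atoms = len(frames), len(frames[0])
    return [
        tuple(sum(f[a][k] for f in frames) / n_frames for k in range(3))
        for a in range(n_atoms)
    ]


def displacements(frames):
    """Per-frame displacement of each atom from its mean position."""
    mean = mean_positions(frames)
    return [
        [tuple(f[a][k] - mean[a][k] for k in range(3)) for a in range(len(mean))]
        for f in frames
    ]


def rmsf(frames):
    """Root mean square fluctuation of each atom around its mean position."""
    disp = displacements(frames)
    n_atoms = len(disp[0])
    return [
        math.sqrt(sum(sum(c * c for c in d[a]) for d in disp) / len(disp))
        for a in range(n_atoms)
    ]


def cross_correlation(frames):
    """Normalized cross-correlation matrix of atomic displacements."""
    disp = displacements(frames)
    n_atoms = len(disp[0])

    def avg_dot(i, j):
        return sum(sum(d[i][k] * d[j][k] for k in range(3)) for d in disp) / len(disp)

    var = [avg_dot(i, i) for i in range(n_atoms)]
    cmat = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i in range(n_atoms):
        for j in range(n_atoms):
            denom = math.sqrt(var[i] * var[j])
            cmat[i][j] = avg_dot(i, j) / denom if denom > 0 else 0.0
    return cmat


# Toy trajectory: 2 frames, 3 atoms; atoms 0 and 1 move together, atom 2 moves oppositely.
frames = [
    [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    [(-1.0, 0.0, 0.0), (4.0, 0.0, 0.0), (11.0, 0.0, 0.0)],
]
fluct = rmsf(frames)
corr = cross_correlation(frames)
print(fluct)
print(corr[0])
```

Values of the matrix near +1 indicate atoms moving in the same direction and values near -1 indicate anti-correlated motion. Comparing `fluct` with per-residue SASA values would be a separate correlation step.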
Affiliation(s)
- Jonathan Maiangwa
  - Department of Cell and Molecular Biology/Enzyme Microbial Technology Research center/Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mohd Shukuri Mohamad Ali
  - Department of Biochemistry/Enzyme Microbial Technology Research center/Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Abu Bakar Salleh
  - Department of Biochemistry/Enzyme Microbial Technology Research center/Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Raja Noor Zaliha Raja Abd Rahman
  - Department of Microbiology/Enzyme Microbial Technology Research center/Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Yahaya M Normi
  - Department of Cell and Molecular Biology/Enzyme Microbial Technology Research center/Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Fairolniza Mohd Shariff
  - Department of Microbiology/Enzyme Microbial Technology Research center/Faculty of Biotechnology and Biomolecular Science, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Thean Chor Leow
  - Department of Cell and Molecular Biology/Enzyme and Microbial Technology Research center/Faculty of Biotechnology and Biomolecular Science/Institute of Bioscience, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
5
Faraggi E, Kloczkowski A. GENN: a GEneral Neural Network for learning tabulated data with examples from protein structure prediction. Methods Mol Biol 2015; 1260:165-78. [PMID: 25502381 PMCID: PMC6930076 DOI: 10.1007/978-1-4939-2239-0_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]
Abstract
We present a GEneral Neural Network (GENN) for learning trends from existing data and making predictions of unknown information. The main novelty of GENN is in its generality, simplicity of use, and its specific handling of windowed input/output. Its main strength is its efficient handling of the input data, enabling learning from large datasets. GENN is built on a two-layered neural network and can use either separate input-output pairs or window-based data, using data structures to represent input-output pairs efficiently. The program was tested on predicting the accessible surface area of globular proteins, scoring proteins according to their similarity to the native structure, and predicting protein disorder, and it performed remarkably well. In this paper we describe the program and its use. As an example, we describe the construction of a similarity-to-native protein scoring function using GENN. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
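The windowed input mentioned in this abstract can be pictured with the Python sketch below, which builds one flattened feature vector per residue from the features of its sequence neighbours. This is only an illustration of the general sliding-window idea; the function name `windowed_inputs`, the zero padding at the chain ends, and the toy features are assumptions and do not reflect GENN's internal data structures.

```python
def windowed_inputs(features, half_width, pad=0.0):
    """Build one flattened window of per-residue features centred on each residue.

    features   -- list of per-residue feature lists, all of equal length
    half_width -- number of neighbours taken on each side of the centre residue
    pad        -- value used for positions that fall beyond the chain ends (assumed)
    """
    n_feat = len(features[0])
    padding = [[pad] * n_feat for _ in range(half_width)]
    padded = padding + [list(f) for f in features] + padding
    size = 2 * half_width + 1
    return [
        [value for row in padded[i:i + size] for value in row]
        for i in range(len(features))
    ]


# Toy example: three residues with a single feature each, window of size 3.
toy_features = [[1.0], [2.0], [3.0]]
inputs = windowed_inputs(toy_features, half_width=1)
print(inputs)
```

Each row of `inputs` would be paired with the target value of the centre residue when training a window-based predictor.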
Affiliation(s)
- Eshel Faraggi
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA; Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, Ohio 43215, USA; and Physics Division, Research and Information Systems, LLC, Carmel, Indiana, 46032, USA, phone: 317-332-0368
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, Ohio 43215, USA; and Department of Pediatrics, The Ohio State University, Columbus, Ohio 43215, USA
6
Faraggi E, Zhou Y, Kloczkowski A. Accurate single-sequence prediction of solvent accessible surface area using local and global features. Proteins 2014; 82:3170-6. [PMID: 25204636 DOI: 10.1002/prot.24682] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 04/13/2014] [Revised: 08/08/2014] [Accepted: 08/22/2014] [Indexed: 01/04/2023]
Abstract
We present a new approach for predicting the Accessible Surface Area (ASA) using a General Neural Network (GENN). The novelty of the new approach lies in not using residue mutation profiles generated by multiple sequence alignments as descriptive inputs. Instead we use solely sequential window information and global features such as single-residue and two-residue compositions of the chain. The resulting predictor is both much more efficient than sequence alignment-based predictors and of comparable accuracy to them. Introduction of the global inputs significantly helps achieve this comparable accuracy. The predictor, termed ASAquick, is tested on predicting the ASA of globular proteins and found to perform similarly well for so-called easy and hard cases, indicating generalizability and possible usability for de novo protein structure prediction. The source code and Linux executables for GENN and ASAquick are available from Research and Information Systems at http://mamiris.com, from the SPARKS Lab at http://sparks-lab.org, and from the Battelle Center for Mathematical Medicine at http://mathmed.org.
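The global features named in this abstract, single-residue and two-residue compositions of the chain, could be computed along the lines of the Python sketch below. It assumes that "two-residue composition" means the frequency of adjacent residue pairs, which the abstract does not specify; the function name `composition_features` and the feature ordering are likewise hypothetical rather than ASAquick's actual encoding.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(sequence):
    """Return 20 single-residue plus 400 adjacent-pair frequencies for a chain."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")

    # Single-residue composition: fraction of the chain made up by each amino acid.
    single = [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

    # Two-residue composition: frequency of each adjacent pair (assumed definition).
    pair_counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        pair_counts[pair] = pair_counts.get(pair, 0) + 1
    n_pairs = max(len(seq) - 1, 1)
    pairs = [pair_counts.get(a + b, 0) / n_pairs for a, b in product(AMINO_ACIDS, repeat=2)]

    return single + pairs


features = composition_features("AAC")
print(len(features), features[:2])
```

Because these features depend only on the sequence itself, they avoid the cost of building a multiple sequence alignment.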
Affiliation(s)
- Eshel Faraggi
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana, 46202; Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, Ohio, 43215; Physics Division, Research and Information Systems, LLC, Carmel, Indiana, 46032
7
Matsuoka M, Kikuchi T. Sequence analysis on the information of folding initiation segments in ferredoxin-like fold proteins. BMC Structural Biology 2014; 14:15. [PMID: 24884463 PMCID: PMC4055915 DOI: 10.1186/1472-6807-14-15] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 12/23/2013] [Accepted: 05/15/2014] [Indexed: 02/06/2023]
Abstract
BACKGROUND: While some studies have shown that 3D protein structures are more conserved than their amino acid sequences, other experimental studies have shown that even if two proteins share the same topology, they may have different folding pathways. Many studies have investigated this issue with molecular dynamics or Go-like model simulations; however, one should be able to obtain the same information by analyzing the proteins' amino acid sequences, if the sequences contain all the information about the 3D structures. In this study, we use information about protein sequences to predict the location of their folding segments. We focus on proteins with a ferredoxin-like fold, which has a characteristic topology. Some of these proteins have different folding segments. RESULTS: Despite the simplicity of our methods, we are able to correctly determine the experimentally identified folding segments by predicting the location of the compact regions considered to play an important role in structural formation. We also apply our sequence analyses to some homologues of each protein and confirm that there are highly conserved folding segments despite the homologues' sequence diversity. These homologues have similar folding segments even though the homology of two proteins' sequences is not so high. CONCLUSION: Our analyses have proven useful for investigating the common or different folding features of the proteins studied.
Affiliation(s)
- Takeshi Kikuchi
  - Department of Bioinformatics, College of Life Sciences, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
8
Klus P, Bolognesi B, Agostini F, Marchese D, Zanzoni A, Tartaglia GG. The cleverSuite approach for protein characterization: predictions of structural properties, solubility, chaperone requirements and RNA-binding abilities. Bioinformatics 2014; 30:1601-8. [PMID: 24493033 PMCID: PMC4029037 DOI: 10.1093/bioinformatics/btu074] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 12/11/2022]
Abstract
Motivation: The recent shift towards high-throughput screening is posing new challenges for the interpretation of experimental results. Here we propose the cleverSuite approach for large-scale characterization of protein groups. Description: The central part of the cleverSuite is the cleverMachine (CM), an algorithm that performs statistics on protein sequences by comparing their physico-chemical propensities. The second element is called cleverClassifier and builds on top of the models generated by the CM to allow classification of new datasets. Results: We applied the cleverSuite to predict secondary structure properties, solubility, chaperone requirements and RNA-binding abilities. Using cross-validation and independent datasets, the cleverSuite reproduces experimental findings with great accuracy and provides models that can be used for future investigations. Availability: The intuitive interface for dataset exploration, analysis and prediction is available at http://s.tartaglialab.com/clever_suite. Contact: gian.tartaglia@crg.es Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Petr Klus
  - Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Benedetta Bolognesi
  - Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Federico Agostini
  - Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Domenica Marchese
  - Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Andreas Zanzoni
  - Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
- Gian Gaetano Tartaglia
  - Gene Function and Evolution, Centre for Genomic Regulation (CRG), Dr. Aiguader 88 and Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain
9
Mizianty MJ, Zhang T, Xue B, Zhou Y, Dunker AK, Uversky VN, Kurgan L. In-silico prediction of disorder content using hybrid sequence representation. BMC Bioinformatics 2011; 12:245. [PMID: 21682902 PMCID: PMC3212983 DOI: 10.1186/1471-2105-12-245] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 10/22/2010] [Accepted: 06/17/2011] [Indexed: 11/25/2022] Open
Abstract
Background: Intrinsically disordered proteins play important roles in various cellular activities and their prevalence was implicated in a number of human diseases. The knowledge of the content of the intrinsic disorder in proteins is useful for a variety of studies including estimation of the abundance of disorder in protein families, classes, and complete proteomes, and for the analysis of disorder-related protein functions. The above investigations currently utilize the disorder content derived from the per-residue disorder predictions. We show that these predictions may over- or under-predict the overall amount of disorder, which motivates development of novel tools for direct and accurate sequence-based prediction of the disorder content. Results: We hypothesize that sequence-level aggregation of input information may provide more accurate content prediction when compared with the content extracted from the local window-based residue-level disorder predictors. We propose a novel predictor, DisCon, that takes advantage of a small set of 29 custom-designed descriptors that aggregate and hybridize information concerning sequence, evolutionary profiles, and predicted secondary structure, solvent accessibility, flexibility, and annotation of globular domains. Using these descriptors and a ridge regression model, DisCon predicts the content with a low mean squared error of 0.05 and a high Pearson correlation of 0.68. This is a statistically significant improvement over the content computed from outputs of ten modern disorder predictors on a test dataset with proteins that share low sequence identity with the training sequences. The proposed predictive model is analyzed to discuss factors related to the prediction of the disorder content. Conclusions: DisCon is a high-quality alternative for high-throughput annotation of the disorder content.
We also empirically demonstrate that the DisCon's predictions can be used to improve binary annotations of the disordered residues from the real-value disorder propensities generated by current residue-level disorder predictors. The web server that implements the DisCon is available at http://biomine.ece.ualberta.ca/DisCon/.
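The evaluation described in this abstract compares predicted and true disorder content using mean squared error and Pearson correlation, with the baseline content derived from per-residue propensities. The Python sketch below shows these three computations under stated assumptions: the 0.5 binarization threshold, the function names, and the toy numbers are all hypothetical and are not DisCon's implementation or data.

```python
import math


def content_from_propensities(propensities, threshold=0.5):
    """Fraction of residues whose disorder propensity reaches the threshold (assumed 0.5)."""
    return sum(p >= threshold for p in propensities) / len(propensities)


def mean_squared_error(predicted, actual):
    """Average squared difference between predicted and actual content values."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equally long value lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


# Toy per-residue propensities for one protein, and toy content values for three proteins.
example_content = content_from_propensities([0.1, 0.6, 0.9, 0.4])
predicted = [0.10, 0.40, 0.80]
actual = [0.20, 0.40, 0.70]
mse = mean_squared_error(predicted, actual)
pcc = pearson_correlation(predicted, actual)
print(example_content, round(mse, 4), round(pcc, 3))
```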
Affiliation(s)
- Marcin J Mizianty
  - Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta T6G 2V4, Canada