1
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2024:10.1007/s12033-024-01119-4. [PMID: 38498284; DOI: 10.1007/s12033-024-01119-4] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. Machine learning models incorporating inter-residue interactions have improved recently. This review highlights theoretical models that incorporate inter-residue interactions to predict protein folding and unfolding rates. Using contact maps to depict inter-residue interactions helps researchers develop computational models for detecting remote homologs and interface residues within protein-protein complexes, which in turn enhances our knowledge of the relationship between protein sequence and structure. Further, the application of contact maps derived from inter-residue interactions in drug discovery is highlighted. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, and outlines potential future advances in constructing efficient computational models in structural biology.
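A contact map can be derived from a structure and condensed into relative contact order (RCO), a classic contact-based predictor of folding rates. The minimal sketch below shows one way to do this. It assumes C-alpha coordinates, an 8 Å distance cutoff and a minimum sequence separation of 3. These are common illustrative choices, not the settings of any specific model covered by the review, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0,
                min_sep: int = 3) -> List[List[int]]:
    """Binary, symmetric contact map from C-alpha coordinates (assumed input).

    Residues i and j are in contact when their C-alpha atoms lie within
    `cutoff` angstroms and they are at least `min_sep` positions apart.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def relative_contact_order(cmap: List[List[int]]) -> float:
    """Mean sequence separation of all contacts, divided by chain length."""
    n = len(cmap)
    separations = [j - i for i in range(n) for j in range(i + 1, n) if cmap[i][j]]
    if not separations:
        return 0.0
    return sum(separations) / (len(separations) * n)


# Toy four-residue "square" with 3.8 angstrom sides (illustrative only).
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
cmap = contact_map(square)
print(relative_contact_order(cmap))  # 0.75
```

In this toy square, only residues 0 and 3 satisfy the separation rule, and they also lie within the cutoff. The map therefore holds a single contact, and RCO = 3 / (1 × 4) = 0.75.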
Affiliation(s)
- Balasubramanian Harihar
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India

2
Xiao N, Yang W, Wang J, Li J, Zhao R, Li M, Li C, Liu K, Li Y, Yin C, Chen Z, Li X, Jiang Y. Protein structuromics: A new method for protein structure-function crosstalk in glioma. Proteins 2024; 92:24-36. [PMID: 37497743; DOI: 10.1002/prot.26555] [Received: 03/25/2023] [Revised: 06/16/2023] [Accepted: 07/04/2023] [Indexed: 07/28/2023]
Abstract
Glioma is a type of tumor that starts in the glial cells of the brain or spine. Since the 1800s, when the disease was first named, its survival rates have always been unsatisfactory. Despite great advances in molecular biology and traditional treatment methods, many questions regarding cancer occurrence and the underlying mechanism remain to be answered. In this study, we assessed the protein structural features of 20 oncogenes and 20 anti-oncogenes via protein structure and dynamic analysis methods and 3D structural and systematic analyses of the structure-function relationships of proteins. All of these results directly indicate that unfavorable group proteins show more complex structures than favorable group proteins. As the tumor cell microenvironment changes, the balance of oncogene-related and anti-oncogene-related proteins is disrupted, and most of the structures of the two groups of proteins will be disrupted. However, more unfavorable group proteins will maintain and refold to achieve their correct shape faster and perform their functions more quickly than favorable group proteins, and the former thus support cancer development. We hope that these analyses will help promote mechanistic research and the development of new treatments for glioma.
Affiliation(s)
- Nan Xiao
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Wenming Yang
- Department of Neurosurgery, The First Affiliated Hospital of Jinzhou Medical University, Jinzhou, Liaoning, China
- Jin Wang
- Department of Rehabilitation, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Jiarong Li
- Department of Rehabilitation, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Ruoxuan Zhao
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Muzheng Li
- Department of Rehabilitation, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Chi Li
- Department of Anesthesiology, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Kang Liu
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Yingxin Li
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Chaoqun Yin
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Zhibo Chen
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Xingqi Li
- Department of Medicine, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China
- Yun Jiang
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou, Liaoning, China

3
Pandey M, Shah SK, Gromiha MM. Computational approaches for identifying disease-causing mutations in proteins. Adv Protein Chem Struct Biol 2023; 139:141-171. [PMID: 38448134; DOI: 10.1016/bs.apcsb.2023.11.007] [Indexed: 03/08/2024]
Abstract
Advancements in genome sequencing have expanded the scope of investigating mutations in proteins across different diseases. Amino acid mutations in a protein alter its structure, stability and function and some of them lead to diseases. Identification of disease-causing mutations is a challenging task and it will be helpful for designing therapeutic strategies. Hence, mutation data available in the literature have been curated and stored in several databases, which have been effectively utilized for developing computational methods to identify deleterious mutations (drivers), using sequence and structure-based properties of proteins. In this chapter, we describe the contents of specific databases that have information on disease-causing and neutral mutations followed by sequence and structure-based properties. Further, characteristic features of disease-causing mutations will be discussed along with computational methods for identifying cancer hotspot residues and disease-causing mutations in proteins.
Affiliation(s)
- Medha Pandey
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Suraj Kumar Shah
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, Japan

4
Casier R, Duhamel J. Appraisal of blob-Based Approaches in the Prediction of Protein Folding Times. J Phys Chem B 2023; 127:8852-8859. [PMID: 37793094; DOI: 10.1021/acs.jpcb.3c04958] [Indexed: 10/06/2023]
Abstract
A series of reports published in the last 3 years has illustrated that a blob-based model (BBM) can predict the folding time of proteins from their primary amino acid (aa) sequence based on three simple rules established to characterize the long-range backbone dynamics (LRBD) of racemic polypeptides. The sole use of LRBD to predict protein folding times with the BBM represents a radical departure from all other prediction methods currently applied to determine protein folding times, which rely instead on parameters such as the structure content, folding kinetics, chain length, amino acid properties, or contact topography of proteins. Furthermore, the built-in modularity of the BBM enables the parametrization and inclusion of new phenomena affecting the LRBD of polypeptides, while its conceptual simplicity makes it an interesting new mathematical tool for studying protein folding. However, its novelty implies that its relationship with many other methods used to predict protein folding times has not been well researched. Consequently, the purpose of this report is to uncover the physical phenomena encountered during protein folding that are best described by the BBM through the identification of parameters that have been recognized over the years as being strong predictors for protein folding, such as protein size, topology, structural class, and folding kinetics. This was accomplished by determining the parameters most strongly correlated with the folding times predicted by the BBM. While the BBM in its present form appears to be a good indicator of the folding times of the vast majority of the 195 proteins considered so far, this report finds that it excels for moderately large proteins that are primarily composed of locally formed structural motifs such as α-helices or for proteins that fold in multiple steps. 
Altogether, these observations based on the use of the BBM support the notion that proteins fold the way they do because the LRBD of polypeptides is mostly driven by the local interactions experienced between aa's within reach of one another.
Affiliation(s)
- Remi Casier
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
- Jean Duhamel
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L3G1, Canada

5
Xiao N, Ma H, Gao H, Yang J, Tong D, Gan D, Yang J, Li C, Liu K, Li Y, Chen Z, Yin C, Li X, Wang H. Structure-function crosstalk in liver cancer research: Protein structuromics. Int J Biol Macromol 2023:125291. [PMID: 37315670; DOI: 10.1016/j.ijbiomac.2023.125291] [Received: 04/04/2023] [Revised: 06/04/2023] [Accepted: 06/07/2023] [Indexed: 06/16/2023]
Abstract
Liver cancer can be primary (starting in the liver) or secondary (cancer that has spread from elsewhere to the liver, known as liver metastasis). Liver metastasis is more common than primary liver cancer. Despite great advances in molecular biology methods and treatments, liver cancer is still associated with a poor survival rate and a high death rate, and there is no cure. Many questions remain regarding the mechanisms of liver cancer occurrence and development as well as tumor reoccurrence after treatment. In this study, we assessed the protein structural features of 20 oncogenes and 20 anti-oncogenes via protein structure and dynamic analysis methods and 3D structural and systematic analyses of the structure-function relationships of proteins. Our aim was to provide new insights that may inform research on the development and treatment of liver cancer.
Affiliation(s)
- Nan Xiao
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Hongming Ma
- Department of Oncology, China Emergency General Hospital City, Beijing, China
- Hong Gao
- Department of Oncology, China Emergency General Hospital City, Beijing, China
- Jing Yang
- Department of Computer Center, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Dan Tong
- Department of Nurse, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Dingzhu Gan
- Department of Publicity, Peking Union Medical College, Beijing, China
- Jinhua Yang
- Department of Development and Production, Institute of Medical Biology, Peking Union Medical College, Kunming City, Yunnan Province, China
- Chi Li
- Department of Anesthesiology, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Kang Liu
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Yingxin Li
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Zhibo Chen
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Chaoqun Yin
- Department of Medical Science, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Xingqi Li
- Department of Medicine, Medical College of Jinzhou Medical University, Jinzhou City, Liaoning Province, China
- Hongwu Wang
- Department of Respiratory and Critical Care Medicine, Dongzhimen Hospital Affiliated to Beijing University of Chinese Medicine, Beijing, China

6
Nithiyanandam S, Sangaraju VK, Manavalan B, Lee G. Computational prediction of protein folding rate using structural parameters and network centrality measures. Comput Biol Med 2023; 155:106436. [PMID: 36848800; DOI: 10.1016/j.compbiomed.2022.106436] [Received: 08/17/2022] [Revised: 11/28/2022] [Accepted: 12/13/2022] [Indexed: 02/17/2023]
Abstract
Protein folding is a complex physicochemical process whereby a polymer of amino acids samples numerous conformations in its unfolded state before settling on an essentially unique native three-dimensional (3D) structure. To understand this process, several theoretical studies have used a set of 3D structures, identified different structural parameters, and analyzed their relationships with the natural logarithm of the protein folding rate (ln(kf)). Unfortunately, these structural parameters are specific to small sets of proteins and cannot accurately predict ln(kf) for both two-state (TS) and non-two-state (NTS) proteins. To overcome the limitations of the statistical approach, a few machine learning (ML)-based models have been proposed using limited training data. However, none of these methods can explain plausible folding mechanisms. In this study, we evaluated the predictive capabilities of ten different ML algorithms using eight different structural parameters and five different network centrality measures based on newly constructed datasets. Compared with the other nine regressors, the support vector machine was found to be the most appropriate for predicting ln(kf), with mean absolute differences of 1.856, 1.55, and 1.745 for the TS, NTS, and combined datasets, respectively. Furthermore, combining structural parameters and network centrality measures improves prediction performance compared with individual parameters, indicating that multiple factors are involved in the folding process.
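The abstract does not name its five centrality measures. The sketch below therefore assumes two standard ones, degree and closeness, computed on a residue contact network in which residues are nodes and contacts are edges. It also shows the mean absolute difference used to report ln(kf) errors. All function names are hypothetical, and this is not the authors' pipeline.

```python
from collections import deque
from typing import Dict, List, Sequence

Graph = Dict[int, List[int]]  # residue index -> indices of contacting residues


def degree_centrality(graph: Graph) -> Dict[int, float]:
    """Node degree divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {v: 0.0 for v in graph}
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph: Graph) -> Dict[int, float]:
    """(reachable nodes) / (sum of shortest-path lengths), using BFS on an unweighted graph."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        # Only nodes reachable from `source` contribute; isolated nodes get 0.
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


def mean_absolute_difference(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of |observed - predicted|, e.g. over ln(kf) values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# A three-residue chain 0-1-2: the middle residue is the most central.
chain = {0: [1], 1: [0, 2], 2: [1]}
print(degree_centrality(chain))  # {0: 0.5, 1: 1.0, 2: 0.5}
print(closeness_centrality(chain))
```

Per-residue scores like these would still have to be aggregated (for example, averaged) into protein-level features before any regressor sees them. That aggregation step is another assumption not specified in the abstract.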
Affiliation(s)
- Saraswathy Nithiyanandam
- Department of Molecular Science and Technology, Ajou University, 206 World Cup-ro, Suwon, 16499, South Korea
- Vinoth Kumar Sangaraju
- Department of Physiology, Ajou University School of Medicine, 206 World Cup-ro, Suwon, 16499, South Korea
- Balachandran Manavalan
- Department of Physiology, Ajou University School of Medicine, 206 World Cup-ro, Suwon, 16499, South Korea
- Gwang Lee
- Department of Molecular Science and Technology, Ajou University, 206 World Cup-ro, Suwon, 16499, South Korea; Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, South Korea

7
Casier R, Duhamel J. Synergetic Effects of Alanine and Glycine in Blob-Based Methods for Predicting Protein Folding Times. J Phys Chem B 2023; 127:1325-1337. [PMID: 36749707; DOI: 10.1021/acs.jpcb.2c08155] [Indexed: 02/08/2023]
Abstract
The polypeptide PGlyAlaGlu was prepared with 20 mol % glycine (Gly), 36 mol % d,l-alanine (Ala), and 44 mol % d,l-glutamic acid (Glu) and labeled with the dye 1-pyrenemethylamine to yield a series of Py-PGlyAlaGlu samples. The fluorescence decays of the Py-PGlyAlaGlu samples were analyzed according to the fluorescence blob model (FBM) to obtain the number Nblobexp of amino acids (aa's) encompassed inside the subvolume Vblob of the polypeptide probed by an excited pyrene. An Nblobexp value of 29 (±2) was retrieved for Py-PGlyAlaGlu, much larger than for either of the copolypeptides PGlyGlu and PAlaGlu, prepared with Gly and Glu or with Ala and Glu, respectively. The continuous increase in Nblobexp with decreasing side chain size (SCS), from 10 aa's for PGlu to 16 aa's for PAlaGlu and 22 aa's for PGlyGlu, was used earlier to define the reach of an aa and to determine the groups of aa's that could interact with each other along a polypeptide backbone according to their SCS. These groups of aa's, referred to as blobs, led to the implementation of blob-based models (BBM) to predict the folding time τFtheo,BBM of 145 proteins, which was found to match their experimental folding time τFexp with a relatively high correlation coefficient of 0.71. Nevertheless, the much higher Nblobexp value found for Py-PGlyAlaGlu compared to all other pyrene-labeled polypeptides studied to date indicates that the reach of aa's along a polypeptide sequence is affected not only by SCS but also by synergetic effects between different aa's. Following this new insight, a revised BBM was implemented to predict τFtheo,BBM for 195 proteins, assuming either the presence or the absence of synergies controlling the interactions between aa's along a polypeptide sequence. Similarly good correlation coefficients of 0.71 and 0.74 were obtained for a direct 1:1 comparison of τFexp and τFtheo,BBM for the 195 proteins without and with synergies, respectively.
This result suggests that synergetic effects between different aa's have little effect on the τFtheo,BBM values predicted with the BBM, underlining the robustness of this methodology.
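The blob-based model abstracts report agreement between experimental and predicted folding times as correlation coefficients. As a hedged illustration, the sketch below computes a Pearson coefficient. Whether folding times should be log-transformed first is an assumption made here, because they span orders of magnitude; the abstract does not state it. The function names are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def log_time_correlation(tau_exp: Sequence[float], tau_pred: Sequence[float]) -> float:
    """Correlate log10 folding times (assumed transform; all times must be positive)."""
    return pearson_r([math.log10(t) for t in tau_exp],
                     [math.log10(t) for t in tau_pred])
```

Predictions that are off by a constant factor still correlate perfectly on the log scale. This is one reason a correlation coefficient alone does not measure absolute accuracy.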
Affiliation(s)
- Remi Casier
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Jean Duhamel
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, ON N2L 3G1, Canada

8
Contreras-Torres E, Marrero-Ponce Y, Terán JE, Agüero-Chapin G, Antunes A, García-Jacas CR. Fuzzy spherical truncation-based multi-linear protein descriptors: From their definition to application in structural-related predictions. Front Chem 2022; 10:959143. [PMID: 36277354; PMCID: PMC9585278; DOI: 10.3389/fchem.2022.959143] [Received: 06/01/2022] [Accepted: 08/15/2022] [Indexed: 11/13/2022]
Abstract
This study introduces a set of fuzzy spherically truncated three-dimensional (3D) multi-linear descriptors (MDs) for proteins. These indices codify geometric structural information from kth spherically truncated spatial-(dis)similarity two-tuple and three-tuple tensors. The coefficients of these truncated tensors are calculated by applying a smoothing value to the 3D structural encoding based on the relationships between two and three amino acids of a protein embedded in a sphere. Since the geometrical center of the protein coincides with the center of the sphere, the distance between each amino acid involved in any specific interaction and the geometrical center of the protein can be computed. Then, the fuzzy membership degree of each amino acid in a spherical region of interest is computed by fuzzy membership functions (FMFs). The truncation value is finally obtained by combining the membership degrees of the interacting amino acids, using the arithmetic mean as the fusion rule. Several fuzzy membership functions with different biases in the calculation of amino acid memberships (e.g., Z-shaped (close to the center), PI-shaped (middle region), and A-Gaussian (far from the center)) were considered, as well as traditional truncation functions (e.g., Switching). These truncation functions were comparatively evaluated by exploring: 1) the frequency of membership degrees, 2) their variability and orthogonality, analyzed with Shannon entropy and principal component analysis, respectively, and 3) their performance in alignment-free prediction of protein folding rates and structural classes. These analyses revealed the distinctiveness of the proposed fuzzy spherically truncated MDs with respect to both the classical (non-truncated) MDs and the MDs truncated with traditional functions.
They also showed an improved prediction power by attaining an external correlation coefficient of 95.82% in the folding rate modelling and an accuracy of 100% in distinguishing structural protein classes. These outcomes are better than the ones attained by existing approaches, justifying the theoretical contribution of this report. Thus, the fuzzy spherically truncated-based protein descriptors from MuLiMs-MCoMPAs (http://tomocomd.com/mulims-mcompas) are promising alignment-free predictors for modeling protein functions and properties.
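The truncation scheme described above has three steps, and the minimal sketch below follows them. First, compute each residue's distance to the geometric center. Next, score that distance with a membership function. Finally, fuse the scores of the interacting residues using the arithmetic mean.

The Z- and PI-shaped curves follow the usual spline-based definitions. The Gaussian is only a hypothetical stand-in for the paper's A-Gaussian, whose exact form the abstract does not give. Normalizing by the largest center distance (taken as the sphere radius) and all parameter values are assumptions, not the MuLiMs-MCoMPAs implementation.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def z_shaped(x: float, a: float, b: float) -> float:
    """Z-shaped membership: 1 for x <= a, 0 for x >= b, quadratic spline in between."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    if x <= (a + b) / 2:
        return 1.0 - 2.0 * ((x - a) / (b - a)) ** 2
    return 2.0 * ((x - b) / (b - a)) ** 2


def s_shaped(x: float, a: float, b: float) -> float:
    """S-shaped membership, the complement of the Z-shaped curve."""
    return 1.0 - z_shaped(x, a, b)


def pi_shaped(x: float, a: float, b: float, c: float, d: float) -> float:
    """PI-shaped membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    return s_shaped(x, a, b) * z_shaped(x, c, d)


def gaussian(x: float, mean: float, sigma: float) -> float:
    """Gaussian membership (hypothetical stand-in for the A-Gaussian)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


def normalized_center_distances(coords: Sequence[Coord]) -> List[float]:
    """Distance of each residue to the geometric center, divided by the largest one (assumed sphere radius)."""
    n = len(coords)
    center = tuple(sum(c[k] for c in coords) / n for k in range(3))
    dists = [math.dist(c, center) for c in coords]
    radius = max(dists) or 1.0  # avoid division by zero for degenerate input
    return [d / radius for d in dists]


def truncation_weight(memberships: Sequence[float], residues: Sequence[int]) -> float:
    """Fuse the memberships of interacting residues (a pair or a triple) with the arithmetic mean."""
    return sum(memberships[i] for i in residues) / len(residues)


# Toy coordinates and membership parameters are illustrative only.
coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
dists = normalized_center_distances(coords)
near_center = [z_shaped(d, 0.2, 0.6) for d in dists]        # favors the core
middle = [pi_shaped(d, 0.1, 0.4, 0.6, 0.9) for d in dists]  # favors the mid shell
surface = [gaussian(d, 1.0, 0.2) for d in dists]            # favors the surface
print(truncation_weight(near_center, (0, 1)), truncation_weight(surface, (2, 3)))
```

Each membership family gives a different weight to the same pair of residues. The truncated tensor coefficients would then be scaled by these weights.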
Affiliation(s)
- Ernesto Contreras-Torres
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- BCAM—Basque Center for Applied Mathematics, Bilbao, Spain
- Yovani Marrero-Ponce
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Computer-Aided Molecular “Biosilico” Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Quito, Ecuador
- *Correspondence: Yovani Marrero-Ponce, César R. García-Jacas
- Julio E. Terán
- Grupo de Medicina Molecular y Traslacional (MeM&T), Colegio de Ciencias de la Salud (COCSA), Escuela de Medicina, Universidad San Francisco de Quito (USFQ), Quito, Pichincha, Ecuador
- Instituto de Simulación Computacional (ISC-USFQ), Quito, Pichincha, Ecuador
- Department of Textile Engineering, Chemistry and Science, College of Textiles, North Carolina State University, Raleigh, NC, United States
- Guillermin Agüero-Chapin
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- Agostinho Antunes
- CIIMAR—Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Porto, Portugal
- Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
- César R. García-Jacas
- Cátedras Conacyt—Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Ensenada, Baja California, Mexico
9
Casier R, Duhamel J. Blob-Based Predictions of Protein Folding Times from the Amino Acid-Dependent Conformation of Polypeptides in Solution. Macromolecules 2021. [DOI: 10.1021/acs.macromol.0c02617] [Indexed: 11/30/2022]
Affiliation(s)
- Remi Casier
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, ON N2L3G1, Canada
- Jean Duhamel
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, ON N2L3G1, Canada

10
Casier R, Duhamel J. Blob-Based Approach to Estimate the Folding Time of Proteins Supported by Pyrene Excimer Fluorescence Experiments. Macromolecules 2020. [DOI: 10.1021/acs.macromol.0c02201] [Indexed: 11/29/2022]
Affiliation(s)
- Remi Casier
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, ON N2L 3G1, Canada
- Jean Duhamel
- Institute for Polymer Research, Waterloo Institute for Nanotechnology, Department of Chemistry, University of Waterloo, Waterloo, ON N2L 3G1, Canada

11
Jamal S, Khubaib M, Gangwar R, Grover S, Grover A, Hasnain SE. Artificial Intelligence and Machine learning based prediction of resistant and susceptible mutations in Mycobacterium tuberculosis. Sci Rep 2020; 10:5487. [PMID: 32218465; PMCID: PMC7099008; DOI: 10.1038/s41598-020-62368-2] [Received: 09/06/2019] [Accepted: 03/13/2020] [Indexed: 11/09/2022]
Abstract
Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis (M.tb), causes the highest number of deaths globally of any bacterial disease, necessitating novel diagnosis and treatment strategies. High-throughput sequencing methods generate a large amount of data that could be exploited to determine multi-drug resistant (MDR-TB) associated mutations. The present work is a computational framework that uses artificial intelligence (AI)-based machine learning (ML) approaches to predict resistance in the genes rpoB, inhA, katG, pncA, gyrA and gyrB for the drugs rifampicin, isoniazid, pyrazinamide and fluoroquinolones. The single nucleotide variations were represented by several sequence and structural features that indicate the influence of mutations on the target protein encoded by each gene. We used the ML algorithms naïve Bayes, k-nearest neighbor, support vector machine, and artificial neural network to build the prediction models. The classification models had an average accuracy of 85% across all examined genes and were evaluated on an external unseen dataset to demonstrate their application. Further, molecular docking and molecular dynamics simulations were performed for wild-type and predicted resistance-causing mutant protein and anti-TB drug complexes to study their impact on protein conformation and to confirm the observed phenotype.
Affiliation(s)
- Salma Jamal
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Mohd Khubaib
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Rishabh Gangwar
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Sonam Grover
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India
- Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Mehrauli Road, New Delhi, 110 067, India
- Seyed E Hasnain
- Jamia Hamdard Institute of Molecular Medicine, Jamia Hamdard, Hamdard Nagar, New Delhi, 110062, India; Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Professor C.R. Rao Road, Hyderabad, 500046, India

12
Ivankov DN, Finkelstein AV. Solution of Levinthal's Paradox and a Physical Theory of Protein Folding Times. Biomolecules 2020; 10:biom10020250. [PMID: 32041303; PMCID: PMC7072185; DOI: 10.3390/biom10020250] [Received: 12/12/2019] [Revised: 01/30/2020] [Accepted: 02/01/2020] [Indexed: 12/19/2022]
Abstract
“How do proteins fold?” Researchers have been studying different aspects of this question for more than 50 years. The most conceptual aspect of the problem is how protein can find the global free energy minimum in a biologically reasonable time, without exhaustive enumeration of all possible conformations, the so-called “Levinthal’s paradox.” Less conceptual but still critical are aspects about factors defining folding times of particular proteins and about perspectives of machine learning for their prediction. We will discuss in this review the key ideas and discoveries leading to the current understanding of folding kinetics, including the solution of Levinthal’s paradox, as well as the current state of the art in the prediction of protein folding times.
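The textbook back-of-the-envelope estimate below makes the scale of Levinthal's paradox concrete. The inputs are illustrative assumptions, not figures from the review: 3 conformations per residue, 10^13 conformations sampled per second, and a 100-residue chain. The helper name is hypothetical.

```python
import math
from typing import Tuple

AGE_OF_UNIVERSE_S = 4.35e17  # approx. 13.8 billion years, in seconds


def levinthal_estimate(n_residues: int, states_per_residue: float = 3.0,
                       conformations_per_s: float = 1e13) -> Tuple[float, float]:
    """Return log10(number of conformations) and log10(seconds) for an exhaustive search."""
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(conformations_per_s)
    return log10_conformations, log10_seconds


log_n, log_t = levinthal_estimate(100)
print(f"~10^{log_n:.1f} conformations, ~10^{log_t:.1f} s")
print(f"~10^{log_t - math.log10(AGE_OF_UNIVERSE_S):.0f} times the age of the universe")
```

Observed folding times are many orders of magnitude shorter than this, so folding cannot proceed by exhaustive enumeration.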
Affiliation(s)
- Dmitry N. Ivankov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, 121205 Moscow, Russia
- Correspondence: Tel.: +7-495-280-1481 (ext. 3320) (D.N.I.); +7-496-731-8412 (A.V.F.)
- Alexei V. Finkelstein
- Institute of Protein Research, Russian Academy of Sciences, 142290 Pushchino, Moscow Region, Russia
- Biology Department, Lomonosov Moscow State University, 119192 Moscow, Russia
- Biotechnology Department, Lomonosov Moscow State University, 142290 Pushchino, Moscow Region, Russia

13
|
Marrero-Ponce Y, Teran JE, Contreras-Torres E, García-Jacas CR, Perez-Castillo Y, Cubillan N, Peréz-Giménez F, Valdés-Martini JR. LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs. J Theor Biol 2020; 485:110039. [DOI: 10.1016/j.jtbi.2019.110039] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2019] [Revised: 09/11/2019] [Accepted: 10/02/2019] [Indexed: 11/28/2022]
14
Yrazu FM, Pinamonti G, Clementi C. The Effect of Electrostatic Interactions on the Folding Kinetics of a 3-α-Helical Bundle Protein Family. J Phys Chem B 2018; 122:11800-11806. [PMID: 30277393 DOI: 10.1021/acs.jpcb.8b08676] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/02/2023]
Abstract
The three protein segment repeats known as spectrins differ by more than two orders of magnitude in their folding and unfolding rates, despite having very similar stabilities and almost identical topologies. Experimental studies revealed that mutating five particular residues dramatically alters the kinetic rates of the slow folders, making them similar to those of the fast folder. This behavior is considered exceptional and seems, in principle, to challenge the current understanding of the protein folding process. In this work, we analyze this scenario using a simplified computational model combined with state-of-the-art kinetic analysis techniques. Our model faithfully separates the kinetics of the fast and slow folders and captures the effect of the five mutations. We show that including electrostatics in the model is necessary to explain the experimental findings.
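Coarse-grained folding models often add electrostatics as a Debye-Hückel (screened Coulomb) term between charged beads. The sketch below shows that term; the functional form is a common choice and not necessarily the one used by Yrazu et al., and the dielectric constant of 80 and Debye length of 10 Å are assumed illustrative defaults.

```python
import math

# Coulomb constant in molecular-simulation units: kcal·Å/(mol·e²), approximate
COULOMB_KCAL = 332.0636


def debye_huckel_energy(q_i: float, q_j: float, r: float,
                        dielectric: float = 80.0,
                        debye_length: float = 10.0) -> float:
    """Screened Coulomb energy (kcal/mol) between charges q_i and q_j
    (in units of e) separated by r Å. debye_length=math.inf disables screening."""
    if r <= 0:
        raise ValueError("distance must be positive")
    coulomb = COULOMB_KCAL * q_i * q_j / (dielectric * r)
    return coulomb * math.exp(-r / debye_length)


# Opposite charges (e.g. a Lys/Glu pair) attract; like charges repel
print(debye_huckel_energy(+1, -1, 6.0))
print(debye_huckel_energy(+1, +1, 6.0))
```

Shortening the Debye length (higher salt) weakens both the attraction and the repulsion, which is one way such models probe how strongly electrostatics shapes the folding kinetics.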
Affiliation(s)
- Fernando Miguel Yrazu
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
- Giovanni Pinamonti
- Department of Informatics and Mathematics, Freie Universität Berlin, 14195 Berlin, Germany
- Cecilia Clementi
- Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
- Department of Informatics and Mathematics, Freie Universität Berlin, 14195 Berlin, Germany
- Center for Theoretical Biological Physics and Department of Chemistry, Rice University, Houston, Texas 77005, United States
15
Rajendran S, Jothi A. Sequentially distant but structurally similar proteins exhibit fold specific patterns based on their biophysical properties. Comput Biol Chem 2018; 75:143-153. [PMID: 29783123 DOI: 10.1016/j.compbiolchem.2018.05.009] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 01/25/2017] [Revised: 05/06/2018] [Accepted: 05/07/2018] [Indexed: 11/25/2022]
Abstract
The three-dimensional structure of a protein depends on the interactions between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold, some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions arise from the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at the level of such properties. Accordingly, the main focus of our research is to analyze such proteins and characterize them in terms of their biophysical properties. Protein structures with less than 40% sequence similarity were selected for ten different subfolds from three different main folds (according to the CATH classification) and used for this analysis. We used the normalized values of 49 physicochemical, energetic and conformational properties of amino acids and characterized the folds based on the average biophysical property values. We also observed a fold-specific correlational behavior of biophysical properties despite very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes (NB), Support Vector Machines (SVM) and Bayesian Generalized Linear Model (BGLM)) that could discriminate main folds based on the biophysical properties. Among the three models, the BGLM classifier discriminated protein sequences in the all-beta category with 81.43% accuracy and all-alpha and alpha-beta proteins with 83.37% accuracy.
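The fold characterization above rests on normalized amino acid property values averaged over a sequence. A minimal sketch, assuming z-score normalization across the 20 amino acids and using the Kyte-Doolittle hydrophobicity scale as a stand-in for one of the 49 properties; the study's actual scales and normalization are not reproduced here, and the query sequence is arbitrary.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydrophobicity: one illustrative property scale (assumption)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def zscore_normalize(scale: dict[str, float]) -> dict[str, float]:
    """Rescale a property to zero mean and unit population SD over the 20 amino acids."""
    mu = mean(scale.values())
    sd = pstdev(scale.values())
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def average_property(seq: str, scale: dict[str, float]) -> float:
    """Mean property value over the residues of `seq`; non-standard letters are skipped."""
    values = [scale[aa] for aa in seq.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no standard amino acids")
    return sum(values) / len(values)


norm_kd = zscore_normalize(KD)
print(average_property("MKTAYIAKQRQISFVKSHFSRQ", norm_kd))
```

Repeating this for each property gives one averaged vector per protein, which is the kind of input a fold classifier such as NB, SVM or BGLM would consume.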
Affiliation(s)
- Senthilnathan Rajendran
- Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed University, Thanjavur, Tamil Nadu, 613401, India.
- Arunachalam Jothi
- Department of Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed University, Thanjavur, Tamil Nadu, 613401, India.
16
Influence of Amino Acid Properties for Characterizing Amyloid Peptides in Human Proteome. Intelligent Computing Theories and Application 2017. [DOI: 10.1007/978-3-319-63312-1_47] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/08/2023]
17
Anoosha P, Sakthivel R, Michael Gromiha M. Exploring preferred amino acid mutations in cancer genes: Applications to identify potential drug targets. Biochim Biophys Acta Mol Basis Dis 2015; 1862:155-65. [PMID: 26581171 DOI: 10.1016/j.bbadis.2015.11.006] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 09/17/2015] [Revised: 10/24/2015] [Accepted: 11/11/2015] [Indexed: 12/25/2022]
Abstract
Somatic mutations, including missense and silent substitutions, insertions and deletions, have varying effects on the resulting protein and are one of the important causes of cancer development. In this study, we systematically analysed the effect of these mutations at the protein level in 41 different cancer types from the COSMIC database from different perspectives: (i) preference of residues at the mutant positions, (ii) probability of substitutions, (iii) influence of neighbouring residues in driver and passenger mutations, (iv) distribution of driver and passenger mutations around hotspot sites in five typical genes and (v) distribution of silent and missense substitutions. We observed that the R→H substitution is dominant among drivers, followed by R→Q and R→C, whereas E→K has the highest preference among passenger mutations. A set of 17 substitutions, including R→Y, W→A and V→R, is specific to driver mutations, and 31 preferred substitutions are observed only in passenger mutations. The frequencies of driver mutations vary across cancer types and are selective to specific tissues. Further, driver missense mutations are mainly surrounded by silent driver mutations, whereas passenger missense mutations are surrounded by silent passenger mutations. This study reveals the variation of mutations at the protein level in different cancer types and their preferences in cancer genes, and provides new insights for understanding cancer mutations and drug development.
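Substitution preferences such as R→H can be tallied by counting wild-type/mutant residue pairs. A sketch, assuming mutations are written in the common one-letter protein shorthand (for example `R175H`); the mutation list is hypothetical and is not COSMIC data.

```python
import re
from collections import Counter

# One-letter wild type, residue number, one-letter mutant, e.g. "R175H"
MUTATION_RE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def substitution_counts(mutations: list[str]) -> Counter:
    """Count wild-type -> mutant substitutions; silent and unparsable entries are skipped."""
    counts: Counter = Counter()
    for m in mutations:
        match = MUTATION_RE.match(m.strip().upper())
        if match is None:
            continue  # e.g. insertions/deletions written in other notations
        wt, _, mut = match.groups()
        if wt != mut:  # same residue before and after: silent at the protein level
            counts[f"{wt}->{mut}"] += 1
    return counts


def substitution_frequencies(counts: Counter) -> dict[str, float]:
    """Convert substitution counts into probabilities."""
    total = sum(counts.values())
    return {sub: n / total for sub, n in counts.items()} if total else {}


hypothetical = ["R175H", "R248Q", "R273H", "E545K", "R273C", "L858L", "del19"]
counts = substitution_counts(hypothetical)
print(counts.most_common(3))
```

Running the same tally separately on driver and passenger sets would give the two preference profiles that the study compares.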
Affiliation(s)
- P Anoosha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- R Sakthivel
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India.
18
Marrero-Ponce Y, Contreras-Torres E, García-Jacas CR, Barigye SJ, Cubillán N, Alvarado YJ. Novel 3D bio-macromolecular bilinear descriptors for protein science: Predicting protein structural classes. J Theor Biol 2015; 374:125-37. [DOI: 10.1016/j.jtbi.2015.03.026] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 09/17/2014] [Revised: 02/23/2015] [Accepted: 03/20/2015] [Indexed: 12/11/2022]
19
Barigye SJ, Marrero-Ponce Y, Zupan J, Pérez-Giménez F, Freitas MP. Structural and Physicochemical Interpretation of GT-STAF Information Theory-Based Indices. Bull Chem Soc Jpn 2015. [DOI: 10.1246/bcsj.20140037] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Affiliation(s)
- Stephen J. Barigye
- Departamento de Química, Universidade Federal de Lavras, UFLA
- Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy, Universidad Central “Martha Abreu” de Las Villas
- Yovani Marrero-Ponce
- Unit of Computer-Aided Molecular “Biosilico” Discovery and Bioinformatic Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy, Universidad Central “Martha Abreu” de Las Villas
- Institut Universitari de Ciència Molecular, Universitat de València, Edifici d’Instituts de Paterna
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València
- Facultad de Química Farmacéutica, Universidad de Cartagena
- Jure Zupan
- Laboratory of Chemometrics, National Institute of Chemistry
- Facundo Pérez-Giménez
- Unidad de Investigación de Diseño de Fármacos y Conectividad Molecular, Departamento de Química Física, Facultad de Farmacia, Universitat de València
20
Ruiz-Blanco YB, Marrero-Ponce Y, Prieto PJ, Salgado J, García Y, Sotomayor-Torres CM. A Hooke's law-based approach to protein folding rate. J Theor Biol 2015; 364:407-17. [DOI: 10.1016/j.jtbi.2014.09.002] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 06/04/2014] [Revised: 08/28/2014] [Accepted: 09/02/2014] [Indexed: 10/24/2022]
21
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022]
Abstract
DNA mutations cause many human diseases and account for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three perspectives: in silico, in vitro and in vivo. We emphasize that the problem is complicated and that successfully detecting a pathogenic mutation frequently requires a combination of several methods and knowledge of the biological phenomena associated with the corresponding macromolecules.
22
Real value prediction of protein folding rate change upon point mutation. J Comput Aided Mol Des 2012; 26:339-47. [DOI: 10.1007/s10822-012-9560-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/21/2011] [Accepted: 03/02/2012] [Indexed: 10/28/2022]
23
Rao HB, Zhu F, Yang GB, Li ZR, Chen YZ. Update of PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Res 2011; 39:W385-90. [PMID: 21609959 PMCID: PMC3125735 DOI: 10.1093/nar/gkr284] [Citation(s) in RCA: 105] [Impact Index Per Article: 8.1] [Indexed: 12/04/2022]
Abstract
Sequence-derived structural and physicochemical features have been extensively used for analyzing and predicting the structural, functional, expression and interaction profiles of proteins and peptides. PROFEAT was developed as a web server for computing commonly used features of proteins and peptides from amino acid sequence. To facilitate more extensive studies of proteins and peptides, numerous improvements and updates have been made to PROFEAT. We added new functions for computing descriptors of protein–protein and protein–small molecule interactions, segment descriptors for local properties of protein sequences, and topological descriptors for peptide sequences and small molecule structures. We also added new feature groups for proteins and peptides (pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, total amino acid properties and atomic-level topological descriptors) as well as for small molecules (atomic-level topological descriptors). Overall, PROFEAT computes 11 feature groups of descriptors for proteins and peptides, a feature group of more than 400 descriptors for small molecules, and the derived features for protein–protein and protein–small molecule interactions. Our computational algorithms have been extensively tested and used in a number of published works for predicting proteins of specific structural or functional classes, protein–protein interactions, peptides of specific functions and quantitative structure–activity relationships of small molecules. PROFEAT is accessible free of charge at http://bidd.cz3.nus.edu.sg/cgi-bin/prof/protein/profnew.cgi.
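Pseudo-amino acid composition, one of the PROFEAT feature groups listed above, augments the 20 composition values with sequence-order correlation factors. A minimal sketch of Chou's type-1 formulation, assuming a single standardized property (Kyte-Doolittle hydrophobicity), a lag `lam` of 3 and a weight `w` of 0.05; PROFEAT's own property set and default parameters may differ.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydrophobicity: the single property assumed in this sketch
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(scale: dict[str, float]) -> dict[str, float]:
    """Zero mean and unit population SD over the 20 amino acids."""
    mu, sd = mean(scale.values()), pstdev(scale.values())
    return {aa: (v - mu) / sd for aa, v in scale.items()}


def pseaac(seq: str, lam: int = 3, w: float = 0.05) -> list[float]:
    """Type-1 pseudo-amino acid composition: 20 + lam values that sum to 1."""
    seq = "".join(aa for aa in seq.upper() if aa in KD)
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(KD)
    # Occurrence frequencies f_u of the 20 standard amino acids
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # Sequence-order correlation factors theta_k, k = 1..lam:
    # mean squared property difference between residues k apart
    thetas = []
    for k in range(1, lam + 1):
        total = sum((h[seq[i + k]] - h[seq[i]]) ** 2 for i in range(n - k))
        thetas.append(total / (n - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ")
print(len(vec), round(sum(vec), 6))
```

The weight `w` controls how much the sequence-order terms count relative to plain composition; with `w = 0` the vector reduces to ordinary amino acid composition plus zeros.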
Affiliation(s)
- H B Rao
- College of Chemistry, Sichuan University, Chengdu, 610064, PR China
24
Harihar B, Selvaraj S. Application of long-range order to predict unfolding rates of two-state proteins. Proteins 2010; 79:880-7. [DOI: 10.1002/prot.22925] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/23/2010] [Revised: 10/07/2010] [Accepted: 10/24/2010] [Indexed: 01/09/2023]
25
Chang L, Wang J, Wang W. Composition-based effective chain length for prediction of protein folding rates. Phys Rev E Stat Nonlin Soft Matter Phys 2010; 82:051930. [PMID: 21230523 DOI: 10.1103/physreve.82.051930] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 09/13/2010] [Indexed: 05/30/2023]
Abstract
Folding rate prediction is a useful way to find the key factors affecting the folding kinetics of proteins. Current prediction methods require structural information to some degree, which limits their application to various proteins. In this work, an "effective length" is defined solely from the composition of a protein, namely, the number of residues of specific amino acid types in the protein. A physical theory based on a minimalist model is employed to describe the relation between the folding rates and the effective lengths of proteins. Based on the resulting relationship, the optimal sets of amino acids are found by enumerating all possible combinations of amino acids. The optimal set achieves a high correlation (coefficient 0.84) between the folding rates and the optimal effective length. The features of these amino acids are consistent with our model and with landscape theory. We further compare our effective length with other factors. The effective length is physically consistent with structure-based prediction methods and has the best predictability for folding rates. These results suggest that both entropy and energetics contribute importantly to folding kinetics. The ability to predict folding rates accurately and efficiently from composition enables kinetic analysis of many kinds of proteins. The underlying physics of our method may help stimulate further understanding of the effects of various amino acids on folding dynamics.
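The composition-based effective length is the number of residues in a protein that belong to a chosen subset of amino acid types, and the optimal subset is found by enumeration. A sketch under two simplifying assumptions: a plain Pearson correlation between effective length and ln(k_f) stands in for the paper's physical model, and the search is limited to subsets of a caller-supplied pool. The optimal set reported in the paper is not reproduced, and the inputs are hypothetical.

```python
from itertools import combinations
from math import sqrt


def effective_length(seq: str, aa_subset: set[str]) -> int:
    """Number of residues in `seq` whose type is in `aa_subset`."""
    return sum(1 for aa in seq.upper() if aa in aa_subset)


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_subset(seqs: list[str], ln_kf: list[float], pool: str, size: int):
    """Try every `size`-letter subset of `pool`; return (subset, r) with the
    largest |r| between effective length and ln_kf."""
    best, best_r = None, 0.0
    for combo in combinations(pool, size):
        lengths = [effective_length(s, set(combo)) for s in seqs]
        r = pearson(lengths, ln_kf)
        if best is None or abs(r) > abs(best_r):
            best, best_r = combo, r
    return best, best_r
```

Enumerating all 2^20 subsets of the full alphabet is feasible but slow in pure Python; restricting the pool or the subset size keeps the sketch quick.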
Affiliation(s)
- Le Chang
- National Laboratory of Solid State Microstructure and Department of Physics, Nanjing University, Nanjing 210093, China
26
Huang LT, Gromiha MM. First insight into the prediction of protein folding rate change upon point mutation. Bioinformatics 2010; 26:2121-7. [DOI: 10.1093/bioinformatics/btq350] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
27
Xi L, Li S, Liu H, Li J, Lei B, Yao X. Global and local prediction of protein folding rates based on sequence autocorrelation information. J Theor Biol 2010; 264:1159-68. [DOI: 10.1016/j.jtbi.2010.03.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/04/2009] [Revised: 03/28/2010] [Accepted: 03/29/2010] [Indexed: 11/24/2022]
28
Harihar B, Selvaraj S. Refinement of the long-range order parameter in predicting folding rates of two-state proteins. Biopolymers 2009; 91:928-35. [DOI: 10.1002/bip.21281] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
29
Gromiha MM. Multiple Contact Network Is a Key Determinant to Protein Folding Rates. J Chem Inf Model 2009; 49:1130-5. [DOI: 10.1021/ci800440x] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/29/2022]
Affiliation(s)
- M. Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan
30
Huang LT, Gromiha MM. Analysis and prediction of protein folding rates using quadratic response surface models. J Comput Chem 2008; 29:1675-83. [PMID: 18351617 DOI: 10.1002/jcc.20925] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
Understanding the relationship between amino acid sequences and protein folding rates is an important task in computational and molecular biology. In this work, we systematically analyzed the amino acid composition of proteins with different ranges of folding rates. We observed that the polar residues Asn, Gln, Ser, and Lys dominate in fast-folding proteins, whereas the hydrophobic residues Ala, Cys, Gly, and Leu are preferred in slow-folding proteins. Further, we developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted folding rates using leave-one-out cross-validation. Classifying proteins by structural class improved the correlation to 0.98; it is 0.99, 0.98, and 0.96 for all-alpha, all-beta, and mixed-class proteins, respectively. In addition, we used Bayesian classification to discriminate two- and three-state proteins, with an accuracy of 90%. A web server for predicting protein folding rates is available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
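A quadratic response surface model regresses the target on the input variables, their squares and their pairwise products. The sketch below fits such a model by least squares and scores it with leave-one-out cross-validation, as described in the abstract; the choice of input features (for example, composition values of selected residues) is an assumption left to the caller, and the demo data are synthetic and noise-free, not folding-rate measurements.

```python
import numpy as np


def quadratic_design(X: np.ndarray) -> np.ndarray:
    """Design matrix with intercept, x_i, x_i**2 and x_i*x_j (i < j) columns."""
    n, p = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(p)]
    cols += [X[:, i] ** 2 for i in range(p)]
    cols += [X[:, i] * X[:, j] for i in range(p) for j in range(i + 1, p)]
    return np.column_stack(cols)


def loo_predictions(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Leave-one-out predictions of a least-squares quadratic response surface."""
    D = quadratic_design(X)
    preds = np.empty(len(y))
    for k in range(len(y)):
        mask = np.arange(len(y)) != k  # train on every sample except k
        coef, *_ = np.linalg.lstsq(D[mask], y[mask], rcond=None)
        preds[k] = D[k] @ coef
    return preds


# Synthetic check: an exactly quadratic target should be recovered
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.2, size=(30, 2))
y = 1.0 - 4.0 * X[:, 0] + 10.0 * X[:, 1] ** 2 + 5.0 * X[:, 0] * X[:, 1]
r = np.corrcoef(y, loo_predictions(X, y))[0, 1]
print(f"LOO correlation: {r:.3f}")
```

With p inputs the model has 1 + 2p + p(p-1)/2 coefficients, so the number of proteins must comfortably exceed that count for the fit to be meaningful.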
Affiliation(s)
- Liang-Tsung Huang
- Department of Computer Science and Information Engineering, Ming-Dao University, Changhua 523, Taiwan
31
Istomin AY, Jacobs DJ, Livesay DR. On the role of structural class of a protein with two-state folding kinetics in determining correlations between its size, topology, and folding rate. Protein Sci 2008; 16:2564-9. [PMID: 17962408 DOI: 10.1110/ps.073124507] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 10/22/2022]
Abstract
The time it takes for proteins to fold into their native states varies over several orders of magnitude depending on their native-state topology, size, and amino acid composition. In a number of previous studies, it was found that there is strong correlation between logarithmic folding rates and contact order for proteins that fold with two-state kinetics, while such correlation is absent for three-state proteins. Conversely, strong correlations between folding rates and chain length occur within three-state proteins, but not in two-state proteins. Here, we demonstrate that chain lengths and folding rates of two-state proteins are not correlated with each other only when all-alpha, all-beta, and mixed-class proteins are considered together, which is typically the case. However, when considering all-alpha and all-beta two-state proteins separately, there is significant linear correlation between folding rate and size. Moreover, the sets of data points for the all-alpha and all-beta classes define asymptotes of lower and upper limits on folding rates of mixed-class proteins. By analyzing correlation of other topological parameters with folding rates of two-state proteins, we find that only the long-range order exhibits correlation with folding rates that is uniform over all three classes. It is also the only descriptor to provide statistically significant correlations for each of the three structural classes. We give an interpretation of this observation in terms of Makarov and Plaxco's diffusion-based topomer-search model.
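Long-range order and contact order, the topological parameters discussed above, are computed from residue contacts in the native structure. A sketch, assuming Cα coordinates and commonly used definitions: a contact is a Cα-Cα distance of at most 8 Å; long-range order counts contacts separated by more than 12 residues and divides by chain length; relative contact order averages the sequence separation of contacts and divides by chain length. Published variants differ in cutoffs and atom choice (contact order is often computed from heavy-atom contacts within 6 Å), so these values are exposed as parameters, and the default `min_sep=2` for contact order is an assumption of this sketch.

```python
import math

Coord = tuple[float, float, float]


def contacts(ca: list[Coord], cutoff: float = 8.0, min_sep: int = 1) -> list[tuple[int, int]]:
    """Residue pairs (i, j) with j - i >= min_sep and Cα-Cα distance <= cutoff (Å)."""
    pairs = []
    for i in range(len(ca)):
        for j in range(i + min_sep, len(ca)):
            if math.dist(ca[i], ca[j]) <= cutoff:
                pairs.append((i, j))
    return pairs


def long_range_order(ca: list[Coord], cutoff: float = 8.0, min_sep: int = 12) -> float:
    """LRO: number of contacts with |i - j| > min_sep, divided by chain length N."""
    n_long = sum(1 for i, j in contacts(ca, cutoff) if j - i > min_sep)
    return n_long / len(ca)


def relative_contact_order(ca: list[Coord], cutoff: float = 8.0, min_sep: int = 2) -> float:
    """RCO: sum of |i - j| over contacts, divided by (chain length * number of contacts)."""
    pairs = contacts(ca, cutoff, min_sep)
    if not pairs:
        return 0.0
    return sum(j - i for i, j in pairs) / (len(ca) * len(pairs))


# Toy chains: a straight 20-residue line and the same line folded back on itself
line = [(3.8 * i, 0.0, 0.0) for i in range(20)]
hairpin = line[:19] + [(0.0, 5.0, 0.0)]
print(long_range_order(line), long_range_order(hairpin))
```

In the toy hairpin the last residue sits next to the first two, creating two contacts more than 12 residues apart, so its long-range order rises from zero while the extended chain has none.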
Affiliation(s)
- Andrei Y Istomin
- Department of Physics and Optical Science, University of North Carolina at Charlotte 28223, USA.
32
Taguchi YH, Gromiha MM. Application of amino acid occurrence for discriminating different folding types of globular proteins. BMC Bioinformatics 2007; 8:404. [PMID: 17953741 PMCID: PMC2174517 DOI: 10.1186/1471-2105-8-404] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Received: 06/01/2007] [Accepted: 10/22/2007] [Indexed: 11/10/2022]
Abstract
Background: Predicting the three-dimensional structure of a protein from its amino acid sequence is a long-standing goal in computational and molecular biology. Discriminating between structural classes and folding types is an intermediate step in protein structure prediction.
Results: In this work, we propose a method based on linear discriminant analysis (LDA) for discriminating 30 different folding types of globular proteins using amino acid occurrence. The method was tested on a non-redundant set of 1612 proteins and discriminated them with an accuracy of 38%, which is comparable to or better than other methods in the literature. A web server for discriminating the folding type of a query protein from its amino acid sequence is available at http://granular.com/PROLDA/.
Conclusion: Amino acid occurrence has been successfully used to discriminate different folding types of globular proteins. The discrimination accuracy obtained with amino acid occurrence is better than that obtained with amino acid composition and/or amino acid properties. In addition, the method is very fast.
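Amino acid occurrence can be read as the raw count of each of the 20 residue types, whereas amino acid composition divides those counts by sequence length; this reading is an assumption based on the contrast drawn in the abstract. The sketch computes both feature vectors and feeds occurrences to a small multiclass LDA with a pooled covariance. The ridge term, toy sequences and class labels are hypothetical, and this is a generic LDA, not the PROLDA server's implementation.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence(seq: str) -> np.ndarray:
    """Counts of the 20 standard amino acids (other letters ignored)."""
    seq = seq.upper()
    return np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)


def composition(seq: str) -> np.ndarray:
    """Occurrence divided by the number of standard residues."""
    occ = occurrence(seq)
    return occ / occ.sum()


class SimpleLDA:
    """Multiclass LDA with pooled covariance; a small ridge keeps it invertible
    when count features are collinear or few sequences are available."""

    def __init__(self, ridge: float = 1e-3):
        self.ridge = ridge

    def fit(self, X: np.ndarray, y: np.ndarray) -> "SimpleLDA":
        self.classes_ = np.unique(y)
        n, p = X.shape
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        centered = X - self.means_[np.searchsorted(self.classes_, y)]
        cov = centered.T @ centered / (n - len(self.classes_))
        self.prec_ = np.linalg.inv(cov + self.ridge * np.eye(p))
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        # delta_k(x) = x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(prior_k)
        lin = X @ self.prec_ @ self.means_.T
        const = -0.5 * np.einsum("kp,pq,kq->k", self.means_, self.prec_, self.means_)
        return self.classes_[np.argmax(lin + const + self.log_priors_, axis=1)]


# Hypothetical toy training set with two made-up classes
train = ["AAAAL", "AAALL", "AALLL", "VVVVI", "VVVII", "VVIII"]
labels = np.array(["alpha"] * 3 + ["beta"] * 3)
model = SimpleLDA().fit(np.array([occurrence(s) for s in train]), labels)
pred = model.predict(np.array([occurrence("AAAAA"), occurrence("VVVVV")]))
print(pred)
```

Swapping `occurrence` for `composition` in the training and query features is the comparison the abstract reports.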
Affiliation(s)
- Y-h Taguchi
- Department of Physics, Faculty of Science and Technology, Chuo University, 1-13-27 Kasuga, Bunkyo-ku, Tokyo 112-8551, Japan.
33
Huang LT, Saraboji K, Ho SY, Hwang SF, Ponnuswamy MN, Gromiha MM. Prediction of protein mutant stability using classification and regression tool. Biophys Chem 2007; 125:462-70. [PMID: 17113702 DOI: 10.1016/j.bpc.2006.10.009] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 09/01/2006] [Revised: 10/19/2006] [Accepted: 10/23/2006] [Indexed: 11/18/2022]
Abstract
Prediction of protein stability upon amino acid substitution is an important problem in molecular biology; solving it would help in designing stable mutants. In this work, we analyzed the stability of protein mutants using two datasets of 1396 and 2204 mutants obtained from the ProTherm database, for the free energy change due to thermal denaturation (ΔΔG) and denaturant denaturation (ΔΔG(H₂O)), respectively. We used a set of 48 physical, chemical, energetic and conformational properties of amino acid residues and computed the difference in amino acid properties for each mutant in both datasets. These property differences were related to protein stability (ΔΔG and ΔΔG(H₂O)) and used to train a classification and regression tool for predicting the stability of protein mutants. Further, we tested the method with 4-fold, 5-fold and 10-fold cross-validation. We found that the physical properties, shape and flexibility are important determinants of protein stability. Classifying mutants by secondary structure (helix, strand, turn and coil) and solvent accessibility (buried, partially buried, partially exposed and exposed) distinguished stabilizing from destabilizing mutants with average accuracies of 81% and 80% for ΔΔG and ΔΔG(H₂O), respectively. The correlation between experimental and predicted stability changes is 0.61 for ΔΔG and 0.44 for ΔΔG(H₂O). Further, the free energy change due to amino acid replacement was predicted with an average error of 1.08 kcal/mol and 1.37 kcal/mol for thermal and chemical denaturation, respectively. The relative importance of secondary structure and solvent accessibility, and the influence of the dataset on the prediction of protein mutant stability, are discussed.
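The mutant features described above are differences in amino acid property values between the mutant and the wild-type residue. A sketch using the Kyte-Doolittle hydrophobicity scale as a single stand-in property (the study used 48 properties, not reproduced here); the resulting values would be the inputs to a classification or regression tree, which is not shown.

```python
# Kyte-Doolittle hydrophobicity: illustrative stand-in for one of the 48 properties
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def property_difference(wild_type: str, mutant: str,
                        scale: dict[str, float] = KD) -> float:
    """Delta P = P(mutant) - P(wild type) for one-letter residue codes."""
    return scale[mutant.upper()] - scale[wild_type.upper()]


def mutation_features(mutations: list[tuple[str, str]],
                      scale: dict[str, float] = KD) -> list[float]:
    """Delta P for each (wild type, mutant) pair."""
    return [property_difference(wt, mut, scale) for wt, mut in mutations]


# Leu -> Ala in a buried position removes hydrophobic side-chain bulk
print(property_difference("L", "A"))
print(mutation_features([("L", "A"), ("G", "V"), ("K", "K")]))
```

With several scales, each mutant becomes a vector of such differences, and the tree then learns thresholds on those differences, optionally split by secondary structure or solvent accessibility as in the abstract.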
Affiliation(s)
- Liang-Tsung Huang
- Institute of Information Engineering and Computer Science, Feng-Chia University, Taichung, 407, Taiwan
34
Zhou P, Zeng H, Tian FF, Li B, Li ZL. Applying Novel Molecular Electronegativity-Interaction Vector (MEIV) to QSPR Study on Collision Cross Section of Singly Protonated Peptides. QSAR Comb Sci 2007. [DOI: 10.1002/qsar.200510220] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/09/2022]
35
Gromiha MM, Thangakani AM, Selvaraj S. FOLD-RATE: prediction of protein folding rates from amino acid sequence. Nucleic Acids Res 2006; 34:W70-4. [PMID: 16845101 PMCID: PMC1538837 DOI: 10.1093/nar/gkl043] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Indexed: 11/13/2022]
Abstract
We have developed a web server, FOLD-RATE, for predicting the folding rates of proteins from their amino acid sequences. The relationship between amino acid properties and protein folding rates has been systematically analyzed, and a statistical method based on linear regression has been proposed for predicting protein folding rates. We found that classifying proteins into structural classes yields an excellent correlation between amino acid properties and the folding rates of two- and three-state proteins. Consequently, separate regression equations have been developed for proteins belonging to the all-alpha, all-beta and mixed classes. We observed excellent agreement between predicted and experimentally observed folding rates; the correlation coefficients are 0.99, 0.97 and 0.90 for all-alpha, all-beta and mixed-class proteins, respectively. The prediction server is freely available at http://psfs.cbrc.jp/fold-rate/.
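The class-specific approach above amounts to fitting a separate linear regression of ln(k_f) on sequence-averaged amino acid properties for each structural class. A sketch of that idea with ordinary least squares on one averaged property per protein; the records are hypothetical inputs, and FOLD-RATE's actual properties and coefficients are not reproduced.

```python
from collections import defaultdict


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def fit_per_class(records):
    """records: iterable of (structural_class, avg_property, ln_kf).
    Returns {structural_class: (intercept, slope)}."""
    grouped = defaultdict(lambda: ([], []))
    for cls, prop, ln_kf in records:
        grouped[cls][0].append(prop)
        grouped[cls][1].append(ln_kf)
    return {cls: fit_line(xs, ys) for cls, (xs, ys) in grouped.items()}


def predict_ln_kf(models, cls: str, avg_property: float) -> float:
    """Apply the regression equation of the protein's structural class."""
    a, b = models[cls]
    return a + b * avg_property


# Hypothetical records that lie exactly on two different lines
records = [("all-alpha", x, 2.0 + 3.0 * x) for x in (0.0, 1.0, 2.0)]
records += [("all-beta", x, -1.0 - x) for x in (0.0, 1.0, 2.0)]
models = fit_per_class(records)
print(models)
```

Fitting one equation per class lets the slope differ between all-alpha, all-beta and mixed proteins, which is what the abstract credits for the improved correlations.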
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Koto-ku, Tokyo 135-0064, Japan.
36
Gromiha MM, Selvaraj S, Thangakani AM. A Statistical Method for Predicting Protein Unfolding Rates from Amino Acid Sequence. J Chem Inf Model 2006; 46:1503-8. [PMID: 16711769 DOI: 10.1021/ci050417u] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Abstract
Predicting protein unfolding rates from amino acid sequences is one of the most important challenges in computational biology and chemistry. Analyzing the relationship between protein unfolding rates and the physical-chemical, energetic, and conformational properties of amino acid residues provides valuable information for understanding and predicting the unfolding rates of two- and three-state proteins. We found that classifying proteins into structural classes yields an excellent correlation between amino acid properties and the unfolding rates of two- and three-state proteins, indicating the importance of native-state topology in determining protein unfolding rates. We formulated three independent linear regression equations for different structural classes of proteins to predict their unfolding rates from amino acid sequences and obtained excellent agreement between predicted and experimentally observed unfolding rates; the correlation coefficients are 0.999, 0.990, and 0.992 for all-alpha, all-beta, and mixed-class proteins, respectively. Further, we derived a general equation applicable to all structural classes, which can be used to predict the unfolding rates of proteins of unknown structural class. We observed correlations of 0.987 and 0.930 in back-check and jack-knife tests, respectively. These accuracy levels are better than those of other methods in the literature.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), AIST Tokyo Waterfront Bio-IT Research Building, 2-42 Aomi, Tokyo 135-0064, Japan.
37
Schmuck C, Heil M, Scheiber J, Baumann K. Charge Interactions Do the Job: A Combined Statistical and Combinatorial Approach to Finding Artificial Receptors for Binding Tetrapeptides in Water (German-language edition). Angew Chem 2005. [DOI: 10.1002/ange.200501812] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
38
Schmuck C, Heil M, Scheiber J, Baumann K. Charge Interactions Do the Job: A Combined Statistical and Combinatorial Approach to Finding Artificial Receptors for Binding Tetrapeptides in Water. Angew Chem Int Ed Engl 2005; 44:7208-12. [PMID: 16231382 DOI: 10.1002/anie.200501812] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Indexed: 11/10/2022]
Affiliation(s)
- Carsten Schmuck
- Universität Würzburg, Institut für Organische Chemie, Am Hubland, 97074 Würzburg, Germany.
39
Gromiha MM. A Statistical Model for Predicting Protein Folding Rates from Amino Acid Sequence with Structural Class Information. J Chem Inf Model 2005; 45:494-501. [PMID: 15807515 DOI: 10.1021/ci049757q]
Abstract
Prediction of protein folding rates from amino acid sequences is one of the most important challenges in molecular biology. In this work, I have related protein folding rates to physical-chemical, energetic and conformational properties of amino acid residues. I found that classifying proteins into different structural classes yields an excellent correlation between amino acid properties and the folding rates of two- and three-state proteins, indicating the importance of native-state topology in determining protein folding rates. I have formulated a simple linear regression model for predicting protein folding rates from amino acid sequences along with structural class information and obtained excellent agreement between predicted and experimentally observed folding rates; the correlation coefficients are 0.99, 0.96 and 0.95 for all-alpha, all-beta and mixed-class proteins, respectively. This is the first available method capable of predicting protein folding rates directly from the amino acid sequence, with the aid of generic amino acid properties and structural class information.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Aomi Frontier Building 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan.
40
Jacobs DJ, Dallakyan S. Elucidating protein thermodynamics from the three-dimensional structure of the native state using network rigidity. Biophys J 2004; 88:903-15. [PMID: 15542549 PMCID: PMC1305163 DOI: 10.1529/biophysj.104.048496]
Abstract
Given the three-dimensional structure of a protein, its thermodynamic properties are calculated using a recently introduced distance constraint model (DCM) within a mean-field treatment. The DCM is constructed from a free-energy decomposition that partitions microscopic interactions into a variety of constraint types, i.e., covalent bonds, salt bridges, hydrogen bonds and torsional forces, each associated with an enthalpy and an entropy contribution. A Gibbs ensemble of accessible microstates is defined by a set of topologically distinct mechanical frameworks generated by perturbing away from the native constraint topology. The total enthalpy of a given framework is calculated as a linear sum of enthalpy components over all constraints present. Total entropy is generally a nonadditive property of free-energy decompositions. Here, we calculate total entropy as a linear sum of entropy components over a set of independent constraints determined by a graph algorithm that builds up a mechanical framework one constraint at a time, placing constraints with lower entropy before those with greater entropy. This procedure provides a natural mechanism for enthalpy-entropy compensation. A minimal DCM with five phenomenological parameters is found to capture the essential physics relating thermodynamic response to network rigidity. Moreover, two parameters are fixed by simultaneously fitting to heat capacity curves for histidine-binding protein and ubiquitin at five different pH conditions. The resulting three-free-parameter DCM provides a quantitative characterization of conformational flexibility consistent with thermodynamic stability. It is found that native hydrogen-bond topology provides a key signature in governing molecular cooperativity and the folding-unfolding transition.
Affiliation(s)
- Donald J Jacobs
- Physics and Astronomy Department, California State University, Northridge, California, USA.
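The additive bookkeeping in this abstract can be illustrated with a deliberately simplified toy. Enthalpy is summed over every constraint, while entropy is summed only over independent constraints found by adding constraints one at a time in order of increasing entropy. As an assumption made purely for illustration, independence is tested with a union-find cycle check, which is exact for rigidity in one dimension only; the DCM itself works with three-dimensional constraint networks, and the Constraint fields and values here are hypothetical.

```python
from typing import NamedTuple

class Constraint(NamedTuple):
    """Hypothetical constraint between two nodes with its free-energy components."""
    u: int
    v: int
    enthalpy: float
    entropy: float

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def framework_free_energy(constraints, n_nodes, temperature):
    """Return (H, S, G = H - T*S) for one framework in the 1D toy model."""
    # Enthalpy: linear sum over every constraint present.
    h_total = sum(c.enthalpy for c in constraints)
    # Entropy: build the framework one constraint at a time, lowest entropy first,
    # and count entropy only for constraints that are independent (non-redundant).
    parent = list(range(n_nodes))
    s_total = 0.0
    for con in sorted(constraints, key=lambda con: con.entropy):
        ru, rv = find(parent, con.u), find(parent, con.v)
        if ru != rv:  # 1D toy: independent if it does not close a cycle
            parent[ru] = rv
            s_total += con.entropy
    return h_total, s_total, h_total - temperature * s_total

# Toy triangle: the highest-entropy edge closes a cycle and is redundant.
triangle = [Constraint(0, 1, -1.0, 1.0), Constraint(1, 2, -1.0, 2.0), Constraint(0, 2, -1.0, 3.0)]
print(framework_free_energy(triangle, n_nodes=3, temperature=1.0))  # → (-3.0, 3.0, -6.0)
```

In the triangle, the redundant edge still contributes its enthalpy but not its entropy, which is the asymmetry that ordering constraints by entropy introduces.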
41
Gromiha MM, Selvaraj S. Inter-residue interactions in protein folding and stability. Prog Biophys Mol Biol 2004; 86:235-77. [PMID: 15288760 DOI: 10.1016/j.pbiomolbio.2003.09.003]
Abstract
During the process of protein folding, the amino acid residues along the polypeptide chain interact with each other in a cooperative manner to form the stable native structure. Knowledge of inter-residue interactions in protein structures is very helpful for understanding the mechanisms of protein folding and stability. In this review, we introduce the classification of inter-residue interactions into short, medium and long range based on a simple geometric approach. The features of these interactions in different structural classes of globular and membrane proteins, and in various folds, have been delineated. The development of contact potentials and the application of inter-residue contacts for predicting the structural class and secondary structures of globular proteins, solvent accessibility, fold recognition and ab initio tertiary structure prediction have been evaluated. Further, the relationship between inter-residue contacts and protein folding rates has been highlighted. Moreover, the importance of inter-residue interactions in protein folding kinetics and for understanding protein stability has been discussed. In essence, the information gained from studies of inter-residue interactions provides valuable insights for understanding protein folding and de novo protein design.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Aomi Frontier Building 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan.
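A minimal sketch of the geometric classification into short-, medium- and long-range contacts. It assumes Cα coordinates, an 8 Å distance cutoff, and sequence-separation bins of |i - j| ≤ 2 (short), 3 to 4 (medium) and > 4 (long); these cutoffs should be treated as assumptions here rather than as the review's exact definitions, and the function name is hypothetical.

```python
import math

def classify_contacts(ca_coords, cutoff=8.0):
    """Count residue pairs within `cutoff` angstroms, binned by sequence separation.

    ca_coords: list of (x, y, z) C-alpha positions in chain order.
    Bins (assumed): short |i-j| <= 2, medium 3-4, long > 4.
    """
    counts = {"short": 0, "medium": 0, "long": 0}
    n = len(ca_coords)
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) > cutoff:
                continue  # not in contact
            sep = j - i
            if sep <= 2:
                counts["short"] += 1
            elif sep <= 4:
                counts["medium"] += 1
            else:
                counts["long"] += 1
    return counts

# Toy chain: six residues on a straight line, 3.8 angstroms apart (illustrative only).
line = [(3.8 * k, 0.0, 0.0) for k in range(6)]
print(classify_contacts(line))  # → {'short': 9, 'medium': 0, 'long': 0}
```

An extended chain produces only short-range contacts; long-range counts appear only when the chain folds back on itself, which is why they carry most of the topological information.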
42
Abstract
Small monomeric proteins often fold in apparent two-state processes with folding speeds dictated by their native-state topology. Here we test, for the first time, the influence of monomer topology on the folding speed of an oligomeric protein: the heptameric cochaperonin protein 10 (cpn10), which in the native state has seven beta-barrel subunits noncovalently assembled through beta-strand pairing. Cpn10 is a particularly useful model because equilibrium-unfolding experiments have revealed that the denatured state in urea is that of a nonnative heptamer. Surprisingly, refolding of the nonnative cpn10 heptamer is a simple two-state kinetic process with a folding-rate constant in water (2.1 s⁻¹; pH 7.0, 20 °C) that is in excellent agreement with the prediction based on the native-state topology of the cpn10 monomer. Thus, the monomers appear to fold as independent units, with a speed that correlates with topology, although the C and N termini are trapped in beta-strand pairing with neighboring subunits. In contrast, refolding of unfolded cpn10 monomers is dominated by a slow association step.
Affiliation(s)
- Neil Bascos
- Molecular and Cellular Biology Graduate Program, Tulane University, New Orleans, Louisiana 70112, USA
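This abstract does not spell out the topology-based prediction it compares against. One widely used topology measure for two-state folders is the relative contact order of Plaxco, Simons and Baker, RCO = (1/(L·N)) Σ ΔS_ij, where L is the chain length, N the number of native contacts and ΔS_ij the sequence separation of each contacting pair. The sketch below assumes contacts are already given as residue-index pairs (a residue-level simplification; the original definition counts heavy-atom contacts), and the function name is hypothetical; it is not taken from the cpn10 study.

```python
def relative_contact_order(contacts, length):
    """Relative contact order: mean sequence separation of contacts divided by chain length.

    contacts: iterable of (i, j) residue indices in contact in the native state.
    length:   number of residues L in the chain.
    """
    seps = [abs(j - i) for i, j in contacts]
    if not seps:
        raise ValueError("need at least one contact")
    return sum(seps) / (length * len(seps))

# Hypothetical 10-residue protein with three native contacts.
print(relative_contact_order([(1, 5), (2, 8), (3, 4)], length=10))
```

Lower RCO (mostly local contacts) is associated with faster folding; Plaxco and co-workers reported an approximately linear relation between ln(k_f) and RCO for small two-state proteins.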
43
Gromiha MM, Saraboji K, Ahmad S, Ponnuswamy MN, Suwa M. Role of non-covalent interactions for determining the folding rate of two-state proteins. Biophys Chem 2004; 107:263-72. [PMID: 14967241 DOI: 10.1016/j.bpc.2003.09.008] [Received: 07/24/2003] [Revised: 09/08/2003] [Accepted: 09/17/2003]
Abstract
Understanding the factors influencing the folding rate of proteins is a challenging problem. In this work, we have analyzed the role of non-covalent interactions in the folding rate of two-state proteins using a free-energy approach. We computed the hydrophobic, electrostatic, hydrogen-bonding and van der Waals free-energy terms. The hydrophobic free energy was divided into contributions from different atom types: carbon, neutral nitrogen and oxygen, charged nitrogen and oxygen, and sulfur. All the free-energy terms were related to the folding rates of 28 two-state proteins using single and multiple correlation coefficients. We found that the hydrophobic free energy due to carbon atoms and the hydrogen-bonding free energy play important roles in determining the folding rate in combination with other free energies. Energies normalized by the total number of residues gave better results than the total energy of the protein. A comparison of amino acid properties with free-energy terms indicates that the energetic terms explain the folding rate better than amino acid properties do. Further, combining the free energies with topological parameters yielded a correlation of 0.91. The present study demonstrates the importance of topology in determining the folding rate of two-state proteins.
Affiliation(s)
- M Michael Gromiha
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Aomi Frontier Building 17F, 2-43 Aomi, Koto-ku, Tokyo 135-0064, Japan.