1
Pal P, Chakraborty S, Jana B. Number of Hydrogen Bonds per Unit Solvent Accessible Surface Area: A Descriptor of Functional States of Proteins. J Phys Chem B 2022; 126:10822-10833. [PMID: 36524238 DOI: 10.1021/acs.jpcb.2c05367] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/23/2022]
Abstract
Proteins function close to native and near-native conformations. These states are evolutionarily selected to ensure the effect of mutations is minimized. The structural organization of a protein is hierarchical and modular, which reduces the dimensionality of the configurational space of the native states. Thus, finding appropriate descriptors that define the native state among all possible states of a protein is a problem of immense interest. The present study explores the correlation between solvent accessible surface areas (SASAs) and different intraprotein as well as protein-water hydrogen bonds of 55 single-chain globular proteins from four different structural classes (all α, all β, α+β, and α/β), 16 multichain proteins, and 4 macromolecular complexes. A systematic analysis of the SASA and of the intraprotein and protein-water hydrogen bonds suggests a linear relationship between SASAs and hydrogen bonds. The number of protein-water hydrogen bonds per unit SASA ranges from 3 to 4 for all the different structural protein classes. In contrast, the number of intramolecular hydrogen bonds per unit SASA, including mainchain-mainchain, mainchain-sidechain, and sidechain-sidechain bonds, varies from 0.75 to 2. The solvation free energy of a protein decreases linearly with SASA. Our study also shows that the solvation free energy per unit SASA varies from -75 to -105 kJ mol⁻¹ nm⁻² across all the native states studied here. The conservation of the number of intraprotein hydrogen bonds per unit SASA possibly imparts structural stability to the native structure. On the other hand, 3-4 protein-water hydrogen bonds per unit SASA are possibly required to maintain a balance between the solubility and functionality of the native states. This study provides a basis for synthetic biologists to design new folds with improved functionality.
Affiliation(s)
- Prasun Pal
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
- Sandipan Chakraborty
- Center for Innovation in Molecular and Pharmaceutical Sciences (CIMPS), Dr. Reddy's Institute of Life Sciences, University of Hyderabad Campus, Gachibowli, Hyderabad 500046, India
- Biman Jana
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
2
Hierarchical classification of protein folds using a novel ensemble classifier. PLoS One 2013; 8:e56499. [PMID: 23437146 PMCID: PMC3577917 DOI: 10.1371/journal.pone.0056499] [Citation(s) in RCA: 111] [Impact Index Per Article: 9.3] [Received: 09/15/2012] [Accepted: 01/10/2013] [Indexed: 12/03/2022]
Abstract
The analysis of biological information from protein sequences is important for the study of cellular functions and interactions, and protein fold recognition plays a key role in the prediction of protein structures. Unfortunately, the prediction of protein fold patterns is challenging due to the existence of compound protein structures. Here, we processed the latest release of the Structural Classification of Proteins (SCOP, version 1.75) database and applied novel techniques to increase the accuracy of protein fold classification. The proposed techniques combine ensemble classification with a two-layer hierarchical framework. In the first layer, similar or redundant sequences are removed in two ways, and a set of base classifiers, fused by various selection strategies, divides the input into seven classes. In the second layer, an analogous ensemble method is adopted to predict all protein folds. To our knowledge, this is the first time that all protein folds have been detected hierarchically. Compared with prior studies, our experimental results demonstrated the efficiency and effectiveness of the proposed method, which achieved a success rate of 74.21%, much higher than the results obtained with previous methods (ranging from 45.6% to 70.5%). When applied to the second layer of classification, the prediction accuracy ranged from 23.13% to 46.05%. Although not remarkably high, this value is encouraging given the relatively low counts of proteins from most fold recognition programs. The web server Hierarchical Protein Fold Prediction (HPFP) is available at http://datamining.xmu.edu.cn/software/hpfp.
3
Chen W, Liu X, Huang Y, Jiang Y, Zou Q, Lin C. Improved method for predicting protein fold patterns with ensemble classifiers. Genet Mol Res 2012; 11:174-81. [PMID: 22370884 DOI: 10.4238/2012.january.27.4] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Indexed: 11/03/2022]
Abstract
Protein folding is recognized as a critical problem in the field of biophysics in the 21st century. Predicting protein-folding patterns is challenging due to the complex structure of proteins. In an attempt to solve this problem, we employed ensemble classifiers to improve prediction accuracy. In our experiments, 188-dimensional features were extracted based on the composition and physicochemical properties of proteins, and 20-dimensional features were selected using a coupled position-specific scoring matrix. Compared with traditional prediction methods, these methods were superior in terms of prediction accuracy. The 188-dimensional feature-based method achieved 71.2% accuracy in five cross-validations. The accuracy rose to 77% when we used a 20-dimensional feature vector. When applied to recent data, these methods achieved 54.2% accuracy. Source code and datasets, together with a web server and software tools for prediction, are available at: http://datamining.xmu.edu.cn/main/~cwc/ProteinPredict.html.
Affiliation(s)
- W Chen
- School of Information Science and Technology, Xiamen University, Xiamen, Fujian, China
4
Muda HM, Saad P, Othman RM. Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. Comput Biol Med 2011; 41:687-99. [PMID: 21704312 DOI: 10.1016/j.compbiomed.2011.06.004] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/10/2009] [Revised: 03/16/2011] [Accepted: 06/05/2011] [Indexed: 02/07/2023]
Abstract
Remote protein homology detection and fold recognition refer to the detection of structural homology in proteins where there is little or no similarity in the sequence. To detect protein structural classes from protein primary sequence information, homology-based methods have been developed, which can be divided into three types: discriminative classifiers, generative models for protein families, and pairwise sequence comparisons. Support Vector Machines (SVM) and Neural Networks (NN) are two popular discriminative methods. Recent studies have shown that SVMs train faster and are more accurate and efficient than NNs. We present a comprehensive method based on two-layer classifiers. The first layer detects up to the superfamily and family levels in the SCOP hierarchy using optimized binary SVM classification rules. It uses a kernel function known as the Bio-kernel, which incorporates biological information in the classification process. The second layer uses a discriminative SVM algorithm with a string kernel to detect up to the protein fold level in the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP, and their significance was assessed with a pairwise t-test. Experimental results show that our approach significantly improves the performance of remote protein homology detection and fold recognition for all three versions of the SCOP dataset (1.53, 1.67 and 1.73). We achieved improvements in mean ROC of 4.19% in SCOP 1.53, 4.75% in SCOP 1.67 and 4.03% in SCOP 1.73 compared with the results produced by well-known methods. The combination of the first and second layers of BioSVM-2L performs well in remote homology detection and fold recognition across all three versions of the dataset.
Affiliation(s)
- Hilmi M Muda
- Laboratory of Computational Intelligence and Biology, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia
5
Triviño JC, Pazos F. Quantitative global studies of reactomes and metabolomes using a vectorial representation of reactions and chemical compounds. BMC Syst Biol 2010; 4:46. [PMID: 20406431 PMCID: PMC2883543 DOI: 10.1186/1752-0509-4-46] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/27/2009] [Accepted: 04/20/2010] [Indexed: 12/02/2022]
Abstract
Background: Global studies of the protein repertories of organisms are providing important information on the characteristics of the protein space. Many of these studies entail classification of the protein repertory on the basis of structure and/or sequence similarities. The situation is different for metabolism. Because there is no good way of measuring similarities between chemical reactions, there is a barrier to the development of global classifications of "metabolic space" and subsequent studies comparable to those done for protein sequences and structures. Results: In this work, we propose a vectorial representation of chemical reactions, which allows them to be compared and classified. In this representation, chemical compounds, reactions and pathways may be represented in the same vectorial space. We show that the representation of chemical compounds reflects their physicochemical properties and can be used for predictive purposes. We use the vectorial representations of reactions to perform a global classification of the reactome of the model organism E. coli. Conclusions: We show that this unsupervised clustering results in groups of enzymes more coherent in biological terms than equivalent groupings obtained from the EC hierarchy. This hierarchical clustering produces an optimal set of 21 groups, which we analyzed for their biological meaning.
Affiliation(s)
- Juan C Triviño
- Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/Darwin, 3, Cantoblanco, 28049 Madrid, Spain
6
Urban P, Truan G, Pompon D. Differences in Functional Clustering of Endogenous and Exogenous Substrates Between Members of the CYP1A Subfamily. The Open Drug Metabolism Journal 2009. [DOI: 10.2174/1874073100903010017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
7
Danielsson J, Liljedahl L, Bárány-Wallje E, Sønderby P, Kristensen LH, Martinez-Yamout MA, Dyson HJ, Wright PE, Poulsen FM, Mäler L, Gräslund A, Kragelund BB. The intrinsically disordered RNR inhibitor Sml1 is a dynamic dimer. Biochemistry 2009; 47:13428-37. [PMID: 19086274 DOI: 10.1021/bi801040b] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.0] [Indexed: 11/28/2022]
Abstract
Sml1 is a small ribonucleotide reductase (RNR) regulatory protein in Saccharomyces cerevisiae that binds to and inhibits RNR activation. NMR studies of ¹⁵N-labeled Sml1 (104 residues), as well as of a truncated variant (residues 50-104), have allowed characterization of their molecular properties. Sml1 belongs to the class of intrinsically disordered proteins with a high degree of dynamics and very little stable structure. Earlier suggestions for a dimeric structure of Sml1 were confirmed, and from translational diffusion NMR measurements, a dimerization dissociation constant of 0.1 mM at 4 °C could be determined. The hydrodynamic radius for the monomeric form of Sml1 was determined to be 23.4 Å, corresponding to a protein size between those of a globular protein and a coil. Formation of a dimer results in a hydrodynamic radius of 34.4 Å. In agreement with previous studies, the observed chemical shifts showed two segments with transient helical structure, residues 4-20 and 60-86, and relaxation studies clearly showed restricted motion in these segments. A spin-label attached to C14 showed long-range interactions with residues 60-70 and 85-95, suggesting that the N-terminal domain folds onto the C-terminal domain. Importantly, protease degradation studies combined with mass spectrometry indicated that the N-terminal domain is degraded before the C-terminal region and thus may serve as a protection against proteolysis of the functionally important C-terminal region. Dimer formation was not associated with significant induction of structure but was found to provide further protection against proteolysis. We propose that this molecular shielding and protection of vital functional structures from degradation by functionally unimportant sites may be a general attribute of other natively disordered proteins.
Affiliation(s)
- Jens Danielsson
- Department of Biochemistry and Biophysics, Stockholm University, S-106 91 Stockholm, Sweden
8
Martínez L, Andreani R, Martínez JM. Convergent algorithms for protein structural alignment. BMC Bioinformatics 2007; 8:306. [PMID: 17714583 PMCID: PMC1995224 DOI: 10.1186/1471-2105-8-306] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Received: 03/14/2007] [Accepted: 08/22/2007] [Indexed: 11/15/2022]
Abstract
Background: Many algorithms exist for protein structural alignment, based on internal protein coordinates or on explicit superposition of the structures. These methods are usually successful for detecting structural similarities. However, current practical methods are seldom supported by convergence theories. In particular, although the goal of each algorithm is to maximize some scoring function, there is no practical method that theoretically guarantees score maximization. A practical algorithm with solid convergence properties would be useful for the refinement of protein folding maps, and for the development of new scores designed to be correlated with functional similarity. Results: In this work, the maximization of scoring functions in protein alignment is interpreted as a Low Order Value Optimization (LOVO) problem. The new interpretation provides a framework for the development of algorithms based on well-established methods of continuous optimization. The resulting algorithms are convergent and increase the scoring functions at every iteration. The solutions obtained are critical points of the scoring functions. Two algorithms are introduced. The first is based on the maximization of the scoring function with Dynamic Programming, followed by the continuous maximization of the same score, with respect to the protein position, using a smooth Newtonian method. The second algorithm replaces the Dynamic Programming step with a fast procedure for computing the correspondence between Cα atoms. The algorithms are shown to be very effective for the maximization of the STRUCTAL score. Conclusion: The interpretation of protein alignment as a LOVO problem provides a new theoretical framework for the development of convergent protein alignment algorithms. These algorithms are shown to be very reliable for the maximization of the STRUCTAL score, and other distance-dependent scores may be optimized with the same strategy.
The improved score optimization provided by these algorithms provides a means for the refinement of protein fold maps and also for the development of scores designed to match biological function. The LOVO strategy may also be used for more general structural superposition problems such as flexible or non-sequential alignments. The package is available on-line at http://www.ime.unicamp.br/~martinez/lovoalign.
Affiliation(s)
- Leandro Martínez
- Institute of Chemistry, State University of Campinas, Campinas, SP, Brazil
- Roberto Andreani
- Department of Applied Mathematics, IMECC-UNICAMP, State University of Campinas, Campinas, SP, Brazil
- José Mario Martínez
- Department of Applied Mathematics, IMECC-UNICAMP, State University of Campinas, CP 6065, 13081-970, Campinas, SP, Brazil
9
De Masi G, Iori G, Caldarelli G. Fitness model for the Italian interbank money market. Phys Rev E Stat Nonlin Soft Matter Phys 2006; 74:066112. [PMID: 17280126 DOI: 10.1103/physreve.74.066112] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Received: 05/22/2006] [Indexed: 05/13/2023]
Abstract
We use the theory of complex networks to quantitatively characterize the formation of communities in a particular financial market. The system is composed of different banks that exchange loans and debts of liquidity on a daily basis. Through topological analysis and by means of a model of network growth, we can determine the formation of different groups of banks characterized by different business strategies. The model, based on Pareto's law, makes no use of growth or preferential attachment and correctly reproduces all the various statistical properties of the system. We believe that this network modeling of the market could be an efficient way to evaluate the impact of different policies in the market of liquidity.
Affiliation(s)
- G De Masi
- Dipartimento di Fisica, Università di L'Aquila, Via Vetoio, 67010 Coppito (AQ), Italy