1. Yu K, Cui Z, Sui X, Qiu X, Zhang J. Biological Network Inference With GRASP: A Bayesian Network Structure Learning Method Using Adaptive Sequential Monte Carlo. Front Genet 2021; 12:764020. [PMID: 34912373; PMCID: PMC8668238; DOI: 10.3389/fgene.2021.764020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2021] [Accepted: 10/25/2021] [Indexed: 11/13/2022]
Abstract
Bayesian networks (BNs) provide a probabilistic, graphical framework for modeling high-dimensional joint distributions with complex correlation structures. BNs are widely applied in many disciplines, including biology, social science, finance and biomedical science. Despite extensive study, learning network structure from data remains a challenging open question in BN research. In this study, we present a sequential Monte Carlo (SMC)-based three-stage approach, GRowth-based Approach with Staged Pruning (GRASP). A double filtering strategy was first used to discover the overall skeleton of the target BN. To search for optimal network structures, we designed an adaptive SMC (adSMC) algorithm that increases the quality and diversity of the sampled networks, which were further improved by a third stage that reclaims edges missed in the skeleton discovery step. GRASP gave very satisfactory results when tested on benchmark networks. Finally, BN structure learning using multiple types of genomics data illustrates GRASP's potential for discovering novel biological relationships in integrative genomic studies.
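Score-based structure learning of this kind needs a way to compare candidate networks. The abstract does not name GRASP's scoring function, so the sketch below swaps in the common decomposable BIC score for discrete data as an illustration only. The three-variable data generator, the variable names, and the function names are hypothetical. It shows that the score of a DAG is a sum of per-node "family" terms, each a maximum-likelihood log-likelihood minus a complexity penalty.

```python
import math
import random
from collections import Counter


def family_bic(data, child, parents, arity):
    """BIC term for one node given its parent set (discrete data).

    Maximum-likelihood log-likelihood of the child's conditional table
    minus (log N / 2) times the number of free parameters.
    """
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in joint.items())
    n_parent_configs = math.prod(arity[p] for p in parents)  # 1 for a root node
    n_params = n_parent_configs * (arity[child] - 1)
    return loglik - 0.5 * math.log(n) * n_params


def bic_score(data, structure, arity):
    """Decomposable score: structure maps each node to a tuple of its parents."""
    return sum(family_bic(data, node, parents, arity) for node, parents in structure.items())


def simulate(n, rng):
    """Hypothetical binary data: A -> B (B copies A 90% of the time), C independent."""
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)
        b = a if rng.random() < 0.9 else 1 - a
        c = int(rng.random() < 0.3)
        rows.append({"A": a, "B": b, "C": c})
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    data = simulate(2000, rng)
    arity = {"A": 2, "B": 2, "C": 2}
    candidates = {
        "empty": {"A": (), "B": (), "C": ()},
        "A->B": {"A": (), "B": ("A",), "C": ()},
        "A->B, B->C": {"A": (), "B": ("A",), "C": ("B",)},
    }
    for name, structure in candidates.items():
        print(name, round(bic_score(data, structure, arity), 1))
```

Because BIC is score-equivalent, A->B and B->A receive identical scores, so a score alone cannot orient every edge; a search procedure (sequential Monte Carlo in GRASP's case) still has to explore the space of DAGs.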
Affiliation(s)
- Kaixian Yu: Department of Statistics, Florida State University, Tallahassee, FL, United States
- Zihan Cui: Department of Statistics, Florida State University, Tallahassee, FL, United States
- Xin Sui: Department of Statistics, Florida State University, Tallahassee, FL, United States
- Xing Qiu: Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States
- Jinfeng Zhang: Department of Statistics, Florida State University, Tallahassee, FL, United States
2. Ravikumar A, de Brevern AG, Srinivasan N. Conformational Strain Indicated by Ramachandran Angles for the Protein Backbone Is Only Weakly Related to the Flexibility. J Phys Chem B 2021; 125:2597-2606. [PMID: 33666418; DOI: 10.1021/acs.jpcb.1c00168] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
Studies on the energy of free dipeptides have shown that conformers with unfavorable (ϕ,ψ) torsion angles have higher energy than conformers with favorable (ϕ,ψ) angles. Higher energy is expected to confer greater dynamics and flexibility on that part of the protein. Here, we explore a potential relationship between the conformational strain in a residue due to unfavorable (ϕ,ψ) angles and its flexibility and dynamics in the context of protein structures. Using normal-mode analysis (NMA), we compared the flexibility of strained and relaxed residues, identified by outlier/allowed and favorable (ϕ,ψ) angles, respectively. We also performed an in-depth analysis, using NMA and molecular dynamics simulations, of the flexibility and dynamics of catalytic residues in protein kinases, which exhibit different strain status in different kinase structures. We emphasize that the strain of a residue, as defined by its backbone torsion angles, is almost unrelated to its flexibility and dynamics. Even the overall trend observed across all high-resolution structures, in which relaxed residues tend to be slightly more flexible than strained residues, is counterintuitive. Consequently, we propose that identifying strained residues from (ϕ,ψ) values is not an effective way to recognize energetic strain in protein structures.
Affiliation(s)
- Ashraya Ravikumar: Molecular Biophysics Unit, Indian Institute of Science, Bengaluru 560012, India
- Alexandre G de Brevern: INSERM, U 1134, DSIMB, Paris F-75739, France; University of Paris, Paris F-75739, France; Institut National de la Transfusion Sanguine (INTS), Paris F-75739, France; Laboratoire d'Excellence GR-Ex, Paris F-75739, France
3. Li W, Chen R, Tan Z. Efficient Sequential Monte Carlo With Multiple Proposals and Control Variates. J Am Stat Assoc 2016. [DOI: 10.1080/01621459.2015.1006364] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/24/2022]
4. Liang J, Cao Y, Gürsoy G, Naveed H, Terebus A, Zhao J. Multiscale Modeling of Cellular Epigenetic States: Stochasticity in Molecular Networks, Chromatin Folding in Cell Nuclei, and Tissue Pattern Formation of Cells. Crit Rev Biomed Eng 2015; 43:323-46. [PMID: 27480462; PMCID: PMC4976639; DOI: 10.1615/critrevbiomedeng.2016016559] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022]
Abstract
Genome sequences provide the overall genetic blueprint of cells, but cells possessing the same genome can exhibit diverse phenotypes. A multitude of mechanisms control cellular epigenetic states and dictate the behavior of cells. Among these, networks of interacting molecules, often under stochastic control, can give rise to different landscapes of cellular states depending on the specific wiring of molecular components and on physiological conditions. In addition, chromosome folding in three-dimensional space provides another important control mechanism for the selective activation and repression of gene expression. Fully differentiated cells with different properties grow, divide, interact through mechanical forces, and communicate through signal transduction, resulting in the formation of complex tissue patterns. Studying these multiscale phenomena quantitatively, and identifying opportunities for improving human health, requires the development of theoretical models, algorithms, and computational tools. Here we review recent progress made in these important directions.
Affiliation(s)
- Jie Liang: Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL 60612, USA
- Youfang Cao: Theoretical Biology and Biophysics (T-6) and Center for Nonlinear Studies (CNLS), Los Alamos National Laboratory, Los Alamos, NM 87545, USA
- Gamze Gürsoy: Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL 60612, USA
- Hammad Naveed: Toyota Technological Institute at Chicago, 6045 S. Kenwood Ave., Chicago, Illinois 60637, USA
- Anna Terebus: Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL 60612, USA
- Jieling Zhao: Program in Bioinformatics, Department of Bioengineering, University of Illinois at Chicago, IL 60612, USA
5. Tang K, Zhang J, Liang J. Fast protein loop sampling and structure prediction using distance-guided sequential chain-growth Monte Carlo method. PLoS Comput Biol 2014; 10:e1003539. [PMID: 24763317; PMCID: PMC3998890; DOI: 10.1371/journal.pcbi.1003539] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 08/29/2013] [Accepted: 02/01/2014] [Indexed: 11/18/2022]
Abstract
Loops in proteins are flexible regions connecting regular secondary structures. They are often involved in protein function through interactions with other molecules. The irregularity and flexibility of loops make their structures difficult to determine experimentally and challenging to model computationally. Conformation sampling and energy evaluation are the two key components of loop modeling. We have developed a new method for loop conformation sampling and prediction based on a chain-growth sequential Monte Carlo sampling strategy, called Distance-guided Sequential chain-Growth Monte Carlo (DISGRO). With an energy function designed specifically for loops, our method can efficiently generate high-quality, low-energy loop conformations that are enriched in near-native loop structures. The average minimum global backbone RMSD for 1,000 conformations of 12-residue loops is 1.53 Å, with a lowest-energy RMSD of 2.99 Å and an average ensemble RMSD of 5.23 Å. A novel geometric criterion is applied to speed up calculations. Generating 1,000 conformations for each of the x loops in a benchmark dataset takes only about 10 CPU minutes for 12-residue loops, compared to ca. 180 CPU minutes with the FALCm method. Test results on benchmark datasets show that DISGRO performs comparably to or better than previous successful methods while requiring far less computing time. DISGRO is especially effective in modeling longer loops (10-17 residues).
Affiliation(s)
- Ke Tang: Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Jinfeng Zhang (corresponding author): Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
- Jie Liang (corresponding author): Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
6. Mamonov AB, Zhang X, Zuckerman DM. Rapid sampling of all-atom peptides using a library-based polymer-growth approach. J Comput Chem 2011; 32:396-405. [PMID: 20734315; PMCID: PMC3005036; DOI: 10.1002/jcc.21626] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/12/2010] [Revised: 05/17/2010] [Accepted: 06/12/2010] [Indexed: 12/30/2022]
Abstract
We adapted existing polymer-growth strategies for equilibrium sampling of peptides described by modern atomistic forcefields with a simple uniform dielectric solvent. The main novel feature of our approach is the use of precalculated statistical libraries of molecular fragments. A molecule is sampled by combining fragment configurations (of single residues, in this study) stored in the libraries. Ensembles generated from the independent libraries are reweighted to conform to the Boltzmann-factor distribution of the forcefield describing the full molecule. In this way, high-quality equilibrium sampling of small peptides (4-8 residues) typically requires less than one hour of single-processor wall-clock time and can be significantly faster than Langevin simulations. Furthermore, approximate, clash-free ensembles can be generated for larger peptides (up to 32 residues in this study) in less than a minute of single-processor computing. We discuss possible applications of our growth procedure to free energy calculation, fragment-assembly protein-structure prediction protocols, and "multi-resolution" sampling.
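The reweighting step described in this abstract can be illustrated with a minimal importance-sampling sketch. It assumes a hypothetical toy peptide whose residues each draw from a three-state library with made-up single-residue energies, plus a made-up nearest-neighbour interaction; none of this is the authors' forcefield or library format. Each library is sampled according to its own single-residue Boltzmann factor, residues are drawn independently, and each assembled molecule gets the weight exp(-βU_full)/q(config), where q is the product of the library probabilities, so that self-normalised averages converge to Boltzmann averages for the full molecule.

```python
import itertools
import math
import random

BETA = 1.0  # inverse temperature, arbitrary units (assumption)

# Hypothetical single-residue fragment library: discrete states and their energies.
LIBRARY_ENERGIES = [0.0, 0.5, 2.0]


def pair_energy(s1, s2):
    """Hypothetical nearest-neighbour interaction that favours identical states."""
    return -1.0 if s1 == s2 else 0.0


def full_energy(config):
    """Energy of the full molecule: single-residue terms plus neighbour interactions."""
    single = sum(LIBRARY_ENERGIES[s] for s in config)
    pair = sum(pair_energy(a, b) for a, b in zip(config, config[1:]))
    return single + pair


def library_probs():
    """Each library is pre-sampled according to its own single-residue Boltzmann factor."""
    factors = [math.exp(-BETA * u) for u in LIBRARY_ENERGIES]
    z = sum(factors)
    return [f / z for f in factors]


def reweighted_average(observable, n_res, n_samples, rng):
    """Self-normalised importance-sampling estimate of <observable> under exp(-BETA*U_full)."""
    q = library_probs()
    states = range(len(q))
    num = den = 0.0
    for _ in range(n_samples):
        # Assemble a molecule from independent per-residue library draws.
        config = rng.choices(states, weights=q, k=n_res)
        proposal = math.prod(q[s] for s in config)
        weight = math.exp(-BETA * full_energy(config)) / proposal
        num += weight * observable(config)
        den += weight
    return num / den


def exact_average(observable, n_res):
    """Brute-force Boltzmann average over every configuration (tiny systems only)."""
    num = den = 0.0
    for config in itertools.product(range(len(LIBRARY_ENERGIES)), repeat=n_res):
        boltzmann = math.exp(-BETA * full_energy(config))
        num += boltzmann * observable(config)
        den += boltzmann
    return num / den


def fraction_in_state0(config):
    """Example observable: fraction of residues in library state 0."""
    return sum(1 for s in config if s == 0) / len(config)


if __name__ == "__main__":
    rng = random.Random(42)
    print(reweighted_average(fraction_in_state0, 4, 20000, rng))
    print(exact_average(fraction_in_state0, 4))
```

The brute-force `exact_average` is only a check for a four-residue toy; for real peptides the configuration space is far too large to enumerate, which is what makes the library-plus-reweighting route attractive.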
Affiliation(s)
- Artem B Mamonov: Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA
7. Liang J, Qian H. Computational Cellular Dynamics Based on the Chemical Master Equation: A Challenge for Understanding Complexity. Journal of Computer Science and Technology 2010; 25:154-168. [PMID: 24999297; PMCID: PMC4079062; DOI: 10.1007/s11390-010-9312-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 06/03/2023]
Abstract
Modern molecular biology has always been a great source of inspiration for computational science. Half a century ago, the challenge of understanding macromolecular dynamics led the way for computation to become part of the tool set for studying molecular biology. Twenty-five years ago, the demands of genome science inspired an entire generation of computer scientists with an interest in discrete mathematics to join the field now called bioinformatics. In this paper, we lay out a new mathematical theory for the dynamics of biochemical reaction systems in a small (i.e., mesoscopic) volume in terms of a stochastic, discrete-state, continuous-time formulation called the chemical master equation (CME). Similar to the wavefunction in quantum mechanics, the dynamically changing probability landscape associated with the state space provides a fundamental characterization of the biochemical reaction system. The stochastic trajectories of the dynamics are best known from simulations using the Gillespie algorithm. In contrast to the Metropolis algorithm, this Monte Carlo sampling technique does not follow a process with detailed balance. We show several examples of how CMEs are used to model cellular biochemical systems. We also illustrate the computational challenges involved: multiscale phenomena, the interplay between stochasticity and nonlinearity, and how macroscopic determinism arises from mesoscopic dynamics. We point out recent advances in computing solutions to the CME, including exact solution of the steady-state landscape and stochastic differential equations that offer alternatives to the Gillespie algorithm. We argue that the CME is an ideal system from which one can learn to understand "complex behavior" and complexity theory, and from which important biological insight can be gained.
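To make the Gillespie algorithm concrete, here is a minimal sketch of its direct method for a hypothetical birth-death system (constant production at rate k, first-order degradation at rate γ per molecule). The system, rate values, and function names are illustrative assumptions, not examples from the paper. Each iteration draws an exponential waiting time from the total propensity and then chooses which reaction fires with probability proportional to its propensity. Because the stationary solution of this CME is Poisson with mean k/γ, the time-averaged copy number gives a simple check.

```python
import math
import random


def gillespie_birth_death(k, gamma, n0, t_end, rng):
    """Direct-method SSA for a hypothetical birth-death system.

    Reactions:  0 -> X  (propensity k)   and   X -> 0  (propensity gamma * n).
    Returns event times and copy numbers; the state is constant between events.
    """
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_birth = k
        a_death = gamma * n
        a_total = a_birth + a_death
        if a_total == 0.0:
            break  # no reaction can fire
        # Waiting time to the next reaction is exponential with rate a_total.
        tau = -math.log(1.0 - rng.random()) / a_total
        if t + tau > t_end:
            break
        t += tau
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_start, t_end):
    """Time-weighted mean copy number over [t_start, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        seg_end = times[i + 1] if i + 1 < len(times) else t_end
        lo, hi = max(times[i], t_start), min(seg_end, t_end)
        if hi > lo:
            total += n * (hi - lo)
    return total / (t_end - t_start)


if __name__ == "__main__":
    rng = random.Random(1)
    times, counts = gillespie_birth_death(k=10.0, gamma=1.0, n0=0, t_end=500.0, rng=rng)
    # The stationary CME solution is Poisson with mean k / gamma = 10.
    print(time_average(times, counts, 50.0, 500.0))
```

Every simulated jump follows the CME's transition rates exactly; the method samples trajectories rather than enforcing detailed balance, which is the contrast with Metropolis sampling drawn in the abstract.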
Affiliation(s)
- Jie Liang: Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607, USA; Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai 200240, China
- Hong Qian: Department of Applied Mathematics, University of Washington, Seattle, WA 98195, USA; Kavli Institute for Theoretical Physics China, Chinese Academy of Sciences, Beijing 100190, China
8. Zhang J, Lin M, Chen R, Wang W, Liang J. Discrete state model and accurate estimation of loop entropy of RNA secondary structures. J Chem Phys 2008; 128:125107. [PMID: 18376982; DOI: 10.1063/1.2895050] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022]
Abstract
Conformational entropy makes an important contribution to the stability and folding of RNA molecules, but it is challenging to either measure or compute the conformational entropy associated with long loops. We develop optimized discrete k-state models of the RNA backbone, based on known RNA structures, for computing the entropy of loops, which are modeled as self-avoiding walks. To estimate the entropy of long hairpin, bulge, internal, and multibranch loops (up to length 50), we develop an efficient sampling method based on the sequential Monte Carlo principle. Our method accounts for the excluded-volume effect. It is general and can be applied to calculate the entropy of loops of longer length and arbitrary complexity. For short loops, our results are in good agreement with a recent theoretical model and experimental measurement. For long loops, our estimated entropy of hairpin loops is in excellent agreement with the Jacobson-Stockmayer extrapolation model. However, for bulge loops and more complex secondary structures such as internal and multibranch loops, we find that the Jacobson-Stockmayer extrapolation model has large errors. Based on the estimated entropy, we have developed empirical formulae for accurate calculation of the entropy of long loops in different secondary structures. Our study of the effect of asymmetric loop sizes suggests that the entropy of internal loops is largely determined by the total loop length and is only marginally affected by the asymmetry between the two loops. This finding suggests that the significant asymmetric effects of loop length in internal loops measured by experiments are likely to be partially enthalpic. Our method can be applied to develop improved energy parameters important for studying RNA stability and folding, and for predicting RNA secondary and tertiary structures. The discrete model and the program used to calculate loop entropy can be downloaded at http://gila.bioengr.uic.edu/resources/RNA.html.
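The chain-growth idea behind such sequential Monte Carlo entropy estimates can be sketched with the classic Rosenbluth method on a two-dimensional square lattice, a deliberate simplification of the paper's optimized discrete k-state RNA backbone model. Each walk is grown one step at a time into a randomly chosen unoccupied neighbour, and its weight is the product of the number of free choices at each step. The mean weight is an unbiased estimate of c_n, the number of n-step self-avoiding walks, and ln c_n is the corresponding conformational entropy in units of k_B. The function names are hypothetical.

```python
import math
import random

# Unit steps on the two-dimensional square lattice.
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def rosenbluth_walk(n_steps, rng):
    """Grow one self-avoiding walk and return its Rosenbluth weight.

    At each step the chain moves to a uniformly chosen unoccupied neighbour, and the
    weight is multiplied by the number of such neighbours. A trapped chain has weight 0.
    """
    x, y = 0, 0
    occupied = {(x, y)}
    weight = 1
    for _ in range(n_steps):
        free = [(x + dx, y + dy) for dx, dy in STEPS if (x + dx, y + dy) not in occupied]
        if not free:
            return 0
        weight *= len(free)
        x, y = rng.choice(free)
        occupied.add((x, y))
    return weight


def estimate_saw_count(n_steps, n_samples, rng):
    """Unbiased Monte Carlo estimate of c_n, the number of n-step self-avoiding walks."""
    return sum(rosenbluth_walk(n_steps, rng) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    rng = random.Random(0)
    c10 = estimate_saw_count(10, 20000, rng)
    # Conformational entropy in units of k_B is the log of the number of conformations.
    print(f"c_10 ~ {c10:.0f}, S/k_B ~ {math.log(c10):.3f}")
```

Exact enumeration gives c_10 = 44,100 on the square lattice, which the estimate should approach. For long or tightly constrained chains, plain Rosenbluth weights become increasingly variable, which is why practical SMC samplers add resampling or pruning and enrichment steps.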
Affiliation(s)
- Jian Zhang: Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, USA
9. Lin M, Chen R, Liang J. Statistical geometry of lattice chain polymers with voids of defined shapes: sampling with strong constraints. J Chem Phys 2008; 128:084903. [PMID: 18315083; DOI: 10.1063/1.2831905] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022]
Abstract
Proteins contain many voids, which are unfilled spaces enclosed in the interior. A few of them have shapes compatible with ligands and substrates and are important for protein function. An important general question is how the need to maintain functional voids is influenced by, and affects, other aspects of protein structure and properties (e.g., folding stability, kinetic accessibility, and evolutionary selection pressure). In this paper, we examine in detail the effects of maintaining voids of different shapes and sizes using two-dimensional lattice models. We study the propensity of conformations to form a void of a specific shape, which is related to the entropic cost of void maintenance. We also study where voids of a specific shape and size tend to form, and the influence of compactness on the formation of such voids. As enumeration is infeasible for long chain polymers, a key development in this work is the design of a novel sequential Monte Carlo strategy for generating a large number of sample conformations under very constraining restrictions. Our method is validated by comparing results obtained from sampling and from enumeration for short polymer chains. We accurately estimated the entropic cost of void maintenance, with and without an increasing number of restrictive conditions, such as loops of fixed length forming the wall of the void, with an additionally fixed starting position in the sequence. Additionally, we have identified the key structural properties of voids that are important in determining the entropic cost of void formation. We have further developed a parametric model to quantitatively predict void entropy. Our model is highly effective, and these results indicate that voids representing functional sites can be used as an improved model for studying the evolution of protein functions and how protein function relates to protein stability.
Affiliation(s)
- Ming Lin: Department of Information & Decision Science, University of Illinois at Chicago, 845 S. Morgan St., Chicago, Illinois 60607, USA
10. Lu HM, Liang J. A model study of protein nascent chain and cotranslational folding using hydrophobic-polar residues. Proteins 2008; 70:442-9. [PMID: 17680696; DOI: 10.1002/prot.21575] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/08/2022]
Abstract
To study protein nascent-chain folding during biosynthesis, we investigate the folding behavior of hydrophobic-polar (HP) model chains of growing length using both a two-dimensional square lattice model and an optimized three-dimensional 4-state discrete off-lattice model. After enumerating all possible sequences and conformations of HP heteropolymers up to length N = 18 and N = 15 in two- and three-dimensional space, respectively, we examine changes in adopted structure, stability, and tolerance to single point mutations as the nascent chain grows. In both models, we find that stable model proteins have fewer folded nascent chains during growth and often fold only after reaching full length. On the few occasions where partial chains of stable proteins fold, these partial conformations are on average very similar to the corresponding parts of the final full-length conformations. Conversely, we find that sequences with fewer stable nascent chains and sequences with native-like folded nascent chains are more stable. In addition, these stable sequences can in general tolerate many more point mutations and still fold into the same conformation as the wild-type sequence. Our results suggest that stable proteins are less likely to be trapped in metastable conformations during biosynthesis and are more resistant to point mutations. Our results also imply that less stable proteins will require the assistance of chaperones and other factors during nascent-chain folding. Taken together with other reported studies, it seems that cotranslational folding may not be a general mechanism of in vivo protein folding for small proteins, and in vitro folding studies are still relevant for understanding how proteins fold biologically.
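The exhaustive enumeration described above can be sketched for the two-dimensional case. The sketch assumes the standard HP energy convention of -1 per non-bonded H-H lattice contact, fixes the first bond and the direction of the first turn to discard rotated and mirrored copies, and treats a sequence as "stable" when it has a unique lowest-energy conformation. That stability criterion is a common convention in HP studies and an assumption here, as are the function names; the three-dimensional 4-state off-lattice model is not reproduced.

```python
# Unit steps on the two-dimensional square lattice.
DIRECTIONS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def hh_contacts(seq, coords):
    """Count non-bonded H-H contacts between lattice neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only to the right and up so each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and abs(i - j) > 1 and seq[j] == "H":
                contacts += 1
    return contacts


def enumerate_hp(seq):
    """Enumerate symmetry-distinct 2D square-lattice conformations of an HP sequence.

    Returns (ground-state energy, ground-state degeneracy, total conformations).
    Requires len(seq) >= 2.
    """
    n = len(seq)
    energy_counts = {}

    def grow(coords, occupied, turned):
        if len(coords) == n:
            e = -hh_contacts(seq, coords)
            energy_counts[e] = energy_counts.get(e, 0) + 1
            return
        x, y = coords[-1]
        for dx, dy in DIRECTIONS:
            if not turned and dy == -1:
                continue  # first turn must be "up": discards mirror images
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            coords.append(nxt)
            occupied.add(nxt)
            grow(coords, occupied, turned or dy != 0)
            coords.pop()
            occupied.remove(nxt)

    # Fixing the first bond along +x discards rotated copies.
    grow([(0, 0), (1, 0)], {(0, 0), (1, 0)}, False)
    e_min = min(energy_counts)
    return e_min, energy_counts[e_min], sum(energy_counts.values())


def is_stable(seq):
    """Assumed stability criterion: a unique lowest-energy conformation."""
    return enumerate_hp(seq)[1] == 1
```

For example, `enumerate_hp("HPPH")` finds that, among the 5 symmetry-distinct conformations of a 4-residue chain, only the U shape bringing the two H ends into contact reaches the ground-state energy of -1. Applying it to successive prefixes of a sequence mimics a growing nascent chain, although the cost rises exponentially with chain length.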
Affiliation(s)
- Hsiao-Mei Lu: Department of Bioengineering, MC-063, University of Illinois at Chicago, Chicago, Illinois 60607, USA
11. Nanda V, Andrianarijaona A, Narayanan C. The role of protein homochirality in shaping the energy landscape of folding. Protein Sci 2007; 16:1667-75. [PMID: 17600146; PMCID: PMC2203351; DOI: 10.1110/ps.072867007] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 10/23/2022]
Abstract
The homochirality, or isotacticity, of the natural amino acids facilitates the formation of regular secondary structures such as alpha-helices and beta-sheets. However, many examples exist in nature where novel polypeptide topologies use both L- and D-amino acids. In this study, we explore how the stereochemistry of the polypeptide backbone influences basic properties such as compactness and the size of fold space by simulating both lattice and all-atom polypeptide chains. We formulate a rectangular lattice chain model in both two and three dimensions, where monomers are chiral, restricting local conformation. Syndiotactic chains, with alternating chirality of adjacent monomers, have a very large ensemble of accessible conformations characterized predominantly by extended structures. Isotactic chains, on the other hand, have far fewer possible conformations, and a significant fraction of these are compact. Syndiotactic chains are often unable to access maximally compact states available to their isotactic counterparts of the same length. Similar features are observed in all-atom models of isotactic versus syndiotactic polyalanine. Our results suggest that protein isotacticity has evolved to increase the enthalpy of chain collapse by facilitating compact helical states and to reduce the entropic cost of folding by restricting the size of the unfolded ensemble of competing states.
Affiliation(s)
- Vikas Nanda: Center for Advanced Biotechnology and Medicine, Department of Biochemistry, Robert Wood Johnson Medical School, University of Medicine and Dentistry of New Jersey, Piscataway, New Jersey 08854, USA
12. Zhang J, Liu JS. On side-chain conformational entropy of proteins. PLoS Comput Biol 2006; 2:e168. [PMID: 17154716; PMCID: PMC1676032; DOI: 10.1371/journal.pcbi.0020168] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Received: 07/31/2006] [Accepted: 10/26/2006] [Indexed: 11/19/2022]
Abstract
The role of side-chain entropy (SCE) in protein folding has long been speculated about but is still not fully understood. Using a newly developed Monte Carlo method, we conducted a systematic investigation of how SCE relates to protein size and how it differs among a protein's X-ray, NMR, and decoy structures. We estimated the SCE for a set of 675 nonhomologous proteins and observed significant SCE for both exposed and buried residues in all of these proteins; the contribution of buried residues reaches approximately 40% of the overall SCE. Furthermore, the SCE can be quite different for structures with similar compactness or even similar conformations. As a striking example, we found that proteins' X-ray structures appear to pack more "cleverly" than their NMR or decoy counterparts, in the sense of retaining higher SCE while achieving comparable compactness, which suggests that SCE plays an important role in favouring native protein structures. By including an SCE term in a simple free energy function, we can significantly improve the discrimination of native protein structures from decoys.
Affiliation(s)
- Jinfeng Zhang: Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
- Jun S Liu: Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
13. Zhang J, Lin M, Chen R, Liang J, Liu JS. Monte Carlo sampling of near-native structures of proteins with applications. Proteins 2006; 66:61-8. [PMID: 17039507; DOI: 10.1002/prot.21203] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 12/29/2022]
Abstract
Because a protein's dynamic fluctuations inside cells affect its biological properties, we present a novel method to study the ensemble of near-native structures (NNS) of proteins, namely, conformations that are very similar to the experimentally determined native structure. We show that this method enables us to (i) quantify the difficulty of predicting a protein's structure, (ii) choose appropriate simplified representations of protein structures, and (iii) assess the effectiveness of knowledge-based potential functions. We found that, for certain potential functions, well-designed simple representations of protein structures are likely as accurate as more complex ones. We also found that the widely used contact potential functions stabilize NNS poorly, whereas potential functions incorporating local structure information significantly increase the stability of NNS.
Affiliation(s)
- Jinfeng Zhang: Department of Statistics, Harvard University, Cambridge, Massachusetts, USA
14. Zhang J, Chen R, Liang J. Empirical potential function for simplified protein models: combining contact and local sequence-structure descriptors. Proteins 2006; 63:949-60. [PMID: 16477624; DOI: 10.1002/prot.20809] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/07/2022]
Abstract
An effective potential function is critical for protein structure prediction and folding simulation. Simplified protein models, such as those requiring only Cα or backbone atoms, are attractive because they enable efficient search of the conformational space. We show that residue-specific reduced discrete-state models can represent the backbone conformations of proteins with small RMSD values. However, no existing potential functions are designed for such simplified protein models. In this study, we develop optimal potential functions by combining contact interaction descriptors and local sequence-structure descriptors. The potential function is a weighted linear sum of all descriptors, and the optimal weight coefficients are obtained through optimization using both native and decoy structures. The performance of the potential function in discriminating native protein structures from decoys is evaluated using several benchmark decoy sets. Our potential functions, which require only backbone atoms or Cα atoms, perform comparably to or better than several residue-based potential functions that require additional coordinates of side-chain centers or of all side-chain atoms. Reducing the residue alphabet to size 10 for the contact descriptors further improves the performance of the potential function. Our results also suggest that local sequence-structure correlation may play an important role in reducing the entropic cost of protein folding.
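The linear form of this potential can be sketched as follows. The descriptor vectors are hypothetical counts standing in for the paper's contact and local sequence-structure descriptors, and the weight search uses a simple perceptron-style update, swapped in here because the abstract does not specify the optimization procedure. The goal is a weight vector w such that E(decoy) - E(native) = w · (φ_decoy - φ_native) > 0 for every decoy, so that the native structure has the lowest energy.

```python
def energy(weights, descriptors):
    """Potential as a weighted linear sum of descriptors: E = sum_k w_k * phi_k."""
    return sum(w * phi for w, phi in zip(weights, descriptors))


def train_weights(native, decoys, lr=0.1, max_epochs=100):
    """Perceptron-style search for weights that rank the native below every decoy.

    Each decoy contributes one inequality w . (phi_decoy - phi_native) > 0; a violated
    inequality moves w along (phi_decoy - phi_native). Stops once all are satisfied.
    """
    weights = [0.0] * len(native)
    for _ in range(max_epochs):
        violations = 0
        for decoy in decoys:
            diff = [d - n for d, n in zip(decoy, native)]
            # By linearity, energy(weights, diff) equals E(decoy) - E(native).
            if energy(weights, diff) <= 0.0:
                weights = [w + lr * x for w, x in zip(weights, diff)]
                violations += 1
        if violations == 0:
            break
    return weights


if __name__ == "__main__":
    # Hypothetical descriptor counts: [contact type A, contact type B, local motif C].
    native = [3.0, 1.0, 0.0]
    decoys = [[1.0, 2.0, 1.0], [2.0, 1.0, 2.0], [0.0, 3.0, 0.0]]
    w = train_weights(native, decoys)
    print("weights:", w)
    print("native E:", energy(w, native), "decoy E:", [energy(w, d) for d in decoys])
```

Real training sets contain many proteins, each with many decoys; if no weight vector satisfies every inequality, some remain violated after the final epoch, and a margin-based or linear-programming formulation is a common alternative.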
Affiliation(s)
- Jinfeng Zhang: Department of Bioengineering, University of Illinois, Chicago, Illinois, USA