1
|
Wang J, Wang W, Shang Y. Protein Loop Modeling Using AlphaFold2. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:3306-3313. [PMID: 37037235 DOI: 10.1109/tcbb.2023.3264899] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
The functions of proteins are largely determined by their three-dimensional (3D) structures. Loop modeling tries to predict the conformation of a relatively short stretch of protein backbone and sidechain. It is a difficult problem due to conformational variability. Recently, AlphaFold2 has achieved outstanding results in 3D protein structure prediction and is expected to perform well on loop modeling. In this paper, we investigate the performance of AlphaFold2 variants on popular loop modeling benchmark datasets and propose an efficient protocol for using AlphaFold2 for loop modeling, called IAFLoop. To predict the structure of a loop region, IAFLoop gives a moderately extended segment of the target loop region as input to AlphaFold2, runs a fast version of AlphaFold2 using a reduced database without ensembling, and uses RMSD-based consensus scores to select the final output models. Our experimental results on benchmark datasets show that IAFLoop generates highly accurate loop models. It achieves performance comparable to the original application of AlphaFold2 in terms of RMSD error, with much better results on some targets, while using only half the time. Compared to the best previous methods, IAFLoop reduces the RMSD error by almost half on the 8-residue loop dataset and by more than 70% on the 12-residue loop dataset.
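The abstract does not say how the RMSD-based consensus score is computed. As a minimal sketch of one common reading, the code below scores each candidate loop model by its mean RMSD to all other candidates and picks the lowest. The function names `rmsd` and `consensus_pick`, the toy coordinates, and the assumption that all models are already superimposed in a shared frame are illustrative and not taken from the paper.

```python
import math

def rmsd(a, b):
    """RMSD between two equal-length lists of (x, y, z) coordinates.

    Assumes both models are already in the same frame (for example,
    superimposed on shared stem residues); no fitting is done here.
    """
    if len(a) != len(b):
        raise ValueError("models must have the same number of atoms")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def consensus_pick(models):
    """Return (index, score) of the model with the lowest mean RMSD to the others."""
    if len(models) < 2:
        raise ValueError("need at least two models")
    scores = []
    for i, m in enumerate(models):
        others = [rmsd(m, n) for j, n in enumerate(models) if j != i]
        scores.append(sum(others) / len(others))
    best = min(range(len(models)), key=scores.__getitem__)
    return best, scores[best]

# Toy candidates: two near-identical models and one outlier (hypothetical data).
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],
]
best_index, best_score = consensus_pick(models)
print(best_index, round(best_score, 3))
```

With this scoring, the outlier is penalized because it is far from both other candidates, and the model closest to the cluster centre is selected.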
|
2
|
KAIRI AMIT, SAHU TANMAYAKUMAR, RAO ATMAKURIRAMAKRISHNA. An information system on genomic elements and predicted protein structures of buffalo (Bubalus bubalis). THE INDIAN JOURNAL OF ANIMAL SCIENCES 2021. [DOI: 10.56093/ijans.v90i11.111494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Among livestock species, buffalo remains an integral part of the Indian rural economy. With the advent of genome sequencing technologies, it became possible to sequence the whole genome of the Murrah buffalo. A significant amount of information on different genomic elements of buffalo is also available at the National Centre for Biotechnology Information (NCBI). However, the positions of these elements on the genome are not fully known. In addition, the 3D structures of buffalo proteins are not available, and there exists no browser to visualize important genic elements on the buffalo genome. Hence, a study was taken up to develop a web-based information system comprising information on genomic elements, protein 3D structures and a genome browser. Initially, nucleotide and protein sequences were retrieved from NCBI and parsed suitably. Later, the protein structures were predicted, validated, refined and stabilized in silico. An Information System on Buffalo Genome (ISBG) with 3-tier architecture was developed containing the sequence and structural information. ISBG contains complete coding sequences (CDS), mitochondrial DNAs, 1k upstream regions and untranslated regions (UTRs) of the buffalo genome. The buffalo genes were also mapped onto the genome. The results revealed that the maximum number of genes was found on chromosome 4, followed by chromosome 18, which can also be visualized in the developed genome browser. ISBG can be accessed at http://cabgrid.res.in:8080/bgis. The proposed information system helps animal breeders and biotechnologists in animal improvement.
|
3
|
Studer G, Tauriello G, Bienert S, Biasini M, Johner N, Schwede T. ProMod3-A versatile homology modelling toolbox. PLoS Comput Biol 2021; 17:e1008667. [PMID: 33507980 PMCID: PMC7872268 DOI: 10.1371/journal.pcbi.1008667] [Citation(s) in RCA: 142] [Impact Index Per Article: 47.3] [Received: 07/16/2020] [Revised: 02/09/2021] [Accepted: 01/03/2021] [Indexed: 11/18/2022] Open
Abstract
Computational methods for protein structure modelling are routinely used to complement experimental structure determination, thus they help to address a broad spectrum of scientific questions in biomedical research. The most accurate methods today are based on homology modelling, i.e. detecting a homologue to the desired target sequence that can be used as a template for modelling. Here we present a versatile open source homology modelling toolbox as foundation for flexible and computationally efficient modelling workflows. ProMod3 is a fully scriptable software platform that can perform all steps required to generate a protein model by homology. Its modular design aims at fast prototyping of novel algorithms and implementing flexible modelling pipelines. Common modelling tasks, such as loop modelling, sidechain modelling or generating a full protein model by homology, are provided as production ready pipelines, forming the starting point for own developments and enhancements. ProMod3 is the central software component of the widely used SWISS-MODEL web-server.
Affiliation(s)
- Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Gerardo Tauriello
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Stefan Bienert
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Marco Biasini
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Niklaus Johner
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, Basel, Switzerland
| |
|
4
|
Nguyen SP, Li Z, Xu D, Shang Y. New Deep Learning Methods for Protein Loop Modeling. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:596-606. [PMID: 29990046 PMCID: PMC6580050 DOI: 10.1109/tcbb.2017.2784434] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 05/31/2023]
Abstract
Computational protein structure prediction is a long-standing challenge in bioinformatics. In the process of predicting protein 3D structures, it is common that parts of an experimental structure are missing or parts of a predicted structure need to be remodeled. The process of predicting local protein structures of particular regions is called loop modeling. In this paper, five new loop modeling methods based on machine learning techniques, called NearLooper, ConLooper, ResLooper, HyLooper1, and HyLooper2, are proposed. NearLooper is based on the nearest neighbor technique. ConLooper applies deep convolutional neural networks to predict the Cα atom distance matrix as an orientation-independent representation of protein structure. ResLooper uses residual neural networks instead of deep convolutional neural networks. HyLooper1 combines the results of NearLooper and ConLooper, while HyLooper2 combines NearLooper and ResLooper. Three commonly used benchmarks for loop modeling are used to compare the performance of these methods with existing state-of-the-art methods. The experimental results show promising performance: our best method improves on existing state-of-the-art methods by 28 and 54 percent in average RMSD on two datasets while being comparable on the other one.
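To illustrate why a Cα distance matrix is called orientation-independent, the short sketch below builds the pairwise distance matrix for a toy trace and checks that it is unchanged after a rigid rotation and translation. The coordinates, the helper names `distance_matrix` and `rotate_z`, and the chosen transform are hypothetical and only serve the illustration; the networks described in the abstract are not reproduced here.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between all points (symmetric, zero diagonal)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rotate_z(coords, angle):
    """Rotate every point about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

# Toy Calpha trace with 3.8 angstrom steps (hypothetical coordinates).
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (3.8, 3.8, 3.8)]
d_original = distance_matrix(ca)

# Apply a rigid-body transform: rotation followed by translation.
moved = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in rotate_z(ca, 0.7)]
d_moved = distance_matrix(moved)

# The largest element-wise difference should be at floating-point noise level.
max_diff = max(
    abs(p - q)
    for row_a, row_b in zip(d_original, d_moved)
    for p, q in zip(row_a, row_b)
)
print(max_diff < 1e-9)
```

Because the matrix depends only on inter-atomic distances, a predictor that outputs it does not need to learn any particular placement of the loop in space.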
|
5
|
Kundert K, Kortemme T. Computational design of structured loops for new protein functions. Biol Chem 2019; 400:275-288. [PMID: 30676995 PMCID: PMC6530579 DOI: 10.1515/hsz-2018-0348] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/16/2018] [Accepted: 12/18/2018] [Indexed: 12/20/2022]
Abstract
The ability to engineer the precise geometries, fine-tuned energetics and subtle dynamics that are characteristic of functional proteins is a major unsolved challenge in the field of computational protein design. In natural proteins, functional sites exhibiting these properties often feature structured loops. However, unlike the elements of secondary structures that comprise idealized protein folds, structured loops have been difficult to design computationally. Addressing this shortcoming in a general way is a necessary first step towards the routine design of protein function. In this perspective, we will describe the progress that has been made on this problem and discuss how recent advances in the field of loop structure prediction can be harnessed and applied to the inverse problem of computational loop design.
Affiliation(s)
- Kale Kundert
- Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
| | - Tanja Kortemme
- Graduate Group in Biophysics, University of California San Francisco, San Francisco, CA 94158, USA
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA 94158, USA
- Chan Zuckerberg Biohub, 499 Illinois St, San Francisco, CA 94158, USA
| |
|
6
|
Hooper WF, Walcott BD, Wang X, Bystroff C. Fast design of arbitrary length loops in proteins using InteractiveRosetta. BMC Bioinformatics 2018; 19:337. [PMID: 30249181 PMCID: PMC6154894 DOI: 10.1186/s12859-018-2345-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/17/2018] [Accepted: 08/29/2018] [Indexed: 11/10/2022] Open
Abstract
Background: With increasing interest in ab initio protein design, there is a desire to be able to fully explore the design space of insertions and deletions. Nature inserts and deletes residues to optimize energy and function, but allowing variable length indels in the context of an interactive protein design session presents challenges with regard to speed and accuracy. Results: Here we present a new module (INDEL) for InteractiveRosetta which allows the user to specify a range of lengths for a desired indel, and which returns a set of low energy backbones in a matter of seconds. To make the loop search fast, loop anchor points are geometrically hashed using Cα-Cα and Cβ-Cβ distances, and the hash is mapped to start and end points in a pre-compiled random access file of non-redundant protein backbone coordinates. Loops with superposable anchors are filtered for collisions and returned to InteractiveRosetta as poly-alanine for display and selective incorporation into the design template. Sidechains can then be added using RosettaDesign tools. Conclusions: INDEL was able to find viable loops in 100% of 500 attempts for all lengths from 3 to 20 residues. INDEL has been applied to the task of designing a domain-swapping loop for T7-endonuclease I, changing its specificity from Holliday junctions to paranemic crossover (PX) DNA. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2345-5) contains supplementary material, which is available to authorized users.
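The geometric hashing of anchor points described above can be sketched as a lookup table keyed on binned Cα-Cα and Cβ-Cβ anchor distances. This is a minimal sketch, not the INDEL implementation: the bin width, the fragment names, the coordinates, and the in-memory dictionary (standing in for the pre-compiled random access file) are all assumptions made for illustration.

```python
import math

BIN_WIDTH = 0.5  # hypothetical bin size in angstroms, not taken from the paper

def anchor_key(ca_start, ca_end, cb_start, cb_end, bin_width=BIN_WIDTH):
    """Hash a pair of loop anchors by binned Calpha-Calpha and Cbeta-Cbeta distances."""
    ca_bin = int(math.dist(ca_start, ca_end) // bin_width)
    cb_bin = int(math.dist(cb_start, cb_end) // bin_width)
    return (ca_bin, cb_bin)

def build_index(fragments):
    """Map each hash key to the names of fragments whose anchors fall in that bin."""
    index = {}
    for name, anchors in fragments.items():
        index.setdefault(anchor_key(*anchors), []).append(name)
    return index

# Hypothetical fragment library: name -> (CA start, CA end, CB start, CB end).
fragments = {
    "frag_a": ((0.0, 0.0, 0.0), (6.1, 0.0, 0.0), (0.0, 1.5, 0.0), (6.1, 1.5, 0.0)),
    "frag_b": ((0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (0.0, 1.5, 0.0), (9.0, 1.5, 0.0)),
}
index = build_index(fragments)

# Query anchors sit elsewhere in space but have similar anchor geometry to frag_a.
query = ((1.0, 1.0, 1.0), (7.2, 1.0, 1.0), (1.0, 2.5, 1.0), (7.3, 2.5, 1.0))
hits = index.get(anchor_key(*query), [])
print(hits)
```

A practical lookup would likely also probe neighbouring bins so that anchors lying just across a bin boundary are not missed, and would then superpose and collision-filter the candidates as the abstract describes.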
Affiliation(s)
- William F Hooper
- Emmes Corporation, Rockville, Washington, MD, USA; Department of Biology, Rensselaer Polytechnic Institute, Troy, NY, USA
| | | | - Xing Wang
- Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Christopher Bystroff
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY, USA; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
| |
|
7
|
Wood CW, Heal JW, Thomson AR, Bartlett GJ, Ibarra AÁ, Brady RL, Sessions RB, Woolfson DN. ISAMBARD: an open-source computational environment for biomolecular analysis, modelling and design. Bioinformatics 2018; 33:3043-3050. [PMID: 28582565 PMCID: PMC5870769 DOI: 10.1093/bioinformatics/btx352] [Citation(s) in RCA: 36] [Impact Index Per Article: 6.0] [Received: 02/10/2017] [Accepted: 05/31/2017] [Indexed: 12/03/2022] Open
Abstract
Motivation: The rational design of biomolecules is becoming a reality. However, further computational tools are needed to facilitate and accelerate this, and to make it accessible to more users. Results: Here we introduce ISAMBARD, a tool for structural analysis, model building and rational design of biomolecules. ISAMBARD is open-source, modular, computationally scalable and intuitive to use. These features allow non-experts to explore biomolecular design in silico. ISAMBARD addresses a standing issue in protein design, namely, how to introduce backbone variability in a controlled manner. This is achieved through the generalization of tools for parametric modelling, describing the overall shape of proteins geometrically, and without input from experimentally determined structures. This will allow backbone conformations for entire folds and assemblies not observed in nature to be generated de novo, that is, to access the ‘dark matter of protein-fold space’. We anticipate that ISAMBARD will find broad applications in biomolecular design, biotechnology and synthetic biology. Availability and implementation: A current stable build can be downloaded from the Python Package Index (https://pypi.python.org/pypi/isambard/) with development builds available on GitHub (https://github.com/woolfson-group/) along with documentation, tutorial material and all the scripts used to generate the data described in this paper. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher W Wood
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK
| | - Jack W Heal
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK
| | - Andrew R Thomson
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Chemistry, University of Glasgow, Glasgow G12 8QQ, UK
| | - Gail J Bartlett
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK
| | - Amaurys Á Ibarra
- School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK
| | - R Leo Brady
- School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK
| | - Richard B Sessions
- School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; BrisSynBio, University of Bristol, Bristol BS8 1TQ, UK
| | - Derek N Woolfson
- School of Chemistry, University of Bristol, Bristol BS8 1TS, UK; School of Biochemistry, University of Bristol, Bristol BS8 1TD, UK; BrisSynBio, University of Bristol, Bristol BS8 1TQ, UK
| |
|
8
|
Bansal N, Zheng Z, Song LF, Pei J, Merz KM. The Role of the Active Site Flap in Streptavidin/Biotin Complex Formation. J Am Chem Soc 2018; 140:5434-5446. [PMID: 29607642 DOI: 10.1021/jacs.8b00743] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022]
Abstract
Obtaining a detailed description of how active site flap motion affects substrate or ligand binding will advance structure-based drug design (SBDD) efforts on systems including the kinases, HSP90, HIV protease, ureases, etc. Through this understanding, we will be able to design better inhibitors and better proteins that have desired functions. Herein we address this issue by generating the relevant configurational states of a protein flap on the molecular energy landscape using an approach we call MTFlex-b and then following this with a procedure to estimate the free energy associated with the motion of the flap region. To illustrate our overall workflow, we explored the free energy changes in the streptavidin/biotin system upon introducing conformational flexibility in loop3-4 in the biotin unbound (apo) and bound (holo) states. The free energy surfaces were created using the Movable Type free energy method, and for further validation, we compared them to potential of mean force (PMF) generated free energy surfaces using MD simulations employing the FF99SBILDN and FF14SB force fields. We also estimated the free energy thermodynamic cycle using an ensemble of closed-like and open-like end states for the ligand unbound and bound states and estimated the binding free energy to be approximately -16.2 kcal/mol (experimental -18.3 kcal/mol). The good agreement between MTFlex-b in combination with the MT method with experiment and MD simulations supports the effectiveness of our strategy in obtaining unique insights into the motions in proteins that can then be used in a range of biological and biomedical applications.
Affiliation(s)
- Nupur Bansal
- Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Zheng Zheng
- Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Lin Frank Song
- Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Jun Pei
- Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Kenneth M Merz
- Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States; Institute for Cyber Enabled Research, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, United States
| |
|
9
|
Marks C, Nowak J, Klostermann S, Georges G, Dunbar J, Shi J, Kelm S, Deane CM. Sphinx: merging knowledge-based and ab initio approaches to improve protein loop prediction. Bioinformatics 2018; 33:1346-1353. [PMID: 28453681 PMCID: PMC5408792 DOI: 10.1093/bioinformatics/btw823] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 07/25/2016] [Accepted: 01/09/2017] [Indexed: 01/31/2023] Open
Abstract
Motivation: Loops are often vital for protein function; however, their irregular structures make them difficult to model accurately. Current loop modelling algorithms can mostly be divided into two categories: knowledge-based, where databases of fragments are searched to find suitable conformations, and ab initio, where conformations are generated computationally. Existing knowledge-based methods only use fragments that are the same length as the target, even though loops of slightly different lengths may adopt similar conformations. Here, we present a novel method, Sphinx, which combines ab initio techniques with the potential extra structural information contained within loops of a different length to improve structure prediction. Results: We show that Sphinx is able to generate high-accuracy predictions and decoy sets enriched with near-native loop conformations, performing better than the ab initio algorithm on which it is based. In addition, it is able to provide predictions for every target, unlike some knowledge-based methods. Sphinx can be used successfully for the difficult problem of antibody H3 prediction, outperforming RosettaAntibody, one of the leading H3-specific ab initio methods, both in accuracy and speed. Availability and implementation: Sphinx is available at http://opig.stats.ox.ac.uk/webapps/sphinx. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claire Marks
- Department of Statistics, University of Oxford, Oxford, UK
| | - Jaroslaw Nowak
- Department of Statistics, University of Oxford, Oxford, UK
| | | | - Guy Georges
- Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, DE, Germany
| | - James Dunbar
- Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, Penzberg, DE, Germany
| | - Jiye Shi
- Department of Informatics, UCB Pharma, Slough, UK
| | | | | |
|
10
|
Wong SWK, Liu JS, Kou SC. Fast de novo discovery of low-energy protein loop conformations. Proteins 2017; 85:1402-1412. [PMID: 28378911 DOI: 10.1002/prot.25300] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/29/2017] [Revised: 03/19/2017] [Accepted: 03/27/2017] [Indexed: 12/25/2022]
Abstract
In the prediction of protein structure from amino acid sequence, loops are challenging regions for computational methods. Since loops are often located on the protein surface, they can have significant roles in determining protein functions and binding properties. Loop prediction without the aid of a structural template requires extensive conformational sampling and energy minimization, which are computationally difficult. In this article we present a new de novo loop sampling method, the Parallely filtered Energy Targeted All-atom Loop Sampler (PETALS) to rapidly locate low energy conformations. PETALS explores both backbone and side-chain positions of the loop region simultaneously according to the energy function selected by the user, and constructs a nonredundant ensemble of low energy loop conformations using filtering criteria. The method is illustrated with the DFIRE potential and DiSGro energy function for loops, and shown to be highly effective at discovering conformations with near-native (or better) energy. Using the same energy function as the DiSGro algorithm, PETALS samples conformations with both lower RMSDs and lower energies. PETALS is also useful for assessing the accuracy of different energy functions. PETALS runs rapidly, requiring an average time cost of 10 minutes for a length 12 loop on a single 3.2 GHz processor core, comparable to the fastest existing de novo methods for generating an ensemble of conformations. Proteins 2017; 85:1402-1412. © 2017 Wiley Periodicals, Inc.
Affiliation(s)
- Samuel W K Wong
- Department of Statistics, University of Florida, Gainesville, Florida, 32611
| | - Jun S Liu
- Department of Statistics, Harvard University, Cambridge, Massachusetts, 02138
| | - S C Kou
- Department of Statistics, Harvard University, Cambridge, Massachusetts, 02138
| |
|
11
|
Heo S, Lee J, Joo K, Shin HC, Lee J. Protein Loop Structure Prediction Using Conformational Space Annealing. J Chem Inf Model 2017; 57:1068-1078. [DOI: 10.1021/acs.jcim.6b00742] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 11/28/2022]
Affiliation(s)
- Seungryong Heo
- School of Systems Biomedical Science, Soongsil University, Seoul 06978, Korea
| | - Juyong Lee
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
| | | | - Hang-Cheol Shin
- School of Systems Biomedical Science, Soongsil University, Seoul 06978, Korea
| | | |
|
12
|
Tang K, Zhang J, Liang J. Distance-Guided Forward and Backward Chain-Growth Monte Carlo Method for Conformational Sampling and Structural Prediction of Antibody CDR-H3 Loops. J Chem Theory Comput 2016; 13:380-388. [PMID: 27996262 DOI: 10.1021/acs.jctc.6b00845] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
Antibodies recognize antigens through the complementarity-determining regions (CDRs), formed by six hypervariable loops crucial for the diversity of antigen specificities. Among the six CDR loops, the H3 loop is the most challenging to predict because of its much higher variation in sequence length and identity, resulting in a much larger and more complex structural space compared to the other five loops. We developed a novel method based on a chain-growth sequential Monte Carlo method, called distance-guided sequential chain-growth Monte Carlo for H3 loops (DiSGro-H3). The new method samples protein chains in both forward and backward directions. It can efficiently generate low energy, near-native H3 loop structures using the conformation types predicted from the sequences of H3 loops. DiSGro-H3 performs significantly better than another ab initio method, RosettaAntibody, in both sampling and prediction, while taking less computational time. It performs comparably to template-based methods. As an ab initio method, DiSGro-H3 offers satisfactory accuracy while being able to predict any H3 loops without templates.
Affiliation(s)
- Ke Tang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| | - Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, Florida 32306, United States
| | - Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
| |
|
14
|
Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. CURRENT PROTOCOLS IN BIOINFORMATICS 2016; 54:5.6.1-5.6.37. [PMID: 27322406 PMCID: PMC5031415 DOI: 10.1002/cpbi.3] [Citation(s) in RCA: 1970] [Impact Index Per Article: 246.3] [Indexed: 12/22/2022]
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
| | - Andrej Sali
- University of California at San Francisco, San Francisco, California
| |
|
15
|
Messih MA, Lepore R, Tramontano A. LoopIng: a template-based tool for predicting the structure of protein loops. Bioinformatics 2015; 31:3767-72. [PMID: 26249814 PMCID: PMC4653384 DOI: 10.1093/bioinformatics/btv438] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 04/07/2015] [Accepted: 07/21/2015] [Indexed: 12/31/2022] Open
Abstract
Motivation: Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function. Results: We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4–10 residues) and significant enhancements for long loops (11–20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop). Availability and implementation: www.biocomputing.it/looping Contact: anna.tramontano@uniroma1.it Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
| | - Rosalba Lepore
- Department of Physics, Sapienza University, 00185 Rome, Italy
| | - Anna Tramontano
- Department of Physics, Sapienza University, 00185 Rome, Italy and Istituto Pasteur-Fondazione Cenci Bolognetti, Viale Regina Elena 291, 00161 Rome, Italy
| |
|
16
|
The proline-rich antimicrobial peptide Onc112 inhibits translation by blocking and destabilizing the initiation complex. Nat Struct Mol Biol 2015; 22:470-5. [PMID: 25984971 DOI: 10.1038/nsmb.3034] [Citation(s) in RCA: 139] [Impact Index Per Article: 15.4] [Received: 02/26/2015] [Accepted: 04/22/2015] [Indexed: 01/05/2023]
Abstract
The increasing prevalence of multidrug-resistant pathogenic bacteria is making current antibiotics obsolete. Proline-rich antimicrobial peptides (PrAMPs) display potent activity against Gram-negative bacteria and thus represent an avenue for antibiotic development. PrAMPs from the oncocin family interact with the ribosome to inhibit translation, but their mode of action has remained unclear. Here we have determined a structure of the Onc112 peptide in complex with the Thermus thermophilus 70S ribosome at a resolution of 3.1 Å by X-ray crystallography. The Onc112 peptide binds within the ribosomal exit tunnel and extends toward the peptidyl transferase center, where it overlaps with the binding site for an aminoacyl-tRNA. We show biochemically that the binding of Onc112 blocks and destabilizes the initiation complex, thus preventing entry into the elongation phase. Our findings provide a basis for the future development of this class of potent antimicrobial agents.
|
17
|
Stockert JA, Devi LA. Advancements in therapeutically targeting orphan GPCRs. Front Pharmacol 2015; 6:100. [PMID: 26005419 PMCID: PMC4424851 DOI: 10.3389/fphar.2015.00100] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 02/13/2015] [Accepted: 04/21/2015] [Indexed: 11/23/2022]
Abstract
G-protein coupled receptors (GPCRs) are popular biological targets for drug discovery and development. To date there are more than 140 orphan GPCRs, i.e., receptors whose endogenous ligands are unknown. Traditionally, orphan GPCRs have been difficult to study, and the development of therapeutic compounds targeting these receptors has been extremely slow, although these GPCRs are considered important targets based on their distribution and behavioral phenotype as revealed by animals lacking the receptor. Recent advances in several methods used to study orphan receptors, including protein crystallography and homology modeling, are likely to be useful in the identification of therapeutics targeting these receptors. In the past 13 years, over a dozen different Class A GPCRs have been crystallized; this trend is exciting, since homology modeling of GPCRs has previously been limited by the availability of solved structures. As the number of solved GPCR structures continues to grow, so does the number of templates that can be used to generate increasingly accurate models of phylogenetically related orphan GPCRs. The availability of solved structures along with the advances in using multiple templates to build models (in combination with molecular dynamics simulations that reveal structural information not provided by crystallographic data and methods for modeling hard-to-predict flexible loop regions) have improved the quality of GPCR homology models. This, in turn, has improved the success rates of virtual ligand screens that use homology models to identify potential receptor binding compounds. Experimental testing of the predicted hits and validation using traditional GPCR pharmacological approaches can be used to drive ligand-based efforts to probe orphan receptor biology as well as to define the chemotypes and chemical scaffolds important for binding. As a result of these advances, orphan GPCRs are emerging from relative obscurity as a new class of drug targets.
Affiliation(s)
- Jennifer A Stockert
- Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lakshmi A Devi
- Department of Pharmacology and Systems Therapeutics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
|
18
|
Park H, Lee GR, Heo L, Seok C. Protein loop modeling using a new hybrid energy function and its application to modeling in inaccurate structural environments. PLoS One 2014; 9:e113811. [PMID: 25419655 PMCID: PMC4242723 DOI: 10.1371/journal.pone.0113811] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 08/09/2014] [Accepted: 10/30/2014] [Indexed: 11/19/2022]
Abstract
Protein loop modeling is a tool for predicting protein local structures of particular interest, providing opportunities for applications involving protein structure prediction and de novo protein design. Until recently, the majority of loop modeling methods have been developed and tested by reconstructing loops in frameworks of experimentally resolved structures. In many practical applications, however, the protein loops to be modeled are located in inaccurate structural environments. These include loops in model structures, low-resolution experimental structures, or experimental structures of different functional forms. Accordingly, discrepancies in the accuracy of the structural environment assumed in development of the method and that in practical applications present additional challenges to modern loop modeling methods. This study demonstrates a new strategy for employing a hybrid energy function combining physics-based and knowledge-based components to help tackle this challenge. The hybrid energy function is designed to combine the strengths of each energy component, simultaneously maintaining accurate loop structure prediction in a high-resolution framework structure and tolerating minor environmental errors in low-resolution structures. A loop modeling method based on global optimization of this new energy function is tested on loop targets situated in different levels of environmental errors, ranging from experimental structures to structures perturbed in backbone as well as side chains and template-based model structures. The new method performs comparably to force field-based approaches in loop reconstruction in crystal structures and better in loop prediction in inaccurate framework structures. This result suggests that higher-accuracy predictions would be possible for a broader range of applications. The web server for this method is available at http://galaxy.seoklab.org/loop with the PS2 option for the scoring function.
Affiliation(s)
- Hahnbeom Park
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Lim Heo
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, Republic of Korea
- * E-mail:
|
19
|
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
|
20
|
Tang K, Zhang J, Liang J. Fast protein loop sampling and structure prediction using distance-guided sequential chain-growth Monte Carlo method. PLoS Comput Biol 2014; 10:e1003539. [PMID: 24763317 PMCID: PMC3998890 DOI: 10.1371/journal.pcbi.1003539] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 08/29/2013] [Accepted: 02/01/2014] [Indexed: 11/18/2022]
Abstract
Loops in proteins are flexible regions connecting regular secondary structures. They are often involved in protein functions through interacting with other molecules. The irregularity and flexibility of loops make their structures difficult to determine experimentally and challenging to model computationally. Conformation sampling and energy evaluation are the two key components in loop modeling. We have developed a new method for loop conformation sampling and prediction based on a chain growth sequential Monte Carlo sampling strategy, called Distance-guided Sequential chain-Growth Monte Carlo (DISGRO). With an energy function designed specifically for loops, our method can efficiently generate high quality loop conformations with low energy that are enriched with near-native loop structures. The average minimum global backbone RMSD for 1,000 conformations of 12-residue loops is 1.53 Å, with a lowest energy RMSD of 2.99 Å, and an average ensemble RMSD of 5.23 Å. A novel geometric criterion is applied to speed up calculations. The computational cost of generating 1,000 conformations for each of the x loops in a benchmark dataset is only about 10 cpu minutes for 12-residue loops, compared to ca 180 cpu minutes using the FALCm method. Test results on benchmark datasets show that DISGRO performs comparably or better than previous successful methods, while requiring far less computing time. DISGRO is especially effective in modeling longer loops (10-17 residues).
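The three RMSD figures quoted in this abstract measure different things, and a small sketch may help keep them apart. The code below is only an illustration under assumed inputs: the function name `ensemble_metrics`, the `(energy, rmsd)` pair layout, and the toy numbers are hypothetical and are not taken from DISGRO.

```python
from statistics import mean


def ensemble_metrics(decoys):
    """Summarise one loop target's sampled conformations.

    decoys: list of (energy, rmsd_to_native) pairs (hypothetical layout).
    Returns (minimum RMSD, RMSD of the lowest-energy decoy, mean RMSD).
    """
    if not decoys:
        raise ValueError("need at least one conformation")
    # Best conformation that was sampled at all, regardless of its energy.
    min_rmsd = min(rmsd for _, rmsd in decoys)
    # Conformation the energy function would actually pick.
    lowest_energy_rmsd = min(decoys, key=lambda d: d[0])[1]
    # Average quality of the whole sampled ensemble.
    ensemble_rmsd = mean(rmsd for _, rmsd in decoys)
    return min_rmsd, lowest_energy_rmsd, ensemble_rmsd


# Toy data, invented for illustration only.
toy_decoys = [(-12.0, 3.0), (-10.5, 1.5), (-8.0, 6.0)]
metrics = ensemble_metrics(toy_decoys)
```

In the toy data the closest decoy (1.5 Å) is not the lowest-energy one (3.0 Å), which is the kind of gap between sampling quality and scoring quality that the first two figures in the abstract reflect.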
Affiliation(s)
- Ke Tang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
- * E-mail: (JZ); (JL)
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
- * E-mail: (JZ); (JL)
|
21
|
Liang S, Zhang C, Zhou Y. LEAP: highly accurate prediction of protein loop conformations by integrating coarse-grained sampling and optimized energy scores with all-atom refinement of backbone and side chains. J Comput Chem 2014; 35:335-41. [PMID: 24327406 PMCID: PMC4125323 DOI: 10.1002/jcc.23509] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 08/16/2013] [Revised: 10/06/2013] [Accepted: 11/24/2013] [Indexed: 11/11/2022]
Abstract
Prediction of protein loop conformations without any prior knowledge (ab initio prediction) is an unsolved problem. Its solution will significantly impact protein homology and template-based modeling as well as ab initio protein-structure prediction. Here, we developed a coarse-grained, optimized scoring function for initial sampling and ranking of loop decoys. The resulting decoys are then further optimized in backbone and side-chain conformations and ranked by all-atom energy scoring functions. The final integrated technique called loop prediction by energy-assisted protocol achieved a median value of 2.1 Å root mean square deviation (RMSD) for 325 12-residue test loops and 2.0 Å RMSD for 45 12-residue loops from critical assessment of structure-prediction techniques (CASP) 10 target proteins with native core structures (backbone and side chains). If all side-chain conformations in protein cores were predicted in the absence of the target loop, loop-prediction accuracy only reduces slightly (0.2 Å difference in RMSD for 12-residue loops in the CASP target proteins). The accuracy obtained is about 1 Å RMSD or more improvement over other methods we tested. The executable file for a Linux system is freely available for academic users at http://sparks-lab.org.
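Since accuracy here is reported as root mean square deviation, a minimal sketch of that quantity may be useful. It assumes the predicted and native loop atoms are already paired one-to-one and expressed in the same coordinate frame (for example after superimposing the surrounding framework); the function name `rmsd` and the toy coordinates are hypothetical, and no superposition is performed.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes atoms are paired by index and both sets share one frame;
    this sketch does not superimpose the structures.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom displaced by 2 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
deviation = rmsd(native, model)
```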
Affiliation(s)
- Shide Liang
- Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
- Chi Zhang
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, NE, 68588, USA
- Yaoqi Zhou
- School of Informatics, Indiana University Purdue University at Indianapolis, Indianapolis, IN 46202, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Parklands Drive, Southport Qld 4222, Australia
|
22
|
Holtby D, Li SC, Li M. LoopWeaver: loop modeling by the weighted scaling of verified proteins. J Comput Biol 2014; 20:212-23. [PMID: 23461572 DOI: 10.1089/cmb.2012.0078] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]
Abstract
Modeling loops is a necessary step in protein structure determination, even with experimental nuclear magnetic resonance (NMR) data, and it is widely known to be difficult. Database techniques have the advantage of producing a higher proportion of predictions with subangstrom accuracy when compared with ab initio techniques, but the disadvantage of also producing a higher proportion of clashing or highly inaccurate predictions. We introduce LoopWeaver, a database method that uses multidimensional scaling to achieve better, clash-free placement of loops obtained from a database of protein structures. This allows us to maintain the above-mentioned advantage while avoiding the disadvantage. Test results show that we achieve significantly better results than all other methods, including Modeler, Loopy, SuperLooper, and Rapper, before refinement. With refinement, our results (LoopWeaver and Loopy consensus) are better than ROSETTA, with 0.42 Å RMSD on average for 206 length 6 loops, 0.64 Å local RMSD for 168 length 7 loops, 0.81 Å RMSD for 117 length 8 loops, and 0.98 Å RMSD for length 9 loops, while ROSETTA has 0.55, 0.79, 1.16, and 1.42 Å, respectively, at the same average time limit (3 hours). When we allow ROSETTA to run for over a week, it approaches, but does not surpass, our accuracy.
Affiliation(s)
- Daniel Holtby
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada.
|
23
|
Abstract
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D protein structures that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects -- to obtain 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.
Affiliation(s)
- András Fiser
- Department of Biochemistry, Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA.
|
24
|
Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins. REFERENCE MODULE IN CHEMISTRY, MOLECULAR SCIENCES AND CHEMICAL ENGINEERING 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.
|
25
|
Kelm S, Vangone A, Choi Y, Ebejer JP, Shi J, Deane CM. Fragment-based modeling of membrane protein loops: successes, failures, and prospects for the future. Proteins 2013; 82:175-86. [PMID: 23589399 DOI: 10.1002/prot.24299] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/08/2012] [Revised: 02/22/2013] [Accepted: 03/26/2013] [Indexed: 11/12/2022]
Abstract
Membrane proteins (MPs) have become a major focus in structure prediction, due to their medical importance. There is, however, a lack of fast and reliable methods that specialize in the modeling of MP loops. Often methods designed for soluble proteins (SPs) are applied directly to MPs. In this article, we investigate the validity of such an approach in the realm of fragment-based methods. We also examined the differences in membrane and soluble protein loops that might affect accuracy. We test our ability to predict soluble and MP loops with the previously published method FREAD. We show that it is possible to predict accurately the structure of MP loops using a database of MP fragments (0.5-1 Å median root-mean-square deviation). The presence of homologous proteins in the database helps prediction accuracy. However, even when homologues are removed better results are still achieved using fragments of MPs (0.8-1.6 Å) rather than SPs (1-4 Å) to model MP loops. We find that many fragments of SPs have shapes similar to their MP counterparts but have very different sequences; however, they do not appear to differ in their substitution patterns. Our findings may allow further improvements to fragment-based loop modeling algorithms for MPs. The current version of our proof-of-concept loop modeling protocol produces high-accuracy loop models for MPs and is available as a web server at http://medeller.info/fread.
Affiliation(s)
- Sebastian Kelm
- Department of Statistics, University of Oxford, Oxford, United Kingdom
|
26
|
MacDonald JT, Kelley LA, Freemont PS. Validating a Coarse-Grained Potential Energy Function through Protein Loop Modelling. PLoS One 2013; 8:e65770. [PMID: 23824634 PMCID: PMC3688807 DOI: 10.1371/journal.pone.0065770] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 03/06/2013] [Accepted: 04/26/2013] [Indexed: 12/02/2022]
Abstract
Coarse-grained (CG) methods for sampling protein conformational space have the potential to increase computational efficiency by reducing the degrees of freedom. The gain in computational efficiency of CG methods often comes at the expense of non-protein like local conformational features. This could cause problems when transitioning to full atom models in a hierarchical framework. Here, a CG potential energy function was validated by applying it to the problem of loop prediction. A novel method to sample the conformational space of backbone atoms was benchmarked using a standard test set consisting of 351 distinct loops. This method used a sequence-independent CG potential energy function representing the protein using α-carbon positions only and sampling conformations with a Monte Carlo simulated annealing based protocol. Backbone atoms were added using a method previously described and then gradient minimised in the Rosetta force field. Despite the CG potential energy function being sequence-independent, the method performed similarly to methods that explicitly use either fragments of known protein backbones with similar sequences or residue-specific φ/ψ-maps to restrict the search space. The method was also able to predict with sub-Angstrom accuracy two out of seven loops from recently solved crystal structures of proteins with low sequence and structure similarity to previously deposited structures in the PDB. The ability to sample realistic loop conformations directly from a potential energy function enables the incorporation of additional geometric restraints and the use of more advanced sampling methods in a way that is not possible to do easily with fragment replacement methods, and also enables multi-scale simulations for protein design and protein structure prediction. These restraints could be derived from experimental data or could be design restraints in the case of computational protein design.
C++ source code is available for download from http://www.sbg.bio.ic.ac.uk/phyre2/PD2/.
Affiliation(s)
- James T. MacDonald
- Division of Molecular Biosciences, Imperial College London, London, United Kingdom
- * E-mail:
- Lawrence A. Kelley
- Division of Molecular Biosciences, Imperial College London, London, United Kingdom
- Paul S. Freemont
- Division of Molecular Biosciences, Imperial College London, London, United Kingdom
|
27
|
Chys P, Chacón P. Random Coordinate Descent with Spinor-matrices and Geometric Filters for Efficient Loop Closure. J Chem Theory Comput 2013; 9:1821-9. [PMID: 26587638 DOI: 10.1021/ct300977f] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 12/21/2022]
Abstract
Protein loop closure constitutes a critical step in loop and protein modeling whereby geometrically feasible loops must be found between two given anchor residues. Here, a new analytic/iterative algorithm denoted random coordinate descent (RCD) to perform protein loop closure is described. The algorithm solves loop closure through minimization as in cyclic coordinate descent but selects bonds for optimization randomly, updates loop conformations by spinor-matrices, performs loop closure in both chain directions, and uses a set of geometric filters to yield efficient conformational sampling. Geometric filters allow one to detect clashes and constrain dihedral angles on the fly. The RCD algorithm is at least comparable to state of the art loop closure algorithms due to an excellent balance between efficiency and intrinsic sampling capability. Furthermore, its efficiency allows one to improve conformational sampling by increasing the sampling number without much penalty. Overall, RCD turns out to be accurate, fast, robust, and applicable over a wide range of loop lengths. Because of the versatility of RCD, it is a solid alternative for integration with current loop modeling strategies.
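The core idea of closing a chain by adjusting one randomly chosen rotatable element at a time can be shown on a planar toy chain. The sketch below is not the RCD algorithm: it omits spinor matrices, bidirectional closure, and geometric filters, works in 2D with unit-length links, and uses the hypothetical names `chain_points` and `random_descent_closure`. Each step picks a random joint and rotates everything downstream of it so that the chain end points toward the target anchor, which can never increase the remaining gap.

```python
import math
import random


def chain_points(angles, link=1.0):
    """Joint positions of a planar chain; angles are relative turns per link."""
    points = [(0.0, 0.0)]
    heading = 0.0
    for turn in angles:
        heading += turn
        x, y = points[-1]
        points.append((x + link * math.cos(heading), y + link * math.sin(heading)))
    return points


def random_descent_closure(angles, target, tol=1e-3, max_steps=20000, seed=0):
    """Move the chain end toward `target` by updating one random joint per step."""
    rng = random.Random(seed)
    angles = list(angles)
    for _ in range(max_steps):
        points = chain_points(angles)
        end = points[-1]
        if math.dist(end, target) <= tol:
            break
        i = rng.randrange(len(angles))  # random joint, not a fixed cyclic order
        px, py = points[i]
        to_end = math.atan2(end[1] - py, end[0] - px)
        to_target = math.atan2(target[1] - py, target[0] - px)
        # Rotating about joint i aligns the pivot-to-end direction with the
        # pivot-to-target direction, the best single-joint move for the gap.
        angles[i] += to_target - to_end
    return angles


# Toy setup (invented): four unit links, target anchor well within reach.
start_angles = [0.3, 0.3, 0.3, 0.3]
target_anchor = (2.0, 1.0)
closed_angles = random_descent_closure(start_angles, target_anchor)
gap_before = math.dist(chain_points(start_angles)[-1], target_anchor)
gap_after = math.dist(chain_points(closed_angles)[-1], target_anchor)
```

Comparing `gap_before` with `gap_after` shows how far the closure progressed; a real loop closure method would additionally have to respect bond geometry in 3D, dihedral preferences, and steric clashes.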
Affiliation(s)
- Pieter Chys
- Structural Bioinformatics Group, Biological Chemical Physics Department, Institute of Physical Chemistry Rocasolano (IQFR), Consejo Superior de Investigaciones Científicas (CSIC), Calle de Serrano 119, Madrid 28006, Spain
- Pablo Chacón
- Structural Bioinformatics Group, Biological Chemical Physics Department, Institute of Physical Chemistry Rocasolano (IQFR), Consejo Superior de Investigaciones Científicas (CSIC), Calle de Serrano 119, Madrid 28006, Spain
|
28
|
Li Y. Conformational sampling in template-free protein loop structure modeling: an overview. Comput Struct Biotechnol J 2013; 5:e201302003. [PMID: 24688696 PMCID: PMC3962101 DOI: 10.5936/csbj.201302003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/19/2012] [Revised: 01/23/2013] [Accepted: 01/28/2013] [Indexed: 01/04/2023]
Abstract
Accurately modeling protein loops is an important step to predict three-dimensional structures as well as to understand functions of many proteins. Because of their high flexibility, modeling the three-dimensional structures of loops is difficult and is usually treated as a "mini protein folding problem" under geometric constraints. In the past decade, there has been remarkable progress in template-free loop structure modeling due to advances of computational methods as well as stably increasing number of known structures available in PDB. This mini review provides an overview on the recent computational approaches for loop structure modeling. In particular, we focus on the approaches of sampling loop conformation space, which is a critical step to obtain high resolution models in template-free methods. We review the potential energy functions for loop modeling, loop buildup mechanisms to satisfy geometric constraints, and loop conformation sampling algorithms. The recent loop modeling results are also summarized.
Affiliation(s)
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA
|
29
|
Miller EB, Murrett CS, Zhu K, Zhao S, Goldfeld DA, Bylund JH, Friesner RA. Prediction of Long Loops with Embedded Secondary Structure using the Protein Local Optimization Program. J Chem Theory Comput 2013; 9:1846-1864. [PMID: 23814507 DOI: 10.1021/ct301083q] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Abstract
Robust homology modeling to atomic-level accuracy requires in the general case successful prediction of protein loops containing small segments of secondary structure. Further, as loop prediction advances to success with larger loops, the exclusion of loops containing secondary structure becomes awkward. Here, we extend the applicability of the Protein Local Optimization Program (PLOP) to loops up to 17 residues in length that contain either helical or hairpin segments. In general, PLOP hierarchically samples conformational space and ranks candidate loops with a high-quality molecular mechanics force field. For loops identified to possess α-helical segments, we employ an alternative dihedral library composed of (ϕ,ψ) angles commonly found in helices. The alternative library is searched over a user-specified range of residues that define the helical bounds. The source of these helical bounds can be from popular secondary structure prediction software or from analysis of past loop predictions where a propensity to form a helix is observed. Due to the maturity of our energy model, the lowest energy loop across all experiments can be selected with an accuracy of sub-Ångström RMSD in 80% of cases, 1.0 to 1.5 Å RMSD in 14% of cases, and poorer than 1.5 Å RMSD in 6% of cases. The effectiveness of our current methods in predicting hairpin-containing loops is explored with hairpins up to 13 residues in length and again reaching an accuracy of sub-Ångström RMSD in 83% of cases, 1.0 to 1.5 Å RMSD in 10% of cases, and poorer than 1.5 Å RMSD in 7% of cases. Finally, we explore the effect of an imprecise surrounding environment, in which side chains, but not the backbone, are initially in perturbed geometries. In these cases, loops perturbed to 3 Å RMSD from the native environment were restored to their native conformation with sub-Ångström RMSD.
Affiliation(s)
- Edward B Miller
- Department of Chemistry, Columbia University, New York, New York
|
30
|
Abstract
Loops are irregular structures which connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular protein loop classifications and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop's end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop, in other words the separation between the anchors of a loop does not increase with an increase in the number of loop residues. Loop span is also unaffected by the secondary structures at the end points, unless the two anchors are part of an anti-parallel beta sheet. As loop span appears to be independent of global properties of the protein we suggest that its distribution can be described by a random fluctuation model based on the Maxwell-Boltzmann distribution. It is believed that the primary difficulty in protein loop structure prediction comes from the number of residues in the loop. Following the idea that loop span is an independent local property, we investigate its effect on protein loop structure prediction and show how normalised span (loop stretch) is related to the structural complexity of loops. Highly contracted loops are more difficult to predict than stretched loops.
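The two quantities discussed here, loop span and its normalised form (loop stretch), are simple to compute once the anchor coordinates are known. The sketch below is an assumption-laden illustration: the function names are hypothetical, the anchors are taken to be Cα positions, and the maximum span is approximated as a fully extended chain of consecutive Cα atoms 3.8 Å apart, which may differ from the normalisation used in the paper.

```python
import math

CA_CA_DISTANCE = 3.8  # Å, typical spacing of consecutive Cα atoms (trans peptide)


def loop_span(anchor_start, anchor_end):
    """Distance in Å between the two anchor positions of a loop."""
    return math.dist(anchor_start, anchor_end)


def loop_stretch(anchor_start, anchor_end, n_loop_residues):
    """Span divided by an assumed maximum span for this loop length.

    Assumption: the maximum is a straight chain with n_loop_residues + 1
    virtual Cα-Cα bonds between the anchors.
    """
    if n_loop_residues < 1:
        raise ValueError("loop must contain at least one residue")
    max_span = CA_CA_DISTANCE * (n_loop_residues + 1)
    return loop_span(anchor_start, anchor_end) / max_span


# Toy anchors (invented): a 4-residue loop, fully stretched vs. contracted.
stretched = loop_stretch((0.0, 0.0, 0.0), (0.0, 0.0, 19.0), 4)
contracted = loop_stretch((0.0, 0.0, 0.0), (0.0, 0.0, 4.75), 4)
```

Under this normalisation a value near 1 corresponds to a stretched loop and a value near 0 to a highly contracted one, the case the abstract identifies as harder to predict.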
Affiliation(s)
- Yoonjoo Choi
- Department of Computer Science, Dartmouth College, Hanover, NH, USA
|
31
|
Korkut A, Hendrickson WA. Structural plasticity and conformational transitions of HIV envelope glycoprotein gp120. PLoS One 2012; 7:e52170. [PMID: 23300605 PMCID: PMC3531394 DOI: 10.1371/journal.pone.0052170] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 07/16/2012] [Accepted: 11/12/2012] [Indexed: 11/18/2022]
Abstract
HIV envelope glycoproteins undergo large-scale conformational changes as they interact with cellular receptors to cause the fusion of viral and cellular membranes that permits viral entry to infect targeted cells. Conformational dynamics in HIV gp120 are also important in masking conserved receptor epitopes from being detected for effective neutralization by the human immune system. Crystal structures of HIV gp120 and its complexes with receptors and antibody fragments provide high-resolution pictures of selected conformational states accessible to gp120. Here we describe systematic computational analyses of HIV gp120 plasticity in such complexes with CD4 binding fragments, CD4 mimetic proteins, and various antibody fragments. We used three computational approaches: an isotropic elastic network analysis of conformational plasticity, a full atomic normal mode analysis, and simulation of conformational transitions with our coarse-grained virtual atom molecular mechanics (VAMM) potential function. We observe collective sub-domain motions about hinge points that coordinate those motions, correlated local fluctuations at the interfacial cavity formed when gp120 binds to CD4, and concerted changes in structural elements that form at the CD4 interface during large-scale conformational transitions to the CD4-bound state from the deformed states of gp120 in certain antibody complexes.
Affiliation(s)
- Anil Korkut
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Wayne A. Hendrickson
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
- Department of Physiology and Cellular Biophysics, Columbia University, New York, New York, United States of America
- Howard Hughes Medical Institute, Columbia University, New York, New York, United States of America
- * E-mail:
|
32
|
Abstract
The prediction of loop structures is considered one of the main challenges in the protein folding problem. Regardless of the dependence of the overall algorithm on the protein data bank, the flexibility of loop regions dictates the need for special attention to their structures. In this article, we present algorithms for loop structure prediction with fixed stem and flexible stem geometry. In the flexible stem geometry problem, only the secondary structure of three stem residues on either side of the loop is known. In the fixed stem geometry problem, the structure of the three stem residues on either side of the loop is also known. Initial loop structures are generated using a probability database for the flexible stem geometry problem, and using torsion angle dynamics for the fixed stem geometry problem. Three rotamer optimization algorithms are introduced to alleviate steric clashes between the generated backbone structures and the side chain rotamers. The structures are optimized by energy minimization using an all-atom force field. The optimized structures are clustered using a traveling salesman problem-based clustering algorithm. The structures in the densest clusters are then utilized to refine dihedral angle bounds on all amino acids in the loop. The entire procedure is carried out for a number of iterations, leading to improved structure prediction and refined dihedral angle bounds. The algorithms presented in this article have been tested on 3190 loops from the PDBSelect25 data set and on targets from the recently concluded CASP9 community-wide experiment.
Affiliation(s)
- A. Subramani
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
|
33
|
St-Pierre JF, Mousseau N. Large loop conformation sampling using the activation relaxation technique, ART-nouveau method. Proteins 2012; 80:1883-94. [PMID: 22488731 DOI: 10.1002/prot.24085] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2011] [Revised: 03/19/2011] [Accepted: 03/30/2012] [Indexed: 12/25/2022]
Abstract
We present an adaptation of the ART-nouveau energy surface sampling method to the problem of loop structure prediction. This method, previously used to study protein folding pathways and peptide aggregation, is well suited to sampling the conformation space of large loops because it targets probable folding pathways instead of exhaustively sampling that space. The number of sampled conformations needed by ART nouveau to find the global energy minimum for a loop was found to scale linearly with the sequence length of the loop for loops between 8 and about 20 amino acids. Because the computational cost of sampling each new conformation also scales linearly with the loop sequence length, we estimate that the total computational cost of sampling larger loops scales quadratically, compared with the exponential scaling of exhaustive search methods.
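The scaling argument in this abstract can be written out as a short worked estimate. The symbols a, b, and k below are illustrative placeholders, not quantities reported by the authors: a and b stand for fitted proportionality constants, and k for an assumed number of discrete states per residue in an exhaustive search.

```latex
% Illustrative symbols (assumptions, not from the paper):
%   a, b : proportionality constants for the two linear trends
%   k    : discrete states per residue in an exhaustive search
\begin{align}
N_{\text{conf}}(L) &\approx a\,L
  && \text{conformations needed for a loop of length } L \\
c_{\text{conf}}(L) &\approx b\,L
  && \text{cost of generating one new conformation} \\
C_{\text{ART}}(L) &= N_{\text{conf}}(L)\,c_{\text{conf}}(L) \approx ab\,L^{2}
  && \text{total cost, quadratic in } L \\
C_{\text{exh}}(L) &\propto k^{L}
  && \text{exhaustive search, exponential in } L
\end{align}
```

Under these assumptions, doubling the loop length from 10 to 20 residues multiplies the estimated ART nouveau cost by 4, whereas the exhaustive cost grows by a factor of k^10.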
Affiliation(s)
- Jean-François St-Pierre
- Département de Physique and Regroupement Québécois sur les Matériaux de Pointe, Université de Montréal, CP 6128, Succursale Centre-Ville, Montréal, Québec, Canada H3C 3J7
|
34
|
Subramani A, Wei Y, Floudas CA. ASTRO-FOLD 2.0: an Enhanced Framework for Protein Structure Prediction. AIChE J 2012; 58:1619-1637. [PMID: 23049093 DOI: 10.1002/aic.12669] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
The three-dimensional (3-D) structure prediction of proteins, given their amino acid sequence, is addressed using the first principles-based approach ASTRO-FOLD 2.0. The key features presented are: (1) Secondary structure prediction using a novel optimization-based consensus approach, (2) β-sheet topology prediction using mixed-integer linear optimization (MILP), (3) Residue-to-residue contact prediction using a high-resolution distance-dependent force field and MILP formulation, (4) Tight dihedral angle and distance bound generation for loop residues using dihedral angle clustering and non-linear optimization (NLP), (5) 3-D structure prediction using deterministic global optimization, stochastic conformational space annealing, and the full-atomistic ECEPP/3 potential, (6) Near-native structure selection using a traveling salesman problem-based clustering approach, ICON, and (7) Improved bound generation using chemical shifts of subsets of heavy atoms, generated by SPARTA and CS23D. Computational results of ASTRO-FOLD 2.0 on 47 blind targets of the recently concluded CASP9 experiment are presented.
Affiliation(s)
- A Subramani
- Dept. of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544
|
35
|
Liang S, Zhang C, Sarmiento J, Standley DM. Protein Loop Modeling with Optimized Backbone Potential Functions. J Chem Theory Comput 2012; 8:1820-7. [PMID: 26593673 DOI: 10.1021/ct300131p] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
We represented the protein backbone potential as a Fourier series. The parameters of the backbone dihedral potential were initialized to random values and optimized by Monte Carlo simulations so that generated native-like loop decoys had a lower energy than non-native decoys. The low energy regions of the optimized backbone potential were consistent with observed Ramachandran plots derived from crystal structures. The backbone potential was then used for the prediction of loop conformations (OSCAR-loop) in combination with the previously described OSCAR force field, which has been shown to be very accurate in side chain modeling. As a result, the accuracy of OSCAR-loop was improved by local energy minimization based on the complete force field. The average accuracies were 0.40, 0.70, 1.10, 2.08, and 3.58 Å for 4, 6, 8, 10, and 12-residue loops, respectively, with each size being represented by 325 to 2809 targets. The accuracy was better than that of other loop modeling algorithms for short loops (<10 residues). For longer loops, the prediction accuracy was improved by concurrently sampling with a fragment-based method, Spanner. OSCAR-loop is available for download at http://sysimm.ifrec.osaka-u.ac.jp/OSCAR/ .
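A minimal sketch of what a Fourier-series backbone dihedral potential can look like, assuming a truncated two-dimensional series in the φ and ψ angles. The functional form, the truncation order, and the names `fourier_backbone_energy` and `random_coefficients` are illustrative assumptions; the abstract does not specify the exact form used in OSCAR-loop.

```python
import math
import random


def fourier_backbone_energy(phi, psi, coeffs):
    """Evaluate a truncated 2D Fourier series in the backbone dihedrals.

    coeffs maps integer frequencies (m, n) to amplitudes (a, b); each term is
    a*cos(m*phi + n*psi) + b*sin(m*phi + n*psi). Angles are in radians.
    This form is an assumption made for illustration.
    """
    energy = 0.0
    for (m, n), (a, b) in coeffs.items():
        angle = m * phi + n * psi
        energy += a * math.cos(angle) + b * math.sin(angle)
    return energy


def random_coefficients(order, seed=0):
    """Random starting parameters, mirroring the random initialization step."""
    rng = random.Random(seed)
    return {
        (m, n): (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        for m in range(order + 1)
        for n in range(-order, order + 1)
    }


coeffs = random_coefficients(order=2)
phi, psi = math.radians(-60.0), math.radians(-45.0)
energy = fourier_backbone_energy(phi, psi, coeffs)
# Integer frequencies make the potential periodic in both dihedral angles.
shifted = fourier_backbone_energy(phi + 2 * math.pi, psi - 2 * math.pi, coeffs)
```

A Fourier basis is a natural choice for dihedral terms because periodicity in both angles is built in; an optimizer such as the Monte Carlo procedure described above would then adjust the amplitudes.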
Affiliation(s)
- Shide Liang
- Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
- Chi Zhang
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska 68588, United States
- Jamica Sarmiento
- Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
- Daron M Standley
- Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
|
36
|
Sacan A, Ekins S, Kortagere S. Applications and limitations of in silico models in drug discovery. Methods Mol Biol 2012; 910:87-124. [PMID: 22821594 DOI: 10.1007/978-1-61779-965-5_6] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Drug discovery in the late twentieth and early twenty-first century has witnessed a myriad of changes that were adopted to predict whether a compound is likely to be successful, or conversely enable identification of molecules with liabilities as early as possible. These changes include integration of in silico strategies for lead design and optimization that perform complementary roles to that of the traditional in vitro and in vivo approaches. The in silico models are facilitated by the availability of large datasets associated with high-throughput screening, bioinformatics algorithms to mine and annotate the data from a target perspective, and chemoinformatics methods to integrate chemistry methods into lead design process. This chapter highlights the applications of some of these methods and their limitations. We hope this serves as an introduction to in silico drug discovery.
Affiliation(s)
- Ahmet Sacan
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
|
37
|
Adhikari AN, Peng J, Wilde M, Xu J, Freed KF, Sosnick TR. Modeling large regions in proteins: applications to loops, termini, and folding. Protein Sci 2012; 21:107-21. [PMID: 22095743 PMCID: PMC3323786 DOI: 10.1002/pro.767] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2011] [Revised: 11/02/2011] [Accepted: 11/06/2011] [Indexed: 11/10/2022]
Abstract
Template-based methods for predicting protein structure provide models for a significant portion of the protein but often contain insertions or chain ends (InsEnds) of indeterminate conformation. The local structure prediction "problem" entails modeling the InsEnds onto the rest of the protein. A well-known limit involves predicting loops of ≤12 residues in crystal structures. However, InsEnds may contain as many as ~50 amino acids, and the template-based model of the protein itself may be imperfect. To address these challenges, we present a free modeling method for predicting the local structure of loops and large InsEnds in both crystal structures and template-based models. The approach uses single amino acid torsional angle "pivot" moves of the protein backbone with a C(β) level representation. Nevertheless, our accuracy for loops is comparable to existing methods. We also apply a more stringent test, the blind structure prediction and refinement categories of the CASP9 tournament, where we improve the quality of several homology based models by modeling InsEnds as long as 45 amino acids, sizes generally inaccessible to existing loop prediction methods. Our approach ranks as one of the best in the CASP9 refinement category that involves improving template-based models so that they can function as molecular replacement models to solve the phase problem for crystallographic structure determination.
Affiliation(s)
- Aashish N Adhikari
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637
- The James Franck Institute, The University of Chicago, Chicago, Illinois 60637
- Jian Peng
- Toyota Technological Institute at Chicago, Chicago, Illinois 60637
- Michael Wilde
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois 60637
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, Illinois 60637
- Karl F Freed
- Department of Chemistry, The University of Chicago, Chicago, Illinois 60637
- The James Franck Institute, The University of Chicago, Chicago, Illinois 60637
- Computation Institute, The University of Chicago and Argonne National Laboratory, Chicago, Illinois 60637
- Tobin R Sosnick
- Computation Institute, The University of Chicago and Argonne National Laboratory, Chicago, Illinois 60637
- Department of Biochemistry and Molecular Biology, The University of Chicago, Chicago, Illinois 60637
- Institute for Biophysical Dynamics, The University of Chicago, Chicago, Illinois 60637
|
38
|
Singh RK, Larson JD, Zhu W, Rambo RP, Hura GL, Becker DF, Tanner JJ. Small-angle X-ray scattering studies of the oligomeric state and quaternary structure of the trifunctional proline utilization A (PutA) flavoprotein from Escherichia coli. J Biol Chem 2011; 286:43144-53. [PMID: 22013066 DOI: 10.1074/jbc.m111.292474] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
The trifunctional flavoprotein proline utilization A (PutA) links metabolism and gene regulation in Gram-negative bacteria by catalyzing the two-step oxidation of proline to glutamate and repressing transcription of the proline utilization regulon. Small-angle x-ray scattering (SAXS) and domain deletion analysis were used to obtain solution structural information for the 1320-residue PutA from Escherichia coli. Shape reconstructions show that PutA is a symmetric V-shaped dimer having dimensions of 205 × 85 × 55 Å. The particle consists of two large lobes connected by a 30-Å diameter cylinder. Domain deletion analysis shows that the N-terminal DNA-binding domain mediates dimerization. Rigid body modeling was performed using the crystal structure of the DNA-binding domain and a hybrid x-ray/homology model of residues 87-1113. The calculations suggest that the DNA-binding domain is located in the connecting cylinder, whereas residues 87-1113, which contain the two catalytic active sites, reside in the large lobes. The SAXS data and amino acid sequence analysis suggest that the Δ(1)-pyrroline-5-carboxylate dehydrogenase domains lack the conventional oligomerization flap, which is unprecedented for the aldehyde dehydrogenase superfamily. The data also provide insight into the function of the 200-residue C-terminal domain. It is proposed that this domain serves as a lid that covers the internal substrate channeling cavity, thus preventing escape of the catalytic intermediate into the bulk medium. Finally, the SAXS model is consistent with a cloaking mechanism of gene regulation whereby interaction of PutA with the membrane hides the DNA-binding surface from the put regulon thereby activating transcription.
Affiliation(s)
- Ranjan K Singh
- Department of Chemistry, University of Missouri-Columbia, Columbia, Missouri 65211, USA
|
39
|
Liang S, Zheng D, Zhang C, Standley DM. Fast and accurate prediction of protein side-chain conformations. Bioinformatics 2011; 27:2913-4. [PMID: 21873640 PMCID: PMC3187653 DOI: 10.1093/bioinformatics/btr482] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]
Abstract
Summary: We developed a fast and accurate side-chain modeling program [Optimized Side Chain Atomic eneRgy (OSCAR)-star] based on orientation-dependent energy functions and a rigid rotamer model. The average computing time was 18 s per protein for 218 test proteins with higher prediction accuracy (1.1% increase for χ1 and 0.8% increase for χ1+2) than the best performing program developed by other groups. We show that the energy functions, which were calibrated to tolerate the discrete errors of rigid rotamers, are appropriate for protein loop selection, especially for decoys without extensive structural refinement. Availability: OSCAR-star and the 218 test proteins are available for download at http://sysimm.ifrec.osaka-u.ac.jp/OSCAR Contact: standley@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shide Liang
- Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka 565-0871, Japan
|
40
|
Zhao S, Zhu K, Li J, Friesner RA. Progress in super long loop prediction. Proteins 2011; 79:2920-35. [PMID: 21905115 DOI: 10.1002/prot.23129] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2010] [Revised: 05/06/2011] [Accepted: 06/15/2011] [Indexed: 11/07/2022]
Abstract
Sampling errors are very common in super long loop (referring here to loops that have more than thirteen residues) prediction, simply because the sampling space is vast. We have developed a dipeptide segment sampling algorithm to solve this problem. As a first step in evaluating the performance of this algorithm, it was applied to the problem of reconstructing loops in native protein structures. With a newly constructed test set of 89 loops ranging from 14 to 17 residues, this method obtains average/median global backbone root-mean-square deviations (RMSDs) to the native structure (superimposing the body of the protein, not the loop itself) of 1.46/0.68 Å. Specifically, results for loops of various lengths are 1.19/0.67 Å for 36 fourteen-residue loops, 1.55/0.75 Å for 30 fifteen-residue loops, 1.43/0.80 Å for 14 sixteen-residue loops, and 2.30/1.92 Å for nine seventeen-residue loops. In the vast majority of cases, the method locates energy minima that are lower than or equal to that of the minimized native loop, thus indicating that the new sampling method is successful and rarely limits prediction accuracy. Median RMSDs are substantially lower than the averages because of a small number of outliers. The causes of these failures are examined in some detail, and some can be attributed to flaws in the energy function, such as π-π interactions not being accurately accounted for by the OPLS-AA force field we employed in this study. By introducing a new energy model which has a superior description of π-π interactions, significantly better results were achieved for quite a few former outliers. Crystal packing is explicitly included in order to provide a fair comparison with crystal structures.
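As a quick consistency check on the figures quoted in this abstract, the per-length average RMSDs can be pooled into an overall average by weighting each by its loop count. The numbers are copied from the abstract; the variable names are illustrative.

```python
# Loop length -> (number of loops, average backbone RMSD in Å), from the abstract.
per_length = {
    14: (36, 1.19),
    15: (30, 1.55),
    16: (14, 1.43),
    17: (9, 2.30),
}

# Total number of loops across all lengths.
total_loops = sum(count for count, _ in per_length.values())

# Count-weighted mean of the per-length averages.
pooled_mean = sum(count * mean for count, mean in per_length.values()) / total_loops

print(total_loops, round(pooled_mean, 2))
```

The pooled value agrees with the reported overall average of 1.46 Å over 89 loops. The medians cannot be combined this way, since a pooled median depends on the individual per-loop values.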
Affiliation(s)
- Suwen Zhao
- Department of Chemistry, Columbia University, New York, New York 1027, USA
|
41
|
Li Y, Rata I, Jakobsson E. Sampling multiple scoring functions can improve protein loop structure prediction accuracy. J Chem Inf Model 2011; 51:1656-66. [PMID: 21702492 PMCID: PMC3211142 DOI: 10.1021/ci200143u] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Accurately predicting loop structures is important for understanding functions of many proteins. In order to obtain loop models with high accuracy, efficiently sampling the loop conformation space to discover reasonable structures is a critical step. In loop conformation sampling, coarse-grain energy (scoring) functions coupled with reduced protein representations are often used to reduce the number of degrees of freedom as well as the sampling time. However, because reduced representations account for many factors only implicitly, coarse-grain scoring functions can be insensitive and inaccurate, which can mislead the sampling process and cause important loop conformations to be missed. In this paper, we present a new computational sampling approach to obtain reasonable loop backbone models, called the Pareto optimal sampling (POS) method. The rationale of the POS method is to sample the function space of multiple, carefully selected scoring functions to discover an ensemble of diversified structures that are Pareto optimal with respect to all sampled conformations. The POS method can efficiently tolerate insensitivity and inaccuracy in individual scoring functions and thereby lead to significant accuracy improvement in loop structure prediction. We apply the POS method to a set of 4-12-residue loop targets using a function space composed of backbone-only Rosetta and distance-scale finite ideal-gas reference (DFIRE) and a triplet backbone dihedral potential developed in our lab. Our computational results show that in 501 out of 502 targets, the model sets generated by POS contain structure models within subangstrom resolution. Moreover, the top-ranked models have a root mean square deviation (rmsd) less than 1 Å in 96.8, 84.1, and 72.2% of the short (4-6 residues), medium (7-9 residues), and long (10-12 residues) targets, respectively, when the all-atom models are generated by local optimization from the backbone models and are ranked by our recently developed Pareto optimal consensus (POC) method. Similar sampling effectiveness can also be found in a set of 13-residue loop targets.
Affiliation(s)
- Yaohang Li
- Department of Computer Science, Old Dominion University
- Ionel Rata
- Center for Biophysics and Computational Biology, University of Illinois at Urbana-Champaign
- Eric Jakobsson
- Department of Molecular and Integrative Physiology, Beckman Institute, and National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
|
42
|
Shim JY, Rudd J, Ding TT. Distinct second extracellular loop structures of the brain cannabinoid CB(1) receptor: implication in ligand binding and receptor function. Proteins 2011; 79:581-97. [PMID: 21120862 DOI: 10.1002/prot.22907] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]
Abstract
The G-protein-coupled receptor (GPCR) second extracellular loop (E2) is known to play an important role in receptor structure and function. The brain cannabinoid (CB(1)) receptor is unique in that it lacks the interloop E2 disulfide linkage to the transmembrane (TM) helical bundle, a characteristic of many GPCRs. Recent mutation studies of the CB(1) receptor, however, suggest the presence of an alternative intraloop disulfide bond between two E2 Cys residues. Considering the oxidation state of these Cys residues, we determine the molecular structures of the 17-residue E2 in the dithiol form (E2(dithiol)) and in the disulfide form (E2(disulfide)) of the CB(1) receptor in a fully hydrated 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine bilayer, using a combination of simulated annealing and molecular dynamics simulation approaches. We characterize the CB(1) receptor models with these two E2 forms, CB(1)(E2(dithiol)) and CB(1)(E2(disulfide)), by analyzing interaction energy, contact number, core crevice, and cross correlation. The results show that the distinct E2 structures interact differently with the TM helical bundle and uniquely modify the TM helical topology, suggesting that E2 of the CB(1) receptor plays a critical role in stabilizing receptor structure, regulating ligand binding, and ultimately modulating receptor activation. Further studies on the role of E2 of the CB(1) receptor are warranted, particularly comparisons of the ligand-bound form with the present ligand-free form.
Affiliation(s)
- Joong-Youn Shim
- JL Chambers Biomedical/Biotechnology Research Institute, North Carolina Central University, Durham, North Carolina 27707, USA.
|
43
|
Liang S, Zhang C, Standley DM. Protein loop selection using orientation-dependent force fields derived by parameter optimization. Proteins 2011; 79:2260-7. [DOI: 10.1002/prot.23051] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2011] [Revised: 03/21/2011] [Accepted: 03/31/2011] [Indexed: 12/25/2022]
|
44
|
Arnautova YA, Abagyan RA, Totrov M. Development of a new physics-based internal coordinate mechanics force field and its application to protein loop modeling. Proteins 2011; 79:477-98. [PMID: 21069716 PMCID: PMC3057902 DOI: 10.1002/prot.22896] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
We report the development of the internal coordinate mechanics force field (ICMFF), a new force field parameterized using a combination of experimental data for crystals of small molecules and quantum mechanics calculations. The main features of ICMFF include: (a) parameterization for the dielectric constant relevant to the condensed state (ε = 2) instead of vacuum, (b) an improved description of hydrogen-bond interactions using duplicate sets of van der Waals parameters for heavy atom-hydrogen interactions, and (c) improved backbone covalent geometry and energetics achieved using novel backbone torsional potentials and inclusion of the bond angles at the C(α) atoms into the internal variable set. The performance of ICMFF was evaluated through loop modeling simulations for 4-13 residue loops. ICMFF was combined with a solvent-accessible surface area solvation model optimized using a large set of loop decoys. Conformational sampling was carried out using the biased probability Monte Carlo method. Average/median backbone root-mean-square deviations of the lowest energy conformations from the native structures were 0.25/0.21 Å for four-residue loops, 0.84/0.46 Å for eight-residue loops, and 1.16/0.73 Å for 12-residue loops. To our knowledge, these results are significantly better than or comparable with those reported to date for any loop modeling method that does not take crystal packing into account. Moreover, the accuracy of our method is on par with the best previously reported results obtained considering the crystal environment. We attribute this success to the high accuracy of the new ICM force field achieved by meticulous parameterization, to the optimized solvent model, and to the efficiency of the search method.
Affiliation(s)
- Yelena A Arnautova
- Molsoft LLC, 3366 North Torrey Pines Court, Suite 300, La Jolla, California 92037, USA
|
45
|
Abstract
Loop modeling is crucial for high-quality homology model construction outside conserved secondary structure elements. Dozens of loop modeling protocols involving a range of database and ab initio search algorithms and a variety of scoring functions have been proposed. Knowledge-based loop modeling methods are very fast and some can successfully and reliably predict loops up to about eight residues long. Several recent ab initio loop simulation methods can be used to construct accurate models of loops up to 12-13 residues long, albeit at a substantial computational cost. Major current challenges are the simulations of loops longer than 12-13 residues, the modeling of multiple interacting flexible loops, and the sensitivity of the loop predictions to the accuracy of the loop environment.
|
46
|
Lee J, Lee D, Park H, Coutsias EA, Seok C. Protein loop modeling by using fragment assembly and analytical loop closure. Proteins 2010; 78:3428-36. [PMID: 20872556 PMCID: PMC2976774 DOI: 10.1002/prot.22849] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/19/2010] [Revised: 07/16/2010] [Accepted: 07/31/2010] [Indexed: 12/27/2022]
Abstract
Protein loops are often involved in important biological functions such as molecular recognition, signal transduction, or enzymatic action. The three dimensional structures of loops can provide essential information for understanding molecular mechanisms behind protein functions. In this article, we develop a novel method for protein loop modeling, where the loop conformations are generated by fragment assembly and analytical loop closure. The fragment assembly method reduces the conformational space drastically, and the analytical loop closure method finds the geometrically consistent loop conformations efficiently. We also derive an analytic formula for the gradient of any analytical function of dihedral angles in the space of closed loops. The gradient can be used to optimize various restraints derived from experiments or databases, for example restraints for preferential interactions between specific residues or for preferred backbone angles. We demonstrate that the current loop modeling method outperforms previous methods that employ residue-based torsion angle maps or different loop closure strategies when tested on two sets of loop targets of lengths ranging from 4 to 12.
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Soongsil University, Seoul 156-743, Korea
- Dongseon Lee
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Hahnbeom Park
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Evangelos A. Coutsias
- Department of Mathematics and Statistics, University of New Mexico, Albuquerque, NM 87131, USA
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
|
47
|
Li Y, Rata I, Chiu SW, Jakobsson E. Improving predicted protein loop structure ranking using a Pareto-optimality consensus method. BMC STRUCTURAL BIOLOGY 2010; 10:22. [PMID: 20642859 PMCID: PMC2914074 DOI: 10.1186/1472-6807-10-22] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/27/2009] [Accepted: 07/20/2010] [Indexed: 11/10/2022]
Abstract
Background: Accurate protein loop structure models are important to understand functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction. Results: We have developed a Pareto Optimal Consensus (POC) method, which is a consensus model ranking approach to integrate multiple knowledge- or physics-based scoring functions. The procedure of identifying the models of best quality in a model set includes: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy dominance relationship to the rest of the models. We apply the POC method to a large number of decoy sets for loops of 4 to 12 residues in length using a functional space composed of several carefully-selected scoring functions: Rosetta, DOPE, DDFIRE, OPLS-AA, and a triplet backbone dihedral potential developed in our lab. Our computational results show that the sets of Pareto-optimal decoys, which are typically composed of ~20% or less of the overall decoys in a set, have a good coverage of the best or near-best decoys in more than 99% of the loop targets. Compared to the individual scoring function yielding the best selection accuracy in the decoy sets, the POC method yields 23%, 37%, and 64% fewer false positives in distinguishing the native conformation, identifying a near-native model (RMSD < 0.5 Å from the native) as top-ranked, and selecting at least one near-native model in the top-5-ranked models, respectively. Similar effectiveness of the POC method is also found in the decoy sets from membrane protein loops. Furthermore, the POC method outperforms the other popularly-used consensus strategies in model ranking, such as rank-by-number, rank-by-rank, rank-by-vote, and regression-based methods.
Conclusions: By integrating multiple knowledge- and physics-based scoring functions based on Pareto optimality and fuzzy dominance, the POC method is effective in distinguishing the best loop models from the other ones within a loop model set.
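The two-step procedure described in this abstract can be sketched as follows, assuming that lower scores are better for every scoring function. Step 1 finds the non-dominated (Pareto-optimal) models. For step 2 this sketch substitutes a plain count of dominated models for the fuzzy dominance relationship used by the authors, so it is an illustration of the idea and not the POC method itself. All function names and the toy scores are hypothetical.

```python
def dominates(a, b):
    """True if score vector a is no worse than b on every scoring function
    and strictly better on at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(scores):
    """Step 1: indices of models that no other model dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


def rank_front_by_dominance_count(scores):
    """Step 2 (simplified assumption): order front members by how many
    models each one dominates, most dominating first."""
    front = pareto_front(scores)
    counts = {i: sum(dominates(scores[i], t) for t in scores) for i in front}
    return sorted(front, key=lambda i: -counts[i])


# Toy decoy set: each tuple holds two hypothetical scoring-function values.
decoys = [(1.0, 5.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0), (6.0, 6.0)]
front = pareto_front(decoys)
ranking = rank_front_by_dominance_count(decoys)
```

In this toy set the first three decoys form the Pareto front, and the balanced decoy `(2.0, 2.0)` ranks first because it dominates the most other models.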
Affiliation(s)
- Yaohang Li
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
|
48
|
Danielson ML, Lill MA. New computational method for prediction of interacting protein loop regions. Proteins 2010; 78:1748-59. [PMID: 20186974 DOI: 10.1002/prot.22690] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
Abstract
Flexible loop regions of proteins play a crucial role in many biological functions such as protein-ligand recognition, enzymatic catalysis, and protein-protein association. To date, most computational methods that predict the conformational states of loops only focus on individual loop regions. However, loop regions are often spatially in close proximity to one another and their mutual interactions stabilize their conformations. We have developed a new method, titled CorLps, capable of simultaneously predicting such interacting loop regions. First, an ensemble of individual loop conformations is generated for each loop region. The members of the individual ensembles are combined and are accepted or rejected based on a steric clash filter. After a subsequent side-chain optimization step, the resulting conformations of the interacting loops are ranked by the statistical scoring function DFIRE that originated from protein structure prediction. Our results show that predicting interacting loops with CorLps is superior to sequential prediction of the two interacting loop regions, and our method is comparable in accuracy to single loop predictions. Furthermore, improved predictive accuracy of the top-ranked solution is achieved for 12-residue length loop regions by diversifying the initial pool of individual loop conformations using a quality threshold clustering algorithm.
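The combination step described in this abstract (pair members of the individual loop ensembles, then accept or reject each pair with a steric clash filter) can be sketched as below. The coordinate representation, the 3.0 Å cutoff, and the function names are assumptions made for illustration; the abstract does not give the actual clash criterion used by CorLps.

```python
import itertools
import math


def min_distance(coords_a, coords_b):
    """Smallest distance between any atom of one loop and any atom of the other."""
    return min(math.dist(p, q) for p in coords_a for q in coords_b)


def combine_loop_ensembles(ensemble_a, ensemble_b, clash_cutoff=3.0):
    """Pair every conformation of loop A with every conformation of loop B
    and keep the index pairs whose closest atoms are at least clash_cutoff apart.

    clash_cutoff (in Å) is a hypothetical value chosen for this sketch.
    """
    accepted = []
    pairs = itertools.product(enumerate(ensemble_a), enumerate(ensemble_b))
    for (i, conf_a), (j, conf_b) in pairs:
        if min_distance(conf_a, conf_b) >= clash_cutoff:
            accepted.append((i, j))
    return accepted


# Two toy ensembles, each with two conformations of two atoms (x, y, z in Å).
ensemble_a = [[(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)], [(0.0, 0.0, 4.0), (1.5, 0.0, 4.0)]]
ensemble_b = [[(0.0, 2.0, 0.0), (1.5, 2.0, 0.0)], [(0.0, 6.0, 0.0), (1.5, 6.0, 0.0)]]
accepted_pairs = combine_loop_ensembles(ensemble_a, ensemble_b)
```

Here the pair `(0, 0)` is rejected because its closest atoms are only 2.0 Å apart. The surviving pairs would then go on to side-chain optimization and scoring.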
Affiliation(s)
- Matthew L Danielson
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana 47907, USA
|
49
|
Application of biasing-potential replica-exchange simulations for loop modeling and refinement of proteins in explicit solvent. Proteins 2010; 78:2809-19. [DOI: 10.1002/prot.22796] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/25/2022]
|
50
|
Choi Y, Deane CM. FREAD revisited: Accurate loop structure prediction using a database search algorithm. Proteins 2010; 78:1431-40. [PMID: 20034110 DOI: 10.1002/prot.22658] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.6] [Indexed: 12/25/2022]
Abstract
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity, as quantified by environment-specific substitution scores, can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these, and it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions.
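For readers unfamiliar with the RMSD figures quoted above, the sketch below shows one common way to compute a coordinate RMSD between a predicted and a native loop. It is a minimal example and not taken from FREAD: it assumes both loops are already in the same coordinate frame (no superposition is performed) and that atoms are listed in matching order; the coordinates are invented for illustration.

```python
from math import sqrt
from typing import Sequence, Tuple

Atom = Tuple[float, float, float]


def rmsd(predicted: Sequence[Atom], native: Sequence[Atom]) -> float:
    """Root-mean-square deviation between matched atoms, in the same units
    as the input coordinates. No superposition is applied."""
    if len(predicted) != len(native) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
        for (px, py, pz), (nx, ny, nz) in zip(predicted, native)
    )
    return sqrt(squared / len(predicted))


# Two-atom toy example: one atom is displaced by 2 Å, the other matches exactly.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
nat = [(0.0, 0.0, 2.0), (1.0, 0.0, 0.0)]
value = rmsd(pred, nat)
print(round(value, 3))
```

In this toy case the squared deviations are 4 and 0, so the result is the square root of 2.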
Affiliation(s)
- Yoonjoo Choi
- Department of Statistics, Oxford University, United Kingdom.
|