1
|
Cai Y, Li X, Sun Z, Lu Y, Zhao H, Hanson J, Paliwal K, Litfin T, Zhou Y, Yang Y. SPOT-Fold: Fragment-Free Protein Structure Prediction Guided by Predicted Backbone Structure and Contact Map. J Comput Chem 2019; 41:745-750. [PMID: 31845383 DOI: 10.1002/jcc.26132] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2019] [Revised: 10/07/2019] [Accepted: 12/01/2019] [Indexed: 02/01/2023]
Abstract
Protein structure determination has long been one of the most challenging problems in molecular biology for the past 60 years. Here we present an ab initio protein tertiary-structure prediction method assisted by predicted contact maps from SPOT-Contact and predicted dihedral angles from SPIDER 3. These predicted properties were then fed to the crystallography and NMR system (CNS) for restrained structure modeling. The resulted structures are first evaluated by the potential energy calculated by CNS, followed by dDFIRE energy function for model selections. The method called SPOT-Fold has been tested on 241 CASP targets between 67 and 670 amino acid residues, 60 randomly selected globular proteins under 100 amino acids. The method has a comparable accuracy to other contact-map-based modeling techniques. © 2019 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Yufeng Cai
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
| | - Xiongjun Li
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
| | - Zhe Sun
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
| | - Yutong Lu
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
| | - Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510000, China
| | - Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland, 4122, Australia
| | - Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland, 4122, Australia
| | - Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, 4222, Australia
| | - Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, 4222, Australia
| | - Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
| |
Collapse
|
2
|
Vaitinadapoule A, Etchebest C. Molecular Modeling of Transporters: From Low Resolution Cryo-Electron Microscopy Map to Conformational Exploration. The Example of TSPO. Methods Mol Biol 2017; 1635:383-416. [PMID: 28755381 DOI: 10.1007/978-1-4939-7151-0_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
This chapter describes a protocol to establish a three-dimensional (3D) model of a protein and to explore its conformational landscape. It combines predictions from up-to-date bioinformatics methods with low-resolution experimental data. It also proposes to examine rapidly the dynamics of the protein using molecular dynamics simulations with a coarse-grained force field. Tools for analyzing these trajectories are suggested as well as those for constructing all-atoms models. Thus, starting from a protein sequence and using free software, the user can get important conformational information, which might improve the knowledge about the protein function.
Collapse
Affiliation(s)
- Aurore Vaitinadapoule
- Unité INSERM UMRS1134, Laboratory of Excellence, Institut National de la Transfusion Sanguine, Université Paris-Diderot, Sorbonne Paris Cité, Université de la Réunion, 6 rue Alexandre Cabanel, 75015, Paris Cedex 15, France
| | - Catherine Etchebest
- Unité INSERM UMRS1134, Laboratory of Excellence, Institut National de la Transfusion Sanguine, Université Paris-Diderot, Sorbonne Paris Cité, Université de la Réunion, 6 rue Alexandre Cabanel, 75015, Paris Cedex 15, France.
| |
Collapse
|
3
|
Orlando G, Raimondi D, Vranken WF. Observation selection bias in contact prediction and its implications for structural bioinformatics. Sci Rep 2016; 6:36679. [PMID: 27857150 PMCID: PMC5114557 DOI: 10.1038/srep36679] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/27/2016] [Accepted: 10/18/2016] [Indexed: 01/14/2023] Open
Abstract
Next Generation Sequencing is dramatically increasing the number of known protein sequences, with related experimentally determined protein structures lagging behind. Structural bioinformatics is attempting to close this gap by developing approaches that predict structure-level characteristics for uncharacterized protein sequences, with most of the developed methods relying heavily on evolutionary information collected from homologous sequences. Here we show that there is a substantial observational selection bias in this approach: the predictions are validated on proteins with known structures from the PDB, but exactly for those proteins significantly more homologs are available compared to less studied sequences randomly extracted from Uniprot. Structural bioinformatics methods that were developed this way are thus likely to have over-estimated performances; we demonstrate this for two contact prediction methods, where performances drop up to 60% when taking into account a more realistic amount of evolutionary information. We provide a bias-free dataset for the validation for contact prediction methods called NOUMENON.
Collapse
Affiliation(s)
- G Orlando
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, Belgium.,Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Belgium.,Structural Biology Research Center, VIB, 1050 Brussels, Belgium
| | - D Raimondi
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, Belgium.,Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Belgium.,Structural Biology Research Center, VIB, 1050 Brussels, Belgium
| | - W F Vranken
- Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, La Plaine Campus, Triomflaan, Belgium.,Structural Biology Brussels, Vrije Universiteit Brussel, Pleinlaan 2, Belgium.,Structural Biology Research Center, VIB, 1050 Brussels, Belgium
| |
Collapse
|
4
|
Savojardo C, Fariselli P, Martelli PL, Casadio R. Prediction of disulfide connectivity in proteins with machine-learning methods and correlated mutations. BMC Bioinformatics 2013; 14 Suppl 1:S10. [PMID: 23368835 PMCID: PMC3548674 DOI: 10.1186/1471-2105-14-s1-s10] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
Background Recently, information derived by correlated mutations in proteins has regained relevance for predicting protein contacts. This is due to new forms of mutual information analysis that have been proven to be more suitable to highlight direct coupling between pairs of residues in protein structures and to the large number of protein chains that are currently available for statistical validation. It was previously discussed that disulfide bond topology in proteins is also constrained by correlated mutations. Results In this paper we exploit information derived from a corrected mutual information analysis and from the inverse of the covariance matrix to address the problem of the prediction of the topology of disulfide bonds in Eukaryotes. Recently, we have shown that Support Vector Regression (SVR) can improve the prediction for the disulfide connectivity patterns. Here we show that the inclusion of the correlated mutation information increases of 5 percentage points the SVR performance (from 54% to 59%). When this approach is used in combination with a method previously developed by us and scoring at the state of art in predicting both location and topology of disulfide bonds in Eukaryotes (DisLocate), the per-protein accuracy is 38%, 2 percentage points higher than that previously obtained. Conclusions In this paper we show that the inclusion of information derived from correlated mutations can improve the performance of the state of the art methods for predicting disulfide connectivity patterns in Eukaryotic proteins. Our analysis also provides support to the notion that improving methods to extract evolutionary information from multiple sequence alignments greatly contributes to the scoring performance of predictors suited to detect relevant features from protein chains.
Collapse
Affiliation(s)
- Castrense Savojardo
- Department of Computer Science and Engineering, University of Bologna, Via Mura Anteo Zamboni 7, 41029 Bologna, Italy
| | | | | | | |
Collapse
|
5
|
Abstract
Motivation: The challenge of template-based modeling lies in the recognition of correct templates and generation of accurate sequence-template alignments. Homologous information has proved to be very powerful in detecting remote homologs, as demonstrated by the state-of-the-art profile-based method HHpred. However, HHpred does not fare well when proteins under consideration are low-homology. A protein is low-homology if we cannot obtain sufficient amount of homologous information for it from existing protein sequence databases. Results: We present a profile-entropy dependent scoring function for low-homology protein threading. This method will model correlation among various protein features and determine their relative importance according to the amount of homologous information available. When proteins under consideration are low-homology, our method will rely more on structure information; otherwise, homologous information. Experimental results indicate that our threading method greatly outperforms the best profile-based method HHpred and all the top CASP8 servers on low-homology proteins. Tested on the CASP8 hard targets, our threading method is also better than all the top CASP8 servers but slightly worse than Zhang-Server. This is significant considering that Zhang-Server and other top CASP8 servers use a combination of multiple structure-prediction techniques including consensus method, multiple-template modeling, template-free modeling and model refinement while our method is a classical single-template-based threading method without any post-threading refinement. Contact:jinboxu@gmail.com
Collapse
Affiliation(s)
- Jian Peng
- Toyota Technological Institute at Chicago, IL 60637, USA
| | | |
Collapse
|
6
|
Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. ACTA ACUST UNITED AC 2010; 26:2664-71. [PMID: 20843957 DOI: 10.1093/bioinformatics/btq527] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe. RESULTS We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.
Collapse
Affiliation(s)
- Daniel Chubb
- Department of Life Science, Imperial College London, London, UK.
| | | | | | | |
Collapse
|
7
|
Casbon JA, Crooks GE, Saqi MAS. A high level interface to SCOP and ASTRAL implemented in python. BMC Bioinformatics 2006; 7:10. [PMID: 16403221 PMCID: PMC1373603 DOI: 10.1186/1471-2105-7-10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2005] [Accepted: 01/10/2006] [Indexed: 11/26/2022] Open
Abstract
Background Benchmarking algorithms in structural bioinformatics often involves the construction of datasets of proteins with given sequence and structural properties. The SCOP database is a manually curated structural classification which groups together proteins on the basis of structural similarity. The ASTRAL compendium provides non redundant subsets of SCOP domains on the basis of sequence similarity such that no two domains in a given subset share more than a defined degree of sequence similarity. Taken together these two resources provide a 'ground truth' for assessing structural bioinformatics algorithms. We present a small and easy to use API written in python to enable construction of datasets from these resources. Results We have designed a set of python modules to provide an abstraction of the SCOP and ASTRAL databases. The modules are designed to work as part of the Biopython distribution. Python users can now manipulate and use the SCOP hierarchy from within python programs, and use ASTRAL to return sequences of domains in SCOP, as well as clustered representations of SCOP from ASTRAL. Conclusion The modules make the analysis and generation of datasets for use in structural genomics easier and more principled.
Collapse
Affiliation(s)
- James A Casbon
- Bioinformatics, Institute of Cell and Molecular Science, School of Medicine and Dentistry, Queen Mary, University of London, Charterhouse Square, London EC1 6BQ, UK
| | - Gavin E Crooks
- Department of Plant and Microbial Biology, University of California, Berkeley, USA
| | - Mansoor AS Saqi
- Bioinformatics, Institute of Cell and Molecular Science, School of Medicine and Dentistry, Queen Mary, University of London, Charterhouse Square, London EC1 6BQ, UK
| |
Collapse
|