101
Dos Santos-Silva CA, Zupin L, Oliveira-Lima M, Vilela LMB, Bezerra-Neto JP, Ferreira-Neto JR, Ferreira JDC, de Oliveira-Silva RL, Pires CDJ, Aburjaile FF, de Oliveira MF, Kido EA, Crovella S, Benko-Iseppon AM. Plant Antimicrobial Peptides: State of the Art, In Silico Prediction and Perspectives in the Omics Era. Bioinform Biol Insights 2020; 14:1177932220952739. [PMID: 32952397; PMCID: PMC7476358; DOI: 10.1177/1177932220952739] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Received: 07/14/2020] [Accepted: 07/30/2020] [Indexed: 12/14/2022] Open
Abstract
Even before perceiving or interacting with pathogens, plants rely on constitutive guardian molecules, often specific to a tissue or stage, whose expression increases further after contact with the pathogen. These guardians include small molecules such as antimicrobial peptides (AMPs), generally cysteine-rich, which function to prevent pathogen establishment. Some of these AMPs are shared among eukaryotes (eg, defensins and cyclotides), others are plant specific (eg, snakins), while some are specific to certain plant families (such as heveins). When compared with other organisms, plants tend to present a higher number of AMP isoforms due to gene duplications or polyploidy, an occurrence possibly also associated with the sessile habit of plants, which prevents them from evading biotic and environmental stresses. Therefore, plants arise as a rich resource for new AMPs. As these molecules are difficult to retrieve from databases using simple sequence alignments, a description of their characteristics and of the in silico (bioinformatics) approaches used to retrieve them is provided, considering the resources and databases available. The possibilities and applications based on tools versus database approaches are considerable and have so far been underestimated.
Affiliation(s)
- Luisa Zupin
- Genetic Immunology laboratory, Institute for Maternal and Child Health-IRCCS, Burlo Garofolo, Trieste, Italy
- Marx Oliveira-Lima
- Departamento de Genética, Universidade Federal de Pernambuco, Recife, Brazil
- José Diogo Cavalcanti Ferreira
- Departamento de Genética, Universidade Federal de Pernambuco, Recife, Brazil
- Departamento de Genética, Instituto Federal de Pernambuco, Pesqueira, Brazil
- Ederson Akio Kido
- Departamento de Genética, Universidade Federal de Pernambuco, Recife, Brazil
- Sergio Crovella
- Genetic Immunology laboratory, Institute for Maternal and Child Health-IRCCS, Burlo Garofolo, Trieste, Italy
- Department of Medicine, Surgery and Health Sciences, University of Trieste, Trieste, Italy
102
Mei Z, Treado JD, Grigas AT, Levine ZA, Regan L, O'Hern CS. Analyses of protein cores reveal fundamental differences between solution and crystal structures. Proteins 2020; 88:1154-1161. [PMID: 32105366; PMCID: PMC7415476; DOI: 10.1002/prot.25884] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/27/2019] [Revised: 02/05/2020] [Accepted: 02/23/2020] [Indexed: 12/20/2022]
Abstract
Several studies have suggested that protein structures solved by NMR spectroscopy and X-ray crystallography show significant differences. To understand the origin of these differences, we assembled a database of high-quality protein structures solved by both methods. We also find significant differences between NMR and crystal structures in the root-mean-square deviations of the Cα atomic positions, identities of core amino acids, backbone and side-chain dihedral angles, and packing fraction of core residues. In contrast to prior studies, we identify the physical basis for these differences by modeling protein cores as jammed packings of amino acid-shaped particles. We find that we can tune the jammed packing fraction by varying the degree of thermalization used to generate the packings. For an athermal protocol, we find that the average jammed packing fraction is identical to that observed in the cores of protein structures solved by X-ray crystallography. In contrast, highly thermalized packing-generation protocols yield jammed packing fractions that are even higher than those observed in NMR structures. These results indicate that thermalized systems can pack more densely than athermal systems, which suggests a physical basis for the structural differences between protein structures solved by NMR and X-ray crystallography.
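One of the quantities compared in this abstract, the root-mean-square deviation of Cα positions, can be illustrated with a short Python sketch. It assumes the two structures are already superimposed and have the same residues in the same order; the function name `ca_rmsd` and the coordinate format are hypothetical choices for illustration, not taken from the paper.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already optimally superimposed, so no
    rotation or translation is fitted here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    # Mean squared displacement per residue, then square root.
    return math.sqrt(squared / len(coords_a))
```

In practice a superposition step (for example the Kabsch algorithm) would precede this calculation; it is omitted here to keep the sketch small.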
Affiliation(s)
- Zhe Mei
- Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut
- Department of Chemistry, Yale University, New Haven, Connecticut
- John D Treado
- Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut
- Department of Mechanical Engineering & Materials Science, Yale University, New Haven, Connecticut
- Alex T Grigas
- Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut
- Graduate Program in Computational Biology & Bioinformatics, Yale University, New Haven, Connecticut
- Zachary A Levine
- Department of Pathology, Yale University, New Haven, Connecticut
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut
- Lynne Regan
- Institute of Quantitative Biology, Biochemistry and Biotechnology, Center for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Corey S O'Hern
- Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut
- Department of Mechanical Engineering & Materials Science, Yale University, New Haven, Connecticut
- Department of Physics, Yale University, New Haven, Connecticut
- Department of Applied Physics, Yale University, New Haven, Connecticut
103
Grigas AT, Mei Z, Treado JD, Levine ZA, Regan L, O'Hern CS. Using physical features of protein core packing to distinguish real proteins from decoys. Protein Sci 2020; 29:1931-1944. [PMID: 32710566; PMCID: PMC7454528; DOI: 10.1002/pro.3914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/05/2020] [Revised: 07/10/2020] [Accepted: 07/20/2020] [Indexed: 01/06/2023]
Abstract
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.
Affiliation(s)
- Alex T. Grigas
- Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Zhe Mei
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Department of Chemistry, Yale University, New Haven, Connecticut, USA
- John D. Treado
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
- Zachary A. Levine
- Department of Pathology, Yale University, New Haven, Connecticut, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
- Lynne Regan
- Institute of Quantitative Biology, Biochemistry and Biotechnology, Centre for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Corey S. O'Hern
- Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
- Department of Physics, Yale University, New Haven, Connecticut, USA
- Department of Applied Physics, Yale University, New Haven, Connecticut, USA
104
Runthala A, Chowdhury S. Refined template selection and combination algorithm significantly improves template-based modeling accuracy. J Bioinform Comput Biol 2020; 17:1950006. [PMID: 31057073; DOI: 10.1142/s0219720019500069] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 12/16/2022]
Abstract
In contrast to ab-initio protein modeling methodologies, comparative modeling is considered the most popular and reliable approach to modeling protein structure. However, the selection of the best set of templates is still a major challenge. An effective template-ranking algorithm is developed to efficiently select only the reliable hits for predicting protein structures. The algorithm employs pairwise as well as multiple sequence alignments of template hits to rank and select the best possible set of templates. It captures key sequence and structural information of the template hits and converts it into scores to rank them effectively. This selected set of templates is used to model a target. The modeling accuracy of the algorithm is tested and evaluated on TBM-HA domain-containing CASP8, CASP9 and CASP10 targets. On average, this template ranking and selection algorithm improves GDT-TS, GDT-HA and TM_Score by 3.531, 4.814 and 0.022, respectively. Further, it has been shown that the inclusion of structurally similar templates with ample conformational diversity is crucial for the modeling algorithm to maximally as well as reliably span the target sequence and construct its near-native model. Optimal model sampling also holds the key to predicting the best possible target structure.
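For readers unfamiliar with the GDT-TS and GDT-HA scores quoted above, the following Python sketch shows the basic idea: the percentage of residues whose Cα atoms lie within a distance cutoff of the reference structure, averaged over four cutoffs (1, 2, 4, 8 Å for GDT-TS; 0.5, 1, 2, 4 Å for GDT-HA). It is a simplification that assumes one fixed superposition and takes precomputed per-residue distances as input, whereas the full score searches over many superpositions. The function names are hypothetical.

```python
def gdt_score(distances, cutoffs):
    """Average percentage of residues within each cutoff (in angstroms).

    `distances` holds one C-alpha deviation per residue, measured after a
    single superposition (an assumption of this sketch).
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


def gdt_ts(distances):
    # Standard GDT-TS cutoffs.
    return gdt_score(distances, (1.0, 2.0, 4.0, 8.0))


def gdt_ha(distances):
    # Tighter "high accuracy" cutoffs.
    return gdt_score(distances, (0.5, 1.0, 2.0, 4.0))
```

Because GDT-HA halves every cutoff, it is never higher than GDT-TS for the same model, which is why it is preferred for judging high-accuracy template-based targets.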
Affiliation(s)
- Ashish Runthala
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
- Shibasish Chowdhury
- Department of Biological Sciences, Birla Institute of Technology and Science, Pilani-333031, India
105
Juan SH, Chen TR, Lo WC. A simple strategy to enhance the speed of protein secondary structure prediction without sacrificing accuracy. PLoS One 2020; 15:e0235153. [PMID: 32603341; PMCID: PMC7326220; DOI: 10.1371/journal.pone.0235153] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/17/2019] [Accepted: 06/09/2020] [Indexed: 01/06/2023] Open
Abstract
The secondary structure prediction of proteins is a classic topic of computational structural biology with a variety of applications. During the past decade, the accuracy of prediction achieved by state-of-the-art algorithms has been >80%; meanwhile, the time cost of prediction increased rapidly because of the exponential growth of fundamental protein sequence data. Based on literature studies and preliminary observations on the relationships between the size/homology of the fundamental protein dataset and the speed/accuracy of predictions, we raised two hypotheses that might help determine the main factors influencing the efficiency of secondary structure prediction. Experimental results of size and homology reductions of the fundamental protein dataset supported those hypotheses. They revealed that shrinking the size of the dataset could substantially cut down the time cost of prediction with a slight decrease in accuracy, which, conversely, could be increased by homology reduction of the dataset. Moreover, the Shannon information entropy could be applied to explain how accuracy was influenced by the size and homology of the dataset. Based on these findings, we proposed that a proper combination of size and homology reductions of the protein dataset could speed up secondary structure prediction while preserving the high accuracy of state-of-the-art algorithms. When the proposed strategy was tested with the fundamental protein dataset of the year 2018 provided by the Universal Protein Resource, the speed of prediction was enhanced over 20-fold while all accuracy measures remained equivalently high. These findings should help improve the efficiency of research and applications that depend on the secondary structure prediction of proteins. To make future implementations of the proposed strategy easy, we have established a database of size- and homology-reduced protein datasets at http://10.life.nctu.edu.tw/UniRefNR.
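The abstract invokes Shannon information entropy without giving a formula. As a reminder of the quantity involved, the sketch below computes the entropy H = -Σ p·log2(p) of a collection of symbols, for example the residues in one column of a multiple sequence alignment. How the authors apply entropy to dataset size and homology is not stated here, so this is only a generic illustration with a hypothetical function name.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy in bits of the symbol frequencies in `symbols`.

    `symbols` can be a string or any iterable of hashable items.
    An empty input is treated as having zero entropy.
    """
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy
```

A fully conserved alignment column such as "AAAA" carries zero entropy, while a column with four equally frequent residues carries two bits; a dataset dominated by close homologs therefore tends to add less new information per sequence.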
Affiliation(s)
- Sheng-Hung Juan
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Teng-Ruei Chen
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Wei-Cheng Lo
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
- The Center for Bioinformatics Research, National Chiao Tung University, Hsinchu, Taiwan
106
Bhattacharya D. refineD: improved protein structure refinement using machine learning based restrained relaxation. Bioinformatics 2020; 35:3320-3328. [PMID: 30759180; DOI: 10.1093/bioinformatics/btz101] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/13/2018] [Revised: 01/22/2019] [Accepted: 02/11/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: Protein structure refinement aims to bring moderately accurate template-based protein models closer to the native state through conformational sampling. However, guiding the sampling towards the native state by effectively using restraints remains a major issue in structure refinement. RESULTS: Here, we develop a machine learning based restrained relaxation protocol that uses deep discriminative learning based binary classifiers to predict multi-resolution probabilistic restraints from the starting structure and subsequently converts these restraints into additional scoring terms that are integrated into the Rosetta all-atom energy function during structure refinement. We use four restraint resolutions as adopted in GDT-HA (0.5, 1, 2 and 4 Å), centered on the Cα atom of each residue, that are predicted by an ensemble of four deep discriminative classifiers trained using combinations of sequence and structure-derived features as well as several energy terms from the Rosetta centroid scoring function. The proposed method, refineD, has been found to produce consistent and substantial structural refinement through the use of cumulative and non-cumulative restraints on 150 benchmarking targets. refineD outperforms an unrestrained relaxation strategy, relaxation that is restrained to starting structures using the FastRelax application of Rosetta, the atomic-level energy minimization based ModRefiner method, and the molecular dynamics (MD) simulation based FG-MD protocol. Furthermore, by adjusting restraint resolutions, the method addresses the tradeoff that exists between degree and consistency of refinement. These results demonstrate a promising new avenue for improving the accuracy of template-based protein models by effectively guiding conformational sampling during structure refinement through the use of machine learning based restraints. AVAILABILITY AND IMPLEMENTATION: http://watson.cse.eng.auburn.edu/refineD/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
107
Tessaro F, Scapozza L. How 'Protein-Docking' Translates into the New Emerging Field of Docking Small Molecules to Nucleic Acids? Molecules 2020; 25:E2749. [PMID: 32545835; PMCID: PMC7355999; DOI: 10.3390/molecules25122749] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/10/2020] [Revised: 06/05/2020] [Accepted: 06/11/2020] [Indexed: 11/16/2022] Open
Abstract
In this review, we retraced the '40-year evolution' of molecular docking algorithms. Over the years, their development has enabled progress from the so-called 'rigid-docking' searching methods to the more sophisticated 'semi-flexible' and 'flexible docking' algorithms. Together with the advancement of computing architecture and power, the applications of molecular docking have also increased exponentially, from single-ligand binding calculations to large screening and polypharmacology profiles. Recently, targeting nucleic acids with small molecules has emerged as a valuable therapeutic strategy, especially for cancer treatment, along with bacterial and viral infections. For example, therapeutic intervention at the mRNA level makes it possible to overcome the problem of undruggable proteins without modifying the genome. Despite the promising therapeutic potential of nucleic acids, molecular docking programs have been optimized mostly for proteins. Here, we have analyzed literature data on nucleic acids to benchmark some of the widely used docking programs. Finally, the comparison between docking to protein and nucleic acid targets highlighted similarities and differences, which are intrinsically related to their chemical and structural nature.
Affiliation(s)
- Francesca Tessaro
- Pharmaceutical Biochemistry, School of Pharmaceutical Sciences, University of Geneva CMU, Rue Michel-Servet 1, 1211 Geneva 4, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, 1211 Geneva, Switzerland
- Leonardo Scapozza
- Pharmaceutical Biochemistry, School of Pharmaceutical Sciences, University of Geneva CMU, Rue Michel-Servet 1, 1211 Geneva 4, Switzerland
- Institute of Pharmaceutical Sciences of Western Switzerland, University of Geneva, 1211 Geneva, Switzerland
108
Hattori LT, Pinheiro BA, Frigori RB, Benítez CMV, Lopes HS. PathMolD-AB: Spatiotemporal pathways of protein folding using parallel molecular dynamics with a coarse-grained model. Comput Biol Chem 2020; 87:107301. [PMID: 32554177; DOI: 10.1016/j.compbiolchem.2020.107301] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2020] [Revised: 05/25/2020] [Accepted: 05/28/2020] [Indexed: 10/24/2022]
Abstract
Solving the protein folding problem (PFP) is one of the grand challenges still open in computational biophysics. Globular proteins are believed to evolve from initial configurations through folding pathways connecting several thermodynamically accessible states in a free energy landscape until reaching its minimum, inhabited by the stable native structures. Despite its huge computational burden, molecular dynamics (MD) is the leading approach in PFP studies because it preserves the Newtonian temporal evolution in the canonical ensemble. Non-trivial improvements are provided by highly parallel implementations of MD on cost-effective GPUs, together with multiscale descriptions of proteins by coarse-grained minimalist models. In this vein, we present the PathMolD-AB framework, a comprehensive software package for massively parallel MD simulations using the canonical ensemble, structural analysis, and visualization of the folding pathways using the minimalist AB-model. It also has a tool to compare the results with proteins re-scaled from the PDB. We simulate and analyze, as case studies, the folding of four proteins: 13FIBO, 2GB1, 1PLC and 5ANZ, with 13, 55, 99 and 223 amino acids, respectively. The datasets generated from simulations correspond to the MD evolution of 3500 folding pathways, encompassing 35 × 10^6 states, which contain the spatial amino acid positions, the protein free energies and the radii of gyration at each time step. Results indicate that the speedup of our approach grows logarithmically with the protein length and, therefore, it is suited for most of the proteins in the PDB. The predicted structures simulated by PathMolD-AB were similar to the re-scaled biological structures, indicating that it is promising for the study of the PFP.
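The radius of gyration recorded at each time step is a simple measure of how compact a conformation is. The sketch below computes it for a coarse-grained chain with one bead per amino acid, under the assumption that all beads have equal mass. The function name and input format are hypothetical and are not taken from the PathMolD-AB package.

```python
import math


def radius_of_gyration(positions):
    """Radius of gyration of equally weighted beads.

    `positions` is a non-empty list of (x, y, z) tuples, one per bead.
    Rg is the root-mean-square distance of the beads from their centroid.
    """
    if not positions:
        raise ValueError("need at least one bead position")
    n = len(positions)
    # Centroid of the beads (centre of mass for equal masses).
    cx = sum(p[0] for p in positions) / n
    cy = sum(p[1] for p in positions) / n
    cz = sum(p[2] for p in positions) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in positions
    ) / n
    return math.sqrt(mean_sq)
```

Tracking this value along a trajectory gives a quick view of collapse: an extended chain has a large radius of gyration that shrinks as the chain folds.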
Affiliation(s)
- Leandro Takeshi Hattori
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil
- Bruna Araujo Pinheiro
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil
- Rafael Bertolini Frigori
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil
- César Manuel Vargas Benítez
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil
- Heitor Silvério Lopes
- Bioinformatics and Computational Intelligence Laboratory (LABIC), Federal University of Technology Paraná (UTFPR), Av. 7 de Setembro, 3165, 80230-901 Curitiba, PR, Brazil
109
Hou J, Adhikari B, Tanner JJ, Cheng J. SAXSDom: Modeling multidomain protein structures using small-angle X-ray scattering data. Proteins 2020; 88:775-787. [PMID: 31860156; PMCID: PMC7230021; DOI: 10.1002/prot.25865] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/15/2019] [Revised: 11/18/2019] [Accepted: 12/14/2019] [Indexed: 12/27/2022]
Abstract
Many proteins are composed of several domains that pack together into a complex tertiary structure. Multidomain proteins can be challenging for protein structure modeling, particularly those for which templates can be found for individual domains but not for the entire sequence. In such cases, homology modeling can generate high-quality models of the domains but not of the orientations between domains. Small-angle X-ray scattering (SAXS) reports the structural properties of entire proteins and has the potential for guiding homology modeling of multidomain proteins. In this article, we describe a novel multidomain protein assembly modeling method, SAXSDom, that integrates experimental knowledge from SAXS with a probabilistic Input-Output Hidden Markov model to assemble the structures of individual domains together. Four SAXS-based scoring functions were developed and tested, and the method was evaluated on multidomain proteins from two public datasets. Incorporation of SAXS information improved the accuracy of domain assembly for 40 out of 46 Critical Assessment of protein Structure Prediction multidomain protein targets and 45 out of 73 multidomain protein targets from the ab initio domain assembly dataset. The results demonstrate that SAXS data can provide useful information to improve the accuracy of domain-domain assembly. The source code and tool packages are available at https://github.com/jianlin-cheng/SAXSDom.
Affiliation(s)
- Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri-St. Louis, Saint Louis, MO 63121, USA
- John J. Tanner
- Departments of Biochemistry and Chemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
110
Zhou XG, Peng CX, Liu J, Zhang Y, Zhang GJ. Underestimation-Assisted Global-Local Cooperative Differential Evolution and the Application to Protein Structure Prediction. IEEE Transactions on Evolutionary Computation 2020; 24:536-550. [PMID: 33603321; PMCID: PMC7885903; DOI: 10.1109/tevc.2019.2938531] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 05/29/2023]
Abstract
Various mutation strategies show distinct advantages in differential evolution (DE). The cooperation of multiple strategies in the evolutionary process may be effective. This paper presents an underestimation-assisted global and local cooperative DE to simultaneously enhance the effectiveness and efficiency. In the proposed algorithm, two phases, namely, the global exploration and the local exploitation, are performed in each generation. In the global phase, a set of trial vectors is produced for each target individual by employing multiple strategies with strong exploration capability. Afterward, an adaptive underestimation model with a self-adapted slope control parameter is proposed to evaluate these trial vectors, the best of which is selected as the candidate. In the local phase, the better-based strategies guided by individuals that are better than the target individual are designed. For each individual accepted in the global phase, multiple trial vectors are generated by using these strategies and filtered by the underestimation value. The cooperation between the global and local phases includes two aspects. First, both of them concentrate on generating better individuals for the next generation. Second, the global phase aims to locate promising regions quickly while the local phase serves as a local search for enhancing convergence. Moreover, a simple mechanism is designed to determine the parameter of DE adaptively in the searching process. Finally, the proposed approach is applied to predict the protein 3D structure. Experimental studies on classical benchmark functions, CEC test sets, and protein structure prediction problem show that the proposed approach is superior to the competitors.
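As background for the algorithm summarized above, the following Python sketch shows the classic DE/rand/1/bin baseline that multi-strategy variants build on: a mutant is formed from three random population members, mixed with the target by binomial crossover, and kept only if it is at least as good. It does not implement the underestimation model or the global and local phases of the paper. All names and default parameters (`f`, `cr`, `pop_size`) are illustrative assumptions.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over box `bounds` with basic DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees one mutated coordinate
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    v = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_score = objective(trial)
            # Greedy selection: the target is replaced only if not worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]
```

For example, `differential_evolution(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` searches for the minimum of a three-dimensional sphere function. In the paper's setting the objective would be a protein energy function, which is far more expensive to evaluate; that cost is the motivation for screening trial vectors with a cheap underestimate first.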
Affiliation(s)
- Xiao-Gen Zhou
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China, and also with the Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chun-Xiang Peng
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA, and also with the Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Gui-Jun Zhang
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
111
Dokholyan NV. Experimentally-driven protein structure modeling. J Proteomics 2020; 220:103777. [PMID: 32268219; PMCID: PMC7214187; DOI: 10.1016/j.jprot.2020.103777] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Revised: 03/17/2020] [Accepted: 04/02/2020] [Indexed: 11/25/2022]
Abstract
Revolutions in the natural and exact sciences that started at the dawn of the last century have led to an explosion of theoretical, experimental, and computational approaches to determine the structures of molecules and complexes, as well as their rich conformational dynamics. Since different experimental methods produce information that is attributed to specific time and length scales, the corresponding computational methods have to be tailored to these scales and experiments. These methods can then be combined and integrated across scales, hence producing a fuller picture of molecular structure and motion from the "puzzle pieces" offered by various experiments. Here, we describe a number of computational approaches that utilize experimental data to glance into the structure of proteins and understand their dynamics. We will also discuss the limitations and the resolution of the constraints-based modeling approaches. SIGNIFICANCE: Experimentally-driven computational structure modeling and determination is a rapidly evolving alternative to traditional approaches for molecular structure determination. These new hybrid experimental-computational approaches are proving to be a powerful microscope to glance into the structural features of intrinsically or partially disordered proteins and the dynamics of molecules and complexes. In this review, we describe various approaches in the field of experimentally-driven computational structure modeling.
Affiliation(s)
- Nikolay V Dokholyan
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA 17033, USA
- Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA
- Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
- Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA
112
Liu XR, Zhang MM, Gross ML. Mass Spectrometry-Based Protein Footprinting for Higher-Order Structure Analysis: Fundamentals and Applications. Chem Rev 2020; 120:4355-4454. [PMID: 32319757; PMCID: PMC7531764; DOI: 10.1021/acs.chemrev.9b00815] [Citation(s) in RCA: 130] [Impact Index Per Article: 32.5] [Indexed: 01/07/2023]
Abstract
Proteins adopt different higher-order structures (HOS) to enable their unique biological functions. Understanding the complexities of protein higher-order structures and dynamics requires integrated approaches, where mass spectrometry (MS) is now positioned to play a key role. One of those approaches is protein footprinting. Although the initial demonstration of footprinting was for the HOS determination of protein/nucleic acid binding, the concept was later adapted to MS-based protein HOS analysis, through which different covalent labeling approaches "mark" the solvent accessible surface area (SASA) of proteins to reflect protein HOS. Hydrogen-deuterium exchange (HDX), where deuterium in D2O replaces hydrogen of the backbone amides, is the most common example of footprinting. Its advantage is that the footprint reflects SASA and hydrogen bonding, whereas one drawback is that the labeling is reversible. Another example of footprinting is slow irreversible labeling of functional groups on amino acid side chains by targeted reagents with high specificity, probing structural changes at selected sites. A third footprinting approach uses fast, irreversible labeling species that are highly reactive and broadly footprint several amino acid side chains on the submillisecond time scale. Together, these covalent labeling approaches constitute a problem-solving toolbox that makes mass spectrometry a valuable tool for HOS elucidation. As there has been a growing need for MS-based protein footprinting in both academia and industry owing to its high throughput capability, prompt availability, and high spatial resolution, we present a summary of the history, descriptions, principles, mechanisms, and applications of these covalent labeling approaches. Moreover, their applications are highlighted according to the biological questions they can answer. This review is intended as a tutorial for MS-based protein HOS elucidation and as a reference for investigators seeking an MS-based tool to address structural questions in protein science.
Affiliation(s)
- Michael L. Gross
- Department of Chemistry, Washington University in St. Louis, St. Louis, MO 63130, USA
|
113
|
Karami Y, Rey J, Postic G, Murail S, Tufféry P, de Vries SJ. DaReUS-Loop: a web server to model multiple loops in homology models. Nucleic Acids Res 2020; 47:W423-W428. [PMID: 31114872 PMCID: PMC6602439 DOI: 10.1093/nar/gkz403] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 03/06/2019] [Revised: 04/20/2019] [Accepted: 05/06/2019] [Indexed: 02/07/2023] Open
Abstract
Loop regions in protein structures often have crucial roles, and they are much more variable in sequence and structure than other regions. In homology modeling, this leads to larger deviations from the homologous templates, and loop modeling of homology models remains an open problem. To address this issue, we have previously developed the DaReUS-Loop protocol, leading to significant improvement over existing methods. Here, a DaReUS-Loop web server is presented, providing an automated platform for modeling or remodeling loops in the context of homology models. This is the first web server accepting a protein with up to 20 loop regions, and modeling them all in parallel. It also provides a prediction confidence level that corresponds to the expected accuracy of the loops. DaReUS-Loop facilitates the analysis of the results through its interactive graphical interface and is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/services/DaReUS-Loop/.
Affiliation(s)
- Yasaman Karami
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Julien Rey
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Guillaume Postic
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France; Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
- Samuel Murail
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Pierre Tufféry
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Sjoerd J de Vries
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
|
114
|
Olechnovič K, Venclovas Č. VoroMQA web server for assessing three-dimensional structures of proteins and protein complexes. Nucleic Acids Res 2020; 47:W437-W442. [PMID: 31073605 PMCID: PMC6602437 DOI: 10.1093/nar/gkz367] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 03/04/2019] [Revised: 04/19/2019] [Accepted: 05/05/2019] [Indexed: 01/12/2023] Open
Abstract
The VoroMQA (Voronoi tessellation-based Model Quality Assessment) web server is dedicated to the estimation of protein structure quality, a common step in selecting realistic and most accurate computational models and in validating experimental structures. As an input, the VoroMQA web server accepts one or more protein structures in PDB format. Input structures may be either monomeric proteins or multimeric protein complexes. For every input structure, the server provides both global and local (per-residue) scores. Visualization of the local scores along the protein chain is enhanced by providing secondary structure assignment and information on solvent accessibility. A unique feature of the VoroMQA server is the ability to directly assess protein-protein interaction interfaces. If this type of assessment is requested, the web server provides interface quality scores, interface energy estimates, and local scores for residues involved in inter-chain interfaces. VoroMQA, the underlying method of the web server, was extensively tested in recent community-wide CASP and CAPRI experiments. During these experiments VoroMQA showed outstanding performance both in model selection and in estimation of accuracy of local structural regions. The VoroMQA web server is available at http://bioinformatics.ibt.lt/wtsam/voromqa.
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio av. 7, Vilnius LT-10257, Lithuania
|
115
|
Banach M, Fabian P, Stapor K, Konieczny L, Roterman I. Structure of the Hydrophobic Core Determines the 3D Protein Structure-Verification by Single Mutation Proteins. Biomolecules 2020; 10:E767. [PMID: 32423068 PMCID: PMC7281683 DOI: 10.3390/biom10050767] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 04/16/2020] [Revised: 05/08/2020] [Accepted: 05/12/2020] [Indexed: 02/06/2023] Open
Abstract
Four de novo proteins differing in single mutation positions, with a chain length of 56 amino acids, represent diverse 3D structures: monomeric 3α and 4β + α folds. The reason for this diversity is seen in the different structure of the hydrophobic core, a result of synergy that generates a system in which the polypeptide chain as a whole participates. On the basis of the fuzzy oil drop model, where the structure of the hydrophobic core is expressed by means of the hydrophobic distribution function in the form of a 3D Gaussian distribution, it has been shown that the composition of the hydrophobic core in these two structural forms is different. In addition, the use of a model to determine the structure of the early intermediate in the folding process makes it possible to identify differences in the polypeptide chain geometry, which, combined with the construction of a common hydrophobic nucleus as an effect of specific synergy, may indicate the reason for the diversity of the folding process of the polypeptide chain. The results indicate the need to take into account the presence of an external force field originating from the water environment and its active impact on the formation of a hydrophobic core, whose participation in the stabilization of the tertiary structure is fundamental.
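The idea of describing a hydrophobic core with a 3D Gaussian can be illustrated with a minimal sketch. It assumes residue positions are already expressed in a coordinate frame centred on the molecule and that one sigma per axis is given. The function names, the example coordinates, and the use of a Kullback-Leibler divergence to compare an observed profile with the idealised one are illustrative assumptions, not details taken from the abstract.

```python
import math


def gaussian_hydrophobicity(coords, sigmas):
    """Idealised hydrophobicity at each point from a 3D Gaussian centred at the origin.

    The values are normalised so that they sum to 1 (hypothetical helper).
    """
    sx, sy, sz = sigmas
    raw = [
        math.exp(-x * x / (2 * sx * sx))
        * math.exp(-y * y / (2 * sy * sy))
        * math.exp(-z * z / (2 * sz * sz))
        for x, y, z in coords
    ]
    total = sum(raw)
    return [value / total for value in raw]


def kl_divergence(observed, theoretical):
    """Kullback-Leibler divergence (in bits) of an observed profile from the idealised one."""
    return sum(
        o * math.log2(o / t) for o, t in zip(observed, theoretical) if o > 0
    )


# Three made-up residue positions (in angstroms) at increasing distance from the centre.
example_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 6.0, 0.0)]
theoretical = gaussian_hydrophobicity(example_coords, sigmas=(3.0, 3.0, 3.0))
```

In this sketch, positions closer to the centre receive higher idealised hydrophobicity, and a divergence of zero means the observed profile matches the Gaussian exactly.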
Affiliation(s)
- Mateusz Banach
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Lazarza 16, 31-533 Krakow, Poland
- Piotr Fabian
- Institute of Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Katarzyna Stapor
- Institute of Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland
- Leszek Konieczny
- Chair of Medical Biochemistry, Medical College, Jagiellonian University, Kopernika 7, 31-034 Krakow, Poland
- Irena Roterman
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Lazarza 16, 31-533 Krakow, Poland
|
116
|
Getting to Know Your Neighbor: Protein Structure Prediction Comes of Age with Contextual Machine Learning. J Comput Biol 2020; 27:796-814. [DOI: 10.1089/cmb.2019.0193] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 12/12/2022] Open
|
117
|
Abbass J, Nebel JC. Enhancing fragment-based protein structure prediction by customising fragment cardinality according to local secondary structure. BMC Bioinformatics 2020; 21:170. [PMID: 32357827 PMCID: PMC7195757 DOI: 10.1186/s12859-020-3491-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/30/2019] [Accepted: 04/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Whenever suitable template structures are not available, fragment-based protein structure prediction becomes the only practical alternative, as pure ab initio techniques require massive computational resources even for very small proteins. However, the inaccuracy of their energy functions and their stochastic nature require the generation of a large number of decoys to explore the solution space adequately, limiting their usage to small proteins. Taking advantage of the uneven complexity of the sequence-structure relationship of short fragments, we adjusted the fragment insertion process by customising the number of available fragment templates according to the expected complexity of the predicted local secondary structure. Whereas the number of fragments is kept at its default value for coil regions, important and dramatic reductions are proposed for beta sheet and alpha helical regions, respectively. RESULTS Our fragment selection approach was evaluated using an enhanced version of the popular Rosetta fragment-based protein structure prediction tool. It was modified so that the number of fragment candidates used in Rosetta could be adjusted based on the local secondary structure. Compared to Rosetta's standard predictions, our strategy delivered improved first models, +24% and +6% in terms of GDT, when using 2000 and 20,000 decoys, respectively, while significantly reducing the number of fragment candidates. Furthermore, with 2000 decoys our enhanced version of Rosetta delivers a performance equivalent to that of standard Rosetta using 20,000 decoys. We hypothesise that, as the fragment insertion process focuses on the most challenging regions, such as coils, fewer decoys are needed to explore conformation spaces satisfactorily.
CONCLUSIONS Taking advantage of the high accuracy of sequence-based secondary structure predictions, we showed the value of that information for customising the number of candidates used during the fragment insertion process of fragment-based protein structure prediction. Experiments conducted using standard Rosetta showed that, when using the recommended number of decoys, i.e. 20,000, our strategy produces better results. Alternatively, similar results can be achieved using only 2000 decoys. Consequently, we recommend the adoption of this strategy to either significantly improve model quality or reduce processing times by a factor of 10.
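The core idea, choosing how many fragment candidates to offer at each position from the predicted secondary structure, can be sketched as a simple lookup. The specific counts below (200 for coil, 50 for strand, 20 for helix) are hypothetical placeholders rather than the values used by the authors, and the per-residue granularity is a simplification, since real fragment windows span several residues.

```python
# Hypothetical candidate counts per predicted secondary-structure class.
# The abstract only states that coil keeps the default while strand and
# helix are reduced; the numbers here are assumptions for illustration.
DEFAULT_FRAGMENTS = 200
FRAGMENTS_BY_SS = {"C": DEFAULT_FRAGMENTS, "E": 50, "H": 20}


def fragments_per_position(ss_prediction, table=FRAGMENTS_BY_SS, default=DEFAULT_FRAGMENTS):
    """Return the number of fragment candidates to use at each position.

    ss_prediction is a string of per-residue labels: H (helix), E (strand), C (coil).
    Unknown labels fall back to the default count.
    """
    return [table.get(label, default) for label in ss_prediction]


def total_candidates(ss_prediction):
    """Total number of candidates over the whole sequence."""
    return sum(fragments_per_position(ss_prediction))


example_ss = "CCHHHHHHCCEEEECC"
example_counts = fragments_per_position(example_ss)
```

With these placeholder values, helical and strand positions contribute far fewer candidates than coil positions, which is the mechanism by which sampling effort would shift towards the harder coil regions.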
Affiliation(s)
- Jad Abbass
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
- Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
- Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE UK
|
118
|
Orengo C, Velankar S, Wodak S, Zoete V, Bonvin AMJJ, Elofsson A, Feenstra KA, Gerloff DL, Hamelryck T, Hancock JM, Helmer-Citterich M, Hospital A, Orozco M, Perrakis A, Rarey M, Soares C, Sussman JL, Thornton JM, Tuffery P, Tusnady G, Wierenga R, Salminen T, Schneider B. A community proposal to integrate structural bioinformatics activities in ELIXIR (3D-Bioinfo Community). F1000Res 2020; 9. [PMID: 32566135 PMCID: PMC7284151 DOI: 10.12688/f1000research.20559.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 03/05/2020] [Indexed: 12/11/2022] Open
Abstract
Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much needed innovative applications e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. 
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.
Affiliation(s)
- Christine Orengo
- Structural and Molecular Biology Department, University College, London, UK
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
- Shoshana Wodak
- VIB-VUB Center for Structural Biology, Brussels, Belgium
- Vincent Zoete
- Department of Oncology, Lausanne University, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alexandre M J J Bonvin
- Bijvoet Center, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands
- Arne Elofsson
- Science for Life Laboratory, Stockholm University, Solna, S-17121, Sweden
- K Anton Feenstra
- Dept. Computer Science, Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit, Amsterdam, 1081 HV, The Netherlands
- Dietland L Gerloff
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Thomas Hamelryck
- Bioinformatics center, Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
- Adam Hospital
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, 08028, Spain
- Modesto Orozco
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, 08028, Spain
- Matthias Rarey
- ZBH - Center for Bioinformatics, Universität Hamburg, Hamburg, D-20146, Germany
- Claudio Soares
- Instituto de Tecnologia Química e Biológica Antonio Xavier, Universidade Nova de Lisboa, Lisbon, Portugal
- Joel L Sussman
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, 76100, Israel
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
- Pierre Tuffery
- Ressource Parisienne en Bioinformatique Structurale, Université de Paris, Paris, F-75205, France
- Gabor Tusnady
- Membrane Bioinformatics Research Group, Institute of Enzymology, Budapest, H-1117, Hungary
- Tiina Salminen
- Structural Bioinformatics Laboratory, Åbo Akademi University, Turku, FI-20500, Finland
- Bohdan Schneider
- Institute of Biotechnology of the Czech Academy of Sciences, Vestec, CZ-25250, Czech Republic
|
119
|
Paul L, Mudogo CN, Mtei KM, Machunda RL, Ntie-Kang F. A computer-based approach for developing linamarase inhibitory agents. PHYSICAL SCIENCES REVIEWS 2020. [DOI: 10.1515/psr-2019-0098] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/15/2022]
Abstract
Cassava is a strategic crop, especially for developing countries. However, the presence of cyanogenic compounds in cassava products limits proper nutrient utilization. The poor availability of structural data for linamarase in the Protein Data Bank limits the full understanding of the enzyme, how to inhibit it, and its applications in different fields. There is a need to solve the three-dimensional (3-D) structure of linamarase from cassava. The structural elucidation will allow the development of a competitive inhibitor and various industrial applications of the enzyme. The goal of this review is to summarize and present the available 3-D structural models of the linamarase enzyme obtained using different computational strategies. This approach could help in determining the structure of linamarase and later guide the structure elucidation in silico and experimentally.
Affiliation(s)
- Lucas Paul
- The Department of Materials and Energy Science & Engineering, The Nelson Mandela African Institution of Science and Technology, P.O. Box 447, Arusha, Tanzania
- Department of Chemistry, Dar es Salaam University College of Education, P.O. Box 2329, 255 Dar es Salaam, Tanzania
- Celestin N. Mudogo
- Biochemistry and Molecularbiology, University of Hamburg Institute of Biochemistry and Molecularbiology, Hamburg, Germany
- Department of Basic Sciences, School of Medicine, University of Kinshasa, Kinshasa, Congo (Democratic Republic of the)
- Kelvin M. Mtei
- The Department of Water and Environmental Science and Engineering, The Nelson Mandela African Institution of Science and Technology, P.O. Box 447, Arusha, Tanzania
- Revocatus L. Machunda
- The Department of Water and Environmental Science and Engineering, The Nelson Mandela African Institution of Science and Technology, P.O. Box 447, Arusha, Tanzania
- Fidele Ntie-Kang
- Department of Pharmaceutical Chemistry, Martin-Luther University Halle-Wittenberg, Wolfgang-Langenbeck Str. 4, Halle (Saale) 06120, Germany
- Department of Informatics and Chemistry, University of Chemistry and Technology Prague, Technická 5, Prague 6, Dejvice 166 28, Czech Republic
- Department of Chemistry, University of Buea, P. O. Box 63, Buea, Cameroon
|
120
|
Protein Contact Map Prediction Based on ResNet and DenseNet. BIOMED RESEARCH INTERNATIONAL 2020; 2020:7584968. [PMID: 32337273 PMCID: PMC7165324 DOI: 10.1155/2020/7584968] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 02/04/2020] [Accepted: 03/05/2020] [Indexed: 11/18/2022]
Abstract
Residue-residue contact prediction has become an increasingly important tool for modeling the three-dimensional structure of a protein when no homologous structure is available. Ultradeep residual neural network (ResNet) has become the most popular method for making contact predictions because it captures the contextual information between residues. In this paper, we propose a novel deep neural network framework for contact prediction which combines ResNet and DenseNet. This framework uses 1D ResNet to process sequential features, and besides PSSM, SS3, and solvent accessibility, we have introduced a new feature, position-specific frequency matrix (PSFM), as an input. Using ResNet's residual module and identity mapping, it can effectively process sequential features after which the outer concatenation function is used for sequential and pairwise features. Prediction accuracy is improved following a final processing step using the dense connection of DenseNet. The prediction accuracy of the protein contact map shows that our method is more effective than other popular methods due to the new network architecture and the added feature input.
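The outer concatenation step mentioned above, which turns per-residue (sequential) features into per-pair features, can be sketched in plain Python. This is a minimal illustration assuming the simplest form, where the pair (i, j) receives the features of residue i followed by those of residue j. The function name and the toy feature values are hypothetical, and the paper's actual implementation may differ.

```python
def outer_concatenate(features):
    """Build pairwise features from per-residue features.

    features is a list of L feature vectors, each of length d.
    The result is an L x L grid where entry [i][j] is the vector for
    residue i followed by the vector for residue j (length 2 * d).
    """
    length = len(features)
    return [
        [list(features[i]) + list(features[j]) for j in range(length)]
        for i in range(length)
    ]


# Toy input: 3 residues with 2 features each (made-up numbers).
sequence_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
pairwise = outer_concatenate(sequence_features)
```

The resulting L x L x 2d grid has the same spatial shape as a contact map, which is what lets a 2D network consume features that started out as a 1D sequence.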
|
121
|
Tang H, Xu G, Zheng Q, Cheng Y, Zheng H, Li J, Yin Z, Liang F, Chen J. Treatment for acute flares of gout: A protocol for systematic review. Medicine (Baltimore) 2020; 99:e19668. [PMID: 32243400 PMCID: PMC7440275 DOI: 10.1097/md.0000000000019668] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/23/2020] [Accepted: 02/26/2020] [Indexed: 11/26/2022] Open
Abstract
INTRODUCTION The current evidence confirms the effectiveness and safety of several drug interventions in the treatment of acute flares of gout; however, the most preferred drugs are still unclear. We therefore seek to conduct a network meta-analysis that can systematically compare non-steroidal anti-inflammatory drugs (NSAIDs), COXIBs, colchicine, hormones, IL-1 receptor antagonists, and other drugs for acute gout based on the latest evidence. METHODS AND ANALYSIS Nine online databases will be searched from inception to September 1, 2019; there will be no language restrictions on the included trials. Randomized controlled trials that include patients with acute flares of gout receiving drug therapy versus a control group will be included. The selection of studies, risk of bias assessment and data extraction will be conducted by 2 independent researchers. Bayesian network meta-analysis will be applied using the Markov chain Monte Carlo method with Stata or R. The dichotomous data will be presented as risk ratios with 95% CIs and the continuous data will be presented as weighted mean differences or standardized mean differences with 95% CIs. Evidence quality will be evaluated using the GRADE system. ETHICS AND DISSEMINATION This network meta-analysis will not involve individuals' private information or imperil their rights, so ethical approval is not required. The results of this network meta-analysis may be published in a journal or presented at relevant conferences.
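To make the "risk ratio with 95% CI" summary concrete, the sketch below computes it for a single two-arm trial using the standard log-scale normal approximation. This is a frequentist pairwise calculation shown only for illustration. The protocol itself describes a Bayesian network meta-analysis, which would produce credible intervals from MCMC instead. The function name and the event counts are made up.

```python
import math


def risk_ratio_ci(events_treat, n_treat, events_ctrl, n_ctrl, z=1.96):
    """Risk ratio and its approximate 95% confidence interval for one trial.

    Uses the usual standard error of log(RR):
    sqrt(1/a - 1/n1 + 1/c - 1/n2), where a and c are event counts.
    """
    rr = (events_treat / n_treat) / (events_ctrl / n_ctrl)
    se_log_rr = math.sqrt(
        1 / events_treat - 1 / n_treat + 1 / events_ctrl - 1 / n_ctrl
    )
    low = math.exp(math.log(rr) - z * se_log_rr)
    high = math.exp(math.log(rr) + z * se_log_rr)
    return rr, low, high


# Hypothetical trial: 15/100 events under treatment vs 30/100 under control.
rr, low, high = risk_ratio_ci(15, 100, 30, 100)
```

For these made-up counts the risk ratio is 0.5, and the interval lies entirely below 1, which is how such a result would usually be read as favouring the treatment arm.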
Affiliation(s)
- Hongzhi Tang
- Outpatient department of Sichuan orthopedic hospital
- Guixing Xu
- The Acupuncture and Tuina School, The 3rd Teaching Hospital, Chengdu University of Traditional Chinese Medicine
- Qianhua Zheng
- The First Affiliated Hospital of Chengdu University of Chinese Medicine, Chengdu, Sichuan, China
- Ying Cheng
- The Acupuncture and Tuina School, The 3rd Teaching Hospital, Chengdu University of Traditional Chinese Medicine
- Hui Zheng
- The Acupuncture and Tuina School, The 3rd Teaching Hospital, Chengdu University of Traditional Chinese Medicine
- Juan Li
- The Acupuncture and Tuina School, The 3rd Teaching Hospital, Chengdu University of Traditional Chinese Medicine
- Zihan Yin
- The Acupuncture and Tuina School, The 3rd Teaching Hospital, Chengdu University of Traditional Chinese Medicine
- Fanrong Liang
- The Acupuncture and Tuina School, The 3rd Teaching Hospital, Chengdu University of Traditional Chinese Medicine
- Jiao Chen
- The Acupuncture and Tuina School, The 3rd Teaching Hospital, Chengdu University of Traditional Chinese Medicine
|
122
|
Abstract
The purpose of this quick guide is to help new modelers who have little or no background in comparative modeling yet are keen to produce high-resolution protein 3D structures for their study by following systematic good modeling practices, using affordable personal computers or online computational resources. Through the available experimental 3D-structure repositories, the modeler should be able to access and use the atomic coordinates for building homology models. We also aim to provide the modeler with the rationale for turning a simple list of atomic coordinates into a model that is suitable for computational analysis and abides by the principles of physics (e.g., molecular mechanics). Keeping that objective in mind, these quick tips cover the process of homology modeling and some postmodeling computations such as molecular docking and molecular dynamics (MD). A brief section is reserved for modeling nonprotein molecules, and a short case study of homology modeling is discussed.
Affiliation(s)
- Yazan Haddad
- Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
- Vojtech Adam
- Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
- Zbynek Heger
- Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
- Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
|
123
|
Koukos P, Bonvin A. Integrative Modelling of Biomolecular Complexes. J Mol Biol 2020; 432:2861-2881. [DOI: 10.1016/j.jmb.2019.11.009] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Received: 09/14/2019] [Revised: 11/12/2019] [Accepted: 11/13/2019] [Indexed: 12/31/2022]
|
124
|
Lin X, Li X, Lin X. A Review on Applications of Computational Methods in Drug Screening and Design. Molecules 2020; 25:E1375. [PMID: 32197324 PMCID: PMC7144386 DOI: 10.3390/molecules25061375] [Citation(s) in RCA: 235] [Impact Index Per Article: 58.8] [Received: 01/30/2020] [Revised: 03/16/2020] [Accepted: 03/16/2020] [Indexed: 12/27/2022] Open
Abstract
Drug development is one of the most significant processes in the pharmaceutical industry. Various computational methods have dramatically reduced the time and cost of drug discovery. In this review, we first discussed the roles of multiscale biomolecular simulations in identifying drug binding sites on the target macromolecule and elucidating drug action mechanisms. Then, virtual screening methods (e.g., molecular docking, pharmacophore modeling, and QSAR) as well as structure- and ligand-based classical/de novo drug design were introduced and discussed. Last, we explored the development of machine learning methods and their applications in the aforementioned computational methods to speed up the drug discovery process. Several application examples that combine various methods were also discussed. A combination of different methods to jointly solve tough problems at different scales and dimensions will be an inevitable trend in drug screening and design.
Affiliation(s)
- Xiaoqian Lin
- Institute of Single Cell Engineering, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
- Xiu Li
- School of Chemistry and Material Science, Shanxi Normal University, Linfen 041004, China
- Xubo Lin
- Institute of Single Cell Engineering, Beijing Advanced Innovation Center for Biomedical Engineering, Beihang University, Beijing 100191, China
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100191, China
|
125
|
Smolarczyk T, Roterman-Konieczna I, Stapor K. Protein Secondary Structure Prediction: A Review of Progress and Directions. Curr Bioinform 2020. [DOI: 10.2174/1574893614666191017104639] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 11/22/2022]
Abstract
Background: Over the last few decades, a search for the theory of protein folding has grown into a full-fledged research field at the intersection of biology, chemistry and informatics. Despite enormous effort, there are still open questions and challenges, like understanding the rules by which amino acid sequence determines protein secondary structure.
Objective: In this review, we depict the progress of the prediction methods over the years and identify sources of improvement.
Methods: The protein secondary structure prediction problem is described followed by the discussion on theoretical limitations, description of the commonly used data sets, features and a review of three generations of methods with the focus on the most recent advances. Additionally, methods with available online servers are assessed on the independent data set.
Results: The state-of-the-art methods are currently reaching almost 88% for 3-class prediction and 76.5% for an 8-class prediction.
Conclusion: This review summarizes recent advances and outlines further research directions.
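The 3-class and 8-class figures quoted in the Results are per-residue accuracies, often called Q3 and Q8. A minimal sketch of how such a score is computed is given below. The 8-to-3 state reduction shown (H, G, I to helix; E, B to strand; everything else to coil) is one common convention and is an assumption here, as is every name in the code.

```python
# One common (assumed) reduction of the 8 DSSP states to 3 classes.
DSSP8_TO_3 = {
    "H": "H", "G": "H", "I": "H",   # helices
    "E": "E", "B": "E",             # strands and bridges
    "T": "C", "S": "C", "-": "C",   # turns, bends, and other
}


def reduce_to_3(ss8):
    """Map an 8-state secondary-structure string to 3 states (H, E, C)."""
    return "".join(DSSP8_TO_3.get(state, "C") for state in ss8)


def q_accuracy(predicted, observed):
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy example: one mismatch out of eight residues.
example_q3 = q_accuracy("HHHEECCC", "HHHEEECC")
```

The same `q_accuracy` function gives Q8 when applied to 8-state strings and Q3 when both strings are first passed through `reduce_to_3`.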
Affiliation(s)
- Tomasz Smolarczyk
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| | - Irena Roterman-Konieczna
- Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Krakow, Poland
| | - Katarzyna Stapor
- Institute of Informatics, Silesian University of Technology, Gliwice, Poland
| |
|
126
|
Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020; 10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022] Open
Abstract
Amino acids form protein 3D structures in unique manners such that the folded structure is stable and functional under physiological conditions. Non-specific and non-covalent interactions between amino acids exhibit neighborhood preferences. Based on structural information from the protein data bank, a statistical energy function was derived to quantify amino acid neighborhood preferences. The neighborhood of one amino acid is defined by its contacting residues, and the energy function is determined by the neighboring residue types and relative positions. The neighborhood preference of amino acids was exploited to facilitate structural quality assessment, which was implemented in the neighborhood preference program NEPRE. The source codes are available via https://github.com/LiuLab-CSRC/NePre.
Affiliation(s)
- Siyuan Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Xilun Xiang
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Xiang Gao
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
| | - Haiguang Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China.
- Physics Department, Beijing Normal University, Haidian, Beijing, 100875, China.
| |
|
127
|
Eguchi RR, Huang PS. Multi-scale structural analysis of proteins by deep semantic segmentation. Bioinformatics 2020; 36:1740-1749. [PMID: 31424530 PMCID: PMC7075530 DOI: 10.1093/bioinformatics/btz650] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2018] [Revised: 07/29/2019] [Accepted: 08/18/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Recent advances in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation-a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structure quality assessment. RESULTS We train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model achieves a high per-residue accuracy of 90.8% on the test set (95.0% average per-class accuracy; 87.8% average per-structure accuracy). We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design. AVAILABILITY AND IMPLEMENTATION The trained classifier network, parser network, and entropy calculation scripts are available for download at https://git.io/fp6bd, with detailed usage instructions provided at the download page. A step-by-step tutorial for setup is provided at https://goo.gl/e8GB2S. All Rosetta commands, RosettaRemodel blueprints, and predictions for all datasets used in the study are available in the Supplementary Information. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Raphael R Eguchi
- Department of Biochemistry, School of Medicine, Stanford University, Shriram Center for Bioengineering and Chemical Engineering, 443 via Ortega, Room 036, Stanford, CA 94305, USA
| | - Po-Ssu Huang
- Department of Bioengineering, Schools of Engineering and Medicine, Stanford University Shriram Center for Bioengineering and Chemical Engineering, 443 via Ortega, Room 036, Stanford, CA 94305, USA
| |
|
128
|
Self-organized emergence of folded protein-like network structures from geometric constraints. PLoS One 2020; 15:e0229230. [PMID: 32106258 PMCID: PMC7046222 DOI: 10.1371/journal.pone.0229230] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/02/2019] [Accepted: 01/31/2020] [Indexed: 12/13/2022] Open
Abstract
The intricate three-dimensional geometries of protein tertiary structures underlie protein function and emerge through a folding process from one-dimensional chains of amino acids. The exact spatial sequence and configuration of amino acids, the biochemical environment and the temporal sequence of distinct interactions yield a complex folding process that cannot yet be easily tracked for all proteins. To gain qualitative insights into the fundamental mechanisms behind the folding dynamics and generic features of the folded structure, we propose a simple model of structure formation that takes into account only fundamental geometric constraints and otherwise assumes randomly paired connections. We find that despite its simplicity, the model results in a network ensemble consistent with key overall features of the ensemble of Protein Residue Networks we obtained from more than 1000 biological protein geometries as available through the Protein Data Bank. Specifically, the distribution of the number of interaction neighbors a unit (amino acid) has, the scaling of the structure’s spatial extent with chain length, the eigenvalue spectrum and the scaling of the smallest relaxation time with chain length are all consistent between model and real proteins. These results indicate that geometric constraints alone may already account for a number of generic features of protein tertiary structures.
|
129
|
Alapati R, Shuvo MH, Bhattacharya D. SPECS: Integration of side-chain orientation and global distance-based measures for improved evaluation of protein structural models. PLoS One 2020; 15:e0228245. [PMID: 32053611 PMCID: PMC7018003 DOI: 10.1371/journal.pone.0228245] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2019] [Accepted: 01/11/2020] [Indexed: 12/23/2022] Open
Abstract
Significant advancements in the field of protein structure prediction have necessitated the need for objective and robust evaluation of protein structural models by comparing predicted models against the experimentally determined native structures to quantitate their structural similarities. Existing protein model versus native similarity metrics either consider the distances between alpha carbon (Cα) or side-chain atoms for computing the similarity. However, side-chain orientation of a protein plays a critical role in defining its conformation at the atomic-level. Despite its importance, inclusion of side-chain orientation in structural similarity evaluation has not yet been addressed. Here, we present SPECS, a side-chain-orientation-included protein model-native similarity metric for improved evaluation of protein structural models. SPECS combines side-chain orientation and global distance based measures in an integrated framework using the united-residue model of polypeptide conformation for computing model-native similarity. Experimental results demonstrate that SPECS is a reliable measure for evaluating structural similarity at the global level including and beyond the accuracy of Cα positioning. Moreover, SPECS delivers superior performance in capturing local quality aspect compared to popular global Cα positioning-based metrics ranging from models at near-experimental accuracies to models with correct overall folds-making it a robust measure suitable for both high- and moderate-resolution models. Finally, SPECS is sensitive to minute variations in side-chain χ angles even for models with perfect Cα trace, revealing the power of including side-chain orientation. Collectively, SPECS is a versatile evaluation metric covering a wide spectrum of protein modeling scenarios and simultaneously captures complementary aspects of structural similarities at multiple levels of granularities. SPECS is freely available at http://watson.cse.eng.auburn.edu/SPECS/.
Affiliation(s)
- Rahul Alapati
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
| | - Md. Hossain Shuvo
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Department of Biological Sciences, Auburn University, Auburn, Alabama, United States of America
| |
|
130
|
The Order-Disorder Continuum: Linking Predictions of Protein Structure and Disorder through Molecular Simulation. Sci Rep 2020; 10:2068. [PMID: 32034199 PMCID: PMC7005769 DOI: 10.1038/s41598-020-58868-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2019] [Accepted: 10/16/2019] [Indexed: 12/11/2022] Open
Abstract
Intrinsically disordered proteins (IDPs) and intrinsically disordered regions within proteins (IDRs) serve an increasingly expansive list of biological functions, including regulation of transcription and translation, protein phosphorylation, cellular signal transduction, as well as mechanical roles. The strong link between protein function and disorder motivates a deeper fundamental characterization of IDPs and IDRs for discovering new functions and relevant mechanisms. We review recent advances in experimental techniques that have improved identification of disordered regions in proteins. Yet, experimentally curated disorder information still does not currently scale to the level of experimentally determined structural information in folded protein databases, and disorder predictors rely on several different binary definitions of disorder. To link secondary structure prediction algorithms developed for folded proteins and protein disorder predictors, we conduct molecular dynamics simulations on representative proteins from the Protein Data Bank, comparing secondary structure and disorder predictions with simulation results. We find that structure predictor performance from neural networks can be leveraged for the identification of highly dynamic regions within molecules, linked to disorder. Low accuracy structure predictions suggest a lack of static structure for regions that disorder predictors fail to identify. While disorder databases continue to expand, secondary structure predictors and molecular simulations can improve disorder predictor performance, which aids discovery of novel functions of IDPs and IDRs. These observations provide a platform for the development of new, integrated structural databases and fusion of prediction tools toward protein disorder characterization in health and disease.
|
131
|
Yazhini A, Srinivasan N. How good are comparative models in the understanding of protein dynamics? Proteins 2020; 88:874-888. [PMID: 31999374 DOI: 10.1002/prot.25879] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2019] [Revised: 01/04/2020] [Accepted: 01/25/2020] [Indexed: 12/27/2022]
Abstract
The 3D structure of a protein is essential to understand protein dynamics. If experimentally determined structure is unavailable, comparative models could be used to infer dynamics. However, the effectiveness of comparative models, compared to experimental structures, in inferring dynamics is not clear. To address this, we compared dynamics features of ~800 comparative models with their crystal structures using normal mode analysis. Average similarity in magnitude, direction, and correlation of residue motions is >0.8 (where value 1 is identical) indicating that the dynamics of models and crystal structures are highly similar. Accuracy of 3D structure and dynamics is significantly higher for models built on multiple and/or high sequence identity templates (>40%). Three-dimensional (3D) structure and residue fluctuations of models are closer to that of crystal structures than to templates (TM score 0.9 vs 0.7 and square inner product 0.92 vs 0.88). Furthermore, long-range molecular dynamics simulations on comparative models of RNase 1 and Angiogenin showed significant differences in the conformational sampling of conserved active-site residues that characterize differences in their activity levels. Similar analyses on two EGFR kinase variant models highlight the effect of mutations on the functional state-specific αC helix motions and these results corroborate with the previous experimental observations. Thus, our study adds confidence to the use of comparative models in understanding protein dynamics.
Affiliation(s)
- Arangasamy Yazhini
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka, India
| | | |
|
132
|
Magnus M, Antczak M, Zok T, Wiedemann J, Lukasiak P, Cao Y, Bujnicki JM, Westhof E, Szachniuk M, Miao Z. RNA-Puzzles toolkit: a computational resource of RNA 3D structure benchmark datasets, structure manipulation, and evaluation tools. Nucleic Acids Res 2020; 48:576-588. [PMID: 31799609 PMCID: PMC7145511 DOI: 10.1093/nar/gkz1108] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2019] [Revised: 11/06/2019] [Accepted: 11/15/2019] [Indexed: 12/12/2022] Open
Abstract
Significant improvements have been made in the efficiency and accuracy of RNA 3D structure prediction methods during the succeeding challenges of RNA-Puzzles, a community-wide effort on the assessment of blind prediction of RNA tertiary structures. The RNA-Puzzles contest has shown, among others, that the development and validation of computational methods for RNA fold prediction strongly depend on the benchmark datasets and the structure comparison algorithms. Yet, there has been no systematic benchmark set or decoy structures available for the 3D structure prediction of RNA, hindering the standardization of comparative tests in the modeling of RNA structure. Furthermore, there has not been a unified set of tools that allows deep and complete RNA structure analysis, and at the same time, that is easy to use. Here, we present RNA-Puzzles toolkit, a computational resource including (i) decoy sets generated by different RNA 3D structure prediction methods (raw, for-evaluation and standardized datasets), (ii) 3D structure normalization, analysis, manipulation, visualization tools (RNA_format, RNA_normalizer, rna-tools) and (iii) 3D structure comparison metric tools (RNAQUA, MCQ4Structures). This resource provides a full list of computational tools as well as a standard RNA 3D structure prediction assessment protocol for the community.
Affiliation(s)
- Marcin Magnus
- International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- ReMedy-International Research Agenda Unit, Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
| | - Maciej Antczak
- Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
| | - Tomasz Zok
- Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
| | - Jakub Wiedemann
- Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
| | - Piotr Lukasiak
- Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
| | - Yang Cao
- Center of Growth, Metabolism and Aging, Key Laboratory of Bio-Resource and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, Chengdu 610065, PR China
| | - Janusz M Bujnicki
- International Institute of Molecular and Cell Biology in Warsaw, 02-109 Warsaw, Poland
- Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Poznan, Poland
| | - Eric Westhof
- Architecture et Réactivité de l’ARN, Université de Strasbourg, Institut de biologie moléculaire et cellulaire du CNRS, 12 allée Konrad Roentgen, 67084 Strasbourg, France
| | - Marta Szachniuk
- Institute of Computing Science & European Centre for Bioinformatics and Genomics, Poznan University of Technology, 60-965 Poznan, Poland
- Institute of Bioorganic Chemistry, Polish Academy of Sciences, 61-704 Poznan, Poland
| | - Zhichao Miao
- Translational Research Institute of Brain and Brain-Like Intelligence and Department of Anesthesiology, Shanghai Fourth People's Hospital Affiliated to Tongji University School of Medicine, Shanghai 200081, China
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge CB10 1SD, UK
- Newcastle Fibrosis Research Group, Institute of Cellular Medicine, Faculty of Medical Sciences, Newcastle University, Newcastle upon Tyne, UK
| |
|
133
|
Abstract
A class of secondary structure prediction algorithms uses statistics on the residue pairs found in secondary structural elements. Because the protein folding process is dominated by backbone hydrogen bonding, an approach based on backbone hydrogen-bonded residue pairings should improve the predictive capability of this class of algorithms. The reliability of the prediction algorithms depends on the quality of the statistics and, therefore, of the data set. This study aimed to determine the propensities of backbone hydrogen-bonded residue pairings for the secondary structural elements α-helix and β-sheet in globular proteins, using a new and comprehensive data set created from the peptides deposited in the Worldwide Protein Data Bank. A master data set of 4882 globular peptide chains with resolution better than 2.5 Å, sequence identity below 25% and length of at least 100 residues was created. Separate subsets were also created from the master set for helix and sheet structures, containing 4594 and 4483 chains, respectively. Backbone hydrogen-bonded residue pairings in helices and sheets were detected, and their propensities were represented as odds ratios (observed/[random or expected]) in matrices. The propensities assigned by this study to the residue pairings in secondary structural elements (helix, overall strands, parallel strands and antiparallel strands) differ from those of previous studies by 19 to 34%. These differences are substantial and could lead to further improvements in secondary structure prediction algorithms.
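The odds ratio named in this abstract (observed over random or expected) can be illustrated with a short sketch. The function name and the toy pair list are hypothetical. The null model is an assumption: the expected frequency of an unordered pair is taken from the product of the two residues' frequencies among all paired positions. The study may define its expectation differently.

```python
from collections import Counter


def pairing_odds_ratios(pairs):
    """Odds ratio (observed / expected) for each unordered residue pair.

    `pairs` is a list of 2-tuples of one-letter residue codes, one tuple
    per backbone hydrogen-bonded pairing. The expected fraction assumes
    residues pair at random in proportion to how often they occur among
    the paired positions (an illustrative null model).
    """
    pair_counts = Counter(tuple(sorted(p)) for p in pairs)
    n_pairs = sum(pair_counts.values())
    residue_counts = Counter(res for p in pairs for res in p)
    n_residues = 2 * n_pairs

    odds = {}
    for (a, b), observed in pair_counts.items():
        freq_a = residue_counts[a] / n_residues
        freq_b = residue_counts[b] / n_residues
        # An unordered pair of two different residues can arise in two orders.
        expected_fraction = freq_a * freq_b if a == b else 2 * freq_a * freq_b
        odds[(a, b)] = (observed / n_pairs) / expected_fraction
    return odds


# Toy input (made-up pairings, not data from the study).
toy_pairs = [("V", "I"), ("I", "V"), ("V", "V"), ("A", "L")]
odds = pairing_odds_ratios(toy_pairs)
for pair, ratio in sorted(odds.items()):
    print(pair, round(ratio, 2))
```

A ratio above 1 marks a pairing seen more often than the null model predicts, and a ratio below 1 marks one seen less often. With a real data set the ratios would be arranged in a 20 × 20 matrix per secondary structure type.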
Affiliation(s)
- Cevdet Nacar
- Department of Biophysics, School of Medicine, Marmara University, Istanbul, Turkey.
| |
|
134
|
Lensink MF, Nadzirin N, Velankar S, Wodak SJ. Modeling protein‐protein, protein‐peptide, and protein‐oligosaccharide complexes: CAPRI 7th edition. Proteins 2020; 88:916-938. [DOI: 10.1002/prot.25870] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2019] [Revised: 12/19/2019] [Accepted: 12/26/2019] [Indexed: 12/19/2022]
Affiliation(s)
- Marc F. Lensink
- University of Lille, CNRS UMR8576 UGSF, Unité de Glycobiologie Structurale et Fonctionnelle, F‐59000 Lille, France
| | - Nurul Nadzirin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Cambridge, UK
| | - Sameer Velankar
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL‐EBI), Wellcome Trust Genome Campus, Cambridge, UK
| | | |
|
135
|
Integrative Structural Biology of Protein-RNA Complexes. Structure 2020; 28:6-28. [DOI: 10.1016/j.str.2019.11.017] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2019] [Revised: 11/17/2019] [Accepted: 11/27/2019] [Indexed: 12/16/2022]
|
136
|
Abstract
There is a large gap between the numbers of known protein-protein interactions and the corresponding experimentally solved structures of protein complexes. Fortunately, this gap can be in part bridged by computational structure modeling methods. Currently, template-based modeling is the most accurate means to predict both individual protein structures and protein complexes. One of the major issues in template-based modeling is to identify homologous structures that could be utilized as templates. To simplify this task, we have developed the PPI3D web server. The server is not only able to search for homologous protein complexes, but also provides means to analyze identified interactions and to model protein complexes. In recent CASP and CAPRI experiments, PPI3D proved to be a useful tool for homology modeling of multimeric proteins. In this chapter, we provide a brief description of the PPI3D web server capabilities and how to use the server for modeling of protein complexes.
Affiliation(s)
- Justas Dapkūnas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania.
| |
|
137
|
Grazhdankin E, Stepniewski M, Xhaard H. Modeling membrane proteins: The importance of cysteine amino-acids. J Struct Biol 2020; 209:107400. [DOI: 10.1016/j.jsb.2019.10.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/09/2019] [Revised: 09/11/2019] [Accepted: 10/03/2019] [Indexed: 12/14/2022]
|
138
|
Olechnovič K, Monastyrskyy B, Kryshtafovych A, Venclovas Č. Comparative analysis of methods for evaluation of protein models against native structures. Bioinformatics 2019; 35:937-944. [PMID: 30169622 DOI: 10.1093/bioinformatics/bty760] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2018] [Revised: 08/04/2018] [Accepted: 08/28/2018] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION Measuring discrepancies between protein models and native structures is at the heart of development of protein structure prediction methods and comparison of their performance. A number of different evaluation methods have been developed; however, their comprehensive and unbiased comparison has not been performed. RESULTS We carried out a comparative analysis of several popular model assessment methods (RMSD, TM-score, GDT, QCS, CAD-score, LDDT, SphereGrinder and RPF) to reveal their relative strengths and weaknesses. The analysis, performed on a large and diverse model set derived in the course of three latest community-wide CASP experiments (CASP10-12), had two major directions. First, we looked at general differences between the scores by analyzing distribution, correspondence and correlation of their values as well as differences in selecting best models. Second, we examined the score differences taking into account various structural properties of models (stereochemistry, hydrogen bonds, packing of domains and chain fragments, missing residues, protein length and secondary structure). Our results provide a solid basis for an informed selection of the most appropriate score or combination of scores depending on the task at hand. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
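To make the contrast between two of the listed scores concrete, here is a minimal sketch of RMSD and a simplified GDT_TS computed from per-residue Cα deviations. The function names and coordinates are hypothetical. The sketch assumes the model is already superimposed on the native structure. The real GDT procedure searches over many superpositions to maximize each cutoff's coverage, so this fixed-superposition version is a simplification.

```python
import math


def ca_distances(model, native):
    """Per-residue C-alpha deviations for two pre-superimposed structures."""
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and equal length")
    return [math.dist(m, n) for m, n in zip(model, native)]


def rmsd(distances):
    """Root-mean-square deviation over all residues."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues within each cutoff (fixed superposition)."""
    fractions = [sum(d <= c for d in distances) / len(distances)
                 for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy coordinates in angstroms (made up): three residues placed well,
# one residue displaced by 10 angstroms.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

deviations = ca_distances(model, native)
print(f"RMSD   = {rmsd(deviations):.2f} A")
print(f"GDT_TS = {gdt_ts(deviations):.2f}")
```

In this toy case the single displaced residue dominates the RMSD, while the cutoff-based score still credits the three well-placed residues. This is one reason the two measures can rank models differently.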
Affiliation(s)
- Kliment Olechnovič
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, Lithuania
| | | | | | - Česlovas Venclovas
- Institute of Biotechnology Life Sciences Center Vilnius University, Saulėtekio 7, Vilnius, Lithuania
| |
|
139
|
Contreras S, Bertolani SJ, Siegel JB. A Benchmark for Homomeric Enzyme Active Site Structure Prediction Highlights the Importance of Accurate Modeling of Protein Symmetry. ACS OMEGA 2019; 4:22356-22362. [PMID: 31909318 PMCID: PMC6941179 DOI: 10.1021/acsomega.9b02636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/15/2019] [Accepted: 12/04/2019] [Indexed: 05/15/2023]
Abstract
Accurate prediction and modeling of an enzyme's active site are critical for engineering efforts as well as providing insight into an enzyme's naturally occurring function. Previous efforts demonstrated that the integration of constraints enforcing strict geometric orientations between catalytic residues significantly improved the modeling accuracy for the active sites of monomeric enzymes. In this study, a similar approach was explored to evaluate the effect on the active sites of homomeric enzymes. A benchmark of 17 homomeric enzymes with known structures and a bound ligand relevant to the established chemistry were identified from the protein data bank. The enzymes identified span multiple classes as well as symmetries. Unlike what was observed for the monomeric enzymes, upon the application of catalytic geometric constraints, there was no significant improvement observed in modeling accuracy for either the active site of the protein structure or the accuracy of the subsequently docked ligand. Upon further analysis, it is apparent that the symmetric interface being modeled is inaccurate and prevented the active sites from being modeled at atomic-level accuracy. This is consistent with the challenge others have identified in being able to predict de novo protein symmetry. To further improve the accuracy of active site modeling for homomeric proteins, new methodologies to accurately model the symmetric interfaces of these complexes are needed.
Affiliation(s)
- Stephanie C. Contreras
- Department of Chemistry, Department of Biochemistry and Molecular Medicine, and Genome Center, University of California, Davis, Davis, California 95616, United States
| | - Steve J. Bertolani
- Department of Chemistry, Department of Biochemistry and Molecular Medicine, and Genome Center, University of California, Davis, Davis, California 95616, United States
| | - Justin B. Siegel
- Department of Chemistry, Department of Biochemistry and Molecular Medicine, and Genome Center, University of California, Davis, Davis, California 95616, United States
| |
|
140
|
Mao W, Ding W, Xing Y, Gong H. AmoebaContact and GDFold as a pipeline for rapid de novo protein structure prediction. NAT MACH INTELL 2019. [DOI: 10.1038/s42256-019-0130-4] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
|
141
|
Revisiting the "satisfaction of spatial restraints" approach of MODELLER for protein homology modeling. PLoS Comput Biol 2019; 15:e1007219. [PMID: 31846452 PMCID: PMC6938380 DOI: 10.1371/journal.pcbi.1007219] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2019] [Revised: 12/31/2019] [Accepted: 11/13/2019] [Indexed: 01/02/2023] Open
Abstract
The most frequently used approach for protein structure prediction is currently homology modeling. The 3D model building phase of this methodology is critical for obtaining an accurate and biologically useful prediction. The most widely employed tool to perform this task is MODELLER. This program implements the “modeling by satisfaction of spatial restraints” strategy and its core algorithm has not been altered significantly since the early 1990s. In this work, we have explored the idea of modifying MODELLER with two effective, yet computationally light strategies to improve its 3D modeling performance. Firstly, we have investigated how the level of accuracy in the estimation of structural variability between a target protein and its templates in the form of σ values profoundly influences 3D modeling. We show that the σ values produced by MODELLER are on average weakly correlated to the true level of structural divergence between target-template pairs and that increasing this correlation greatly improves the program’s predictions, especially in multiple-template modeling. Secondly, we have inquired into how the incorporation of statistical potential terms (such as the DOPE potential) in MODELLER’s objective function positively impacts 3D modeling quality by providing a small but consistent improvement in metrics such as GDT-HA and lDDT and a large increase in stereochemical quality. Python modules to harness this second strategy are freely available at https://github.com/pymodproject/altmod. In summary, we show that there is considerable room for improving MODELLER in terms of 3D modeling quality and we propose strategies that could be pursued in order to further increase its performance.
Proteins are fundamental biological molecules that carry out countless activities in living beings. Since the function of proteins is dictated by their three-dimensional atomic structures, acquiring structural details of proteins provides deep insights into their function. Currently, the most frequently used computational approach for protein structure prediction is template-based modeling. In this approach, a target protein is modeled using the experimentally-derived structural information of a template protein assumed to have a similar structure to the target. MODELLER is the most frequently used program for template-based 3D model building. Despite its success, its predictions are not always accurate enough to be useful in Biomedical Research. Here, we show that it is possible to greatly increase the performance of MODELLER by modifying two aspects of its algorithm. First, we demonstrate that providing the program with accurate estimations of local target-template structural divergence greatly increases the quality of its predictions. Additionally, we show that modifying MODELLER’s scoring function with statistical potential energetic terms also helps to improve modeling quality. This work will be useful in future research, since it reports practical strategies to improve the performance of this core tool in Structural Bioinformatics.
|
142
|
On the possible origin of protein homochirality, structure, and biochemical function. Proc Natl Acad Sci U S A 2019; 116:26571-26579. [PMID: 31822617 DOI: 10.1073/pnas.1908241116] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 01/12/2023]
Abstract
Living systems have chiral molecules, e.g., native proteins that almost entirely contain L-amino acids. How protein homochirality emerged from a background of equal numbers of L and D amino acids is among many questions about life's origin. The origin of homochirality and its implications are explored in computer simulations examining the stability and structural and functional properties of an artificial library of compact proteins containing 1:1 (termed demi-chiral), 3:1, and 1:3 ratios of D:L and purely L or D amino acids generated without functional selection. Demi-chiral proteins have shorter secondary structures and fewer internal hydrogen bonds and are less stable than homochiral proteins. Selection for hydrogen bonding yields a preponderance of L or D amino acids. Demi-chiral proteins have native global folds, including similarity to early ribosomal proteins, similar small molecule ligand binding pocket geometries, and many constellations of L-chiral amino acids with a 1.0-Å RMSD to native enzyme active sites. For a representative subset containing 550 active site geometries matching 457 (2) 4-digit (3-digit) enzyme classification (E.C.) numbers, native active site amino acids were generated at random for 472 of 550 cases. This increases to 548 of 550 cases when similar residues are allowed. The most frequently generated sequences correspond to ancient enzymatic functions, e.g., glycolysis, replication, and nucleotide biosynthesis. Surprisingly, even without selection, demi-chiral proteins possess the requisite marginal biochemical function and structure of modern proteins, but are thermodynamically less stable. If demi-chiral proteins were present, they could engage in early metabolism, which created the feedback loop for transcription and cell formation.
|
143
|
Zhang C, Lane L, Omenn GS, Zhang Y. Blinded Testing of Function Annotation for uPE1 Proteins by I-TASSER/COFACTOR Pipeline Using the 2018-2019 Additions to neXtProt and the CAFA3 Challenge. J Proteome Res 2019; 18:4154-4166. [PMID: 31581775 PMCID: PMC6900986 DOI: 10.1021/acs.jproteome.9b00537] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Indexed: 12/19/2022]
Abstract
In 2018, we reported a hybrid pipeline that predicts protein structures with I-TASSER and function with COFACTOR. I-TASSER/COFACTOR achieved high Gene Ontology (GO) prediction accuracies of Fmax = 0.69 and 0.57 for molecular function (MF) and biological process (BP), respectively, on 100 comprehensively annotated proteins. Now we report blinded analyses of newly annotated proteins in the third Critical Assessment of Function Annotation (CAFA3) function prediction challenge and in neXtProt. For CAFA3 results released in May 2019, our predictions on 267 and 912 human proteins with newly annotated MF and BP terms achieved Fmax = 0.50 and 0.42, respectively, on "No Knowledge" proteins, and 0.51 and 0.74, respectively, on "Limited Knowledge" proteins. While COFACTOR consistently outperforms simple homology-based analysis, its accuracy still depends on template availability. Meanwhile, in neXtProt 2019-01, 25 proteins acquired new function annotation through literature curation at UniProt/Swiss-Prot. Before the release of these curated results, we submitted to neXtProt blinded predictions of free-text function annotation based on predicted GO terms. For 10 of the 25, a good match of free-text or GO term annotation was obtained. These blind tests represent rigorous assessments of I-TASSER/COFACTOR. neXtProt now provides links to precomputed I-TASSER/COFACTOR predictions for proteins without function annotation to facilitate experimental planning on "dark proteins".
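The Fmax values quoted above follow the protein-centric definition used in CAFA: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and Fmax is the best harmonic mean across thresholds. The sketch below illustrates that definition under simplifying assumptions: GO terms are treated as flat labels (no propagation to ancestor terms), and the function name, argument layout, and example identifiers are hypothetical rather than taken from the I-TASSER/COFACTOR pipeline.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax.

    predictions: {protein: {term: confidence score in [0, 1]}}
    truth:       {protein: set of experimentally annotated terms}
    Assumption: terms are flat labels; ontology propagation is not performed.
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            if not true_terms:
                continue  # proteins without annotations cannot be scored
            scored = predictions.get(protein, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision counts only proteins with a prediction at this threshold.
                precisions.append(hits / len(predicted))
            # Recall counts every benchmark protein.
            recalls.append(hits / len(true_terms))
        if not precisions or not recalls:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


if __name__ == "__main__":
    # Illustrative toy data, not results from the paper.
    preds = {"P1": {"GO:A": 0.9, "GO:B": 0.4}, "P2": {"GO:C": 0.8}}
    gold = {"P1": {"GO:A"}, "P2": {"GO:C", "GO:D"}}
    print(round(fmax(preds, gold), 3))
```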
Affiliation(s)
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
- Lydie Lane
  - CALIPHO Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland
  - Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Gilbert S. Omenn
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
  - Departments of Internal Medicine and Human Genetics and School of Public Health, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan 48109-2218, United States
|
144
|
Bittrich S, Schroeder M, Labudde D. StructureDistiller: Structural relevance scoring identifies the most informative entries of a contact map. Sci Rep 2019; 9:18517. [PMID: 31811259 PMCID: PMC6898053 DOI: 10.1038/s41598-019-55047-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/19/2019] [Accepted: 11/21/2019] [Indexed: 12/17/2022]
Abstract
Protein folding and structure prediction are two sides of the same coin. Contact maps and the related techniques of constraint-based structure reconstruction can be considered as unifying aspects of both processes. We present the Structural Relevance (SR) score which quantifies the information content of individual contacts and residues in the context of the whole native structure. The physical process of protein folding is commonly characterized with spatial and temporal resolution: some residues are Early Folding while others are Highly Stable with respect to unfolding events. We employ the proposed SR score to demonstrate that folding initiation and structure stabilization are subprocesses realized by distinct sets of residues. The example of cytochrome c is used to demonstrate how StructureDistiller identifies the most important contacts needed for correct protein folding. This shows that entries of a contact map are not equally relevant for structural integrity. The proposed StructureDistiller algorithm identifies contacts with the highest information content; these entries convey unique constraints not captured by other contacts. Identification of the most informative contacts effectively doubles resilience toward contacts which are not observed in the native contact map. Furthermore, this knowledge increases reconstruction fidelity on sparse contact maps significantly by 0.4 Å.
Affiliation(s)
- Sebastian Bittrich
  - University of Applied Sciences Mittweida, Mittweida, 09648, Germany
  - Biotechnology Center (BIOTEC), TU Dresden, Dresden, 01307, Germany
  - Research Collaboratory for Structural Bioinformatics Protein Data Bank, University of California, San Diego, La Jolla, CA, 92093, USA
- Dirk Labudde
  - University of Applied Sciences Mittweida, Mittweida, 09648, Germany
|
145
|
Rao R, Bhattacharya N, Thomas N, Duan Y, Chen X, Canny J, Abbeel P, Song YS. Evaluating Protein Transfer Learning with TAPE. Advances in Neural Information Processing Systems 2019; 32:9689-9701. [PMID: 33390682 PMCID: PMC7774645] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
Machine learning applied to protein sequences is an increasingly popular area of research. Semi-supervised learning for proteins has emerged as an important paradigm due to the high cost of acquiring supervised protein labels, but the current literature is fragmented when it comes to datasets and standardized evaluation techniques. To facilitate progress in this field, we introduce the Tasks Assessing Protein Embeddings (TAPE), a set of five biologically relevant semi-supervised learning tasks spread across different domains of protein biology. We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Despite this increase, in several cases features learned by self-supervised pretraining still lag behind features extracted by state-of-the-art non-neural techniques. This gap in performance suggests a huge opportunity for innovative architecture design and improved modeling paradigms that better capture the signal in biological sequences. TAPE will help the machine learning community focus effort on scientifically relevant problems. Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape.
|
146
|
Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, Fiser A. Assessing the accuracy of contact predictions in CASP13. Proteins 2019; 87:1058-1068. [PMID: 31587357 PMCID: PMC6851495 DOI: 10.1002/prot.25819] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 04/25/2019] [Revised: 09/17/2019] [Accepted: 09/17/2019] [Indexed: 01/07/2023]
Abstract
The accuracy of sequence-based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After 4 years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held 2 years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the corresponding length of the protein in residues, has exceeded 70%. As a comparison, the best-performing group at CASP12 with a 47% precision would have finished below the top 1/3 of the CASP13 groups. Extensively trained deep neural network approaches dominate the top performing algorithms, which appear to efficiently integrate information on coevolving residues and interacting fragments or possibly utilize memories of sequence similarities and sometimes can deliver accurate results even in the absence of virtually any target specific evolutionary information. If the current performance is evaluated by F-score on L contacts, it stands around 24% right now, which, despite the tremendous impact and advance in improving its utility for structure modeling, also suggests that there is much room left for further improvement.
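The "top L/5" precision reported above can be sketched as follows: rank predicted residue pairs by confidence, keep the best L/5 (where L is the sequence length), and count how many are true contacts. The function name and data layout are hypothetical. The default minimum sequence separation of 24 residues reflects the common long-range convention and is an assumption here, since the abstract does not state which separation range was used.

```python
def top_contact_precision(predicted, native, length, fraction=5, min_separation=24):
    """Precision of the top length/fraction predicted contacts.

    predicted: iterable of (i, j, probability) residue-pair predictions
    native:    iterable of (i, j) pairs in contact in the experimental structure
    Assumption: min_separation=24 (long-range contacts) is a conventional default.
    """
    # Keep only pairs far enough apart in sequence, then rank by confidence.
    candidates = [(i, j, p) for i, j, p in predicted if abs(i - j) >= min_separation]
    candidates.sort(key=lambda c: c[2], reverse=True)
    top = candidates[: max(1, length // fraction)]
    if not top:
        return 0.0
    # Treat (i, j) and (j, i) as the same contact.
    native_pairs = {tuple(sorted(pair)) for pair in native}
    hits = sum(1 for i, j, _ in top if tuple(sorted((i, j))) in native_pairs)
    return hits / len(top)


if __name__ == "__main__":
    # Toy example with a short chain, so a smaller separation is used.
    preds = [(1, 8, 0.9), (2, 9, 0.8), (1, 2, 0.99), (3, 10, 0.1)]
    true_contacts = {(8, 1), (3, 10)}
    print(top_contact_precision(preds, true_contacts, length=10, min_separation=3))
```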
Affiliation(s)
- Rojan Shrestha
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Eduardo Fajardo
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Nelson Gil
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Andriy Kryshtafovych
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Bohdan Monastyrskyy
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Andras Fiser
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
|
147
|
Fajardo JE, Shrestha R, Gil N, Belsom A, Crivelli SN, Czaplewski C, Fidelis K, Grudinin S, Karasikov M, Karczyńska AS, Kryshtafovych A, Leitner A, Liwo A, Lubecka EA, Monastyrskyy B, Pagès G, Rappsilber J, Sieradzan AK, Sikorska C, Trabjerg E, Fiser A. Assessment of chemical-crosslink-assisted protein structure modeling in CASP13. Proteins 2019; 87:1283-1297. [PMID: 31569265 PMCID: PMC6851497 DOI: 10.1002/prot.25816] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/25/2019] [Revised: 08/08/2019] [Accepted: 09/13/2019] [Indexed: 12/22/2022]
Abstract
With the advance of experimental procedures, obtaining chemical crosslinking information is becoming a fast and routine practice. Information on crosslinks can greatly enhance the accuracy of protein structure modeling. Here, we review the current state of the art in modeling protein structures with the assistance of experimentally determined chemical crosslinks within the framework of the 13th meeting of Critical Assessment of Structure Prediction approaches. This largest-to-date blind assessment reveals benefits of using data assistance in difficult-to-model protein structure prediction cases. However, in a broader context, it also suggests that with the unprecedented advance in accuracy to predict contacts in recent years, experimental crosslinks will be useful only if their specificity and accuracy are further improved and they are better integrated into computational workflows.
Affiliation(s)
- J. Eduardo Fajardo
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Rojan Shrestha
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Nelson Gil
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
- Adam Belsom
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
- Silvia N. Crivelli
  - Department of Computer Science, UC Davis, One Shields Ave., Davis, CA 95616
- Cezary Czaplewski
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Sergei Grudinin
  - Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP LJK, 38000 Grenoble, France
- Mikhail Karasikov
  - Center for Energy Systems, Skolkovo Institute of Science and Technology, Moscow, 143026, Russia
  - Moscow Institute of Physics and Technology, Moscow, 141701, Russia
  - Department of Computer Science, ETH Zurich, Zurich, 8092, Switzerland
- Andriy Kryshtafovych
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Alexander Leitner
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
  - School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Emilia A. Lubecka
  - Institute of Informatics, Faculty of Mathematics, Physics, and Informatics, University of Gdańsk, Wita Stwosza 57, 80-308 Gdańsk, Poland
- Bohdan Monastyrskyy
  - Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
- Guillaume Pagès
  - Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP LJK, 38000 Grenoble, France
- Juri Rappsilber
  - Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
  - Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
- Adam K. Sieradzan
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Celina Sikorska
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Esben Trabjerg
  - Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Otto-Stern-Weg 3, 8093 Zurich, Switzerland
- Andras Fiser
  - Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
|
148
|
Santos-Martins D, Eberhardt J, Bianco G, Solis-Vasquez L, Ambrosio FA, Koch A, Forli S. D3R Grand Challenge 4: prospective pose prediction of BACE1 ligands with AutoDock-GPU. J Comput Aided Mol Des 2019; 33:1071-1081. [PMID: 31691920 PMCID: PMC7325737 DOI: 10.1007/s10822-019-00241-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/17/2019] [Accepted: 10/22/2019] [Indexed: 10/25/2022]
Abstract
In this paper we describe our approaches to predict the binding mode of twenty BACE1 ligands as part of Grand Challenge 4 (GC4), organized by the Drug Design Data Resource. Calculations for all submissions (except for one, which used AutoDock4.2) were performed using AutoDock-GPU, the new GPU-accelerated version of AutoDock4 implemented in OpenCL, which features a gradient-based local search. The pose prediction challenge was organized in two stages. In Stage 1a, the protein conformations associated with each of the ligands were undisclosed, so we docked each ligand to a set of eleven receptor conformations, chosen to maximize the diversity of binding pocket topography. Protein conformations were made available in Stage 1b, making it a re-docking task. For all calculations, macrocyclic conformations were sampled on the fly during docking, taking the target structure into account. To leverage information from existing structures containing BACE1 bound to ligands available in the PDB, we tested biased docking and pose filter protocols to facilitate poses resembling those experimentally determined. Both pose filters and biased docking resulted in more accurate docked poses, enabling us to predict for both Stages 1a and 1b ligand poses within 2 Å RMSD from the crystallographic pose. Nevertheless, many of the ligands could be correctly docked without using existing structural information, demonstrating the usefulness of physics-based scoring functions, such as the one used in AutoDock4, for structure based drug design.
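The success criterion mentioned above (a docked pose within 2 Å RMSD of the crystallographic pose) can be sketched as a plain coordinate RMSD. Because docked poses already sit in the receptor's frame of reference, no superposition is applied. The sketch assumes both poses list the same heavy atoms in the same order and ignores symmetry-equivalent atoms; the function names are hypothetical and are not part of AutoDock-GPU.

```python
import math


def pose_rmsd(pose, reference):
    """RMSD in angstroms between two poses given as lists of (x, y, z) tuples.

    Assumptions: identical atom order in both lists, no superposition,
    and no correction for symmetry-equivalent atoms.
    """
    if not pose or len(pose) != len(reference):
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(pose, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(pose))


def is_successful_pose(pose, reference, cutoff=2.0):
    """True when the pose lies within `cutoff` angstroms RMSD of the reference."""
    return pose_rmsd(pose, reference) <= cutoff


if __name__ == "__main__":
    crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # illustrative coordinates
    docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(pose_rmsd(docked, crystal), is_successful_pose(docked, crystal))
```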
Affiliation(s)
- Diogo Santos-Martins
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA
- Jerome Eberhardt
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA
- Giulia Bianco
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA
- Leonardo Solis-Vasquez
  - Embedded Systems and Applications Group, Technische Universität Darmstadt, Darmstadt, Germany
- Francesca Alessandra Ambrosio
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA
  - Department of Health Sciences, "Magna Græcia" University of Catanzaro, Campus "S. Venuta", Viale Europa, 88100, Catanzaro, Italy
- Andreas Koch
  - Embedded Systems and Applications Group, Technische Universität Darmstadt, Darmstadt, Germany
- Stefano Forli
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, USA
|
149
|
Dapkūnas J, Kairys V, Olechnovič K, Venclovas Č. Template-based modeling of diverse protein interactions in CAPRI rounds 38-45. Proteins 2019; 88:939-947. [PMID: 31697420 DOI: 10.1002/prot.25845] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/08/2019] [Accepted: 11/03/2019] [Indexed: 11/09/2022]
Abstract
Structures of proteins complexed with other proteins, peptides, or ligands are essential for investigation of molecular mechanisms. However, the experimental structures of protein complexes of interest are often not available. Therefore, computational methods are widely used to predict these structures, and, of those methods, template-based modeling is the most successful. In the rounds 38-45 of the Critical Assessment of PRediction of Interactions (CAPRI), we applied template-based modeling for 9 of 11 protein-protein and protein-peptide interaction targets, resulting in medium and high-quality models for six targets. For the protein-oligosaccharide docking targets, we used constraints derived from template structures, and generated models of at least acceptable quality for most of the targets. Apparently, high flexibility of oligosaccharide molecules was the main cause preventing us from obtaining models of higher quality. We also participated in the CAPRI scoring challenge, the goal of which was to identify the highest quality models from a large pool of decoys. In this experiment, we tested VoroMQA, a scoring method based on interatomic contact areas. The results showed VoroMQA to be quite effective in scoring strongly binding and obligatory protein complexes, but less successful in the case of transient interactions. We extensively used manual intervention in both CAPRI modeling and scoring experiments. This oftentimes allowed us to select the correct templates from available alternatives and to limit the search space during the model scoring.
Affiliation(s)
- Justas Dapkūnas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Visvaldas Kairys
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
|
150
|
Dos Santos RN, Bottino GF, Gozzo FC, Morcos F, Martínez L. Structural complementarity of distance constraints obtained from chemical cross-linking and amino acid coevolution. Proteins 2019; 88:625-632. [PMID: 31693206 DOI: 10.1002/prot.25843] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/04/2019] [Revised: 10/07/2019] [Accepted: 11/03/2019] [Indexed: 12/11/2022]
Abstract
The analysis of amino acid coevolution has emerged as a practical method for protein structural modeling by providing structural contact information from alignments of amino acid sequences. In parallel, chemical cross-linking/mass spectrometry (XLMS) has gained attention as a universally applicable method for obtaining low-resolution distance constraints to model the quaternary arrangements of proteins, and more recently even protein tertiary structures. Here, we show that the structural information obtained by XLMS and coevolutionary analysis are effectively complementary: the distance constraints obtained by each method are almost exclusively associated with non-coincident pairs of residues, and modeling results obtained by the combination of both sets are improved relative to considering the same total number of constraints of a single type. The structural rationale behind the complementarity of the distance constraints is discussed and illustrated for a representative set of proteins with different sizes and folds.
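The complementarity claim above (the two methods constrain mostly non-coincident residue pairs) amounts to a small overlap between two sets of pairs. A minimal way to quantify it, assuming each method's constraints are available as lists of residue-index pairs, is the number of shared pairs and their Jaccard index. The function names and example indices are hypothetical and do not reproduce the authors' analysis.

```python
def normalize_pairs(pairs):
    """Treat (i, j) and (j, i) as the same residue pair."""
    return {tuple(sorted(pair)) for pair in pairs}


def constraint_overlap(xlms_pairs, coevolution_pairs):
    """Return the shared residue pairs and the Jaccard index of the two sets.

    A Jaccard index near 0 means the two sources constrain almost entirely
    different residue pairs, i.e. they are complementary.
    """
    xl = normalize_pairs(xlms_pairs)
    co = normalize_pairs(coevolution_pairs)
    shared = xl & co
    union = xl | co
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


if __name__ == "__main__":
    # Illustrative residue indices, not data from the paper.
    xlms = [(10, 45), (12, 80), (33, 7)]
    coevolution = [(45, 10), (20, 60), (5, 90)]
    print(constraint_overlap(xlms, coevolution))
```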
Affiliation(s)
- Ricardo N Dos Santos
  - Institute of Chemistry, University of Campinas, Campinas, São Paulo, Brazil
  - Center for Computing in Engineering & Sciences, University of Campinas, Campinas, São Paulo, Brazil
- Guilherme F Bottino
  - Institute of Chemistry, University of Campinas, Campinas, São Paulo, Brazil
  - Center for Computing in Engineering & Sciences, University of Campinas, Campinas, São Paulo, Brazil
- Fábio C Gozzo
  - Institute of Chemistry, University of Campinas, Campinas, São Paulo, Brazil
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas
  - Department of Bioengineering, University of Texas at Dallas, Richardson, Texas
- Leandro Martínez
  - Institute of Chemistry, University of Campinas, Campinas, São Paulo, Brazil
  - Center for Computing in Engineering & Sciences, University of Campinas, Campinas, São Paulo, Brazil
|