1
Pang HH, Li NS, Hsu YP, Ju SP, Syu GD, Du PX, Huang CY, Wei KC, Yang HW. AI-Driven Design System for Fabrication of Inhalable Nanocatchers for Virus Capture and Neutralization. Adv Healthc Mater 2024; 13:e2302927. [PMID: 37986024 DOI: 10.1002/adhm.202302927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2023] [Revised: 11/13/2023] [Indexed: 11/22/2023]
Abstract
The global pandemic presents a critical threat to humanity, with no effective rapid-response solutions for early-stage virus dissemination. This study aims to create an AI-driven entry-blocker design system (AIEB) to fabricate inhalable virus-like nanocatchers (VLNCs) fused with entry-blocking peptides (EBPs) to counter pandemic viruses and explore therapeutic applications. This work focuses on developing angiotensin-converting enzyme 2 (ACE2)-mimic domain-fused VLNCs (ACE2@VLNCs) using AIEB and analyzing their interaction with the SARS-CoV-2 receptor binding domain (RBD), demonstrating their potential to hinder SARS-CoV-2 infection. Aerosol-based tests show ACE2@VLNCs persist over 70 min in the air and neutralize pseudoviruses within 30 min, indicating their utility in reducing airborne virus transmission. In vivo results reveal ACE2@VLNCs mitigate over 67% of SARS-CoV-2 infections. Biosafety studies confirm their safety, causing no damage to eyes, skin, lungs, or trachea, and not eliciting significant immune responses. These findings offer crucial insights into pandemic virus prevention and treatment, highlighting the potential of the ACE2@VLNCs system as a promising strategy against future pandemics.
Affiliation(s)
- Hao-Han Pang
  - Department of Biomedical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
- Nan-Si Li
  - Department of Biomedical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
- Ying-Pei Hsu
  - Department of Materials and Optoelectronic Science, National Sun Yat-sen University, Kaohsiung, 80424, Taiwan
- Shin-Pon Ju
  - Department of Mechanical and Electro-Mechanical Engineering, National Sun Yat-sen University, Kaohsiung, 80424, Taiwan
- Guan-Da Syu
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
  - International Center for Wound Repair and Regeneration, National Cheng Kung University, Tainan, Taiwan
  - Medical Device Innovation Center, National Cheng Kung University, Tainan, 70101, Taiwan
- Pin-Xian Du
  - Department of Biotechnology and Bioindustry Sciences, National Cheng Kung University, Tainan, Taiwan
- Chiung-Yin Huang
  - Department of Neurosurgery, Neuroscience Research Center, Chang Gung Memorial Hospital, Linkou, Taoyuan, 33305, Taiwan
- Kuo-Chen Wei
  - Department of Neurosurgery, Neuroscience Research Center, Chang Gung Memorial Hospital, Linkou, Taoyuan, 33305, Taiwan
  - School of Medicine, Chang Gung University, Taoyuan, 33302, Taiwan
  - Department of Neurosurgery, New Taipei Municipal TuCheng Hospital, New Taipei City, 23652, Taiwan
- Hung-Wei Yang
  - Department of Biomedical Engineering, National Cheng Kung University, Tainan, 70101, Taiwan
  - Medical Device Innovation Center, National Cheng Kung University, Tainan, 70101, Taiwan
2
Grigorjew A, Gynter A, Dias FHC, Buchfink B, Drost HG, Tomescu AI. Sensitive inference of alignment-safe intervals from biodiverse protein sequence clusters using EMERALD. Genome Biol 2023; 24:168. [PMID: 37461051 DOI: 10.1186/s13059-023-03008-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 07/05/2023] [Indexed: 07/20/2023] [Open Access]
Abstract
Sequence alignments are the foundations of life science research, but most innovation so far focuses on optimal alignments, while information derived from suboptimal solutions is ignored. We argue that one optimal alignment per pairwise sequence comparison is a reasonable approximation when dealing with very similar sequences but is insufficient when exploring the biodiversity of the protein universe at tree-of-life scale. To overcome this limitation, we introduce pairwise alignment-safety to uncover the amino acid positions robustly shared across all suboptimal solutions. We implement EMERALD, a software library for alignment-safety inference, and apply it to 400k sequences from the SwissProt database.
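As a rough illustration of "positions shared across alternative alignments", the sketch below intersects the aligned residue pairs of a few gapped pairwise alignments. It is a simplification under stated assumptions, not EMERALD's algorithm: the function names `aligned_pairs` and `shared_pairs` and the toy sequences are hypothetical, and the alternative alignments are supplied by hand rather than enumerated from the space of suboptimal solutions.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def aligned_pairs(row_a: str, row_b: str) -> Set[Pair]:
    """Return the (i, j) residue index pairs matched by one gapped alignment.

    row_a and row_b are the two rows of a pairwise alignment, with '-' as gap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    i = j = 0  # running residue indices in the ungapped sequences
    pairs: Set[Pair] = set()
    for ca, cb in zip(row_a, row_b):
        if ca != "-" and cb != "-":
            pairs.add((i, j))
        if ca != "-":
            i += 1
        if cb != "-":
            j += 1
    return pairs


def shared_pairs(alignments: Iterable[Tuple[str, str]]) -> Set[Pair]:
    """Residue pairs present in every alignment of the given collection."""
    pair_sets = [aligned_pairs(a, b) for a, b in alignments]
    if not pair_sets:
        return set()
    return set.intersection(*pair_sets)


# Two hypothetical alternative alignments of "MKV" against "MKKV".
alternatives = [("MK-V", "MKKV"), ("M-KV", "MKKV")]
robust = shared_pairs(alternatives)
```

Here only the pairs on which both alternatives agree survive; the ambiguous placement of the middle lysine is dropped.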
Affiliation(s)
- Andreas Grigorjew
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
- Artur Gynter
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
- Fernando H C Dias
  - Department of Computer Science, University of Helsinki, Helsinki, Finland
- Benjamin Buchfink
  - Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany
- Hajk-Georg Drost
  - Computational Biology Group, Max Planck Institute for Biology, Tübingen, Germany
3
Afshinpour M, Mahdiuni H. Arginine transportation mechanism through cationic amino acid transporter 1: insights from molecular dynamics studies. J Biomol Struct Dyn 2023; 41:13580-13594. [PMID: 36762692 DOI: 10.1080/07391102.2023.2175374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 01/28/2023] [Indexed: 02/11/2023]
Abstract
Metabolic and signaling mechanisms in mammalian cells are facilitated by the transportation of L-arginine (Arg) across the plasma membrane through cationic amino acid transporter (CAT) proteins. Due to a lack of argininosuccinate synthase (ASS) activity in various tumor cells such as acute myeloid leukemia, acute lymphocytic leukemia, and chronic lymphocytic leukemia, these tumor entities are arginine-auxotrophic and therefore depend on the uptake of the amino acid arginine. Cationic amino acid transporter-1 (CAT-1) is the leading arginine importer expressed in the aforementioned tumor entities. Hence, in the present study, to investigate the transportation mechanism of arginine in CAT-1, we performed molecular dynamics (MD) simulations on the modeled human CAT-1. The MM-PBSA approach was used to determine the critical residues interacting with arginine within the corresponding binding site of CAT-1. In addition, we found that water molecules play the leading role in forming the transportation channel within CAT-1. The conductive structure of CAT-1 was formed only when the water molecules were continuously distributed across the channel. The steered molecular dynamics (SMD) simulation approach showed various energy barriers against arginine transportation through CAT-1, especially while crossing the bottlenecks of the related channel. These findings at the molecular level might shed light on identifying the crucial amino acids in the binding of arginine to eukaryotic CATs and also provide fundamental insights into the arginine transportation mechanisms through CAT-1. Understanding the transportation mechanism of arginine is essential to developing CAT-1 blockers, which can be potential medications for some types of cancers. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Maral Afshinpour
  - Bioinformatics Lab., Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
- Hamid Mahdiuni
  - Bioinformatics Lab., Department of Biology, School of Sciences, Razi University, Kermanshah, Iran
4
Monroe L, Kihara D. Using steered molecular dynamic tension for assessing quality of computational protein structure models. J Comput Chem 2022; 43:1140-1150. [PMID: 35475517 PMCID: PMC9133218 DOI: 10.1002/jcc.26876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Revised: 02/16/2022] [Accepted: 04/15/2022] [Indexed: 11/12/2022]
Abstract
The native structures of proteins, with the notable exception of intrinsically disordered proteins, in general take their most stable conformation under physiological conditions to maintain their structural framework so that their biological function can be properly carried out. Experimentally, the stability of a protein can be measured by several means, among which the pulling experiment using the atomic force microscope (AFM) stands as a unique method. AFM directly measures the resistance to unfolding, which can be quantified from the observed force-extension profile. It has been shown that key features observed in an AFM pulling experiment can be well reproduced by computational molecular dynamics simulations. Here, we applied computational pulling for estimating the accuracy of computational protein structure models under the hypothesis that the structural stability would positively correlate with the accuracy, i.e. the closeness to the native, of a model. We used in total 4929 structure models for 24 target proteins from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and investigated if the magnitude of the break force, that is, the force required to rearrange the model's structure, from the force profile was sufficient information for selecting near-native models. We found that near-native models can be successfully selected by examining their break forces, suggesting that high break force indeed indicates high stability of models. On the other hand, there were also near-native models that had relatively low peak forces. The mechanisms of the stability exhibited by the break forces were explored and discussed.
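To make the model-selection idea concrete, the following minimal sketch ranks candidate models by a break force read off a force profile. The abstract does not give the exact definition used in the paper, so this assumes the break force can be approximated by the largest value of a moving-average-smoothed force trace; the names `moving_average`, `break_force`, and `rank_by_break_force`, the window size, and the toy profiles are all hypothetical.

```python
from typing import Dict, List, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Smooth a force trace with a simple sliding-window mean."""
    if not values:
        raise ValueError("force profile is empty")
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(values) < window:
        # Too short to slide a window: fall back to the overall mean.
        return [sum(values) / len(values)]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def break_force(force_profile: Sequence[float], window: int = 3) -> float:
    """Assumed proxy: the peak of the smoothed force profile."""
    return max(moving_average(force_profile, window))


def rank_by_break_force(profiles: Dict[str, Sequence[float]]) -> List[str]:
    """Order model names from highest to lowest assumed break force."""
    return sorted(profiles, key=lambda name: break_force(profiles[name]), reverse=True)


# Toy force traces (arbitrary units) for two hypothetical models.
profiles = {
    "model_a": [0, 1, 2, 10, 2, 1, 0],
    "model_b": [0, 1, 1, 1, 1, 0],
}
ranking = rank_by_break_force(profiles)
```

Smoothing before taking the peak is a design choice made here to damp single-frame noise; a real analysis would follow the force definition and units of the simulation package that produced the trace.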
Affiliation(s)
- Lyman Monroe
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
  - Purdue University Center for Cancer Research, West Lafayette, IN, 47907, USA
5
Nyamai DW, Tastan Bishop Ö. Aminoacyl tRNA synthetases as malarial drug targets: a comparative bioinformatics study. Malar J 2019; 18:34. [PMID: 30728021 PMCID: PMC6366043 DOI: 10.1186/s12936-019-2665-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 10/18/2018] [Accepted: 01/27/2019] [Indexed: 01/08/2023] [Open Access]
Abstract
BACKGROUND: Treatment of parasitic diseases has been challenging due to the evolution of drug-resistant parasites, and thus there is a need to identify new classes of drugs and drug targets. Protein translation is important for survival of the malarial parasite, Plasmodium, and the pathway is present in all of its life cycle stages. Aminoacyl tRNA synthetases are primary enzymes in protein translation as they catalyse amino acid addition to the cognate tRNA. This study sought to understand differences between Plasmodium and human aminoacyl tRNA synthetases through bioinformatics analysis. METHODS: Plasmodium berghei, Plasmodium falciparum, Plasmodium fragile, Plasmodium knowlesi, Plasmodium malariae, Plasmodium ovale, Plasmodium vivax, Plasmodium yoelii and human aminoacyl tRNA synthetase sequences were retrieved from the UniProt database and grouped into 20 families based on amino acid specificity. These families were further divided into two classes. Both families and classes were analysed. Motif discovery was carried out using the MEME software, sequence identity calculation was done using an in-house Python script, multiple sequence alignments were performed using the PROMALS3D and TCOFFEE tools, and phylogenetic tree calculations were performed using MEGA v7.0. Possible alternative binding sites were predicted using the FTMap webserver and the SiteMap tool. RESULTS: Motif discovery revealed Plasmodium-specific motifs while phylogenetic tree calculations showed that Plasmodium proteins have a different evolutionary history from the human homologues. Human aaRS sequences showed low sequence identity (below 40%) compared to Plasmodium sequences. Prediction of alternative binding sites revealed potential druggable sites in PfArgRS, PfMetRS and PfProRS at regions that are weakly conserved when compared to the human homologues. Multiple sequence analysis, motif discovery, pairwise sequence identity calculations and phylogenetic tree analysis showed significant differences between parasite and human aaRS proteins despite functional and structural conservation. These differences may provide a basis for further exploration of Plasmodium aminoacyl tRNA synthetases as potential drug targets. CONCLUSION: This study showed that, despite functional and structural conservation, Plasmodium aaRSs have key differences from the human homologues. These differences in Plasmodium aaRSs can be targeted to develop anti-malarial drugs with less toxicity to the host.
Affiliation(s)
- Dorothy Wavinya Nyamai
  - Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, 6140, South Africa
- Özlem Tastan Bishop
  - Research Unit in Bioinformatics (RUBi), Department of Biochemistry and Microbiology, Rhodes University, Grahamstown, 6140, South Africa
6
Prediction of Local Quality of Protein Structure Models Considering Spatial Neighbors in Graphical Models. Sci Rep 2017; 7:40629. [PMID: 28074879 PMCID: PMC5225430 DOI: 10.1038/srep40629] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/07/2016] [Accepted: 12/08/2016] [Indexed: 12/31/2022] [Open Access]
Abstract
Protein tertiary structure prediction methods have matured in recent years. However, some proteins defy accurate prediction due to factors such as inadequate template structures. While existing model quality assessment methods predict global model quality relatively well, there is substantial room for improvement in local quality assessment, i.e. assessment of the error at each residue position in a model. Local quality is very important information for practical applications of structure models such as interpreting or designing site-directed mutagenesis of proteins. We have developed a novel local quality assessment method for protein tertiary structure models. The method, named Graph-based Model Quality assessment method (GMQ), explicitly considers the predicted quality of spatially neighboring residues using a graph representation of a query protein structure model. GMQ uses a conditional random field as the core of its algorithm, and performs a binary prediction of the quality of each residue in a model, indicating whether a residue position is likely to be within an error cutoff or not. The accuracy of GMQ was improved by considering larger graphs to include quality information of more surrounding residues. Moreover, we found that using different edge weights in graphs reflecting different secondary structures further improves the accuracy. GMQ showed competitive performance on a benchmark for quality assessment of structure models from the Critical Assessment of Techniques for Protein Structure Prediction (CASP).
7
Li J, Fang H. A comparison of different functions for predicted protein model quality assessment. J Comput Aided Mol Des 2016; 30:553-8. [PMID: 27488386 DOI: 10.1007/s10822-016-9924-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2016] [Accepted: 07/08/2016] [Indexed: 11/30/2022]
Abstract
In protein structure prediction, a considerable number of models are usually produced by either the Template-Based Method (TBM) or the ab initio prediction. The purpose of this study is to find the critical parameter in assessing the quality of the predicted models. A non-redundant template library was developed and 138 target sequences were modeled. The target sequences were all distant from the proteins in the template library and were aligned with template library proteins on the basis of the transformation matrix. The quality of each model was first assessed with QMEAN and its six parameters, which are C_β interaction energy (C_beta), all-atom pairwise energy (PE), solvation energy (SE), torsion angle energy (TAE), secondary structure agreement (SSA), and solvent accessibility agreement (SAE). Finally, the alignment score (score) was also used to assess the quality of model. Hence, a total of eight parameters (i.e., QMEAN, C_beta, PE, SE, TAE, SSA, SAE, score) were independently used to assess the quality of each model. The results indicate that SSA is the best parameter to estimate the quality of the model.
Affiliation(s)
- Juan Li
  - Department of Hematology, Nanjing Drum Tower Hospital, The Affiliated Hospital of Nanjing University Medical School, Nanjing, Jiangsu, 210008, People's Republic of China
- Huisheng Fang
  - School of Life Science and Technology, China Pharmaceutical University, Nanjing, Jiangsu, 210009, People's Republic of China
8
Ryu H, Kim TR, Ahn S, Ji S, Lee J. Protein NMR structures refined without NOE data. PLoS One 2014; 9:e108888. [PMID: 25279564 PMCID: PMC4184813 DOI: 10.1371/journal.pone.0108888] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/24/2014] [Accepted: 09/04/2014] [Indexed: 12/31/2022] [Open Access]
Abstract
The refinement of low-quality structures is an important challenge in protein structure prediction. Many studies have been conducted on protein structure refinement; the refinement of structures derived from NMR spectroscopy has been especially intensively studied. In this study, we generated flat-bottom distance potential instead of NOE data because NOE data have ambiguity and uncertainty. The potential was derived from distance information from given structures and prevented structural dislocation during the refinement process. A simulated annealing protocol was used to minimize the potential energy of the structure. The protocol was tested on 134 NMR structures in the Protein Data Bank (PDB) that also have X-ray structures. Among them, 50 structures were used as a training set to find the optimal "width" parameter in the flat-bottom distance potential functions. In the validation set (the other 84 structures), most of the 12 quality assessment scores of the refined structures were significantly improved (total score increased from 1.215 to 2.044). Moreover, the secondary structure similarity of the refined structure was improved over that of the original structure. Finally, we demonstrate that the combination of two energy potentials, statistical torsion angle potential (STAP) and the flat-bottom distance potential, can drive the refinement of NMR structures.
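A flat-bottom distance potential can be sketched in a few lines. The abstract does not state the functional form used in the paper, so this example assumes a common choice: zero energy while a distance stays within a tolerance `width` of its reference value, and a harmonic penalty outside it. The function names, the force constant `k`, and the sample distances are hypothetical.

```python
from typing import Iterable, Tuple


def flat_bottom_energy(d: float, d_ref: float, width: float, k: float = 1.0) -> float:
    """Energy of one distance restraint with an assumed harmonic wall.

    d      current distance between two atoms
    d_ref  reference distance taken from the starting structure
    width  half-width of the flat (zero-energy) region
    k      force constant of the harmonic walls
    """
    excess = abs(d - d_ref) - width
    if excess <= 0.0:
        return 0.0  # inside the flat bottom: no penalty
    return k * excess ** 2


def total_restraint_energy(
    restraints: Iterable[Tuple[float, float]], width: float, k: float = 1.0
) -> float:
    """Sum the restraint energy over (current, reference) distance pairs."""
    return sum(flat_bottom_energy(d, d_ref, width, k) for d, d_ref in restraints)


# Two distances inside the flat region and one 0.5 units beyond it.
pairs = [(5.0, 5.0), (5.8, 5.0), (6.5, 5.0)]
energy = total_restraint_energy(pairs, width=1.0, k=2.0)
```

A wider `width` lets the structure relax further from its starting distances before any penalty applies, which is why the abstract treats it as the parameter to tune on a training set.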
Affiliation(s)
- Hyojung Ryu
  - Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, The Republic of Korea
  - Department of Bioinformatics, University of Science and Technology, Daejeon, The Republic of Korea
- Tae-Rae Kim
  - Department of Chemistry, Seoul National University, Seoul, The Republic of Korea
- SeonJoo Ahn
  - Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, The Republic of Korea
- Sunyoung Ji
  - Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, The Republic of Korea
  - Department of Bioinformatics, University of Science and Technology, Daejeon, The Republic of Korea
- Jinhyuk Lee
  - Korean Bioinformation Center (KOBIC), Korea Research Institute of Bioscience and Biotechnology, Daejeon, The Republic of Korea
  - Department of Bioinformatics, University of Science and Technology, Daejeon, The Republic of Korea
9
Deng X, Li J, Cheng J. Predicting Protein Model Quality from Sequence Alignments by Support Vector Machines. J Proteomics Bioinform 2014; Suppl 9. [PMID: 26752865 PMCID: PMC4705550 DOI: 10.4172/jpb.s9-001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/16/2023]
Abstract
Assessing the quality of a protein structure model is essential for protein structure prediction. Here, we developed a Support Vector Machine (SVM) method to predict the quality score (GDT-TS score) of a protein structure model from the features extracted from the sequence alignment used to generate the model. The method takes either a query-single-template pairwise alignment or a query-multitemplate alignment as input. For the pairwise alignment scheme, the input features fed into the SVM predictor include the normalized e-value of the given alignment, the percentage of identical residue pairs in the alignment, the percentage of residues of the query aligned with those of the template, and the sum of the BLOSUM scores of all aligned residues divided by the length of the aligned positions. Similarly, for the multiple-alignment scheme, the input features include the percentage of the residues of the target sequence aligned with those in one or more templates, the percentage of aligned residues of the target sequence that are the same as that of any one template, the average BLOSUM score of aligned residues and the average Gonnet160 score of aligned residues. An SVM regression predictor was trained on the training data to predict the GDT-TS scores of the models from the input features. The Root Mean Square Error (RMSE) and the Absolute Mean Error (ABS) between predicted and real GDT-TS scores were calculated to evaluate the performance. A five-fold cross validation was applied to select the best parameter values based on the average RMSE and ABS on the five folds. The RMSE and ABS of the optimized SVM predictor on the testing data were close to 0.1. The good performance of the SVM and sequence alignment based predictor indicates that integrating sequence alignment features with an SVM is effective for protein model quality assessment.
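Two of the pairwise-alignment features and both error measures named above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the denominators (identity over aligned columns, coverage over query length) are one plausible reading of the abstract, the optional `score` callable stands in for a BLOSUM lookup, and all function names and toy inputs are hypothetical. The SVM regression itself is not shown.

```python
import math
from typing import Callable, Dict, Optional, Sequence


def alignment_features(
    query_row: str,
    template_row: str,
    score: Optional[Callable[[str, str], float]] = None,
) -> Dict[str, float]:
    """Compute simple features from one gapped query-template alignment."""
    if len(query_row) != len(template_row):
        raise ValueError("alignment rows must have equal length")
    # Columns in which both the query and the template carry a residue.
    pairs = [(q, t) for q, t in zip(query_row, template_row) if q != "-" and t != "-"]
    query_len = sum(1 for q in query_row if q != "-")
    n_aligned = len(pairs)
    identical = sum(1 for q, t in pairs if q == t)
    features = {
        "identity": identical / n_aligned if n_aligned else 0.0,
        "coverage": n_aligned / query_len if query_len else 0.0,
    }
    if score is not None:
        # Mean substitution score per aligned column (e.g. a BLOSUM lookup).
        total = sum(score(q, t) for q, t in pairs)
        features["mean_score"] = total / n_aligned if n_aligned else 0.0
    return features


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and real scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mean_abs_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute error between predicted and real scores."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


features = alignment_features("ACD-EF", "AC-GEY")
errors = (rmse([0.5, 0.7], [0.4, 0.9]), mean_abs_error([0.5, 0.7], [0.4, 0.9]))
```

Feature dictionaries like `features` would form the rows of the training matrix for whichever regression library is used.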
Affiliation(s)
- Xin Deng
  - Computer Science Department, University of Missouri-Columbia, Columbia, MO, USA
- Jilong Li
  - Computer Science Department, University of Missouri-Columbia, Columbia, MO, USA
- Jianlin Cheng
  - Computer Science Department, University of Missouri-Columbia, Columbia, MO, USA
  - Informatics Institute, University of Missouri-Columbia, Columbia, MO, USA
  - C. Bond Life Science Center, University of Missouri-Columbia, Columbia, MO, USA
10
Li J, Deng X, Eickholt J, Cheng J. Designing and benchmarking the MULTICOM protein structure prediction system. BMC Struct Biol 2013; 13:2. [PMID: 23442819 PMCID: PMC3599124 DOI: 10.1186/1472-6807-13-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/16/2012] [Accepted: 02/21/2013] [Indexed: 11/19/2022]
Abstract
Background: Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results: Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions: Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
Affiliation(s)
- Jilong Li
  - Computer Science Department, University of Missouri, Columbia, MO, USA
11
Mullins JGL. Structural modelling pipelines in next generation sequencing projects. Adv Protein Chem Struct Biol 2012; 89:117-67. [PMID: 23046884 DOI: 10.1016/b978-0-12-394287-6.00005-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]
Abstract
Our capacity to reliably predict protein structure from sequence is steadily improving due to the increased numbers and better targeting of protein structures being experimentally determined by structural genomics projects, along with the development of better modeling methodologies. Template-based (homology) modeling and de novo modeling methods are being combined to fill in remaining gaps in template coverage, and powerful automated structural modeling pipelines are being applied to large data sets of protein sequences. The improved quality of 3D models of proteins has led to their routine use in assessing the functional impact of nonsynonymous single nucleotide polymorphisms (nsSNPs) in specific protein systems, with the development of approaches that may be applied in a predictive fashion to nsSNPs emerging from next-generation sequencing projects. The challenges encountered in deriving functionally meaningful deductions from structural modeling can be quite different for proteins of different protein functional classes. The specific challenges to the assessment of the structural and functional impact of nsSNPs in globular proteins such as binding and regulatory proteins, structural proteins, and enzymes are discussed, as well as membrane transport proteins and ion channels. The mapping of reliable predictions of the structural and functional impact of SNPs, generated from automated modeling pipelines, on to protein-protein interaction networks will facilitate new approaches to understanding complex polygenic disorders and predisposition to disease.
Affiliation(s)
- Jonathan G L Mullins
  - Genome and Structural Bioinformatics, Institute of Life Science, College of Medicine, Swansea University, Singleton Park, Swansea, Wales, UK
12
Yang JS, Kim JH, Oh S, Han G, Lee S, Lee J. STAP Refinement of the NMR database: a database of 2405 refined solution NMR structures. Nucleic Acids Res 2011; 40:D525-30. [PMID: 22102572 PMCID: PMC3245188 DOI: 10.1093/nar/gkr1021] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022] [Open Access]
Abstract
According to several studies, some nuclear magnetic resonance (NMR) structures are of lower quality, less reliable and less suitable for structural analysis than high-resolution X-ray crystallographic structures. We present a public database of 2405 refined NMR solution structures [statistical torsion angle potentials (STAP) refinement of the NMR database, http://psb.kobic.re.kr/STAP/refinement] from the Protein Data Bank (PDB). A simulated annealing protocol was employed to obtain refined structures with target potentials, including the newly developed STAP. The refined database was extensively analysed using various quality indicators from several assessment programs to determine the nuclear Overhauser effect (NOE) completeness, Ramachandran appearance, χ1-χ2 rotamer normality, various parameters for protein stability and other indicators. Most quality indicators are improved in our protocol mainly due to the inclusion of the newly developed knowledge-based potentials. This database can be used by the NMR structure community for further development of research and validation tools, structure-related studies and modelling in many fields of research.
Affiliation(s)
- Joshua SungWoo Yang
  - Korean Bioinformation Center, Korea Research Institute of Bioscience and Biotechnology, 125 Gwahak-ro Yuseong-Gu, Daejeon 305-806, The Republic of Korea
13
Kuziemko A, Honig B, Petrey D. Using structure to explore the sequence alignment space of remote homologs. PLoS Comput Biol 2011; 7:e1002175. [PMID: 21998567 PMCID: PMC3188491 DOI: 10.1371/journal.pcbi.1002175] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/01/2011] [Accepted: 07/14/2011] [Indexed: 11/18/2022] [Open Access]
Abstract
Protein structure modeling by homology requires an accurate sequence alignment between the query protein and its structural template. However, sequence alignment methods based on dynamic programming (DP) are typically unable to generate accurate alignments for remote sequence homologs, thus limiting the applicability of modeling methods. A central problem is that the alignment that is “optimal” in terms of the DP score does not necessarily correspond to the alignment that produces the most accurate structural model. That is, the correct alignment based on structural superposition will generally have a lower score than the optimal alignment obtained from sequence. Variations of the DP algorithm have been developed that generate alternative alignments that are “suboptimal” in terms of the DP score, but these still encounter difficulties in detecting the correct structural alignment. We present here a new alternative sequence alignment method that relies heavily on the structure of the template. By initially aligning the query sequence to individual fragments in secondary structure elements and combining high-scoring fragments that pass basic tests for “modelability”, we can generate accurate alignments within a small ensemble. Our results suggest that the set of sequences that can currently be modeled by homology can be greatly extended.
It has been suggested that, for nearly every protein sequence, there is already a protein with a similar structure in current protein structure databases. However, with poor or undetectable sequence relationships, it is expected that accurate alignments and models cannot be generated. Here we show that this is not the case, and that whenever a structural relationship exists, there are usually local sequence relationships that can be used to generate an accurate alignment, no matter what the global sequence identity. However, this requires an alternative to the traditional dynamic programming algorithm and the consideration of a small ensemble of alignments. We present an algorithm, S4, and demonstrate that it is capable of generating accurate alignments in nearly all cases where a structural relationship exists between two proteins. Our results thus constitute an important advance in the full exploitation of the information in structural databases. That is, the expectation of an accurate alignment suggests that a meaningful model can be generated for nearly every sequence for which a suitable template exists.
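As a rough illustration of the "DP score" discussed in the abstract above, the following minimal Python sketch computes the optimal global alignment score with a Needleman-Wunsch-style recurrence. It is not the S4 method. The function name `nw_score` and the match, mismatch, and gap values are assumptions chosen only for illustration.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions (linear gap penalty),
    not parameters taken from the paper.
    """
    # prev[j] holds the best score for aligning the processed prefix of a
    # with b[:j]; the first row corresponds to aligning "" against b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Any other alignment of the same two sequences, including one derived from a structural superposition, scores no higher than this value under the same scheme. This is why the structurally correct alignment can only appear among the "suboptimal" alignments when it differs from the DP optimum.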
Affiliation(s)
- Andrew Kuziemko
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Barry Honig
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
- Donald Petrey
  - Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, United States of America
  - Center for Computational Biology and Bioinformatics, Columbia University, New York, New York, United States of America
|
14
|
Chen H, Kihara D. Effect of using suboptimal alignments in template-based protein structure prediction. Proteins 2011; 79:315-34. [PMID: 21058297 PMCID: PMC3058269 DOI: 10.1002/prot.22885] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 12/28/2022]
Abstract
Computational protein structure prediction remains a challenging task in protein bioinformatics. In recent years, the importance of template-based structure prediction has been increasing because of the growing number of protein structures solved by structural genomics projects. To capitalize on the significant effort and investment made in structural genomics projects, it is urgent to establish effective ways to use the solved structures as templates by developing methods for exploiting remotely related proteins that cannot be simply identified by homology. In this work, we examine the effect of using suboptimal alignments in template-based protein structure prediction. We show that suboptimal alignments are often more accurate than the optimal one, and that such accurate suboptimal alignments can occur even at a very low rank of the alignment score. Suboptimal alignments contain a significant number of correct amino acid residue contacts. Moreover, suboptimal alignments can improve template-based models when used as input to Modeller. Finally, we use suboptimal alignments for handling a contact potential in a probabilistic way in a threading program, SUPRB. The probabilistic contacts strategy outperforms the partly thawed approach, which only uses the optimal alignment in defining residue contacts, and also the re-ranking strategy, which uses the contact potential in re-ranking alignments. The comparison with existing methods in the template-recognition test shows that SUPRB is very competitive and outperforms existing methods.
Affiliation(s)
- Hao Chen
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Computer Science, College of Science, Purdue University, West Lafayette, IN, 47907, USA
  - Markey Center for Structural Biology, College of Science, Purdue University, West Lafayette, IN, 47907, USA
|
15
|
Abstract
Homology modeling is based on the observation that related protein sequences adopt similar three-dimensional structures. Hence, a homology model of a protein can be derived using related protein structure(s) as modeling template(s). A key step in this approach is the establishment of correspondence between residues of the protein to be modeled and those of modeling template(s). This step, often referred to as sequence-structure alignment, is one of the major determinants of the accuracy of a homology model. This chapter gives an overview of methods for deriving sequence-structure alignments and discusses recent methodological developments leading to improved performance. However, no method is perfect, so another focus of this chapter is how to find alignment regions that may contain errors and how to improve them. Finally, the chapter provides practical guidance on how to get the most out of the available tools to maximize the accuracy of sequence-structure alignments.
|
16
|
Yang YD, Spratt P, Chen H, Park C, Kihara D. Sub-AQUA: real-value quality assessment of protein structure models. Protein Eng Des Sel 2010; 23:617-32. [PMID: 20525730 DOI: 10.1093/protein/gzq030] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 12/22/2022]
Abstract
Computational protein tertiary structure prediction has made significant progress over the past years. However, most of the existing structure prediction methods are not equipped with functionality to predict the accuracy of constructed models. Knowing the accuracy of a structure model is crucial for its practical use since the accuracy determines potential applications of the model. Here we have developed quality assessment methods that predict real values of the global and local quality of protein structure models. The global quality of a model is defined as the root mean square deviation (RMSD) and the LGA score to its native structure. The local quality is defined as the distance between the corresponding Cα positions of a model and its native structure when they are superimposed. Three regression methods are employed to combine different types of quality assessment measures of models, including alignment-level scores, residue-position level scores, atomic-detailed structure level scores and composite scores. The regression models were tested on a large benchmark data set of template-based protein structure models of various qualities. In predicting RMSD and the LGA score, a combination of two terms, length-normalized SPAD (a score that assesses alignment stability by considering suboptimal alignments) and Verify3D normalized by the square of the model length, shows strong performance, achieving 97.1% and 83.6% accuracy in identifying models with an RMSD of <2 Å and <6 Å, respectively. For predicting the local quality of models, we find that a two-step approach, in which the global RMSD predicted in the first step is further combined with the other terms, can dramatically increase the accuracy. Finally, the developed regression equations are applied to assess the quality of structure models of the whole E. coli proteome.
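The two distance-based quality measures defined in the abstract above can be sketched in a few lines of Python, assuming the model and native Cα coordinates are already superimposed and listed in corresponding order (the superposition step and the LGA score are not shown). The function names are illustrative assumptions and are not part of Sub-AQUA.

```python
import math


def per_residue_distances(model, native):
    """Local quality: distance between corresponding C-alpha positions.

    Both arguments are sequences of (x, y, z) tuples that are assumed to be
    superimposed already and to list residues in the same order.
    """
    if len(model) != len(native):
        raise ValueError("model and native must have the same number of residues")
    return [math.dist(m, n) for m, n in zip(model, native)]


def rmsd(model, native):
    """Global quality: root mean square deviation over the same atom pairs."""
    distances = per_residue_distances(model, native)
    if not distances:
        raise ValueError("at least one residue is required")
    return math.sqrt(sum(d * d for d in distances) / len(distances))
```

For example, with a two-residue model in which only the second Cα is displaced by 2 Å, `per_residue_distances` returns `[0.0, 2.0]` and `rmsd` returns the square root of 2.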
Affiliation(s)
- Yifeng David Yang
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN 47907, USA
|
17
|
Benkert P, Tosatto SCE, Schwede T. Global and local model quality estimation at CASP8 using the scoring functions QMEAN and QMEANclust. Proteins 2010; 77 Suppl 9:173-80. [PMID: 19705484 DOI: 10.1002/prot.22532] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.1] [Indexed: 11/10/2022]
Abstract
Identifying the best candidate model among an ensemble of alternatives is crucial in protein structure prediction. For this purpose, scoring functions have been developed which either calculate a quality estimate on the basis of a single model or derive a score from the information contained in the ensemble of models generated for a given sequence (i.e., consensus methods). At CASP7, consensus methods performed considerably better than scoring functions operating on single models. However, consensus methods tend to fail if the best models are far from the center of the dominant structural cluster. At CASP8, we investigated whether our hybrid method QMEANclust may overcome this limitation by combining the QMEAN composite scoring function operating on single models with consensus information. We participated with four different scoring functions in the quality assessment category. The QMEANclust consensus scoring function turned out to be a successful method both for the ranking of entire models and, especially, for the estimation of the per-residue model quality. In this article, we briefly describe the two scoring functions QMEAN and QMEANclust and discuss their performance in the context of what went right and wrong at CASP8. Both scoring functions are publicly available at http://swissmodel.expasy.org/qmean/.
Affiliation(s)
- Pascal Benkert
- Biozentrum, University of Basel, Basel 4056, Switzerland
|
18
|
Benkert P, Schwede T, Tosatto SC. QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information. BMC Struct Biol 2009; 9:35. [PMID: 19457232 PMCID: PMC2709111 DOI: 10.1186/1472-6807-9-35] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 10/21/2008] [Accepted: 05/20/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus.
RESULTS: Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient of predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach.
CONCLUSION: Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
Affiliation(s)
- Pascal Benkert
- Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland.
|
19
|
Benkert P, Künzli M, Schwede T. QMEAN server for protein model quality estimation. Nucleic Acids Res 2009; 37:W510-4. [PMID: 19429685 DOI: 10.1093/nar/gkp322] [Citation(s) in RCA: 593] [Impact Index Per Article: 39.5] [Indexed: 11/12/2022]
Abstract
Model quality estimation is an essential component of protein structure prediction, since ultimately the accuracy of a model determines its usefulness for specific applications. Usually, in the course of protein structure prediction a set of alternative models is produced, from which subsequently the most accurate model has to be selected. The QMEAN server provides access to two scoring functions successfully tested at the eighth round of the community-wide blind test experiment CASP. The user can choose between the composite scoring function QMEAN, which derives a quality estimate on the basis of the geometrical analysis of single models, and the clustering-based scoring function QMEANclust which calculates a global and local quality estimate based on a weighted all-against-all comparison of the models from the ensemble provided by the user. The web server performs a ranking of the input models and highlights potentially problematic regions for each model. The QMEAN server is available at http://swissmodel.expasy.org/qmean.
|
20
|
Kryshtafovych A, Fidelis K. Protein structure prediction and model quality assessment. Drug Discov Today 2009; 14:386-93. [PMID: 19100336 DOI: 10.1016/j.drudis.2008.11.010] [Citation(s) in RCA: 65] [Impact Index Per Article: 4.3] [Received: 09/23/2008] [Revised: 11/05/2008] [Accepted: 11/18/2008] [Indexed: 01/02/2023]
Abstract
Protein structures have proven to be a crucial piece of information for biomedical research. Of the millions of currently sequenced proteins only a small fraction is experimentally solved for structure and the only feasible way to bridge the gap between sequence and structure data is computational modeling. Half a century has passed since it was shown that the amino acid sequence of a protein determines its shape, but a method to translate the sequence code reliably into the 3D structure still remains to be developed. This review summarizes modern protein structure prediction techniques with the emphasis on comparative modeling, and describes the recent advances in methods for theoretical model quality assessment.
Affiliation(s)
- Andriy Kryshtafovych
- Protein Structure Prediction Center, Genome Center, University of California Davis, Davis, CA 95616, USA.
|