1
Hoogstraten CA, Koenderink JB, van Straaten CE, Scheer-Weijers T, Smeitink JAM, Schirris TJJ, Russel FGM. Pyruvate dehydrogenase is a potential mitochondrial off-target for gentamicin based on in silico predictions and in vitro inhibition studies. Toxicol In Vitro 2024; 95:105740. [PMID: 38036072 DOI: 10.1016/j.tiv.2023.105740] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 11/08/2023] [Accepted: 11/22/2023] [Indexed: 12/02/2023]
Abstract
During the drug development process, organ toxicity leads to an estimated failure of one-third of novel chemical entities. Drug-induced toxicity is increasingly associated with mitochondrial dysfunction, but identifying the underlying molecular mechanisms remains a challenge. Computational modeling techniques have proven to be a good tool in searching for drug off-targets. Here, we aimed to identify mitochondrial off-targets of the nephrotoxic drugs tenofovir and gentamicin using different in silico approaches (KRIPO, ProBis and PDID). Dihydroorotate dehydrogenase (DHODH) and pyruvate dehydrogenase (PDH) were predicted as potential novel off-target sites for tenofovir and gentamicin, respectively. The predicted targets were evaluated in vitro, using (colorimetric) enzymatic activity measurements. Tenofovir did not inhibit DHODH activity, while gentamicin potently reduced PDH activity. In conclusion, the use of in silico methods appeared a valuable approach in predicting PDH as a mitochondrial off-target of gentamicin. Further research is required to investigate the contribution of PDH inhibition to overall renal toxicity of gentamicin.
Affiliation(s)
- Charlotte A Hoogstraten
- Division of Pharmacology and Toxicology, Department of Pharmacy, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands; Radboud Center for Mitochondrial Medicine, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands
- Jan B Koenderink
- Division of Pharmacology and Toxicology, Department of Pharmacy, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands
- Carolijn E van Straaten
- Division of Pharmacology and Toxicology, Department of Pharmacy, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands
- Tom Scheer-Weijers
- Division of Pharmacology and Toxicology, Department of Pharmacy, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands
- Jan A M Smeitink
- Radboud Center for Mitochondrial Medicine, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands; Department of Pediatrics, Amalia Children's Hospital, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands; Khondrion BV, Nijmegen 6525 EX, the Netherlands
- Tom J J Schirris
- Division of Pharmacology and Toxicology, Department of Pharmacy, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands; Radboud Center for Mitochondrial Medicine, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands
- Frans G M Russel
- Division of Pharmacology and Toxicology, Department of Pharmacy, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands; Radboud Center for Mitochondrial Medicine, Radboud University Medical Center, Nijmegen 6500 HB, the Netherlands.
2
Bhattacharya S, Roche R, Shuvo MH, Moussad B, Bhattacharya D. Contact-Assisted Threading in Low-Homology Protein Modeling. Methods Mol Biol 2023; 2627:41-59. [PMID: 36959441 DOI: 10.1007/978-1-0716-2974-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The ability to predict the three-dimensional structure of a protein from its amino acid sequence has improved considerably in the recent past. This progress is propelled by the improved accuracy of deep learning-based inter-residue contact map predictors coupled with the rapid growth of protein sequence databases. A contact map encodes interatomic interaction information that can be exploited for highly accurate prediction of protein structures via contact map threading, even for query proteins that are not amenable to direct homology modeling. As such, contact-assisted threading has garnered considerable research effort. In this chapter, we provide an overview of existing contact-assisted threading methods while highlighting the recent advances and discussing some of the current limitations and future prospects in the application of contact-assisted threading for improving the accuracy of low-homology protein modeling.
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
- Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
3
Silva L, Antunes A. Omics and Remote Homology Integration to Decipher Protein Functionality. Methods Mol Biol 2023; 2627:61-81. [PMID: 36959442 DOI: 10.1007/978-1-0716-2974-1_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
In recent years, several "omics" technologies based on specific biomolecules (DNA, RNA, proteins, or metabolites) have gained growing importance in the scientific field. Although each omics field has its own laboratory protocols, they share a background of bioinformatic tools for data integration and analysis. A recent subset of bioinformatic tools, based on available templates or remote homology protocols, allows fast and highly accurate computational prediction of protein structures. The rapid prediction of currently unsolved protein structures, together with recent omics findings, allows a boost of scientific advances in multiple fields such as cancer, longevity, immunity, mitochondrial function, toxicology, drug design, biosensors, and recombinant protein engineering. In this chapter, we assess methodological approaches for the integration of omics and remote homology inferences to decipher protein functionality, opening the door to the next era of biological knowledge.
Affiliation(s)
- Liliana Silva
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal
- Agostinho Antunes
- CIIMAR/CIMAR, Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Porto, Portugal.
- Department of Biology, Faculty of Sciences, University of Porto, Porto, Portugal.
4
Maghrabi AHA, Aldowsari FMF, McGuffin LJ. Quality Estimates for 3D Protein Models. Methods Mol Biol 2023; 2627:101-118. [PMID: 36959444 DOI: 10.1007/978-1-0716-2974-1_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
Protein structure modeling is one of the most advanced and complex processes in computational biology. One of the major problems for the protein structure prediction field has been how to estimate the accuracy of the predicted 3D models, on both a local and global level, in the absence of known structures. We must be able to accurately measure the confidence that we have in the quality of predicted 3D models of proteins for them to become widely adopted by the general bioscience community. To address this major issue, it was necessary to develop new model quality assessment (MQA) methods and integrate them into our pipelines for building 3D protein models. Our MQA method, called ModFOLD, has been ranked as one of the most accurate MQA tools in independent blind evaluations. This chapter discusses model quality assessment in the protein modeling field, demonstrating both its strengths and limitations. We also present some of the best methods according to independent benchmarking data, which have been gathered in recent years.
Affiliation(s)
- Ali H A Maghrabi
- College of Applied Sciences, Umm Al Qura University, Mecca, Saudi Arabia
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK.
5
Bertoline LMF, Lima AN, Krieger JE, Teixeira SK. Before and after AlphaFold2: An overview of protein structure prediction. Front Bioinform 2023; 3:1120370. [PMID: 36926275 PMCID: PMC10011655 DOI: 10.3389/fbinf.2023.1120370] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 12/09/2022] [Accepted: 02/17/2023] [Indexed: 03/08/2023] Open
Abstract
Three-dimensional protein structure is directly correlated with its function, and its determination is critical to understanding biological processes and addressing human health and life science problems in general. Although new protein structures are experimentally obtained over time, there is still a large difference between the number of protein sequences deposited in UniProt and those with resolved tertiary structure. In this context, studies have emerged to predict protein structures by methods based on a template or free modeling. In recent years, different methods have been combined to overcome their individual limitations, until the emergence of AlphaFold2, which demonstrated that predicting protein structure with high accuracy at unprecedented scale is possible. Despite its current impact in the field, AlphaFold2 has limitations. Recently, new methods based on protein language models have promised to revolutionize protein structural biology, allowing the discovery of protein structure and function only from evolutionary patterns present in protein sequences. Even though these methods do not reach AlphaFold2 accuracy, they have already covered some of its limitations, being able to predict with high accuracy more than 200 million proteins from metagenomic databases. In this mini-review, we provide an overview of the breakthroughs in protein structure prediction before and after the emergence of AlphaFold2.
Affiliation(s)
- Letícia M F Bertoline
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
- Angélica N Lima
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
- Jose E Krieger
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
- Samantha K Teixeira
- Laboratory of Genetics and Molecular Cardiology, Heart Institute, University of São Paulo Medical School, São Paulo, Brazil
6
Cao M, Shi L, Peng P, Han B, Liu L, Lv X, Ma Z, Zhang S, Sun D. Determination of genetic effects and functional SNPs of bovine HTR1B gene on milk fatty acid traits. BMC Genomics 2021; 22:575. [PMID: 34315401 PMCID: PMC8314477 DOI: 10.1186/s12864-021-07893-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/24/2021] [Accepted: 07/15/2021] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Our previous genome-wide association study (GWAS) on milk fatty acid traits in Chinese Holstein cows revealed that the SNP BTB-01556197 was significantly associated with C10:0 at genome-wide level (P = 0.0239). It was located downstream of the 5-hydroxytryptamine receptor 1B (HTR1B) gene, which has been shown to play an important role in the regulation of fatty acid oxidation. Hence, we considered it a promising candidate gene for milk fatty acids in dairy cattle. In this study, we aimed to investigate whether the HTR1B gene had significant genetic effects on milk fatty acid traits. RESULTS We re-sequenced the entire coding region and 3000 bp of the 5' and 3' flanking regions of the HTR1B gene. A total of 13 SNPs were identified, comprising one in the 5' flanking region, two in the 5' untranslated region (UTR), two in exon 1, five in the 3' UTR, and three in the 3' flanking region. By performing genotype-phenotype association analysis with SAS9.2 software, we observed that the 13 SNPs were significantly associated with medium-chain saturated fatty acids such as C6:0, C8:0 and C10:0 (P < 0.0001 ~ 0.042). With Haploview 4.1 software, linkage disequilibrium (LD) analysis was performed. Two haplotype blocks formed by two and ten SNPs were observed. Haplotype-based association analysis indicated that both haplotype blocks were strongly associated with C6:0, C8:0 and C10:0 as well (P < 0.0001 ~ 0.0071). With regard to the missense mutation in exon 1 (g.17303383G > T) that caused an amino acid change from alanine to serine, we predicted with SOPMA that it altered the secondary structure of the HTR1B protein. In addition, we predicted with Genomatix that three SNPs in the promoter region, g.17307103A > T, g.17305206T > G and g.17303761C > T, altered the binding sites of the transcription factors (TFs) HMX2, PAX2, FOXP1ES, MIZ1, CUX2, DREAM, and PPAR-RXR. Of them, a luciferase assay further confirmed that allele T of g.17307103A > T conferred significantly higher transcriptional activity of the HTR1B gene than allele A (P = 0.0007). CONCLUSIONS In conclusion, our findings provide the first evidence that the HTR1B gene has significant genetic effects on milk fatty acids in dairy cattle.
Affiliation(s)
- Mingyue Cao
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, 100193 China
- Lijun Shi
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, 100193 China
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, 100193 China
- Peng Peng
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, 100193 China
- Bo Han
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, 100193 China
- Lin Liu
- Beijing Dairy Cattle Center, Beijing, 100192 China
- Xiaoqing Lv
- Beijing Dairy Cattle Center, Beijing, 100192 China
- Zhu Ma
- Beijing Dairy Cattle Center, Beijing, 100192 China
- Shengli Zhang
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, 100193 China
- Dongxiao Sun
- Department of Animal Genetics, Breeding and Reproduction, College of Animal Science and Technology, Key Laboratory of Animal Genetics, Breeding and Reproduction of Ministry of Agriculture and Rural Affairs, National Engineering Laboratory for Animal Breeding, China Agricultural University, Beijing, 100193 China
7
Protein Structure Prediction: Conventional and Deep Learning Perspectives. Protein J 2021; 40:522-544. [PMID: 34050498 DOI: 10.1007/s10930-021-10003-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Accepted: 05/21/2021] [Indexed: 10/21/2022]
Abstract
Protein structure prediction is a way to bridge the sequence-structure gap, one of the main challenges in computational biology and chemistry. Predicting any protein's accurate structure is of paramount importance for the scientific community, as these structures govern their function. Moreover, this is one of the most complicated optimization problems that computational biologists have faced. Experimental protein structure determination methods include X-ray crystallography, Nuclear Magnetic Resonance Spectroscopy and Electron Microscopy. All of these are tedious and time-consuming procedures that require expertise. To make the process less cumbersome, scientists use predictive tools as part of computational methods, using data consolidated in the protein repositories. In recent years, machine learning approaches have raised the interest of the structure prediction community. Most of the machine learning approaches for protein structure prediction are centred on co-evolution based methods. The accuracy of these approaches depends on the number of homologous protein sequences available in the databases. The prediction problem becomes challenging for many proteins, especially those without enough sequence homologs. Deep learning methods allow for the extraction of intricate features from protein sequence data without relying on prior intuition. Accurately predicted protein structures are employed for drug discovery, antibody designs, understanding protein-protein interactions, and interactions with other molecules. This article provides a review of conventional and deep learning approaches in protein structure prediction. We conclude this review by outlining a few publicly available datasets and deep learning architectures currently employed for protein structure prediction tasks.
8
Lindsay RJ, Mansbach RA, Gnanakaran S, Shen T. Effects of pH on an IDP conformational ensemble explored by molecular dynamics simulation. Biophys Chem 2021; 271:106552. [PMID: 33581430 PMCID: PMC8024028 DOI: 10.1016/j.bpc.2021.106552] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 11/30/2020] [Revised: 01/15/2021] [Accepted: 01/20/2021] [Indexed: 01/03/2023]
Abstract
The conformational ensembles of intrinsically disordered proteins (IDPs), such as α-synuclein, are responsible for their function and malfunction. Misfolding of α-synuclein can lead to neurodegenerative diseases, and the ability to study its conformations and those of other intrinsically disordered proteins under varying physiological conditions can be crucial to understanding and preventing pathologies. In contrast to well-folded peptides, a consensus feature of IDPs is their low hydropathy and high charge, which makes their conformations sensitive to pH perturbation. We examine a prominent member of this subset of IDPs, α-synuclein, using a divide-and-conquer scheme that provides enhanced sampling of IDP structural ensembles. We constructed conformational ensembles of α-synuclein under neutral (pH ~ 7) and low (pH ~ 3) pH conditions and compared our results with available information obtained from smFRET, SAXS, and NMR studies. Specifically, α-synuclein was found to be in a more compact state under low pH conditions, and the structural changes observed are consistent with those from experiments. We also characterize the conformational and dynamic differences between these ensembles and discuss the implications for promoting pathogenic fibril formation. We find that under low pH conditions, neutralization of negatively charged residues leads to compaction of the C-terminal portion of α-synuclein while internal reorganization allows α-synuclein to maintain its overall end-to-end distance. We also observe different levels of intra-protein interaction between three regions of α-synuclein at varying pH and a shift towards more hydrophilic interactions with decreasing pH.
Affiliation(s)
- Richard J Lindsay
- UT-ORNL Graduate School of Genome Science and Technology, Knoxville, TN, 37996, USA.
- Rachael A Mansbach
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, 87544, USA; Department of Physics, Concordia University, Montreal, Quebec, Canada.
- S Gnanakaran
- Theoretical Biology and Biophysics, Los Alamos National Laboratory, Los Alamos, NM, 87544, USA.
- Tongye Shen
- Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, TN, 37996, USA.
9
Dhingra S, Sowdhamini R, Cadet F, Offmann B. A glance into the evolution of template-free protein structure prediction methodologies. Biochimie 2020; 175:85-92. [DOI: 10.1016/j.biochi.2020.04.026] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 02/14/2020] [Revised: 04/24/2020] [Accepted: 04/27/2020] [Indexed: 11/26/2022]
10
Dokholyan NV. Experimentally-driven protein structure modeling. J Proteomics 2020; 220:103777. [PMID: 32268219 PMCID: PMC7214187 DOI: 10.1016/j.jprot.2020.103777] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Revised: 03/17/2020] [Accepted: 04/02/2020] [Indexed: 11/25/2022]
Abstract
Revolutions in the natural and exact sciences that started at the dawn of the last century have led to an explosion of theoretical, experimental, and computational approaches to determine structures of molecules and complexes, as well as their rich conformational dynamics. Since different experimental methods produce information that is attributed to specific time and length scales, corresponding computational methods have to be tailored to these scales and experiments. These methods can then be combined and integrated across scales, hence producing a fuller picture of molecular structure and motion from the "puzzle pieces" offered by various experiments. Here, we describe a number of computational approaches that utilize experimental data to glance into the structure of proteins and understand their dynamics. We will also discuss the limitations and the resolution of the constraints-based modeling approaches. SIGNIFICANCE: Experimentally-driven computational structure modeling and determination is a rapidly evolving alternative to traditional approaches for molecular structure determination. These new hybrid experimental-computational approaches are proving to be a powerful microscope to glance into the structural features of intrinsically or partially disordered proteins and the dynamics of molecules and complexes. In this review, we describe various approaches in the field of experimentally-driven computational structure modeling.
Affiliation(s)
- Nikolay V Dokholyan
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA 17033, USA; Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA; Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA; Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA.
11
Bhattacharya S, Bhattacharya D. Does inclusion of residue-residue contact information boost protein threading? Proteins 2019; 87:596-606. [PMID: 30882932 DOI: 10.1002/prot.25684] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 12/22/2018] [Revised: 02/20/2019] [Accepted: 03/13/2019] [Indexed: 12/26/2022]
Abstract
Template-based modeling is considered as one of the most successful approaches for protein structure prediction. However, reliably and accurately selecting optimal template proteins from a library of known protein structures having similar folds as the target protein and making correct alignments between the target sequence and the template structures, a template-based modeling technique known as threading, remains challenging, particularly for non- or distantly-homologous protein targets. With the recent advancement in protein residue-residue contact map prediction powered by sequence co-evolution and machine learning, here we systematically analyze the effect of inclusion of residue-residue contact information in improving the accuracy and reliability of protein threading. We develop a new threading algorithm by incorporating various sequential and structural features, and subsequently integrate residue-residue contact information as an additional scoring term for threading template selection. We show that the inclusion of contact information attains statistically significantly better threading performance compared to a baseline threading algorithm that does not utilize contact information when everything else remains the same. Experimental results demonstrate that our contact based threading approach outperforms popular threading method MUSTER, contact-assisted ab initio folding method CONFOLD2, and recent state-of-the-art contact-assisted protein threading methods EigenTHREADER and map_align on several benchmarks. Our study illustrates that the inclusion of contact maps is a promising avenue in protein threading to ultimately help to improve the accuracy of protein structure prediction.
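The idea of adding contact information as an extra scoring term for template selection can be illustrated with a minimal sketch. The function names, the 0.5 weight, and the toy data are assumptions made for illustration only; this is not the scoring function of the cited method.

```python
from typing import Dict, Set, Tuple

Contact = Tuple[int, int]  # residue index pair (i, j) with i < j


def contact_overlap(query_contacts: Set[Contact],
                    template_contacts: Set[Contact],
                    alignment: Dict[int, int]) -> float:
    """Fraction of predicted query contacts that land on template contacts.

    `alignment` maps query residue indices to template residue indices.
    """
    if not query_contacts:
        return 0.0
    matched = 0
    for i, j in query_contacts:
        # A contact can only be checked if both residues are aligned.
        if i in alignment and j in alignment:
            a, b = alignment[i], alignment[j]
            if (min(a, b), max(a, b)) in template_contacts:
                matched += 1
    return matched / len(query_contacts)


def combined_score(base_score: float, overlap: float,
                   weight: float = 0.5) -> float:
    """Baseline threading score plus a weighted contact term (hypothetical weight)."""
    return base_score + weight * overlap


# Toy example: three predicted contacts, two of which map onto template contacts.
query = {(0, 3), (1, 4), (2, 5)}
template = {(10, 13), (11, 14)}
aln = {0: 10, 1: 11, 2: 12, 3: 13, 4: 14}  # query residue 5 is unaligned
overlap = contact_overlap(query, template, aln)
score = combined_score(1.0, overlap)
```

Comparing `combined_score` with the bare `base_score` across candidate templates mirrors the kind of with-contacts versus without-contacts comparison the abstract describes.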
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama
12
Panja AS, Bandopadhyay B, Nag A, Maiti S. Protein Secondary Structure Determination (PSSD): A New and Simple Approach. Curr Proteomics 2019. [DOI: 10.2174/1570164615666180911113251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Background: Our present investigation was conducted to explore a computational algorithm for protein secondary structure prediction as per the property of evolutionary transient and a large number (50 each) of homologous mesophilic-thermophilic proteins.
Objectives: These mesophilic-thermophilic proteins were used for numerical measurement of helix-sheet-coil and turn tendency, for which each amino-acid residue was screened to build up the propensity table.
Methods: In the current study, two different propensity windows were introduced that allowed prediction of protein secondary structure with more than 80% accuracy.
Results: Using this propensity matrix and a dynamic algorithm-based programme, a significant and decisive outcome in the determination of protein (both thermophilic and mesophilic) secondary structure was noticed over the previous algorithm-based programme. This was demonstrated by comparison with other standard methods, including DSSP as adopted by the PDB, with the help of multiple-comparison ANOVA and Dunnett’s t-test.
Conclusion: The PSSD is of great importance in the prediction of structural features of any unknown, unresolved proteins. It is also useful in studies of protein structure-function relationships.
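A propensity table combined with a sliding window can be sketched as follows. This is a generic illustration, not the PSSD algorithm: the propensity values are made-up toy numbers, and the single window of width 3 and the neutral default of 1.0 for unknown residues are assumptions.

```python
from typing import Dict

STATES = "HEC"  # helix, sheet (strand), coil

# Hypothetical toy propensities; NOT values from the cited study.
TOY_PROPENSITY: Dict[str, Dict[str, float]] = {
    "A": {"H": 1.4, "E": 0.8, "C": 0.7},
    "L": {"H": 1.3, "E": 1.1, "C": 0.6},
    "V": {"H": 0.9, "E": 1.6, "C": 0.6},
    "I": {"H": 1.0, "E": 1.5, "C": 0.5},
    "G": {"H": 0.5, "E": 0.7, "C": 1.6},
    "P": {"H": 0.4, "E": 0.5, "C": 1.7},
}


def predict_secondary_structure(sequence: str, window: int = 3) -> str:
    """Assign H/E/C per residue from window-averaged propensities."""
    half = window // 2
    prediction = []
    for i in range(len(sequence)):
        # Window is truncated at the sequence ends.
        segment = sequence[max(0, i - half): i + half + 1]
        averages = {}
        for state in STATES:
            # Residues missing from the table get a neutral propensity of 1.0.
            total = sum(TOY_PROPENSITY.get(res, {}).get(state, 1.0)
                        for res in segment)
            averages[state] = total / len(segment)
        # Ties resolve in the order H, E, C.
        prediction.append(max(STATES, key=lambda s: averages[s]))
    return "".join(prediction)


helix_like = predict_secondary_structure("AAAA")
mixed = predict_secondary_structure("AAAAGPGP")
```

A real propensity table would be derived from residue frequencies in structures with known secondary structure, and the window width would be tuned against a reference assignment such as DSSP.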
Affiliation(s)
- Anindya Sundar Panja
- Post Graduate Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
- Bidyut Bandopadhyay
- Post Graduate Department of Biotechnology, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
- Akash Nag
- Post Graduate Department of Computer Science, The University of Burdwan, Burdwan, 713104, West Bengal, India
- Smarajit Maiti
- Post Graduate Department of Biochemistry and Biotechnology, Cell and Molecular Therapeutics Laboratory, Oriental Institute of Science and Technology, Vidyasagar University, Midnapore-721102, West Bengal, India
13
Hafsa NE, Berjanskii MV, Arndt D, Wishart DS. Rapid and reliable protein structure determination via chemical shift threading. J Biomol NMR 2018; 70:33-51. [PMID: 29196969 DOI: 10.1007/s10858-017-0154-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/15/2017] [Accepted: 11/14/2017] [Indexed: 06/07/2023]
Abstract
Protein structure determination using nuclear magnetic resonance (NMR) spectroscopy can be both time-consuming and labor intensive. Here we demonstrate how chemical shift threading can permit rapid, robust, and accurate protein structure determination using only chemical shift data. Threading is a relatively old bioinformatics technique that uses a combination of sequence information and predicted (or experimentally acquired) low-resolution structural data to generate high-resolution 3D protein structures. The key motivations behind using NMR chemical shifts for protein threading lie in the fact that they are easy to measure, they are available prior to 3D structure determination, and they contain vital structural information. The method we have developed uses not only sequence and chemical shift similarity but also chemical shift-derived secondary structure, shift-derived super-secondary structure, and shift-derived accessible surface area to generate a high quality protein structure regardless of the sequence similarity (or lack thereof) to a known structure already in the PDB. The method (called E-Thrifty) was found to be very fast (often < 10 min/structure) and to significantly outperform other shift-based or threading-based structure determination methods (in terms of top template model accuracy), with an average TM-score performance of 0.68 (vs. 0.50-0.62 for other methods). Coupled with recent developments in chemical shift refinement, these results suggest that protein structure determination, using only NMR chemical shifts, is becoming increasingly practical and reliable. E-Thrifty is available as a web server at http://ethrifty.ca.
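For readers unfamiliar with the TM-score values quoted above, the following sketch evaluates the standard TM-score expression for one fixed superposition. It assumes the per-residue distances between aligned Cα atoms are already known; the full TM-score additionally maximizes over superpositions, which is not done here. The clamp of d0 to 0.5 Å for very short chains is an assumption of this sketch.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in angstroms) of the TM-score."""
    if target_length <= 21:
        # Assumption: clamp d0 for very short chains, where the formula
        # below would give a tiny or undefined value.
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for a fixed superposition.

    `distances` holds the distance (angstroms) between each aligned residue
    pair; unaligned target residues contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


d0_100 = tm_d0(100)
perfect = tm_score([0.0] * 100, 100)
half_aligned = tm_score([0.0] * 50, 100)
```

Because the sum is normalized by the target length rather than the number of aligned residues, a perfect match over only half the target gives 0.5, which is why TM-score penalizes partial coverage as well as structural deviation.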
Affiliation(s)
- Noor E Hafsa
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Mark V Berjanskii
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
- David Arndt
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
- David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada.
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada.
14
Buchan DWA, Jones DT. EigenTHREADER: analogous protein fold recognition by efficient contact map threading. Bioinformatics 2017; 33:2684-2690. [PMID: 28419258 PMCID: PMC5860056 DOI: 10.1093/bioinformatics/btx217] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 10/18/2016] [Revised: 01/18/2017] [Accepted: 04/12/2017] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Protein fold recognition when appropriate, evolutionarily-related, structural templates can be identified is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. RESULTS: EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology based fold recognition methods. AVAILABILITY AND IMPLEMENTATION: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/. CONTACT: d.t.jones@ucl.ac.uk.
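The idea of comparing contact maps through their principal eigenvectors can be illustrated with a minimal Python sketch. This is not the EigenTHREADER code: the function names, the toy contact lists, the use of power iteration for a single eigenvector, the product-based match score and the gap penalty are all assumptions chosen for illustration. The diagonal of each map is set to 1 so that power iteration converges on these small examples.

```python
import math


def contact_map(n, contacts):
    """Symmetric n x n contact map; diagonal set to 1 (illustrative assumption)."""
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in contacts:
        m[i][j] = m[j][i] = 1.0
    return m


def principal_eigenvector(matrix, iterations=500):
    """Dominant eigenvector of a symmetric non-negative matrix (power iteration)."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


def align_profiles(a, b, gap=-0.1):
    """Global dynamic-programming alignment score of two 1D profiles.

    The match score is the product of the two eigenvector components
    (hypothetical scoring choice); gaps carry a fixed penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = score[i - 1][j - 1] + a[i - 1] * b[j - 1]
            score[i][j] = max(match, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


# Two toy 8-residue maps sharing the chain contacts (i, i+1).
chain = [(i, i + 1) for i in range(7)]
helix_like = contact_map(8, chain + [(i, i + 3) for i in range(5)])
hairpin_like = contact_map(8, chain + [(i, 7 - i) for i in range(3)])

v_helix = principal_eigenvector(helix_like)
v_hairpin = principal_eigenvector(hairpin_like)
print(align_profiles(v_helix, v_helix), align_profiles(v_helix, v_hairpin))
```

Because both eigenvectors are unit vectors, a profile aligned to itself scores close to 1 and any other pairing scores no higher, which is the property a library search would rank on. A practical method would use several eigenvectors and handle their sign ambiguity; this sketch does neither.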
Affiliation(s)
- Daniel W A Buchan
  - Department of Computer Science, University College London, Gower Street, London, UK
- David T Jones
  - Department of Computer Science, University College London, Gower Street, London, UK
|
15
|
Unraveling the meaning of chemical shifts in protein NMR. Biochimica et Biophysica Acta-Proteins and Proteomics 2017; 1865:1564-1576. [PMID: 28716441 DOI: 10.1016/j.bbapap.2017.07.005] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 03/22/2017] [Revised: 06/29/2017] [Accepted: 07/07/2017] [Indexed: 12/14/2022]
Abstract
Chemical shifts are among the most informative parameters in protein NMR. They provide a wealth of information about protein secondary and tertiary structure, protein flexibility, and protein-ligand binding. In this report, we review the progress in interpreting and utilizing protein chemical shifts that has occurred over the past 25 years, with a particular focus on the large body of work arising from our group and other Canadian NMR laboratories. More specifically, this review focuses on describing, assessing, and providing some historical context for various chemical shift-based methods to: (1) determine protein secondary and super-secondary structure; (2) derive protein torsion angles; (3) assess protein flexibility; (4) predict residue accessible surface area; (5) refine 3D protein structures; (6) determine 3D protein structures; and (7) characterize intrinsically disordered proteins. This review also briefly covers some of the methods that we previously developed to predict chemical shifts from 3D protein structures and/or protein sequence data. It is hoped that this review will help to increase awareness of the considerable utility of NMR chemical shifts in structural biology and facilitate more widespread adoption of chemical shift-based methods by NMR spectroscopists, structural biologists, protein biophysicists, and biochemists worldwide. This article is part of a Special Issue entitled: Biophysics in Canada, edited by Lewis Kay, John Baenziger, Albert Berghuis and Peter Tieleman.
|
16
|
Faraggi E, Kloczkowski A. Accurate Prediction of One-Dimensional Protein Structure Features Using SPINE-X. Methods Mol Biol 2017; 1484:45-53. [PMID: 27787819 DOI: 10.1007/978-1-4939-6406-2_5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Accurate prediction of protein secondary structure and other one-dimensional structure features is essential for accurate sequence alignment, three-dimensional structure modeling, and function prediction. SPINE-X is a software package to predict secondary structure as well as accessible surface area and the dihedral angles ϕ and ψ. For secondary structure, SPINE-X achieves an accuracy of between 81 and 84%, depending on the dataset and choice of tests. The Pearson correlation coefficient for accessible surface area prediction is 0.75, and the mean absolute errors for the ϕ and ψ dihedral angles are 20° and 33°, respectively. The source code and Linux executables for SPINE-X are available from Research and Information Systems at http://mamiris.com.
Affiliation(s)
- Eshel Faraggi
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN, 46032, USA
  - Research and Information Systems, LLC, Indianapolis, IN, USA
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
|
17
|
Drozdetskiy A, Cole C, Procter J, Barton GJ. JPred4: a protein secondary structure prediction server. Nucleic Acids Res 2015; 43:W389-94. [PMID: 25883141 PMCID: PMC4489285 DOI: 10.1093/nar/gkv332] [Citation(s) in RCA: 1194] [Impact Index Per Article: 132.7] [Received: 01/27/2015] [Accepted: 03/28/2015] [Indexed: 11/13/2022]
Abstract
JPred4 (http://www.compbio.dundee.ac.uk/jpred4) is the latest version of the popular JPred protein secondary structure prediction server which provides predictions by the JNet algorithm, one of the most accurate methods for secondary structure prediction. In addition to protein secondary structure, JPred also makes predictions of solvent accessibility and coiled-coil regions. The JPred service runs up to 94 000 jobs per month and has carried out over 1.5 million predictions in total for users in 179 countries. The JPred4 web server has been re-implemented in the Bootstrap framework and JavaScript to improve its design, usability and accessibility from mobile devices. JPred4 features higher accuracy, with a blind three-state (α-helix, β-strand and coil) secondary structure prediction accuracy of 82.0% while solvent accessibility prediction accuracy has been raised to 90% for residues <5% accessible. Reporting of results is enhanced both on the website and through the optional email summaries and batch submission results. Predictions are now presented in SVG format with options to view full multiple sequence alignments with and without gaps and insertions. Finally, the help-pages have been updated and tool-tips added as well as step-by-step tutorials.
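Three-state accuracy figures such as the 82.0% quoted above are usually reported as the percentage of residues whose predicted state (helix, strand or coil) matches the observed one. A minimal Python sketch of that calculation follows; the function name and the example strings are illustrative assumptions and are not taken from JPred4.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("at least one residue is required")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical six-residue example: five of six states agree.
print(round(q3_accuracy("HHHEEC", "HHHECC"), 1))
```

Reducing an eight-state assignment to three states before scoring follows conventions that differ between studies, so that step is deliberately left out here.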
Affiliation(s)
- Alexey Drozdetskiy
  - Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- Christian Cole
  - Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- James Procter
  - Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
- Geoffrey J Barton
  - Division of Computational Biology, College of Life Sciences, University of Dundee, Dundee, DD1 5EH, UK
|
18
|
Jayaram B, Dhingra P, Mishra A, Kaushik R, Mukherjee G, Singh A, Shekhar S. Bhageerath-H: a homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins. BMC Bioinformatics 2014; 15 Suppl 16:S7. [PMID: 25521245 PMCID: PMC4290660 DOI: 10.1186/1471-2105-15-s16-s7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 02/06/2023]
Abstract
BACKGROUND: The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. Success of structure-based drug discovery hinges on the availability of structures. Despite significant progress in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data-driven homology-based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures, with advancing drug design attempts as one of the goals. RESULTS: The Bhageerath-H web server was validated on 75 CASP10 targets, which showed TM-scores ≥ 0.5 in 91% of the cases and Cα RMSDs ≤ 5 Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low-resolution models to high and medium resolution. CONCLUSION: The Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment.
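The two quality measures quoted in the results, TM-score and Cα RMSD, can both be computed from per-residue Cα distances once a model has been superposed on the native structure. The Python sketch below uses the standard TM-score definition with length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. It assumes the superposition and residue pairing have already been done elsewhere, so it does not search for the optimal superposition that a full TM-score calculation requires; the function names are illustrative.

```python
import math


def tm_d0(target_length):
    """Length-dependent distance scale d0 (in Angstrom) of the TM-score."""
    if target_length <= 15:
        raise ValueError("this sketch assumes targets longer than 15 residues")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for a positive d0 in this sketch")
    return d0


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: Calpha-Calpha distances (Angstrom) of the aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def ca_rmsd(distances):
    """Root-mean-square deviation over the aligned Calpha pairs."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


# Hypothetical 100-residue target with every aligned pair 2 Angstrom apart.
example = [2.0] * 100
print(round(tm_score(example, 100), 3), round(ca_rmsd(example), 3))
```

Unlike RMSD, each residue's contribution to the TM-score is bounded, so a few badly placed residues do not dominate the value; this is one reason the two thresholds in the abstract are reported separately.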
|
19
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
20
|
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully understood. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor in correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implications for protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
  - Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
|
21
|
Vlachakis D, Karozou A, Kossida S. 3D Molecular Modelling Study of the H7N9 RNA-Dependent RNA Polymerase as an Emerging Pharmacological Target. Influenza Research and Treatment 2013; 2013:645348. [PMID: 24187616 PMCID: PMC3800656 DOI: 10.1155/2013/645348] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/05/2013] [Revised: 07/18/2013] [Accepted: 08/11/2013] [Indexed: 12/05/2022]
Abstract
Currently, not much is known about the H7N9 strain, and this is the major drawback for a scientific strategy to tackle this virus. Herein, the 3D complex structure of the H7N9 RNA-dependent RNA polymerase has been established using a repertoire of molecular modelling techniques including homology modelling, molecular docking, and molecular dynamics simulations. Strikingly, it was found that the oligonucleotide cleft and tunnel in the H7N9 RNA-dependent RNA polymerase are structurally very similar to the corresponding region on the hepatitis C virus RNA-dependent RNA polymerase crystal structure. A direct comparison and a 3D postdynamics analysis of the 3D complex of the H7N9 RNA-dependent RNA polymerase provide invaluable clues and insight regarding the role and mode of action of a series of interacting residues on the latter enzyme. Our study provides a novel and efficiently integrated platform with structural insights for the H7N9 RNA-dependent RNA polymerase. We propose that future use and exploitation of these insights may prove invaluable in the fight against this lethal, ongoing epidemic.
Affiliation(s)
- Dimitrios Vlachakis
  - Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, 11527 Athens, Greece
- Argiro Karozou
  - Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, 11527 Athens, Greece
- Sophia Kossida
  - Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Soranou Efessiou 4, 11527 Athens, Greece
|
22
|
Vlachakis D, Kossida S. Molecular modeling and pharmacophore elucidation study of the Classical Swine Fever virus helicase as a promising pharmacological target. PeerJ 2013; 1:e85. [PMID: 23781407 PMCID: PMC3685396 DOI: 10.7717/peerj.85] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 04/04/2013] [Accepted: 05/21/2013] [Indexed: 12/17/2022]
Abstract
The Classical Swine Fever virus (CSFV) is a major pathogen of livestock and belongs to the Flaviviridae viral family. Even though there aren’t any verified zoonosis cases yet, the outcomes of CSFV epidemics have been devastating to local communities. In an effort to shed light on the molecular mechanisms underlying the structural and drug design potential of the viral helicase, the three-dimensional structure of the CSFV helicase has been modeled using conventional homology modeling techniques and the crystal structure of the Hepatitis C virus (HCV) as a template. The established structure of the CSFV helicase has been evaluated for its viability using a repertoire of in silico tools. The ultimate goal of this study is to introduce the 3D conformation of the CSFV helicase as a reliable structure that may be used as the designing platform for de novo, structure-based drug design experiments. In this direction, using the modeled structure of the CSFV helicase, a 3D pharmacophore was designed. The pharmacophore comprises a series of key characteristics that molecular inhibitors must satisfy in order to achieve maximum predicted affinity for the given enzyme. Overall, invaluable insights and conclusions are drawn from this structural study of the CSFV helicase, which may provide the scientific community with the founding plinth in the fight against CSFV infections through the perspective of the CSFV helicase as a potential pharmacological target. Notably, to date no antiviral agent is available against CSFV, nor is one expected soon. Consequently, there is an urgent need for new, state-of-the-art antiviral strategies to be developed.
Affiliation(s)
- Dimitrios Vlachakis
  - Bioinformatics & Medical Informatics Team, Biomedical Research Foundation, Academy of Athens, Athens, Greece
|
23
|
Bettella F, Rasinski D, Knapp EW. Protein Secondary Structure Prediction with SPARROW. J Chem Inf Model 2012; 52:545-56. [DOI: 10.1021/ci200321u] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
Affiliation(s)
- Francesco Bettella
  - Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
  - deCODE genetics, Sturlugata 8, 101 Reykjavik, Iceland
- Dawid Rasinski
  - Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
- Ernst Walter Knapp
  - Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
|
24
|
PSS-3D1D: an improved 3D1D profile method of protein fold recognition for the annotation of twilight zone sequences. J Struct Funct Genomics 2011; 12:181-9. [DOI: 10.1007/s10969-011-9119-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/15/2011] [Accepted: 11/24/2011] [Indexed: 10/14/2022]
|
25
|
Yang Y, Faraggi E, Zhao H, Zhou Y. Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 2011; 27:2076-82. [PMID: 21666270 DOI: 10.1093/bioinformatics/btr350] [Citation(s) in RCA: 241] [Impact Index Per Article: 18.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION: In recent years, the development of single-method fold-recognition servers has lagged behind consensus and multiple-template techniques. However, a good consensus prediction relies on the accuracy of individual methods. This article reports our efforts to further improve a single-method fold recognition technique called SPARKS by changing the alignment scoring function and incorporating the SPINE-X techniques that make improved predictions of secondary structure, backbone torsion angles and solvent accessible surface area. RESULTS: The new method, called SPARKS-X, was tested with the SALIGN benchmark for alignment accuracy, the Lindahl and SCOP benchmarks for fold recognition, and the CASP 9 blind test for structure prediction. The method is compared to several state-of-the-art techniques such as HHPRED and BoostThreader. Results show that SPARKS-X is one of the best single-method fold recognition techniques. We further note that incorporating multiple templates and refinement in model building will likely further improve SPARKS-X. AVAILABILITY: The method is available as a SPARKS-X server at http://sparks.informatics.iupui.edu/
Affiliation(s)
- Yuedong Yang
  - School of Informatics, Indiana University Purdue University, Indianapolis, IN 46202, USA
|
26
|
Söding J, Remmert M. Protein sequence comparison and fold recognition: progress and good-practice benchmarking. Curr Opin Struct Biol 2011; 21:404-11. [PMID: 21458982 DOI: 10.1016/j.sbi.2011.03.005] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Received: 01/31/2011] [Revised: 03/01/2011] [Accepted: 03/09/2011] [Indexed: 11/26/2022]
Abstract
Protein sequence comparison methods have grown increasingly sensitive during the last decade and can often identify distantly related proteins sharing a common ancestor some 3 billion years ago. Although cellular function is not conserved so long, molecular functions and structures of protein domains often are. In combination with a domain-centered approach to function and structure prediction, modern remote homology detection methods have a great and largely underexploited potential for elucidating protein functions and evolution. Advances during the last few years include nonlinear scoring functions combining various sequence features, the use of sequence context information, and powerful new software packages. Since progress depends on realistically assessing new and existing methods and published benchmarks are often hard to compare, we propose 10 rules of good-practice benchmarking.
Affiliation(s)
- Johannes Söding
  - Gene Center and Center for Integrated Protein Science, Ludwig-Maximilians-Universität München, Feodor-Lynen-Strasse 25, Munich, Germany
|
27
|
Nelson KJ, Knutson ST, Soito L, Klomsiri C, Poole LB, Fetrow JS. Analysis of the peroxiredoxin family: using active-site structure and sequence information for global classification and residue analysis. Proteins 2011; 79:947-64. [PMID: 21287625 PMCID: PMC3065352 DOI: 10.1002/prot.22936] [Citation(s) in RCA: 136] [Impact Index Per Article: 10.5] [Received: 05/28/2010] [Revised: 10/13/2010] [Accepted: 10/25/2010] [Indexed: 12/25/2022]
Abstract
Peroxiredoxins (Prxs) are a widespread and highly expressed family of cysteine-based peroxidases that react very rapidly with H₂O₂, organic peroxides, and peroxynitrite. Correct subfamily classification has been problematic because Prx subfamilies are frequently not correlated with phylogenetic distribution and diverge in their preferred reductant, oligomerization state, and tendency toward overoxidation. We have developed a method that uses the Deacon Active Site Profiler (DASP) tool to extract functional-site profiles from structurally characterized proteins to computationally define subfamilies and to identify new Prx subfamily members from GenBank(nr). For the 58 literature-defined Prx test proteins, 57 were correctly assigned, and none were assigned to the incorrect subfamily. The >3500 putative Prx sequences identified were then used to analyze residue conservation in the active site of each Prx subfamily. Our results indicate that the existence and location of the resolving cysteine vary in some subfamilies (e.g., Prx5) to a greater degree than previously appreciated and that interactions at the A interface (common to Prx5, Tpx, and higher order AhpC/Prx1 structures) are important for stabilization of the correct active-site geometry. Interestingly, this method also allows us to further divide the AhpC/Prx1 into four groups that are correlated with functional characteristics. The DASP method provides more accurate subfamily classification than PSI-BLAST for members of the Prx family and can now readily be applied to other large protein families.
Affiliation(s)
- Kimberly J. Nelson
  - Department of Biochemistry, Wake Forest University Health Sciences, Medical Center Blvd., Winston-Salem, NC 27157
- Stacy T. Knutson
  - Departments of Physics and Computer Science, Wake Forest University, Winston-Salem, NC 27109
- Laura Soito
  - Department of Biochemistry, Wake Forest University Health Sciences, Medical Center Blvd., Winston-Salem, NC 27157
- Chananat Klomsiri
  - Department of Biochemistry, Wake Forest University Health Sciences, Medical Center Blvd., Winston-Salem, NC 27157
- Leslie B. Poole
  - Department of Biochemistry, Wake Forest University Health Sciences, Medical Center Blvd., Winston-Salem, NC 27157
- Jacquelyn S. Fetrow
  - Departments of Physics and Computer Science, Wake Forest University, Winston-Salem, NC 27109
|
28
|
Structural bioinformatics: deriving biological insights from protein structures. Interdiscip Sci 2010; 2:347-66. [PMID: 21153779 DOI: 10.1007/s12539-010-0045-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 04/23/2010] [Revised: 06/18/2010] [Accepted: 06/21/2010] [Indexed: 12/27/2022]
Abstract
Structural bioinformatics can be described as an approach that will help decipher biological insights from protein structures. As an important component of structural biology, this area promises to provide a high resolution understanding of biology by assisting comprehension and interpretation of a large amount of structural data. Biological function of protein molecules can be inferred from their three-dimensional structures by comparing structures, classifying them and transferring function from a related protein or family. It is well known now that the structure space of protein molecules is more conserved than the sequence space, making it important to seek functional associations at the structural level. An added advantage of structural bioinformatics over simpler sequence-based methods is that the former also provides ultimate insights into the mechanisms by which various biological events take place. A bird's eye-view of the different aspects of structural bioinformatics is given here along with various recent advances in the area including how knowledge obtained from structural bioinformatics can be applied in drug discovery.
|
29
|
Schaefer C, Schlessinger A, Rost B. Protein secondary structure appears to be robust under in silico evolution while protein disorder appears not to be. Bioinformatics 2010; 26:625-31. [PMID: 20081223 PMCID: PMC2828120 DOI: 10.1093/bioinformatics/btq012] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
Abstract
Motivation: The mutation of amino acids often impacts protein function and structure. Mutations without negative effect sustain evolutionary pressure. We study a particular aspect of structural robustness with respect to mutations: regular protein secondary structure and natively unstructured (intrinsically disordered) regions. Is the formation of regular secondary structure an intrinsic feature of amino acid sequences, or is it a feature that is lost upon mutation and is maintained by evolution against the odds? Similarly, is disorder an intrinsic sequence feature or is it difficult to maintain? To tackle these questions, we in silico mutated native protein sequences into random sequence-like ensembles and monitored the change in predicted secondary structure and disorder. Results: We established that by our coarse-grained measures for change, predictions and observations were similar, suggesting that our results were not biased by prediction mistakes. Changes in secondary structure and disorder predictions were linearly proportional to the change in sequence. Surprisingly, neither the content nor the length distribution for the predicted secondary structure changed substantially. Regions with long disorder behaved differently in that significantly fewer such regions were predicted after a few mutation steps. Our findings suggest that the formation of regular secondary structure is an intrinsic feature of random amino acid sequences, while the formation of long-disordered regions is not an intrinsic feature of proteins with disordered regions. Put differently, helices and strands appear to be maintained easily by evolution, whereas maintaining disordered regions appears difficult. Neutral mutations with respect to disorder are therefore very unlikely. Contact: schaefer@rostlab.org Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christian Schaefer
  - Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics (C2B2), Columbia University, 1130 St Nicholas Ave., New York, NY 10032, USA
|
30
|
Abstract
While the prediction of a native protein structure from sequence continues to remain a challenging problem, over the past decades computational methods have become quite successful in exploiting the mechanisms behind secondary structure formation. The great effort expended in this area has resulted in the development of a vast number of secondary structure prediction methods. Especially the combination of well-optimized/sensitive machine-learning algorithms and inclusion of homologous sequence information has led to increased prediction accuracies of up to 80%. In this chapter, we will first introduce some basic notions and provide a brief history of secondary structure prediction advances. Then a comprehensive overview of state-of-the-art prediction methods will be given. Finally, we will discuss open questions and challenges in this field and provide some practical recommendations for the user.
Affiliation(s)
- Walter Pirovano
  - Centre for Integrative Bioinformatics VU, VU University, Amsterdam, The Netherlands
|
31
|
Recognition of β-hairpin motifs in proteins by using the composite vector. Amino Acids 2009; 38:915-21. [DOI: 10.1007/s00726-009-0299-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/10/2008] [Accepted: 04/20/2009] [Indexed: 10/20/2022]
|
32
|
Faraggi E, Xue B, Zhou Y. Improving the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins by guided-learning through a two-layer neural network. Proteins 2009; 74:847-56. [PMID: 18704931 DOI: 10.1002/prot.22193] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Indexed: 11/10/2022]
Abstract
This article attempts to increase the prediction accuracy of residue solvent accessibility and real-value backbone torsion angles of proteins through improved learning. Most methods developed for improving the backpropagation algorithm of artificial neural networks are limited to small neural networks. Here, we introduce a guided-learning method suitable for networks of any size. The method employs a part of the weights for guiding and the other part for training and optimization. We demonstrate this technique by predicting residue solvent accessibility and real-value backbone torsion angles of proteins. In this application, the guiding factor is designed to satisfy the intuitive condition that for most residues, the contribution of a residue to the structural properties of another residue is smaller for greater separation in the protein-sequence distance between the two residues. We show that the guided-learning method makes a 2-4% reduction in 10-fold cross-validated mean absolute errors (MAE) for predicting residue solvent accessibility and backbone torsion angles, regardless of the size of the database, the number of hidden layers and the size of the input windows. This, together with the introduction of a two-layer neural network with a bipolar activation function, leads to a new method that has an MAE of 0.11 for residue solvent accessibility, 36 degrees for psi, and 22 degrees for phi. The method is available as a Real-SPINE 3.0 server at http://sparks.informatics.iupui.edu.
Affiliation(s)
- Eshel Faraggi
  - Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, IN 46202, USA
|
33
|
Zhu J, Fan H, Periole X, Honig B, Mark AE. Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins 2009; 72:1171-88. [PMID: 18338384 DOI: 10.1002/prot.22005] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 12/25/2022]
Abstract
A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling and the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent CASPR exercise. It was found that REMD in combination with currently available force fields could sample near-native conformational states starting from high-quality homology models. Conformations in which the backbone RMSD of secondary structure elements (SSE-RMSD) was lower than the starting value by 0.5-1.0 Å were found for 15 out of the 21 cases (average 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with SSE-RMSD of at least 0.2 Å lower than the starting value was found among the five best ranked structures in 11 out of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with lowest energy. This suggests that, while the proposed protocol proved effective for the refinement of high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other aspects by which the methodology could be further improved are discussed.
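As an illustration of the exchange step that temperature-based REMD generally relies on, the sketch below evaluates the standard Metropolis criterion for swapping two replicas. It is not taken from the paper: the function names, the choice of kcal/mol units, and the example energies are assumptions made for this example only.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
K_B = 0.0019872041


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging replicas i and j.

    Uses the standard temperature-REMD criterion
    min(1, exp[(beta_i - beta_j) * (E_i - E_j)]).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, temp_i, energy_j, temp_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_probability(energy_i, temp_i, energy_j, temp_j)


if __name__ == "__main__":
    # Hypothetical potential energies (kcal/mol) at 300 K and 320 K.
    print(swap_probability(-120.0, 300.0, -100.0, 320.0))
```

A swap that moves the lower-energy conformation to the colder replica is always accepted; the reverse move is accepted only with a Boltzmann-weighted probability.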
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute and Columbia University, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, USA
|
34
|
Nicosia G, Stracquadanio G. Generalized pattern search algorithm for Peptide structure prediction. Biophys J 2008; 95:4988-99. [PMID: 18487293 PMCID: PMC2576383 DOI: 10.1529/biophysj.107.124016] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 10/18/2007] [Accepted: 03/20/2008] [Indexed: 11/18/2022] Open
Abstract
Finding the near-native structure of a protein is one of the most important open problems in structural biology and biological physics. The problem becomes dramatically more difficult when a given protein has no regular secondary structure or it does not show a fold similar to structures already known. This situation occurs frequently when we need to predict the tertiary structure of small molecules, called peptides. In this research work, we propose a new ab initio algorithm, the generalized pattern search algorithm, based on the well-known class of Search-and-Poll algorithms. We performed an extensive set of simulations over a well-known set of 44 peptides to investigate the robustness and reliability of the proposed algorithm, and we compared the peptide conformation with a state-of-the-art algorithm for peptide structure prediction known as PEPstr. In particular, we tested the algorithm on the instances proposed by the originators of PEPstr, to validate the proposed algorithm; the experimental results confirm that the generalized pattern search algorithm outperforms PEPstr by 21.17% in terms of average root mean-square deviation, RMSD C(alpha).
Affiliation(s)
- Giuseppe Nicosia
- Department of Mathematics and Computer Science, University of Catania, Catania, Italy
|
35
|
Bernsel A, Viklund H, Elofsson A. Remote homology detection of integral membrane proteins using conserved sequence features. Proteins 2008; 71:1387-99. [PMID: 18076048 DOI: 10.1002/prot.21825] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
Abstract
Compared with globular proteins, transmembrane proteins are surrounded by a more intricate environment and, consequently, amino acid composition varies between the different compartments. Existing algorithms for homology detection are generally developed with globular proteins in mind and may not be optimal to detect distant homology between transmembrane proteins. Here, we introduce a new profile-profile based alignment method for remote homology detection of transmembrane proteins in a hidden Markov model framework that takes advantage of the sequence constraints placed by the hydrophobic interior of the membrane. We expect that, for distant membrane protein homologs, even if the sequences have diverged too far to be recognized, the hydrophobicity pattern and the transmembrane topology are better conserved. By using this information in parallel with sequence information, we show that both sensitivity and specificity can be substantially improved for remote homology detection in two independent test sets. In addition, we show that alignment quality can be improved for the most distant homologs in a public dataset of membrane protein structures. Applying the method to the Pfam domain database, we are able to suggest new putative evolutionary relationships for a few relatively uncharacterized protein domain families, of which several are confirmed by other methods. The method is called Searcher for Homology Relationships of Integral Membrane Proteins (SHRIMP) and is available for download at http://www.sbc.su.se/shrimp/.
Affiliation(s)
- Andreas Bernsel
- Center for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
|
36
|
Duan M, Huang M, Ma C, Li L, Zhou Y. Position-specific residue preference features around the ends of helices and strands and a novel strategy for the prediction of secondary structures. Protein Sci 2008; 17:1505-12. [PMID: 18519808 DOI: 10.1110/ps.035691.108] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 10/22/2022]
Abstract
It has been many years since position-specific residue preference around the ends of a helix was revealed. However, none of the existing secondary structure prediction methods exploit this preference feature, resulting in low accuracy in predicting the ends of secondary structures. In this study, we collected a relatively large data set consisting of 1860 high-resolution, non-homologous proteins from the PDB, and further analyzed the residue distributions around the ends of regular secondary structures. It was found that there exist position-specific residue preferences (PSRP) around the ends of not only helices but also strands. Based on these unique features, we proposed a novel strategy and developed a tool named E-SSpred that treats the secondary structure as a whole and builds models to predict entire secondary structure segments directly by integrating relevant features. In E-SSpred, the support vector machine (SVM) method is adopted to model and predict the ends of helices and strands according to the unique residue distributions around them. A simple linear discriminant analysis method is applied to model and predict entire secondary structure segments by integrating end-prediction results, tri-peptide composition, and length distribution features of secondary structures, as well as the prediction results of the most famous program PSIPRED. The results of fivefold cross-validation on a widely used data set demonstrate that the accuracy of E-SSpred in predicting ends of secondary structures is about 10% higher than that of PSIPRED, and the overall prediction accuracy (Q(3) value) of E-SSpred (82.2%) is also better than that of PSIPRED (80.3%). The E-SSpred web server is available at http://bioinfo.hust.edu.cn/bio/tools/E-SSpred/index.html.
Affiliation(s)
- Mojie Duan
- Hubei Bioinformatics and Molecular Imaging Key Laboratory, Huazhong University of Science and Technology,Wuhan 430074, People's Republic of China
|
37
|
Yang YD, Park C, Kihara D. Threading without optimizing weighting factors for scoring function. Proteins 2008; 73:581-96. [DOI: 10.1002/prot.22082] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
|
38
|
Hu X, Li Q. Using support vector machine to predict β- and γ-turns in proteins. J Comput Chem 2008; 29:1867-75. [DOI: 10.1002/jcc.20929] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.9] [Indexed: 11/10/2022]
|
39
|
|
40
|
Xu J, Jiao F, Yu L. Protein structure prediction using threading. Methods in Molecular Biology (Clifton, N.J.) 2008; 413:91-121. [PMID: 18075163 DOI: 10.1007/978-1-59745-574-9_4] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
This chapter discusses the protocol for computational protein structure prediction by protein threading. First, we present a general procedure and summarize some typical ideas for each step of protein threading. Then, we describe the design and implementation of RAPTOR, a protein structure prediction program based on threading. The major focuses are three key components of RAPTOR: a linear programming approach to protein threading, two machine learning approaches (SVM and Gradient Boosting) to fold recognition, and evaluation of the statistical significance of the prediction results. The first part of this chapter is a brief review of protein threading, and the second part contains original research results. Some key ideas and results have been previously published.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
|
41
|
|
42
|
Liu S, Zhang C, Liang S, Zhou Y. Fold recognition by concurrent use of solvent accessibility and residue depth. Proteins 2007; 68:636-45. [PMID: 17510969 DOI: 10.1002/prot.21459] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.6] [Indexed: 11/11/2022]
Abstract
Recognizing the structural similarity without significant sequence identity (called fold recognition) is the key for bridging the gap between the number of known protein sequences and the number of structures solved. Previously, we developed a fold-recognition method called SP(3) which combines sequence-derived sequence profiles, secondary-structure profiles and residue-depth dependent, structure-derived sequence profiles. The use of residue-depth-dependent profiles makes SP(3) one of the best automatic predictors in CASP 6. Because residue depth (RD) and solvent accessible surface area (solvent accessibility) are complementary in describing the exposure of a residue to solvent, we test whether or not incorporation of solvent-accessibility profiles into SP(3) could further increase the accuracy of fold recognition. The resulting method, called SP(4), was tested in SALIGN benchmark for alignment accuracy and Lindahl, LiveBench 8 and CASP7 blind prediction for fold recognition sensitivity and model-structure accuracy. For remote homologs, SP(4) is found to consistently improve over SP(3) in the accuracy of sequence alignment and predicted structural models as well as in the sensitivity of fold recognition. Our result suggests that RD and solvent accessibility can be used concurrently for improving the accuracy and sensitivity of fold recognition. The SP(4) server and its local usage package are available on http://sparks.informatics.iupui.edu/SP4.
Affiliation(s)
- Song Liu
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology and Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
43
|
Kurgan L, Kedarisetti KD. Sequence representation and prediction of protein secondary structure for structural motifs in twilight zone proteins. Protein J 2007; 25:463-74. [PMID: 17115254 DOI: 10.1007/s10930-006-9029-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
Characterizing and classifying regularities in protein structure is an important element in uncovering the mechanisms that regulate protein structure, function and evolution. Recent research concentrates on analysis of structural motifs that can be used to describe larger, fold-sized structures based on homologous primary sequences. At the same time, the accuracy of protein secondary structure prediction based on multiple sequence alignment drops significantly when low-homology (twilight zone) sequences are considered. To this end, this paper addresses the problem of providing an alternative sequence representation that would improve the ability to distinguish secondary structures for twilight zone sequences without using alignment. We consider a novel classification problem in which structural motifs, referred to as structural fragments (SFs), are defined as uniform strand, helix and coil fragments. Classification of SFs allows the design of novel sequence representations and the investigation of which other factors and prediction algorithms may result in improved discrimination. Comprehensive experimental results show that a statistically significant improvement in classification accuracy can be achieved by: (1) improving sequence representations, and (2) removing possible noise on the terminal residues in the SFs. Combining these two approaches reduces the error rate on average by 15% when compared to classification using the standard representation and noisy information on the terminal residues, bringing the classification accuracy to over 70%. Finally, we show that certain prediction algorithms, such as neural networks and boosted decision trees, are superior to other algorithms.
Affiliation(s)
- Lukasz Kurgan
- Electrical and Computer Engineering Department, University of Alberta, Edmonton, Alberta, Canada, T6G 2V4.
|
44
|
Dor O, Zhou Y. Real-SPINE: An integrated system of neural networks for real-value prediction of protein structural properties. Proteins 2007; 68:76-81. [PMID: 17397056 DOI: 10.1002/prot.21408] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Indexed: 11/09/2022]
Abstract
Proteins can move freely in three-dimensional space. As a result, their structural properties, such as solvent accessible surface area, backbone dihedral angles, and atomic distances, are continuous variables. However, these properties are often arbitrarily divided into a few classes to facilitate prediction by statistical learning techniques. In this work, we establish an integrated system of neural networks (called Real-SPINE) for real-value prediction and apply the method to predict residue-solvent accessibility and backbone psi dihedral angles of proteins based on information derived from sequences only. Real-SPINE is trained with a large data set of 2640 protein chains, sequence profiles generated from multiple sequence alignment, representative amino-acid properties, a slow learning rate, overfitting protection, and predicted secondary structures. The method optimizes more than 200,000 weights and yields a 10-fold cross-validated Pearson's correlation coefficient (PCC) of 0.74 between predicted and actual solvent accessible surface areas and 0.62 between predicted and actual psi angles. In particular, 90% of 2640 proteins have a PCC value greater than 0.6 between predicted and actual solvent-accessible surface areas. The results of Real-SPINE can be compared with the best reported correlation coefficients of 0.64-0.67 for solvent-accessible surface areas and 0.47 for psi angles. The real-SPINE server, executable programs, and datasets are freely available on http://sparks.informatics.iupui.edu.
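For readers unfamiliar with the evaluation measure quoted above, the following minimal sketch computes Pearson's correlation coefficient between predicted and actual real-valued properties, together with the mean absolute error used elsewhere in this list. The function names and sample values are hypothetical, and the sketch treats all values as plain real numbers; it does not handle the periodicity of dihedral angles, which a careful evaluation of psi predictions may need to address.

```python
import math


def pearson_correlation(predicted, actual):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0.0 or var_a == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_a)


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and reference values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


if __name__ == "__main__":
    # Hypothetical relative solvent accessibilities for five residues.
    predicted = [0.10, 0.45, 0.30, 0.80, 0.05]
    actual = [0.15, 0.40, 0.20, 0.90, 0.10]
    print(pearson_correlation(predicted, actual))
    print(mean_absolute_error(predicted, actual))
```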
Affiliation(s)
- Ofer Dor
- Department of Physiology and Biophysics, Howard Hughes Medical Institute Center for Single Molecule Biophysics, State University of New York at Buffalo, Buffalo, New York 14214, USA
|
45
|
Apatoff A, Kim E, Kliger Y. Towards alignment independent quantitative assessment of homology detection. PLoS One 2006; 1:e113. [PMID: 17205117 PMCID: PMC1762415 DOI: 10.1371/journal.pone.0000113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/27/2006] [Accepted: 11/28/2006] [Indexed: 11/29/2022] Open
Abstract
Identification of homologous proteins provides a basis for protein annotation. Sequence alignment tools reliably identify homologs sharing high sequence similarity. However, identification of homologs that share low sequence similarity remains a challenge. Lowering the cutoff value could enable the identification of diverged homologs, but also introduces numerous false hits. Methods are being continuously developed to minimize this problem. Estimation of the fraction of homologs in a set of protein alignments can help in the assessment and development of such methods, and provides the users with intuitive quantitative assessment of protein alignment results. Herein, we present a computational approach that estimates the amount of homologs in a set of protein pairs. The method requires a prevalent and detectable protein feature that is conserved between homologs. By analyzing the feature prevalence in a set of pairwise protein alignments, the method can estimate the number of homolog pairs in the set independently of the alignments' quality. Using the HomoloGene database as a standard of truth, we implemented this approach in a proteome-wide analysis. The results revealed that this approach, which is independent of the alignments themselves, works well for estimating the number of homologous proteins in a wide range of homology values. In summary, the presented method can accompany homology searches and method development, provides validation to search results, and allows tuning of tools and methods.
Affiliation(s)
- Avihay Apatoff
- Compugen Ltd, Tel Aviv, Israel
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Eddo Kim
- Compugen Ltd, Tel Aviv, Israel
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat Gan, Israel
- Yossef Kliger
- Compugen Ltd, Tel Aviv, Israel
- * To whom correspondence should be addressed. E-mail:
|
46
|
Abstract
BACKGROUND Protein secondary structure prediction is a fundamental and important component in the analytical study of protein structure and function. The prediction technique has been under development for several decades. The Chou-Fasman algorithm, one of the earliest methods, has been successfully applied to the prediction. However, this method has limitations due to low accuracy, unreliable parameters, and over-prediction. Thanks to recent developments in protein folding type-specific structure propensities and wavelet transformation, the shortcomings of the Chou-Fasman method can be overcome. RESULTS We improved the Chou-Fasman method in three aspects: (a) replacing the nucleation regions with extreme values of coefficients calculated by the continuous wavelet transform; (b) substituting the original secondary structure conformational parameters with folding type-specific secondary structure propensities; and (c) modifying the Chou-Fasman rules. The CB396 data set was tested using the improved Chou-Fasman method, and three indices (Q3, Qpre, and SOV) were used to assess it. We compared the indices with those obtained from the original Chou-Fasman method and four other popular methods. The results showed that our improved Chou-Fasman method performs better than the original one on all indices, with an improvement of about 10-18%. It is also comparable to other currently popular methods across all the indices. CONCLUSION Our method greatly improves on the Chou-Fasman method. It is able to predict protein secondary structure as well as current popular methods. By locating nucleation regions with refined wavelet transform technology and by calculating propensity factors from a larger data set, it is likely that even better results can be obtained.
Affiliation(s)
- Hang Chen
- Department of Biomedical Engineering, College of Biomedical Engineering and Instrument Science, Zhejiang University, Hangzhou, 310027, China
- Fei Gu
- Department of Biotechnology, College of Life Sciences, Zhejiang University, Hangzhou, 310027, China
- Zhengge Huang
- Department of Computer Science, Center for engineering and scientific computation, Zhejiang University, Hangzhou, 310027, China
|
47
|
Balaji S, Kalpana R, Shapshak P. Paradigm development: comparative and predictive 3D modeling of HIV-1 Virion Infectivity Factor (Vif). Bioinformation 2006; 1:290-309. [PMID: 17597910 PMCID: PMC1891711 DOI: 10.6026/97320630001290] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 10/04/2006] [Revised: 12/04/2006] [Accepted: 12/05/2006] [Indexed: 02/03/2023] Open
Abstract
Obtaining structural information about Vif is of interest for several reasons that include the study of the interaction of Vif with APOBEC3G, a resistance factor. Vif is a potential drug target and its function is essential for the HIV-1 infectivity process. To study Vif mechanism of action, we need to decipher its structure. Pivotal in this approach is the painstaking prediction of its protein structure. The three-dimensional (3D) crystal structure for Vif has not been established. In order to understand its mechanism of action, information on the structure of Vif is very much needed. Therefore we undertook this study based on the hypothesis that information from structurally homologous proteins can be used to predict the 3D structure of Vif by computer modeling and threading. As a result the structure of HIV-1 Vif has been modeled and deposited in the theoretical models section and accepted with the PDB code 1VZF. Here, we present the results of the comparative modeling strategy we used to predict the 3D structure of Vif.
Affiliation(s)
- Seetharaaman Balaji
- Bioinformatics, School of Chemical and Biotechnology, SASTRA Deemed University, Thanjavur, Tamilnadu, India.
|
48
|
Montgomerie S, Sundararaj S, Gallin WJ, Wishart DS. Improving the accuracy of protein secondary structure prediction using structural alignment. BMC Bioinformatics 2006; 7:301. [PMID: 16774686 PMCID: PMC1550433 DOI: 10.1186/1471-2105-7-301] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.7] [Received: 10/18/2005] [Accepted: 06/14/2006] [Indexed: 12/19/2022] Open
Abstract
Background The accuracy of protein secondary structure prediction has steadily improved over the past 30 years. Now many secondary structure prediction methods routinely achieve an accuracy (Q3) of about 75%. We believe this accuracy could be further improved by including structure (as opposed to sequence) database comparisons as part of the prediction process. Indeed, given the large size of the Protein Data Bank (>35,000 sequences), the probability of a newly identified sequence having a structural homologue is actually quite high. Results We have developed a method that performs structure-based sequence alignments as part of the secondary structure prediction process. By mapping the structure of a known homologue (sequence ID >25%) onto the query protein's sequence, it is possible to predict at least a portion of that query protein's secondary structure. By integrating this structural alignment approach with conventional (sequence-based) secondary structure methods and then combining it with a "jury-of-experts" system to generate a consensus result, it is possible to attain very high prediction accuracy. Using a sequence-unique test set of 1644 proteins from EVA, this new method achieves an average Q3 score of 81.3%. Extensive testing indicates this is approximately 4–5% better than any other method currently available. Assessments using non sequence-unique test sets (typical of those used in proteome annotation or structural genomics) indicate that this new method can achieve a Q3 score approaching 88%. Conclusion By using both sequence and structure databases and by exploiting the latest techniques in machine learning it is possible to routinely predict protein secondary structure with an accuracy well above 80%. A program and web server, called PROTEUS, that performs these secondary structure predictions is accessible at . For high throughput or batch sequence analyses, the PROTEUS programs, databases (and server) can be downloaded and run locally.
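The Q3 score cited throughout these abstracts is the percentage of residues whose three-state secondary structure (helix, strand, coil) is predicted correctly. A minimal sketch of that calculation follows; the `H`/`E`/`C` alphabet, the function name, and the example strings are assumptions chosen for illustration and are not taken from PROTEUS.

```python
# Three-state alphabet assumed here: H = helix, E = strand, C = coil.
VALID_STATES = frozenset("HEC")


def q3_score(predicted, observed):
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    if not (set(predicted) <= VALID_STATES and set(observed) <= VALID_STATES):
        raise ValueError("labels must be drawn from H, E and C")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Hypothetical ten-residue example with two mismatched positions.
    print(q3_score("CHHHHEEECC", "CHHHHHEEEC"))
```

Because Q3 is a per-residue measure, it does not penalize fragmented or shifted segments the way segment-based scores such as SOV do.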
Affiliation(s)
- Scott Montgomerie
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Shan Sundararaj
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Warren J Gallin
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- David S Wishart
- Department of Computing Science, University of Alberta, Edmonton, AB, T6G 2E8, Canada
- Department of Biological Sciences, University of Alberta, Edmonton, AB, T6G 2E9, Canada
|
49
|
Zhou H, Zhou Y. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 2006; 58:321-8. [PMID: 15523666 PMCID: PMC1408319 DOI: 10.1002/prot.20308] [Citation(s) in RCA: 195] [Impact Index Per Article: 10.8] [Indexed: 11/09/2022]
Abstract
Recognizing structural similarity without significant sequence identity has proved to be a challenging task. Sequence-based and structure-based methods as well as their combinations have been developed. Here, we propose a fold-recognition method that incorporates structural information without the need of sequence-to-structure threading. This is accomplished by generating sequence profiles from protein structural fragments. The structure-derived sequence profiles allow a simple integration with evolution-derived sequence profiles and secondary-structural information for an optimized alignment by efficient dynamic programming. The resulting method (called SP(3)) is found to make a statistically significant improvement in both sensitivity of fold recognition and accuracy of alignment over the method based on evolution-derived sequence profiles alone (SP) and the method based on evolution-derived sequence profile and secondary structure profile (SP(2)). SP(3) was tested in SALIGN benchmark for alignment accuracy and Lindahl, PROSPECTOR 3.0, and LiveBench 8.0 benchmarks for remote-homology detection and model accuracy. SP(3) is found to be the most sensitive and accurate single-method server in all benchmarks tested where other methods are available for comparison (although its results are statistically indistinguishable from the next best in some cases and the comparison is subjected to the limitation of time-dependent sequence and/or structural library used by different methods.). In LiveBench 8.0, its accuracy rivals some of the consensus methods such as ShotGun-INBGU, Pmodeller3, Pcons4, and ROBETTA. SP(3) fold-recognition server is available on http://theory.med.buffalo.edu.
Affiliation(s)
- Yaoqi Zhou
- *Correspondence to: Dr. Yaoqi Zhou, Howard Hughes Medical Institute, Center for Single Molecule Biophysics and Department of Physiology & Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214. E-mail:
|
50
|
Abstract
Homology modeling plays a central role in determining protein structure in the structural genomics project. The importance of homology modeling has been steadily increasing because of the large gap that exists between the overwhelming number of available protein sequences and experimentally solved protein structures, and also, more importantly, because of the increasing reliability and accuracy of the method. In fact, a protein sequence with over 30% identity to a known structure can often be predicted with an accuracy equivalent to a low-resolution X-ray structure. The recent advances in homology modeling, especially in detecting distant homologues, aligning sequences with template structures, modeling of loops and side chains, as well as detecting errors in a model, have contributed to reliable prediction of protein structure, which was not possible even several years ago. The ongoing efforts in solving protein structures, which can be time-consuming and often difficult, will continue to spur the development of a host of new computational methods that can fill in the gap and further contribute to understanding the relationship between protein structure and function.
Affiliation(s)
- Zhexin Xiang
- Center for Molecular Modeling, Center for Information Technology, National Institutes of Health, Building 12A Room 2051, 12 South Drive, Bethesda, Maryland 20892-5624, USA.
|