1
Xie M, Li Y, Xu L, Zhang S, Ye H, Sun F, Mei R, Su X. Optimization of bacterial cytokine protein production by response surface methodology for environmental bioremediation. RSC Adv 2021; 11:36105-36115. [PMID: 35492803 PMCID: PMC9043431 DOI: 10.1039/d1ra03565g] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/07/2021] [Accepted: 10/12/2021] [Indexed: 11/21/2022]
Abstract
In natural and engineered systems, most microorganisms enter a state of dormancy termed the “viable but non-culturable” (VBNC) state when they are exposed to unpredictable environmental stress. One of the major advances in resuscitating cells from such a state is the discovery of a bacterial cytokine protein called resuscitation-promoting factor (Rpf), which is secreted by Micrococcus luteus. In this study, the optimization of Rpf production was investigated by response surface methodology (RSM). Results showed that an empirical quadratic model predicted the Rpf yield well, and the highest Rpf protein yield could be obtained at the optimal conditions of 59.56 mg L−1 IPTG, cell density 0.69, induction temperature 20.82 °C and culture time 7.72 h. Importantly, the Phyre2 web portal characterized the structure of the Rpf domain as sharing homology with lysozymes, and the highest lysozyme activity was at pH 5 and 50 °C. This study broadens the knowledge of Rpf production and provides potential strategies to apply Rpf as a bioactivator for environmental bioremediation. A group of secreted proteins from M. luteus, recognized as resuscitation-promoting factors (Rpf), can resuscitate bacteria in the viable but non-culturable (VBNC) state that have potential for environmental bioremediation.
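The abstract reports an empirical quadratic model and its optimum, but not the fitted coefficients. As a minimal sketch of how an optimum is read off a second-order response surface, the Python below solves for the stationary point of a two-factor quadratic model. The function name and the coefficients in the usage line are hypothetical, chosen only so the result can be checked by hand; they are not the study's values, and the study's model has four factors rather than two.

```python
def stationary_point(b1, b2, b11, b22, b12):
    """Stationary point of
        y = b0 + b1*x1 + b2*x2 + b11*x1**2 + b22*x2**2 + b12*x1*x2.

    Setting both partial derivatives to zero gives the linear system
        2*b11*x1 +   b12*x2 = -b1
          b12*x1 + 2*b22*x2 = -b2
    which is solved here with Cramer's rule. The intercept b0 does not
    affect the location of the stationary point, so it is not an argument.
    """
    det = 4.0 * b11 * b22 - b12 ** 2
    if det == 0:
        raise ValueError("singular quadratic form: no unique stationary point")
    x1 = (-2.0 * b1 * b22 + b12 * b2) / det
    x2 = (-2.0 * b11 * b2 + b12 * b1) / det
    # The point is a maximum only if the quadratic form is negative definite.
    is_maximum = b11 < 0 and det > 0
    return x1, x2, is_maximum


# Hypothetical coefficients (not from the paper), constructed so that the
# surface peaks at x1 = 1, x2 = 2.
print(stationary_point(b1=4.0, b2=5.0, b11=-1.0, b22=-1.0, b12=-1.0))
```

With more factors the same idea applies: the gradient of the fitted quadratic is set to zero and the resulting linear system is solved, then the sign of the curvature is checked to confirm the point is a maximum.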
Affiliation(s)
- Mengqi Xie
  - College of Geography and Environmental Science, Zhejiang Normal University, Yingbin Road 688#, Jinhua 321004, China
- Yilin Li
  - College of Geography and Environmental Science, Zhejiang Normal University, Yingbin Road 688#, Jinhua 321004, China
- Luning Xu
  - College of Geography and Environmental Science, Zhejiang Normal University, Yingbin Road 688#, Jinhua 321004, China
- Shusheng Zhang
  - The Management Center of Wuyanling National Natural Reserve in Zhejiang, Wenzhou 325500, China
- Hongyu Ye
  - Eco-Environmental Science Design & Research Institute of Zhejiang Province, Hangzhou 310007, China
- Faqian Sun
  - College of Geography and Environmental Science, Zhejiang Normal University, Yingbin Road 688#, Jinhua 321004, China
- Rongwu Mei
  - Eco-Environmental Science Design & Research Institute of Zhejiang Province, Hangzhou 310007, China
- Xiaomei Su
  - College of Geography and Environmental Science, Zhejiang Normal University, Yingbin Road 688#, Jinhua 321004, China
2
Zheng W, Zhang C, Wuyun Q, Pearce R, Li Y, Zhang Y. LOMETS2: improved meta-threading server for fold-recognition and structure-based function annotation for distant-homology proteins. Nucleic Acids Res 2019; 47:W429-W436. [PMID: 31081035 PMCID: PMC6602514 DOI: 10.1093/nar/gkz384] [Citation(s) in RCA: 94] [Impact Index Per Article: 23.5] [Received: 02/27/2019] [Revised: 04/19/2019] [Accepted: 04/30/2019] [Indexed: 12/13/2022]
Abstract
The LOMETS2 server (https://zhanglab.ccmb.med.umich.edu/LOMETS/) is an online meta-threading server system for template-based protein structure prediction. Although the server has been widely used by the community over the last decade, the previous LOMETS server no longer represents the state of the art due to aging of the algorithms and unsatisfactory performance on distant-homology template identification. An extension of the server built on cutting-edge methods, especially techniques developed since the recent CASP experiments, is urgently needed. In this work, we report the recent advancements of the LOMETS2 server, which comprise a number of major new developments, including (i) new state-of-the-art threading programs, including contact-map-based threading approaches, (ii) deep sequence search-based sequence profile construction and (iii) a new web interface design that incorporates structure-based function annotations. Large-scale benchmark tests demonstrated that the integration of the deep profiles and new threading approaches into LOMETS2 significantly improves its structure modeling quality and template detection: LOMETS2 detected 176% more templates with TM-scores >0.5 than the previous LOMETS server for Hard targets that lacked homologous templates. Meanwhile, the newly incorporated structure-based function prediction helps extend the usefulness of the online server to the broader biological community.
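The benchmark in this abstract counts templates with TM-score > 0.5. As a rough illustration of what that threshold measures, the sketch below evaluates the standard TM-score sum for one fixed set of aligned-residue distances. It assumes the superposition is already given (the actual TM-score program searches for the superposition that maximizes the sum), and the function name, the length cutoff and the inputs are choices made for this sketch, not part of LOMETS2.

```python
def tm_score(distances, target_length):
    """TM-score sum for one fixed superposition.

    distances: distances in angstroms between aligned residue pairs.
    target_length: number of residues in the target protein; the score is
        normalized by this length, so unaligned residues contribute zero.
    """
    if target_length < 22:
        # Simplification of this sketch: avoid short targets, where the
        # d0 formula below becomes very small or ill-defined.
        raise ValueError("this sketch only handles targets of 22+ residues")
    # Length-dependent distance scale that makes scores comparable
    # across proteins of different sizes.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# A perfect match of a 100-residue target: every aligned pair coincides.
print(tm_score([0.0] * 100, 100))
```

Each aligned pair contributes a value between 0 and 1 that decays with distance, so a score above 0.5 requires most of the target to be aligned within roughly the distance scale `d0`.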
Affiliation(s)
- Wei Zheng
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Qiqige Wuyun
  - Computer Science and Engineering Department, Michigan State University, East Lansing, MI 48824, USA
- Robin Pearce
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Li
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, China
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
3
Bhat AS, Grishin NV. Predicting Sequence Features, Function, and Structure of Proteins Using MESSA. Current Protocols in Bioinformatics 2019; 67:e84. [PMID: 31524991 PMCID: PMC6750024 DOI: 10.1002/cpbi.84] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
MEta-Server for protein Sequence Analysis (MESSA) is a tool that facilitates widespread protein sequence analysis by gathering structural (local sequence properties and three-dimensional structure) and functional (annotations from SWISS-PROT, Gene Ontology terms, and enzyme classification) predictions for a query protein of interest. MESSA uses multiple well-established tools to offer consensus-based predictions on important aspects of protein sequence analysis. Being freely available for noncommercial users and with a user-friendly interface, MESSA serves as an umbrella platform that overcomes the absence of a comprehensive tool for predictive protein analysis. This article reveals how to access MESSA via the Web and shows how to input a protein sequence to analyze using the MESSA web server. It also includes a detailed explanation of the output from MESSA to aid in better interpretation of results. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Archana S. Bhat
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
4
In Silico Screening of Aptamers Configuration against Hepatitis B Surface Antigen. Adv Bioinformatics 2019; 2019:6912914. [PMID: 31346332 PMCID: PMC6617924 DOI: 10.1155/2019/6912914] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 12/23/2018] [Revised: 04/20/2019] [Accepted: 04/30/2019] [Indexed: 01/05/2023]
Abstract
Aptamers have long been studied as substitutes for antibodies for many purposes. However, because aptamers obtained in vitro are excessively long, they are difficult to manipulate during molecular conjugation on matrix surfaces. The current study focuses on computationally improving the screening of aptamers against hepatitis B surface antigen (HBsAg) by optimizing the length of the sequences obtained from SELEX. Three original aptamers with affinity for HBsAg were truncated into five short hairpin-structured aptamers, and their affinity for HBsAg was studied by molecular docking, molecular dynamics (MD) simulation, and the Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA) method. The results show that truncated aptamers binding to the HBsAg “a” determinant region are stabilized by dynamic H-bond formation between the active binding residues and nucleotides. The amino acid residues with the most hydrogen bond interactions with all five aptamers were determined to be the active binding residues and were further characterized. The computational predictions of complex binding will be validated through experimental assays in future studies. The current study will improve existing in vitro aptamers by minimizing aptamer length for easier manipulation.
5
The catalytic inactivation of the N-half of human hexokinase 2 and structural and biochemical characterization of its mitochondrial conformation. Biosci Rep 2018; 38:BSR20171666. [PMID: 29298880 PMCID: PMC5803496 DOI: 10.1042/bsr20171666] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 10/16/2017] [Revised: 12/21/2017] [Accepted: 01/01/2018] [Indexed: 01/06/2023]
Abstract
The high proliferation rate of tumor cells demands high energy and metabolites that are sustained by a high glycolytic flux known as the 'Warburg effect'. The activation and further metabolism of glucose is initiated by hexokinase, a focal point of metabolic regulation. The human hexokinase 2 (HK2) is overexpressed in all aggressive tumors and predominantly found on the outer mitochondrial membrane, where interactions through its N-terminus initiate and maintain tumorigenesis. Here, we report the structure of HK2 in complex with glucose and glucose-6-phosphate (G6P). Structural and biochemical characterization of the mitochondrial conformation reveals higher conformational stability and a slower protein unfolding rate (ku) compared with the cytosolic conformation. Despite the active site similarity of all human hexokinases, the N-domain is catalytically active in HK2 but not in hexokinase 1 and 3. Helix-α13, which protrudes out of the N-domain to link it to the C-domain of HK2, is found to be important in maintaining the catalytic activity of the N-half. In addition, the N-domain of HK2, in contrast with the C-domain, regulates the stability of the whole enzyme. Glucose binding enhanced the stability of the wild-type (WT) enzyme and the single mutant D657A of the C-domain, but it did not increase the stability of the D209A mutant of the N-domain. The interaction of HK2 with the mitochondria through its N-half is proposed to facilitate higher stability on the mitochondria. The identification of structural and biochemical differences between HK2 and other human hexokinase isozymes could potentially be used in the development of new anticancer therapies.
6
DeBenedictis EP, Ma D, Keten S. Structural predictions for curli amyloid fibril subunits CsgA and CsgB. RSC Adv 2017. [DOI: 10.1039/c7ra08030a] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 11/21/2022]
Abstract
CsgA are the building blocks of curli fibrils.
Affiliation(s)
- E. P. DeBenedictis
  - Department of Civil and Environmental Engineering and Mechanical Engineering, Northwestern University, Evanston, USA
- D. Ma
  - Department of Civil and Environmental Engineering and Mechanical Engineering, Northwestern University, Evanston, USA
- S. Keten
  - Department of Civil and Environmental Engineering and Mechanical Engineering, Northwestern University, Evanston, USA
7
Ingale AG. Prediction of Structural and Functional Aspects of Protein. Pharmaceutical Sciences 2017. [DOI: 10.4018/978-1-5225-1762-7.ch021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
To predict the structure of protein from a primary amino acid sequence is computationally difficult. An investigation of the methods and algorithms used to predict protein structure and a thorough knowledge of the function and structure of proteins are critical for the advancement of biology and the life sciences as well as the development of better drugs, higher-yield crops, and even synthetic bio-fuels. To that end, this chapter sheds light on the methods used for protein structure prediction. This chapter covers the applications of modeled protein structures and unravels the relationship between pure sequence information and three-dimensional structure, which continues to be one of the greatest challenges in molecular biology. With this resource, it presents an all-encompassing examination of the problems, methods, tools, servers, databases, and applications of protein structure prediction, giving unique insight into the future applications of the modeled protein structures. In this chapter, current protein structure prediction methods are reviewed for a milieu on structure prediction, the prediction of structural fundamentals, tertiary structure prediction, and functional imminent. The basic ideas and advances of these directions are discussed in detail.
8
Priya P, Kesheri M, Sinha RP, Kanchan S. Molecular Dynamics Simulations for Biological Systems. Pharmaceutical Sciences 2017. [DOI: 10.4018/978-1-5225-1762-7.ch040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
Molecular dynamics simulation is an important tool for capturing the dynamics of biological molecules and gaining atomistic insights. These insights are helpful for exploring biological functions. Molecular dynamics simulations on femtosecond to millisecond timescales give a large ensemble of conformations that can reveal many biological mysteries. The main focus of the chapter is to shed light on the theory of molecular dynamics, its requirements for biological studies, and the applications of molecular dynamics simulations. Molecular dynamics simulations are widely used to study protein-protein interactions, protein-ligand docking, the effects of mutations on interactions, protein folding and the flexibility of biological molecules. This chapter also deals with various methods and algorithms for protein tertiary structure prediction, and their strengths and weaknesses.
9
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
10
Ko SS, Li MJ, Sun-Ben Ku M, Ho YC, Lin YJ, Chuang MH, Hsing HX, Lien YC, Yang HT, Chang HC, Chan MT. The bHLH142 Transcription Factor Coordinates with TDR1 to Modulate the Expression of EAT1 and Regulate Pollen Development in Rice. The Plant Cell 2014; 26:2486-2504. [PMID: 24894043 PMCID: PMC4114947 DOI: 10.1105/tpc.114.126292] [Citation(s) in RCA: 113] [Impact Index Per Article: 11.3] [Received: 04/07/2014] [Revised: 05/07/2014] [Accepted: 05/13/2014] [Indexed: 05/18/2023]
Abstract
Male sterility plays an important role in F1 hybrid seed production. We identified a male-sterile rice (Oryza sativa) mutant with impaired pollen development and a single T-DNA insertion in the transcription factor gene bHLH142. Knockout mutants of bHLH142 exhibited retarded meiosis and defects in tapetal programmed cell death. RT-PCR and in situ hybridization analyses showed that bHLH142 is specifically expressed in the anther, in the tapetum, and in meiocytes during early meiosis. Three basic helix-loop-helix transcription factors, UDT1 (bHLH164), TDR1 (bHLH5), and EAT1/DTD1 (bHLH141) are known to function in rice pollen development. bHLH142 acts downstream of UDT1 and GAMYB but upstream of TDR1 and EAT1 in pollen development. In vivo and in vitro assays demonstrated that bHLH142 and TDR1 proteins interact. Transient promoter assays demonstrated that regulation of the EAT1 promoter requires bHLH142 and TDR1. Consistent with these results, 3D protein structure modeling predicted that bHLH142 and TDR1 form a heterodimer to bind to the EAT1 promoter. EAT1 positively regulates the expression of AP37 and AP25, which induce tapetal programmed cell death. Thus, in this study, we identified bHLH142 as having a pivotal role in tapetal programmed cell death and pollen development.
Affiliation(s)
- Swee-Suak Ko
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei 115, Taiwan
- Min-Jeng Li
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
- Maurice Sun-Ben Ku
  - Institute of Bioagricultural Science, National Chiayi University, Chiayi 600, Taiwan
- Yi-Cheng Ho
  - Institute of Bioagricultural Science, National Chiayi University, Chiayi 600, Taiwan
- Yi-Jyun Lin
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
- Ming-Hsing Chuang
  - Department of Life Sciences, National Cheng Kung University, Tainan 701, Taiwan
- Hong-Xian Hsing
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
- Yi-Chen Lien
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
- Hui-Ting Yang
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
- Hung-Chia Chang
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
- Ming-Tsair Chan
  - Academia Sinica Biotechnology Center in Southern Taiwan, Tainan 741, Taiwan
  - Agricultural Biotechnology Research Center, Academia Sinica, Taipei 115, Taiwan
11
Maity A, Majumdar S, Priya P, De P, Saha S, Ghosh Dastidar S. Adaptability in protein structures: structural dynamics and implications in ligand design. J Biomol Struct Dyn 2014; 33:298-321. [PMID: 24433438 DOI: 10.1080/07391102.2013.873002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 10/25/2022]
Abstract
The basic framework for understanding the mechanisms of protein function comes from knowledge of protein structures, which can model molecular recognition. Recent advances in structural biology have revealed that, despite the availability of structural data, it is nontrivial to predict the mechanism of molecular recognition, which progresses via situation-dependent structural adaptation. The mutual selectivity of protein-protein and protein-ligand interactions often depends on modulations of conformation enabled by the inherent flexibility of the molecules, which in turn regulates function. The mechanism of a protein's function, which used to be explained by the idea of 'lock and key', has evolved into the concepts of 'induced fit' and 'population shift'. Dynamics is now regarded as an essential feature to take into account when explaining the mechanism of a protein's function. The design of therapeutic molecules suffers from the plasticity of receptors, whose binding conformations are not accurately predictable from prior knowledge of a template structure. On the other hand, receptor flexibility provides the opportunity to improve the binding affinity of a ligand through suitable substitutions that maximize binding by modulating the receptor's surface. In this paper, we discuss with examples how a protein's flexibility is correlated with its functions in various systems, showing why understanding this flexibility matters for applications. We also highlight the methodological challenges of investigating it computationally and of accounting for the flexible nature of the molecules in drug design.
Affiliation(s)
- Atanu Maity
  - Bioinformatics Centre, Bose Institute, P-1/12, C.I.T. Scheme VII M, Kolkata 700054, India
12
Predicting PDZ domain mediated protein interactions from structure. BMC Bioinformatics 2013; 14:27. [PMID: 23336252 PMCID: PMC3602153 DOI: 10.1186/1471-2105-14-27] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 02/15/2012] [Accepted: 12/19/2012] [Indexed: 12/03/2022]
Abstract
Background: PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results: We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions: We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW.
13
14
Li X, Zhang Z, Song J. Computational enzyme design approaches with significant biological outcomes: progress and challenges. Comput Struct Biotechnol J 2012; 2:e201209007. [PMID: 24688648 PMCID: PMC3962085 DOI: 10.5936/csbj.201209007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 07/24/2012] [Revised: 09/27/2012] [Accepted: 10/04/2012] [Indexed: 11/29/2022]
Abstract
Enzymes are powerful biocatalysts; however, there is still a large gap between the number of enzyme-based practical applications and the number of naturally occurring enzymes. Multiple experimental approaches have been applied to generate nearly all possible mutations of target enzymes, allowing the identification of desirable variants with improved properties to meet practical needs. Meanwhile, an increasing number of computational methods have been developed during the past few decades to assist in the modification of enzymes. With the development of bioinformatic algorithms, computational approaches are now able to provide more precise guidance for enzyme engineering and make it more efficient and less laborious. In this review, we summarize recent advances in method development with significant biological outcomes to provide important insights into successful computational protein designs. We also discuss the limitations and challenges of existing methods and the future directions that should improve them.
Affiliation(s)
- Xiaoman Li
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China
- Jiangning Song
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, Tianjin 300308, China
  - Department of Biochemistry and Molecular Biology and ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, VIC 3800, Australia
15
Abstract
Background: Computational sequence analysis, that is, prediction of local sequence properties, homologs, spatial structure and function from the sequence of a protein, offers an efficient way to obtain needed information about proteins under study. Since reliable prediction is usually based on the consensus of many computer programs, meta-servers have been developed to fit such needs. Most meta-servers focus on one aspect of sequence analysis, while others incorporate more information, such as PredictProtein for local sequence feature predictions, SMART for domain architecture and sequence motif annotation, and GeneSilico for secondary and spatial structure prediction. However, as predictions of local sequence properties, three-dimensional structure and function are usually intertwined, it is beneficial to address them together. Results: We developed a MEta-Server for protein Sequence Analysis (MESSA) to facilitate comprehensive protein sequence analysis and gather structural and functional predictions for a protein of interest. For an input sequence, the server exploits a number of select tools to predict local sequence properties, such as secondary structure, structurally disordered regions, coiled coils, signal peptides and transmembrane helices; detect homologous proteins and assign the query to a protein family; identify three-dimensional structure templates and generate structure models; and provide predictive statements about the protein's function, including functional annotations, Gene Ontology terms, enzyme classification and possible functionally associated proteins. We tested MESSA on the proteome of Candidatus Liberibacter asiaticus. Manual curation shows that three-dimensional structure models generated by MESSA covered around 75% of all the residues in this proteome and the function of 80% of all proteins could be predicted. Availability: MESSA is free for non-commercial use at http://prodata.swmed.edu/MESSA/
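The abstract motivates meta-servers by noting that reliable prediction usually rests on the consensus of many programs. How MESSA combines its tools is not described here, so the following is only a generic sketch of one simple consensus rule: a per-residue majority vote over secondary-structure strings. The function name and the three predictor outputs are made up for illustration.

```python
from collections import Counter


def consensus_secondary_structure(predictions):
    """Per-residue majority vote over equal-length prediction strings.

    Each string uses one letter per residue (e.g. H = helix, E = strand,
    C = coil). When labels tie, the one seen first among the predictors
    at that position wins.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")
    consensus = []
    # zip(*predictions) yields one tuple of labels per residue position.
    for column in zip(*predictions):
        label, _count = Counter(column).most_common(1)[0]
        consensus.append(label)
    return "".join(consensus)


# Hypothetical outputs from three predictors for a six-residue peptide.
print(consensus_secondary_structure(["HHHHCC", "HHHCCC", "HHECCC"]))
```

A majority vote is the simplest choice; weighting each predictor by its reported confidence would be a natural refinement, but that goes beyond what the abstract states.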
16
Reimand J, Hui S, Jain S, Law B, Bader GD. Domain-mediated protein interaction prediction: From genome to network. FEBS Lett 2012; 586:2751-63. [PMID: 22561014 DOI: 10.1016/j.febslet.2012.04.027] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 03/25/2012] [Accepted: 04/17/2012] [Indexed: 11/19/2022]
Abstract
Protein-protein interactions (PPIs), involved in many biological processes such as cellular signaling, are ultimately encoded in the genome. Solving the problem of predicting protein interactions from the genome sequence will lead to increased understanding of complex networks, evolution and human disease. We can learn the relationship between genomes and networks by focusing on an easily approachable subset of high-resolution protein interactions that are mediated by peptide recognition modules (PRMs) such as PDZ, WW and SH3 domains. This review focuses on computational prediction and analysis of PRM-mediated networks and discusses sequence- and structure-based interaction predictors, techniques and datasets for identifying physiologically relevant PPIs, and interpreting high-resolution interaction networks in the context of evolution and human disease.
Affiliation(s)
- Jüri Reimand
  - The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario, Canada
17
Identification of new hematopoietic cell subsets with a polyclonal antibody library specific for neglected proteins. PLoS One 2012; 7:e34395. [PMID: 22496798 PMCID: PMC3319577 DOI: 10.1371/journal.pone.0034395] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/17/2011] [Accepted: 02/27/2012] [Indexed: 11/19/2022]
Abstract
The identification of new markers, the expression of which defines new phenotypically and functionally distinct cell subsets, is a main objective in cell biology. We have addressed the issue of identifying new cell-specific markers with a reverse proteomic approach whereby approximately 1700 human open reading frames encoding proteins predicted to be transmembrane or secreted have been selected in silico for being poorly known, cloned and expressed in bacteria. These proteins have been purified and used to immunize mice with the aim of obtaining polyclonal antisera mostly specific for linear epitopes. Such a library, made of about 1600 different polyclonal antisera, has been obtained and screened by flow cytometry on cord blood derived CD34+CD45dim cells and on peripheral blood derived mature lymphocytes (PBLs). We identified three new proteins expressed by fractions of CD34+CD45dim cells and eight new proteins expressed by fractions of PBLs. Remarkably, we identified proteins the presence of which had not been demonstrated previously by transcriptomic analysis. From the functional point of view, looking at new proteins expressed on CD34+CD45dim cells, we identified one cell surface protein (MOSC-1) the expression of which on a minority of CD34+ progenitors marks those CD34+CD45dim cells that will go toward monocyte/granulocyte differentiation. In conclusion, we show a new way of looking at the membranome by assessing expression of generally neglected proteins with a library of polyclonal antisera, and in so doing we have identified new potential subsets of hematopoietic progenitors and of mature PBLs.
|
18
|
La D, Kihara D. A novel method for protein-protein interaction site prediction using phylogenetic substitution models. Proteins 2011; 80:126-41. [PMID: 21989996 DOI: 10.1002/prot.23169] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 01/08/2011] [Revised: 07/07/2011] [Accepted: 08/17/2011] [Indexed: 11/10/2022]
Abstract
Protein-protein binding events mediate many critical biological functions in the cell. Typically, functionally important sites in proteins can be well identified by considering sequence conservation. However, protein-protein interaction sites exhibit higher sequence variation than other functional regions, such as catalytic sites of enzymes. Consequently, the mutational behavior leading to weak sequence conservation poses significant challenges to protein-protein interaction site prediction. Here, we present a phylogenetic framework to capture critical sequence variations that favor the selection of residues essential for protein-protein binding. Through the comprehensive analysis of diverse protein families, we show that protein binding interfaces exhibit distinct amino acid substitution patterns compared with other surface residues. On the basis of this analysis, we have developed a novel method, BindML, which utilizes the substitution models to predict protein-protein binding sites of proteins with unknown interacting partners. BindML estimates the likelihood that a phylogenetic tree of a local surface region in a query protein structure follows the substitution patterns of protein binding interfaces and nonbinding surfaces. BindML is shown to perform well compared to alternative methods for protein binding interface prediction. The methodology developed in this study is versatile in the sense that it can be generally applied for predicting other types of functional sites, such as DNA, RNA, and membrane binding sites in proteins.
Affiliation(s)
- David La
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
|
19
|
A second Ig-like domain identified in dystroglycan by molecular modelling and dynamics. J Mol Graph Model 2011; 29:1015-24. [PMID: 21605994 DOI: 10.1016/j.jmgm.2011.04.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 02/18/2011] [Revised: 04/19/2011] [Accepted: 04/21/2011] [Indexed: 11/23/2022]
Abstract
Dystroglycan (DG) is a cell surface receptor which is composed of two subunits that interact noncovalently, namely α- and β-DG. In skeletal muscle, DG is the central component of the dystrophin-glycoprotein complex (DGC) that anchors the actin cytoskeleton to the extracellular matrix. To date only the three-dimensional structure of the N-terminal region of α-DG has been solved by X-ray crystallography. To expand such a structural analysis, a theoretical molecular model of the murine α-DG C-terminal region was built based on folding recognition/threading techniques. Although there is no significant (<30%) sequence homology with the N-terminal region of α-DG, protein fold recognition methods found a significant resemblance to the α-DG N-terminal crystallographic structure. Our in silico structural prediction identified two subdomains in this region. Amino acid residues ∼500-600 of α-DG were predicted to adopt an immunoglobulin-like (Ig-like) β-sandwich fold. This modeled domain includes the β-DG binding epitope of α-DG and, confirming our previous experimental results, suggests that the linear epitope (residues 550-565) assumes a β-strand conformation. The remaining segment of the α-DG C-terminal region (residues 601-653) is organized in a coil-helix-coil motif. A 20-ns molecular dynamics simulation in explicit water solvent provided support for the predicted Ig-like model structure. The identification of a second Ig-like domain in DG represents another important step towards a full structural and functional description of the α/β DG interface. Preliminary characterization of a novel recombinant peptide (505-600) encompassing this second Ig-like domain demonstrates that it is soluble and stable, further corroborating our in silico analysis.
|
20
|
Evolutionary reshaping of fungal mating pathway scaffold proteins. mBio 2011; 2:e00230-10. [PMID: 21249169 PMCID: PMC3023161 DOI: 10.1128/mbio.00230-10] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 09/23/2010] [Accepted: 12/03/2010] [Indexed: 02/08/2023] Open
Abstract
Scaffold proteins play central roles in the function of many signaling pathways. Among the best-studied examples are the Ste5 and Far1 proteins of the yeast Saccharomyces cerevisiae. These proteins contain three conserved modules, the RING and PH domains, characteristic of some ubiquitin-ligating enzymes, and a vWA domain implicated in protein-protein interactions. In yeast, Ste5p regulates the mating pathway kinases while Far1p coordinates the cellular polarity machinery. Within the fungal lineage, the Basidiomycetes and the Pezizomycetes contain a single Far1-like protein, while several Saccharomycotina species, belonging to the CTG (Candida) clade, contain both a classic Far1-like protein and a Ste5-like protein that lacks the vWA domain. We analyzed the function of C. albicans Ste5p (Cst5p), a member of this class of structurally distinct Ste5 proteins. CST5 is essential for mating and still coordinates the mitogen-activated protein (MAP) kinase (MAPK) cascade elements in the absence of the vWA domain; Cst5p interacts with the MEK kinase (MEKK) C. albicans Ste11p (CaSte11p) and the MAPK Cek1 as well as with the MEK Hst7 in a vWA domain-independent manner. Cst5p can homodimerize, similar to Ste5p, but can also heterodimerize with Far1p, potentially forming heteromeric signaling scaffolds. We found direct binding between the MEKK CaSte11p and the MEK Hst7p that depends on a mobile acidic loop absent from S. cerevisiae Ste11p but related to the Ste7-binding region within the vWA domain of Ste5p. Thus, the fungal lineage has restructured specific scaffolding modules to coordinate the proteins required to direct the gene expression, polarity, and cell cycle regulation essential for mating. The mitogen-activated protein (MAP) kinase cascade is an extensively used signaling module in eukaryotic cells, and the ability to regulate these modules is critical for ensuring proper responses to a wide variety of stimuli. 
One way that cells regulate this signaling module is through scaffold proteins that insulate related pathways against cross talk, improve signaling efficiency, and ensure that signals are connected to the correct response. The Ste5 scaffold of the S. cerevisiae mating response is a well-studied representative of this class of proteins. Using bioinformatics, structural modeling, and molecular genetic approaches, we have investigated the equivalent scaffold in the pathogenic yeast Candida albicans. We show that the C. albicans protein is structurally distinct from that of Saccharomyces cerevisiae but still provides similar functions. Increases in pathway complexity have been associated with changes in scaffold connectivity, and overall, the tethering capacity of the scaffolds has been more conserved than their structural organization.
|
21
|
Chen H, Kihara D. Effect of using suboptimal alignments in template-based protein structure prediction. Proteins 2011; 79:315-34. [PMID: 21058297 PMCID: PMC3058269 DOI: 10.1002/prot.22885] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 12/28/2022]
Abstract
Computational protein structure prediction remains a challenging task in protein bioinformatics. In recent years, the importance of template-based structure prediction has been increasing because of the growing number of protein structures solved by the structural genomics projects. To capitalize on the significant efforts and investments made in the structural genomics projects, it is urgent to establish effective ways to use the solved structures as templates by developing methods for exploiting remotely related proteins that cannot be simply identified by homology. In this work, we examine the effect of using suboptimal alignments in template-based protein structure prediction. We show that suboptimal alignments are often more accurate than the optimal one, and such accurate suboptimal alignments can occur even at a very low rank of the alignment score. Suboptimal alignments contain a significant number of correct amino acid residue contacts. Moreover, suboptimal alignments can improve template-based models when used as input to Modeller. Finally, we use suboptimal alignments for handling a contact potential in a probabilistic way in a threading program, SUPRB. The probabilistic contacts strategy outperforms the partly thawed approach, which only uses the optimal alignment in defining residue contacts, and also the re-ranking strategy, which uses the contact potential in re-ranking alignments. The comparison with existing methods in the template-recognition test shows that SUPRB is very competitive and outperforms existing methods.
Affiliation(s)
- Hao Chen
- Department of Biological Sciences College of Science, Purdue University, West Lafayette, IN, 47907, USA
| | - Daisuke Kihara
- Department of Biological Sciences College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Computer Science College of Science, Purdue University, West Lafayette, IN, 47907, USA
- Markey Center for Structural Biology College of Science, Purdue University, West Lafayette, IN, 47907, USA
|
22
|
Fieldhouse RJ, Turgeon Z, White D, Merrill AR. Cholera- and anthrax-like toxins are among several new ADP-ribosyltransferases. PLoS Comput Biol 2010; 6:e1001029. [PMID: 21170356 PMCID: PMC3000352 DOI: 10.1371/journal.pcbi.1001029] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Received: 06/09/2010] [Accepted: 11/10/2010] [Indexed: 11/19/2022] Open
Abstract
Chelt, a cholera-like toxin from Vibrio cholerae, and Certhrax, an anthrax-like toxin from Bacillus cereus, are among six new bacterial protein toxins we identified and characterized using in silico and cell-based techniques. We also uncovered medically relevant toxins from Mycobacterium avium and Enterococcus faecalis. We found agriculturally relevant toxins in Photorhabdus luminescens and Vibrio splendidus. These toxins belong to the ADP-ribosyltransferase family that has conserved structure despite low sequence identity. Therefore, our search for new toxins combined fold recognition with rules for filtering sequences (including a primary sequence pattern) to reduce reliance on sequence identity and identify toxins using structure. We used computers to build models and analyzed each new toxin to understand features including: structure, secretion, cell entry, activation, NAD+ substrate binding, intracellular target binding and the reaction mechanism. We confirmed activity using a yeast growth test. In this era, where an expanding protein structure library complements abundant protein sequence data and high-throughput validation is needed, our approach provides insight into the newest toxin ADP-ribosyltransferases.
Affiliation(s)
- Robert J. Fieldhouse
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
| | - Zachari Turgeon
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
| | - Dawn White
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
| | - A. Rod Merrill
- Department of Molecular and Cellular Biology, University of Guelph, Guelph, Ontario, Canada
|
23
|
Graebsch A, Roche S, Kostrewa D, Söding J, Niessing D. Of bits and bugs--on the use of bioinformatics and a bacterial crystal structure to solve a eukaryotic repeat-protein structure. PLoS One 2010; 5:e13402. [PMID: 20976240 PMCID: PMC2954813 DOI: 10.1371/journal.pone.0013402] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 07/06/2010] [Accepted: 09/24/2010] [Indexed: 11/19/2022] Open
Abstract
Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruit fly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding of how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key to success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins.
Affiliation(s)
- Almut Graebsch
- Institute of Structural Biology, Helmholtz Zentrum München, Munich, Germany
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
| | - Stéphane Roche
- Institute of Structural Biology, Helmholtz Zentrum München, Munich, Germany
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
| | - Dirk Kostrewa
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
| | - Johannes Söding
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
| | - Dierk Niessing
- Institute of Structural Biology, Helmholtz Zentrum München, Munich, Germany
- Department of Biochemistry, Gene Center of the Ludwig-Maximilians-University Munich, Munich, Germany
|
24
|
Chugunov AO, Efremov RG. [Prediction of the spatial structure of proteins: emphasis on membrane targets]. Russian Journal of Bioorganic Chemistry 2010; 35:744-60. [PMID: 20208575 DOI: 10.1134/s106816200906003x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
Abstract
Knowledge of the spatial structure of proteins is a prerequisite both for understanding their functional mechanisms and for rational drug discovery and design. Meanwhile, direct structural determination is often hampered or impractical due to the complexity, cost, and limited capabilities of experimental techniques. These issues are especially pronounced for integral membrane proteins. On numerous occasions, the theoretical prediction of protein structures may facilitate the process by exploiting physical or empirical principles. This paper surveys modern techniques for the prediction of the spatial structure of proteins using computer algorithms, and the main emphasis is placed on the most "complex" targets - membrane proteins (MPs). The first part of the review describes de novo methods based on empirical physical principles; in the second part, a comparative modeling philosophy, which accounts for the structure of related proteins, is described. Special attention is given to pharmacologically relevant classes of G protein-coupled receptors, receptor tyrosine kinases, and other MPs. Algorithms for the assessment of model quality and potential fields of application of computer models are discussed.
|
25
|
Abstract
As the field of protein structure prediction continues to expand at an exponential rate, the bench-biologist might feel overwhelmed by the sheer range of available applications. This review presents the three main approaches in computational structure prediction from a non-bioinformatician's point of view and makes a selection of freely available tools and servers. These tools are evaluated from several aspects, such as number of citations, ease of usage and quality of the results. Finally, the applications of models generated by computational structure prediction are discussed.
|
26
|
Bahadur RP, Chakrabarti P. Discriminating the native structure from decoys using scoring functions based on the residue packing in globular proteins. BMC Structural Biology 2009; 9:76. [PMID: 20038291 PMCID: PMC2809062 DOI: 10.1186/1472-6807-9-76] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/11/2009] [Accepted: 12/28/2009] [Indexed: 11/14/2022]
Abstract
BACKGROUND Setting the rules for the identification of a stable conformation of a protein is of utmost importance for the efficient generation of structures in computer simulation. For structure prediction, a considerable number of possible models are generated from which the best model has to be selected. RESULTS Two scoring functions, Rs and Rp, based on the consideration of packing of residues, which indicate if the conformation of an amino acid sequence is native-like, are presented. These are defined using the solvent accessible surface area (ASA) and the partner number (PN) (other residues that are within 4.5 Å) of a particular residue. The two functions evaluate the deviation from the average packing properties (ASA or PN) of all residues in a polypeptide chain corresponding to a model of its three-dimensional structure. While simple in concept and computationally less intensive, both functions are at least as efficient as any other energy functions in discriminating the native structure from decoys in a large number of standard decoy sets, as well as on models submitted for the targets of CASP7. Rs appears to be slightly more effective than Rp, as determined by the number of times the native structure possesses the minimum value for the function and its separation from the average value for the decoys. CONCLUSION Two parameters, Rs and Rp, are discussed that can very efficiently recognize the native fold for a sequence from an ensemble of decoy structures. Unlike many other algorithms that rely on the use of a composite scoring function, these are based on a single parameter, viz., the accessible surface area (or the number of residues in contact), but are still able to capture the essential attribute of the native fold.
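The partner number described in this abstract can be illustrated with a short sketch. The abstract does not give the formulas for Rs or Rp, so the code below is not the authors' scoring function: `partner_numbers` assumes two residues are partners when any pair of their atoms lies within 4.5 Å, and `mean_abs_deviation` is a hypothetical stand-in for "deviation from the average packing properties", with the expected values supplied by the caller.

```python
from math import dist

CUTOFF = 4.5  # angstroms; the contact distance quoted in the abstract


def partner_numbers(residues, cutoff=CUTOFF):
    """Count, for each residue, how many other residues are in contact.

    residues: list of residues, each a list of (x, y, z) atom coordinates.
    Assumption: two residues are partners if ANY atom pair is within cutoff.
    """
    counts = [0] * len(residues)
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            in_contact = any(
                dist(a, b) <= cutoff
                for a in residues[i]
                for b in residues[j]
            )
            if in_contact:
                counts[i] += 1
                counts[j] += 1
    return counts


def mean_abs_deviation(observed, expected):
    """Hypothetical packing score: mean |observed - expected| per residue.

    This is an illustrative deviation measure, not the published Rs or Rp.
    """
    if not observed or len(observed) != len(expected):
        raise ValueError("observed and expected must be non-empty and equal length")
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(observed)
```

Under these assumptions, a model whose residues deviate less from the expected partner numbers would be ranked as more native-like; the reference averages themselves would have to come from a set of known structures.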
Affiliation(s)
- Ranjit Prasad Bahadur
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
- Current address: Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, West Bengal, India
| | - Pinak Chakrabarti
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
|
27
|
Abstract
The observation that similar protein sequences fold into similar three-dimensional structures provides a basis for the methods which predict structural features of a novel protein based on the similarity between its sequence and sequences of known protein structures. Similarity over entire sequence or large sequence fragment(s) enables prediction and modeling of entire structural domains while statistics derived from distributions of local features of known protein structures make it possible to predict such features in proteins with unknown structures. The accuracy of models of protein structures is sufficient for many practical purposes such as analysis of point mutation effects, enzymatic reactions, interaction interfaces of protein complexes, and active sites. Protein models are also used for phasing of crystallographic data and, in some cases, for drug design. By using models one can avoid the costly and time-consuming process of experimental structure determination. The purpose of this chapter is to give a practical review of the most popular protein structure prediction methods based on sequence similarity and to outline a practical approach to protein structure prediction. While the main focus of this chapter is on template-based protein structure prediction, it also provides references to other methods and programs which play an important role in protein structure prediction.
|
28
|
Bab-Dinitz E, Albeck S, Peleg Y, Brumfeld V, Gottschalk KE, Karlish SJD. A C-Terminal Lobe of the β Subunit of Na,K-ATPase and H,K-ATPase Resembles Cell Adhesion Molecules. Biochemistry 2009; 48:8684-91. [DOI: 10.1021/bi900868e] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
Affiliation(s)
| | | | | | | | - Kay E. Gottschalk
- Department of Applied Physics, Ludwig-Maximilians Universität, 80799 München, Germany
|
29
|
Garza JA, Ilangovan U, Hinck AP, Barnes LD. Kinetic, dynamic, ligand binding properties, and structural models of a dual-substrate specific nudix hydrolase from Schizosaccharomyces pombe. Biochemistry 2009; 48:6224-39. [PMID: 19462967 DOI: 10.1021/bi802266g] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
Schizosaccharomyces pombe Aps1 is a nudix hydrolase that catalyzes the hydrolysis of both diadenosine 5',5'''-P¹,Pⁿ-oligophosphates and diphosphoinositol polyphosphates in vitro. Nudix hydrolases act upon a wide variety of substrates, despite having a common 23 amino acid catalytic motif; hence, the residues responsible for substrate specificity are considered to reside outside the common catalytic nudix motif. The specific residues involved in binding each substrate of S. pombe Aps1 are unknown. In this study, we have conducted mutational and kinetic studies in combination with structural homology modeling and NMR spectroscopic analyses to identify potential residues involved in binding each class of substrates. This study demonstrates several major findings with regard to Aps1. First, the determination of the kinetic parameters Km and kcat indicated that the initial 31 residues of Aps1 are not involved in substrate binding or catalysis with respect to Ap6A. Second, NMR spectroscopic analyses revealed the secondary structure and three dynamic backbone regions, one of which corresponds to a large insert in Aps1 as compared to other putative fungal orthologues. Third, two structural models of Aps1Δ2-19, based on the crystal structures of human DIPP1 and T. thermophilus Ndx1, were generated using homology modeling. The structural models were in excellent agreement with the NMR-derived secondary structure of Aps1Δ2-19. Fourth, NMR chemical shift mapping in conjunction with structural homology models indicated several residues outside the catalytic nudix motif that are involved in specific binding of diphosphoinositol polyphosphate or diadenosine oligophosphate ligands.
Affiliation(s)
- John A Garza
- Department of Biochemistry, The University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Drive, San Antonio, Texas 78229-3900, USA
|
30
|
Benkert P, Schwede T, Tosatto SC. QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information. BMC Structural Biology 2009; 9:35. [PMID: 19457232 PMCID: PMC2709111 DOI: 10.1186/1472-6807-9-35] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 10/21/2008] [Accepted: 05/20/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus. RESULTS Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient between predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins, each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach. CONCLUSION Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
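The pre-filtering idea in this abstract (rank models with a single-model score, then compute the consensus only against the top-ranked subset) can be sketched as follows. This is a minimal illustration and not the QMEANclust implementation: the function name, the `keep_fraction` parameter, and the use of a precomputed pairwise similarity matrix (for example GDT_TS between models) are all assumptions made for the example.

```python
def consensus_with_prefilter(single_scores, similarity, keep_fraction=0.5):
    """Score each model by its mean similarity to a pre-filtered reference set.

    single_scores: per-model quality estimates (higher is better), standing in
                   for a single-model score such as QMEAN.
    similarity:    square matrix, similarity[i][j] = structural similarity
                   between models i and j (assumed precomputed).
    keep_fraction: hypothetical parameter, fraction of top-ranked models kept
                   as the reference set for the consensus.
    """
    n = len(single_scores)
    if n == 0:
        return []
    n_keep = max(1, int(n * keep_fraction))
    # Rank models by the single-model score and keep the best ones.
    ranked = sorted(range(n), key=lambda i: single_scores[i], reverse=True)
    reference = ranked[:n_keep]

    scores = []
    for i in range(n):
        # A model is never compared with itself.
        others = [j for j in reference if j != i]
        if others:
            scores.append(sum(similarity[i][j] for j in others) / len(others))
        else:
            scores.append(0.0)
    return scores
```

Restricting the reference set means that a large cluster of poor models cannot dominate the consensus, which is the failure mode of plain consensus methods that the abstract describes.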
Affiliation(s)
- Pascal Benkert
- Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland.
|
31
|
Cavasotto CN, Phatak SS. Homology modeling in drug discovery: current trends and applications. Drug Discov Today 2009; 14:676-83. [PMID: 19422931 DOI: 10.1016/j.drudis.2009.04.006] [Citation(s) in RCA: 272] [Impact Index Per Article: 18.1] [Received: 02/01/2009] [Revised: 04/20/2009] [Accepted: 04/23/2009] [Indexed: 10/20/2022]
Abstract
As structural genomics (SG) projects continue to deposit representative 3D structures of proteins, homology modeling methods will play an increasing role in structure-based drug discovery. Although computational structure prediction methods provide a cost-effective alternative in the absence of experimental structures, developing sufficiently accurate models remains a big challenge. In this contribution, we report the current developments in this field, discuss in silico modeling limitations, and review the successful application of this technique to different stages of the drug discovery process.
Affiliation(s)
- Claudio N Cavasotto
- School of Health Information Sciences, The University of Texas Health Science Center at Houston, 7000 Fannin, Suite 860B, Houston, TX 77030, United States.
|
32
|
Zhang Y. Protein structure prediction: when is it useful? Curr Opin Struct Biol 2009; 19:145-55. [PMID: 19327982 PMCID: PMC2673339 DOI: 10.1016/j.sbi.2009.02.005] [Citation(s) in RCA: 191] [Impact Index Per Article: 12.7] [Received: 11/14/2008] [Revised: 02/18/2009] [Accepted: 02/19/2009] [Indexed: 10/21/2022]
Abstract
The computationally predicted three-dimensional structure of protein molecules has demonstrated its usefulness in many areas of biomedicine, ranging from approximate family assignments to precise drug screening. For nearly 40 years, however, the accuracy of the predicted models has been dictated by the availability of close structural templates. Progress has recently been achieved in refining low-resolution models closer to the native ones; this has been made possible by combining knowledge-based information from multiple sources of structural templates as well as by improving the energy funnel of physics-based force fields. Unfortunately, there has been no essential progress in the development of techniques for detecting remotely homologous templates and for predicting novel protein structures.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA.
|
33
|
Leuko S, Raftery MJ, Burns BP, Walter MR, Neilan BA. Global Protein-Level Responses of Halobacterium salinarum NRC-1 to Prolonged Changes in External Sodium Chloride Concentrations. J Proteome Res 2009; 8:2218-25. [DOI: 10.1021/pr800663c] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Affiliation(s)
- Stefan Leuko
- Australian Centre for Astrobiology, Bioanalytical Mass Spectrometry Facility, and School of Biotechnology and Biomolecular Science, University of New South Wales, NSW 2052, Australia
- Mark J. Raftery
- Australian Centre for Astrobiology, Bioanalytical Mass Spectrometry Facility, and School of Biotechnology and Biomolecular Science, University of New South Wales, NSW 2052, Australia
- Brendan P. Burns
- Australian Centre for Astrobiology, Bioanalytical Mass Spectrometry Facility, and School of Biotechnology and Biomolecular Science, University of New South Wales, NSW 2052, Australia
- Malcolm R. Walter
- Australian Centre for Astrobiology, Bioanalytical Mass Spectrometry Facility, and School of Biotechnology and Biomolecular Science, University of New South Wales, NSW 2052, Australia
- Brett A. Neilan
- Australian Centre for Astrobiology, Bioanalytical Mass Spectrometry Facility, and School of Biotechnology and Biomolecular Science, University of New South Wales, NSW 2052, Australia
|
34
|
|
35
|
Vallat BK, Pillardy J, Elber R. A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins. Proteins 2008; 72:910-28. [PMID: 18300226 PMCID: PMC2907141 DOI: 10.1002/prot.21976] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
The first step in homology modeling is to identify a template protein for the target sequence. The template structure is used in later phases of the calculation to construct an atomically detailed model for the target. We have built from the Protein Data Bank (PDB) a large-scale learning set that includes tens of millions of pair matches that can be either a true template or a false one. Discriminatory learning (learning from positive and negative examples) is used to train a decision tree. Each branch of the tree is a mathematical programming model. The decision tree is tested on an independent set from PDB entries and on the sequences of CASP7. It provides significant enrichment of true templates (between 50 and 100%) when compared to PSI-BLAST. The model is further verified by building atomically detailed structures for each of the tentative true templates with MODELLER. The probability that a true match does not yield an acceptable structural model (within 6 Å RMSD from the native structure) decays linearly as a function of the TM structural-alignment score.
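The acceptance criterion quoted above (a model within 6 Å RMSD of the native structure) can be illustrated with a minimal sketch. It assumes the two coordinate sets are already superimposed and matched atom for atom; the function names `rmsd` and `is_acceptable` and the toy coordinates are illustrative and are not taken from the paper.

```python
import math


def rmsd(model, native):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumption: the structures are already optimally superimposed;
    no fitting or alignment is performed here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


def is_acceptable(model, native, cutoff=6.0):
    """Apply the 6 Å threshold quoted in the abstract (cutoff in ångströms)."""
    return rmsd(model, native) <= cutoff


# Toy example: every atom of the model is displaced by 3 Å along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
print(rmsd(model, native), is_acceptable(model, native))
```

In practice the RMSD would be computed after an optimal superposition of the two structures, which this sketch deliberately leaves out.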
Affiliation(s)
- Brinda Kizhakke Vallat
- Department of Computer Science, Cornell University, Upson Hall 4130, Ithaca, New York 14853, USA
|
36
|
Abstract
Computational biology/chemistry tools are used in most areas of life/health science research. These methods are continually being developed and their use can present difficulties for both experienced and novice investigators. To facilitate the use of these applications, many packages have been implemented online during these last 5 years. This unit focuses on online computational methods with a special emphasis on structural refinement/atomic simulations, protein electrostatic calculations, searches for functional sites, searches for druggable pockets, protein docking and small molecule docking, and prediction of potential impact of amino acid variations on the structure and function of the protein molecules.
|
37
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci 2008; Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 754] [Impact Index Per Article: 47.1] [Indexed: 12/25/2022]
Abstract
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California, USA
|
38
|
Alternating evolutionary pressure in a genetic algorithm facilitates protein model selection. BMC Structural Biology 2008; 8:34. [PMID: 18673557 PMCID: PMC2527322 DOI: 10.1186/1472-6807-8-34] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 06/04/2008] [Accepted: 08/01/2008] [Indexed: 11/12/2022]
Abstract
Background Automatic protein modelling pipelines are becoming ever more accurate; this has come hand in hand with an increasingly complicated interplay between all components involved. Nevertheless, there are still potential improvements to be made in template selection, refinement and protein model selection. Results In the context of an automatic modelling pipeline, we analysed each step separately, revealing several non-intuitive trends and explored a new strategy for protein conformation sampling using Genetic Algorithms (GA). We apply the concept of alternating evolutionary pressure (AEP), i.e. intermediate rounds within the GA runs where unrestrained, linear growth of the model populations is allowed. Conclusion This approach improves the overall performance of the GA by allowing models to overcome local energy barriers. AEP enabled the selection of the best models in 40% of all targets; compared to 25% for a normal GA.
|
39
|
Pawłowski K. Uncharacterized/hypothetical proteins in biomedical 'omics' experiments: is novelty being swept under the carpet? Briefings in Functional Genomics and Proteomics 2008; 7:283-90. [PMID: 18641417 DOI: 10.1093/bfgp/eln033] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022]
Abstract
Many 'omics' studies, gene expression microarray experiments in particular, aim at charting the molecular mechanisms of physiology, disease and drug response. This short review discusses the bias present in many such studies, whereby the focus is set on the well understood and established molecular scenarios. The under-reporting rate of 'hypothetical' or uncharacterized genes and proteins, differentially regulated in disease context, is assessed here. Reasons for this bias are discussed. Particular examples from the genomics studies on respiratory diseases are presented. This review aims at increasing awareness of the unexplored genomics data and proposes remedies in order to refocus genomics studies on the less-charted territories of the genome, transcriptome and proteome. It is suggested that routine use of function prediction methods in conjunction with omics analyses may allow better interpretation of the data, and facilitate discovery of true novelty.
Affiliation(s)
- Krzysztof Pawłowski
- Nencki Institute of Experimental Biology, PAS, Warsaw University of Life Sciences, Warszawa, Poland.
|
40
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2008; Chapter 5:Unit-5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1766] [Impact Index Per Article: 110.4] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco San Francisco, California
- Ben Webb
- University of California at San Francisco San Francisco, California
- M S Madhusudhan
- University of California at San Francisco San Francisco, California
- David Eramian
- University of California at San Francisco San Francisco, California
- Min-Yi Shen
- University of California at San Francisco San Francisco, California
- Ursula Pieper
- University of California at San Francisco San Francisco, California
- Andrej Sali
- University of California at San Francisco San Francisco, California
|
41
|
Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol 2008; 18:342-8. [PMID: 18436442 DOI: 10.1016/j.sbi.2008.02.004] [Citation(s) in RCA: 304] [Impact Index Per Article: 19.0] [Received: 12/04/2007] [Accepted: 02/14/2008] [Indexed: 10/22/2022]
Abstract
Depending on whether similar structures are found in the PDB library, the protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool to detect the structural analogs, the advancements in methodology development have come to a steady state. Encouraging progress is observed in structure refinement which aims at drawing template structures closer to the native; this has been mainly driven by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been witnessed in folding small proteins to atomic resolutions. However, predicting structures for proteins larger than 150 residues still remains a challenge, with bottlenecks from both force field and conformational search.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, United States.
|
42
|
Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008; 71:261-77. [PMID: 17932912 DOI: 10.1002/prot.21715] [Citation(s) in RCA: 733] [Impact Index Per Article: 45.8] [Indexed: 11/06/2022]
Abstract
In protein structure prediction, a considerable number of alternative models are usually produced from which subsequently the final model has to be selected. Thus, a scoring function for the identification of the best model within an ensemble of alternative models is a key component of most protein structure prediction pipelines. QMEAN, which stands for Qualitative Model Energy ANalysis, is a composite scoring function describing the major geometrical aspects of protein structures. Five different structural descriptors are used. The local geometry is analyzed by a new kind of torsion angle potential over three consecutive amino acids. A secondary structure-specific distance-dependent pairwise residue-level potential is used to assess long-range interactions. A solvation potential describes the burial status of the residues. Two simple terms describing the agreement of predicted and calculated secondary structure and solvent accessibility, respectively, are also included. A variety of different implementations are investigated and several approaches to combine and optimize them are discussed. QMEAN was tested on several standard decoy sets including a molecular dynamics simulation decoy set as well as on a comprehensive data set of a total of 22,420 models from server predictions for the 95 targets of CASP7. In a comparison to five well-established model quality assessment programs, QMEAN shows a statistically significant improvement over nearly all quality measures describing the ability of the scoring function to identify the native structure and to discriminate good from bad models. The three-residue torsion angle potential turned out to be very effective in recognizing the native fold.
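The idea of a composite scoring function built from five descriptors can be sketched as a weighted linear combination. This is only an illustration under stated assumptions: the term names, the weights, and the convention that higher scores are better are hypothetical and are not the published QMEAN parameters.

```python
def composite_score(terms, weights):
    """Weighted sum of per-model descriptor values.

    `terms` and `weights` are dicts keyed by descriptor name. The names and
    weights used below are hypothetical placeholders.
    """
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing descriptor values: {sorted(missing)}")
    return sum(weights[name] * terms[name] for name in weights)


def rank_models(models, weights):
    """Return model names sorted from best to worst (assuming higher is better)."""
    return sorted(
        models,
        key=lambda name: composite_score(models[name], weights),
        reverse=True,
    )


# Hypothetical weights for the five descriptor types named in the abstract.
weights = {
    "torsion": 0.3,
    "pairwise": 0.3,
    "solvation": 0.2,
    "ss_agreement": 0.1,
    "acc_agreement": 0.1,
}
models = {
    "model_a": {name: 1.0 for name in weights},
    "model_b": {name: 0.5 for name in weights},
}
print(rank_models(models, weights))
```

A linear combination keeps each descriptor's contribution easy to inspect; the abstract notes that several ways of combining and optimizing the terms were investigated, so this is one possible form rather than the authors' final choice.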
Affiliation(s)
- Pascal Benkert
- Institute for Biochemistry, University of Cologne, 50674 Cologne, Germany
|
43
|
Wallner B, Elofsson A. Prediction of global and local model quality in CASP7 using Pcons and ProQ. Proteins 2008; 69 Suppl 8:184-93. [PMID: 17894353 DOI: 10.1002/prot.21774] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Indexed: 11/06/2022]
Abstract
The ability to rank and select the best model is important in protein structure prediction. Model Quality Assessment Programs (MQAPs) are programs developed to perform this task. They can be divided into three categories based on the information they use. Consensus based methods use the similarity to other models, structure-based methods use features calculated from the structure and evolutionary based methods use the sequence similarity between a model and a template. These methods can be trained to predict the overall global quality of a model, that is, how much a model is likely to differ from the native structure. The methods can also be trained to pinpoint which local regions in a model are likely to be incorrect. In CASP7, we participated with three predictors of global and four of local quality using information from the three categories described above. The result shows that the MQAP using consensus, Pcons, was significantly better at predicting both global and local quality compared with MQAPs using only structure or sequence based information.
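The consensus-based category described above scores each model by its similarity to the other models in the set. A minimal sketch follows, assuming a caller-supplied pairwise similarity function; the toy `fraction_identical` measure on secondary-structure strings is a stand-in for a real structural similarity score and is not how Pcons itself compares models.

```python
def consensus_scores(models, similarity):
    """Score each model by its mean similarity to every other model.

    `models` maps a model name to any object that `similarity` understands.
    """
    scores = {}
    for name, model in models.items():
        others = [
            similarity(model, other)
            for other_name, other in models.items()
            if other_name != name
        ]
        scores[name] = sum(others) / len(others) if others else 0.0
    return scores


def fraction_identical(a, b):
    """Toy similarity: fraction of positions with the same character."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


# Hypothetical per-residue secondary-structure strings for three models.
models = {"m1": "HHHEEE", "m2": "HHHEEC", "m3": "CCCCCC"}
scores = consensus_scores(models, fraction_identical)
best = max(scores, key=scores.get)
print(best, scores)
```

The model closest to the "centre" of the ensemble gets the highest score, which is why consensus methods need many models to work well.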
Affiliation(s)
- Björn Wallner
- Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden.
|
44
|
Terashi G, Takeda-Shitaka M, Kanou K, Iwadate M, Takaya D, Hosoi A, Ohta K, Umeyama H. Fams-ace: a combined method to select the best model after remodeling all server models. Proteins 2008; 69 Suppl 8:98-107. [PMID: 17894329 DOI: 10.1002/prot.21785] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/08/2022]
Abstract
During Critical Assessment of Protein Structure Prediction (CASP7, Pacific Grove, CA, 2006), fams-ace was entered in the 3D coordinate prediction category as a human expert group. The procedure can be summarized by the following three steps. (1) All the server models were refined and rebuilt utilizing our homology modeling method. (2) Representative structures were selected from each server, according to a model quality evaluation, based on a 3D1D profile score (like Verify3D). (3) The top five models were selected and submitted in the order of the consensus-based score (like 3D-Jury). Fams-ace is a fully automated server and does not require human intervention. In this article, we introduce the methodology of fams-ace and discuss the successes and failures of this approach during CASP7. In addition, we discuss possible improvements for the next CASP.
Affiliation(s)
- Genki Terashi
- School of Pharmacy, Kitasato University, Tokyo, Japan
|
45
|
Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 2008; 9:40. [PMID: 18215316 PMCID: PMC2245901 DOI: 10.1186/1471-2105-9-40] [Citation(s) in RCA: 3778] [Impact Index Per Article: 236.1] [Received: 09/19/2007] [Accepted: 01/23/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. RESULTS An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD. CONCLUSION The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations.
The I-TASSER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/I-TASSER.
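The two quantities mentioned in this abstract, the TM-score and the C-score cutoff, can be illustrated with a short sketch. The TM-score formula used here is the standard definition from the TM-score literature (it is not spelled out in the abstract), evaluated for one fixed superposition rather than maximised over superpositions. The function names and the handling of very short chains are assumptions made for this example.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` holds the distance in ångströms for each aligned residue pair;
    unaligned target residues simply contribute nothing to the sum.
    Uses d0 = 1.24 * (L - 15)^(1/3) - 1.8 with L the target length.
    Assumption: chains too short for a positive d0 are rejected, which is a
    simplification made for this sketch.
    """
    if target_length <= 15:
        raise ValueError("target too short for the d0 formula used here")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("target too short for the d0 formula used here")
    if len(distances) > target_length:
        raise ValueError("more aligned pairs than target residues")
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def has_confident_topology(c_score, cutoff=-1.5):
    """Apply the C-score cutoff (> -1.5) quoted in the abstract."""
    return c_score > cutoff


# A perfect superposition of all 100 residues scores 1.0.
print(tm_score([0.0] * 100, 100))
print(has_confident_topology(-0.8), has_confident_topology(-2.3))
```

Because each residue pair contributes a value between 0 and 1 and the sum is divided by the target length, the score stays within [0, 1], as the abstract states.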
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA.
|
46
|
Yin Y, Fischer D. Identification and investigation of ORFans in the viral world. BMC Genomics 2008; 9:24. [PMID: 18205946 PMCID: PMC2245933 DOI: 10.1186/1471-2164-9-24] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.9] [Received: 03/09/2007] [Accepted: 01/19/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Genome-wide studies have already shed light on the evolution and enormous diversity of the viral world. Nevertheless, one of the unresolved mysteries in comparative genomics today is the abundance of ORFans - ORFs with no detectable sequence similarity to any other ORF in the databases. Recently, studies attempting to understand the origin and functions of bacterial ORFans have been reported. Here we present a first genome-wide identification and analysis of ORFans in the viral world, with focus on bacteriophages. RESULTS Almost one-third of all ORFs in 1,456 complete virus genomes correspond to ORFans, a figure significantly larger than that observed in prokaryotes. Like prokaryotic ORFans, viral ORFans are shorter and have a lower GC content than non-ORFans. Nevertheless, a statistically significant lower GC content is found only in a minority of viruses. By focusing on phages, we find that 38.4% of phage ORFs have no homologs in other phages, and 30.1% have homologs in neither the viral nor the prokaryotic world. Phages with different host ranges have different percentages of ORFans, reflecting different sampling status and suggesting various diversities. Similarity searches of the phage ORFeome (ORFans and non-ORFans) against prokaryotic genomes show that almost half of the phage ORFs have prokaryotic homologs, suggesting the major role that horizontal transfer plays in bacterial evolution. Surprisingly, the percentage of phage ORFans with prokaryotic homologs is only 18.7%. This suggests that phage ORFans play a lesser role in horizontal transfer to prokaryotes, but may be among the major players contributing to the vast phage diversity. CONCLUSION Although the current sampling of viral genomes is extremely low, ORFans and near-ORFans are likely to continue to grow in number as more genomes are sequenced.
The abundance of phage ORFans may be partially due to the expected vast viral diversity, and may be instrumental in understanding viral evolution. The functions, origins and fates of the majority of viral ORFans remain a mystery. Further computational and experimental studies are likely to shed light on the mechanisms that have given rise to so many bacterial and viral ORFans.
Affiliation(s)
- Yanbin Yin
- Computer Science and Engineering Dept, 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, USA.
|
47
|
|
48
|
ProCKSI: a decision support system for Protein (structure) Comparison, Knowledge, Similarity and Information. BMC Bioinformatics 2007; 8:416. [PMID: 17963510 PMCID: PMC2222653 DOI: 10.1186/1471-2105-8-416] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Received: 06/08/2007] [Accepted: 10/26/2007] [Indexed: 11/19/2022] Open
Abstract
Background We introduce the decision support system for Protein (Structure) Comparison, Knowledge, Similarity and Information (ProCKSI). ProCKSI integrates various protein similarity measures through an easy to use interface that allows the comparison of multiple proteins simultaneously. It employs the Universal Similarity Metric (USM), the Maximum Contact Map Overlap (MaxCMO) of protein structures and other external methods such as the DaliLite and the TM-align methods, the Combinatorial Extension (CE) of the optimal path, and the FAST Align and Search Tool (FAST). Additionally, ProCKSI allows the user to upload a user-defined similarity matrix supplementing the methods mentioned, and computes a similarity consensus in order to provide a rich, integrated, multicriteria view of large datasets of protein structures. Results We present ProCKSI's architecture and workflow describing its intuitive user interface, and show its potential on three distinct test-cases. In the first case, ProCKSI is used to evaluate the results of a previous CASP competition, assessing the similarity of proposed models for given targets where the structures could have a large deviation from one another. To perform this type of comparison reliably, we introduce a new consensus method. The second study deals with the verification of a classification scheme for protein kinases, originally derived by sequence comparison by Hanks and Hunter, but here we use a consensus similarity measure based on structures. In the third experiment using the Rost and Sander dataset (RS126), we investigate how a combination of different sets of similarity measures influences the quality and performance of ProCKSI's new consensus measure. ProCKSI performs well with all three datasets, showing its potential for complex, simultaneous multi-method assessment of structural similarity in large protein datasets. 
Furthermore, combining different similarity measures is usually more robust than relying on one single, unique measure. Conclusion Based on a diverse set of similarity measures, ProCKSI computes a consensus similarity profile for the entire protein set. All results can be clustered, visualised, analysed and easily compared with each other through a simple and intuitive interface. ProCKSI is publicly available for academic and non-commercial use.
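Combining several similarity measures into one consensus can be sketched as follows. The abstract does not specify how ProCKSI builds its consensus, so this example assumes a simple scheme: each similarity matrix is min-max normalised to [0, 1] and the normalised matrices are averaged element-wise. The function names and the toy matrices are hypothetical.

```python
def min_max_normalise(matrix):
    """Rescale all entries of a matrix (list of rows) to the range [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant matrix carries no ranking information; map it to zeros.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]


def consensus_matrix(matrices):
    """Element-wise mean of min-max normalised matrices of identical shape."""
    if not matrices:
        raise ValueError("at least one similarity matrix is required")
    normalised = [min_max_normalise(m) for m in matrices]
    n_rows, n_cols = len(normalised[0]), len(normalised[0][0])
    return [
        [sum(m[i][j] for m in normalised) / len(normalised) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Two toy pairwise matrices on different scales for the same two proteins.
measure_a = [[0, 2], [2, 0]]
measure_b = [[10, 30], [20, 10]]
print(consensus_matrix([measure_a, measure_b]))
```

Normalising first keeps a measure with a large numeric range from dominating the average; a real system would also need to reconcile similarity and distance conventions before combining them.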
|
49
|
Abstract
Ubiquitin‐specific proteases (USPs) emerge as key regulators of numerous cellular processes and account for the bulk of human deubiquitinating enzymes (DUBs). Their modular structure, mostly annotated by sequence homology, is believed to determine substrate recognition and subcellular localization. Currently, a large proportion of known human USP sequences are not annotated either structurally or functionally, including regions both within and flanking their catalytic cores. To extend the current understanding of human USPs, we applied consensus fold recognition to the unannotated content of the human USP family. The most interesting discovery was the marked presence of reliably predicted ubiquitin‐like (UBL) domains in this family of enzymes. The UBL domain thus appears to be the most frequently occurring domain in the human USP family, after the characteristic catalytic domain. The presence of multiple UBL domains per USP protein, as well as of UBL domains embedded in the USP catalytic core, add to the structural complexity currently recognized for many DUBs. Possible functional roles of the newly uncovered UBL domains of human USPs, including proteasome binding, and substrate and protein target specificities, are discussed. Proteins 2007. © 2007 Wiley‐Liss, Inc.
Affiliation(s)
- Xiao Zhu
- Biotechnology Research Institute, National Research Council of Canada, Montreal, Quebec H4P 2R2, Canada
- Department of Biochemistry, Université de Montréal, Montreal, Quebec H3C 3J7, Canada
- Robert Ménard
- Biotechnology Research Institute, National Research Council of Canada, Montreal, Quebec H4P 2R2, Canada
- Department of Biochemistry, Université de Montréal, Montreal, Quebec H3C 3J7, Canada
- Traian Sulea
- Biotechnology Research Institute, National Research Council of Canada, Montreal, Quebec H4P 2R2, Canada
|
50
|
McGuffin LJ. Benchmarking consensus model quality assessment for protein fold recognition. BMC Bioinformatics 2007; 8:345. [PMID: 17877795 PMCID: PMC2048972 DOI: 10.1186/1471-2105-8-345] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 06/14/2007] [Accepted: 09/18/2007] [Indexed: 11/25/2022] Open
Abstract
Background Selecting the highest quality 3D model of a protein structure from a number of alternatives remains an important challenge in the field of structural bioinformatics. Many Model Quality Assessment Programs (MQAPs) have been developed which adopt various strategies in order to tackle this problem, ranging from the so-called "true" MQAPs capable of producing a single energy score based on a single model, to methods which rely on structural comparisons of multiple models or additional information from meta-servers. However, it is clear that no current method can separate the highest accuracy models from the lowest consistently. In this paper, a number of the top performing MQAP methods are benchmarked in the context of the potential value that they add to protein fold recognition. Two novel methods are also described: ModSSEA, which is based on the alignment of predicted secondary structure elements, and ModFOLD, which combines several true MQAP methods using an artificial neural network. Results The ModSSEA method is found to be an effective model quality assessment program for ranking multiple models from many servers; however, further accuracy can be gained by using the consensus approach of ModFOLD. The ModFOLD method is shown to significantly outperform the true MQAPs tested and is competitive with methods which make use of clustering or additional information from multiple servers. Several of the true MQAPs are also shown to add value to most individual fold recognition servers by improving model selection, when applied as a post filter in order to re-rank models. Conclusion MQAPs should be benchmarked appropriately for the practical context in which they are intended to be used. Clustering based methods are the top performing MQAPs where many models are available from many servers; however, they often do not add value to individual fold recognition servers when limited models are available.
Conversely, the true MQAP methods tested can often be used as effective post filters for re-ranking few models from individual fold recognition servers and further improvements can be achieved using a consensus of these methods.
Affiliation(s)
- Liam J McGuffin
- The School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK.
|