1
Eirich P, Nesterov P, Shityakov S, Skorb EV, Sander B, Broscheit J, Dandekar T, Jones NG, Engstler M. The release of host-derived antibodies bound to the variant surface glycoprotein (VSG) of Trypanosoma brucei cannot be explained by pH-dependent conformational changes of the VSG dimer. OPEN RESEARCH EUROPE 2024; 4:87. [PMID: 38903703 PMCID: PMC11187536 DOI: 10.12688/openreseurope.16783.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/26/2024] [Indexed: 06/22/2024]
Abstract
Background: Trypanosoma brucei is a protozoan parasite that evades the mammalian host's adaptive immune response by antigenic variation of the highly immunogenic variant surface glycoprotein (VSG). VSGs form a dense surface coat that is constantly recycled through the endosomal system. Bound antibodies are separated from the VSG in the endosome and destroyed in the lysosome. It has been hypothesized that pH-dependent structural changes of the VSG could occur in the more acidic environment of the endosome and hence facilitate the separation of the antibody from the VSG.
Methods: We used size exclusion chromatography, in which molecules are separated according to their hydrodynamic radius, to see if the VSG is present as a homodimer at both pH values. To gain information about the structural integrity of the protein, we used circular dichroism spectroscopy, exposing the VSG in solution to a mixture of right- and left-circularly polarized light and analysing the absorbed UV spectra. Protein stability was evaluated and molecular dynamics simulations were performed at different pH values using different computational methods.
Results: We show, for an A2-type VSG, that the dimer size is only slightly larger at pH 5.2 than at pH 7.4. Moreover, the dimer was marginally more stable at lower pH due to the higher affinity (ΔG = 353.37 kcal/mol) between the monomers. Due to the larger size, the predicted epitopes were more exposed to the solvent at low pH. Moderate conformational changes (ΔRMSD = 0.35 nm) in VSG were detected between the dimers at pH 5.2 and pH 7.4 in molecular dynamics simulations, and no significant differences in the protein secondary structure were observed by circular dichroism spectroscopy.
Conclusions: Thus, the dissociation of anti-VSG antibodies in endosomes cannot be explained by changes in pH.
Affiliation(s)
- Patrick Eirich
- Department of Cell & Developmental Biology, Biocentre, University of Würzburg, Würzburg, Bavaria, 97074, Germany
- Department of Anaesthesiology, Intensive Care, Emergency and Pain Medicine, Würzburg University Hospital, University of Würzburg, Würzburg, Bavaria, 97080, Germany
- Pavel Nesterov
- Infochemistry Scientific Center, Laboratory of Chemoinformatics, ITMO University, Saint Petersburg, Saint Petersburg, 191002, Russian Federation
- Sergey Shityakov
- Department of Anaesthesiology, Intensive Care, Emergency and Pain Medicine, Würzburg University Hospital, University of Würzburg, Würzburg, Bavaria, 97080, Germany
- Infochemistry Scientific Center, Laboratory of Chemoinformatics, ITMO University, Saint Petersburg, Saint Petersburg, 191002, Russian Federation
- Department of Bioinformatics, Biocentre, University of Würzburg, Würzburg, Bavaria, 97074, Germany
- Ekaterina V. Skorb
- Infochemistry Scientific Center, Laboratory of Chemoinformatics, ITMO University, Saint Petersburg, Saint Petersburg, 191002, Russian Federation
- Bodo Sander
- Rudolf Virchow Center for Experimental Biomedicine, University of Würzburg, Würzburg, Bavaria, 97080, Germany
- Jens Broscheit
- Department of Anaesthesiology, Intensive Care, Emergency and Pain Medicine, Würzburg University Hospital, University of Würzburg, Würzburg, Bavaria, 97080, Germany
- Thomas Dandekar
- Department of Bioinformatics, Biocentre, University of Würzburg, Würzburg, Bavaria, 97074, Germany
- Nicola G. Jones
- Department of Cell & Developmental Biology, Biocentre, University of Würzburg, Würzburg, Bavaria, 97074, Germany
- Markus Engstler
- Department of Cell & Developmental Biology, Biocentre, University of Würzburg, Würzburg, Bavaria, 97074, Germany
2
Golestan Hashemi FS, Razi Ismail M, Rafii Yusop M, Golestan Hashemi MS, Nadimi Shahraki MH, Rastegari H, Miah G, Aslani F. Intelligent mining of large-scale bio-data: Bioinformatics applications. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1364977] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 01/05/2023]
Affiliation(s)
- Farahnaz Sadat Golestan Hashemi
- Plant Genetics, AgroBioChem Department, Gembloux Agro-Bio Tech, University of Liege, Liege, Belgium
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mohd Razi Ismail
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mohd Rafii Yusop
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Mahboobe Sadat Golestan Hashemi
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Mohammad Hossein Nadimi Shahraki
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Hamid Rastegari
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Gous Miah
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Farzad Aslani
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
3
Takaya D, Sato T, Yuki H, Sasaki S, Tanaka A, Yokoyama S, Honma T. Prediction of Ligand-Induced Structural Polymorphism of Receptor Interaction Sites Using Machine Learning. J Chem Inf Model 2013; 53:704-16. [DOI: 10.1021/ci300458g] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/11/2022]
Affiliation(s)
- Daisuke Takaya
- RIKEN Systems and Structural Biology Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Tomohiro Sato
- RIKEN Systems and Structural Biology Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Hitomi Yuki
- RIKEN Systems and Structural Biology Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Shunta Sasaki
- RIKEN Systems and Structural Biology Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Akiko Tanaka
- RIKEN Systems and Structural Biology Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Shigeyuki Yokoyama
- RIKEN Systems and Structural Biology Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- Department of Biophysics and Biochemistry, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
- Teruki Honma
- RIKEN Systems and Structural Biology Center, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
4
Eberini I, Daniele S, Parravicini C, Sensi C, Trincavelli ML, Martini C, Abbracchio MP. In silico identification of new ligands for GPR17: a promising therapeutic target for neurodegenerative diseases. J Comput Aided Mol Des 2011; 25:743-52. [PMID: 21744154 DOI: 10.1007/s10822-011-9455-8] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 01/31/2011] [Accepted: 06/28/2011] [Indexed: 01/14/2023]
Abstract
GPR17, a previously orphan receptor responding to both uracil nucleotides and cysteinyl-leukotrienes, has been proposed as a promising novel target for human neurodegenerative diseases. Here, in order to specifically identify novel potent ligands of GPR17, we first modeled the receptor in silico using a multiple-template approach, in which the extracellular loops of the receptor, which are quite complex to treat, were modeled with reference to the most similar parts of all the class-A GPCRs crystallized so far. A high-throughput virtual screening of the GPR17 binding site with more than 130,000 lead-like compounds was then applied, followed by wet-lab functional and pharmacological validation of the top-scoring chemical structures. This approach proved successful for the proposed aim and allowed us to identify five agonists or partial agonists with very diverse chemical structures. None of these compounds could have been expected 'a priori' to act on a GPCR, and all of them behaved as much more potent ligands than the endogenous activators of GPR17.
Affiliation(s)
- Ivano Eberini
- Gruppo di Studio per la Proteomica e la Struttura delle Proteine, Dipartimento di Scienze Farmacologiche, Università degli Studi di Milano, Italy.
5
Bagos PG, Tsaousis GN, Hamodrakas SJ. How many 3D structures do we need to train a predictor? GENOMICS PROTEOMICS & BIOINFORMATICS 2010; 7:128-37. [PMID: 19944385 PMCID: PMC5054404 DOI: 10.1016/s1672-0229(08)60041-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
It has been shown that the number of determined membrane protein structures grows exponentially, with approximately the same growth rate as that of the water-soluble proteins. To investigate the effect of this growth on the performance of prediction algorithms for both alpha-helical and beta-barrel membrane proteins, we conducted a prospective study based on historical records. We trained separate hidden Markov models with training sets of different sizes and evaluated their performance on topology prediction for the two classes of transmembrane proteins. We show that the existing top-scoring algorithms for predicting the transmembrane segments of alpha-helical membrane proteins perform slightly better than those for beta-barrel outer membrane proteins in all measures of accuracy. With the same rationale, a meta-analysis of the performance of the secondary structure prediction algorithms indicates that existing algorithmic techniques cannot be further improved by just adding more non-homologous sequences to the training sets. The upper limit for secondary structure prediction is estimated to be no more than 70% and 80% of correctly predicted residues for single-sequence-based methods and multiple-sequence-based ones, respectively. Therefore, we should concentrate our efforts on utilizing new techniques for the development of even better scoring predictors.
Affiliation(s)
- Pantelis G Bagos
- Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Athens 15701, Greece.
6
Fourty G, Callebaut I, Mornon JP. Characterization of non-trivial neighborhood fold constraints from protein sequences using generalized topohydrophobicity. Bioinform Biol Insights 2008; 2:47-66. [PMID: 19812765 PMCID: PMC2735972 DOI: 10.4137/bbi.s426] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/13/2022]
Abstract
Prediction of key features of protein structures, such as secondary structure, solvent accessibility and number of contacts between residues, provides useful structural constraints for comparative modeling, fold recognition, ab-initio fold prediction and detection of remote relationships. In this study, we aim at characterizing the number of non-trivial close neighbors, or long-range contacts of a residue, as a function of its “topohydrophobic” index deduced from multiple sequence alignments and of the secondary structure in which it is embedded. The “topohydrophobic” index is calculated using a two-class distribution of amino acids, based on their mean atom depths. From a large set of structural alignments processed from the FSSP database, we selected 1485 structural sub-families including at least 8 members, with accurate alignments and limited redundancy. We show that residues within helices, even when deeply buried, have few non-trivial neighbors (0–2), whereas β-strand residues clearly exhibit a multimodal behavior, dominated by the local geometry of the tetrahedron (3 non-trivial close neighbors associated with one tetrahedron; 6 with two tetrahedra). This observed behavior allows the distinction, from sequence profiles, between edge and central β-strands within β-sheets. Useful topological constraints on the immediate neighborhood of an amino acid, but also on its correlated solvent accessibility, can thus be derived using this approach, from the simple knowledge of multiple sequence alignments.
Affiliation(s)
- Guillaume Fourty
- Département de Biologie Structurale, Institut de Minéralogie et de Physique des Milieux Condensés, CNRS UMR 7590 - Universités Paris 6/Paris 7, France
7
Yao XQ, Zhu H, She ZS. A dynamic Bayesian network approach to protein secondary structure prediction. BMC Bioinformatics 2008; 9:49. [PMID: 18218144 PMCID: PMC2266706 DOI: 10.1186/1471-2105-9-49] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Received: 09/05/2007] [Accepted: 01/25/2008] [Indexed: 11/19/2022]
Abstract
Background: Protein secondary structure prediction methods based on probabilistic models such as hidden Markov models (HMMs) appeal to many because they provide meaningful information relevant to the sequence-structure relationship. However, at present, the prediction accuracy of pure HMM-type methods is much lower than that of machine learning-based methods such as neural networks (NN) or support vector machines (SVM).
Results: In this paper, we report a new probabilistic method for protein secondary structure prediction, based on dynamic Bayesian networks (DBN). The new method models the PSI-BLAST profile of a protein sequence using a multivariate Gaussian distribution, and simultaneously takes into account the dependency between the profile and secondary structure and the dependency between profiles of neighboring residues. In addition, a segment length distribution is introduced for each secondary structure state. Tests show that the DBN method significantly improves accuracy compared to other pure HMM-type methods. Further improvement is achieved by combining the DBN with an NN, a method called DBNN, which shows better Q3 accuracy than many popular methods and is competitive with the current state of the art. The most interesting feature of DBN/DBNN is that a significant improvement in the prediction accuracy is achieved when it is combined with other methods by a simple consensus.
Conclusion: The DBN method, using a Gaussian distribution for the PSI-BLAST profile and a high-order dependency between profiles of neighboring residues, produces significantly better prediction accuracy than other HMM-type probabilistic methods. Owing to their different nature, the DBN and NN combine to form a more accurate method, DBNN. Future improvement may be achieved by combining DBNN with an SVM-type method.
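The two evaluation ideas named in this abstract, Q3 accuracy and combining methods "by a simple consensus", can be illustrated with a minimal Python sketch. It assumes three-state strings (H, E, C) of equal length, and it assumes that the consensus is a per-residue majority vote with ties going to the earliest listed method; the abstract does not specify the voting rule, and the function names `q3` and `consensus` are hypothetical.

```python
from collections import Counter


def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state (H, E or C) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


def consensus(predictions: list[str]) -> str:
    """Per-residue majority vote over several predictions.

    Assumption: ties are resolved in favour of the earliest method in the list.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    result = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Pick the first state, in method order, that reaches the top count.
        result.append(next(state for state in column if counts[state] == best))
    return "".join(result)
```

For example, `consensus(["HHEC", "HEEC", "CHEC"])` votes column by column and can then be scored against a reference assignment with `q3`.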
Affiliation(s)
- Xin-Qiu Yao
- State Key Laboratory for Turbulence and Complex Systems and Department of Biomedical Engineering, Peking University, Beijing 100871, China.
8
Selent J, Kaleta J, Li Z, Lalmanach G, Brömme D. Selective inhibition of the collagenase activity of cathepsin K. J Biol Chem 2007; 282:16492-501. [PMID: 17426030 DOI: 10.1074/jbc.m700242200] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
Abstract
Cathepsin K, the main bone degrading protease, and chondroitin 4-sulfate (C4-S) form a complex with enhanced collagenase activity. In this report, we demonstrate the specific inhibition of the collagenase activity of cathepsin K by negatively charged polymers without affecting the overall proteolytic activity of the protease. Three different mechanisms to interfere with cathepsin-catalyzed collagen degradation are discussed: 1) inhibition of the formation of the cathepsin K/C4-S complex, 2) inhibition of the attachment of C4-S to collagen, and 3) masking of the collagenase cleavage sites in collagen. By targeting these interaction sites, collagen degradation can be modulated while the non-collagenolytic activities of cathepsin K remain intact. The main inhibitory effect on collagen degradation is due to the impeding effect on the active cathepsin K/C4-S complex. Essential structural elements in the inhibitor molecules are negative charges which compete with the sulfate groups of C4-S in the cathepsin K/C4-S complex. The inhibitory effect can be controlled by length and charge of the polymers. Longer negatively charged polymers (e.g. polyglutamates, oligonucleotides) tend to inhibit all three mechanisms, whereas shorter ones preferentially affect the cathepsin K/C4-S complex.
Affiliation(s)
- Jana Selent
- Department of Oral Biological and Medical Sciences, Faculty of Dentistry and the Center for Blood Research, University of British Columbia, Vancouver, British Columbia V6T 1Z3, Canada
9
Etchebest C, Benros C, Hazout S, de Brevern AG. A structural alphabet for local protein structures: improved prediction methods. Proteins 2006; 59:810-27. [PMID: 15822101 DOI: 10.1002/prot.20458] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 11/10/2022]
Abstract
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
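The abstract names the entropy index Neq without giving its formula. A minimal Python sketch follows, assuming the definition commonly used for an "equivalent number of states", the exponential of the Shannon entropy of the 16 predicted block probabilities at one sequence position. The function name `neq` and the input layout are illustrative assumptions, not taken from the article.

```python
import math


def neq(probabilities: list[float]) -> float:
    """Equivalent number of states at one position.

    Assumed definition: exp(-sum(p * ln p)) over the predicted probabilities
    of the Protein Blocks. Zero-probability blocks contribute nothing.
    """
    if not probabilities:
        raise ValueError("at least one probability is required")
    total = sum(probabilities)
    if not math.isclose(total, 1.0, rel_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return math.exp(entropy)
```

Under this definition the index ranges from 1, when a single block carries all the probability, to 16, when all 16 blocks are equally likely, so lower values indicate a more confident local prediction.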
Affiliation(s)
- Catherine Etchebest
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris, France
10
Moreau VH, Valente AP, Almeida FC. Prediction of the amount of secondary structure of proteins using unassigned NMR spectra: a tool for target selection in structural proteomics. Genet Mol Biol 2006. [DOI: 10.1590/s1415-47572006000400030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/22/2022]
Affiliation(s)
- Vitor Hugo Moreau
- Faculdade de Tecnologia e Ciências, Brazil; Universidade Federal do Rio de Janeiro, Brazil
11
Karchin R, Cline M, Mandel-Gutfreund Y, Karplus K. Hidden Markov models that use predicted local structure for fold recognition: alphabets of backbone geometry. Proteins 2003; 51:504-14. [PMID: 12784210 DOI: 10.1002/prot.10369] [Citation(s) in RCA: 154] [Impact Index Per Article: 7.3] [Indexed: 11/08/2022]
Abstract
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs that may have significant structural similarity are often not detectable by sequence similarities alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile hidden Markov models (HMMs). We did not rely on a simple helix-strand-coil definition of secondary structure, but experimented with a variety of local structure descriptions, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality. On a test set of 1298 nonhomologous proteins, HMMs incorporating a 3-letter STRIDE alphabet improved fold recognition accuracy by 15% over amino-acid-only HMMs and 23% over PSI-BLAST, measured by ROC-65 numbers. We compared two-track HMMs to amino-acid-only HMMs on a difficult alignment test set of 200 protein pairs (structurally similar with 3-24% sequence identity). HMMs with a 6-letter STRIDE secondary track improved alignment quality by 62%, relative to DALI structural alignments, while HMMs with an STR track (an expanded DSSP alphabet that subdivides strands into six states) improved by 40% relative to CE.
Affiliation(s)
- Rachel Karchin
- Center for Biomolecular Science and Engineering, Baskin School of Engineering, University of California, Santa Cruz 95064, USA.
12
Pollock DD. Genomic biodiversity, phylogenetics and coevolution in proteins. APPLIED BIOINFORMATICS 2002; 1:81-92. [PMID: 15130847 PMCID: PMC2943949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/29/2023]
Abstract
Comprehensive sampling of genomic biodiversity is fast becoming a reality for some genomic regions and complete organelle genomes. Genomic biodiversity is defined as large genomic sequences from many species, and here some recent work is reviewed that demonstrates the potential benefits of genomic biodiversity for molecular evolutionary analysis and phylogenetic reconstruction. This work shows that using likelihood-based approaches, taxon addition can dramatically improve phylogenetic reconstruction. Features or dynamics of the evolutionary process are much more easily inferred with large numbers of taxa, and large numbers are essential for discriminating differences in evolutionary patterns between sites. Accurate prediction of site-specific patterns can improve phylogenetic reconstruction by an amount equivalent to quadrupling sequence length. Genomic biodiversity is particularly central to research relating patterns of evolution, adaptation and coevolution to structural and functional features of proteins. Research on detecting coevolution between amino acid residues in proteins demonstrates a clear need for much greater numbers of closely related taxa to better discriminate site-specific patterns of interaction, and to allow more detailed analysis of coevolutionary interactions between subunits in protein complexes. It is argued that parsing out coevolutionary and other context-dependent substitution probabilities is essential for discriminating between coevolution and adaptation, and for more realistically modelling the evolution of proteins. Also reviewed is research that argues for increasing the efficiency of acquiring genomic biodiversity, and suggests that this might be done by simultaneously shotgun cloning and sequencing genomic mixtures from many species. Increased efficiency is a prerequisite if genomic biodiversity levels are to rapidly increase by orders of magnitude, and thus lead to dramatically improved understanding of interactions between protein structure, function and sequence evolution.
Affiliation(s)
- David D Pollock
- Department of Biological Sciences and Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, 70803, USA.
13
Pollock DD, Eisen JA, Doggett NA, Cummings MP. A case for evolutionary genomics and the comprehensive examination of sequence biodiversity. Mol Biol Evol 2000; 17:1776-88. [PMID: 11110893 DOI: 10.1093/oxfordjournals.molbev.a026278] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combination of DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence. In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.
Affiliation(s)
- D D Pollock
- Theoretical Biology and Biophysics, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.
14
Abstract
By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive C(alpha) ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (alpha/beta protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.
Affiliation(s)
- A G de Brevern
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436, Université Paris 7, Paris, France.
15
Zhang CT, Zhang R. A graphic approach to evaluate algorithms of secondary structure prediction. J Biomol Struct Dyn 2000; 17:829-42. [PMID: 10798528 DOI: 10.1080/07391102.2000.10506572] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
Abstract
Algorithms for secondary structure prediction have been under development for nearly 30 years. However, the problem of how to appropriately evaluate and compare algorithms has not yet been completely solved. A graphic method to evaluate algorithms for secondary structure prediction is proposed here. Traditionally, the performance of an algorithm is evaluated by a number, i.e., accuracy under various definitions. Instead of a number, we use a graph to completely evaluate an algorithm, in which the mapping points are distributed in a three-dimensional space. Each point represents the predictive result of the secondary structure of a protein. Because the distribution of mapping points in the 3D space generally contains more information than a number or a set of numbers, it is expected that algorithms may be evaluated and compared more objectively by the proposed graphic method. Based on the point distribution, six evaluation parameters are proposed, which describe the overall performance of the algorithm evaluated. Furthermore, the graphic method is simple and intuitive. As an example of application, two advanced algorithms, i.e., the PHD and NNpredict methods, are evaluated and compared. It is shown that there is still much room for further improvement for both algorithms. It is pointed out that the accuracy for predicting either the alpha-helix or beta-strand in proteins with higher alpha-helix or beta-strand content, respectively, should be greatly improved for both algorithms.
Affiliation(s)
- C T Zhang
- Department of Physics, Tianjin University, China.
16
Abstract
A novel method was introduced to predict protein subcellular locations from sequences. Using sequence data, this method achieved a prediction accuracy higher than previous methods based on the amino acid composition. For three subcellular locations in a prokaryotic organism, the overall prediction accuracy reached 89.1%. For eukaryotic proteins, prediction accuracies of 73.0% and 78.7% were attained within four and three location categories, respectively. These results demonstrate the applicability of this relatively simple method and the possible improvement of prediction for protein subcellular location.
Affiliation(s)
- Z Yuan
- National Laboratory of Biomacromolecules, Institute of Biophysics, Academia Sinica, Beijing, China.
17
Abstract
Genome sequencing projects continue to provide a flood of new protein sequences, and prediction methods remain an important means of adding structural information. Recently, there have been advances in secondary structure prediction, which feed, in turn, into improved fold recognition algorithms. Finally, there have been technical improvements in comparative modelling, and studies of the expected accuracy of three-dimensional structural models built by this method.
Affiliation(s)
- D R Westhead
- The European Bioinformatics Institute EMBL Outstation Wellcome Trust Genome Campus Hinxton, Cambridge, CB10 1SD, UK.