101
|
Williams SG, Lovell SC. The effect of sequence evolution on protein structural divergence. Mol Biol Evol 2009; 26:1055-65. [PMID: 19193735 DOI: 10.1093/molbev/msp020] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.7] [Indexed: 12/14/2022]
Abstract
The complex constraints imposed by protein structure and function result in varied rates of sequence and structural divergence in proteins. Analysis of sequence differences between homologous proteins can advance our understanding of structural divergence and some of the constraints that govern the evolution of these molecules. Here, we assess the relationship between amino acid sequence and structural divergence. Firstly, we demonstrate that the relationship between protein sequence and structural divergence is governed by a variety of evolutionary constraints, including solvent exposure and secondary structure. Secondly, although compensatory substitutions are widespread, we find many radical size-changing mutations that are not compensated by neighboring complementary changes. Instead, these noncompensated substitutions are mitigated by alteration of protein structure. These results suggest a combined mechanism of accommodating substitutions in proteins, involving both coevolution and structural accommodation. Such a mechanism can explain previously observed correlated substitutions of residues that are distant both in sequence and structure, allowing an integrated view of sequence and structural divergence of proteins.
Affiliation(s)
- Simon G Williams
- Faculty of Life Sciences, University of Manchester, Manchester, UK
|
102
|
Carlson JM, Brumme ZL, Rousseau CM, Brumme CJ, Matthews P, Kadie C, Mullins JI, Walker BD, Harrigan PR, Goulder PJR, Heckerman D. Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag. PLoS Comput Biol 2008; 4:e1000225. [PMID: 19023406 PMCID: PMC2579584 DOI: 10.1371/journal.pcbi.1000225] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Received: 04/29/2008] [Accepted: 10/09/2008] [Indexed: 11/18/2022]
Abstract
HIV avoids elimination by cytotoxic T-lymphocytes (CTLs) through the evolution of escape mutations. Although there is mounting evidence that these escape pathways are broadly consistent among individuals with similar human leukocyte antigen (HLA) class I alleles, previous population-based studies have been limited by the inability to simultaneously account for HIV codon covariation, linkage disequilibrium among HLA alleles, and the confounding effects of HIV phylogeny when attempting to identify HLA-associated viral evolution. We have developed a statistical model of evolution, called a phylogenetic dependency network, that accounts for these three sources of confounding and identifies the primary sources of selection pressure acting on each HIV codon. Using synthetic data, we demonstrate the utility of this approach for identifying sites of HLA-mediated selection pressure and codon evolution as well as the deleterious effects of failing to account for all three sources of confounding. We then apply our approach to a large, clinically-derived dataset of Gag p17 and p24 sequences from a multicenter cohort of 1144 HIV-infected individuals from British Columbia, Canada (predominantly HIV-1 clade B) and Durban, South Africa (predominantly HIV-1 clade C). The resulting phylogenetic dependency network is dense, containing 149 associations between HLA alleles and HIV codons and 1386 associations among HIV codons. These associations include the complete reconstruction of several recently defined escape and compensatory mutation pathways and agree with emerging data on patterns of epitope targeting. The phylogenetic dependency network adds to the growing body of literature suggesting that sites of escape, order of escape, and compensatory mutations are largely consistent even across different clades, although we also identify several differences between clades. 
As recent case studies have demonstrated, understanding both the complexity and the consistency of immune escape has important implications for CTL-based vaccine design. Phylogenetic dependency networks represent a major step toward systematically expanding our understanding of CTL escape to diverse populations and whole viral genes.
Affiliation(s)
- Jonathan M. Carlson
  - eScience Group, Microsoft Research, Redmond, Washington, United States of America
  - Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- Zabrina L. Brumme
  - Partners AIDS Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Christine M. Rousseau
  - Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Chanson J. Brumme
  - Partners AIDS Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
- Philippa Matthews
  - Department of Paediatrics, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- Carl Kadie
  - eScience Group, Microsoft Research, Redmond, Washington, United States of America
- James I. Mullins
  - Department of Microbiology, University of Washington, Seattle, Washington, United States of America
  - Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Bruce D. Walker
  - Partners AIDS Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
- P. Richard Harrigan
  - B.C. Centre for Excellence in HIV/AIDS, Vancouver, British Columbia, Canada
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Philip J. R. Goulder
  - Partners AIDS Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
  - Department of Paediatrics, Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
  - HIV Pathogenesis Programme, The Doris Duke Medical Research Institute, University of KwaZulu-Natal, Durban, South Africa
- David Heckerman
  - eScience Group, Microsoft Research, Redmond, Washington, United States of America
|
103
|
Ahn C, Seillier-Moiseiwitsch F, Koch GG. Predictive tests for linked changes. Stat Med 2008; 27:4790-804. [PMID: 18186528 DOI: 10.1002/sim.3164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Mutations may confer a survival advantage on an organism, but they can also reduce its fitness. In particular, we are interested in identifying correlated changes in genomic sequences. We consider the general situation where the observed characters at two genomic positions are summarized by an r × c contingency table. The test statistic focuses on double departures from the consensus configuration. When the original data are aggregated into two possible categories at each position (consensus vs non-consensus character), we obtain a 2 × 2 table from which to derive a test statistic that deals with the total number of double changes. Expected values and variances are predicted, under the assumption of independence, from table entries corresponding to single-mutation events. In some situations, the resulting tests are more powerful than those previously proposed.
Affiliation(s)
- C Ahn
- Rho, Inc., Chapel Hill, NC 27517, USA
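The 2 × 2 aggregation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' test statistic: it only builds the consensus vs non-consensus table for two alignment columns and computes the number of double changes that would be expected if the odds ratio of the table were 1, which is one simple way to predict double changes from the single-change cells. The function names and the toy columns are hypothetical.

```python
from collections import Counter


def consensus(column):
    """Return the most frequent character in one alignment column."""
    return Counter(column).most_common(1)[0][0]


def two_by_two(col_a, col_b):
    """Cross-classify sequences as consensus / non-consensus at two positions."""
    cons_a, cons_b = consensus(col_a), consensus(col_b)
    table = {"none": 0, "a_only": 0, "b_only": 0, "both": 0}
    for x, y in zip(col_a, col_b):
        changed_a, changed_b = x != cons_a, y != cons_b
        if changed_a and changed_b:
            table["both"] += 1
        elif changed_a:
            table["a_only"] += 1
        elif changed_b:
            table["b_only"] += 1
        else:
            table["none"] += 1
    return table


def expected_double_changes(table):
    """Double changes expected when the odds ratio of the 2 x 2 table is 1.

    Assumption for illustration only: independence implies
    both * none = a_only * b_only, so both = a_only * b_only / none.
    """
    if table["none"] == 0:
        raise ValueError("no sequence carries the consensus at both positions")
    return table["a_only"] * table["b_only"] / table["none"]


# Toy columns (made up): ten sequences, two positions.
col_a = "AAAAAAATTT"
col_b = "CCCCCCGCGG"
table = two_by_two(col_a, col_b)
expected = expected_double_changes(table)
print(table, expected)
```

In this toy example two double changes are observed while far fewer are expected from the single-change cells, which is the kind of excess a linked-change test is designed to detect. A real test would also need the variance, which the abstract mentions but does not specify.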
|
104
|
Analysis of natural sequence variation and covariation in human immunodeficiency virus type 1 integrase. J Virol 2008; 82:9228-35. [PMID: 18596095 DOI: 10.1128/jvi.01535-07] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 11/20/2022]
Abstract
Human immunodeficiency virus type 1 (HIV-1) integrase inhibitors are in clinical trials, and raltegravir and elvitegravir are likely to be the first licensed drugs of this novel class of HIV antivirals. Understanding resistance to these inhibitors is important to maximize their efficacy. It has been shown that natural variation and covariation provide valuable insights into the development of resistance for established HIV inhibitors. Therefore, we have undertaken a study to fully characterize natural polymorphisms and amino acid covariation within an inhibitor-naïve sequence set spanning all defined HIV-1 subtypes. Inter- and intrasubtype variation was greatest in a 50-amino-acid segment of HIV-1 integrase incorporating the catalytic aspartic acid codon 116, suggesting that polymorphisms affect inhibitor binding and pathways to resistance. The critical mutations that determine the resistance pathways to raltegravir and elvitegravir (N155H, Q148K/R/H, and E92Q) were either rare or absent from the 1,165-sequence data set. However, 25 out of 41 mutations associated with integrase inhibitor resistance were present. These mutations were not subtype associated and were more prevalent in the subtypes that had been sampled frequently within the database. A novel modification of the Jaccard index was used to analyze amino acid covariation within HIV-1 integrase. A network of 10 covarying resistance-associated mutations was elucidated, along with a further 15 previously undescribed mutations that covaried with at least two of the resistance positions. The validation of covariation as a predictive tool will be dependent on monitoring the evolution of HIV-1 integrase under drug selection pressure.
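The abstract above does not describe its modification of the Jaccard index, so the sketch below shows only the standard, unmodified index applied to covariation: the number of sequences carrying both mutations divided by the number carrying at least one. The helper names and the three-residue toy sequences are hypothetical.

```python
def carriers(sequences, position, residue):
    """Indices of the sequences that carry `residue` at 0-based `position`."""
    return {i for i, seq in enumerate(sequences) if seq[position] == residue}


def jaccard_index(carriers_x, carriers_y):
    """Standard Jaccard index: |X and Y| / |X or Y| (0.0 if both sets are empty)."""
    a, b = set(carriers_x), set(carriers_y)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Toy aligned sequences (made up, three residues each).
sequences = ["QEN", "QQH", "KQH", "QEH", "KQN"]
with_q = carriers(sequences, 1, "Q")   # sequences 1, 2 and 4
with_h = carriers(sequences, 2, "H")   # sequences 1, 2 and 3
score = jaccard_index(with_q, with_h)
print(score)
```

Unlike a correlation coefficient, the index ignores sequences that carry neither mutation, which matters when both mutations are rare in the data set.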
|
105
|
Miller CS, Eisenberg D. Using inferred residue contacts to distinguish between correct and incorrect protein models. Bioinformatics 2008; 24:1575-82. [PMID: 18511466 PMCID: PMC2638260 DOI: 10.1093/bioinformatics/btn248] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/17/2022]
Abstract
Motivation: The de novo prediction of 3D protein structure is enjoying a period of dramatic improvements. Often, a remaining difficulty is to select the model closest to the true structure from a group of low-energy candidates. To what extent can inter-residue contact predictions from multiple sequence alignments, information which is orthogonal to that used in most structure prediction algorithms, be used to identify those models most similar to the native protein structure? Results: We present a Bayesian inference procedure to identify residue pairs that are spatially proximal in a protein structure. The method takes as input a multiple sequence alignment, and outputs an accurate posterior probability of proximity for each residue pair. We exploit a recent metagenomic sequencing project to create large, diverse and informative multiple sequence alignments for a test set of 1656 known protein structures. The method infers spatially proximal residue pairs in this test set with good accuracy: top-ranked predictions achieve an average accuracy of 38% (for an average 21-fold improvement over random predictions) in cross-validation tests. Notably, the accuracy of predicted 3D models generated by a range of structure prediction algorithms strongly correlates with how well the models satisfy probable residue contacts inferred via our method. This correlation allows for confident rejection of incorrect structural models. Availability: An implementation of the method is freely available at http://www.doe-mbi.ucla.edu/services Contact: david@mbi.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher S Miller
- UCLA-DOE Institute for Genomics & Proteomics, Molecular Biology Institute, Box 951570, UCLA, Los Angeles, CA 90095, USA
|
106
|
Codoñer FM, O'Dea S, Fares MA. Reducing the false positive rate in the non-parametric analysis of molecular coevolution. BMC Evol Biol 2008; 8:106. [PMID: 18402697 PMCID: PMC2362121 DOI: 10.1186/1471-2148-8-106] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 09/05/2007] [Accepted: 04/10/2008] [Indexed: 11/14/2022]
Abstract
Background: The strength of selective constraints operating on amino acid sites of proteins has a multifactorial nature. In fact, amino acid sites within proteins coevolve due to their functional and/or structural relationships. Different methods have been developed that attempt to account for the evolutionary dependencies between amino acid sites. Researchers have invested a significant effort to increase the sensitivity of such methods. However, the difficulty in disentangling functional co-dependencies from historical covariation has fuelled the scepticism over their power to detect biologically meaningful results. In addition, the biological parameters connecting linear sequence evolution to structure evolution remain elusive. For these reasons, most of the evolutionary studies aimed at identifying functional dependencies among protein domains have focused on the structural properties of proteins rather than on the information extracted from linear multiple sequence alignments (MSA). Non-parametric methods to detect coevolution have been reported to be especially susceptible to producing false positive results based on the properties of MSAs. However, no formal statistical analysis has been performed to definitively test the differential effects of these properties on the sensitivity of such methods. Results: Here we test the effect that variations in the MSA properties have on the sensitivity of non-parametric methods to detect coevolution. We test the effect that the size of the MSA (number of sequences), mean pairwise amino acid distance per site and the strength of the coevolution signal have on the ability of non-parametric methods to detect coevolution. Our results indicate that all three factors have significant effects on the accuracy of non-parametric methods. Further, introducing statistical filters improves the sensitivity and increases the statistical power of the methods to detect functional coevolution.
Statistical analysis of the physico-chemical properties of amino acid sites in the context of the protein structure reveals striking dependencies among amino acid sites. Results indicate a covariation trend in the hydrophobicities and molecular weight characteristics of amino acid sites when analysing a non-redundant set of 8000 protein structures. Using this biological information as a filter in coevolutionary analyses minimises the false positive rate of these methods. Application of these filters to three different proteins with known functional domains supports the importance of using biological filters to detect coevolution. Conclusion: Coevolutionary analyses using non-parametric methods have proved difficult and highly prone to spurious results, depending on the properties of MSAs and on the strength of coevolution between amino acid sites. The application of statistical filters to the number of pairs detected as coevolving significantly reduces the number of artifactual results. Analysis of the physico-chemical properties of amino acid sites in the protein structure context reveals their structure-dependent covariation. The application of this known biological information to the analysis of covariation greatly enhances the functional coevolutionary signal and removes historical covariation. Simultaneous use of statistical and biological data is instrumental in the detection of functional amino acid site dependencies and compensatory changes at the protein level.
Affiliation(s)
- Francisco M Codoñer
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland.
|
107
|
Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of residue coupling in protein families. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2008; 5:183-197. [PMID: 18451428 DOI: 10.1109/tcbb.2007.70225] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 05/26/2023]
Abstract
Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling. These models capture significant conservation and coupling constraints observable in a multiply-aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studies of both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that graphical models of residue coupling provide a powerful tool for uncovering, representing, and utilizing significant sequence-structure-function relationships in protein families.
Affiliation(s)
- John Thomas
- Department of Computer Science, Dartmouth College, Sudikoff Laboratory, Hanover, NH 03755, USA.
|
108
|
Codoñer FM, Fares MA. Why should we care about molecular coevolution? Evol Bioinform Online 2008; 4:29-38. [PMID: 19204805 PMCID: PMC2614197] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Abstract
Non-independent evolution of amino acid sites has become a noticeable limitation of most methods aimed at identifying selective constraints at functionally important amino acid sites or protein regions. The need for a generalised framework to account for non-independence of amino acid sites has fuelled the design and development of new mathematical models and computational tools centred on resolving this problem. Molecular coevolution is one of the most active areas of research, with an increasing rate of new models and methods being developed every day. Both parametric and non-parametric methods have been developed to account for correlated variability of amino acid sites. These methods have been utilised for detecting phylogenetic, functional and structural coevolution as well as for identifying surfaces of amino acid sites involved in protein-protein interactions. Here we discuss and briefly describe these methods, and identify their advantages and limitations.
Affiliation(s)
- Francisco M. Codoñer
  - Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College
  - Institute of Immunology, Biology Department, National University of Ireland Maynooth
- Mario A. Fares
  - Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College
|
110
|
del Pozo A, Pazos F, Valencia A. Defining functional distances over gene ontology. BMC Bioinformatics 2008; 9:50. [PMID: 18221506 PMCID: PMC2375122 DOI: 10.1186/1471-2105-9-50] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Received: 07/11/2007] [Accepted: 01/25/2008] [Indexed: 11/10/2022]
Abstract
Background: A fundamental problem when trying to define the functional relationships between proteins is the difficulty in quantifying functional similarities, even when well-structured ontologies exist regarding the activity of proteins (i.e. the 'gene ontology', GO). However, functional metrics can overcome the problems in comparing and evaluating functional assignments and predictions. As a reference of proximity, previous approaches to compare GO terms considered linkage in terms of ontology weighted by a probability distribution that balances the non-uniform 'richness' of different parts of the Directed Acyclic Graph. Here, we have followed a different approach to quantify functional similarities between GO terms. Results: We propose a new method to derive 'functional distances' between GO terms that is based on the simultaneous occurrence of terms in the same set of Interpro entries, instead of relying on the structure of the GO. The coincidence of GO terms reveals natural biological links between the GO functions and defines a distance model Df which fulfils the properties of a Metric Space. The distances obtained in this way can be represented as a hierarchical 'Functional Tree'. Conclusion: The method proposed provides a new definition of distance that enables the similarity between GO terms to be quantified. Additionally, the 'Functional Tree' defines groups with biological meaning, enhancing its utility for protein function comparison and prediction. Finally, this approach could be used for function-based protein searches in databases, and for analysing the gene clusters produced by DNA array experiments.
Affiliation(s)
- Angela del Pozo
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Melchor Fernandez Almagro, 3, E-28029 Madrid, Spain.
|
111
|
The average mutual information profile as a genomic signature. BMC Bioinformatics 2008; 9:48. [PMID: 18218139 PMCID: PMC2335307 DOI: 10.1186/1471-2105-9-48] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Received: 07/04/2007] [Accepted: 01/25/2008] [Indexed: 12/19/2022]
Abstract
Background: Occult organizational structures in DNA sequences may hold the key to understanding functional and evolutionary aspects of the DNA molecule. Such structures can also provide the means for identifying and discriminating organisms using genomic data. Species-specific genomic signatures are useful in a variety of contexts such as evolutionary analysis, assembly and classification of genomic sequences from large uncultivated microbial communities, and rapid identification in health hazard situations. Results: We have analyzed genomic sequences of eukaryotic and prokaryotic chromosomes as well as various subtypes of viruses using an information theoretic framework. We confirm the existence of a species-specific average mutual information (AMI) profile. We use these profiles to define a very simple, computationally efficient, alignment-free distance measure that reflects the evolutionary relationships between genomic sequences. We use this distance measure to classify chromosomes according to species of origin, to separate and cluster subtypes of the HIV-1 virus, and to classify DNA fragments to species of origin. Conclusion: AMI profiles of DNA sequences prove to be species specific and easy to compute. The structure of AMI profiles is conserved, even in short subsequences of a species' genome, rendering a pervasive signature. This signature can be used to classify relatively short DNA fragments to species of origin.
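As a rough illustration of what an average mutual information profile is, the sketch below computes, for each lag k, the mutual information between the nucleotide at position i and the nucleotide at position i + k, and collects the values for k = 1 to a chosen maximum lag. This is an assumption-laden sketch rather than the paper's implementation: the function names are hypothetical, probabilities are estimated from raw pair counts within one sequence, and the distance measure built on the profiles is not reproduced because the abstract does not define it.

```python
from collections import Counter
from math import log2


def mutual_information_at_lag(seq, k):
    """Mutual information (in bits) between seq[i] and seq[i + k]."""
    pairs = [(seq[i], seq[i + k]) for i in range(len(seq) - k)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)    # marginal of the first symbol
    right = Counter(y for _, y in pairs)   # marginal of the second symbol
    # sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * log2(c * n / (left[x] * right[y]))
        for (x, y), c in joint.items()
    )


def ami_profile(seq, max_lag):
    """Mutual information values for lags 1..max_lag, as a list."""
    return [mutual_information_at_lag(seq, k) for k in range(1, max_lag + 1)]


# Toy sequences (made up): a strictly periodic one and a constant one.
periodic_profile = ami_profile("ACGT" * 50, 8)
constant_profile = ami_profile("A" * 20, 5)
print(periodic_profile)
```

For the strictly periodic toy sequence the symbol at lag 4 is fully determined by the current symbol, so the value at that lag equals the 2 bits of entropy of a uniform four-letter alphabet; for the constant sequence every lag carries zero information.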
|
113
|
Dunn S, Wahl L, Gloor G. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 2007; 24:333-40. [DOI: 10.1093/bioinformatics/btm604] [Citation(s) in RCA: 363] [Impact Index Per Article: 21.4] [Indexed: 11/13/2022]
|
114
|
Subtype-specific conformational differences within the V3 region of subtype B and subtype C human immunodeficiency virus type 1 Env proteins. J Virol 2007; 82:903-16. [PMID: 18003735 DOI: 10.1128/jvi.01444-07] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022]
Abstract
The V3 region of the human immunodeficiency virus type 1 gp120 Env protein is a key domain in Env due to its role in interacting with the coreceptors CCR5 and CXCR4. We examined potential subtype-specific V3 region differences by comparing patterns of amino acid variability and probing for subtype-specific structures using 11 anti-V3 monoclonal antibodies (V3 MAbs). Differences between the subtypes in patterns of variability were most evident in the stem and turn regions of V3 (positions 9 to 24), with the two subtypes being very similar in the base region. The characteristics of the binding of V3 MAbs to Env proteins of the subtype B virus JR-FL and the subtype C virus BR025 suggested three patterns, as each group of MAbs recognized a specific conformation- or sequence-based epitope. Viruses pseudotyped with Env from JR-FL and BR025 were resistant to neutralization by the V3 MAbs, although the replacement of the Env V3 region of the SF162 virus with the JR-FL V3 created a pseudotyped virus that was hypersensitive to neutralization. A single mutation in V3 (H13R) made this chimeric Env selectively resistant to one group of V3 MAbs, consistent with the mAb binding properties. We hypothesize that there are intrinsic differences in V3 conformation between subtype B and subtype C that are localized to the stem and turn regions and that these differences have two important biological consequences: first, subtype B and subtype C V3 regions can have subtype-specific epitopes that will inherently limit antibody cross-reactivity, and second, V3 conformational differences may potentiate the frequent evolution of R5- into X4-tropic variants of subtype B but limit subtype C virus from using the same mechanism to evolve X4-tropic variants as efficiently.
|
115
|
Poon AFY, Lewis FI, Pond SLK, Frost SDW. An evolutionary-network model reveals stratified interactions in the V3 loop of the HIV-1 envelope. PLoS Comput Biol 2007; 3:e231. [PMID: 18039027 PMCID: PMC2082504 DOI: 10.1371/journal.pcbi.0030231] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.9] [Received: 05/25/2007] [Accepted: 10/11/2007] [Indexed: 12/28/2022]
Abstract
The third variable loop (V3) of the human immunodeficiency virus type 1 (HIV-1) envelope is a principal determinant of antibody neutralization and progression to AIDS. Although it is undoubtedly an important target for vaccine research, extensive genetic variation in V3 remains an obstacle to the development of an effective vaccine. Comparative methods that exploit the abundance of sequence data can detect interactions between residues of rapidly evolving proteins such as the HIV-1 envelope, revealing biological constraints on their variability. However, previous studies have relied implicitly on two biologically unrealistic assumptions: (1) that founder effects in the evolutionary history of the sequences can be ignored; and (2) that statistical associations between residues occur exclusively in pairs. We show that comparative methods that neglect the evolutionary history of extant sequences are susceptible to a high rate of false positives (20%-40%). Therefore, we propose a new method to detect interactions that relaxes both of these assumptions. First, we reconstruct the evolutionary history of extant sequences by maximum likelihood, shifting focus from extant sequence variation to the underlying substitution events. Second, we analyze the joint distribution of substitution events among positions in the sequence as a Bayesian graphical model, in which each branch in the phylogeny is a unit of observation. We perform extensive validation of our models using both simulations and a control case of known interactions in HIV-1 protease, and apply this method to detect interactions within V3 from a sample of 1,154 HIV-1 envelope sequences. Our method greatly reduces the number of false positives due to founder effects, while capturing several higher-order interactions among V3 residues. By mapping these interactions to a structural model of the V3 loop, we find that the loop is stratified into distinct evolutionary clusters.
We extend our model to detect interactions between the V3 and C4 domains of the HIV-1 envelope, and account for the uncertainty in mapping substitutions to the tree with a parametric bootstrap.
Affiliation(s)
- Art F Y Poon
- Department of Pathology, University of California San Diego, La Jolla, California, United States of America.
|
116
|
Sing T, Low AJ, Beerenwinkel N, Sander O, Cheung PK, Domingues FS, Büch J, Däumer M, Kaiser R, Lengauer T, Harrigan PR. Predicting HIV Coreceptor Usage on the Basis of Genetic and Clinical Covariates. Antivir Ther 2007. [DOI: 10.1177/135965350701200709] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/16/2022]
Abstract
Background: We compared several statistical learning methods for the prediction of HIV coreceptor use from clonal HIV third hypervariable (V3) loop sequences, and evaluated and improved their effectiveness on clinical samples. Methods: Support vector machines (SVM), artificial neural networks, position-specific scoring matrices (PSSM) and mixtures of localized rules were estimated and tested using ten repetitions of ten-fold cross-validation on a clonal dataset consisting of 1,100 matched clonal genotype-phenotype pairs from 332 patients. Different SVMs were also trained and tested on a clinically derived dataset, representing 920 patient samples from British Columbia, Canada. Methods were evaluated using receiver operating characteristic (ROC) curves. Results: In the clonal analysis, the sensitivity of the 11/25 rule at 92.5% specificity was 59.5%. PSSMs and SVMs increased sensitivity to 71.9% and 76.4%, respectively, at the same specificity (P << 0.05). In clinical samples, the sensitivity of the 11/25 rule and SVM decreased to 25.9% (specificity 93.9%) and 39.8% (specificity 93.5%), respectively. However, the integration of clinical data resulted in a further 2.4-fold increase in sensitivity over the 11/25 rule (63%). Univariate analyses identified 41 V3 mutations significantly associated with coreceptor usage. Conclusion: For all methods tested, a substantial sensitivity decrease is observed on clinical data, probably owing to the heterogeneity of the viral population in vivo. In response to these complications, we present an SVM-based approach that integrates sequence information with clinical and host data, resulting in improved performance and sensitivity compared with purely sequence-based approaches.
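The 11/25 rule used as the baseline in this abstract is simple enough to sketch. The following Python is a minimal illustration, assuming the commonly cited form of the rule (a V3 loop is predicted to be CXCR4-using when position 11 or 25 carries arginine or lysine) and an ungapped V3 sequence numbered from 1. The function names, the `X4`/`R5` labels, and the toy sequences are hypothetical and are not taken from the study.

```python
BASIC_RESIDUES = {"R", "K"}  # arginine and lysine (positively charged)


def rule_11_25(v3_loop: str) -> str:
    """Return "X4" if V3 position 11 or 25 (1-based) is basic, else "R5".

    Assumes an ungapped V3 amino acid sequence; real data would first need
    alignment to a reference so that positions 11 and 25 are meaningful.
    """
    if len(v3_loop) < 25:
        raise ValueError("V3 sequence must cover at least 25 positions")
    seq = v3_loop.upper()
    hit = seq[10] in BASIC_RESIDUES or seq[24] in BASIC_RESIDUES
    return "X4" if hit else "R5"


def sensitivity_specificity(predicted, observed, positive="X4"):
    """Sensitivity and specificity of predictions for the positive class.

    Assumes at least one positive and one negative observation.
    """
    pairs = list(zip(predicted, observed))
    tp = sum(p == positive and o == positive for p, o in pairs)
    fn = sum(p != positive and o == positive for p, o in pairs)
    tn = sum(p != positive and o != positive for p, o in pairs)
    fp = sum(p == positive and o != positive for p, o in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy sequences (not real V3 loops) and invented phenotypes.
toy_sequences = ["A" * 35, "A" * 10 + "R" + "A" * 24, "A" * 24 + "K" + "A" * 10]
toy_observed = ["R5", "X4", "R5"]
toy_predicted = [rule_11_25(s) for s in toy_sequences]
toy_sens, toy_spec = sensitivity_specificity(toy_predicted, toy_observed)
```

A rule of this kind has no tunable threshold, which is why the abstract compares the other methods against it at the rule's own fixed specificity.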
Affiliation(s)
- Tobias Sing
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Department for Modeling and Simulation, Novartis Pharmaceuticals, Basel, Switzerland
- Andrew J Low
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
- Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
- Oliver Sander
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Peter K Cheung
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
- Joachim Büch
- Max Planck Institute for Informatics, Saarbrücken, Germany
- P Richard Harrigan
- British Columbia Centre for Excellence in HIV/AIDS, Vancouver, BC, Canada
- Faculty of Medicine, University of British Columbia, Vancouver, BC, Canada
|
117
|
Yeang CH, Haussler D. Detecting coevolution in and among protein domains. PLoS Comput Biol 2007; 3:e211. [PMID: 17983264 PMCID: PMC2098842 DOI: 10.1371/journal.pcbi.0030211] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.8] [Received: 06/13/2007] [Accepted: 09/17/2007] [Indexed: 01/17/2023] Open
Abstract
Correlated changes of nucleic or amino acids have provided strong information about the structures and interactions of molecules. Despite the rich literature in coevolutionary sequence analysis, previous methods often have to trade off between generality, simplicity, phylogenetic information, and specific knowledge about interactions. Furthermore, despite the evidence of coevolution in selected protein families, a comprehensive screening of coevolution among all protein domains is still lacking. We propose an augmented continuous-time Markov process model for sequence coevolution. The model can handle different types of interactions, incorporate phylogenetic information and sequence substitution, has only one extra free parameter, and requires no knowledge about interaction rules. We apply this model in large-scale screenings of the entire protein domain database (Pfam). Strikingly, with 0.1 trillion tests executed, the majority of the inferred coevolving protein domains are functionally related, and the coevolving amino acid residues are spatially coupled. Moreover, many of the coevolving positions are located at functionally important sites of proteins/protein complexes, such as the subunit linkers of superoxide dismutase, the tRNA binding sites of ribosomes, the DNA binding region of RNA polymerase, and the active and ligand binding sites of various enzymes. The results suggest sequence coevolution manifests structural and functional constraints of proteins. The intricate relations between sequence coevolution and various selective constraints are worth pursuing at a deeper level.

The sequences of different components within and across genes often undergo coordinated changes in order to maintain the structures or functions of the genes. Identifying the coordinated changes (the "coevolution") of those components in the context of evolution is important in predicting the structures, interactions, and functions of genes. The authors carry out a large-scale screening of all known protein sequences and build a compendium of the coevolving relations of all protein domains (subunits of proteins). The majority of the coevolving protein domains belong to the same proteins, appear in the same protein complexes, or share the same functional annotations. Furthermore, coevolving positions in the same proteins or protein complexes are spatially coupled, as they tend to be closer than random positions in the 3-D structures of the proteins/protein complexes. More strikingly, many coevolving positions are located at functionally important sites of the molecules. The results provide useful insights about the relations between sequence evolution and protein structures and functions.
Affiliation(s)
- Chen-Hsiang Yeang
- Simons Center for Systems Biology, Institute for Advanced Study, Princeton, New Jersey, United States of America.
|
118
|
Wang Q, Lee C. Distinguishing functional amino acid covariation from background linkage disequilibrium in HIV protease and reverse transcriptase. PLoS One 2007; 2:e814. [PMID: 17726544 PMCID: PMC1950573 DOI: 10.1371/journal.pone.0000814] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 07/06/2007] [Accepted: 08/01/2007] [Indexed: 11/19/2022] Open
Abstract
Correlated amino acid mutation analysis has been widely used to infer functional interactions between different sites in a protein. However, this analysis can be confounded by important phylogenetic effects broadly classifiable as background linkage disequilibrium (BLD). We have systematically separated the covariation induced by selective interactions between amino acids from background LD, using synonymous (S) vs. amino acid (A) mutations. Covariation between two amino acid mutations, (A,A), can be affected by selective interactions between amino acids, whereas covariation within (A,S) pairs or (S,S) pairs cannot. Our analysis of the pol gene — including the protease and the reverse transcriptase genes — in HIV reveals that (A,A) covariation levels are enormously higher than for either (A,S) or (S,S), and thus cannot be attributed to phylogenetic effects. The magnitude of these effects suggests that a large portion of (A,A) covariation in the HIV pol gene results from selective interactions. Inspection of the most prominent (A,A) interactions in the HIV pol gene showed that they are known sites of independently identified drug resistance mutations, and physically cluster around the drug binding site. Moreover, the specific set of (A,A) interaction pairs was reproducible in different drug treatment studies, and vanished in untreated HIV samples. The (S,S) covariation curves measured a low but detectable level of background LD in HIV.
Affiliation(s)
- Qi Wang
- Center for Computational Biology, Molecular Biology Institute, Institute for Genomics and Proteomics, University of California at Los Angeles, Los Angeles, United States of America
- Christopher Lee
- Center for Computational Biology, Molecular Biology Institute, Institute for Genomics and Proteomics, University of California at Los Angeles, Los Angeles, United States of America
- Department of Chemistry and Biochemistry, University of California at Los Angeles, Los Angeles, United States of America
- * To whom correspondence should be addressed.
|
119
|
Carlson J, Kadie C, Mallal S, Heckerman D. Leveraging hierarchical population structure in discrete association studies. PLoS One 2007; 2:e591. [PMID: 17611623 PMCID: PMC1899226 DOI: 10.1371/journal.pone.0000591] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 01/05/2007] [Accepted: 06/08/2007] [Indexed: 11/22/2022] Open
Abstract
Population structure can confound the identification of correlations in biological data. Such confounding has been recognized in multiple biological disciplines, resulting in a disparate collection of proposed solutions. We examine several methods that correct for confounding on discrete data with hierarchical population structure and identify two distinct confounding processes, which we call coevolution and conditional influence. We describe these processes in terms of generative models and show that these generative models can be used to correct for the confounding effects. Finally, we apply the models to three applications: identification of escape mutations in HIV-1 in response to specific HLA-mediated immune pressure, prediction of coevolving residues in an HIV-1 peptide, and a search for genotypes that are associated with bacterial resistance traits in Arabidopsis thaliana. We show that coevolution is a better description of confounding in some applications and conditional influence is better in others. That is, we show that no single method is best for addressing all forms of confounding. Analysis tools based on these models are available on the internet as both web based applications and downloadable source code at http://atom.research.microsoft.com/bio/phylod.aspx.
Affiliation(s)
- Jonathan Carlson
- Machine Learning and Applied Statistics Group, Microsoft Research, Redmond, Washington, United States of America
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- Carl Kadie
- Machine Learning and Applied Statistics Group, Microsoft Research, Redmond, Washington, United States of America
- Simon Mallal
- Center for Clinical Immunology and Biomedical Statistics, Royal Perth Hospital, Perth, Australia
- David Heckerman
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
|
120
|
Rhee SY, Liu TF, Holmes SP, Shafer RW. HIV-1 subtype B protease and reverse transcriptase amino acid covariation. PLoS Comput Biol 2007; 3:e87. [PMID: 17500586 PMCID: PMC1866358 DOI: 10.1371/journal.pcbi.0030087] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Received: 11/24/2006] [Accepted: 04/02/2007] [Indexed: 11/19/2022] Open
Abstract
Despite the high degree of HIV-1 protease and reverse transcriptase (RT) mutation in the setting of antiretroviral therapy, the spectrum of possible virus variants appears to be limited by patterns of amino acid covariation. We analyzed patterns of amino acid covariation in protease and RT sequences from more than 7,000 persons infected with HIV-1 subtype B viruses obtained from the Stanford HIV Drug Resistance Database (http://hivdb.stanford.edu). In addition, we examined the relationship between conditional probabilities associated with a pair of mutations and the order in which those mutations developed in viruses for which longitudinal sequence data were available. Patterns of RT covariation were dominated by the distinct clustering of Type I and Type II thymidine analog mutations and the Q151M-associated mutations. Patterns of protease covariation were dominated by the clustering of nelfinavir-associated mutations (D30N and N88D), two main groups of protease inhibitor (PI)-resistance mutations associated either with V82A or L90M, and a tight cluster of mutations associated with decreased susceptibility to amprenavir and the most recently approved PI darunavir. Different patterns of covariation were frequently observed for different mutations at the same position including the RT mutations T69D versus T69N, L74V versus L74I, V75I versus V75M, T215F versus T215Y, and K219Q/E versus K219N/R, and the protease mutations M46I versus M46L, I54V versus I54M/L, and N88D versus N88S. Sequence data from persons with correlated mutations in whom earlier sequences were available confirmed that the conditional probabilities associated with correlated mutation pairs could be used to predict the order in which the mutations were likely to have developed. Whereas accessory nucleoside RT inhibitor-resistance mutations nearly always follow primary nucleoside RT inhibitor-resistance mutations, accessory PI-resistance mutations often preceded primary PI-resistance mutations.
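The asymmetry between the two conditional probabilities of a mutation pair can be computed directly from per-sequence mutation lists. The sketch below is a minimal Python illustration on invented toy data; the function name, the placeholder mutation labels `mutA` and `mutB`, and the counts are hypothetical, and reading the asymmetry as an ordering signal is one interpretation of the abstract rather than the authors' exact procedure.

```python
def conditional_probability(samples, target, given):
    """Estimate P(target present | given present) from sets of mutations.

    Returns None when no sample carries the conditioning mutation.
    """
    with_given = [s for s in samples if given in s]
    if not with_given:
        return None
    return sum(target in s for s in with_given) / len(with_given)


# Invented toy data: each set lists the mutations seen in one sequence.
toy_samples = [
    {"mutA"},
    {"mutA"},
    {"mutA", "mutB"},
    {"mutA", "mutB"},
    set(),
]

p_b_given_a = conditional_probability(toy_samples, "mutB", "mutA")  # 2 of 4
p_a_given_b = conditional_probability(toy_samples, "mutA", "mutB")  # 2 of 2
```

In this toy set `mutB` never appears without `mutA`, whereas `mutA` often appears alone, which is the kind of asymmetry that could suggest `mutA` tends to arise first.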
Affiliation(s)
- Soo-Yon Rhee
- Division of Infectious Diseases, Department of Medicine, Stanford University, Stanford, California, United States of America
- Tommy F Liu
- Division of Infectious Diseases, Department of Medicine, Stanford University, Stanford, California, United States of America
- Susan P Holmes
- Department of Statistics, Stanford University, Stanford, California, United States of America
- Robert W Shafer
- Division of Infectious Diseases, Department of Medicine, Stanford University, Stanford, California, United States of America
- * To whom correspondence should be addressed.
|
121
|
In silico identification of functional divergence between the multiple groEL gene paralogs in Chlamydiae. BMC Evol Biol 2007; 7:81. [PMID: 17519003 PMCID: PMC1892554 DOI: 10.1186/1471-2148-7-81] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 01/25/2007] [Accepted: 05/22/2007] [Indexed: 12/26/2022] Open
Abstract
Background: Heat-shock proteins are specialized molecules performing different and essential roles in the cell including protein degradation, folding and trafficking. GroEL is a 60 kDa heat-shock protein ubiquitous in bacteria and has been regarded as an important molecule implicated in chronic inflammatory processes caused by Chlamydiae infections. GroEL in Chlamydiae became duplicated at the origin of the Chlamydiae lineage presenting three distinct molecular chaperones, namely the original protein GroEL1 (Ct110), and its paralogous proteins GroEL2 (Ct604) and GroEL3 (Ct755). These chaperones present differential and independent expressions during the different stages of Chlamydiae infections and have been suggested to present differential physiological and regulatory roles. Results: In this comprehensive in silico study we show that GroEL protein paralogs have diverged functionally after the different gene duplication events and that this divergence has occurred mainly between GroEL3 and GroEL1. GroEL2 presents an intermediate functional divergence pattern from GroEL1. Our results point to the different protein-protein interaction patterns between GroEL paralogs and known GroEL protein clients supporting their functional divergence after groEL gene duplication. Analysis of selective constraints identifies periods of adaptive evolution after gene duplication that led to the fixation of amino acid replacements in GroEL protein domains involved in the interaction with GroEL protein clients. Conclusion: We demonstrate that GroEL protein copies in Chlamydiae species have diverged functionally after the gene duplication events. We also show that functional divergence has occurred in important functional regions of these GroEL proteins and has very probably affected the ancestral GroEL regulatory role and protein-protein interaction patterns with GroEL client proteins. Most of the amino acid replacements that have affected interaction with protein clients and that were responsible for the functional divergence between GroEL paralogs were fixed by adaptive evolution after the groEL gene duplication events.
|
122
|
Pantophlet R, Aguilar-Sino RO, Wrin T, Cavacini LA, Burton DR. Analysis of the neutralization breadth of the anti-V3 antibody F425-B4e8 and re-assessment of its epitope fine specificity by scanning mutagenesis. Virology 2007; 364:441-53. [PMID: 17418361 PMCID: PMC1985947 DOI: 10.1016/j.virol.2007.03.007] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Received: 01/20/2007] [Revised: 02/13/2007] [Accepted: 03/06/2007] [Indexed: 10/23/2022]
Abstract
The identification of cross-neutralizing antibodies to HIV-1 is important for designing antigens aimed at eliciting similar antibodies upon immunization. The monoclonal antibody (mAb) F425-B4e8 had been suggested previously to bind an epitope at the base of V3 and shown to neutralize two primary HIV isolates. Here, we have assessed the neutralization breadth of mAb F425-B4e8 using a 40-member panel of primary HIV-1 and determined the epitope specificity of the mAb. The antibody was able to neutralize 8 clade B viruses (n=16), 1 clade C virus (n=11), and 2 clade D viruses (n=6), thus placing it among the more broadly neutralizing anti-V3 antibodies described so far. Contrary to an initial report, results from our scanning mutagenesis of the V3 region suggest that mAb F425-B4e8 interacts primarily with the crown/tip of V3, notably Ile(309), Arg(315), and Phe(317). Despite the somewhat limited neutralization breadth of mAb F425-B4e8, the results presented here, along with analyses from other cross-neutralizing anti-V3 mAbs, may facilitate the template-based design of antigens that target V3 and permit neutralization of HIV-1 strains in which the V3 region is accessible to antibodies.
Affiliation(s)
- Ralph Pantophlet
- The Scripps Research Institute, Department of Immunology, IMM2, La Jolla, CA 92037, USA.
|
123
|
Ruano-Rubio V, Fares MA. Testing the Neutral Fixation of Hetero-Oligomerism in the Archaeal Chaperonin CCT. Mol Biol Evol 2007; 24:1384-96. [PMID: 17406022 DOI: 10.1093/molbev/msm065] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
Abstract
The evolutionary transition from homo-oligomerism to hetero-oligomerism in multimeric proteins and its contribution to function innovation and organism complexity remain to be investigated. Here, we undertake the challenge of contributing to this theoretical ground by investigating the hetero-oligomerism in the molecular chaperonin cytosolic chaperonin containing tailless complex polypeptide 1 (CCT) from archaea. CCT is amenable to this study because, in contrast to eukaryotic CCTs where sub-functionalization after gene duplication has been taken to completion, archaeal CCTs present no evidence for subunit functional specialization. Our analyses yield additional information to previous reports on archaeal CCT paralogy by identifying new duplication events. Analyses of selective constraints show that amino acid sites from 1 subunit have fixed slightly deleterious mutations at inter-subunit interfaces after gene duplication. These mutations have been followed by compensatory mutations in nearby regions of the same subunit and in the interface contact regions of its paralogous subunit. The strong selective constraints in these regions after speciation support the evolutionary entrapment of CCTs as hetero-oligomers. In addition, our results unveil different evolutionary dynamics depending on the degree of CCT hetero-oligomerism. Archaeal CCT protein complexes comprising 3 distinct classes of subunits present 2 evolutionary processes. First, slightly deleterious and compensatory mutations were fixed neutrally at inter-subunit regions. Second, sub-functionalization may have occurred at substrate-binding and adenosine triphosphate-binding regions after the 2nd gene duplication event took place. CCTs with 2 distinct types of subunits did not present evidence of sub-functionalization. Our results provide the 1st in silico evidence for the neutral fixation of hetero-oligomerism in archaeal CCTs and provide information on the evolution of hetero-oligomerism toward sub-functionalization in archaeal CCTs.
Affiliation(s)
- Valentin Ruano-Rubio
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin, Ireland
|
124
|
Rong R, Gnanakaran S, Decker JM, Bibollet-Ruche F, Taylor J, Sfakianos JN, Mokili JL, Muldoon M, Mulenga J, Allen S, Hahn BH, Shaw GM, Blackwell JL, Korber BT, Hunter E, Derdeyn CA. Unique mutational patterns in the envelope alpha 2 amphipathic helix and acquisition of length in gp120 hypervariable domains are associated with resistance to autologous neutralization of subtype C human immunodeficiency virus type 1. J Virol 2007; 81:5658-68. [PMID: 17360739 PMCID: PMC1900276 DOI: 10.1128/jvi.00257-07] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.8] [Indexed: 11/20/2022] Open
Abstract
Autologous neutralizing antibodies (NAb) against human immunodeficiency virus type 1 generate viral escape variants; however, the mechanisms of escape are not clearly defined. In a previous study, we determined the susceptibilities of 48 donor and 25 recipient envelope (Env) glycoproteins from five subtype C heterosexual transmission pairs to NAb in donor plasma by using a virus pseudotyping assay, thereby providing an ideal setting to probe the determinants of susceptibility to neutralization. In the present study, acquisition of length in the Env gp120 hypervariable domains was shown to correlate with resistance to NAb in donor plasma (P = 0.01; Kendall's tau test) but not in heterologous plasma. Sequence divergence in the gp120 V1-to-V4 region also correlated with resistance to donor (P = 0.0002) and heterologous (P = 0.001) NAb. A mutual information analysis suggested possible associations of nine amino acid positions in V1 to V4 with NAb resistance to the donor's antibodies, and five of these were located within an 18-residue amphipathic helix (alpha2) located on the gp120 outer domain. High nonsynonymous-to-synonymous substitution (dN/dS) ratios, indicative of positive selection, were also found at these five positions in subtype C sequences in the database. Nevertheless, exchange of the entire alpha2 helix between resistant donor Envs and sensitive recipient Envs did not alter the NAb phenotype. The combined mutual information and dN/dS analyses suggest that unique mutational patterns in alpha2 and insertions in the V1-to-V4 region are associated with NAb resistance during subtype C infection but that the selected positions within the alpha2 helix must be linked to still other changes in Env to confer antibody escape. These findings suggest that subtype C viruses utilize mutations in the alpha2 helix for efficient viral replication and immune avoidance.
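The mutual information analysis mentioned in this abstract can be illustrated with a small calculation. The sketch below, in Python, estimates the mutual information (in bits) between the residues in one alignment column and a binary neutralization phenotype using plug-in frequency estimates. The function name, the phenotype labels, and the toy data are hypothetical, and the study's own estimator and significance testing may differ.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired observations."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


# Invented toy data: residue at one site and phenotype for four sequences.
toy_column = ["A", "A", "G", "G"]
toy_linked = ["resistant", "resistant", "sensitive", "sensitive"]
toy_unlinked = ["resistant", "sensitive", "resistant", "sensitive"]

mi_linked = mutual_information(toy_column, toy_linked)
mi_unlinked = mutual_information(toy_column, toy_unlinked)
```

A site whose residue fully determines the phenotype scores one bit in this two-state example, and a site that is independent of the phenotype scores zero. With small samples the plug-in estimate is biased upward, which is one reason such analyses are usually paired with a permutation test.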
Affiliation(s)
- Rong Rong
- Department of Pathology and Laboratory Medicine, Emory University, Atlanta, Georgia, Atlanta, GA 30329, USA
|
125
|
Tully DC, Fares MA. Unravelling selection shifts among foot-and-mouth disease virus (FMDV) serotypes. Evol Bioinform Online 2007; 2:211-25. [PMID: 19455214 PMCID: PMC2674665] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022] Open
Abstract
Foot-and-mouth disease virus (FMDV) has been increasingly recognised as the most economically severe animal virus, with a remarkable degree of antigenic diversity. Using an integrative evolutionary and computational approach, we present compelling evidence for heterogeneity in the selection forces shaping the evolution of the seven different FMDV serotypes. Our results show that positive Darwinian selection has governed the evolution of the major antigenic regions of serotypes A, Asia1, O, SAT1 and SAT2, but not C or SAT3. Co-evolution between sites from antigenic regions under positive selection pinpoints their functional communication to generate immune-escape mutants while maintaining their ability to recognise the host-cell receptors. Neural network and functional divergence analyses strongly point to selection shifts between the different serotypes. Our results suggest that, unlike African FMDV serotypes, serotypes with wide geographical distribution have accumulated compensatory mutations as a strategy to ameliorate the effect of slightly deleterious mutations fixed by genetic drift. This strategy may have provided the virus with the flexibility to generate immune-escape mutants and yet recognise host-cell receptors. African serotypes presented no evidence for compensatory mutations. Our results support heterogeneous selective constraints affecting the different serotypes. This points to possibly accelerated rates of evolution in diverging serotypes that share geographical locations, so as to ameliorate the competition for the host.
Affiliation(s)
- Damien C. Tully
- Molecular Evolution and Bioinformatics Laboratory, Biology Department, National University of Ireland, Maynooth, Co. Kildare, Ireland
- Mario A. Fares
- Molecular Evolution and Bioinformatics Laboratory, Biology Department, National University of Ireland, Maynooth, Co. Kildare, Ireland. Correspondence: Dr. Mario A. Fares, Tel: 353 01 6081064; Fax: 353 01 6714968
|
126
|
Travers SAA, Fares MA. Functional coevolutionary networks of the Hsp70-Hop-Hsp90 system revealed through computational analyses. Mol Biol Evol 2007; 24:1032-44. [PMID: 17267421 DOI: 10.1093/molbev/msm022] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Indexed: 12/29/2022] Open
Abstract
Currently, the identification of groups of amino acid residues that are important in the function, structure, or interaction of a protein can be both costly and prohibitively complex, involving vast numbers of mutagenesis experiments. Here, we present the application of a novel computational method, which identifies the presence of coevolution in a data set, thereby enabling the a priori identification of amino acid residues that play an important role in protein function. We have applied this method to the heat shock protein (Hsp) protein-folding system, studying the network between Hsp70, Hsp90, and Hop (heat shock-organizing protein). Our analysis has identified functional residues within the tetratricopeptide repeat (TPR) 1 and 2A domains in Hop, previously shown to be interacting with Hsp70 and Hsp90, respectively. Further, we have identified significant residues elsewhere in Hop within domains that have been recently proposed as being important for Hop interaction with Hsp70 and/or Hsp90. In addition, several amino acid sites present in groups of coevolution were identified as 3-dimensionally or linearly proximal to functionally important sites or domains. Based on our results, we also investigate a further functional domain within Hop, between TPR1 and TPR2A, which we suggest as being functionally important in the interaction of Hop with both Hsp70 and Hsp90 whether directly or otherwise. Our method has identified all the previously characterized functionally important regions in this system, thereby indicating the power of this method in the a priori identification of important regions for site-directed mutagenesis studies.
Affiliation(s)
- Simon A A Travers
- Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Ireland
|
127
|
Abstract
Many newly identified gene products from completely sequenced genomes are difficult to characterize in the absence of sequence homology to known proteins. In such a scenario, the context of the proteins' functional associations can be used for annotation; overrepresented functional linkages with a certain class of proteins or members of a pathway allow putative function assignments based on the "guilt-by-association" principle. Two computational functional genomics methods, phylogenetic profiling and identification of Rosetta stone linkages, are described in this chapter, which allow assessment of functional linkages between proteins, consequently facilitating annotation. Phylogenetic profiling involves measuring similarity between profiles that describe the presence or absence of a protein in a set of reference genomes, whereas Rosetta stone fusion sequences help link two or more independently transcribed and translated proteins. Both methods can be applied to investigate functional associations between individual proteins, and can also be extended to reconstruct the genome-wide network of functional linkages by querying the entire protein complement of an organism.
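Phylogenetic profiling as described here reduces to comparing presence/absence vectors. The Python sketch below is a minimal illustration that scores profile similarity with the Jaccard index; the choice of Jaccard, the function names, the protein labels, and the toy profiles are all assumptions made for illustration, and the chapter may use a different similarity measure.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard index of two presence/absence profiles (1 = present)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same reference genomes")
    pairs = list(zip(profile_a, profile_b))
    shared = sum(1 for a, b in pairs if a and b)
    either = sum(1 for a, b in pairs if a or b)
    return shared / either if either else 0.0


def rank_partners(query, profiles):
    """Rank all other proteins by profile similarity to the query."""
    scores = {
        name: jaccard_similarity(profiles[query], profile)
        for name, profile in profiles.items()
        if name != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy profiles over six hypothetical reference genomes.
toy_profiles = {
    "protA": (1, 0, 1, 1, 0, 1),
    "protB": (1, 0, 1, 1, 0, 0),
    "protC": (0, 1, 0, 0, 1, 0),
}

toy_ranking = rank_partners("protA", toy_profiles)
```

Proteins with similar profiles (here `protA` and `protB`) are candidates for a functional linkage under the guilt-by-association principle; applying `rank_partners` to every protein in turn gives a rough genome-wide linkage network.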
|
128
|
Gnanakaran S, Lang D, Daniels M, Bhattacharya T, Derdeyn CA, Korber B. Clade-specific differences between human immunodeficiency virus type 1 clades B and C: diversity and correlations in C3-V4 regions of gp120. J Virol 2006; 81:4886-91. [PMID: 17166900 PMCID: PMC1900169 DOI: 10.1128/jvi.01954-06] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 02/01/2023] Open
Abstract
Current knowledge of human immunodeficiency virus type 1 envelope (Env) glycoprotein structure and function is based on studies of clade B viruses. We present evidence of sequence and structural differences in viral glycoprotein gp120 between clades B and C. In clade C, the C3 region alpha2-helix exhibits high sequence entropy at the polar face but maintains its amphipathicity, whereas in clade B it accommodates hydrophobic residues. The V4 hypervariable domain in clade C is shorter than that in clade B. Generally, shorter V4 loops are incompatible with a glycine occurring in the alpha2-helix in clade C, an intriguing association that could be exploited to inform Env immunogen design.
Affiliation(s)
- S Gnanakaran
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
129
Poon AFY, Lewis FI, Pond SLK, Frost SDW. Evolutionary interactions between N-linked glycosylation sites in the HIV-1 envelope. PLoS Comput Biol 2006; 3:e11. [PMID: 17238283 PMCID: PMC1779302 DOI: 10.1371/journal.pcbi.0030011] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 09/21/2006] [Accepted: 12/07/2006] [Indexed: 11/18/2022] Open
Abstract
The addition of asparagine (N)-linked polysaccharide chains (i.e., glycans) to the gp120 and gp41 glycoproteins of human immunodeficiency virus type 1 (HIV-1) envelope is not only required for correct protein folding, but also may provide protection against neutralizing antibodies as a "glycan shield." As a result, strong host-specific selection is frequently associated with codon positions where nonsynonymous substitutions can create or disrupt potential N-linked glycosylation sites (PNGSs). Moreover, empirical data suggest that the individual contribution of PNGSs to the neutralization sensitivity or infectivity of HIV-1 may be critically dependent on the presence or absence of other PNGSs in the envelope sequence. Here we evaluate how glycan-glycan interactions have shaped the evolution of HIV-1 envelope sequences by analyzing the distribution of PNGSs in a large-sequence alignment. Using a "covarion"-type phylogenetic model, we find that the rates at which individual PNGSs are gained or lost vary significantly over time, suggesting that the selective advantage of having a PNGS may depend on the presence or absence of other PNGSs in the sequence. Consequently, we identify specific interactions between PNGSs in the alignment using a new paired-character phylogenetic model of evolution, and a Bayesian graphical model. Despite the fundamental differences between these two methods, several interactions are jointly identified by both. Mapping these interactions onto a structural model of HIV-1 gp120 reveals that negative (exclusive) interactions occur significantly more often between colocalized glycans, while positive (inclusive) interactions are restricted to more distant glycans. Our results imply that the adaptive repertoire of alternative configurations in the HIV-1 glycan shield is limited by functional interactions between the N-linked glycans. 
This represents a potential vulnerability of rapidly evolving HIV-1 populations that may provide useful glycan-based targets for neutralizing antibodies.
Affiliation(s)
- Art F Y Poon
- Department of Pathology, University of California San Diego, La Jolla, California, United States of America.
130
Abstract
Coevolution Analysis using Protein Sequences (CAPS) is Perl-based software that identifies co-evolution between amino acid sites. BLOSUM-corrected amino acid distances are used to identify amino acid co-variation. The phylogenetic sequence relationships are used to remove the phylogenetic and stochastic dependencies between sites. The 3D protein structure is used to identify the nature of the dependencies between co-evolving amino acid sites. User-friendly, interpretable output files are generated. AVAILABILITY: CAPS version 1 is available at http://bioinf.gen.tcd.ie/~faresm/software/caps/. Distribution versions for Linux/Unix, Mac OS X and Windows operating systems are available, including manual and example files.
Affiliation(s)
- Mario A Fares
- Evolutionary Genetics and Bioinformatics Laboratory, Department of Genetics Smurfit Institute of Genetics, University of Dublin, Trinity College, Dublin 2, Dublin, Ireland.
131
Codoñer FM, Fares MA, Elena SF. Adaptive covariation between the coat and movement proteins of prunus necrotic ringspot virus. J Virol 2006; 80:5833-40. [PMID: 16731922 PMCID: PMC1472603 DOI: 10.1128/jvi.00122-06] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/20/2022] Open
Abstract
The relative functional and/or structural importance of different amino acid sites in a protein can be assessed by evaluating the selective constraints to which they have been subjected during the course of evolution. Here we explore such constraints at the linear and three-dimensional levels for the movement protein (MP) and coat protein (CP) encoded by RNA 3 of prunus necrotic ringspot ilarvirus (PNRSV). By a maximum-parsimony approach, the nucleotide sequences from 46 isolates of PNRSV varying in symptomatology, host tree, and geographic origin have been analyzed and sites under different selective pressures have been identified in both proteins. We have also performed covariation analyses to explore whether changes in certain amino acid sites condition subsequent variation in other sites of the same protein or the other protein. These covariation analyses shed light on which particular amino acids should be involved in the physical and functional interaction between MP and CP. Finally, we discuss these findings in the light of what is already known about the implication of certain sites and domains in structure and protein-protein and RNA-protein interactions.
Affiliation(s)
- Francisco M Codoñer
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-UPV, 46022 València, Spain
132
Solis M, Wilkinson P, Romieu R, Hernandez E, Wainberg MA, Hiscott J. Gene expression profiling of the host response to HIV-1 B, C, or A/E infection in monocyte-derived dendritic cells. Virology 2006; 352:86-99. [PMID: 16730773 DOI: 10.1016/j.virol.2006.04.010] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 12/01/2005] [Revised: 01/17/2006] [Accepted: 04/03/2006] [Indexed: 02/04/2023]
Abstract
Dendritic cells (DC) are among the first targets of human immunodeficiency virus type-1 (HIV-1) infection and in turn play a crucial role in viral transmission to T cells and in the regulation of the immune response. The major group of HIV-1 has diversified genetically based on variation in env sequences and comprises at least 11 subtypes. Because little is known about the host response elicited against different HIV-1 clade isolates in vivo, we sought to use gene expression profiling to identify genes regulated by HIV-1 subtypes B, C, and A/E upon de novo infection of primary immature monocyte-derived DC (iMDDCs). A total of 3700 immune-related genes were subjected to a significance analysis of microarrays (SAM); 656 genes were selected as significant and were further divided into 8 functional categories. Regardless of the time of infection, 20% of the genes affected by HIV-1 were involved in signal transduction, followed by 14% of the genes identified as transcription-related genes, and 7% were classified as playing a role in cell proliferation and cell cycle. Furthermore, 7% of the genes were immune response genes. By 72 h postinfection, genes upregulated by subtype B included the inhibitor of the matrix metalloproteinase TIMP2 and the heat shock protein 40 homolog (Hsp40) DNAJB1, whereas the IFN inducible gene STAT1, the MAPK1/ERK2 kinase regulator ST5, and the chemokine CXCL3 and SHC1 genes were induced by subtypes C and A/E. These analyses distinguish a temporally regulated host response to de novo HIV-1 infection in primary dendritic cells.
Affiliation(s)
- Mayra Solis
- McGill AIDS Center, Lady Davis Institute for Medical Research, Jewish General Hospital, Department of Microbiology and Immunology, McGill University, 3755 Cote Ste. Catherine, Montreal, Quebec, Canada H3T1E2
133
Felsövályi K, Nádas A, Zolla-Pazner S, Cardozo T. Distinct sequence patterns characterize the V3 region of HIV type 1 gp120 from subtypes A and C. AIDS Res Hum Retroviruses 2006; 22:703-8. [PMID: 16831095 PMCID: PMC1868395 DOI: 10.1089/aid.2006.22.703] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
The known sequences of HIV-1 viruses have been categorized into subtypes based on the phylogenetic partitioning of their env and gag gene sequences. The env gene encodes the protein gp120, which contains five sequence-variable regions (V1 to V5), of which the V3 loop is of central importance to viral infectivity. The V3 loop consensus sequences of HIV-1 subtype A and C viruses are similar, and more similar to one another than the V3 consensus sequences of any other two HIV-1 subtypes. However, using a position-specific statistical comparison, we found that the V3 region of these two subtypes is statistically distinct (p = approximately 0.0). (The p-value was calculated to the lowest limit of representation on the computer used to run the calculation. This lowest limit was 10^-16. Although theoretically a p-value cannot be equal to 0.0, the p-value for the comparisons in question can be intuitively considered to be extremely small, or approximately 0.0.)
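The floor on reportable p-values that the abstract mentions is a general property of floating-point arithmetic, and it can be illustrated as follows. This sketch is not the authors' calculation: it assumes IEEE double precision and uses a standard normal tail purely as a stand-in for whatever test statistic was used.

```python
import math
import sys


def naive_upper_tail(z: float) -> float:
    """P(Z > z) computed as 1 - CDF; anything much below ~1e-16 is lost to rounding."""
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return 1.0 - cdf


def stable_upper_tail(z: float) -> float:
    """P(Z > z) via the complementary error function, which preserves tiny tails."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


print(sys.float_info.epsilon)    # spacing of doubles near 1.0, about 2.2e-16
print(naive_upper_tail(10.0))    # collapses to exactly 0.0
print(stable_upper_tail(10.0))   # tiny but nonzero
```

Subtracting a CDF value from 1 cannot resolve probabilities much smaller than machine epsilon, which is why a result reported as "approximately 0.0" is better read as "below the representable limit" than as a true zero.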
Affiliation(s)
- Arthur Nádas
- Department of Pathology, New York University School of Medicine and the New York Veterans Affairs Medical Center, New York, New York 10016
- Susan Zolla-Pazner
- Department of Pathology, New York University School of Medicine and the New York Veterans Affairs Medical Center, New York, New York 10016
- Timothy Cardozo
- Department of Pharmacology and
- Address reprint requests to: Timothy Cardozo, 550 First Avenue, MSB 497A, New York, New York 10016, E-mail:
134
Ozer N, Haliloglu T, Schiffer CA. Substrate specificity in HIV-1 protease by a biased sequence search method. Proteins 2006; 64:444-56. [PMID: 16741993 DOI: 10.1002/prot.21023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
Abstract
Drug resistance in HIV-1 protease can also occasionally confer a change in the substrate specificity. Through the use of computational techniques, a relationship can be determined between the substrate sequence and three-dimensional structure of HIV-1 protease, and be utilized to predict substrate specificity. In this study, we introduce a biased sequence search threading (BSST) methodology to analyze the preferences of substrate positions and correlations between them that might also identify which positions within known substrates can likely tolerate sequence variability and which cannot. The potential sequence space was efficiently explored using a low-resolution knowledge-based scoring function. The low-energy substrate sequences generated by the biased search are correlated with the natural substrates. Octameric sequences were predicted using the probabilities of residue positions in the sequences generated by BSST in three ways: considering each position in the substrate independently, considering pairwise interdependency, and considering triple-wise interdependency. The prediction of octameric sequences using the triple-wise conditional probabilities produces the most accurate results, reproducing most of the sequences for five of the nine natural substrates and implying that there is a complex interdependence between the different substrate residue positions. This likely reflects that HIV-1 protease recognizes the overall shape of the substrate more than its specific sequence.
Affiliation(s)
- Nevra Ozer
- Polymer Research Center and Chemical Engineering Department, Bogazici University, Bebek, Istanbul, Turkey
135
Fares MA, Travers SAA. A novel method for detecting intramolecular coevolution: adding a further dimension to selective constraints analyses. Genetics 2006; 173:9-23. [PMID: 16547113 PMCID: PMC1461439 DOI: 10.1534/genetics.105.053249] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Indexed: 01/16/2023] Open
Abstract
Protein evolution depends on intramolecular coevolutionary networks whose complexity is proportional to the underlying functional and structural interactions among sites. Here we present a novel approach that vastly improves the sensitivity of previous methods for detecting coevolution through a weighted comparison of divergence between amino acid sites. The analysis of the HIV-1 Gag protein detected convergent adaptive coevolutionary events responsible for the selective variability emerging between subtypes. Coevolution analysis and functional data for heat-shock proteins, Hsp90 and GroEL, highlight that almost all detected coevolving sites are functionally or structurally important. The results support previous suggestions pinpointing the complex interdomain functional interactions within these proteins and we propose new amino acid sites as important for interdomain functional communication. Three-dimensional information sheds light on the functional and structural constraints governing the coevolution between sites. Our covariation analyses propose two types of coevolving sites in agreement with previous reports: pairs of sites spatially proximal, where compensatory mutations could maintain the local structure stability, and clusters of distant sites located in functional domains, suggesting a functional dependency between them. All sites detected under adaptive evolution in these proteins belong to coevolution groups, further underlining the importance of testing for coevolution in selective constraints analyses.
Affiliation(s)
- Mario A Fares
- Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth, Ireland.
136
Abstract
Human immunodeficiency viruses (HIV) have exhibited an extraordinary capacity for genetic change, exploring new evolutionary space after each transmission to a new host. This presents a great challenge to the prevention and management of HIV-1 infection. At the same time, the relentless diversification of HIV-1, developing as it does under the constraints imposed by the human immune system and other selective forces, contains within it information useful for understanding HIV epidemiology and pathogenesis. Comparing the sheer mutational potential of HIV with actual data representing viral lineages that can survive selection suggests that HIV does not have unlimited capacity for change. Rather, clinical and bioinformatic data suggest that, even in the most diverse gene of the most highly variable organism, natural selection places severe limits on the portion of amino acid sequence space that ensures viability. This suggests some optimism for those attempting to identify sets of antigens that can generate effective humoral and cellular immune responses against HIV.
Affiliation(s)
- J I Mullins
- Departments of Microbiology, University of Washington School of Medicine, Seattle, WA 98195-8070, USA.
137
Watabe T, Kishino H, Okuhara Y, Kitazoe Y. Fold recognition of the human immunodeficiency virus type 1 V3 loop and flexibility of its crown structure during the course of adaptation to a host. Genetics 2005; 172:1385-96. [PMID: 16361230 PMCID: PMC1456290 DOI: 10.1534/genetics.105.051508] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/18/2022] Open
Abstract
The third hypervariable (V3) region of the HIV-1 gp120 protein is responsible for many aspects of viral infectivity. The tertiary structure of the V3 loop seems to influence the coreceptor usage of the virus, which is an important determinant of HIV pathogenesis. Hence, the information about preferred conformations of the V3-loop region and its flexibility could be a crucial tool for understanding the mechanisms of progression from an initial infection to AIDS. Taking into account the uncertainty of the loop structure, we predicted the structural flexibility, diversity, and sequence fitness to the V3-loop structure for each of the sequences serially sampled during an asymptomatic period. Structural diversity correlated with sequence diversity. The predicted crown structure usage implied that structural flexibility depended on the patient and that the antigenic character of the virus might be almost uniform in a patient whose immune system is strong. Furthermore, the predicted structural ensemble suggested that toward the end of the asymptomatic period there was a change in the V3-loop structure or in the environment surrounding the V3 loop, possibly because of its proximity to the gp120 core.
Affiliation(s)
- Teruaki Watabe
- Center of Medical Information Science, Kochi University, Japan.
138
Gilbert PB, Novitsky V, Essex M. Covariability of selected amino acid positions for HIV type 1 subtypes C and B. AIDS Res Hum Retroviruses 2005; 21:1016-30. [PMID: 16379605 DOI: 10.1089/aid.2005.21.1016] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022] Open
Abstract
We studied covariability of selected amino acid positions in globally dominant HIV-1 subtype C viruses. The analyzed sequences spanned the V3 loop, Gag p17, Gag p24, and five CTL epitope-rich regions in Gag, Nef, and Tat. The corresponding regions in HIV-1 subtype B were also evaluated. The analyses identified a great number of covarying pairs and triples of sites in the HIV-1B V3 loop (173 site pairs, 242 site triples). Several of these interactions were found in the earlier studies [e.g., the V3 loop covariability analyses by Korber et al. (Proc Natl Acad Sci USA 1993;90:7176-7180) and Bickel et al. (AIDS Res Hum Retroviruses 1996;12:1401-1411)] and have known biological significance. However, generally these key covarying sites did not covary in the HIV-1C V3 loop (total 17 covarying site pairs), suggesting that the V3 loop may have subtype differences in functional or structural operating characteristics. Covariability of positions 309 and 312 was observed in the immunodominant region HIV-1C Gag 291-320 but no covariability was found in the corresponding region of HIV-1B, and vice versa for Nef 122-141; these findings may reflect subtype-specific covariability within immunologically relevant regions. Gag p17 exhibited greater covariability and less diversity for HIV-1B than HIV-1C, raising the hypothesis that Gag p17 is highly immunodominant in HIV-1B and is especially important for HIV-1B vaccines. Information on covariability should be better exploited in assessments of HIV-1 diversity and how to surmount it with vaccine design.
Affiliation(s)
- Peter B Gilbert
- Department of Biostatistics, University of Washington, and Fred Hutchinson Cancer Research Center, Seattle, Washington 98109, USA.
139
Martin LC, Gloor GB, Dunn SD, Wahl LM. Using information theory to search for co-evolving residues in proteins. Bioinformatics 2005; 21:4116-24. [PMID: 16159918 DOI: 10.1093/bioinformatics/bti671] [Citation(s) in RCA: 207] [Impact Index Per Article: 10.9] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Some functionally important protein residues are easily detected since they correspond to conserved columns in a multiple sequence alignment (MSA). However, important residues may also mutate, with compensatory mutations occurring elsewhere in the protein, which serve to preserve or restore functionality. It is difficult to distinguish these co-evolving sites from other non-conserved sites. RESULTS: We used Mutual Information (MI) to identify co-evolving positions. Using in silico evolved MSAs, we examined the effects of the number of sequences, the size of amino acid alphabet and the mutation rate on two sources of background MI: finite sample size effects and phylogenetic influence. We then assessed the performance of various normalizations of MI in enhancing detection of co-evolving positions and found that normalization by the pair entropy was optimal. Real protein alignments were analyzed and co-evolving isolated pairs were often found to be in contact with each other. AVAILABILITY: All data and program files can be found at http://www.biochem.uwo.ca/cgi-bin/CDD/index.cgi
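The quantity discussed above, mutual information between two alignment columns divided by their pair (joint) entropy, can be sketched as follows. This is a minimal illustration with plug-in frequency estimates and toy columns; the function names are hypothetical, and it does not include the finite-sample or phylogenetic corrections the authors study.

```python
from collections import Counter
from math import log2
from typing import Sequence


def entropy(symbols: Sequence) -> float:
    """Shannon entropy (bits) of the observed symbol frequencies."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(col_i: Sequence, col_j: Sequence) -> float:
    """MI(i, j) = H(i) + H(j) - H(i, j) for two equal-length columns."""
    return entropy(col_i) + entropy(col_j) - entropy(list(zip(col_i, col_j)))


def mi_over_pair_entropy(col_i: Sequence, col_j: Sequence) -> float:
    """MI divided by the joint entropy H(i, j); 0.0 if both columns are invariant."""
    h_ij = entropy(list(zip(col_i, col_j)))
    return mutual_information(col_i, col_j) / h_ij if h_ij > 0 else 0.0


# Toy columns: the first pair covaries perfectly, the second pair is independent.
print(mi_over_pair_entropy("AAVV", "LLII"))
print(mi_over_pair_entropy("AAVV", "LILI"))
```

Dividing by the joint entropy bounds the score between 0 and 1, so pairs of highly variable columns are not favoured simply because they have more entropy to share.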
Affiliation(s)
- L C Martin
- Department of Applied Mathematics, University of Western Ontario, London, Canada
140
Abstract
As complete genomes accumulate and the generation of genomic biodiversity proceeds at an accelerating pace, the need to understand the interaction between sequence evolution and protein structure and function rises in prominence. The pattern and pace of substitutions in proteins can provide important clues to functional importance, functional divergence, and adaptive response. Coevolution between amino acid residues and the context dependence of the evolutionary process are often ignored, however, because of their complexity, but they are critical for the accurate interpretation of reconstructed evolutionary events. Because residues interact with one another, and because the effect of substitutions can depend on the structural and physiological environment in which they occur, an accurate science of evolutionary functional genomics and a complete understanding of selection in proteins require a better understanding of how context dependence affects protein evolution. Here, we present new evidence from vertebrate cytochrome oxidase sequences that pairwise coevolutionary interactions between protein residues are highly dependent on tertiary and secondary structure. We also discuss theoretical predictions that impinge on our expectations of how protein residues may interact over long distances because of their shared need to maintain protein stability.
Affiliation(s)
- Zhengyuan O Wang
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, Louisiana 70803, USA
141
Atchley WR, Zhao J, Fernandes AD, Drüke T. Solving the protein sequence metric problem. Proc Natl Acad Sci U S A 2005; 102:6395-400. [PMID: 15851683 PMCID: PMC1088356 DOI: 10.1073/pnas.0408677102] [Citation(s) in RCA: 293] [Impact Index Per Article: 15.4] [Received: 12/14/2004] [Indexed: 11/18/2022] Open
Abstract
Biological sequences are composed of long strings of alphabetic letters rather than arrays of numerical values. Lack of a natural underlying metric for comparing such alphabetic data significantly inhibits sophisticated statistical analyses of sequences, modeling structural and functional aspects of proteins, and related problems. Herein, we use multivariate statistical analyses on almost 500 amino acid attributes to produce a small set of highly interpretable numeric patterns of amino acid variability. These high-dimensional attribute data are summarized by five multidimensional patterns of attribute covariation that reflect polarity, secondary structure, molecular volume, codon diversity, and electrostatic charge. Numerical scores for each amino acid then transform amino acid sequences for statistical analyses. Relationships between transformed data and amino acid substitution matrices show significant associations for polarity and codon diversity scores. Transformed alphabetic data are used in analysis of variance and discriminant analysis to study DNA binding in the basic helix-loop-helix proteins. The transformed scores offer a general solution for analyzing a wide variety of sequence analysis problems.
Affiliation(s)
- William R Atchley
- Department of Genetics, Graduate Program in Biomathematics, and Center for Computational Biology, North Carolina State University, Raleigh, NC 27695-7614, USA.
142
Buck MJ, Atchley WR. Networks of coevolving sites in structural and functional domains of serpin proteins. Mol Biol Evol 2005; 22:1627-34. [PMID: 15858204 DOI: 10.1093/molbev/msi157] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022] Open
Abstract
Amino acids do not occur randomly in proteins; rather, their occurrence at any given site is strongly influenced by the amino acid composition at other sites, the structural and functional aspects of the region of the protein in which they occur, and the evolutionary history of the protein. The goal of our research study is to identify networks of coevolving sites within the serpin proteins (serine protease inhibitors) and classify them as being caused by structural-functional constraints or by evolutionary history. To address this, a matrix of pairwise normalized mutual information (NMI) values was computed among amino acid sites for the serpin proteins. The NMI matrix was partitioned into orthogonal patterns of amino acid variability by factor analysis. Each common factor pattern was interpreted as having phylogenetic and/or structural-functional explanations. In addition, we used a bootstrap factor analysis technique to limit the effects of phylogenetic history on our factor patterns. Our results show an extensive network of correlations among amino acid sites in key functional regions (reactive center loop, shutter, and breach). Additionally, we have discovered long-range coevolution for packed amino acids within the serpin protein core. Lastly, we have discovered a group of serpin sites which coevolve in the hydrophobic core region (s5B and s4B) and appear to represent sites important for formation of the "native" instead of the "latent" serpin structure. This research provides a better understanding of how protein structure evolves; in particular, it elucidates the selective forces creating coevolution among protein sites.
Affiliation(s)
- Michael J Buck
- Department of Genetics and The Center for Computational Biology, North Carolina State University, USA.
143
Pang PS, Jankowsky E, Wadley LM, Pyle AM. Prediction of functional tertiary interactions and intermolecular interfaces from primary sequence data. J Exp Zool B Mol Dev Evol 2005; 304:50-63. [PMID: 15595717 DOI: 10.1002/jez.b.21024] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 12/26/2022]
Abstract
Given the availability of sequence information for many species, one can examine how the sequence of a gene varies among different organisms. This is accomplished by aligning the sequences and observing patterns of conservation, mutation and counter-mutation at different positions in the gene. Imbedded in these patterns is information on energetic coupling and macromolecular interactions, which can be deciphered by application of statistical algorithms. Here we report a robust approach for predicting interactions within (or between) any type of biopolymer, including proteins, RNAs and RNA-protein complexes. Rather than maximize the number of predictions, this approach is designed to detect a limited number of highly significant interactions, thereby providing accurate results from alignments that contain a modest number of sequences (20-60). The versatility and accuracy of the algorithm is demonstrated by the successful prediction of important intramolecular interactions within RNAs, modified RNAs, and proteins, as well as the prediction of RNA-protein and protein-protein interactions.
Affiliation(s)
- Phillip S Pang
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10027, USA
144
Olshen A, Cosman P, Rodrigo A, Bickel P, Olshen R. Vector quantization of amino acids: Analysis of the HIV V3 loop region. J Stat Plan Inference 2005. [DOI: 10.1016/j.jspi.2003.10.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
145
Yamaguchi-Kabata Y, Yamashita M, Ohkura S, Hayami M, Miura T. Linkage of amino acid variation and evolution of human immunodeficiency virus type 1 gp120 envelope glycoprotein (subtype B) with usage of the second receptor. J Mol Evol 2004; 58:333-40. [PMID: 15045488 DOI: 10.1007/s00239-003-2555-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/11/2002] [Accepted: 10/07/2003] [Indexed: 10/26/2022]
Abstract
To clarify the relationship between the amino acid variations of the gp120 of human immunodeficiency virus type 1 (HIV-1) and the chemokine receptors that are used as the second receptor for HIV, we evaluated amino acid site variation of gp120 between the X4 strains (use CXCR4) and the R5 strains (use CCR5) from 21 sequences of subtype B. Our analysis showed that residues 306 and 322 in the V3 loop and residue 440 in the C4 region were associated with usage of the second receptor. The polymorphism at residue 440 is clearly associated with the usage of the second receptor: The amino acid at position 440 was a basic amino acid in the R5 strains, and a nonbasic and smaller amino acid in the X4 strains, while the V3 loop of the X4 strains was more basic than that of the R5 strains. This suggests that residue 440 in the C4 region, which is close to the V3 loop in the three-dimensional structure, is critical in determining which second receptor is used. Analysis of codon frequency suggests that, in almost all cases, the difference at residue 440 between basic amino acids in the R5 strains and nonbasic amino acids in the X4 strains could be due to a single nucleotide change. These findings predict that the evolutionary changes in amino acid residue 440 may be correlated with evolutionary changes in the V3 loop. One possibility is that a change in electric charge at residue 440 compensates for a change in electric charge in the V3 loop. The amino acid polymorphism at position 440 can be useful to predict the cell tropism of a strain of HIV-1 subtype B.
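The charge comparison described above (a more basic V3 loop, and a basic versus nonbasic residue at position 440) can be approximated with a simple residue count. This is a rough sketch under stated assumptions: lysine and arginine count as +1, aspartate and glutamate as -1, histidine is ignored, and the peptides are invented for illustration rather than taken from the study.

```python
BASIC = set("KR")    # assumed positively charged residues
ACIDIC = set("DE")   # assumed negatively charged residues


def net_charge(seq: str) -> int:
    """Approximate net charge: count of K/R minus count of D/E."""
    seq = seq.upper()
    return sum(aa in BASIC for aa in seq) - sum(aa in ACIDIC for aa in seq)


def is_basic(residue: str) -> bool:
    """True if a single residue is lysine or arginine."""
    return residue.upper() in BASIC


# Hypothetical fragments, not real V3 loop sequences.
print(net_charge("KRDEA"))   # charges cancel
print(net_charge("KKRA"))    # net positive
print(is_basic("R"), is_basic("E"))
```

Computing `net_charge` for the loop and `is_basic` for the residue of interest across a set of sequences gives the two columns one would tabulate to look for the compensating pattern the abstract proposes.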
Affiliation(s)
- Yumi Yamaguchi-Kabata
- Integrated Database Group, Biological Information Research Center, National Institute of Advanced Industrial Science and Technology, 2-41-6 Aomi, Koto-ku, Tokyo 135-0064, Japan.
146
Daub CO, Steuer R, Selbig J, Kloska S. Estimating mutual information using B-spline functions--an improved similarity measure for analysing gene expression data. BMC Bioinformatics 2004; 5:118. [PMID: 15339346 PMCID: PMC516800 DOI: 10.1186/1471-2105-5-118] [Citation(s) in RCA: 194] [Impact Index Per Article: 9.7] [Received: 12/15/2003] [Accepted: 08/31/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The information theoretic concept of mutual information provides a general framework to evaluate dependencies between variables. In the context of the clustering of genes with similar patterns of expression it has been suggested as a general quantity of similarity to extend commonly used linear measures. Since mutual information is defined in terms of discrete variables, its application to continuous data requires the use of binning procedures, which can lead to significant numerical errors for datasets of small or moderate size. RESULTS: In this work, we propose a method for the numerical estimation of mutual information from continuous data. We investigate the characteristic properties arising from the application of our algorithm and show that our approach outperforms commonly used algorithms: The significance, as a measure of the power of distinction from random correlation, is significantly increased. This concept is subsequently illustrated on two large-scale gene expression datasets and the results are compared to those obtained using other similarity measures. A C++ source code of our algorithm is available for non-commercial use from kloska@scienion.de upon request. CONCLUSION: The utilisation of mutual information as similarity measure enables the detection of non-linear correlations in gene expression datasets. Frequently applied linear correlation measures, which are often used on an ad-hoc basis without further justification, are thereby extended.
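To make the binning issue concrete, the following is a minimal sketch of the plain equal-width binning estimator that the abstract describes as the common baseline. It is not the authors' B-spline method, and the function name `binned_mutual_information` and the default of four bins are illustrative assumptions.

```python
import math
from collections import Counter


def binned_mutual_information(x, y, bins=4):
    """Plug-in mutual information estimate (in bits) after equal-width binning.

    Illustrative baseline only: this is the simple binning approach, not the
    B-spline estimator proposed in the paper.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")

    def to_bins(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant data
        # The maximum value would land in bin index `bins`; clamp it into the last bin.
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = to_bins(x), to_bins(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)

    mi = 0.0
    for (i, j), count in joint.items():
        # p(i, j) * log2( p(i, j) / (p(i) * p(j)) ), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[i] * py[j]))
    return mi


# A symmetric parabola has zero linear correlation with x but non-zero mutual information.
xs = [float(v) for v in range(8)]
parabola = [(v - 3.5) ** 2 for v in xs]
mi_parabola = binned_mutual_information(xs, parabola)
```

Each sample is assigned to exactly one bin here, so the estimate depends strongly on where the bin edges fall when the dataset is small.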
Affiliation(s)
- Carsten O Daub
- Max Planck Institute of Molecular Plant Physiology, Potsdam, 14424, Germany
- Center for Genomics and Bioinformatics, Karolinska Institutet, Stockholm, 17177, Sweden
- Ralf Steuer
- Nonlinear Dynamics Group, Institute of Physics, University of Potsdam, Potsdam, 14415, Germany
- Joachim Selbig
- Max Planck Institute of Molecular Plant Physiology, Potsdam, 14424, Germany
- Sebastian Kloska
- Max Planck Institute of Molecular Plant Physiology, Potsdam, 14424, Germany
- Scienion AG, Volmerstrasse 7a, Berlin, 12489, Germany
|
148
|
Abstract
We have examined patterns of sequence variability for evidence of linked sequence changes in HIV-1 subtype B protease using translated sequences from protease inhibitor (PI) treated and untreated subjects downloaded from the Stanford HIV RT and Protease Sequence Database (http://hivdb.stanford.edu). The final data set size was 648 sequences from untreated subjects (notx) and 531 for PI-treated subjects (tx). Each subject was uniquely represented by a single sequence. Mutual information was calculated for all pairwise comparisons of positions with nonconsensus amino acids in at least 5% of sequences; significance of pairwise association was assessed using permutation tests. In addition, pairs of positions were assessed for linkage by comparing the observed occurrences of amino acid combinations to expected values. The mutual information statistic indicated linkage between nine pairs of sites in the untreated data set (10:93, 12:19, 35:38, 37:41, 62:71, 63:64, 71:77, 71:93, 77:93). Strong statistical support for linkage in the treated data set was seen for 32 pairs, eight involving position 10, seven involving position 71, with the rest being 12:19, 15:77, 20:36, 30:88, 35:36, 35:37, 36:62, 36:77, 46:82, 46:84, 48:54, 48:82, 54:82, 63:64, 63:90, 73:90, 77:93, and 84:90. Most associations were positive, although negative associations were seen for five pairs of interactions. Structural proximity suggests that numerous pairs may interact within a local environment. These interactions include two distinct clusters around 36/77 and 71/93. While some of these interactions may reflect fortuitous linkage in heavily treated subjects with many resistance mutations, others will likely represent important cooperative interactions that are amenable to experimental validation.
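The pairwise test described in this abstract can be sketched roughly as follows, assuming each alignment position is available as a sequence of one-letter amino acid codes with one entry per subject. The function names, the number of permutations, and the toy columns are illustrative assumptions, not the authors' implementation.

```python
import math
import random
from collections import Counter


def column_mi(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    joint = Counter(zip(col_a, col_b))
    pa, pb = Counter(col_a), Counter(col_b)
    return sum(
        (count / n) * math.log2(count * n / (pa[a] * pb[b]))
        for (a, b), count in joint.items()
    )


def permutation_p_value(col_a, col_b, n_perm=1000, seed=0):
    """Return the observed MI and a permutation p-value for it.

    Shuffling one column breaks any pairing between positions while
    preserving each column's amino acid frequencies.
    """
    rng = random.Random(seed)
    observed = column_mi(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point round-off.
        if column_mi(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy columns (hypothetical data): perfectly linked versus unlinked residues.
linked_mi, linked_p = permutation_p_value("L" * 20 + "I" * 20, "V" * 20 + "A" * 20)
unlinked_mi, unlinked_p = permutation_p_value("LLII" * 10, "VAVA" * 10)
```

A full analysis along the lines of the abstract would first restrict the comparison to positions with nonconsensus residues in at least 5% of sequences and would need to account for the number of pairs tested.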
Affiliation(s)
- Noah G Hoffman
- UNC Center for AIDS Research, University of North Carolina, Chapel Hill, NC 27599-7295, USA
|
149
|
Date SV, Marcotte EM. Discovery of uncharacterized cellular systems by genome-wide analysis of functional linkages. Nat Biotechnol 2003; 21:1055-62. [PMID: 12923548 DOI: 10.1038/nbt861] [Citation(s) in RCA: 150] [Impact Index Per Article: 7.1] [Received: 03/14/2003] [Accepted: 06/24/2003] [Indexed: 11/08/2022]
Abstract
We introduce a general computational method, applicable on a genome-wide scale, for the systematic discovery of uncharacterized cellular systems. Quantitative analysis of the coinheritance of pairs of genes among different organisms, calculated using phylogenetic profiles, allows the prediction of thousands of functional linkages between the corresponding proteins. A comparison of these functional linkages to known pathways reveals that calculated linkages are comparable in accuracy to genome-wide yeast two-hybrid screens or mass spectrometry interaction assays. In aggregate, these linkages describe the structure of large-scale networks, with the resulting yeast network composed of 3,875 linkages among 804 proteins, and the resulting pathogenic Escherichia coli network composed of 2,043 linkages among 828 proteins. The search of such networks for groups of uncharacterized, linked proteins led to the identification of 27 novel cellular systems from one nonpathogenic and three pathogenic bacterial genomes.
Affiliation(s)
- Shailesh V Date
- Center for Computational Biology and Bioinformatics, Institute for Cellular and Molecular Biology, 1 University Station A4800, Austin, Texas 78712-1064, USA
|
150
|
Upadhya SC, Hegde AN. A potential proteasome-interacting motif within the ubiquitin-like domain of parkin and other proteins. Trends Biochem Sci 2003; 28:280-3. [PMID: 12826399 DOI: 10.1016/s0968-0004(03)00092-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
Parkin and other unrelated proteins contain a ubiquitin-like domain (UbLD). This article describes a motif that might be important in the interaction of UbLD-containing proteins (UbLPs) with the proteasome. The proteasome-interacting motif, which is conserved in a subset of UbLPs, such as parkin, Rad23 and several transcription factors, is likely to enable the UbLPs to form a complex with the proteasome for proteolysis or the recently discovered non-proteolytic functions of the proteasome.
Affiliation(s)
- Sudarshan C Upadhya
- Department of Neurobiology and Anatomy, Wake Forest University Health Sciences, Medical Center Boulevard, Winston-Salem, NC 27157, USA
|