1
|
O'Neil PT, Swint‐Kruse L, Fenton AW. Rheostatic contributions to protein stability can obscure a position's functional role. Protein Sci 2024; 33:e5075. [PMID: 38895978 PMCID: PMC11187868 DOI: 10.1002/pro.5075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2024] [Revised: 05/24/2024] [Accepted: 05/27/2024] [Indexed: 06/21/2024]
Abstract
Rheostat positions, which can be substituted with various amino acids to tune protein function across a range of outcomes, are a developing area for advancing personalized medicine and bioengineering. Current methods cannot accurately predict which proteins contain rheostat positions or their substitution outcomes. To compare the prevalence of rheostat positions in homologs, we previously investigated their occurrence in two pyruvate kinase (PYK) isozymes. Human liver PYK contained numerous rheostat positions that tuned the apparent affinity for the substrate phosphoenolpyruvate (Kapp-PEP) across a wide range. In contrast, no functional rheostat positions were identified in Zymomonas mobilis PYK (ZmPYK). Further, the set of ZmPYK substitutions included an unusually large number that lacked measurable activity. We hypothesized that the inactive substitution variants had reduced protein stability, precluding detection of Kapp-PEP tuning. Using modified buffers, robust enzymatic activity was obtained for 19 previously-inactive ZmPYK substitution variants at three positions. Surprisingly, both previously-inactive and previously-active substitution variants all had Kapp-PEP values close to wild-type. Thus, none of the three positions were functional rheostat positions, and, unlike human liver PYK, ZmPYK's Kapp-PEP remained poorly tunable by single substitutions. To directly assess effects on stability, we performed thermal denaturation experiments for all ZmPYK substitution variants. Many diminished stability, two enhanced stability, and the three positions showed different thermal sensitivity to substitution, with one position acting as a "stability rheostat." The differences between the two PYK homologs raises interesting questions about the underlying mechanism(s) that permit functional tuning by single substitutions in some proteins but not in others.
Collapse
Affiliation(s)
- Pierce T. O'Neil
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansasUSA
| | - Liskin Swint‐Kruse
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansasUSA
| | - Aron W. Fenton
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansasUSA
| |
Collapse
|
2
|
Swint-Kruse L, Dougherty LL, Page B, Wu T, O’Neil PT, Prasannan CB, Timmons C, Tang Q, Parente DJ, Sreenivasan S, Holyoak T, Fenton AW. PYK-SubstitutionOME: an integrated database containing allosteric coupling, ligand affinity and mutational, structural, pathological, bioinformatic and computational information about pyruvate kinase isozymes. Database (Oxford) 2023; 2023:baad030. [PMID: 37171062 PMCID: PMC10176505 DOI: 10.1093/database/baad030] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2022] [Revised: 03/29/2023] [Accepted: 04/11/2023] [Indexed: 05/13/2023]
Abstract
Interpreting changes in patient genomes, understanding how viruses evolve and engineering novel protein function all depend on accurately predicting the functional outcomes that arise from amino acid substitutions. To that end, the development of first-generation prediction algorithms was guided by historic experimental datasets. However, these datasets were heavily biased toward substitutions at positions that have not changed much throughout evolution (i.e. conserved). Although newer datasets include substitutions at positions that span a range of evolutionary conservation scores, these data are largely derived from assays that agglomerate multiple aspects of function. To facilitate predictions from the foundational chemical properties of proteins, large substitution databases with biochemical characterizations of function are needed. We report here a database derived from mutational, biochemical, bioinformatic, structural, pathological and computational studies of a highly studied protein family-pyruvate kinase (PYK). A centerpiece of this database is the biochemical characterization-including quantitative evaluation of allosteric regulation-of the changes that accompany substitutions at positions that sample the full conservation range observed in the PYK family. We have used these data to facilitate critical advances in the foundational studies of allosteric regulation and protein evolution and as rigorous benchmarks for testing protein predictions. We trust that the collected dataset will be useful for the broader scientific community in the further development of prediction algorithms. Database URL https://github.com/djparente/PYK-DB.
Collapse
Affiliation(s)
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Larissa L Dougherty
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Braelyn Page
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Tiffany Wu
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Pierce T O’Neil
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Charulata B Prasannan
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Cody Timmons
- Chemistry Department, Southwestern Oklahoma State University, 100 Campus Dr., Weatherford, OK 73096, USA
| | - Qingling Tang
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Daniel J Parente
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Department of Family Medicine and Community Health, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Shwetha Sreenivasan
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| | - Todd Holyoak
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
- Department of Biology, University of Waterloo, 200 University Ave. W, Waterloo, ON N2L 3G1, Canada
| | - Aron W Fenton
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS 66160, USA
| |
Collapse
|
3
|
Pan-cancer functional analysis of somatic mutations in G protein-coupled receptors. Sci Rep 2022; 12:21534. [PMID: 36513718 PMCID: PMC9747925 DOI: 10.1038/s41598-022-25323-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/10/2022] [Accepted: 11/28/2022] [Indexed: 12/15/2022] Open
Abstract
G Protein-coupled receptors (GPCRs) are the most frequently exploited drug target family, moreover they are often found mutated in cancer. Here we used a dataset of mutations found in patient samples derived from the Genomic Data Commons and compared it to the natural human variance as exemplified by data from the 1000 genomes project. We explored cancer-related mutation patterns in all GPCR classes combined and individually. While the location of the mutations across the protein domains did not differ significantly in the two datasets, a mutation enrichment in cancer patients was observed among class-specific conserved motifs in GPCRs such as the Class A "DRY" motif. A Two-Entropy Analysis confirmed the correlation between residue conservation and cancer-related mutation frequency. We subsequently created a ranking of high scoring GPCRs, using a multi-objective approach (Pareto Front Ranking). Our approach was confirmed by re-discovery of established cancer targets such as the LPA and mGlu receptor families, but also discovered novel GPCRs which had not been linked to cancer before such as the P2Y Receptor 10 (P2RY10). Overall, this study presents a list of GPCRs that are amenable to experimental follow up to elucidate their role in cancer.
Collapse
|
4
|
Fenton KD, Meneely KM, Wu T, Martin TA, Swint‐Kruse L, Fenton AW, Lamb AL. Substitutions at a rheostat position in human aldolase A cause a shift in the conformational population. Protein Sci 2022; 31:357-370. [PMID: 34734672 PMCID: PMC8819835 DOI: 10.1002/pro.4222] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 10/28/2021] [Accepted: 10/28/2021] [Indexed: 02/03/2023]
Abstract
Some protein positions play special roles in determining the magnitude of protein function: at such "rheostat" positions, varied amino acid substitutions give rise to a continuum of functional outcomes, from wild type (or enhanced), to intermediate, to loss of function. This observed range raises interesting questions about the biophysical bases by which changes at single positions have such varied outcomes. Here, we assessed variants at position 98 in human aldolase A ("I98X"). Despite being ~17 Å from the active site and far from subunit interfaces, substitutions at position 98 have rheostatic contributions to the apparent cooperativity (nH ) associated with fructose-1,6-bisphosphate substrate binding and moderately affected binding affinity. Next, we crystallized representative I98X variants to assess structural consequences. Residues smaller than the native isoleucine (cysteine and serine) were readily accommodated, and the larger phenylalanine caused only a slight separation of the two parallel helixes. However, the diffraction quality was reduced for I98F, and further reduced for I98Y. Intriguingly, the resolutions of the I98X structures correlated with their nH values. We propose that substitution effects on both nH and crystal lattice disruption arise from changes in the population of aldolase A conformations in solution. In combination with results computed for rheostat positions in other proteins, the results from this study suggest that rheostat positions accommodate a wide range of side chains and that structural consequences manifest as shifted ensemble populations and/or dynamics changes.
Collapse
Affiliation(s)
- Kathryn D. Fenton
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansas CityKansasUSA
| | - Kathleen M. Meneely
- Department of ChemistryUniversity of Texas at San AntonioSan AntonioTexasUSA
| | - Tiffany Wu
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansas CityKansasUSA
| | - Tyler A. Martin
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansas CityKansasUSA
| | - Liskin Swint‐Kruse
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansas CityKansasUSA
| | - Aron W. Fenton
- Department of Biochemistry and Molecular BiologyThe University of Kansas Medical CenterKansas CityKansasUSA
| | - Audrey L. Lamb
- Department of ChemistryUniversity of Texas at San AntonioSan AntonioTexasUSA
| |
Collapse
|
5
|
Gu J, Isozumi N, Yuan S, Jin L, Gao B, Ohki S, Zhu S. Evolution-Based Protein Engineering for Antifungal Peptide Improvement. Mol Biol Evol 2021; 38:5175-5189. [PMID: 34320203 PMCID: PMC8557468 DOI: 10.1093/molbev/msab224] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/01/2023] Open
Abstract
Antimicrobial peptides (AMPs) have been considered as the alternatives to antibiotics because of their less susceptibility to microbial resistance. However, compared with conventional antibiotics they show relatively low activity and the consequent high cost and nonspecific cytotoxicity, hindering their clinical application. What’s more, engineering of AMPs is a great challenge due to the inherent complexity in their sequence, structure, and function relationships. Here, we report an evolution-based strategy for improving the antifungal activity of a nematode-sourced defensin (Cremycin-5). This strategy utilizes a sequence-activity comparison between Cremycin-5 and its functionally diverged paralogs to identify sites associated with antifungal activity for screening of enhanceable activity-modulating sites for subsequent saturation mutagenesis. Using this strategy, we identified a site (Glu-15) whose mutations with nearly all other types of amino acids resulted in a universally enhanced activity against multiple fungal species, which is thereby defined as a Universally Enhanceable Activity-Modulating Site (UEAMS). Especially, Glu15Lys even exhibited >9-fold increased fungicidal potency against several clinical isolates of Candida albicans through inhibiting cytokinesis. This mutant showed high thermal and serum stability and quicker killing kinetics than clotrimazole without detectable hemolysis. Molecular dynamic simulations suggest that the mutations at the UEAMS likely limit the conformational flexibility of a distant functional residue via allostery, enabling a better peptide–fungus interaction. Further sequence, structural, and mutational analyses of the Cremycin-5 ortholog uncover an epistatic interaction between the UEAMS and another site that may constrain its evolution. Our work lights one new road to success of engineering AMP drug leads.
Collapse
Affiliation(s)
- Jing Gu
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Noriyoshi Isozumi
- Center for Nano Materials and Technology (CNMT), Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
| | - Shouli Yuan
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Ling Jin
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100049, China
| | - Bin Gao
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
| | - Shinya Ohki
- Center for Nano Materials and Technology (CNMT), Japan Advanced Institute of Science and Technology (JAIST), 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
| | - Shunyi Zhu
- Group of Peptide Biology and Evolution, State Key Laboratory of Integrated Management of Pest Insects and Rodents, Institute of Zoology, Chinese Academy of Sciences, 1 Beichen West Road, Chaoyang District, Beijing 100101, China
| |
Collapse
|
6
|
Swint-Kruse L, Martin TA, Page BM, Wu T, Gerhart PM, Dougherty LL, Tang Q, Parente DJ, Mosier BR, Bantis LE, Fenton AW. Rheostat functional outcomes occur when substitutions are introduced at nonconserved positions that diverge with speciation. Protein Sci 2021; 30:1833-1853. [PMID: 34076313 DOI: 10.1002/pro.4136] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Revised: 05/25/2021] [Accepted: 05/28/2021] [Indexed: 12/14/2022]
Abstract
When amino acids vary during evolution, the outcome can be functionally neutral or biologically-important. We previously found that substituting a subset of nonconserved positions, "rheostat" positions, can have surprising effects on protein function. Since changes at rheostat positions can facilitate functional evolution or cause disease, more examples are needed to understand their unique biophysical characteristics. Here, we explored whether "phylogenetic" patterns of change in multiple sequence alignments (such as positions with subfamily specific conservation) predict the locations of functional rheostat positions. To that end, we experimentally tested eight phylogenetic positions in human liver pyruvate kinase (hLPYK), using 10-15 substitutions per position and biochemical assays that yielded five functional parameters. Five positions were strongly rheostatic and three were non-neutral. To test the corollary that positions with low phylogenetic scores were not rheostat positions, we combined these phylogenetic positions with previously-identified hLPYK rheostat, "toggle" (most substitution abolished function), and "neutral" (all substitutions were like wild-type) positions. Despite representing 428 variants, this set of 33 positions was poorly statistically powered. Thus, we turned to the in vivo phenotypic dataset for E. coli lactose repressor protein (LacI), which comprised 12-13 substitutions at 329 positions and could be used to identify rheostat, toggle, and neutral positions. Combined hLPYK and LacI results show that positions with strong phylogenetic patterns of change are more likely to exhibit rheostat substitution outcomes than neutral or toggle outcomes. Furthermore, phylogenetic patterns were more successful at identifying rheostat positions than were co-evolutionary or eigenvector centrality measures of evolutionary change.
Collapse
Affiliation(s)
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Tyler A Martin
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Braelyn M Page
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Tiffany Wu
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Paige M Gerhart
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Larissa L Dougherty
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA.,Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth College, Hanover, New Hampshire, USA
| | - Qingling Tang
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Daniel J Parente
- Department of Family Medicine and Community Health, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Brian R Mosier
- Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Leonidas E Bantis
- Department of Biostatistics and Data Science, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Aron W Fenton
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| |
Collapse
|
7
|
Ruggiero MJ, Malhotra S, Fenton AW, Swint-Kruse L, Karanicolas J, Hagenbuch B. A clinically relevant polymorphism in the Na +/taurocholate cotransporting polypeptide (NTCP) occurs at a rheostat position. J Biol Chem 2020; 296:100047. [PMID: 33168628 PMCID: PMC7948949 DOI: 10.1074/jbc.ra120.014889] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2020] [Revised: 10/22/2020] [Accepted: 11/09/2020] [Indexed: 12/28/2022] Open
Abstract
Conventionally, most amino acid substitutions at “important” protein positions are expected to abolish function. However, in several soluble-globular proteins, we identified a class of nonconserved positions for which various substitutions produced progressive functional changes; we consider these evolutionary “rheostats”. Here, we report a strong rheostat position in the integral membrane protein, Na+/taurocholate (TCA) cotransporting polypeptide, at the site of a pharmacologically relevant polymorphism (S267F). Functional studies were performed for all 20 substitutions (S267X) with three substrates (TCA, estrone-3-sulfate, and rosuvastatin). The S267X set showed strong rheostatic effects on overall transport, and individual substitutions showed varied effects on transport kinetics (Km and Vmax) and substrate specificity. To assess protein stability, we measured surface expression and used the Rosetta software (https://www.rosettacommons.org) suite to model structure and stability changes of S267X. Although buried near the substrate-binding site, S267X substitutions were easily accommodated in the Na+/TCA cotransporting polypeptide structure model. Across the modest range of changes, calculated stabilities correlated with surface-expression differences, but neither parameter correlated with altered transport. Thus, substitutions at rheostat position 267 had wide-ranging effects on the phenotype of this integral membrane protein. We further propose that polymorphic positions in other proteins might be locations of rheostat positions.
Collapse
Affiliation(s)
- Melissa J Ruggiero
- Department of Pharmacology, Toxicology and Therapeutics, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Shipra Malhotra
- Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, Pennsylvania, USA; Center for Computational Biology, University of Kansas, Lawrence, Kansas, USA
| | - Aron W Fenton
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - John Karanicolas
- Program in Molecular Therapeutics, Fox Chase Cancer Center, Philadelphia, Pennsylvania, USA
| | - Bruno Hagenbuch
- Department of Pharmacology, Toxicology and Therapeutics, The University of Kansas Medical Center, Kansas City, Kansas, USA.
| |
Collapse
|
8
|
Fenton AW, Page BM, Spellman-Kruse A, Hagenbuch B, Swint-Kruse L. Rheostat positions: A new classification of protein positions relevant to pharmacogenomics. Med Chem Res 2020; 29:1133-1146. [PMID: 32641900 PMCID: PMC7276102 DOI: 10.1007/s00044-020-02582-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2020] [Accepted: 05/30/2020] [Indexed: 01/03/2023]
Abstract
To achieve the full potential of pharmacogenomics, one must accurately predict the functional out comes that arise from amino acid substitutions in proteins. Classically, researchers have focused on understanding the consequences of individual substitutions. However, literature surveys have shown that most substitutions were created at evolutionarily conserved positions. Awareness of this bias leads to a shift in perspective, from considering the outcomes of individual substitutions to understanding the roles of individual protein positions. Conserved positions tend to act as "toggle" switches, with most substitutions abolishing function. However, nonconserved positions have been found equally capable of affecting protein function. Indeed, many nonconserved positions act like functional dimmer switches ("rheostat" positions): This is revealed when multiple substitutions are made at a single position. Each substitution has a different functional outcome; the set of substitutions spans arange of outcomes. Finally, some nonconserved positions appear neutral, capable of accommodating all amino acid types without modifying function. This manuscript reviews the currently-known properties of rheost at positions, with examples shown for pyruvate kinase, organic anion transporting polypeptide 1B1, the beta-lactamase inhibitory protein, and angiotensin-converting enzyme 2. Outcomes observed for rheostat positions have implications for the rational design of drug analogs and allosteric drugs. Furthermore, this new framework - comprising three types of protein positions - provides a new approach to interpreting disease and population-based databases of amino acid changes. In conclusion, although a full understanding of substitution out comes at rheostat positions poses a challenge, utilization of this new frame of reference will further advance the application of pharmacogenomics.
Collapse
Affiliation(s)
- Aron W. Fenton
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160 USA
| | - Braelyn M. Page
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160 USA
| | | | - Bruno Hagenbuch
- Department of Pharmacology, Toxicology, and Therapeutics, The University of Kansas Medical Center, Kansas City, KS 66160 USA
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160 USA
| |
Collapse
|
9
|
Martin TA, Wu T, Tang Q, Dougherty LL, Parente DJ, Swint-Kruse L, Fenton AW. Identification of biochemically neutral positions in liver pyruvate kinase. Proteins 2020; 88:1340-1350. [PMID: 32449829 DOI: 10.1002/prot.25953] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2019] [Revised: 03/10/2020] [Accepted: 05/16/2020] [Indexed: 01/08/2023]
Abstract
Understanding how each residue position contributes to protein function has been a long-standing goal in protein science. Substitution studies have historically focused on conserved protein positions. However, substitutions of nonconserved positions can also modify function. Indeed, we recently identified nonconserved positions that have large substitution effects in human liver pyruvate kinase (hLPYK), including altered allosteric coupling. To facilitate a comparison of which characteristics determine when a nonconserved position does vs does not contribute to function, the goal of the current work was to identify neutral positions in hLPYK. However, existing hLPYK data showed that three features commonly associated with neutral positions-high sequence entropy, high surface exposure, and alanine scanning-lacked the sensitivity needed to guide experimental studies. We used multiple evolutionary patterns identified in a sequence alignment of the PYK family to identify which positions were least patterned, reasoning that these were most likely to be neutral. Nine positions were tested with a total of 117 amino acid substitutions. Although exploring all potential functions is not feasible for any protein, five parameters associated with substrate/effector affinities and allosteric coupling were measured for hLPYK variants. For each position, the aggregate functional outcomes of all variants were used to quantify a "neutrality" score. Three positions showed perfect neutral scores for all five parameters. Furthermore, the nine positions showed larger neutral scores than 17 positions located near allosteric binding sites. Thus, our strategy successfully enriched the dataset for positions with neutral and modest substitutions.
Collapse
Affiliation(s)
- Tyler A Martin
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Tiffany Wu
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Qingling Tang
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Larissa L Dougherty
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Daniel J Parente
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA.,Department of Family and Community Medicine, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| | - Aron W Fenton
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, USA
| |
Collapse
|
10
|
Garrido-Martín D, Pazos F. Effect of the sequence data deluge on the performance of methods for detecting protein functional residues. BMC Bioinformatics 2018; 19:67. [PMID: 29482506 PMCID: PMC5827975 DOI: 10.1186/s12859-018-2084-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2017] [Accepted: 02/21/2018] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The exponential accumulation of new sequences in public databases is expected to improve the performance of all the approaches for predicting protein structural and functional features. Nevertheless, this was never assessed or quantified for some widely used methodologies, such as those aimed at detecting functional sites and functional subfamilies in protein multiple sequence alignments. Using raw protein sequences as only input, these approaches can detect fully conserved positions, as well as those with a family-dependent conservation pattern. Both types of residues are routinely used as predictors of functional sites and, consequently, understanding how the sequence content of the databases affects them is relevant and timely. RESULTS In this work we evaluate how the growth and change with time in the content of sequence databases affect five sequence-based approaches for detecting functional sites and subfamilies. We do that by recreating historical versions of the multiple sequence alignments that would have been obtained in the past based on the database contents at different time points, covering a period of 20 years. Applying the methods to these historical alignments allows quantifying the temporal variation in their performance. Our results show that the number of families to which these methods can be applied sharply increases with time, while their ability to detect potentially functional residues remains almost constant. CONCLUSIONS These results are informative for the methods' developers and final users, and may have implications in the design of new sequencing initiatives.
Collapse
Affiliation(s)
- Diego Garrido-Martín
- Present address: Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, c/ Dr. Aiguader, 88, 08003, Barcelona, Spain.,Present address: Universitat Pompeu Fabra (UPF), Plaça de la Mercè, 10-12, 08002, Barcelona, Spain
| | - Florencio Pazos
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), c/ Darwin, 3, 28049, Madrid, Spain.
| |
Collapse
|
11
|
Swint-Kruse L. Using Evolution to Guide Protein Engineering: The Devil IS in the Details. Biophys J 2017; 111:10-8. [PMID: 27410729 DOI: 10.1016/j.bpj.2016.05.030] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2015] [Revised: 04/18/2016] [Accepted: 05/20/2016] [Indexed: 10/21/2022] Open
Abstract
For decades, protein engineers have endeavored to reengineer existing proteins for novel applications. Overall, protein folds and gross functions can be readily transferred from one protein to another by transplanting large blocks of sequence (i.e., domain recombination). However, predictably fine-tuning function (e.g., by adjusting ligand affinity, specificity, catalysis, and/or allosteric regulation) remains a challenge. One approach has been to use the sequences of protein families to identify amino acid positions that change during the evolution of functional variation. The rationale is that these nonconserved positions could be mutated to predictably fine-tune function. Evolutionary approaches to protein design have had some success, but the engineered proteins seldom replicate the functional performances of natural proteins. This Biophysical Perspective reviews several complexities that have been revealed by evolutionary and experimental studies of protein function. These include 1) challenges in defining computational and biological thresholds that define important amino acids; 2) the co-occurrence of many different patterns of amino acid changes in evolutionary data; 3) difficulties in mapping the patterns of amino acid changes to discrete functional parameters; 4) the nonconventional mutational outcomes that occur for a particular group of functionally important, nonconserved positions; 5) epistasis (nonadditivity) among multiple mutations; and 6) the fact that a large fraction of a protein's amino acids contribute to its overall function. To overcome these challenges, new goals are identified for future studies.
Collapse
Affiliation(s)
- Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas.
| |
Collapse
|
12
|
Sloutsky R, Naegle KM. High-Resolution Identification of Specificity Determining Positions in the LacI Protein Family Using Ensembles of Sub-Sampled Alignments. PLoS One 2016; 11:e0162579. [PMID: 27681038 PMCID: PMC5040260 DOI: 10.1371/journal.pone.0162579] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2016] [Accepted: 08/08/2016] [Indexed: 01/24/2023] Open
Abstract
Since the advent of large-scale genomic sequencing, and the consequent availability of large numbers of homologous protein sequences, there has been burgeoning development of methods for extracting functional information from multiple sequence alignments (MSAs). One type of analysis seeks to identify specificity determining positions (SDPs) based on the assumption that such positions are highly conserved within groups of sequences sharing functional specificity, but conserved to different amino acids in different specificity groups. This unsupervised approach to utilizing evolutionary information may elucidate mechanisms of specificity in protein-protein interactions, catalytic activity of enzymes, sensitivity to allosteric regulation, and other types of protein functionality. We present an analysis of SDPs in the LacI family of transcriptional regulators in which we 1) relax the constraint that all specificity groups must contribute to SDP signal, and 2) use a novel approach to robust treatment of sequence alignment uncertainty based on sub-sampling. We find that the vast majority of SDP signal occurs at positions with a conservation pattern that significantly complicates detection by previously described methods. This pattern, which we term “partial SDP”, consists of the commonly accepted SDP conservation pattern among a subset of specificity groups and strong degeneracy among the rest. An upshot of this fact is that the SDP complement of every specificity group appears to be unique. Additionally, sub-sampling gives us the ability to assign a confidence interval to the SDP score, as well as increase fidelity, as compared to analysis of a single, comprehensive alignment—the current standard in multiple sequence alignment methodologies.
Collapse
Affiliation(s)
- Roman Sloutsky
- Biomedical Engineering Department, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
- Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
| | - Kristen M. Naegle
- Biomedical Engineering Department, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
- Center for Biological Systems Engineering, Washington University in St. Louis, St. Louis, Missouri, 63130, United States of America
- * E-mail:
| |
Collapse
|
13
|
Parente DJ, Ray JCJ, Swint-Kruse L. Amino acid positions subject to multiple coevolutionary constraints can be robustly identified by their eigenvector network centrality scores. Proteins 2015; 83:2293-306. [PMID: 26503808 DOI: 10.1002/prot.24948] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2015] [Revised: 09/21/2015] [Accepted: 10/14/2015] [Indexed: 12/21/2022]
Abstract
As proteins evolve, amino acid positions key to protein structure or function are subject to mutational constraints. These positions can be detected by analyzing sequence families for amino acid conservation or for coevolution between pairs of positions. Coevolutionary scores are usually rank-ordered and thresholded to reveal the top pairwise scores, but they also can be treated as weighted networks. Here, we used network analyses to bypass a major complication of coevolution studies: For a given sequence alignment, alternative algorithms usually identify different, top pairwise scores. We reconciled results from five commonly-used, mathematically divergent algorithms (ELSC, McBASC, OMES, SCA, and ZNMI), using the LacI/GalR and 1,6-bisphosphate aldolase protein families as models. Calculations used unthresholded coevolution scores from which column-specific properties such as sequence entropy and random noise were subtracted; "central" positions were identified by calculating various network centrality scores. When compared among algorithms, network centrality methods, particularly eigenvector centrality, showed markedly better agreement than comparisons of the top pairwise scores. Positions with large centrality scores occurred at key structural locations and/or were functionally sensitive to mutations. Further, the top central positions often differed from those with top pairwise coevolution scores: instead of a few strong scores, central positions often had multiple, moderate scores. We conclude that eigenvector centrality calculations reveal a robust evolutionary pattern of constraints-detectable by divergent algorithms--that occur at key protein locations. Finally, we discuss the fact that multiple patterns coexist in evolutionary data that, together, give rise to emergent protein functions.
Collapse
Affiliation(s)
- Daniel J Parente
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
| | - J Christian J Ray
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, 66047
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, Kansas, 66160
| |
Collapse
|
14
|
Karasev DA, Veselovsky AV, Oparina NY, Filimonov DA, Sobolev BN. Prediction of amino acid positions specific for functional groups in a protein family based on local sequence similarity. J Mol Recognit 2015; 29:159-69. [DOI: 10.1002/jmr.2515] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2015] [Revised: 09/28/2015] [Accepted: 09/30/2015] [Indexed: 01/24/2023]
Affiliation(s)
- Dmitry A. Karasev
- Russian National Research Medical University; Moscow Russia
- Laboratory of Structure-Function Based Drug Design; Institute of Biomedical Chemistry (IBMC); Moscow Russia
| | - Alexander V. Veselovsky
- Laboratory of Structure Bioinformatics; Institute of Biomedical Chemistry (IBMC); Moscow Russia
| | - Nina Yu. Oparina
- Department of Medical Biochemistry and Microbiology; Uppsala University; Uppsala Sweden
- Engelhardt Institute of Molecular Biology; Moscow Russia
| | - Dmitry A. Filimonov
- Laboratory of Structure Bioinformatics; Institute of Biomedical Chemistry (IBMC); Moscow Russia
| | - Boris N. Sobolev
- Laboratory of Structure-Function Based Drug Design; Institute of Biomedical Chemistry (IBMC); Moscow Russia
| |
Collapse
|
15
|
AlloRep: A Repository of Sequence, Structural and Mutagenesis Data for the LacI/GalR Transcription Regulators. J Mol Biol 2015; 428:671-678. [PMID: 26410588 DOI: 10.1016/j.jmb.2015.09.015] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2015] [Revised: 09/04/2015] [Accepted: 09/17/2015] [Indexed: 11/20/2022]
Abstract
Protein families evolve functional variation by accumulating point mutations at functionally important amino acid positions. Homologs in the LacI/GalR family of transcription regulators have evolved to bind diverse DNA sequences and allosteric regulatory molecules. In addition to playing key roles in bacterial metabolism, these proteins have been widely used as a model family for benchmarking structural and functional prediction algorithms. We have collected manually curated sequence alignments for >3000 sequences, in vivo phenotypic and biochemical data for >5750 LacI/GalR mutational variants, and noncovalent residue contact networks for 65 LacI/GalR homolog structures. Using this rich data resource, we compared the noncovalent residue contact networks of the LacI/GalR subfamilies to design and experimentally validate an allosteric mutant of a synthetic LacI/GalR repressor for use in biotechnology. The AlloRep database (freely available at www.AlloRep.org) is a key resource for future evolutionary studies of LacI/GalR homologs and for benchmarking computational predictions of functional change.
Collapse
|
16
|
Parente DJ, Swint-Kruse L. Multiple co-evolutionary networks are supported by the common tertiary scaffold of the LacI/GalR proteins. PLoS One 2013; 8:e84398. [PMID: 24391951 PMCID: PMC3877293 DOI: 10.1371/journal.pone.0084398] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2013] [Accepted: 11/15/2013] [Indexed: 11/18/2022] Open
Abstract
Protein families might evolve paralogous functions on their common tertiary scaffold in two ways. First, the locations of functionally-important sites might be "hard-wired" into the structure, with novel functions evolved by altering the amino acid (e.g. Ala vs Ser) at these positions. Alternatively, the tertiary scaffold might be adaptable, accommodating a unique set of functionally important sites for each paralogous function. To discriminate between these possibilities, we compared the set of functionally important sites in the six largest paralogous subfamilies of the LacI/GalR transcription repressor family. LacI/GalR paralogs share a common tertiary structure, but have low sequence identity (≤ 30%), and regulate a variety of metabolic processes. Functionally important positions were identified by conservation and co-evolutionary sequence analyses. Results showed that conserved positions use a mixture of the "hard-wired" and "accommodating" scaffold frameworks, but that the co-evolution networks were highly dissimilar between any pair of subfamilies. Therefore, the tertiary structure can accommodate multiple networks of functionally important positions. This possibility should be included when designing and interpreting sequence analyses of other protein families. Software implementing conservation and co-evolution analyses is available at https://sourceforge.net/projects/coevolutils/.
Collapse
Affiliation(s)
- Daniel J. Parente
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
- * E-mail:
| |
Collapse
|
17
|
Meinhardt S, Manley MW, Parente DJ, Swint-Kruse L. Rheostats and toggle switches for modulating protein function. PLoS One 2013; 8:e83502. [PMID: 24386217 PMCID: PMC3875437 DOI: 10.1371/journal.pone.0083502] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2013] [Accepted: 11/03/2013] [Indexed: 01/08/2023] Open
Abstract
The millions of protein sequences generated by genomics are expected to transform protein engineering and personalized medicine. To achieve these goals, tools for predicting outcomes of amino acid changes must be improved. Currently, advances are hampered by insufficient experimental data about nonconserved amino acid positions. Since the property “nonconserved” is identified using a sequence alignment, we designed experiments to recapitulate that context: Mutagenesis and functional characterization was carried out in 15 LacI/GalR homologs (rows) at 12 nonconserved positions (columns). Multiple substitutions were made at each position, to reveal how various amino acids of a nonconserved column were tolerated in each protein row. Results showed that amino acid preferences of nonconserved positions were highly context-dependent, had few correlations with physico-chemical similarities, and were not predictable from their occurrence in natural LacI/GalR sequences. Further, unlike the “toggle switch” behaviors of conserved positions, substitutions at nonconserved positions could be rank-ordered to show a “rheostatic”, progressive effect on function that spanned several orders of magnitude. Comparisons to various sequence analyses suggested that conserved and strongly co-evolving positions act as functional toggles, whereas other important, nonconserved positions serve as rheostats for modifying protein function. Both the presence of rheostat positions and the sequence analysis strategy appear to be generalizable to other protein families and should be considered when engineering protein modifications or predicting the impact of protein polymorphisms.
Collapse
Affiliation(s)
- Sarah Meinhardt
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Michael W. Manley
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Daniel J. Parente
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
| | - Liskin Swint-Kruse
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas, United States of America
- * E-mail:
| |
Collapse
|
18
|
Teppa E, Wilkins AD, Nielsen M, Buslje CM. Disentangling evolutionary signals: conservation, specificity determining positions and coevolution. Implication for catalytic residue prediction. BMC Bioinformatics 2012; 13:235. [PMID: 22978315 PMCID: PMC3515339 DOI: 10.1186/1471-2105-13-235] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2012] [Accepted: 09/05/2012] [Indexed: 11/11/2022] Open
Abstract
Background A large panel of methods exists that aim to identify residues with critical impact on protein function based on evolutionary signals, sequence and structure information. However, it is not clear to what extent these different methods overlap, and if any of the methods have higher predictive potential compared to others when it comes to, in particular, the identification of catalytic residues (CR) in proteins. Using a large set of enzymatic protein families and measures based on different evolutionary signals, we sought to break up the different components of the information content within a multiple sequence alignment to investigate their predictive potential and degree of overlap. Results Our results demonstrate that the different methods included in the benchmark in general can be divided into three groups with a limited mutual overlap. One group containing real-value Evolutionary Trace (rvET) methods and conservation, another containing mutual information (MI) methods, and the last containing methods designed explicitly for the identification of specificity determining positions (SDPs): integer-value Evolutionary Trace (ivET), SDPfox, and XDET. In terms of prediction of CR, we find using a proximity score integrating structural information (as the sum of the scores of residues located within a given distance of the residue in question) that only the methods from the first two groups displayed a reliable performance. Next, we investigated to what degree proximity scores for conservation, rvET and cumulative MI (cMI) provide complementary information capable of improving the performance for CR identification. We found that integrating conservation with proximity scores for rvET and cMI achieved the highest performance. The proximity conservation score contained no complementary information when integrated with proximity rvET. Moreover, the signal from rvET provided only a limited gain in predictive performance when integrated with mutual information and conservation proximity scores. Combined, these observations demonstrate that the rvET and cMI scores add complementary information to the prediction system. Conclusions This work contributes to the understanding of the different signals of evolution and also shows that it is possible to improve the detection of catalytic residues by integrating structural and higher order sequence evolutionary information with sequence conservation.
Collapse
Affiliation(s)
- Elin Teppa
- Fundación Instituto Leloir, Avda, Patricias Argentinas 435, CABA, C1405BWE, Argentina
| | | | | | | |
Collapse
|
19
|
Meinhardt S, Manley MW, Becker NA, Hessman JA, Maher LJ, Swint-Kruse L. Novel insights from hybrid LacI/GalR proteins: family-wide functional attributes and biologically significant variation in transcription repression. Nucleic Acids Res 2012; 40:11139-54. [PMID: 22965134 PMCID: PMC3505978 DOI: 10.1093/nar/gks806] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/11/2023] Open
Abstract
LacI/GalR transcription regulators have extensive, non-conserved interfaces between their regulatory domains and the 18 amino acids that serve as ‘linkers’ to their DNA-binding domains. These non-conserved interfaces might contribute to functional differences between paralogs. Previously, two chimeras created by domain recombination displayed novel functional properties. Here, we present a synthetic protein family, which was created by joining the LacI DNA-binding domain/linker to seven additional regulatory domains. Despite ‘mismatched’ interfaces, chimeras maintained allosteric response to their cognate effectors. Therefore, allostery in many LacI/GalR proteins does not require interfaces with precisely matched interactions. Nevertheless, the chimeric interfaces were not silent to mutagenesis, and preliminary comparisons suggest that the chimeras provide an ideal context for systematically exploring functional contributions of non-conserved positions. DNA looping experiments revealed higher order (dimer–dimer) oligomerization in several chimeras, which might be possible for the natural paralogs. Finally, the biological significance of repression differences was determined by measuring bacterial growth rates on lactose minimal media. Unexpectedly, moderate and strong repressors showed an apparent induction phase, even though inducers were not provided; therefore, an unknown mechanism might contribute to regulation of the lac operon. Nevertheless, altered growth correlated with altered repression, which indicates that observed functional modifications are significant.
Collapse
Affiliation(s)
- Sarah Meinhardt
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, KS 66160, USA
| | | | | | | | | | | |
Collapse
|
20
|
Dou Y, Wang J, Yang J, Zhang C. L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS One 2012; 7:e35666. [PMID: 22558194 PMCID: PMC3338704 DOI: 10.1371/journal.pone.0035666] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2012] [Accepted: 03/19/2012] [Indexed: 12/01/2022] Open
Abstract
To understand enzyme functions, identifying the catalytic residues is a usual first step. Moreover, knowledge about catalytic residues is also useful for protein engineering and drug-design. However, to experimentally identify catalytic residues remains challenging for reasons of time and cost. Therefore, computational methods have been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, by using the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on carefully designed datasets Data604 and Data63. With ten-fold cross-validation, L1pred showed the area under precision-recall curve (AUPR) and the area under ROC curve (AUC) of 0.2198 and 0.9494 on the training dataset, Data604, respectively. In addition, on the independent test dataset, Data63, it showed the AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm, and found that all the scores contributed more or less equally to the L1pred performance.
Collapse
Affiliation(s)
- Yongchao Dou
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska, United States of America
| | - Jun Wang
- Scientific Computing Key Laboratory of Shanghai Universities, Shanghai, People’s Republic of China
- Department of Mathematics, Shanghai Normal University, Shanghai, People’s Republic of China
| | - Jialiang Yang
- MPI-Institute of Computational Biology, Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - Chi Zhang
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska, United States of America
- * E-mail:
| |
Collapse
|
21
|
Dou Y, Geng X, Gao H, Yang J, Zheng X, Wang J. Sequence Conservation in the Prediction of Catalytic Sites. Protein J 2011; 30:229-39. [PMID: 21465136 DOI: 10.1007/s10930-011-9324-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
22
|
Tungtur S, Parente DJ, Swint-Kruse L. Functionally important positions can comprise the majority of a protein's architecture. Proteins 2011; 79:1589-608. [PMID: 21374721 DOI: 10.1002/prot.22985] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2010] [Revised: 12/08/2010] [Accepted: 12/15/2010] [Indexed: 01/13/2023]
Abstract
Concomitant with the genomic era, many bioinformatics programs have been developed to identify functionally important positions from sequence alignments of protein families. To evaluate these analyses, many have used the LacI/GalR family and determined whether positions predicted to be "important" are validated by published experiments. However, we previously noted that predictions do not identify all of the experimentally important positions present in the linker regions of these homologs. In an attempt to reconcile these differences, we corrected and expanded the LacI/GalR sequence set commonly used in sequence/function analyses. Next, a variety of analyses were carried out (1) for the entire LacI/GalR sequence set and (2) for a subset of homologs with functionally-important "YxPxxxAxxL" motifs in their linkers. This strategy was devised to determine whether predictions could be improved by knowledge-based sequence sorting and-for some analyses-did increase the number of linker positions identified. However, two functionally important linker positions were not reliably identified by any analysis. Finally, we compared the new predictions to all known experimental data for E. coli LacI and three homologous linkers. From these, we estimate that >50% of positions are important to the functions of the LacI/GalR homologs. In corollary, neutral positions might occur less frequently and might be easier to detect in sequence analyses. Although analyses have successfully guided mutations that partially exchange protein functions, a better experimental understanding of the sequence/function relationships in protein families would be helpful for uncovering the remaining rules used by nature to evolve new protein functions.
Collapse
Affiliation(s)
- Sudheer Tungtur
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, MSN 3030, Kansas City, Kansas 66160, USA
| | | | | |
Collapse
|
23
|
Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 2010; 39:1353-61. [DOI: 10.1007/s00726-010-0587-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2009] [Accepted: 03/27/2010] [Indexed: 10/19/2022]
|
24
|
Kuipers RK, Joosten HJ, van Berkel WJH, Leferink NGH, Rooijen E, Ittmann E, van Zimmeren F, Jochens H, Bornscheuer U, Vriend G, Martins dos Santos VAP, Schaap PJ. 3DM: Systematic analysis of heterogeneous superfamily data to discover protein functionalities. Proteins 2010; 78:2101-13. [DOI: 10.1002/prot.22725] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
|
25
|
Comparing the functional roles of nonconserved sequence positions in homologous transcription repressors: implications for sequence/function analyses. J Mol Biol 2009; 395:785-802. [PMID: 19818797 DOI: 10.1016/j.jmb.2009.10.001] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2009] [Revised: 10/01/2009] [Accepted: 10/02/2009] [Indexed: 11/21/2022]
Abstract
The explosion of protein sequences deduced from genetic code has led to both a problem and a potential resource: Efficient data use requires interpreting the functional impact of sequence change without experimentally characterizing each protein variant. Several groups have hypothesized that interpretation could be aided by analyzing the sequences of naturally occurring homologues. To that end, myriad sequence/function analyses have been developed to predict which conserved, semi-conserved, and nonconserved positions are functionally important. These positions must be discriminated from the nonconserved positions that are functionally silent. However, the assumptions that underlie sequence analyses are based on experimental results that are sparse and usually designed to address different questions. Here, we use three homologues from a test family common to bioinformatics-the LacI/GalR transcription repressors-to test a common assumption: If a position is functionally important for one family member, it has similar importance in all homologues. We generated experimental sequence/function information for each nonconserved position in the 18 amino acids that link the DNA-binding and regulatory domains of three LacI/GalR homologues. We find that the functional importance of each position is preserved among the three linkers, albeit to different degrees. We also find that every linker position contributes to function, which has twofold implications. (1) Since the linker positions range from highly conserved to semi-conserved to nonconserved and contribute to affinity, selectivity, and allosteric response, we assert that sequence/function analyses must identify positions in the LacI/GalR linkers to be qualified as "successful". Many analyses overlook this region since most of the residues do not directly contact ligand. (2) No position in the LacI/GalR linker is functionally silent. This finding is inconsistent with another underlying principle of many analyses: Using sequence sets to discriminate important from non-contributing positions obligates silent positions, which denotes that most homologues tolerate a variety of amino acid substitutions at the position without functional change. Instead, additional combinatorial mutants in the LacI/GalR linkers show that particular substitutions can be silent in a context-dependent manner. Thus, specific permutations of sequence change (rather than change at silent positions) would facilitate neutral drift during evolution. Finally, the combinatorial mutants also reveal functional synergy between semi- and nonconserved positions. Such functional relationships would be missed by analyses that rely primarily upon co-evolution.
Collapse
|
26
|
Dou Y, Zheng X, Wang J. Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 2009; 262:317-22. [PMID: 19808039 DOI: 10.1016/j.jtbi.2009.09.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2009] [Revised: 09/25/2009] [Accepted: 09/25/2009] [Indexed: 11/25/2022]
Abstract
Amino acid background distribution is an important factor for entropy-based methods which extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to allow a reliable observed background distribution. In this paper, we propose two new estimations of background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of observed background frequency. To validate these new background distributions, they are applied to the relative entropy model to find catalytic sites and ligand binding sites from protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
Collapse
Affiliation(s)
- Yongchao Dou
- School of Mathematical Science, Dalian University of Technology, Dalian 116024, PR China
| | | | | |
Collapse
|
27
|
Doddareddy MR, van Westen GJP, van der Horst E, Peironcely JE, Corthals F, Ijzerman AP, Emmerich M, Jenkins JL, Bender A. Chemogenomics: Looking at biology through the lens of chemistry. Stat Anal Data Min 2009. [DOI: 10.1002/sam.10046] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
28
|
Kalinina OV, Gelfand MS, Russell RB. Combining specificity determining and conserved residues improves functional site prediction. BMC Bioinformatics 2009; 10:174. [PMID: 19508719 PMCID: PMC2709924 DOI: 10.1186/1471-2105-10-174] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2009] [Accepted: 06/09/2009] [Indexed: 11/16/2022] Open
Abstract
Background Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities. Results Here we present a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs), as those occupied by conserved residues within sub-groups of proteins in a family having a common specificity, but differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discusse several illustrative examples. Conclusion The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
Collapse
|
29
|
Fatakia SN, Costanzi S, Chow CC. Computing highly correlated positions using mutual information and graph theory for G protein-coupled receptors. PLoS One 2009; 4:e4681. [PMID: 19262747 PMCID: PMC2650788 DOI: 10.1371/journal.pone.0004681] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2008] [Accepted: 01/07/2009] [Indexed: 01/06/2023] Open
Abstract
G protein-coupled receptors (GPCRs) are a superfamily of seven transmembrane-spanning proteins involved in a wide array of physiological functions and are the most common targets of pharmaceuticals. This study aims to identify a cohort or clique of positions that share high mutual information. Using a multiple sequence alignment of the transmembrane (TM) domains, we calculated the mutual information between all inter-TM pairs of aligned positions and ranked the pairs by mutual information. A mutual information graph was constructed with vertices that corresponded to TM positions and edges between vertices were drawn if the mutual information exceeded a threshold of statistical significance. Positions with high degree (i.e. had significant mutual information with a large number of other positions) were found to line a well defined inter-TM ligand binding cavity for class A as well as class C GPCRs. Although the natural ligands of class C receptors bind to their extracellular N-terminal domains, the possibility of modulating their activity through ligands that bind to their helical bundle has been reported. Such positions were not found for class B GPCRs, in agreement with the observation that there are not known ligands that bind within their TM helical bundle. All identified key positions formed a clique within the MI graph of interest. For a subset of class A receptors we also considered the alignment of a portion of the second extracellular loop, and found that the two positions adjacent to the conserved Cys that bridges the loop with the TM3 qualified as key positions. Our algorithm may be useful for localizing topologically conserved regions in other protein families.
Collapse
Affiliation(s)
- Sarosh N. Fatakia
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Stefano Costanzi
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
| | - Carson C. Chow
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
| |
Collapse
|
30
|
Kuntner M, Agnarsson I. Phylogeny accurately predicts behaviour in Indian Ocean Clitaetra spiders (Araneae:Nephilidae). INVERTEBR SYST 2009. [DOI: 10.1071/is09002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022]
Abstract
Phylogenies are underutilised, powerful predictors of traits in unstudied species. We tested phylogenetic predictions of web-related behaviour in Clitaetra Simon, 1889, an Afro-Indian spider genus of the family Nephilidae. Clitaetra is phylogenetically sister to all other nephilids and thus important for understanding ancestral traits. Behavioural information on Clitaetra has been limited to only C. irenae Kuntner, 2006 from South Africa which constructs ladder webs. A resolved species-level phylogeny unambiguously optimised Clitaetra behavioural biology and predicted web traits in five unstudied species and a uniform intrageneric nephilid web biology. We tested these predictions by studying the ecology and web biology of C. perroti Simon, 1894 on Madagascar and C. episinoides Simon, 1889 on Mayotte. We confirm predicted arboricolous web architecture in these species. The expected ontogenetic allometric transition from orbs in juveniles to elongate ladder webs in adults was statistically significant in C. perroti, whereas marginally not significant in C. episinoides. We demonstrate the persistence of the temporary spiral in finished Clitaetra webs. A morphological and behavioural phylogenetic analysis resulted in unchanged topology and persisting unambiguous behavioural synapomorphies. Our results support the homology of Clitaetra hub reinforcement with the nephilid hub-cup. In Clitaetra, behaviour was highly predictable and remained consistent with new observations. Our results confirm that nephilid web biology is evolutionarily conserved within genera.
Collapse
|
31
|
Meinhardt S, Swint-Kruse L. Experimental identification of specificity determinants in the domain linker of a LacI/GalR protein: bioinformatics-based predictions generate true positives and false negatives. Proteins 2008; 73:941-57. [PMID: 18536016 DOI: 10.1002/prot.22121] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
In protein families, conserved residues often contribute to a common general function, such as DNA-binding. However, unique attributes for each homolog (e.g. recognition of alternative DNA sequences) must arise from variation in other functionally-important positions. The locations of these "specificity determinant" positions are obscured amongst the background of varied residues that do not make significant contributions to either structure or function. To isolate specificity determinants, a number of bioinformatics algorithms have been developed. When applied to the LacI/GalR family of transcription regulators, several specificity determinants are predicted in the 18 amino acids that link the DNA-binding and regulatory domains. However, results from alternative algorithms are only in partial agreement with each other. Here, we experimentally evaluate these predictions using an engineered repressor comprising the LacI DNA-binding domain, the LacI linker, and the GalR regulatory domain (LLhG). "Wild-type" LLhG has altered DNA specificity and weaker lacO(1) repression compared to LacI or a similar LacI:PurR chimera. Next, predictions of linker specificity determinants were tested, using amino acid substitution and in vivo repression assays to assess functional change. In LLhG, all predicted sites are specificity determinants, as well as three sites not predicted by any algorithm. Strategies are suggested for diminishing the number of false negative predictions. Finally, individual substitutions at LLhG specificity determinants exhibited a broad range of functional changes that are not predicted by bioinformatics algorithms. Results suggest that some variants have altered affinity for DNA, some have altered allosteric response, and some appear to have changed specificity for alternative DNA ligands.
Collapse
Affiliation(s)
- Sarah Meinhardt
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas 66160, USA
| | | |
Collapse
|
32
|
Przytycka TM, Jothi R, Aravind L, Lipman DJ. Differences in evolutionary pressure acting within highly conserved ortholog groups. BMC Evol Biol 2008; 8:208. [PMID: 18637201 PMCID: PMC2488352 DOI: 10.1186/1471-2148-8-208] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2007] [Accepted: 07/17/2008] [Indexed: 11/12/2022] Open
Abstract
Background In highly conserved widely distributed ortholog groups, the main evolutionary force is assumed to be purifying selection that enforces sequence conservation, with most divergence occurring by accumulation of neutral substitutions. Using a set of ortholog groups from prokaryotes, with a single representative in each studied organism, we asked the question if this evolutionary pressure is acting similarly on different subgroups of orthologs defined as major lineages (e.g. Proteobacteria or Firmicutes). Results Using correlations in entropy measures as a proxy for evolutionary pressure, we observed two distinct behaviors within our ortholog collection. The first subset of ortholog groups, called here informational, consisted mostly of proteins associated with information processing (i.e. translation, transcription, DNA replication) and the second, the non-informational ortholog groups, mostly comprised of proteins involved in metabolic pathways. The evolutionary pressure acting on non-informational proteins is more uniform relative to their informational counterparts. The non-informational proteins show higher level of correlation between entropy profiles and more uniformity across subgroups. Conclusion The low correlation of entropy profiles in the informational ortholog groups suggest that the evolutionary pressure acting on the informational ortholog groups is not uniform across different clades considered this study. This might suggest "fine-tuning" of informational proteins in each lineage leading to lineage-specific differences in selection. This, in turn, could make these proteins less exchangeable between lineages. In contrast, the uniformity of the selective pressure acting on the non-informational groups might allow the exchange of the genetic material via lateral gene transfer.
Collapse
Affiliation(s)
- Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
| | | | | | | |
Collapse
|