1. Tule S, Foley G, Zhao C, Forbes M, Bodén M. Optimal phylogenetic reconstruction of insertion and deletion events. Bioinformatics 2024; 40:i277-i286. [PMID: 38940131 PMCID: PMC11211827 DOI: 10.1093/bioinformatics/btae254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] [Open access]
Abstract
MOTIVATION: Insertions and deletions (indels) alter genetic sequences in fundamentally different ways from substitutions, significantly affecting gene product structure and function. Despite their influence, the evolutionary history of indels is often neglected in phylogenetic tree inference and ancestral sequence reconstruction, hindering efforts to understand the determinants of biological diversity and to engineer variants for medical and industrial applications. RESULTS: We frame the task of determining the optimal history of indel events as a single Mixed-Integer Programming (MIP) problem spanning all branch points in a phylogenetic tree, subject to topological constraints, and all sites implied by a given set of aligned extant sequences. By disentangling the impact on ancestral sequences at each branch point, this approach identifies the minimal set of indel events that jointly explain the diversity of the sequences mapped to the tips of the tree. MIP can also recover alternative optimal indel histories when they exist. We evaluated MIP for indel inference on a dataset of 15 real phylogenetic trees associated with protein families ranging from 165 to 2000 extant sequences, and on 60 synthetic trees of comparable scale that reflect realistic mutation rates. Across relevant metrics, MIP outperformed alternative parsimony-based approaches and reported the fewest indel events, on par with or below their occurrence in the synthetic datasets. MIP offers a rational justification for indel patterns in extant sequences; importantly, it uniquely identifies global optima on complex protein datasets without making unrealistic assumptions of independence or evolutionary underpinnings, promising a deeper understanding of molecular evolution and aiding novel protein design. AVAILABILITY AND IMPLEMENTATION: The implementation is available via GitHub at https://github.com/santule/indelmip.
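For orientation, the abstract contrasts MIP with "alternative parsimony-based approaches". Below is a minimal sketch of the simplest such baseline, under the assumption (ours, not the authors') that each alignment column is scored on its own: Fitch parsimony on a binary gap/residue character, counting the minimum number of presence/absence changes a column needs on a rooted binary tree. This is not the paper's MIP formulation. Because columns are scored independently, one multi-residue indel spanning several columns is counted once per column, which is the kind of independence assumption the abstract says MIP avoids. The function names and the toy tree and alignment are hypothetical.

```python
from typing import Dict, Set, Tuple

# Rooted binary tree: each internal node maps to its (left, right) children.
# Any name that is not a key is treated as a leaf.
Tree = Dict[str, Tuple[str, str]]


def fitch_gap_changes(tree: Tree, root: str, present: Dict[str, bool]) -> int:
    """Minimum presence/absence changes for one column (Fitch parsimony)."""
    changes = 0

    def candidate_states(node: str) -> Set[bool]:
        nonlocal changes
        if node not in tree:  # leaf: its observed state is fixed
            return {present[node]}
        left, right = tree[node]
        a = candidate_states(left)
        b = candidate_states(right)
        shared = a & b
        if shared:
            return shared
        changes += 1  # children disagree: one gain or loss is required
        return a | b

    candidate_states(root)
    return changes


def total_gap_changes(tree: Tree, root: str, alignment: Dict[str, str]) -> int:
    """Sum of per-column Fitch scores, treating '-' as absence of a residue."""
    length = len(next(iter(alignment.values())))
    total = 0
    for col in range(length):
        present = {name: seq[col] != "-" for name, seq in alignment.items()}
        total += fitch_gap_changes(tree, root, present)
    return total


# Hypothetical toy data: ((A, B) n1, C) root
tree = {"root": ("n1", "C"), "n1": ("A", "B")}
alignment = {"A": "AC-T", "B": "A--T", "C": "ACGT"}
print(total_gap_changes(tree, "root", alignment))  # 2
```

In the toy example, column 2 costs one event whether it is read as an insertion on the branch to C or a deletion on the branch to n1; per-column parsimony scores the event but does not choose between those readings.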
Affiliation(s)
- Sanjana Tule
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Gabriel Foley
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Chongting Zhao
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia
- Michael Forbes
- School of Mathematics and Physics, The University of Queensland, Brisbane, QLD 4072, Australia
- Mikael Bodén
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, QLD 4072, Australia

2. Yang Y, Braga MV, Dean MD. Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure. Genome Biol Evol 2024; 16:evae093. [PMID: 38735759 PMCID: PMC11102076 DOI: 10.1093/gbe/evae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 05/14/2024] [Open access]
Abstract
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structure. We identified indels across 11,444 sequence alignments from mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning method AlphaFold2. Indels overlapped secondary structures only 54% as often as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structure. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection is more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequencies than indels outside secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
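The "54% as much as expected" figure is an observed-to-expected ratio. A minimal sketch of such a ratio follows, assuming a deliberately simple null model in which every indel is a single residue position placed uniformly at random along the protein. The function name `overlap_ratio` and the toy positions are hypothetical, and the study's actual null model (which would need to handle indel length and per-protein structure content) is not described in this abstract.

```python
from typing import Iterable, Set


def overlap_ratio(length: int, region: Set[int], indel_sites: Iterable[int]) -> float:
    """Observed / expected number of indel sites that fall inside `region`.

    Assumed null model (hypothetical, for illustration only): every indel is a
    single residue position drawn uniformly from 0..length-1, so the expected
    overlap is n_indels * |region| / length.
    """
    sites = list(indel_sites)
    if not sites or not region:
        raise ValueError("need at least one indel site and a non-empty region")
    observed = sum(1 for s in sites if s in region)
    expected = len(sites) * len(region) / length
    return observed / expected


# Toy protein of 100 residues; residues 0-49 are "predicted secondary structure".
structured = set(range(50))
unstructured = set(range(50, 100))
indels = [10, 60, 70, 80]

print(overlap_ratio(100, structured, indels))    # 0.5 (1 observed vs 2 expected)
print(overlap_ratio(100, unstructured, indels))  # 1.5 (3 observed vs 2 expected)
```

A ratio below 1 indicates depletion relative to the null model; a ratio above 1 indicates enrichment.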
Affiliation(s)
- Yi Yang
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew V Braga
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew D Dean
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA

3. Zhao D, Zhou Q, Zarif M, Eladl E, Wei C, Atenafu EG, Schuh A, Tierens A, Yeung YWT, Minden MD, Chang H. AML with CEBPA mutations: A comparison of ICC and WHO-HAEM5 criteria in patients with 20% or more blasts. Leuk Res 2023; 134:107376. [PMID: 37690321 DOI: 10.1016/j.leukres.2023.107376] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 08/21/2023] [Accepted: 08/25/2023] [Indexed: 09/12/2023]
Abstract
AML with CEBPA mutation and AML with in-frame bZIP CEBPA mutations define favorable-risk disease entities in the proposed 5th edition of the World Health Organization Classification (WHO-HAEM5) and the International Consensus Classification (ICC), respectively. However, the impact of these new classifications on clinical practice remains unclear. We sought to assess the differences between the ICC and WHO-HAEM5 for AML with CEBPA mutation. A total of 741 AML patients were retrospectively analyzed. Cox proportional-hazards regression was used to identify factors predictive of outcome. A validation cohort from the UK-NCRI clinical trials was used to confirm our findings. Of the 741 patients, 81 (11%) had CEBPA mutations. Of these, 39 (48%) met WHO-HAEM5 criteria for AML with CEBPA mutation, among whom 30 (77%) had biallelic CEBPA mutations and 9 (23%) had a single bZIP mutation. Among the 39 patients who met WHO-HAEM5 criteria, 25 (64%) also met ICC criteria. Compared with patients meeting only WHO-HAEM5 criteria, patients with in-frame bZIP CEBPA mutations (i.e., meeting both WHO-HAEM5 and ICC criteria) were younger, had higher bone marrow blast percentages and CEBPA mutation burden, infrequently harboured 2022 ELN high-risk genetic features and co-mutations in other genes, and had superior outcomes. The differences in clinicopathological features and outcomes between the CEBPA-mutated groups were validated in the UK-NCRI cohort. Our study indicates that in-frame bZIP CEBPA mutations are the critical molecular aberrations associated with favorable outcomes in AML patients treated with curative-intent chemotherapy. Compared with WHO-HAEM5, the ICC identifies a more homogeneous group of CEBPA-mutated AML patients with favorable outcomes.
Affiliation(s)
- Davidson Zhao
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Department of Laboratory Hematology, Laboratory Medicine Program, University Health Network, Toronto, ON, Canada
- Qianghua Zhou
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Department of Laboratory Hematology, Laboratory Medicine Program, University Health Network, Toronto, ON, Canada
- Mojgan Zarif
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Department of Laboratory Hematology, Laboratory Medicine Program, University Health Network, Toronto, ON, Canada
- Entsar Eladl
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Department of Laboratory Hematology, Laboratory Medicine Program, University Health Network, Toronto, ON, Canada
- Cuihong Wei
- Department of Clinical Laboratory Genetics, Genome Diagnostics & Cancer Cytogenetics, University Health Network, Toronto, ON, Canada
- Eshetu G Atenafu
- Department of Biostatistics, University Health Network, Toronto, ON, Canada
- Andre Schuh
- Department of Medical Oncology and Hematology, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Anne Tierens
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Department of Laboratory Hematology, Laboratory Medicine Program, University Health Network, Toronto, ON, Canada
- Yu Wing Tony Yeung
- Department of Laboratory Medicine, St. Michael's Hospital, Toronto, ON, Canada
- Mark D Minden
- Department of Medical Oncology and Hematology, Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Hong Chang
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; Department of Laboratory Hematology, Laboratory Medicine Program, University Health Network, Toronto, ON, Canada

4. Jilani M, Turcan A, Haspel N, Jagodzinski F. Elucidating the Structural Impacts of Protein InDels. Biomolecules 2022; 12:1435. [PMID: 36291643 PMCID: PMC9599607 DOI: 10.3390/biom12101435] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/12/2022] [Revised: 09/23/2022] [Accepted: 09/27/2022] [Indexed: 09/17/2023] [Open access]
Abstract
The effects of amino acid insertions and deletions (InDels) remain a relatively underexplored area of structural biology, even though these variations are often the cause of numerous disease phenotypes. Research on InDels and their structural significance remains limited, primarily because of a lack of experimental information and computational methods. In this work, we address this gap by modeling InDels computationally: we investigate the rigidity differences between the wildtype and a mutant variant carrying one or more InDels. We further compare how the structural effects of InDels differ from those of amino acid substitutions, another type of amino acid mutation. Finally, we perform a correlation analysis between our rigidity-based metrics and wet-lab data to assess how well these metrics infer the effects of InDels on protein fitness.
Affiliation(s)
- Muneeba Jilani
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Alistair Turcan
- Department of Computer Science, Western Washington University, Bellingham, WA 98225, USA
- Nurit Haspel
- Department of Computer Science, University of Massachusetts Boston, Boston, MA 02125, USA
- Filip Jagodzinski
- Department of Computer Science, Western Washington University, Bellingham, WA 98225, USA

5. Shekhar C, Maeda T. A simple approach for random genomic insertion-deletions using ambiguous sequences in Escherichia coli. J Basic Microbiol 2022; 62:948-962. [PMID: 35739617 DOI: 10.1002/jobm.202100636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/06/2021] [Revised: 04/20/2022] [Accepted: 06/11/2022] [Indexed: 11/07/2022]
Abstract
Escherichia coli K-12, one of the best-understood and most thoroughly analyzed organisms, is the preferred platform for genetic and biochemical research. Among the genetic engineering approaches applied to E. coli, homologous recombination is versatile and precise, allowing genes or large segments of the chromosome to be engineered directly with polymerase chain reaction (PCR) products or synthetic oligonucleotides. Previously described approaches for generating random insertions and deletions have been reported to be technically difficult and laborious. This study first identifies 30 nt as the minimum length of homology extension that is efficient and accurate for homologous recombination. Second, it proposes an approach using PCR products flanked by ambiguous NNN-sequence (30-nt) extensions, which allow homologous recombination at multiple regions of the genome and thereby generate insertion-deletion mutations. Further analysis showed that these mutations varied in number; that is, multiple genomic regions were deleted. Moreover, phenotypic evaluation of all the multiple random insertion-deletion mutants showed no significant changes in normal bacterial metabolism. This study not only demonstrates the efficiency of ambiguous sequences in making random deletion mutations but also highlights their broader applicability in genomics.
Affiliation(s)
- Chandra Shekhar
- Department of Biological Functions Engineering, Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Kitakyushu, Japan
- Toshinari Maeda
- Department of Biological Functions Engineering, Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology, Kitakyushu, Japan

6. Savino S, Desmet T, Franceus J. Insertions and deletions in protein evolution and engineering. Biotechnol Adv 2022; 60:108010. [PMID: 35738511 DOI: 10.1016/j.biotechadv.2022.108010] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/05/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/17/2022]
Abstract
Studies of protein evolution and engineering have traditionally focused on amino acid substitutions and how these contribute to fitness. Meanwhile, the insertion and deletion of amino acids is often overlooked, despite being one of the most common sources of genetic variation. Recent methodological advances and successful engineering stories show that the time is ripe for greater emphasis on these mutations and their understudied effects. This review highlights the evolutionary importance and biotechnological relevance of insertions and deletions (indels). We provide a comprehensive overview of approaches that can be employed to include indels in random, (semi-)rational, or computational protein engineering pipelines. Furthermore, we discuss tolerance to indels at the structural level, address how domain indels can link the functions of unrelated proteins, and feature studies that illustrate the surprising and intriguing potential of frameshift mutations.
Affiliation(s)
- Simone Savino
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Tom Desmet
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium
- Jorick Franceus
- Centre for Synthetic Biology (CSB), Department of Biotechnology, Ghent University, Coupure Links 653, 9000 Ghent, Belgium

7. Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). Methods Mol Biol 2022; 2397:85-110. [PMID: 34813061 DOI: 10.1007/978-1-0716-1826-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Analyzing the natural evolution of proteins by ancestral sequence reconstruction (ASR) can provide valuable information about the changes in sequence and structure that drive the development of novel protein functions. ASR has also been used as a protein engineering tool, as it often generates thermostable proteins that can serve as robust and evolvable templates for enzyme engineering. Importantly, ASR has the potential to provide insight into the history of insertions and deletions (indels) that have occurred during the evolution of a protein family. Indels are strongly associated with functional change during enzyme evolution and represent a largely unexplored source of genetic diversity for designing proteins with novel or improved properties. Current ASR methods differ in how they handle indels: inclusion or exclusion of indels is often managed subjectively, based on assumptions the user makes about the likelihood of each recombination event, yet most currently available ASR tools provide limited, if any, opportunities for evaluating indel placement in a reconstructed sequence. Graphical Representation of Ancestral Sequence Predictions (GRASP) is an ASR tool that maps indel evolution throughout a reconstruction and enables the evaluation of indel variants. This chapter provides a general protocol for performing a reconstruction using GRASP and using the results to create indel variants. The method addresses protein template selection, sequence curation, alignment refinement, tree building, ancestor reconstruction, evaluation of indel variants, and approaches to library development.

8. Martin NS, Ahnert SE. Insertions and deletions in the RNA sequence-structure map. J R Soc Interface 2021; 18:20210380. [PMID: 34610259 PMCID: PMC8492174 DOI: 10.1098/rsif.2021.0380] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/08/2021] [Accepted: 09/13/2021] [Indexed: 12/21/2022] [Open access]
Abstract
Genotype-phenotype maps link genetic changes to their fitness effect and are thus an essential component of evolutionary models. The map between RNA sequences and their secondary structures is a key example and has applications in functional RNA evolution. For this map, the structural effect of substitutions is well understood, but models usually assume a constant sequence length and do not consider insertions or deletions. Here, we expand the sequence-structure map to include single nucleotide insertions and deletions by using the RNAshapes concept. To quantify the structural effect of insertions and deletions, we generalize existing definitions for robustness and non-neutral mutation probabilities. We find striking similarities between substitutions, deletions and insertions: robustness to substitutions is correlated with robustness to insertions and, for most structures, to deletions. In addition, frequent structural changes after substitutions also tend to be common for insertions and deletions. This is consistent with the connection between energetically suboptimal folds and possible structural transitions. The similarities observed hold both for genotypic and phenotypic robustness and mutation probabilities, i.e. for individual sequences and for averages over sequences with the same structure. Our results could have implications for the rate of neutral and non-neutral evolution.
Affiliation(s)
- Nora S. Martin
- Theory of Condensed Matter Group, Cavendish Laboratory, University of Cambridge, JJ Thomson Avenue, Cambridge CB3 0HE, UK
- Sainsbury Laboratory, University of Cambridge, Bateman Street, Cambridge CB2 1LR, UK
- Sebastian E. Ahnert
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge CB3 0AS, UK
- The Alan Turing Institute, British Library, Euston Road, London NW1 2DB, UK

9. Lin M, Malik FK, Guo JT. A comparative study of protein-ssDNA interactions. NAR Genom Bioinform 2021; 3:lqab006. [PMID: 33655206 PMCID: PMC7902235 DOI: 10.1093/nargab/lqab006] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Revised: 11/24/2020] [Accepted: 01/26/2021] [Indexed: 12/18/2022] [Open access]
Abstract
Single-stranded DNA-binding proteins (SSBs) play crucial roles in DNA replication, recombination, and repair, and are key players in maintaining genomic stability. While some SSBs bind single-stranded DNA (ssDNA) non-specifically, others recognize and bind specific ssDNA sequences. The mechanisms underlying this difference in binding behavior, however, are largely unknown. Here, we present a comparative study of protein-ssDNA interactions in which we annotate specific and non-specific SSBs and compare structural features between the two groups, including the DNA-binding propensities and secondary structure types of residues involved in SSB-ssDNA interactions, protein-ssDNA hydrogen bonding, and π-π interactions. Our results suggest that hydrogen bonds between protein side chains and DNA bases are the major contributors to protein-ssDNA binding specificity, while π-π interactions may mainly contribute to binding affinity. We also found an enrichment of aspartate in the specific SSBs, a key feature of specific protein-double-stranded DNA (dsDNA) interactions reported in our previous study. In addition, we found no significant differences between the specific and non-specific groups with respect to conformational changes upon ssDNA binding, suggesting that the flexibility of SSBs plays a lesser role in conferring binding specificity than that of dsDNA-binding proteins.
Affiliation(s)
- Maoxuan Lin
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Fareeha K Malik
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Research Center of Modeling and Simulation, National University of Science and Technology, Islamabad, 44000, Pakistan
- Jun-tao Guo
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA

10. Wang X, Ma Q. Wzb of Vibrio vulnificus represents a new group of low-molecular-weight protein tyrosine phosphatases with a unique insertion in the W-loop. J Biol Chem 2021; 296:100280. [PMID: 33450227 PMCID: PMC7948962 DOI: 10.1016/j.jbc.2021.100280] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/24/2020] [Revised: 12/28/2020] [Accepted: 01/08/2021] [Indexed: 12/23/2022] [Open access]
Abstract
Protein tyrosine phosphorylation regulates the production of capsular polysaccharide, an essential virulence factor of the deadly pathogen Vibrio vulnificus. The process requires the protein tyrosine kinase Wzc and its cognate phosphatase Wzb, both of which are largely uncharacterized. Herein, we report the structures of Wzb of V. vulnificus (VvWzb) in free and ligand-bound forms. VvWzb belongs to the low-molecular-weight protein tyrosine phosphatase (LMWPTP) family. Interestingly, it contains an extra four-residue insertion in the W-loop that distinguishes it from all known LMWPTPs. The W-loop of VvWzb protrudes from the protein body in the free structure but undergoes significant conformational changes to fold toward the active site upon ligand binding. Deleting the four-residue insertion from the W-loop severely impaired the enzymatic activity of VvWzb, indicating its importance for optimal catalysis. However, mutating individual residues, or even substituting the whole insertion with four alanine residues, only modestly decreased the enzymatic activity, suggesting that the insertion's contribution to catalysis does not depend on its specific sequence. Furthermore, inserting the four residues into Escherichia coli Wzb at the corresponding position also enhanced its activity, indicating that the four-residue insertion in the W-loop can act as a general activity-enhancing element for other LMWPTPs. The novel W-loop type and phylogenetic analysis suggest that VvWzb and its homologs should be classified as a new group of LMWPTPs. Our study provides new insight into the catalytic mechanism and structural diversity of the LMWPTP family and advances the understanding of the protein tyrosine phosphorylation system in prokaryotes.
Affiliation(s)
- Xin Wang
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Biology and Biotechnology, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; University of Chinese Academy of Sciences, Beijing, China
- Qingjun Ma
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Qingdao, China; Laboratory for Marine Biology and Biotechnology, Pilot National Laboratory for Marine Science and Technology, Qingdao, China; University of Chinese Academy of Sciences, Beijing, China; Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, China

11. Mallajosyula VVA, Swaroop S, Varadarajan R. Influenza Hemagglutinin Head Domain Mimicry by Rational Design. Protein J 2020; 39:434-448. [PMID: 33068234 DOI: 10.1007/s10930-020-09930-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 10/09/2020] [Indexed: 02/07/2023]
Abstract
Despite diligent vaccination efforts, influenza virus infection remains a major cause of respiratory illness worldwide. The suboptimal immunity conferred by the currently prescribed seasonal vaccines, together with protracted production times, warrants the development of novel vaccines. Inducing an epitope-focused antibody response that targets known neutralization epitopes is a viable strategy to broaden protection against rapidly evolving viruses. We report a design framework to mimic the hemagglutinin (HA) head fragment of H1-subtype viruses by delineating the interaction network of invariant residues lining the receptor binding site (RBS), a site targeted by cross-reactive neutralizing antibodies. Incorporating multiple sequence alignment information into our algorithm, to fix the construct termini and engineer rational mutations, allows the design to be readily extended to heterologous (subtype-specific) influenza strains. We evaluated our design protocol by generating head fragments from the divergent influenza A H1N1 A/Puerto Rico/8/34 and pH1N1 A/California/07/2009 strains, which share only 74.4% sequence identity within the HA1 subunit. The designed immunogens exhibited characteristics of a well-ordered protein and bound conformation-specific RBS-targeting antibodies with high affinity, a desirable feature for putative vaccine candidates. Additionally, bacterial expression of these immunogens provides a low-cost, rapidly scalable alternative.
Affiliation(s)
- Shiv Swaroop
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India; Department of Biochemistry, Central University of Rajasthan, Kishangarh, Ajmer, 305817, India
- Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India

12. Lin M, Guo JT. New insights into protein-DNA binding specificity from hydrogen bond based comparative study. Nucleic Acids Res 2019; 47:11103-11113. [PMID: 31665426 PMCID: PMC6868434 DOI: 10.1093/nar/gkz963] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 05/08/2019] [Revised: 10/06/2019] [Accepted: 10/10/2019] [Indexed: 12/25/2022] [Open access]
Abstract
Knowledge of protein-DNA binding specificity has important implications for understanding DNA metabolism and transcriptional regulation and for developing therapeutic drugs. Previous studies demonstrated that hydrogen bonds between amino acid side chains and DNA bases play major roles in specific protein-DNA interactions. In this paper, we investigated the roles of individual DNA strands and protein secondary structure types in specific protein-DNA recognition based on side chain-base hydrogen bonds. By comparing the contribution of each DNA strand to the overall binding specificity among DNA-binding proteins with different degrees of binding specificity, we found that highly specific DNA-binding proteins show balanced hydrogen bonding with both DNA strands, whereas multi-specific DNA-binding proteins are generally biased towards one strand. Protein-base pair hydrogen bonds, in which both bases of a base pair form hydrogen bonds with amino acid side chains, are more prevalent in highly specific protein-DNA complexes than in the multi-specific group. Amino acids involved in side chain-base hydrogen bonds favor strand and coil secondary structure types in highly specific DNA-binding proteins, while multi-specific DNA-binding proteins prefer helices.
Affiliation(s)
- Maoxuan Lin
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC 28223, USA

13. Gonzalez CE, Roberts P, Ostermeier M. Fitness Effects of Single Amino Acid Insertions and Deletions in TEM-1 β-Lactamase. J Mol Biol 2019; 431:2320-2330. [PMID: 31034887 DOI: 10.1016/j.jmb.2019.04.030] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 01/01/2019] [Revised: 04/17/2019] [Accepted: 04/18/2019] [Indexed: 11/16/2022]
Abstract
Short insertions and deletions (InDels) are a common type of mutation found in nature and a useful source of variation in protein engineering. InDel events have important consequences in protein evolution, often opening new pathways for adaptation. However, much less is known about the effects of InDels than about those of point mutations and amino acid substitutions. In particular, deep mutagenesis studies on the distribution of fitness effects of mutations have focused almost exclusively on amino acid substitutions. Here, we present a near-comprehensive analysis of the fitness effects of single amino acid InDels in TEM-1 β-lactamase. While we found InDels to be largely deleterious, partially overlapping deletion-tolerant and insertion-tolerant regions were observed throughout the protein, especially in unstructured regions and at the ends of helices. The signal sequence of TEM-1 tolerated InDels more than the mature protein did. Most regions of the protein tolerated insertions more than deletions, but a few regions tolerated deletions more than insertions. We examined the relationship between InDel tolerance and a variety of measures to help understand its origin. These measures included evolutionary variation in β-lactamases, secondary structure identity, tolerance to amino acid substitutions, solvent accessibility, and side-chain weighted contact number. We found secondary structure, weighted contact number, and evolutionary variation in class A β-lactamases to be somewhat predictive of InDel fitness effects.
Affiliation(s)
- Courtney E Gonzalez
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA
- Paul Roberts
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA
- Marc Ostermeier
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA
14
Huang BCB, Kim YC, Bañas S, Barfield RM, Drake PM, Rupniewski I, Haskins WE, Rabuka D. Antibody-drug conjugate library prepared by scanning insertion of the aldehyde tag into IgG1 constant regions. MAbs 2018; 10:1182-1189. [PMID: 30252630 PMCID: PMC6284588 DOI: 10.1080/19420862.2018.1512327] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022]
Abstract
The advantages of site-specific over stochastic bioconjugation technologies include homogeneity of product, minimal perturbation of protein structure/function, and – increasingly – the ability to perform structure activity relationship studies at the conjugate level. When selecting the optimal location for site-specific payload placement, many researchers turn to in silico modeling of protein structure to identify regions predicted to offer solvent-exposed conjugatable sites while conserving protein function. Here, using the aldehyde tag as our site-specific technology platform and human IgG1 antibody as our target protein, we demonstrate the power of taking an unbiased scanning approach instead. Scanning insertion of the human formylglycine generating enzyme (FGE) recognition sequence, LCTPSR, at each of the 436 positions in the light and heavy chain antibody constant regions followed by co-expression with FGE yielded a library of antibodies bearing an aldehyde functional group ready for conjugation. Each of the variants was expressed, purified, and conjugated to a cytotoxic payload using the Hydrazinyl Iso-Pictet-Spengler ligation to generate an antibody-drug conjugate (ADC), which was analyzed in terms of conjugatability (assessed by drug-to-antibody ratio, DAR) and percent aggregate. We searched for insertion sites that could generate manufacturable ADCs, defined as those variants yielding reasonable antibody titers, DARs of ≥ 1.3, and ≥ 95% monomeric species. Through this process, we discovered 58 tag insertion sites that met these metrics, including 14 sites in the light chain, a location that had proved refractory to the placement of manufacturable tag sites using in silico modeling/rational approaches.
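The scanning-insertion design and the manufacturability cut-offs can be expressed compactly. In this hypothetical Python sketch, the tag LCTPSR and the thresholds DAR ≥ 1.3 and ≥ 95% monomer come from the abstract; inserting the tag immediately after each residue, the toy sequence, the `VariantResult` record, and the `min_titer` parameter (standing in for "reasonable antibody titers", which the abstract does not quantify) are all assumptions.

```python
from dataclasses import dataclass

TAG = "LCTPSR"  # FGE recognition sequence named in the abstract


def scanning_insertions(seq: str, tag: str = TAG) -> list[tuple[int, str]]:
    """Return (position, variant) pairs with the tag inserted after each residue.

    Positions are 1-based: position p means the tag follows residue p.
    """
    return [(p, seq[:p] + tag + seq[p:]) for p in range(1, len(seq) + 1)]


@dataclass
class VariantResult:
    """Hypothetical per-variant measurements."""
    position: int
    titer: float            # expression titer; units are an assumption
    dar: float              # drug-to-antibody ratio
    percent_monomer: float  # % monomeric species


def is_manufacturable(r: VariantResult, min_titer: float) -> bool:
    """Apply the abstract's DAR and monomer cut-offs plus a caller-chosen titer floor."""
    return r.titer >= min_titer and r.dar >= 1.3 and r.percent_monomer >= 95.0


if __name__ == "__main__":
    variants = scanning_insertions("ASTKG")  # toy five-residue sequence
    print(len(variants), variants[0])  # 5 (1, 'ALCTPSRSTKG')
    results = [
        VariantResult(1, 50.0, 1.6, 97.0),
        VariantResult(2, 50.0, 1.1, 98.0),  # fails: DAR below 1.3
        VariantResult(3, 50.0, 1.8, 90.0),  # fails: below 95% monomer
    ]
    print([r.position for r in results if is_manufacturable(r, min_titer=20.0)])  # [1]
```

Applied to a full constant-region sequence, the same function yields one variant per residue, which is how a scan over 436 positions produces a 436-member library under this assumption.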
15
Ramakrishna G, Kaur P, Nigam D, Chaduvula PK, Yadav S, Talukdar A, Singh NK, Gaikwad K. Genome-wide identification and characterization of InDels and SNPs in Glycine max and Glycine soja for contrasting seed permeability traits. BMC Plant Biol 2018; 18:141. [PMID: 29986650 PMCID: PMC6038289 DOI: 10.1186/s12870-018-1341-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 09/13/2017] [Accepted: 06/05/2018] [Indexed: 05/03/2023]
Abstract
BACKGROUND Water permeability governed by the seed coat is a major facet of seed crops, especially soybean, whose seeds lack physiological dormancy and rapidly lose viability under prolonged storage. Moreover, the physiological and chemical characteristics of soybean seeds are known to vary with seed coat color. Thus, to identify the genes controlling water permeability in soybean seeds, we carried out an in-depth characterization of the associated genomic variation. RESULTS In the present study, we analyzed genomic variation between cultivated soybean and its wild progenitor with implications for seed permeability, a trait related to seed storability. Whole-genome resequencing of G. max and G. soja identified SNPs and InDels, which were further characterized on the basis of their genomic location and impact on gene expression. The chromosomal density distribution of the variants was assessed across the genome, and genes carrying SNPs and InDels were assigned to metabolic pathways. Seed hardiness is a complex trait that is affected by the allelic constitution of a genetic locus as well as by an intricate web of plant hormone interactions. Seven genes with a probable role in determining seed permeability were selected, and their expression differences at different stages of water imbibition were analyzed. A variant interaction network identified 205 downstream interacting partners of the 7 genes, confirming their role in seed-related traits. Interestingly, genes encoding type I inositol polyphosphate 5-phosphatase 1 and an E3 ubiquitin ligase differentiated the parental genotypes, revealed protein conformational deformations, and segregated among recombinant inbred lines (RILs) in agreement with their permeability scores. The two identified genes thus showed a preliminary association with the desirable permeability characteristics.
CONCLUSION In light of these outcomes, two genes were identified that showed a preliminary but relevant association with the soybean seed permeability trait and hence could serve as a basis for understanding the molecular pathways controlling seed permeability traits in soybean.
Affiliation(s)
- G. Ramakrishna
- ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012 India
- Parampreet Kaur
- ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012 India
- Deepti Nigam
- ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012 India
- Pavan K. Chaduvula
- ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012 India
- Sangita Yadav
- ICAR-IARI, Division of Seed Science and Technology, Pusa Campus, New Delhi, 110012 India
- Akshay Talukdar
- ICAR-IARI, Division of Genetics, Pusa Campus, New Delhi, India
- Nagendra Kumar Singh
- ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012 India
- Kishor Gaikwad
- ICAR-National Research Centre on Plant Biotechnology, Pusa Campus, New Delhi, 110012 India
16
Zang K, Li F, Ma Q. The dUTPase of white spot syndrome virus assembles its active sites in a noncanonical manner. J Biol Chem 2017; 293:1088-1099. [PMID: 29187596 DOI: 10.1074/jbc.m117.815266] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 08/31/2017] [Revised: 11/14/2017] [Indexed: 01/04/2023]
Abstract
dUTPases are essential enzymes for maintaining genome integrity and have recently been shown to play moonlighting roles when containing extra sequences. Interestingly, the trimeric dUTPase of white spot syndrome virus (wDUT) harbors a sequence insert at the position preceding the C-terminal catalytic motif V (pre-V insert), rarely seen in other dUTPases. However, whether this extra sequence endows wDUT with additional properties is unknown. Herein, we present the crystal structures of wDUT in both ligand-free and ligand-bound forms. We observed that the pre-V insert in wDUT forms an unusual β-hairpin structure in the domain-swapping region and thereby facilitates a unique orientation of the adjacent C-terminal segment, positioning the catalytic motif V onto the active site of its own subunit instead of a third subunit. Consequently, wDUT employs two-subunit active sites, contrary to the widely accepted paradigm that the active site of trimeric dUTPase is contributed by all three subunits. According to results from local structural comparisons, the active-site configuration of wDUT is similar to that of known dUTPases. However, we also found that residues in the second-shell region of the active site are reconfigured in wDUT as an adaptation to its unique C-terminal orientation. We also show that deletion of the pre-V insert significantly reduces wDUT's enzymatic activity and thermal stability. We hypothesize that this rare structural arrangement confers additional functionality to wDUT. In conclusion, our study expands the structural diversity in the conserved dUTPase family and illustrates how sequence insertion and amino acid substitution drive protein evolution cooperatively.
Affiliation(s)
- Kun Zang
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Fuhua Li
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
- Qingjun Ma
- Key Laboratory of Experimental Marine Biology, Institute of Oceanology, Chinese Academy of Sciences, Nanhai Road 7, Qingdao 266071, China; Laboratory for Marine Biology and Biotechnology, Qingdao National Laboratory for Marine Science and Technology, Qingdao 266237, China
17
Morelli A, Cabezas Y, Mills LJ, Seelig B. Extensive libraries of gene truncation variants generated by in vitro transposition. Nucleic Acids Res 2017; 45:e78. [PMID: 28130425 PMCID: PMC5449547 DOI: 10.1093/nar/gkx030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 12/01/2016] [Accepted: 01/20/2017] [Indexed: 11/14/2022]
Abstract
The detailed analysis of the impact of deletions on proteins or nucleic acids can reveal important functional regions and lead to variants with improved macromolecular properties. We present a method to generate large libraries of mutants with deletions of varying length that are randomly distributed throughout a given gene. This technique facilitates the identification of crucial sequence regions in nucleic acids or proteins. The approach uses in vitro transposition to generate 5′ and 3′ fragment sub-libraries of a given gene, which are then randomly recombined to yield a final library comprising both terminal and internal deletions. The method is easy to implement and can generate libraries in three to four days. We used this approach to produce a library of >9000 random deletion mutants of an artificial RNA ligase enzyme, representing 32% of all possible deletions. The quality of the library was assessed by next-generation sequencing and detailed bioinformatics analysis. Finally, we subjected this library to in vitro selection and obtained fully functional variants with deletions of up to 18 amino acids from the 95-amino-acid parental enzyme.
Affiliation(s)
- Aleardo Morelli
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
- Yari Cabezas
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
- Lauren J Mills
- Minnesota Supercomputing Institute, University of Minnesota, Minneapolis, MN 55455, USA
- Burckhard Seelig
- Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA; BioTechnology Institute, University of Minnesota, St. Paul, MN 55108, USA
18
Lin M, Whitmire S, Chen J, Farrel A, Shi X, Guo JT. Effects of short indels on protein structure and function in human genomes. Sci Rep 2017; 7:9313. [PMID: 28839204 PMCID: PMC5570956 DOI: 10.1038/s41598-017-09287-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 05/18/2017] [Accepted: 07/24/2017] [Indexed: 01/20/2023]
Abstract
Insertions and deletions (indels) represent the second most common type of genetic variation in human genomes. Indels can be deleterious and contribute to disease susceptibility; recent genome sequencing projects have revealed large numbers of indels in various cancer types. In this study, we investigated the possible effects of small coding indels on protein structure and function, as well as the baseline characteristics of indels in 2504 individuals from 26 populations in the 1000 Genomes Project. We found that each population has a distinct pattern in genes with small indels. Frameshift (FS) indels are enriched in genes with olfactory receptor activity, while non-frameshift (NFS) indels are enriched in transcription-related proteins. Structural analysis of NFS indels revealed that they predominantly adopt coil or disordered conformations, especially in proteins with transcription-related NFS indels. These results suggest that the annotated coding indels from the 1000 Genomes Project, while contributing to genetic variation and phenotypic diversity, generally do not affect core protein structures and have no deleterious effect on essential biological processes. In addition, we found that a number of reference genome annotations might need to be updated due to the high prevalence of annotated homozygous indels in the general population.
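The FS/NFS distinction rests on whether an indel's net length is a multiple of three. A minimal Python sketch, assuming VCF-style REF/ALT alleles for an indel lying wholly within a coding exon (indels that span splice sites or create stop codons need fuller annotation than this); the function name is hypothetical:

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Label a coding indel as frameshift ("FS") or non-frameshift ("NFS").

    Assumes VCF-style alleles, so the net length change is len(alt) - len(ref).
    """
    delta = len(alt) - len(ref)
    if delta == 0:
        raise ValueError("REF and ALT have equal length; not an indel")
    # Python's % with a positive divisor returns a non-negative remainder,
    # so deletions (negative delta) are handled the same way as insertions.
    return "NFS" if delta % 3 == 0 else "FS"


if __name__ == "__main__":
    print(classify_coding_indel("A", "AT"))       # FS  (+1 base)
    print(classify_coding_indel("ATTG", "A"))     # NFS (-3 bases, one codon)
    print(classify_coding_indel("C", "CGGCGGC"))  # NFS (+6 bases, two codons)
```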
Affiliation(s)
- Maoxuan Lin
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Sarah Whitmire
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jing Chen
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Alvin Farrel
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Xinghua Shi
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
19
Jackson EL, Spielman SJ, Wilke CO. Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein. PLoS One 2017; 12:e0164905. [PMID: 28369116 PMCID: PMC5378326 DOI: 10.1371/journal.pone.0164905] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/07/2016] [Accepted: 03/21/2017] [Indexed: 01/29/2023]
Abstract
Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein's amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein's three-dimensional structure, provides the best single predictor of whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions improves predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work examining how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.
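Weighted contact number is commonly defined as WCN_i = Σ_{j≠i} 1/r_ij², where r_ij is the distance between residues i and j, so residues with many close neighbours score high. The Python sketch below computes it from a list of 3-D coordinates; which atom represents each residue (Cα, side-chain centroid, etc.) is a modelling choice left to the caller, and coincident coordinates would cause a division by zero. The function name is hypothetical.

```python
import math

Coord = tuple[float, float, float]


def weighted_contact_number(coords: list[Coord]) -> list[float]:
    """Return WCN_i = sum over j != i of 1 / r_ij**2 for every residue i.

    Simple O(n^2) double loop over all residue pairs.
    """
    wcn = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(ci, cj) ** 2
        wcn.append(total)
    return wcn


if __name__ == "__main__":
    # Three residues 1 unit apart on a line: the middle one is the most packed.
    print(weighted_contact_number([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]))
    # [1.25, 2.0, 1.25]
```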
Affiliation(s)
- Eleisha L. Jackson
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Stephanie J. Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Claus O. Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
20
Zhou K, Salamov A, Kuo A, Aerts AL, Kong X, Grigoriev IV. Alternative splicing acting as a bridge in evolution. Stem Cell Investig 2015; 2:19. [PMID: 27358887 DOI: 10.3978/j.issn.2306-9759.2015.10.01] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/12/2015] [Accepted: 10/15/2015] [Indexed: 12/15/2022]
Abstract
BACKGROUND Alternative splicing (AS) regulates diverse cellular and developmental functions through the alternative protein structures of different isoforms. Alternative exons dominate AS in vertebrates; however, very little is known about the extent and function of AS in lower eukaryotes. To understand the role of introns in gene evolution, we examined AS in one green algal and five fungal genomes using a novel EST-based gene-modeling algorithm (COMBEST). METHODS AS in each genome was classified with COMBEST, which maps EST sequences to genomes to build gene models. Various aspects of AS were analyzed with statistical methods. The interplay of intron 3n length, phase, coding property, and intron retention (RI) was examined with chi-square tests. RESULTS With 3- to 834-fold EST coverage, we identified up to 73% of AS in intron-containing genes and found a preponderance of RI among 11 types of AS. The number of exons, expression level, and maximum intron length correlated with the number of AS events per gene (NAG), and intron-rich genes suppressed AS. Genes with AS were more ancient, and AS was conserved among fungal genomes. Among stopless introns, non-retained introns (NRI) avoided 3n length, whereas major RI preferred it. In contrast, stop-containing introns showed a uniform distribution among 3n, 3n+1, and 3n+2 lengths. We found a clue to the intron phase enigma: it is the coding function of introns involved in AS that dictates the intron phase bias. CONCLUSIONS The majority of AS is non-functional, and the extent of AS is suppressed in intron-rich genes. RI through 3n length, stop codons, and phase bias bridges the transition from functionless to functional alternative isoforms.
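The length classes (3n, 3n+1, 3n+2) and the stopless/stop-containing distinction can be checked mechanically. This hypothetical Python sketch classifies an intron by length and scans the codons lying wholly inside a retained intron for stop codons, given the intron phase (0: between codons; 1 or 2: after the first or second base of a codon). As a simplification, codons straddling the exon-intron boundaries are ignored, so this is an illustration of the concepts, not the COMBEST procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def length_class(intron: str) -> str:
    """Return '3n', '3n+1' or '3n+2' for an intron sequence."""
    r = len(intron) % 3
    return "3n" if r == 0 else f"3n+{r}"


def has_in_frame_stop(intron: str, phase: int) -> bool:
    """True if a codon lying wholly inside the retained intron is a stop codon.

    Phase 0: the intron sits between codons. Phase 1 or 2: it interrupts a
    codon after its first or second base, so the first complete intronic
    codon starts (3 - phase) % 3 bases into the intron. Codons straddling the
    boundaries are ignored in this simplified sketch.
    """
    if phase not in (0, 1, 2):
        raise ValueError("phase must be 0, 1 or 2")
    seq = intron.upper()
    for k in range((3 - phase) % 3, len(seq) - 2, 3):
        if seq[k:k + 3] in STOP_CODONS:
            return True
    return False


if __name__ == "__main__":
    toy = "GTAAGT"  # toy intron, far shorter than real ones
    print(length_class(toy))                               # 3n
    print([has_in_frame_stop(toy, p) for p in (0, 1, 2)])  # [False, False, True]
```

A stopless 3n intron that is retained adds whole codons without shifting the downstream reading frame, which is why length, stop codons, and phase together determine whether retention can yield a functional isoform.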
Affiliation(s)
- Kemin Zhou
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Asaf Salamov
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Alan Kuo
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Andrea L Aerts
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Xiangyang Kong
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
- Igor V Grigoriev
- 1 US Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA; 2 Roche Molecular Diagnostics, 4300 Hacienda Drive, Pleasanton, CA 94588, USA; 3 Department of Clinical Medicine, Kunming University of Science and Technology, Kunming 650031, China
21
Surkont J, Diekmann Y, Ryder PV, Pereira-Leal JB. Coiled-coil length: Size does matter. Proteins 2015; 83:2162-9. [PMID: 26387794 DOI: 10.1002/prot.24932] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 04/09/2015] [Revised: 08/23/2015] [Accepted: 09/14/2015] [Indexed: 11/09/2022]
Abstract
Protein evolution is governed by processes that alter not only the primary sequence but also the length of proteins. Protein length may change in different ways, but insertions, deletions and duplications are the most common. An optimal protein size is a trade-off between sequence extension, which may change protein stability or lead to acquisition of a new function, and shrinkage, which decreases the metabolic cost of protein synthesis. Despite the general tendency for length conservation across orthologous proteins, the propensity to accept insertions and deletions is heterogeneous along the sequence. For example, protein regions rich in repetitive peptide motifs are well known to vary extensively in length across species. Here, we analyze length conservation of coiled-coils, domains formed by a ubiquitous, repetitive peptide motif present in all domains of life that frequently plays a structural role in the cell. We observed that, despite their repetitive nature, the length of coiled-coil domains is generally highly conserved throughout the tree of life, even when the remaining parts of the protein change, including globular domains. Length conservation is independent of primary amino acid sequence variation and represents a conservation of domain physical size. This suggests that the conservation of domain size is due to functional constraints.
Affiliation(s)
- Yoan Diekmann
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal; Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
- Pearl V Ryder
- Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543; Emory University School of Medicine, Atlanta, Georgia, 30322
- Jose B Pereira-Leal
- Instituto Gulbenkian de Ciência, Oeiras, 2780-156, Portugal; Physiology Course, Marine Biological Laboratory, Woods Hole, Massachusetts, 02543
22
Wright ES. DECIPHER: harnessing local sequence context to improve protein multiple sequence alignment. BMC Bioinformatics 2015; 16:322. [PMID: 26445311 PMCID: PMC4595117 DOI: 10.1186/s12859-015-0749-z] [Citation(s) in RCA: 198] [Impact Index Per Article: 22.0] [Received: 06/26/2015] [Accepted: 09/23/2015] [Indexed: 12/20/2022]
Abstract
BACKGROUND Alignment of large and diverse sequence sets is a common task in biological investigations, yet there remains considerable room for improvement in alignment quality. Multiple sequence alignment programs tend to reach maximal accuracy when aligning only a few sequences, and their accuracy then diminishes steadily as more sequences are added. This drop in accuracy can be partly attributed to a build-up of error and ambiguity as more sequences are aligned. Most high-throughput sequence alignment algorithms do not use contextual information, under the assumption that sites are independent. This study examines the extent to which local sequence context can be exploited to improve the quality of large multiple sequence alignments. RESULTS Two predictors based on local sequence context were assessed: (i) single-sequence secondary structure predictions, and (ii) modulation of gap costs according to the surrounding residues. The results indicate that context-based predictors have appreciable information content that can be used to create more accurate alignments. Furthermore, local context becomes more informative as the number of sequences increases, enabling more accurate protein alignments of large empirical benchmarks. These discoveries became the basis for DECIPHER, a new context-aware program for sequence alignment, which outperformed other programs on large sequence sets. CONCLUSIONS Predicting secondary structure based on local sequence context is an efficient means of breaking the independence assumption in alignment. Since secondary structure is more conserved than primary sequence, it can be leveraged to improve the alignment of distantly related proteins. Moreover, secondary structure predictions increase in accuracy as more sequences are used in the prediction. This enables the scalable generation of large sequence alignments that maintain high accuracy even on diverse sequence sets.
The DECIPHER R package and source code are freely available for download at DECIPHER.cee.wisc.edu and from the Bioconductor repository.
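The second predictor, modulating gap costs by the surrounding residues, can be illustrated with a textbook Needleman-Wunsch global alignment whose linear gap cost depends on local context. This is a hypothetical Python sketch, not DECIPHER's scoring scheme: the "loop-prone" residue set GPSND, the three-residue window, and all scores are invented for illustration.

```python
# Hypothetical context rule: gaps are cheaper next to loop-prone residues.
LOOP_PRONE = set("GPSND")


def gap_cost(seq: str, i: int, base: float = 4.0, discount: float = 2.0) -> float:
    """Cost of aligning seq[i] to a gap; reduced if seq[i] or a neighbour is loop-prone."""
    window = seq[max(0, i - 1): i + 2]
    return base - discount if any(c in LOOP_PRONE for c in window) else base


def align_score(a: str, b: str, match: float = 2.0, mismatch: float = -1.0) -> float:
    """Optimal global alignment score (Needleman-Wunsch, linear gaps) with
    position-dependent gap costs."""
    n, m = len(a), len(b)
    # dp[i][j] = best score for aligning a[:i] with b[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap_cost(a, i - 1)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap_cost(b, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,             # a[i-1] paired with b[j-1]
                dp[i - 1][j] - gap_cost(a, i - 1),  # a[i-1] aligned to a gap
                dp[i][j - 1] - gap_cost(b, j - 1),  # b[j-1] aligned to a gap
            )
    return dp[n][m]


if __name__ == "__main__":
    print(align_score("MKLV", "MKLV"))   # 8.0
    print(align_score("MKGLV", "MKLV"))  # 6.0 (gap next to G is discounted)
    print(align_score("MKWLV", "MKLV"))  # 4.0 (gap at W pays the full cost)
```

Because deleting the glycine in MKGLV costs less than deleting the tryptophan in MKWLV, such a scheme pushes gaps toward loop-like context, which is the general intuition behind context-modulated gap costs.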
Affiliation(s)
- Erik S Wright
- Department of Bacteriology, University of Wisconsin-Madison, Madison, WI, 53715, USA; Wisconsin Institute for Discovery, University of Wisconsin-Madison, 330 N. Orchard St., Madison, WI, 53715, USA
23
Boschiero C, Gheyas AA, Ralph HK, Eory L, Paton B, Kuo R, Fulton J, Preisinger R, Kaiser P, Burt DW. Detection and characterization of small insertion and deletion genetic variants in modern layer chicken genomes. BMC Genomics 2015; 16:562. [PMID: 26227840 PMCID: PMC4563830 DOI: 10.1186/s12864-015-1711-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/17/2014] [Accepted: 06/22/2015] [Indexed: 01/17/2023]
Abstract
BACKGROUND Small insertions and deletions (InDels) constitute the second most abundant class of genetic variants and have been found to be associated with many traits and diseases. The present study reports the detection and characterisation of about 883K high-quality InDels from whole-genome analysis of several modern layer chicken lines from diverse breeds. RESULTS To reduce the error rates seen in InDel detection, this study used the consensus set from two InDel-calling packages, SAMtools and Dindel, together with stringent post-filtering criteria. By analysing sequence data from 163 chickens from 11 commercial and 5 experimental layer lines, this study detected about 883K high-quality consensus InDels with a 93% validation rate and an average density of 0.78 InDels/kb over the genome. Certain chromosomes, viz., GGAZ, 16, 22 and 25, showed very low InDel densities, whereas the highest rate was observed on GGA6. Despite the higher recombination rates on microchromosomes, InDel density on these chromosomes was generally lower than on macrochromosomes, possibly due to their higher gene density. About 43-87% of the InDels were found to be fixed within each line. The majority of detected InDels (86%) were 1-5 bases long, and about 63% were non-repetitive in nature, while the rest were tandem repeats of various motif types. Functional annotation identified 613 frameshift, 465 non-frameshift and 10 stop-gain/loss InDels. Apart from the frameshift and stop-gain/loss InDels, which are expected to affect the translation of protein sequences and their biological activity, 33% of the non-frameshift InDels were predicted to be evolutionarily intolerant, with potential impact on protein function. Moreover, about 2.5% of the InDels coincided with the most-conserved elements previously mapped on the chicken genome and are likely to define functional elements. InDels potentially affecting protein function were found to be enriched for certain gene classes, e.g. those associated with cell proliferation, chromosome and Golgi organization, spermatogenesis, and muscle contraction. CONCLUSIONS The large catalogue of InDels presented in this study, along with associated information such as functional annotation and estimated allele frequency, is expected to serve as a rich resource for future research and breeding in the chicken.
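The two-caller consensus and the per-kilobase density reduce to a set intersection and a ratio. In this hypothetical Python sketch the variant key (chromosome, position, REF, ALT) is an assumption, and it presumes both call sets were normalized (for example, left-aligned) beforehand, since two callers can report the same indel at different positions; the example calls are made up.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, REF, ALT)


def consensus_calls(calls_a: set[Variant], calls_b: set[Variant]) -> set[Variant]:
    """Keep only InDels reported identically by both callers.

    Assumes both call sets were normalized (e.g. left-aligned) first.
    """
    return calls_a & calls_b


def density_per_kb(n_variants: int, sequence_length_bp: int) -> float:
    """Average number of variants per kilobase of analysed sequence."""
    return n_variants / (sequence_length_bp / 1000)


if __name__ == "__main__":
    caller_1 = {("1", 100, "A", "AT"), ("1", 250, "CT", "C"), ("2", 40, "G", "GA")}
    caller_2 = {("1", 100, "A", "AT"), ("2", 40, "G", "GA"), ("2", 90, "T", "TC")}
    shared = consensus_calls(caller_1, caller_2)
    print(sorted(shared))                     # [('1', 100, 'A', 'AT'), ('2', 40, 'G', 'GA')]
    print(density_per_kb(len(shared), 4000))  # 0.5
```

Taking the abstract's figures at face value, about 883K InDels at 0.78 InDels/kb corresponds to roughly 1.13 × 10^6 kb (about 1.13 Gb) of analysed sequence.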
Affiliation(s)
- Clarissa Boschiero
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK; current address: Departamento de Zootecnia, University of Sao Paulo/ESALQ, Piracicaba, SP, 13418-900, Brazil
- Almas A Gheyas
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Hannah K Ralph
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Lel Eory
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Bob Paton
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Richard Kuo
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- Pete Kaiser
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
- David W Burt
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush Campus, Midlothian, EH25 9RG, UK
24
Avelange-Macherel MH, Payet N, Lalanne D, Neveu M, Tolleter D, Burstin J, Macherel D. Variability within a pea core collection of LEAM and HSP22, two mitochondrial seed proteins involved in stress tolerance. Plant Cell Environ 2015; 38:1299-311. [PMID: 25367071 DOI: 10.1111/pce.12480] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 09/01/2014] [Revised: 10/17/2014] [Accepted: 10/21/2014] [Indexed: 05/10/2023]
Abstract
LEAM, a late embryogenesis abundant protein, and HSP22, a small heat shock protein, were shown to accumulate in the mitochondria during pea (Pisum sativum L.) seed development, where they are expected to contribute to desiccation tolerance. Here, their expression was examined in seeds of 89 pea genotypes by Western blot analysis. All genotypes expressed LEAM and HSP22 in similar amounts. In contrast with HSP22, LEAM displayed different isoforms according to apparent molecular mass. Each of the 89 genotypes harboured a single LEAM isoform. Genomic and RT-PCR analysis revealed four LEAM genes differing by a small variable indel in the coding region. These variations were consistent with the apparent molecular mass of each isoform. Indels, which occurred in repeated domains, did not alter the main properties of LEAM. Structural modelling indicated that the class A α-helix structure, which allows interactions with the mitochondrial inner membrane in the dry state, was preserved in all isoforms, suggesting functionality is maintained. The overall results point out the essential character of LEAM and HSP22 in pea seeds. LEAM variability is discussed in terms of pea breeding history as well as LEA gene evolution mechanisms.
Affiliation(s)
- Nicole Payet
- INRA, UMR 1345 Institut de Recherche en Horticulture et Semences, Angers, F-49045, France
- David Lalanne
- INRA, UMR 1345 Institut de Recherche en Horticulture et Semences, Angers, F-49045, France
- Martine Neveu
- INRA, UMR 1345 Institut de Recherche en Horticulture et Semences, Angers, F-49045, France
- Dimitri Tolleter
- ANU College of Medicine, Biology and Environment, Acton, 2601, Australia
- Judith Burstin
- GEAPSI, INRA, UMR 1347 Agroécologie, centre de Dijon, F-21065, France
- David Macherel
- Université d'Angers, UMR 1345 Institut de Recherche en Horticulture et Semences, Angers, F-49045, France
25
Prakash A, Bateman A. Domain atrophy creates rare cases of functional partial protein domains. Genome Biol 2015; 16:88. [PMID: 25924720 PMCID: PMC4432964 DOI: 10.1186/s13059-015-0655-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 11/20/2014] [Accepted: 04/15/2015] [Indexed: 01/12/2023]
Abstract
BACKGROUND Protein domains display a range of structural diversity, with numerous additions and deletions of secondary structural elements between related domains. We have observed a small number of cases of surprising large-scale deletions of core elements of structural domains. We propose a new concept called domain atrophy, where protein domains lose a significant number of core structural elements. RESULTS Here, we implement a new pipeline to systematically identify new cases of domain atrophy across all known protein sequences. The output of this pipeline was carefully checked by hand, which filtered out partial domain instances that were unlikely to represent true domain atrophy due to misannotations or un-annotated sequence fragments. We identify 75 cases of domain atrophy, of which eight cases are found in a three-dimensional protein structure and 67 cases have been inferred based on mapping to a known homologous structure. Domains with structural variations include ancient folds such as the TIM-barrel and Rossmann folds. Most of these domains are observed to show structural loss that does not affect their functional sites. CONCLUSION Our analysis has significantly increased the known cases of domain atrophy. We discuss specific instances of domain atrophy and see that there has often been a compensatory mechanism that helps to maintain the stability of the partial domain. Our study indicates that although domain atrophy is an extremely rare phenomenon, protein domains under certain circumstances can tolerate extreme mutations giving rise to partial, but functional, domains.
Affiliation(s)
- Ananth Prakash
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
- Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SD, UK.
26
Liu Y, Chen W, Ali T, Alkasir R, Yin J, Liu G, Han B. Staphylococcal enterotoxin H induced apoptosis of bovine mammary epithelial cells in vitro. Toxins (Basel) 2014; 6:3552-67. [PMID: 25533519 PMCID: PMC4280547 DOI: 10.3390/toxins6123552] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 08/29/2014] [Revised: 12/09/2014] [Accepted: 12/15/2014] [Indexed: 02/01/2023]
Abstract
Staphylococcal enterotoxins (SEs) are powerful superantigenic toxins produced by Staphylococcus aureus (S. aureus). They can cause food poisoning and toxic shock. However, their impact on bovine mammary epithelial cells (bMECs) is still unknown. In this study, the distribution of SE genes was evaluated in 116 S. aureus isolates from bovine mastitis. The most prevalent gene was seh (36.2%), followed by sei (12.1%), seg (11.2%), ser (4.3%), sec (3.4%), sea (2.6%) and sed (1.7%). To better understand the effect of staphylococcal enterotoxin H (SEH) on bMECs, the seh gene was cloned into the prokaryotic expression vector pET28a, which was then transformed into Escherichia coli BL21 (DE3). The recombinant staphylococcal enterotoxin H (rSEH) was expressed and purified as a soluble protein. Bioactivity analysis showed that rSEH stimulated lymphocyte proliferation. The XTT assay showed that 100 μg/mL rSEH had a cytotoxic effect on bMECs, and fluorescence microscopy and flow cytometry analysis revealed that rSEH, at a sufficient dose, induces bMEC apoptosis in vitro. This indicates that SEs can directly cause apoptosis of bMECs in S. aureus-associated bovine mastitis.
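As an arithmetic check on the prevalence figures, each percentage can be mapped back to a whole number of isolates. The sketch below assumes that every percentage is taken over all 116 isolates and rounded to one decimal place. The counts it produces are back-calculated under that assumption; they are not reported in the abstract. The helper name `implied_count` is hypothetical.

```python
# Back-calculate isolate counts from reported percentages, ASSUMING each
# percentage is relative to all 116 isolates and rounded to one decimal.
N_ISOLATES = 116
reported = {"seh": 36.2, "sei": 12.1, "seg": 11.2, "ser": 4.3,
            "sec": 3.4, "sea": 2.6, "sed": 1.7}

def implied_count(pct, n=N_ISOLATES):
    """Return the unique integer k with round(100*k/n, 1) == pct, else None."""
    matches = [k for k in range(n + 1) if round(100 * k / n, 1) == pct]
    return matches[0] if len(matches) == 1 else None

counts = {gene: implied_count(pct) for gene, pct in reported.items()}
print(counts)
```

Because consecutive counts differ by about 0.86 percentage points when n = 116, each reported value can match at most one integer count. A return value of `None` would therefore flag a percentage that is inconsistent with the stated denominator.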
Affiliation(s)
- Yongxia Liu
- Department of Clinical Medicine, College of Veterinary Medicine, China Agricultural University, Yuan Ming Yuan West Road No. 2, Haidian District, Beijing 100193, China.
- Wei Chen
- Department of Clinical Medicine, College of Veterinary Medicine, China Agricultural University, Yuan Ming Yuan West Road No. 2, Haidian District, Beijing 100193, China.
- Tariq Ali
- Department of Clinical Medicine, College of Veterinary Medicine, China Agricultural University, Yuan Ming Yuan West Road No. 2, Haidian District, Beijing 100193, China.
- Rashad Alkasir
- Department of Clinical Medicine, College of Veterinary Medicine, China Agricultural University, Yuan Ming Yuan West Road No. 2, Haidian District, Beijing 100193, China.
- Jinhua Yin
- Department of Clinical Medicine, College of Veterinary Medicine, China Agricultural University, Yuan Ming Yuan West Road No. 2, Haidian District, Beijing 100193, China.
- Gang Liu
- Department of Clinical Medicine, College of Veterinary Medicine, China Agricultural University, Yuan Ming Yuan West Road No. 2, Haidian District, Beijing 100193, China.
- Bo Han
- Department of Clinical Medicine, College of Veterinary Medicine, China Agricultural University, Yuan Ming Yuan West Road No. 2, Haidian District, Beijing 100193, China.
27
Bowers PM, Verdino P, Wang Z, da Silva Correia J, Chhoa M, Macondray G, Do M, Neben TY, Horlick RA, Stanfield RL, Wilson IA, King DJ. Nucleotide insertions and deletions complement point mutations to massively expand the diversity created by somatic hypermutation of antibodies. J Biol Chem 2014; 289:33557-67. [PMID: 25320089 DOI: 10.1074/jbc.m114.607176] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 01/04/2023]
Abstract
During somatic hypermutation (SHM), deamination of cytidine by activation-induced cytidine deaminase and subsequent DNA repair generates mutations within immunoglobulin V-regions. Nucleotide insertions and deletions (indels) have recently been shown to be critical for the evolution of antibody binding. Affinity maturation of 53 antibodies using in vitro SHM in a non-B cell context was compared with mutation patterns observed for SHM in vivo. The origin and frequency of indels seen during in vitro maturation were similar to that in vivo. Indels are localized to CDRs, and secondary mutations within insertions further optimize antigen binding. Structural determination of an antibody matured in vitro and comparison with human-derived antibodies containing insertions reveal conserved patterns of antibody maturation. These findings indicate that activation-induced cytidine deaminase acting on V-region sequences is sufficient to initiate authentic formation of indels in vitro and in vivo and that point mutations, indel formation, and clonal selection form a robust tripartite system for antibody evolution.
Affiliation(s)
- Petra Verdino
- AnaptysBio Inc., San Diego, California 92121
- Mark Chhoa
- AnaptysBio Inc., San Diego, California 92121
- Minjee Do
- AnaptysBio Inc., San Diego, California 92121
- Robyn L Stanfield
- Department of Integrative Structural and Computational Molecular Biology and Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California 92037
- Ian A Wilson
- Department of Integrative Structural and Computational Molecular Biology and Skaggs Institute for Chemical Biology, The Scripps Research Institute, La Jolla, California 92037
- David J King
- AnaptysBio Inc., San Diego, California 92121
28
Light S, Sagit R, Sachenkova O, Ekman D, Elofsson A. Protein Expansion Is Primarily due to Indels in Intrinsically Disordered Regions. Mol Biol Evol 2013; 30:2645-53. [DOI: 10.1093/molbev/mst157] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Indexed: 12/16/2022]
29
Ajawatanawong P, Baldauf SL. Evolution of protein indels in plants, animals and fungi. BMC Evol Biol 2013; 13:140. [PMID: 23826714 PMCID: PMC3706215 DOI: 10.1186/1471-2148-13-140] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 02/25/2013] [Accepted: 06/24/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes. RESULTS Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with 1 aa the most frequent size class. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs) and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold. CONCLUSIONS We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, simple indels in universal proteins appear to be too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than deletions do.
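The insertion-over-deletion excess for singleton indels depends on deciding which state is derived at each gap. The following is a hypothetical illustration, not the authors' pipeline. It assumes a gapped alignment of three or more single-copy orthologs and applies a parsimony reading in which a state unique to one taxon is the derived one. Under that reading:

- a column where only one sequence has residues is scored as an insertion in that taxon;
- a column where only one sequence has a gap is scored as a deletion in that taxon;
- runs of identical adjacent columns are merged into a single event.

The function name and the toy alignment are invented for the example.

```python
from collections import Counter

def singleton_indels(alignment):
    """Count singleton insertions and deletions per taxon.

    alignment: dict mapping taxon name -> gapped sequence ('-' = gap);
    all sequences must have the same length.
    """
    names = list(alignment)
    if len(names) < 3:
        raise ValueError("need at least three sequences to polarize indels")
    length = len(alignment[names[0]])
    insertions, deletions = Counter(), Counter()
    previous = None  # event in the previous column, used to merge runs
    for i in range(length):
        gapped = [n for n in names if alignment[n][i] == "-"]
        filled = [n for n in names if alignment[n][i] != "-"]
        if len(filled) == 1:
            event = ("ins", filled[0])   # residue unique to one taxon
        elif len(gapped) == 1:
            event = ("del", gapped[0])   # gap unique to one taxon
        else:
            event = None                 # shared or non-singleton pattern
        if event is not None and event != previous:
            target = insertions if event[0] == "ins" else deletions
            target[event[1]] += 1
        previous = event
    return insertions, deletions

aln = {
    "A": "MKV--LLTAG",
    "B": "MKV--LL-AG",
    "C": "MKVWWLLTAG",
    "D": "MKV--LLTAG",
}
ins, dels = singleton_indels(aln)
print(ins, dels)
print(sum(ins.values()) / sum(dels.values()))  # insertion:deletion ratio
```

In the toy alignment, the two `W` columns count as one insertion in C and the lone gap in B counts as one deletion. Summing such counts over many proteins gives the kind of insertion:deletion ratio quoted in the abstract. Real analyses also need an outgroup or tree, plus filtering of alignment-edge and low-quality columns, which this sketch omits.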
Affiliation(s)
- Pravech Ajawatanawong
- Department of Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala 75236, Sweden.
30
Long indels are disordered: a study of disorder and indels in homologous eukaryotic proteins. Biochim Biophys Acta 2013; 1834:890-7. [PMID: 23333420 DOI: 10.1016/j.bbapap.2013.01.002] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 11/01/2012] [Revised: 12/30/2012] [Accepted: 01/03/2013] [Indexed: 11/21/2022]
Abstract
Proteins evolve through point mutations as well as by insertions and deletions (indels). During the last decade it has become apparent that protein regions that do not fold into three-dimensional structures, i.e. intrinsically disordered regions, are quite common. Here, we have studied the relationship between protein disorder and indels using HMM-HMM pairwise alignments in two sets of orthologous eukaryotic protein pairs. First, we show that disordered residues are much more frequent among indel residues than among aligned residues, and that they are also more prevalent among indels than in coils. Second, we observe that disordered residues are particularly common in longer indels. Disordered indels of short-to-medium size are prevalent in the non-terminal regions of proteins, while the longest indels, ordered and disordered alike, occur toward the termini of the proteins, where new structural units are comparatively well tolerated. Finally, while disordered regions often evolve faster than ordered regions and disorder is common in indels, there are some previously recognized protein families where the disordered region is more conserved than the ordered region. We find that these rare proteins are often involved in information processes, such as RNA processing and translation. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
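The first comparison in this abstract, disorder among indel residues versus aligned residues, reduces to counting over a pairwise alignment. A minimal sketch follows. It assumes the pairwise alignment is already computed; the study used HMM-HMM alignments, which this code does not perform. It also assumes a hypothetical set of residue indices predicted to be disordered in sequence A. The function name and example values are invented for illustration.

```python
def disorder_fractions(aligned_a, aligned_b, disordered_a):
    """Fraction of disordered residues of sequence A at indel vs aligned sites.

    aligned_a, aligned_b: gapped pairwise alignment strings of equal length.
    disordered_a: set of 0-based indices (in ungapped A) predicted disordered.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have equal length")
    counts = {"indel": [0, 0], "aligned": [0, 0]}  # [disordered, total]
    pos_a = 0  # index into the ungapped sequence A
    for ca, cb in zip(aligned_a, aligned_b):
        if ca == "-":
            continue  # column holds a residue of B only
        kind = "indel" if cb == "-" else "aligned"
        counts[kind][1] += 1
        if pos_a in disordered_a:
            counts[kind][0] += 1
        pos_a += 1
    return {k: (d / t if t else 0.0) for k, (d, t) in counts.items()}

r = disorder_fractions("MKTAYIAKQR", "MKT---AKQR", {3, 4, 9})
print(r)
```

Running the same count with B as the reference sequence, and pooling over many ortholog pairs, would give the aggregate comparison the abstract describes.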
31
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
32
Joseph AP, Valadié H, Srinivasan N, de Brevern AG. Local structural differences in homologous proteins: specificities in different SCOP classes. PLoS One 2012; 7:e38805. [PMID: 22745680 PMCID: PMC3382195 DOI: 10.1371/journal.pone.0038805] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/26/2011] [Accepted: 05/10/2012] [Indexed: 11/19/2022]
Abstract
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches to structure analysis can be applied directly to studying protein function and to drug design. The backbone of a protein structure favours certain local conformations, which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet; it gives a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be used to improve structure comparison tools that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class, with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that using knowledge of the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- Hélène Valadié
- INSERM UMR-S 726, DSIMB, Université Paris Diderot - Paris 7, Paris, France
- Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
33
De Ingeniis J, Kazanov MD, Shatalin K, Gelfand MS, Osterman AL, Sorci L. Glutamine versus ammonia utilization in the NAD synthetase family. PLoS One 2012; 7:e39115. [PMID: 22720044 PMCID: PMC3376133 DOI: 10.1371/journal.pone.0039115] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Received: 04/11/2012] [Accepted: 05/16/2012] [Indexed: 11/18/2022]
Abstract
NAD is a ubiquitous and essential metabolic redox cofactor that also functions as a substrate in certain regulatory pathways. The last step of NAD synthesis is the ATP-dependent amidation of deamido-NAD by NAD synthetase (NADS). Members of the NADS family are present in nearly all species across the three kingdoms of life. In eukaryotic NADS, the core synthetase domain is fused with a nitrilase-like glutaminase domain that supplies ammonia for the reaction. This two-domain NADS arrangement, which enables the utilization of glutamine as a nitrogen donor, is also present in various bacterial lineages. However, many other bacterial members of the NADS family do not contain a glutaminase domain, and they can utilize only ammonia (but not glutamine) in vitro. A single-domain NADS is also characteristic of nearly all Archaea, and its dependence on ammonia was demonstrated here for the representative enzyme from Methanocaldococcus jannaschii. However, a question about the actual in vivo nitrogen donor for single-domain members of the NADS family remained open: is it glutamine hydrolyzed by a committed (but yet unknown) glutaminase subunit, as in most ATP-dependent amidotransferases, or free ammonia, as in glutamine synthetase? Here we addressed this dilemma by combining evolutionary analysis of the NADS family with experimental characterization of two representative bacterial systems, a two-subunit NADS from Thermus thermophilus and a single-domain NADS from Salmonella typhimurium. The results provide evidence that ammonia (and not glutamine) is the physiological substrate of a typical single-domain NADS, which represents the most likely ancestral form of NADS. The ability to utilize glutamine appears to have evolved via recruitment of a glutaminase subunit followed by domain fusion in an early branch of Bacteria. Further evolution of the NADS family included lineage-specific loss of one of the two alternative forms and horizontal gene transfer events. Lastly, we identified NADS structural elements associated with glutamine-utilizing capabilities.
Affiliation(s)
- Jessica De Ingeniis
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Marat D. Kazanov
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Konstantin Shatalin
- Department of Biochemistry, New York University School of Medicine, New York, United States of America
- Mikhail S. Gelfand
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Faculty of Bioengineering and Bioinformatics, M.V. Lomonosov Moscow State University, Moscow, Russia
- Andrei L. Osterman
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Leonardo Sorci
- Sanford-Burnham Medical Research Institute, La Jolla, California, United States of America
- Department of Clinical Sciences, Section of Biochemistry, Polytechnic University of Marche, Ancona, Italy
34
Cadag E, Vitalis E, Lennox KP, Zhou CLE, Zemla AT. Computational analysis of pathogen-borne metallo β-lactamases reveals discriminating structural features between B1 types. BMC Res Notes 2012; 5:96. [PMID: 22333139 PMCID: PMC3293060 DOI: 10.1186/1756-0500-5-96] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/21/2011] [Accepted: 02/14/2012] [Indexed: 01/25/2023]
Abstract
Background Genes conferring antibiotic resistance to groups of bacterial pathogens are cause for considerable concern, as many once-reliable antibiotics continue to see a reduction in efficacy. The recent discovery of the metallo β-lactamase blaNDM-1 gene, which appears to grant antibiotic resistance to a variety of Enterobacteriaceae via a mobile plasmid, is one example of this distressing trend. The following work describes a computational analysis of pathogen-borne MBLs that focuses on the structural aspects of characterized proteins. Results Using both sequence and structural analyses, we examine residues and structural features specific to various pathogen-borne MBL types. This analysis identifies a linker region within MBL-like folds that may act as a discriminating structural feature between these proteins, and specifically resistance-associated acquirable MBLs. Recently released crystal structures of the newly emerged NDM-1 protein were aligned against related MBL structures using a variety of global and local structural alignment methods, and the overall fold conformation is examined for structural conservation. Conservation appears to be present in most areas of the protein, yet is strikingly absent within a linker region, making NDM-1 unique with respect to a linker-based classification scheme. Variability analysis of the NDM-1 crystal structure highlights unique residues in key regions as well as identifying several characteristics shared with other transferable MBLs. Conclusions A discriminating linker region identified in MBL proteins is highlighted and examined in the context of NDM-1 and primarily three other MBL types: IMP-1, VIM-2 and ccrA. 
The presence of an unusual linker region variant and uncommon amino acid composition at specific structurally important sites may help to explain the unusually broad kinetic profile of NDM-1 and may aid in directing research attention to areas of this protein, and possibly other MBLs, that may be targeted for inactivation or attenuation of enzymatic activity.
Affiliation(s)
- Eithon Cadag
- Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, 94550 CA, USA.
35
Lin WH, Kussell E. Evolutionary pressures on simple sequence repeats in prokaryotic coding regions. Nucleic Acids Res 2011; 40:2399-413. [PMID: 22123746 PMCID: PMC3315296 DOI: 10.1093/nar/gkr1078] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 12/03/2022]
Abstract
Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on–off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable types are most strongly avoided across genomes. We thoroughly characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins and enrich repeats near termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe over-enrichment of SSRs near protein N-termini significantly beyond expectation based on structural constraints. This increases the probability that frameshifts result in non-functional proteins, revealing that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.
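The codon-shuffling control can be sketched as follows. This is a hypothetical illustration, not Lin and Kussell's published procedure. It makes four assumptions:

- the input is an in-frame, uppercase coding sequence;
- the only SSR type counted is a mononucleotide run of at least six bases, an arbitrary threshold;
- synonymous codons are shuffled within a single gene, which preserves both the protein sequence and the gene's codon usage;
- stop codons are treated as one "amino acid" (`*`).

All function names are invented for the example.

```python
import random
import re
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codons_of(cds):
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def translate(cds):
    return "".join(CODON_TABLE[c] for c in codons_of(cds))

def shuffle_synonymous(cds, rng):
    """Permute codons among the positions that encode the same amino acid.

    The protein sequence and the gene's codon usage are unchanged; only the
    placement of synonymous codons (and hence of nucleotide repeats) varies.
    """
    codons = codons_of(cds)
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)

def count_homopolymers(seq, min_len=6):
    """Count maximal single-nucleotide runs of at least min_len bases."""
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return sum(1 for _ in re.finditer(pattern, seq))

# Compare the observed repeat count with a synonymous-shuffle null distribution.
rng = random.Random(0)
cds = "ATGAAAAAAAAGAAACTGCTTCTGTAA"
observed = count_homopolymers(cds)
null = [count_homopolymers(shuffle_synonymous(cds, rng)) for _ in range(1000)]
print(observed, sum(null) / len(null))
```

Any excess or deficit of the observed count over the shuffled mean cannot come from the protein sequence itself, because every shuffled variant encodes the same protein. Binning repeat positions along the N→C axis before comparing would give a position-resolved version of this test.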
Affiliation(s)
- Wei-Hsiang Lin
- Center for Genomics and Systems Biology, Department of Biology and Department of Physics, New York University, New York, NY 10003, USA
- Edo Kussell
- Center for Genomics and Systems Biology, Department of Biology and Department of Physics, New York University, New York, NY 10003, USA
- *To whom correspondence should be addressed. Tel: +1 212 998 7663;
36
Li T, Bonkovsky HL, Guo JT. Structural analysis of heme proteins: implications for design and prediction. BMC Struct Biol 2011; 11:13. [PMID: 21371326 PMCID: PMC3059290 DOI: 10.1186/1472-6807-11-13] [Citation(s) in RCA: 102] [Impact Index Per Article: 7.8] [Received: 10/26/2010] [Accepted: 03/03/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Heme is an essential molecule and plays vital roles in many biological processes. The structural determination of a large number of heme proteins has made it possible to study the detailed chemical and structural properties of heme binding environment. Knowledge of these characteristics can provide valuable guidelines in the design of novel heme proteins and help us predict unknown heme binding proteins. RESULTS In this paper, we constructed a non-redundant dataset of 125 heme-binding protein chains and found that these heme proteins encompass at least 31 different structural folds with all-α class as the dominating scaffold. Heme binding pockets are enriched in aromatic and non-polar amino acids with fewer charged residues. The differences between apo and holo forms of heme proteins in terms of the structure and the binding pockets have been investigated. In most cases the proteins undergo small conformational changes upon heme binding. We also examined the CP (cysteine-proline) heme regulatory motifs and demonstrated that the conserved dipeptide has structural implications in protein-heme interactions. CONCLUSIONS Our analysis revealed that heme binding pockets show special features and that most of the heme proteins undergo small conformational changes after heme binding, suggesting the apo structures can be used for structure-based heme protein prediction and as scaffolds for future heme protein design.
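A claim that pockets are "enriched in aromatic and non-polar amino acids with fewer charged residues" is commonly quantified as a log-ratio of class frequencies in the pocket versus a background. The residue classes below, the use of the whole chain as background and the toy sequences are assumptions made for illustration. They are not the grouping or data used in the paper.

```python
import math

# Hypothetical residue classes (one-letter codes); the paper's grouping may differ.
RESIDUE_CLASSES = {
    "aromatic": set("FWYH"),
    "nonpolar": set("AVLIMPG"),
    "charged": set("DEKR"),
}

def class_enrichment(pocket, background):
    """Return log2(pocket fraction / background fraction) for each residue class.

    Positive values mean the class is enriched in the pocket; -inf means the
    class is absent from the pocket; nan means it is absent from the background.
    """
    scores = {}
    for name, members in RESIDUE_CLASSES.items():
        f_pocket = sum(aa in members for aa in pocket) / len(pocket)
        f_background = sum(aa in members for aa in background) / len(background)
        if f_background == 0:
            scores[name] = float("nan")
        elif f_pocket == 0:
            scores[name] = float("-inf")
        else:
            scores[name] = math.log2(f_pocket / f_background)
    return scores

scores = class_enrichment(pocket="FFWYLLVA", background="FWDEKRLAST")
print(scores)
```

For a dataset-level statement, the pocket and background counts would be pooled over all chains (here, the 125 non-redundant chains) before taking the ratio. A pseudocount is a common alternative to returning `-inf` for classes absent from small pockets.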
Affiliation(s)
- Ting Li
- Cannon Research Center, Carolinas Medical Center, Charlotte, NC 28203, USA