1
Tenthorey JL, del Banco S, Ramzan I, Klingenberg H, Liu C, Emerman M, Malik HS. Indels allow antiviral proteins to evolve functional novelty inaccessible by missense mutations. bioRxiv 2024:2024.05.07.592993. [PMID: 38765965 PMCID: PMC11100679 DOI: 10.1101/2024.05.07.592993] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
Antiviral proteins often evolve rapidly at virus-binding interfaces to defend against new viruses. We investigated whether antiviral adaptation via missense mutations might face limits, which insertion or deletion mutations (indels) could overcome. We report one such case of a nearly insurmountable evolutionary challenge: the human anti-retroviral protein TRIM5α requires more than five missense mutations in its specificity-determining v1 loop to restrict a divergent simian immunodeficiency virus (SIV). However, duplicating just one amino acid in v1 enables human TRIM5α to potently restrict SIV in a single evolutionary step. Moreover, natural primate TRIM5α v1 loops have evolved indels that confer novel antiviral specificities. Thus, indels enable antiviral proteins to overcome viral challenges inaccessible by missense mutations, revealing the potential of these often-overlooked mutations in driving protein innovation.
Affiliation(s)
- Jeannette L. Tenthorey
- Cellular and Molecular Pharmacology Department, University of California, San Francisco; San Francisco, 94158, USA
- Serena del Banco
- Division of Basic Sciences, Fred Hutchinson Cancer Center; Seattle, USA
- Ishrak Ramzan
- Cellular and Molecular Pharmacology Department, University of California, San Francisco; San Francisco, 94158, USA
- Hayley Klingenberg
- Cellular and Molecular Pharmacology Department, University of California, San Francisco; San Francisco, 94158, USA
- Chang Liu
- Cellular and Molecular Pharmacology Department, University of California, San Francisco; San Francisco, 94158, USA
- Michael Emerman
- Division of Basic Sciences, Fred Hutchinson Cancer Center; Seattle, USA
- Division of Human Biology, Fred Hutchinson Cancer Center; Seattle, USA
- Harmit S. Malik
- Division of Basic Sciences, Fred Hutchinson Cancer Center; Seattle, USA
- Howard Hughes Medical Institute, Fred Hutchinson Cancer Center; Seattle, USA
2
Yang Y, Braga MV, Dean MD. Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure. Genome Biol Evol 2024; 16:evae093. [PMID: 38735759 PMCID: PMC11102076 DOI: 10.1093/gbe/evae093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2024] [Revised: 04/16/2024] [Accepted: 04/21/2024] [Indexed: 05/14/2024]
Abstract
A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four different types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning method AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structures. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
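The "54% as much as expected" figure is an observed/expected ratio. Below is a minimal sketch of that arithmetic. It is not the authors' pipeline, and it rests on two assumptions that the abstract does not state. First, the expected overlap is taken to be the number of indels times the fraction of residues covered by predicted secondary structure, which amounts to placing indels uniformly at random. Second, each indel is treated as a single residue position. The function name overlap_ratio and the toy data are hypothetical.

```python
def overlap_ratio(indel_positions, ss_mask):
    """Observed/expected count of indels falling in predicted secondary structure.

    indel_positions: 0-based residue indices of indels (one position per indel).
    ss_mask: booleans, True where a residue lies in predicted secondary structure.
    Assumption: expected = n_indels * fraction of residues in secondary structure.
    """
    frac_ss = sum(ss_mask) / len(ss_mask)
    observed = sum(1 for p in indel_positions if ss_mask[p])
    expected = len(indel_positions) * frac_ss
    return observed / expected


# Toy 20-residue protein in which residues 5-14 are predicted helix or strand.
mask = [5 <= i < 15 for i in range(20)]
indels = [1, 2, 3, 7, 17, 18]  # hypothetical indel positions
print(round(overlap_ratio(indels, mask), 2))  # 0.33: one third of the expected overlap
```

A ratio below 1 indicates depletion. Under this framing, the reported 54% corresponds to a ratio of about 0.54.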
Affiliation(s)
- Yi Yang
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew V Braga
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Matthew D Dean
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
3
Boumajdi N, Bendani H, Kartti S, Alouane T, Belyamani L, Ibrahimi A. A Comprehensive Analysis of 3 Moroccan Genomes Revealed Contributions From Both African and European Ancestries. Evol Bioinform Online 2024; 20:11769343241229278. [PMID: 38327511 PMCID: PMC10848790 DOI: 10.1177/11769343241229278] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2023] [Accepted: 01/12/2024] [Indexed: 02/09/2024]
Abstract
Genetic variations in the human genome represent the differences in DNA sequence within individuals. This highlights the important role of whole human genome sequencing, which has become the keystone for precision medicine and disease prediction. Morocco is an important hub for studying human population migration and mixing history. This study presents the analysis of 3 Moroccan genomes; the variant analysis revealed 6 379 606 single nucleotide variants (SNVs) and 1 050 577 small InDels. Of those identified SNVs, 219 152 were novel, with 1233 occurring in coding regions, and 5580 non-synonymous single nucleotide variants (nsSNPs) were predicted to affect protein functions. The InDels produced 1055 coding variants and 454 non-3n length variants, and their size ranged from -49 to 49 bp. We further analysed the gene pathways of 8 novel coding variants found in the 3 genomes and revealed 5 genes involved in various diseases and biological pathways. We found that the Moroccan genomes share 92.78% of African ancestry, and 92.86% of Non-Finnish European ancestry, according to the gnomAD database. Then, population structure inference, by admixture analysis and a network-based approach, revealed that the studied genomes form a mixed population structure, highlighting the increased genetic diversity in Morocco.
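The "non-3n length" InDels above are those whose length change is not a multiple of three. In coding sequence, such an indel shifts the reading frame. A minimal sketch of that classification follows. It assumes VCF-style REF/ALT alleles that share a single anchor base. It also assumes, although the abstract does not say so, that negative sizes such as -49 bp denote deletions. The function names and variant calls are hypothetical.

```python
def indel_size(ref, alt):
    """Signed indel size from VCF-style alleles: positive = insertion, negative = deletion."""
    return len(alt) - len(ref)


def is_non3n(ref, alt):
    """True if the size change is not a multiple of 3 (frameshifting in coding sequence)."""
    size = indel_size(ref, alt)
    return size != 0 and size % 3 != 0


variants = [("A", "ATG"), ("ATGC", "A"), ("AT", "A"), ("C", "CGGA")]  # hypothetical calls
sizes = [indel_size(r, a) for r, a in variants]
print(sizes)                                     # [2, -3, -1, 3]
print(sum(is_non3n(r, a) for r, a in variants))  # 2
```

Python's `%` returns a non-negative remainder for a positive divisor, so the same check works for deletions (for example, -1 % 3 == 2).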
Affiliation(s)
- Nasma Boumajdi
- Laboratory of Biotechnology, Medical and Pharmacy School, Mohammed V University, Rabat, Morocco
- Mohammed VI Center for Research & Innovation (CM6), Rabat, Morocco
- Houda Bendani
- Laboratory of Biotechnology, Medical and Pharmacy School, Mohammed V University, Rabat, Morocco
- Mohammed VI Center for Research & Innovation (CM6), Rabat, Morocco
- Souad Kartti
- Laboratory of Biotechnology, Medical and Pharmacy School, Mohammed V University, Rabat, Morocco
- Mohammed VI Center for Research & Innovation (CM6), Rabat, Morocco
- Tarek Alouane
- Laboratory of Biotechnology, Medical and Pharmacy School, Mohammed V University, Rabat, Morocco
- Lahcen Belyamani
- Mohammed VI Center for Research & Innovation (CM6), Rabat, Morocco
- Mohammed VI University of Health Sciences (UM6SS), Casablanca, Morocco
- Emergency Department, Military Hospital Mohammed V, Rabat Medical and Pharmacy School, Mohammed V University, Rabat, Morocco
- Azeddine Ibrahimi
- Laboratory of Biotechnology, Medical and Pharmacy School, Mohammed V University, Rabat, Morocco
- Mohammed VI Center for Research & Innovation (CM6), Rabat, Morocco
- Mohammed VI University of Health Sciences (UM6SS), Casablanca, Morocco
4
Struck TH, Golombek A, Hoesel C, Dimitrov D, Elgetany AH. Mitochondrial Genome Evolution in Annelida: A Systematic Study on Conservative and Variable Gene Orders and the Factors Influencing its Evolution. Syst Biol 2023; 72:925-945. [PMID: 37083277 PMCID: PMC10405356 DOI: 10.1093/sysbio/syad023] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/17/2022] [Revised: 04/15/2023] [Accepted: 04/18/2023] [Indexed: 04/22/2023]
Abstract
The mitochondrial genomes of Bilateria are relatively conserved in their protein-coding, rRNA, and tRNA gene complement, but the order of these genes can range from very conserved to very variable depending on the taxon. The supposedly conserved gene order of Annelida has been used to support the placement of some taxa within Annelida. Recently, authors have cast doubts on the conserved nature of the annelid gene order. Various factors may influence gene order variability including, among others, increased substitution rates, base composition differences, structure of noncoding regions, parasitism, living in extreme habitats, short generation times, and biomineralization. However, these analyses were neither done systematically nor based on well-established reference trees. Several focused on only a few of these factors, and biological factors were usually explored ad hoc without rigorous testing or correlation analyses. Herein, we investigated the variability and evolution of the annelid gene order and the factors that potentially influenced its evolution, using a comprehensive and systematic approach. The analyses were based on 170 genomes, including 33 previously unrepresented species. Our analyses included 706 different molecular properties, 20 life-history and ecological traits, and a reference tree corresponding to recent improvements concerning the annelid tree. The results showed that the gene order with and without tRNAs is generally conserved. However, individual taxa exhibit higher degrees of variability. None of the analyzed life-history and ecological traits explained the observed variability across mitochondrial gene orders. In contrast, the combination and interaction of the best-predicting factors for substitution rate and base composition explained up to 30% of the observed variability.
Accordingly, correlation analyses of different molecular properties of the mitochondrial genomes showed an intricate network of direct and indirect correlations between the different molecular factors. Hence, gene order evolution seems to be driven by molecular evolutionary aspects rather than by life history or ecology. On the other hand, variability of the gene order does not predict whether a taxon is difficult to place in molecular phylogenetic reconstructions using sequence data. We also discuss the molecular properties of annelid mitochondrial genomes considering canonical views on gene evolution and potential reasons why the canonical views do not always fit the observed patterns without making some adjustments. [Annelida; compositional biases; ecology; gene order; life history; macroevolution; mitochondrial genomes; substitution rates.].
Affiliation(s)
- Torsten H Struck
- Natural History Museum, University of Oslo, P.O. Box 1172, Blindern, 0318 Oslo, Norway
- Centre of Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn 53113, Germany
- FB05 Biology/Chemistry; University of Osnabrück, Osnabrück 49069, Germany
- Anja Golombek
- Centre of Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Bonn 53113, Germany
- FB05 Biology/Chemistry; University of Osnabrück, Osnabrück 49069, Germany
- Christoph Hoesel
- FB05 Biology/Chemistry; University of Osnabrück, Osnabrück 49069, Germany
- Dimitar Dimitrov
- Department of Natural History, University Museum of Bergen, University of Bergen, P.O. Box 7800, 5020 Bergen, Norway
- Asmaa Haris Elgetany
- Natural History Museum, University of Oslo, P.O. Box 1172, Blindern, 0318 Oslo, Norway
- Zoology Department, Faculty of Science, Damietta University, New Damietta, Central zone, 34517, Egypt
5
Miton CM, Tokuriki N. Insertions and Deletions (Indels): A Missing Piece of the Protein Engineering Jigsaw. Biochemistry 2023; 62:148-157. [PMID: 35830609 DOI: 10.1021/acs.biochem.2c00188] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 02/08/2023]
Abstract
Over the years, protein engineers have studied nature and borrowed its tricks to accelerate protein evolution in the test tube. While there have been considerable advances, our ability to generate new proteins in the laboratory is seemingly limited. One explanation for these shortcomings may be that insertions and deletions (indels), which frequently arise in nature, are largely overlooked during protein engineering campaigns. The profound effect of indels on protein structures, by way of drastic backbone alterations, could be perceived as "saltation" events that bring about significant phenotypic changes in a single mutational step. Should we leverage these effects to accelerate protein engineering and gain access to unexplored regions of adaptive landscapes? In this Perspective, we describe the role played by indels in the functional diversification of proteins in nature and discuss their untapped potential for protein engineering, despite their often-destabilizing nature. We hope to spark a renewed interest in indels, emphasizing that their wider study and use may prove insightful and shape the future of protein engineering by unlocking unique functional changes that substitutions alone could never achieve.
Affiliation(s)
- Charlotte M Miton
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4 BC, Canada
- Nobuhiko Tokuriki
- Michael Smith Laboratories, University of British Columbia, Vancouver, V6T 1Z4 BC, Canada
6
Mohammadi S, Özdemir Hİ, Ozbek P, Sumbul F, Stiller J, Deng Y, Crawford AJ, Rowland HM, Storz JF, Andolfatto P, Dobler S. Epistatic Effects Between Amino Acid Insertions and Substitutions Mediate Toxin resistance of Vertebrate Na+,K+-ATPases. Mol Biol Evol 2022; 39:6874786. [PMID: 36472530 PMCID: PMC9778839 DOI: 10.1093/molbev/msac258] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/31/2022] [Revised: 11/10/2022] [Accepted: 11/14/2022] [Indexed: 12/13/2022]
Abstract
The recurrent evolution of resistance to cardiotonic steroids (CTS) across diverse animals most frequently involves convergent amino acid substitutions in the H1-H2 extracellular loop of Na+,K+-ATPase (NKA). Previous work revealed that hystricognath rodents (e.g., chinchilla) and pterocliform birds (sandgrouse) have convergently evolved amino acid insertions in the H1-H2 loop, but their functional significance was not known. Using protein engineering, we show that these insertions have distinct effects on CTS resistance in homologs of each of the two species that strongly depend on intramolecular interactions with other residues. Removing the insertion in the chinchilla NKA unexpectedly increases CTS resistance and decreases NKA activity. In the sandgrouse NKA, the amino acid insertion and substitution Q111R both contribute to an augmented CTS resistance without compromising ATPase activity levels. Molecular docking simulations provide additional insight into the biophysical mechanisms responsible for the context-specific mutational effects on CTS insensitivity of the enzyme. Our results highlight the diversity of genetic substrates that underlie CTS insensitivity in vertebrate NKA and reveal how amino acid insertions can alter the phenotypic effects of point mutations at key sites in the same protein domain.
Affiliation(s)
- Shabnam Mohammadi
- Molecular Evolutionary Biology, Institute of Cell and Systems Biology of Animals, Universität Hamburg, Hamburg 20146, Germany
- Max Planck Institute for Chemical Ecology, Research Group Predators and Toxic Prey, Jena 07745, Germany
- Pemra Ozbek
- Department of Bioengineering, Marmara University, Göztepe, İstanbul 34722, Turkey
- Fidan Sumbul
- INSERM, Aix-Marseille Université, Inserm, CNRS, Marseille 13009, France
- Josefin Stiller
- Villum Centre for Biodiversity Genomics, University of Copenhagen, Copenhagen 2100, Denmark
- Yuan Deng
- Villum Centre for Biodiversity Genomics, University of Copenhagen, Copenhagen 2100, Denmark
- BGI-Shenzhen, Shenzhen 518083, China
- Andrew J Crawford
- Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia
- Hannah M Rowland
- Max Planck Institute for Chemical Ecology, Research Group Predators and Toxic Prey, Jena 07745, Germany
- Jay F Storz
- School of Biological Sciences, University of Nebraska, Lincoln, NE
- Peter Andolfatto
- Department of Biological Sciences, Columbia University, New York, NY
- Susanne Dobler
- Molecular Evolutionary Biology, Institute of Cell and Systems Biology of Animals, Universität Hamburg, Hamburg 20146, Germany
7
Batista RL, Mendonca BB. The Molecular Basis of 5α-Reductase Type 2 Deficiency. Sex Dev 2022; 16:171-183. [PMID: 35793650 DOI: 10.1159/000525119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Accepted: 05/13/2022] [Indexed: 11/19/2022]
Abstract
The 5α-reductase type 2 enzyme catalyzes the conversion of testosterone into dihydrotestosterone, playing a crucial role in male development. This enzyme is encoded by the SRD5A2 gene, which maps to chromosome 2 (2p23), consists of 5 exons and 4 introns, and encodes a 254 amino acid protein. Disruptions in this gene are the molecular etiology of a subgroup of differences of sex development (DSD) in 46,XY patients. Affected individuals present a large range of external genitalia undervirilization, ranging from almost typically female external genitalia to predominantly typically male external genitalia with minimal undervirilization, including isolated micropenis. This is an updated review of the implication of the SRD5A2 gene in 5α-reductase type 2 enzyme deficiency. For that, we identified 451 cases from 48 countries of this particular 46,XY DSD from the literature with reported variants in the SRD5A2 gene. Herein, we present the SRD5A2 mutational profile, the SRD5A2 polymorphisms, and the functional studies related to SRD5A2 variants to detail the molecular etiology of this condition.
Affiliation(s)
- Rafael L Batista
- Unidade de Endocrinologia do Desenvolvimento, Laboratório de Hormônios e Genética Molecular/LIM42, Hospital das Clínicas, Disciplina de Endocrinologia, do Departamento de Clínica Médica, Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
- Endocrine Oncology Unit, Instituto do Câncer do Estado de São Paulo, ICESP, Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
- Berenice B Mendonca
- Unidade de Endocrinologia do Desenvolvimento, Laboratório de Hormônios e Genética Molecular/LIM42, Hospital das Clínicas, Disciplina de Endocrinologia, do Departamento de Clínica Médica, Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
8
Using the Evolutionary History of Proteins to Engineer Insertion-Deletion Mutants from Robust, Ancestral Templates Using Graphical Representation of Ancestral Sequence Predictions (GRASP). Methods Mol Biol 2022; 2397:85-110. [PMID: 34813061 DOI: 10.1007/978-1-0716-1826-4_6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Analyzing the natural evolution of proteins by ancestral sequence reconstruction (ASR) can provide valuable information about the changes in sequence and structure that drive the development of novel protein functions. However, ASR has also been used as a protein engineering tool, as it often generates thermostable proteins which can serve as robust and evolvable templates for enzyme engineering. Importantly, ASR has the potential to provide an insight into the history of insertions and deletions that have occurred in the evolution of a protein family. Indels are strongly associated with functional change during enzyme evolution and represent a largely unexplored source of genetic diversity for designing proteins with novel or improved properties. Current ASR methods differ in the way they handle indels; inclusion or exclusion of indels is often managed subjectively, based on assumptions the user makes about the likelihood of each recombination event, yet most currently available ASR tools provide limited, if any, opportunities for evaluating indel placement in a reconstructed sequence. Graphical Representation of Ancestral Sequence Predictions (GRASP) is an ASR tool that maps indel evolution throughout a reconstruction and enables the evaluation of indel variants. This chapter provides a general protocol for performing a reconstruction using GRASP and using the results to create indel variants. The method addresses protein template selection, sequence curation, alignment refinement, tree building, ancestor reconstruction, evaluation of indel variants and approaches to library development.
9
Chen J, Guo JT. Structural and functional analysis of somatic coding and UTR indels in breast and lung cancer genomes. Sci Rep 2021; 11:21178. [PMID: 34707120 PMCID: PMC8551294 DOI: 10.1038/s41598-021-00583-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2021] [Accepted: 10/14/2021] [Indexed: 11/24/2022]
Abstract
Insertions and deletions (Indels) represent one of the major variation types in the human genome and have been implicated in diseases including cancer. To study the features of somatic indels in different cancer genomes, we investigated the indels from two large samples of cancer types: invasive breast carcinoma (BRCA) and lung adenocarcinoma (LUAD). Besides mapping somatic indels in both coding and untranslated regions (UTRs) from the cancer whole exome sequences, we investigated the overlap between these indels and transcription factor binding sites (TFBSs), the key elements for regulation of gene expression that have been found in both coding and non-coding sequences. Compared to the germline indels in healthy genomes, somatic indels contain more coding indels with higher than expected frame-shift (FS) indels in cancer genomes. LUAD has a higher ratio of deletions and higher coding and FS indel rates than BRCA. More importantly, these somatic indels in cancer genomes tend to locate in sequences with important functions, which can affect the core secondary structures of proteins and have a bigger overlap with predicted TFBSs in coding regions than the germline indels. The somatic CDS indels are also enriched in highly conserved nucleotides when compared with germline CDS indels.
Affiliation(s)
- Jing Chen
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
10
Loewenthal G, Rapoport D, Avram O, Moshe A, Wygoda E, Itzkovitch A, Israeli O, Azouri D, Cartwright RA, Mayrose I, Pupko T. A probabilistic model for indel evolution: differentiating insertions from deletions. Mol Biol Evol 2021; 38:5769-5781. [PMID: 34469521 PMCID: PMC8662616 DOI: 10.1093/molbev/msab266] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 12/27/2022]
Abstract
Insertions and deletions (indels) are common molecular evolutionary events. However, probabilistic models for indel evolution are under-developed due to their computational complexity. Here, we introduce several improvements to indel modeling: 1) While previous models for indel evolution assumed that the rates and length distributions of insertions and deletions are equal, here we propose a richer model that explicitly distinguishes between the two; 2) we introduce numerous summary statistics that allow approximate Bayesian computation-based parameter estimation; 3) we develop a method to correct for biases introduced by alignment programs, when inferring indel parameters from empirical data sets; and 4) using a model-selection scheme, we test whether the richer model better fits biological data compared with the simpler model. Our analyses suggest that both our inference scheme and the model-selection procedure achieve high accuracy on simulated data. We further demonstrate that our proposed richer model better fits a large number of empirical data sets and that, for the majority of these data sets, the deletion rate is higher than the insertion rate.
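The central change in the richer model is that insertions and deletions get their own rates and their own length distributions. The sketch below simulates indel events along a single branch under that idea. It is not the authors' implementation. The following specific choices are illustrative assumptions:

- rates are per position;
- insertions can occur at L+1 positions, while deletions start at L positions;
- lengths follow truncated Zipf distributions;
- deletions are clipped at the sequence length.

All names are hypothetical.

```python
import random


def zipf_length(a, max_len, rng):
    """Draw a length k in 1..max_len with P(k) proportional to k**-a (truncated Zipf)."""
    lengths = range(1, max_len + 1)
    return rng.choices(lengths, weights=[k ** -a for k in lengths])[0]


def simulate_branch(seq_len, branch_len, ins_rate, del_rate,
                    ins_a, del_a, max_len=50, seed=0):
    """Simulate indels along one branch; return (final_length, events).

    Rates are per position per unit branch length; ins_rate must be > 0.
    Insertions and deletions have separate rates and separate Zipf exponents.
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        ins_total = ins_rate * (seq_len + 1)   # insertion points: L + 1 gaps
        del_total = del_rate * seq_len         # deletion start points: L residues
        total = ins_total + del_total
        t += rng.expovariate(total)            # exponential waiting time to next event
        if t > branch_len:
            break
        if rng.random() < ins_total / total:
            k = zipf_length(ins_a, max_len, rng)
            seq_len += k
            events.append(("ins", k))
        else:
            k = min(zipf_length(del_a, max_len, rng), seq_len)  # clip at sequence length
            seq_len -= k
            events.append(("del", k))
    return seq_len, events


# Deletion rate set higher than insertion rate, as the abstract reports for most data sets.
final_len, events = simulate_branch(500, 0.1, ins_rate=0.01, del_rate=0.03, ins_a=1.7, del_a=1.7)
print(final_len, len(events))
```

Setting ins_rate equal to del_rate and ins_a equal to del_a recovers the simpler, symmetric model that the abstract contrasts against.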
Affiliation(s)
- Gil Loewenthal
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Rapoport
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Oren Avram
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Asher Moshe
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Elya Wygoda
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Alon Itzkovitch
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Omer Israeli
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Dana Azouri
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Reed A Cartwright
- The Biodesign Institute, Arizona State University, Tempe, Arizona, USA
- School of Life Sciences, Arizona State University, Tempe, Arizona, USA
- Itay Mayrose
- School of Plant Sciences and Food Security, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Tal Pupko
- The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel
11
Soares PET, Dantas MDA, Silva-Portela RDCB, Agnez-Lima LF, Lanza DCF. Characterization of Penaeus vannamei mitogenome focusing on genetic diversity. PLoS One 2021; 16:e0255291. [PMID: 34329352 PMCID: PMC8323954 DOI: 10.1371/journal.pone.0255291] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2021] [Accepted: 07/13/2021] [Indexed: 11/23/2022]
Abstract
The diversity of the Penaeus vannamei mitochondrial genome remains poorly characterized: there are no validated mitochondrial markers available for population studies, and heteroplasmy has not yet been investigated in this species. In this study, metagenomic reads extracted from the muscle of a single individual were used to assemble the mitochondrial genome (mtDNA). These data, combined with previously described mitochondrial genomes, allowed us to evaluate inter-individual variability and heteroplasmy. Comparison among 45 mtDNA control regions led to the detection of conserved and variable segments and the characterization of two hypervariable regions. The analysis of diversity revealed mostly low-frequency polymorphisms, and heteroplasmy was found in practically all mitochondrial genes, with a high occurrence of indels. These results indicate that mitochondrial markers for P. vannamei must be designed with caution. The mapping of conserved and variable regions and the characterization of heteroplasmy presented here will contribute to increasing the efficiency of mitochondrial markers for population or individual studies.
Affiliation(s)
- Paulo Eduardo T. Soares
- Applied Molecular Biology Lab—LAPLIC, Department of Biochemistry, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Postgraduate Program in Biochemistry, Federal University of Rio Grande do Norte, Natal, RN, Brazil
- Márcia Danielle A. Dantas
- Applied Molecular Biology Lab—LAPLIC, Department of Biochemistry, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Postgraduate Program in Biochemistry, Federal University of Rio Grande do Norte, Natal, RN, Brazil
- Rita de Cássia B. Silva-Portela
- Laboratory of Molecular Biology and Genomics, Department of Cellular Biology and Genetics, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Lucymara F. Agnez-Lima
- Postgraduate Program in Biochemistry, Federal University of Rio Grande do Norte, Natal, RN, Brazil
- Laboratory of Molecular Biology and Genomics, Department of Cellular Biology and Genetics, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Daniel Carlos F. Lanza
- Applied Molecular Biology Lab—LAPLIC, Department of Biochemistry, Federal University of Rio Grande do Norte, Natal, Rio Grande do Norte, Brazil
- Postgraduate Program in Biochemistry, Federal University of Rio Grande do Norte, Natal, RN, Brazil
12
Batista RL, Mendonca BB. Integrative and Analytical Review of the 5-Alpha-Reductase Type 2 Deficiency Worldwide. Appl Clin Genet 2020; 13:83-96. [PMID: 32346305 PMCID: PMC7167369 DOI: 10.2147/tacg.s198178] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/03/2019] [Accepted: 02/20/2020] [Indexed: 12/12/2022]
Abstract
Introduction: The conversion of testosterone into dihydrotestosterone is catalyzed by the 5α-reductase type 2 enzyme, which plays a crucial role in virilization of the external genitalia. It is encoded by the SRD5A2 gene. Allelic variants in this gene cause a 46,XY DSD with no genotype-phenotype relationship. It was first reported in the early 1970s in isolated clusters. Since then, several cases have been reported. Pooling them will expand knowledge of the molecular bases of the androgen milieu. Methods: We searched for SRD5A2 allelic variants (AV) in the literature (PubMed, Embase, MEDLINE) and websites (Ensembl, HGMD, ClinVar). Only cases with AV in both alleles, either homozygous or compound heterozygous, were included. The included cases were analyzed according to ethnicity, exon, domain, amino acid (aa) conservation, age at diagnosis, sex assignment, gender reassignment, external genitalia virilization and functional studies. External genitalia virilization was scored using the Sinnecker scale. Conservation analysis was carried out using the ConSurf platform. For categorical variables, we used the χ² test and Cramér's V. Continuous variables were analyzed by t test or ANOVA. Concordance was estimated by Kappa. Results: We identified 434 cases of 5ARD2 deficiency from 44 countries. Most came from Turkey (23%), China (17%), Italy (9%), and Brazil (7%). Sixty-nine percent were assigned as female. Seventy percent of the allelic variants were homozygous and 30% compound heterozygous. Most were missense variants (76%). However, small indels (11%), splicing variants (5%) and large deletions (4%) were all reported. They were distributed across all exons, with a predominance in exon 1 (33%) and exon 4 (25%). Allelic variants in exon 4 (NADPH-binding domain) resulted in lower virilization (p<0.0001). Codons 55, 65, 196, 235 and 246 are hotspots, making up 25% of all allelic variants. Most of them (76%) were located at conserved aa.
However, allelic variants at non-conserved aa were more frequently indels (28% vs 6%; p<0.01). The overall rate of gender change from female to male ranged from 16% to 70%. The lowest rate of gender change from female to male occurred in Turkey and the highest in Brazil. External genitalia virilization was similar between those who changed and those who kept their assigned gender. The gender change rate was significantly different across the countries (V=0.44; p<0.001) even with similar virilization scores. Conclusion: 5ARD2 deficiency has a worldwide distribution. Allelic variants at the NADPH-ligand region cause lower virilization. Genitalia virilization influenced sex assignment but not gender change, which was influenced by cultural aspects across the countries. Molecular diagnosis influenced sex assignment, favoring male sex assignment in newborns with 5α-reductase type 2 deficiency.
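This review pairs the χ² test with Cramér's V for categorical variables and reports V=0.44 for gender change across countries. For readers unfamiliar with the statistic, here is a minimal Python sketch of the standard formula, V = √(χ² / (n·(min(r, c) − 1))), computed from a contingency table of counts. The function name and toy tables are illustrative assumptions, not data or code from the review. The sketch applies no continuity correction and assumes every row and column total is positive.

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table of counts (list of lists).

    Uses the Pearson chi-square statistic without continuity correction.
    Assumes every row total and column total is positive.
    """
    n_rows = len(table)
    n_cols = len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_totals)
    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    k = min(n_rows, n_cols) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2x2 tables: rows = two countries, columns = (changed, kept) gender
print(round(cramers_v([[10, 0], [0, 10]]), 3))  # 1.0: perfect association
print(round(cramers_v([[5, 5], [5, 5]]), 3))    # 0.0: no association
```

V ranges from 0 (no association) to 1 (perfect association). Because it rescales χ² by sample size and table shape, values such as 0.44 can be compared across tables of different sizes.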
Affiliation(s)
- Rafael Loch Batista
- Unidade de Endocrinologia do Desenvolvimento, Laboratório de Hormônios e Genética Molecular/LIM42, Hospital das Clínicas, Disciplina de Endocrinologia, do Departamento de Clínica Médica, Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
- Berenice Bilharinho Mendonca
- Unidade de Endocrinologia do Desenvolvimento, Laboratório de Hormônios e Genética Molecular/LIM42, Hospital das Clínicas, Disciplina de Endocrinologia, do Departamento de Clínica Médica, Faculdade de Medicina da Universidade de São Paulo, São Paulo, Brazil
13
Microsatellite instability in mismatch repair and tumor suppressor genes and their expression profiling provide important targets for the development of biomarkers in gastric cancer. Gene 2019; 710:48-58. [DOI: 10.1016/j.gene.2019.05.051] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 04/09/2019] [Revised: 05/13/2019] [Accepted: 05/25/2019] [Indexed: 12/24/2022]
14
Zarin T, Strome B, Nguyen Ba AN, Alberti S, Forman-Kay JD, Moses AM. Proteome-wide signatures of function in highly diverged intrinsically disordered regions. eLife 2019; 8:e46883. [PMID: 31264965 PMCID: PMC6634968 DOI: 10.7554/elife.46883] [Citation(s) in RCA: 98] [Impact Index Per Article: 19.6] [Received: 03/15/2019] [Accepted: 07/01/2019] [Indexed: 12/24/2022]
Abstract
Intrinsically disordered regions make up a large part of the proteome, but the sequence-to-function relationship in these regions is poorly understood, in part because the primary amino acid sequences of these regions are poorly conserved in alignments. Here we use an evolutionary approach to detect molecular features that are preserved in the amino acid sequences of orthologous intrinsically disordered regions. We find that most disordered regions contain multiple molecular features that are preserved, and we define these as ‘evolutionary signatures’ of disordered regions. We demonstrate that intrinsically disordered regions with similar evolutionary signatures can rescue function in vivo, and that groups of intrinsically disordered regions with similar evolutionary signatures are strongly enriched for functional annotations and phenotypes. We propose that evolutionary signatures can be used to predict function for many disordered regions from their amino acid sequences.
Affiliation(s)
- Taraneh Zarin
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Bob Strome
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada
- Alex N Nguyen Ba
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, United States
- Simon Alberti
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany; Center for Molecular and Cellular Bioengineering, Biotechnology Center, Technische Universität Dresden, Dresden, Germany
- Julie D Forman-Kay
- Program in Molecular Medicine, Hospital for Sick Children, Toronto, Canada; Department of Biochemistry, University of Toronto, Toronto, Canada
- Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Canada; Department of Computer Science, University of Toronto, Toronto, Canada
15
Gagliano SA, Sengupta S, Sidore C, Maschio A, Cucca F, Schlessinger D, Abecasis GR. Relative impact of indels versus SNPs on complex disease. Genet Epidemiol 2018; 43:112-117. [PMID: 30565766 PMCID: PMC6330128 DOI: 10.1002/gepi.22175] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 06/25/2018] [Revised: 09/07/2018] [Accepted: 10/29/2018] [Indexed: 11/30/2022]
Abstract
It is unclear whether insertions and deletions (indels) are more likely to influence complex traits than abundant single‐nucleotide polymorphisms (SNPs). We sought to understand which category of variation is more likely to impact health. Using the SardiNIA study as an exemplar, we characterized 478,876 common indels and 8,246,244 common SNPs in up to 5,949 well‐phenotyped individuals from an isolated valley in Sardinia. We assessed association with 120 traits, identifying 89 nonoverlapping associated loci. We evaluated whether indels were enriched among credible sets of potential causal variants. These credible sets included 1,319 SNPs and 88 indels. We did not find indels to be significantly enriched. Indels were the most likely causal variant in seven loci, including one locus associated with monocyte count where an indel with previously demonstrated causality and mechanism (rs200748895:TGCTG/T) had a posterior probability of 0.999. Overall, our results show a very modest and nonsignificant enrichment for common indels in associated loci.
Affiliation(s)
- Sarah A Gagliano
- Center for Statistical Genetics, and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Sebanti Sengupta
- Center for Statistical Genetics, and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Carlo Sidore
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Andrea Maschio
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy
- Francesco Cucca
- Istituto di Ricerca Genetica e Biomedica, Consiglio Nazionale delle Ricerche (CNR), Cagliari, Italy; Dipartimento di Scienze Biomediche, Università degli Studi di Sassari, Sassari, Italy
- David Schlessinger
- Laboratory of Genetics, National Institute on Aging, US National Institutes of Health, Baltimore, Maryland
- Gonçalo R Abecasis
- Center for Statistical Genetics, and Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
16
Correlated Selection on Amino Acid Deletion and Replacement in Mammalian Protein Sequences. J Mol Evol 2018; 86:365-378. [PMID: 29955898 DOI: 10.1007/s00239-018-9853-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/20/2017] [Accepted: 06/21/2018] [Indexed: 10/28/2022]
Abstract
A low ratio of nonsynonymous and synonymous substitution rates (dN/dS) at a codon is an indicator of functional constraint caused by purifying selection. Intuitively, the functional constraint would also be expected to prevent such a codon from being deleted. However, to the best of our knowledge, the correlation between the rates of deletion and substitution has never actually been estimated. Here, we use 8595 protein-coding region sequences from nine mammalian species to examine the relationship between deletion rate and dN/dS. We find significant positive correlations at the levels of both sites and genes. We compared our data against controls consisting of simulated coding sequences evolving along identical phylogenetic trees, where deletions occur independently of substitutions. A much weaker correlation was found in the corresponding simulated sequences, probably caused by alignment errors. In the real data, the correlations cannot be explained by alignment errors. Separate investigations on nonsynonymous (dN) and synonymous (dS) substitution rates indicate that the correlation is most likely due to a similarity in patterns of selection rather than in mutation rates.
17
Lin M, Whitmire S, Chen J, Farrel A, Shi X, Guo JT. Effects of short indels on protein structure and function in human genomes. Sci Rep 2017; 7:9313. [PMID: 28839204 PMCID: PMC5570956 DOI: 10.1038/s41598-017-09287-x] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 05/18/2017] [Accepted: 07/24/2017] [Indexed: 01/20/2023]
Abstract
Insertions and deletions (indels) are the second most common type of genetic variation in human genomes. Indels can be deleterious and contribute to disease susceptibility; recent genome sequencing projects have revealed a large number of indels in various cancer types. In this study, we investigated the possible effects of small coding indels on protein structure and function, and the baseline characteristics of indels in 2504 individuals from 26 populations in the 1000 Genomes Project. We found that each population has a distinct pattern of genes carrying small indels. Frameshift (FS) indels are enriched in genes with olfactory receptor activity, while non-frameshift (NFS) indels are enriched in transcription-related proteins. Structural analysis of NFS indels revealed that they predominantly occur in regions adopting coil or disordered conformations, especially in transcription-related proteins. These results suggest that the annotated coding indels from the 1000 Genomes Project, while contributing to genetic variation and phenotypic diversity, generally do not affect core protein structures and have no deleterious effect on essential biological processes. In addition, we found that a number of reference genome annotations might need to be updated because of the high prevalence of annotated homozygous indels in the general population.
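The split between frameshift (FS) and non-frameshift (NFS) indels rests on whether the net change in length is a multiple of three nucleotides: only then are the downstream codons kept in frame. Here is a minimal Python sketch, assuming VCF-style REF/ALT alleles for an indel lying entirely within a coding sequence. `classify_coding_indel` is a hypothetical helper for illustration, not code from the study.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Classify a coding indel as frameshift ("FS") or non-frameshift ("NFS").

    Assumes VCF-style alleles (e.g. REF="ACTG", ALT="A") and that the
    whole event lies inside a coding sequence.
    """
    net_change = len(alt) - len(ref)
    if net_change == 0:
        raise ValueError("REF and ALT have equal length; not an indel")
    # A net gain or loss divisible by 3 keeps downstream codons in frame.
    # Python's % returns a non-negative remainder, so deletions work too.
    return "NFS" if net_change % 3 == 0 else "FS"

print(classify_coding_indel("ACTG", "A"))  # NFS: 3-nt deletion
print(classify_coding_indel("A", "AT"))    # FS: 1-nt insertion
```

This length rule is why, under a uniform length distribution, roughly two of every three coding indels would be expected to shift the frame.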
Affiliation(s)
- Maoxuan Lin
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Sarah Whitmire
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jing Chen
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Alvin Farrel
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Xinghua Shi
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
- Jun-Tao Guo
- Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, Charlotte, NC, 28223, USA
18
Jackson EL, Spielman SJ, Wilke CO. Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein. PLoS One 2017; 12:e0164905. [PMID: 28369116 PMCID: PMC5378326 DOI: 10.1371/journal.pone.0164905] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 10/07/2016] [Accepted: 03/21/2017] [Indexed: 01/29/2023]
Abstract
Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein's amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single-amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein's three-dimensional structure, provides the best single predictor of whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions improves predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work to examine how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.
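Weighted contact number is commonly defined as WCN_i = Σ_{j≠i} 1/r_ij², where r_ij is the distance between residues i and j. A residue surrounded by many close neighbors therefore scores high, which is the sense of "densely packed" above. Here is a minimal Python sketch, assuming one representative 3D coordinate per residue (often the Cα atom; the study may have used a different atom or a side-chain centroid). Parsing a real structure file is out of scope, and the function name is illustrative.

```python
import math

def weighted_contact_numbers(coords):
    """Return WCN_i = sum over j != i of 1 / r_ij^2 for each residue.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    Assumes no two residues share identical coordinates (r_ij > 0).
    """
    wcn = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(ci, cj) ** 2
        wcn.append(total)
    return wcn

# Toy chain of three residues spaced 1 unit apart along the x axis:
# the middle residue has two close neighbors and gets the highest WCN.
print(weighted_contact_numbers([(0, 0, 0), (1, 0, 0), (2, 0, 0)]))
# [1.25, 2.0, 1.25]
```

The inverse-square weighting lets every residue contribute without a hard distance cutoff, while nearby residues dominate the sum.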
Affiliation(s)
- Eleisha L. Jackson
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Stephanie J. Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, Pennsylvania, United States of America
- Claus O. Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, United States of America
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, United States of America
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America
19
Selection maintains signaling function of a highly diverged intrinsically disordered region. Proc Natl Acad Sci U S A 2017; 114:E1450-E1459. [PMID: 28167781 DOI: 10.1073/pnas.1614787114] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 01/05/2023]
Abstract
Intrinsically disordered regions (IDRs) are characterized by their lack of stable secondary or tertiary structure and comprise a large part of the eukaryotic proteome. Although these regions play a variety of signaling and regulatory roles, they appear to be rapidly evolving at the primary sequence level. To understand the functional implications of this rapid evolution, we focused on a highly diverged IDR in Saccharomyces cerevisiae that is involved in regulating multiple conserved MAPK pathways. We hypothesized that under stabilizing selection, the functional output of orthologous IDRs could be maintained, such that diverse genotypes could lead to similar function and fitness. Consistent with the stabilizing selection hypothesis, we find that diverged, orthologous IDRs can mostly recapitulate wild-type function and fitness in S. cerevisiae. We also find that the electrostatic charge of the IDR is correlated with signaling output and, using phylogenetic comparative methods, find evidence for selection maintaining this quantitative molecular trait despite underlying genotypic divergence.
20
Wajnberg G, Passetti F. Using high-throughput sequencing transcriptome data for INDEL detection: challenges for cancer drug discovery. Expert Opin Drug Discov 2016; 11:257-68. [PMID: 26787005 DOI: 10.1517/17460441.2016.1143813] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/30/2022]
Abstract
INTRODUCTION: A cancer cell is a mosaic of genomic and epigenomic alterations. Distinct cancer molecular signatures can be observed depending on tumor type or patient genetic background. One type of genomic alteration is the insertion and/or deletion (INDEL) of nucleotides in the DNA sequence; INDELs vary in length and may change the encoded protein or modify protein domains. INDELs are associated with a large number of diseases, and their detection has relied on low-throughput techniques. However, high-throughput sequencing has also started to be used to detect novel disease-causing INDELs, a search that may identify novel drug targets. AREAS COVERED: This review presents examples of using high-throughput sequencing (DNA-Seq and RNA-Seq) to investigate the incidence of INDELs in coding regions of human genes. Some of these examples successfully used RNA-Seq to identify disease-associated INDELs. In addition, other studies have described small INDELs related to chemoresistance or poor patient outcome, while structural variants were associated with a better clinical outcome. EXPERT OPINION: On average, the most widely used repositories hold twice as much RNA-Seq data as DNA-Seq data. Therefore, using RNA-Seq data is a promising strategy for studying cancer samples with unknown mechanisms of drug resistance, aiming at the discovery of proteins with potential as novel drug targets.
Affiliation(s)
- Gabriel Wajnberg
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
- Fabio Passetti
- Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz (FIOCRUZ), Rio de Janeiro, RJ, Brazil
21
Khan T, Douglas GM, Patel P, Nguyen Ba AN, Moses AM. Polymorphism Analysis Reveals Reduced Negative Selection and Elevated Rate of Insertions and Deletions in Intrinsically Disordered Protein Regions. Genome Biol Evol 2015; 7:1815-26. [PMID: 26047845 PMCID: PMC4494057 DOI: 10.1093/gbe/evv105] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022]
Abstract
Intrinsically disordered protein regions are abundant in eukaryotic proteins and lack stable tertiary structures and enzymatic functions. Previous studies of disordered region evolution based on interspecific alignments have revealed an increased propensity for indels and rapid rates of amino acid substitution. How disordered regions are maintained at high abundance in the proteome and across taxa, despite apparently weak evolutionary constraints, remains unclear. Here, we use single nucleotide and indel polymorphism data in yeast and human populations to survey the population variation within disordered regions. First, we show that single nucleotide polymorphisms in disordered regions are under weaker negative selection compared with more structured protein regions and have a higher proportion of neutral non-synonymous sites. We also confirm previous findings that nonframeshifting indels are much more abundant in disordered regions relative to structured regions. We find that the rate of nonframeshifting indel polymorphism in intrinsically disordered regions resembles that of noncoding DNA and pseudogenes, and that large indels segregate in disordered regions in the human population. Our survey of polymorphism confirms patterns of evolution in disordered regions inferred based on longer evolutionary comparisons.
Affiliation(s)
- Tahsin Khan
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Gavin M Douglas
- Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada
- Priyenbhai Patel
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Alex N Nguyen Ba
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada
- Alan M Moses
- Department of Cell & Systems Biology, University of Toronto, Ontario, Canada; Department of Ecology & Evolutionary Biology, University of Toronto, Ontario, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Ontario, Canada
22
Neumann LC, Feiner N, Meyer A, Buiting K, Horsthemke B. The imprinted NPAP1 gene in the Prader-Willi syndrome region belongs to a POM121-related family of retrogenes. Genome Biol Evol 2015; 6:344-51. [PMID: 24482533 PMCID: PMC3942032 DOI: 10.1093/gbe/evu019] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/24/2022]
Abstract
We have recently shown that the human Nuclear pore-associated protein (NPAP1)/C15orf2 gene encodes a nuclear pore-associated protein. This gene is one of several paternally expressed imprinted genes in the genomic region 15q11q13. Because the Prader–Willi syndrome is known to be caused by the loss of function of paternally expressed genes in 15q11q13, a phenotypic contribution of NPAP1 cannot be excluded. NPAP1 appears to be under strong positive Darwinian selection in primates, suggesting an important function in primate biology. Interestingly, however, in contrast to all other protein-coding genes in 15q11q13, NPAP1 has no ortholog in the mouse. Our investigation of the evolutionary origin of NPAP1 showed that the gene is specific to primate species and absent from the 15q11q13-orthologous regions in all nonprimate mammals. However, we identified a group of paralogous genes, which we call NPAP1L, in all placental mammals except rodents. Phylogenetic analysis revealed that NPAP1, NPAP1L, and another group of genes (UPF0607), which is also restricted to primates, are closely related to the vertebrate transmembrane nucleoporin gene POM121, although they lack the transmembrane domain. These three newly identified groups of genes all lack conserved introns, and hence, are likely retrogenes. We hypothesize that, in the common ancestor of placentals, the POM121 gene retrotransposed and gave rise to an NPAP1-ancestral retrogene NPAP1L/NPAP1/UPF0607. Our results suggest that the nuclear pore-associated gene NPAP1 originates from the vertebrate nucleoporin gene POM121 and—after several steps of retrotransposition and duplication—has been subjected to genomic imprinting and positive selection after integration into the imprinted SNRPN-UBE3A chromosomal domain.
Affiliation(s)
- Lisa C Neumann
- Institut für Humangenetik, Universitätsklinikum Essen, Universität Duisburg-Essen, Germany
23
Evidence for stabilizing selection on codon usage in chromosomal rearrangements of Drosophila pseudoobscura. G3-GENES GENOMES GENETICS 2014; 4:2433-49. [PMID: 25326424 PMCID: PMC4267939 DOI: 10.1534/g3.114.014860] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]
Abstract
There has been a renewed interest in investigating the role of stabilizing selection acting on genome-wide traits such as codon usage bias. Codon bias, when synonymous codons are used at unequal frequencies, occurs in a wide variety of taxa. Standard evolutionary models explain the maintenance of codon bias through a balance of genetic drift, mutation and weak purifying selection. The efficacy of selection is expected to be reduced in regions of suppressed recombination. Contrary to observations in Drosophila melanogaster, some recent studies have failed to detect a relationship between the recombination rate, intensity of selection acting at synonymous sites, and the magnitude of codon bias as predicted under these standard models. Here, we examined codon bias in 2798 protein coding loci on the third chromosome of D. pseudoobscura using whole-genome sequences of 47 individuals, representing five common third chromosome gene arrangements. Fine-scale recombination maps were constructed using more than 1 million segregating sites. As expected, recombination was demonstrated to be significantly suppressed between chromosome arrangements, allowing for a direct examination of the relationship between recombination, selection, and codon bias. As with other Drosophila species, we observe a strong mutational bias away from the most frequently used codons. We find the rate of synonymous and nonsynonymous polymorphism is variable between different amino acids. However, we do not observe a reduction in codon bias or the strength of selection in regions of suppressed recombination as expected. Instead, we find that the interaction between weak stabilizing selection and mutational bias likely plays a role in shaping the composition of synonymous codons across the third chromosome in D. pseudoobscura.
24
Bavarva JH, Tae H, McIver L, Karunasena E, Garner HR. The dynamic exome: acquired variants as individuals age. Aging (Albany NY) 2014; 6:511-521. [PMID: 25063753 PMCID: PMC4100812 DOI: 10.18632/aging.100674] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/24/2014] [Accepted: 06/14/2014] [Indexed: 06/03/2023]
Abstract
Using a single genome for inference in population-based studies is a standard method in genomics. Recent studies show that spontaneous genomic variants can propagate into new generations and that these changes can contribute to individual cell aging, with environmental and evolutionary elements contributing to cumulative genomic variation. However, the contribution of aging to genomic changes in tissue samples remains uncharacterized. Here, we report the impact of aging on individual human exomes and its implications. We found the human genome to be dynamic, acquiring a varying number of mutations with age (5,000 to 50,000 in 9 to 16 years). This equates to a variation rate of 9.6 × 10⁻⁷ to 8.4 × 10⁻⁶ bp⁻¹ year⁻¹ for nonsynonymous single nucleotide variants and 2.0 × 10⁻⁴ to 1.0 × 10⁻³ locus⁻¹ year⁻¹ for microsatellite loci in these individuals. These mutations span 3,000 to 13,000 genes, which were commonly associated with the Wnt signaling and gonadotropin-releasing hormone receptor pathways, and indicated, for each individual, a specific and significant enrichment for increased risk of diabetes, kidney failure, cancer, rheumatoid arthritis, and Alzheimer's disease, conditions usually associated with aging. The results suggest that "age" is an important variable when analyzing an individual human genome to extract individual-specific, clinically significant information necessary for personalized genomics.
Affiliation(s)
- Jasmin H Bavarva
- Virginia Bioinformatics Institute, Virginia Tech, Blacksburg, VA 24061, USA
25
Rockah-Shmuel L, Tóth-Petróczy Á, Sela A, Wurtzel O, Sorek R, Tawfik DS. Correlated occurrence and bypass of frame-shifting insertion-deletions (InDels) to give functional proteins. PLoS Genet 2013; 9:e1003882. [PMID: 24204297 PMCID: PMC3812077 DOI: 10.1371/journal.pgen.1003882] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 04/12/2013] [Accepted: 09/02/2013] [Indexed: 11/19/2022]
Abstract
Short insertions and deletions (InDels) comprise an important part of the natural mutational repertoire. InDels are, however, highly deleterious, primarily because two-thirds result in frame-shifts. Bypass through slippage over homonucleotide repeats by transcriptional and/or translational infidelity is known to occur sporadically. However, the overall frequency of bypass and its relation to sequence composition remain unclear. Intriguingly, the occurrence of InDels and the bypass of frame-shifts are mechanistically related: both arise through slippage over repeats, by DNA polymerases in the former case and by RNA polymerases or the ribosome in the latter. Here, we show that the frequency of frame-shifting InDels, and the frequency with which they are bypassed to give full-length, functional proteins, are indeed highly correlated. Using laboratory genetic drift, we exhaustively mapped all InDels that occurred within a single gene. We thus compared the naive InDel repertoire that results from DNA polymerase slippage to the frame-shifting InDels tolerated following selection to maintain protein function. We found that InDels repeatedly occurred, and were bypassed, within homonucleotide repeats of 3–8 bases. The longer the repeat, the higher the frequency of InDel formation, and the more frequent their bypass. Besides an expected 8A repeat, other types of repeats, including short ones and G and C repeats, were bypassed. Although obtained in vitro, our results indicate a direct link between the genetic occurrence of InDels and their phenotypic rescue, thus suggesting a potential role for frame-shifting InDels as bridging evolutionary intermediates. Homonucleotide repeats are exceptionally prone to insertions and/or deletions of bases (InDels). However, unless they occur in multiples of 3 bases, InDels disrupt the reading frame and are thus expected to be purged from coding regions.
Homonucleotide repeats, however, are also vulnerable to slippage by RNA polymerases and the ribosome. Using laboratory evolution techniques, we systematically mapped the occurrence of InDels within a given gene, before and after selection. Our data indicate that frame-shifting InDels were bypassed to give functional proteins at surprisingly high frequencies. Further, we found a strict correlation between the repeat length, the frequency of occurrence of InDels at the DNA level, and the likelihood of bypass by transcriptional/translational slippage. Our results suggest that frame-shifting InDels might comprise functional evolutionary intermediates, and an effective means of sequence divergence (e.g. when an adjacent InDel restores the frame, resulting in an altered sequence and, potentially, an altered protein structure).
Affiliation(s)
- Liat Rockah-Shmuel
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Ágnes Tóth-Petróczy
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Asaf Sela
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Omri Wurtzel
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Rotem Sorek
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, Israel
- Dan S. Tawfik
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
26
Kvikstad EM, Duret L. Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome. Mol Biol Evol 2013; 31:23-36. [PMID: 24113537 PMCID: PMC3879449 DOI: 10.1093/molbev/mst185] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
Elucidating the mechanisms of mutation accumulation and fixation is critical to understanding the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scale sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald-Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1-50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than for deletions. However, this fixation bias is consistent with neither selection nor biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected under a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in the determination of ancestral alleles via parsimony, and we advise caution in interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results have important implications for studies seeking to infer the evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed to be homoplasy-free.
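For readers unfamiliar with the McDonald-Kreitman framework named in this abstract, the sketch below computes its two standard summary statistics from a 2×2 table of fixed differences and polymorphisms. It is a generic illustration, not the authors' code: the abstract does not specify which indel and reference classes were contrasted, so the class labels and the `MKCounts` name are assumptions. A neutrality index below 1 (equivalently, α > 0) is conventionally read as evidence of positive selection, which is the kind of inference the abstract cautions can be distorted when homoplasy corrupts ancestral-allele calls.

```python
from dataclasses import dataclass

@dataclass
class MKCounts:
    """2x2 McDonald-Kreitman table (class labels are hypothetical)."""
    d_test: int  # fixed differences between species, class under test
    d_ref: int   # fixed differences, putatively neutral reference class
    p_test: int  # within-species polymorphisms, class under test
    p_ref: int   # within-species polymorphisms, reference class

def neutrality_index(c: MKCounts) -> float:
    """NI = (P_test / P_ref) / (D_test / D_ref)."""
    return (c.p_test / c.p_ref) / (c.d_test / c.d_ref)

def alpha(c: MKCounts) -> float:
    """Estimated fraction of test-class fixations driven by positive selection (1 - NI)."""
    return 1.0 - (c.d_ref * c.p_test) / (c.d_test * c.p_ref)

if __name__ == "__main__":
    counts = MKCounts(d_test=20, d_ref=10, p_test=5, p_ref=10)
    print(neutrality_index(counts), alpha(counts))  # 0.25 0.75
```

In practice the 2×2 table is also tested for significance (for example with Fisher's exact test); that step is omitted here to keep the sketch dependency-free.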
Affiliation(s)
- Erika M Kvikstad
- Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, CNRS, Université Lyon 1, Villeurbanne, France
27
Light S, Sagit R, Sachenkova O, Ekman D, Elofsson A. Protein Expansion Is Primarily due to Indels in Intrinsically Disordered Regions. Mol Biol Evol 2013; 30:2645-53. [DOI: 10.1093/molbev/mst157] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Indexed: 12/16/2022]
28
Ajawatanawong P, Baldauf SL. Evolution of protein indels in plants, animals and fungi. BMC Evol Biol 2013; 13:140. [PMID: 23826714 PMCID: PMC3706215 DOI: 10.1186/1471-2148-13-140] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 02/25/2013] [Accepted: 06/24/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes. RESULTS Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31-fold overall, with a maximum observed value of 7.5-fold. CONCLUSIONS We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.
Affiliation(s)
- Pravech Ajawatanawong
- Department of Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala 75236, Sweden.
29
Tóth-Petróczy Á, Tawfik DS. Protein Insertions and Deletions Enabled by Neutral Roaming in Sequence Space. Mol Biol Evol 2013; 30:761-71. [DOI: 10.1093/molbev/mst003] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Indexed: 01/09/2023]
30
Westesson O, Lunter G, Paten B, Holmes I. Accurate reconstruction of insertion-deletion histories by statistical phylogenetics. PLoS One 2012; 7:e34572. [PMID: 22536326 PMCID: PMC3335033 DOI: 10.1371/journal.pone.0034572] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.3] [Received: 01/01/2012] [Accepted: 03/05/2012] [Indexed: 11/24/2022]
Abstract
The Multiple Sequence Alignment (MSA) is a computational abstraction that represents a partial summary either of indel history, or of structural similarity. Taking the former view (indel history), it is possible to use formal automata theory to generalize the phylogenetic likelihood framework for finite substitution models (Dayhoff's probability matrices and Felsenstein's pruning algorithm) to arbitrary-length sequences. In this paper, we report results of a simulation-based benchmark of several methods for reconstruction of indel history. The methods tested include a relatively new algorithm for statistical marginalization of MSAs that sums over a stochastically-sampled ensemble of the most probable evolutionary histories. For mammalian evolutionary parameters on several different trees, the single most likely history sampled by our algorithm appears less biased than histories reconstructed by other MSA methods. The algorithm can also be used for alignment-free inference, where the MSA is explicitly summed out of the analysis. As an illustration of our method, we discuss reconstruction of the evolutionary histories of human protein-coding genes.
Affiliation(s)
- Oscar Westesson
- University of California Berkeley and University of California San Francisco Graduate Program in Bioengineering, University of California, Berkeley, California, United States of America
- Gerton Lunter
- Wellcome Trust Center for Human Genetics, Oxford, Oxford, United Kingdom
- Benedict Paten
- Baskin School of Engineering, University of California Santa Cruz, Santa Cruz, California, United States of America
- Ian Holmes
- University of California Berkeley and University of California San Francisco Graduate Program in Bioengineering, University of California, Berkeley, California, United States of America
31
Lin WH, Kussell E. Evolutionary pressures on simple sequence repeats in prokaryotic coding regions. Nucleic Acids Res 2011; 40:2399-413. [PMID: 22123746 PMCID: PMC3315296 DOI: 10.1093/nar/gkr1078] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 12/03/2022]
Abstract
Simple sequence repeats (SSRs) are indel mutational hotspots in genomes. In prokaryotes, SSR loci can cause phase variation, a microbial survival strategy that relies on stochastic, reversible on–off switching of gene activity. By analyzing multiple strains of 42 fully sequenced prokaryotic species, we measure the relative variability and density distribution of SSRs in coding regions. We demonstrate that repeat type strongly influences indel mutation rates, and that the most mutable types are most strongly avoided across genomes. We thoroughly characterize SSR density and variability as a function of N→C position along protein sequences. Using codon-shuffling algorithms that preserve amino acid sequence, we assess evolutionary pressures on SSRs. We find that coding sequences suppress repeats in the middle of proteins and enrich repeats near termini, yielding U-shaped SSR density curves. We show that for many species this characteristic shape can be attributed to purely biophysical constraints of protein structure. In multiple cases, however, particularly in certain pathogenic bacteria, we observe overenrichment of SSRs near protein N-termini significantly beyond expectation based on structural constraints. This increases the probability that frameshifts result in non-functional proteins, revealing that these species may evolutionarily tune SSR positions in coding regions to facilitate phase variation.
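The codon-shuffling null model described in this abstract can be sketched as follows. This is one simple variant, assumed for illustration rather than taken from the paper: codons are permuted only among positions that encode the same amino acid, so the protein sequence and the gene's overall codon usage are both preserved while repeat placement is randomized. It uses the standard genetic code; `shuffle_synonymous` and `longest_run` are hypothetical helper names, and the input is assumed to be an uppercase ACGT coding sequence whose length is a multiple of 3.

```python
import random
from collections import defaultdict

# Standard genetic code, indexed in TCAG order for the 1st, 2nd and 3rd base.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(cds):
    """Translate a coding sequence (length assumed to be a multiple of 3)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))

def shuffle_synonymous(cds, rng):
    """Permute codons among positions encoding the same amino acid (hypothetical variant)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    positions = defaultdict(list)
    for idx, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(idx)
    shuffled = codons[:]
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)

def longest_run(seq):
    """Length of the longest homonucleotide run in seq (a crude SSR summary)."""
    best = cur = 0
    prev = None
    for base in seq:
        cur = cur + 1 if base == prev else 1
        prev = base
        best = max(best, cur)
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    cds = "ATGAAAAAGAAACTGCTTTTATAA"
    null = [longest_run(shuffle_synonymous(cds, rng)) for _ in range(1000)]
    # Compare the observed statistic with its shuffled (null) distribution.
    print(longest_run(cds), sum(null) / len(null))
```

Because only synonymous codons swap places, any difference between the observed and shuffled SSR statistics cannot be explained by amino acid composition alone, which is the logic the abstract relies on.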
Affiliation(s)
- Wei-Hsiang Lin
- Center for Genomics and Systems Biology, Department of Biology and Department of Physics, New York University, New York, NY 10003, USA
- Edo Kussell
- Center for Genomics and Systems Biology, Department of Biology and Department of Physics, New York University, New York, NY 10003, USA
- *To whom correspondence should be addressed. Tel: +1 212 998 7663;
32
Roncarati R, Latronico MVG, Musumeci B, Aurino S, Torella A, Bang ML, Jotti GS, Puca AA, Volpe M, Nigro V, Autore C, Condorelli G. Unexpectedly low mutation rates in beta-myosin heavy chain and cardiac myosin binding protein genes in Italian patients with hypertrophic cardiomyopathy. J Cell Physiol 2011; 226:2894-900. [PMID: 21302287 PMCID: PMC3229838 DOI: 10.1002/jcp.22636] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 01/13/2023]
Abstract
Hypertrophic cardiomyopathy (HCM) is the most common genetic cardiac disease. Fourteen sarcomeric and sarcomere-related genes have been implicated in HCM etiology, with those encoding β-myosin heavy chain (MYH7) and cardiac myosin binding protein C (MYBPC3) reported as the most frequently mutated: in fact, these account for around 50% of all cases related to sarcomeric gene mutations, which are collectively responsible for approximately 70% of all HCM cases. Here, we used denaturing high-performance liquid chromatography followed by bidirectional sequencing to screen the coding regions of MYH7 and MYBPC3 in a cohort (n = 125) of Italian patients presenting with HCM. We found 6 MYH7 mutations in 9/125 patients and 18 MYBPC3 mutations in 19/125 patients. Of the three novel MYH7 mutations found, two were missense, and one was a silent mutation; of the eight novel MYBPC3 mutations, one was a substitution, three were stop codons, and four were missense mutations. Thus, our cohort of Italian HCM patients did not harbor the high frequency of mutations usually found in MYH7 and MYBPC3. This finding, coupled with the clinical diversity of our cohort, emphasizes the complexity of HCM and the need for more inclusive investigative approaches in order to fully understand the pathogenesis of this disease.
Affiliation(s)
- Roberta Roncarati
- Istituto di Tecnologie Biomediche, Consiglio Nazionale delle Ricerche, Milan, Italy
33
Jordan G, Goldman N. The Effects of Alignment Error and Alignment Filtering on the Sitewise Detection of Positive Selection. Mol Biol Evol 2011; 29:1125-39. [DOI: 10.1093/molbev/msr272] [Citation(s) in RCA: 156] [Impact Index Per Article: 12.0] [Indexed: 12/22/2022]
34
Chen CH, Liao BY, Chen FC. Exploring the selective constraint on the sizes of insertions and deletions in 5' untranslated regions in mammals. BMC Evol Biol 2011; 11:192. [PMID: 21726469 PMCID: PMC3146882 DOI: 10.1186/1471-2148-11-192] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 02/10/2011] [Accepted: 07/05/2011] [Indexed: 12/30/2022]
Abstract
Background Small insertions and deletions ("indels" with size ≤ 100 bp) whose lengths are not multiples of three (non-3n) are strongly constrained and depleted in protein-coding sequences. Such a constraint has never been reported in noncoding genomic regions. In the 5' untranslated regions (5'UTRs) of mammalian genomes, upstream start codons (uAUGs) and upstream open reading frames (uORFs) can regulate protein translation. The presence of non-3n indels in uORFs can potentially disrupt the functions of these regulatory elements. We thus hypothesize that natural selection disfavors non-3n indels in 5'UTRs when these regulatory elements are present. Results We design the Indel Selection Index to measure the selective constraint on non-3n indels in 5'UTRs. The index controls for the genomic compositions of the analyzed 5'UTRs and measures the probability of non-3n indel depletion downstream of uAUGs. By comparing the experimentally supported transcripts of human-mouse orthologous genes, we demonstrate that non-3n indels downstream of two types of uAUGs (alternative translation initiation sites and the uAUGs of coding sequence-overlapping uORFs) are underrepresented. The results hold well regardless of differences in alignment tool, gene structures between human and mouse, or the criteria for selecting alternatively spliced isoforms used for the analysis. Conclusions To our knowledge, this is the first study to demonstrate selective constraints on non-3n indels in 5'UTRs. Such constraints may be associated with the regulatory functions of uAUGs/uORFs in translational regulation or the generation of protein isoforms. Our study thus brings a new perspective to the evolution of 5'UTRs in mammals.
Affiliation(s)
- Chun-Hsi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, 350 Taiwan
35
Sipos B, Massingham T, Jordan GE, Goldman N. PhyloSim - Monte Carlo simulation of sequence evolution in the R statistical computing environment. BMC Bioinformatics 2011; 12:104. [PMID: 21504561 PMCID: PMC3102636 DOI: 10.1186/1471-2105-12-104] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 11/05/2010] [Accepted: 04/19/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND The Monte Carlo simulation of sequence evolution is routinely used to assess the performance of phylogenetic inference methods and sequence alignment algorithms. Progress in the field of molecular evolution fuels the need for more realistic and hence more complex simulations, adapted to particular situations, yet current software makes unreasonable assumptions such as homogeneous substitution dynamics or a uniform distribution of indels across the simulated sequences. This calls for an extensible simulation framework written in a high-level functional language, offering new functionality and making it easy to incorporate further complexity. RESULTS PhyloSim is an extensible framework for the Monte Carlo simulation of sequence evolution, written in R, using the Gillespie algorithm to integrate the actions of many concurrent processes such as substitutions, insertions and deletions. Uniquely among sequence simulation tools, PhyloSim can simulate arbitrarily complex patterns of rate variation and multiple indel processes, and allows for the incorporation of selective constraints on indel events. User-defined complex patterns of mutation and selection can be easily integrated into simulations, allowing PhyloSim to be adapted to specific needs. CONCLUSIONS Close integration with R and the wide range of features implemented offer unmatched flexibility, making it possible to simulate sequence evolution under a wide range of realistic settings. We believe that PhyloSim will be useful to future studies involving simulated alignments.
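The Gillespie approach named in this abstract can be illustrated with a toy Python simulation of substitutions, single-base insertions and single-base deletions acting as competing Poisson processes on one sequence. This is not PhyloSim's R interface; the function name `gillespie_indel_sim`, the per-site and per-gap rate parameterization, and the uniform choice of inserted bases are all assumptions made for the sketch. At each step the waiting time is drawn from an exponential distribution with the total event rate, and the event type is chosen in proportion to its share of that rate (Gillespie's direct method).

```python
import random

def gillespie_indel_sim(seq, t_max, sub_rate, ins_rate, del_rate, rng, alphabet="ACGT"):
    """Evolve seq for time t_max using Gillespie's direct method.
    Hypothetical parameterization: substitution and deletion rates are per site,
    the insertion rate is per gap, and inserted bases are drawn uniformly."""
    seq = list(seq)
    t = 0.0
    while True:
        n = len(seq)
        # Total rate of each event class; insertions may occur in any of the n + 1 gaps.
        rates = (sub_rate * n, ins_rate * (n + 1), del_rate * n)
        total = sum(rates)
        if total == 0:
            break  # no event can happen any more
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(total)
        if t > t_max:
            break
        # Choose the event class in proportion to its rate.
        r = rng.random() * total
        if r < rates[0]:
            i = rng.randrange(n)
            seq[i] = rng.choice([b for b in alphabet if b != seq[i]])
        elif r < rates[0] + rates[1] or n == 0:
            # An empty sequence can only gain bases.
            seq.insert(rng.randrange(n + 1), rng.choice(alphabet))
        else:
            del seq[rng.randrange(n)]
    return "".join(seq)

if __name__ == "__main__":
    rng = random.Random(42)
    print(gillespie_indel_sim("ACGT" * 10, t_max=0.5, sub_rate=1.0,
                              ins_rate=0.1, del_rate=0.1, rng=rng))
```

Recomputing the rate vector after every event is what lets the length-dependent insertion and deletion processes stay consistent with the current sequence; per-site rate heterogeneity of the kind PhyloSim supports would replace the scalar rates with per-position weights.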
Affiliation(s)
- Botond Sipos
- EMBL-European Bioinformatics Institute, Hinxton, UK.
36
Zhang Z, Wang Y, Wang L, Gao P. The combined effects of amino acid substitutions and indels on the evolution of structure within protein families. PLoS One 2010; 5:e14316. [PMID: 21179197 PMCID: PMC3001449 DOI: 10.1371/journal.pone.0014316] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 06/22/2010] [Accepted: 11/16/2010] [Indexed: 01/02/2023]
Abstract
BACKGROUND In the process of protein evolution, sequence variations within protein families can cause changes in protein structures and functions. However, structures tend to be more conserved than sequences and functions. This leads to an intriguing question: what is the evolutionary mechanism by which sequence variations produce structural changes? To investigate this question, we focused on the most common types of sequence variations: amino acid substitutions and insertions/deletions (indels). Here, their combined effects on protein structure evolution within protein families are studied. RESULTS Sequence-structure correlation analysis of 75 homologous structure families (from SCOP) that contain 20 or more non-redundant structures shows that in most of these families there is, statistically, a bilinear correlation between the amount of substitutions and indels and the degree of structural variation. Bilinear regression of percent sequence non-identity (PNI) and standardized number of gaps (SNG) versus RMSD was performed. The coefficients from the regression analysis could be used to estimate the structural changes caused by each unit of substitution (structural substitution sensitivity, SSS) and by each unit of indel (structural indel sensitivity, SIDS). An analysis of 52 families with high bilinear fitting multiple correlation coefficients and statistically significant regression coefficients showed that SSS is mainly constrained by disulfide bonds, which have almost no effect on SIDS. CONCLUSIONS Structural changes in homologous protein families could be rationally explained by a bilinear model combining amino acid substitutions and indels. These results may further improve our understanding of the evolutionary mechanisms of protein structures.
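The regression described in this abstract can be sketched as an ordinary least-squares fit of RMSD on PNI and SNG. This is an assumed reconstruction for illustration: the abstract does not state whether an intercept was included or how SNG was standardized, so the hypothetical `fit_bilinear` simply takes the two predictors as given and fits RMSD ≈ b0 + b1·PNI + b2·SNG, reading b1 and b2 as per-unit sensitivities analogous to SSS and SIDS. It solves the 3×3 normal equations directly to stay dependency-free.

```python
def fit_bilinear(pni, sng, rmsd):
    """Least-squares fit of rmsd ~ b0 + b1 * pni + b2 * sng; returns (b0, b1, b2).
    The intercept b0 is an assumption; b1 and b2 play the roles of SSS and SIDS."""
    rows = [(1.0, float(x1), float(x2)) for x1, x2 in zip(pni, sng)]
    # Normal equations: (X^T X) b = X^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    y = [sum(r[i] * v for r, v in zip(rows, rmsd)) for i in range(3)]
    # Forward elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(a[k][col]))
        a[col], a[piv] = a[piv], a[col]
        y[col], y[piv] = y[piv], y[col]
        for row in range(col + 1, 3):
            f = a[row][col] / a[col][col]
            for c in range(col, 3):
                a[row][c] -= f * a[col][c]
            y[row] -= f * y[col]
    # Back substitution
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        b[i] = (y[i] - sum(a[i][j] * b[j] for j in range(i + 1, 3))) / a[i][i]
    return tuple(b)

if __name__ == "__main__":
    # Synthetic, made-up data generated from known coefficients.
    pni = [10, 20, 30, 40, 50]
    sng = [1, 3, 2, 5, 4]
    rmsd = [0.5 + 0.02 * p + 0.3 * s for p, s in zip(pni, sng)]
    print(fit_bilinear(pni, sng, rmsd))  # approximately (0.5, 0.02, 0.3)
```

With real structure families a numerical library's least-squares routine would be the more robust choice; the hand-rolled solver here only keeps the example self-contained.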
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
- Yuxiao Wang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
- Division of Basic Science, UT Southwestern, Dallas, Texas, United States of America
- Lushan Wang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
- Peiji Gao
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, Shandong, China
37
Regional context in the alignment of biological sequence pairs. J Mol Evol 2010; 72:147-59. [PMID: 21107551 PMCID: PMC3064887 DOI: 10.1007/s00239-010-9409-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/30/2010] [Accepted: 11/08/2010] [Indexed: 11/24/2022]
Abstract
Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes in both protein-coding and non-protein-coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. The two PHMMs had their own set of parameters to model rates in their respective regions. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate, that is, that indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates for conserved and non-conserved secondary structure components, respectively. With polypeptides, we found that both the fast indel rate and the fast replacement rate were co-located with hydrophilic residues. We also found that the average concordance of our alignments with corresponding curated alignments improves markedly when the model allows either of the two fast rates to colocate with hydrophilic residues. With rRNA molecules, our model did not detect colocation between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.
38
Crutchley JL, Wang XQD, Ferraiuolo MA, Dostie J. Chromatin conformation signatures: ideal human disease biomarkers? Biomark Med 2010; 4:611-29. [PMID: 20701449 DOI: 10.2217/bmm.10.68] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Indexed: 01/17/2023]
Abstract
Human health is related to information stored in our genetic code, which is highly variable even amongst healthy individuals. Gene expression is orchestrated by numerous control elements that may be located anywhere in the genome, and can regulate distal genes by physically interacting with them. These DNA contacts can be mapped with the chromosome conformation capture and related technologies. Several studies now demonstrate that gene expression patterns are associated with specific chromatin structures, and may therefore correlate with chromatin conformation signatures. Here, we present an overview of genome organization and its relationship with gene expression. We also summarize how chromatin conformation signatures can be identified and discuss why they might represent ideal biomarkers of human disease in such genetically diverse populations.
Affiliation(s)
- Jennifer L Crutchley
- Department of Biochemistry, McGill University, 3655 Promenade Sir-William-Osler, Room 814, Montréal, Québec, Canada
- Xue Qing David Wang
- Department of Biochemistry, McGill University, 3655 Promenade Sir-William-Osler, Room 814, Montréal, Québec, Canada
- Maria A Ferraiuolo
- Department of Biochemistry, McGill University, 3655 Promenade Sir-William-Osler, Room 814, Montréal, Québec, Canada
39
Zhang Z, Huang J, Wang Z, Wang L, Gao P. Impact of indels on the flanking regions in structural domains. Mol Biol Evol 2010; 28:291-301. [PMID: 20671041 DOI: 10.1093/molbev/msq196] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022]
Abstract
Amino acid substitution and insertions/deletions (indels) are two common events in protein evolution; however, current knowledge on indels is limited. In this study, we investigated the effects of indels on the flanking regions in protein structure superfamilies. Comprehensive analysis of structural classification of proteins superfamilies revealed that indels lead to a series of changes in the flanking regions, including the following: 1) structural shift in the tertiary structure, with a first-order exponential decay relation between structural shift and the distance to indels, 2) instability of the secondary structure elements in which parts of the α helix and β sheet are destroyed, and 3) an increase in the amino acid substitution rate of the primary structure and the nonsimilar amino acid substitution rate. In general, these quality changes are due to the combined effects of the "regional-inherent effect," "indel-accompanied effect," and "indel-following effect." Furthermore, these quality changes reflect changes in selective pressure. Indels are more likely to be preserved in regions with low selective pressure, and indels can further reduce the selective pressure on the flanking regions. These findings improve our understanding of the role of indels in protein evolution.
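The first-order exponential decay between structural shift and distance from an indel can be fitted with a simple log-linear regression, sketched below. This assumes the model shift(d) = A·exp(-k·d) with no baseline offset and strictly positive shifts; the abstract does not give the exact functional form or fitting procedure, so `fit_exp_decay` and its parameters are hypothetical. The fitted decay constant k also gives a characteristic distance ln(2)/k over which the shift halves.

```python
import math

def fit_exp_decay(distance, shift):
    """Fit shift ~ A * exp(-k * distance) by least squares on log(shift).
    Assumes every shift is > 0 and there is no baseline offset term."""
    n = len(distance)
    logs = [math.log(s) for s in shift]
    mx = sum(distance) / n
    my = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in distance)
    sxy = sum((x - mx) * (y - my) for x, y in zip(distance, logs))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (A, k)

if __name__ == "__main__":
    # Made-up shifts (e.g. in angstroms) at residue distances 1..6 from an indel.
    d = [1, 2, 3, 4, 5, 6]
    s = [2.0 * math.exp(-0.5 * x) for x in d]
    A, k = fit_exp_decay(d, s)
    print(A, k, math.log(2) / k)  # half-distance in residues
```

Fitting in log space weights small shifts relatively heavily; if the real data include a nonzero baseline, a nonlinear fit with an offset term would be the appropriate alternative.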
Affiliation(s)
- Zheng Zhang
- State Key Laboratory of Microbial Technology, Shandong University, Jinan, China
40
An indel in transmembrane helix 2 helps to trace the molecular evolution of class A G-protein-coupled receptors. J Mol Evol 2009; 68:475-89. [PMID: 19357801 DOI: 10.1007/s00239-009-9214-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 07/25/2008] [Revised: 02/05/2009] [Accepted: 02/16/2009] [Indexed: 10/25/2022]
Abstract
Class A G-protein-coupled receptors (GPCRs) constitute a large family of transmembrane receptors. Helical distortions play a major role in the overall fold of these receptors. Most are related to conserved proline residues. However, in transmembrane helix 2, the proline pattern is not conserved, and when present, proline may be located at position 2.58, 2.59, or 2.60. Sequence analysis, three-dimensional data mining, and molecular modeling were undertaken to investigate the origin of this unusual pattern. Taken together, the data strongly support the assumption that an indel led to two structural motifs for helix 2: a bulged structure in P2.59 and P2.60 receptors and a "typical" proline kink in P2.58 receptors. The proline pattern of helix 2 can be used as an evolutionary marker and helps to trace the molecular evolution of class A GPCRs. Two indel events yielding functional receptors occurred independently. One indel arose very early in GPCR evolution, in a bilaterian ancestor, before the protostome-deuterostome divergence. This indel led to the split between the P2.58 somatostatin/opioid receptors and other peptide receptors with the P2.59 pattern. A second indel also occurred in insect opsins and corresponds to a deletion. Subfamilies with proline at position 2.59 or no proline expanded earlier, whereas P2.60 receptors remained marginal throughout evolution. P2.58 receptors underwent rapid expansion in vertebrates with the development of the chemokine and purinergic receptor subfamilies from somatostatin/opioid-related ancestors.