1
Zhang K, Zhang J, Ding N, Zellmer L, Zhao Y, Liu S, Liao DJ. ACTB and GAPDH appear at multiple SDS-PAGE positions, thus not suitable as reference genes for determining protein loading in techniques like Western blotting. Open Life Sci 2021; 16:1278-1292. [PMID: 34966852 PMCID: PMC8669867 DOI: 10.1515/biol-2021-0130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Revised: 10/21/2021] [Accepted: 11/01/2021] [Indexed: 11/19/2022]
Abstract
We performed polyacrylamide gel electrophoresis of human proteins with sodium dodecyl sulfate, isolated proteins at multiple positions, and then used liquid chromatography and tandem mass spectrometry (LC-MS/MS) to determine the protein identities. Although beta-actin (ACTB) and glyceraldehyde-3-phosphate dehydrogenase (GAPDH) are 41.7 and 36 kDa proteins, respectively, LC-MS/MS identified their peptides at all the positions studied. The National Center for Biotechnology Information (USA) database lists only one ACTB mRNA but five GAPDH mRNAs and one noncoding RNA. The five GAPDH mRNAs encode three protein isoforms, while our bioinformatics analysis identified a 17.6 kDa isoform encoded by the noncoding RNA. All LC-MS/MS-identified GAPDH peptides at all positions studied are unique, but some of the identified ACTB peptides are shared by ACTC1, ACTBL2, POTEF, POTEE, POTEI, and POTEJ. ACTC1 and ACTBL2 belong to the ACT family with significant similarities to ACTB in protein sequence, whereas the four POTEs are ACTB-containing chimeric genes with the C-terminus of their proteins highly similar to the ACTB. These data lead us to conclude that GAPDH and ACTB are poor reference genes for determining the protein loading in such techniques as Western blotting, a leading role these two genes have been playing for decades in biomedical research.
Affiliation(s)
- Keyin Zhang
- Department of Pathology, School of Clinical Medicine, Guizhou Medical University, Guiyang 550004, Guizhou Province, People’s Republic of China
- Ju Zhang
- Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, People’s Republic of China
- Nan Ding
- Beijing Key Laboratory of Emerging Infectious Diseases, Institute of Infectious Diseases, Beijing Ditan Hospital, Capital Medical University, Beijing 100015, People’s Republic of China
- Lucas Zellmer
- Department of Medicine, Hennepin County Medical Center, 730 South 8th St., Minneapolis, MN 55415, United States of America
- Yan Zhao
- Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang 550004, Guizhou Province, People’s Republic of China
- Siqi Liu
- Beijing Genomic Institute, Building 11 of Beishan Industrial Zone, Tantian District, Shengzhen 518083, Guangdong Province, People’s Republic of China
- Dezhong Joshua Liao
- Department of Pathology, School of Clinical Medicine, Guizhou Medical University, Guiyang 550004, Guizhou Province, People’s Republic of China
- Key Lab of Endemic and Ethnic Diseases of the Ministry of Education of China in Guizhou Medical University, Guiyang 550004, Guizhou Province, People’s Republic of China
- Department of Clinical Biochemistry, Guizhou Medical University Hospital, Guiyang 550004, Guizhou Province, People’s Republic of China
2
Warsi O, Knopp M, Surkov S, Jerlström Hultqvist J, Andersson DI. Evolution of a New Function by Fusion between Phage DNA and a Bacterial Gene. Mol Biol Evol 2021; 37:1329-1341. [PMID: 31977019 PMCID: PMC7182210 DOI: 10.1093/molbev/msaa007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
Abstract
Mobile genetic elements, such as plasmids, phages, and transposons, are important sources for evolution of novel functions. In this study, we performed a large-scale screening of metagenomic phage libraries for their ability to suppress temperature-sensitivity in Salmonella enterica serovar Typhimurium strain LT2 mutants to examine how phage DNA could confer evolutionary novelty to bacteria. We identified an insert encoding 23 amino acids from a phage that when fused with a bacterial DNA-binding repressor protein (LacI) resulted in the formation of a chimeric protein that localized to the outer membrane. This relocalization of the chimeric protein resulted in increased membrane vesicle formation and an associated suppression of the temperature sensitivity of the bacterium. Both the host LacI protein and the extracellular 23-amino acid stretch are necessary for the generation of the novel phenotype. Furthermore, mutational analysis of the chimeric protein showed that although the native repressor function of the LacI protein is maintained in this chimeric structure, it is not necessary for the new function. Thus, our study demonstrates how a gene fusion between foreign DNA and bacterial DNA can generate novelty without compromising the native function of a given gene.
Affiliation(s)
- Omar Warsi
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Michael Knopp
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Serhiy Surkov
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Dan I Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
3
Zhou Y, Zhang C. Evolutionary patterns of chimeric retrogenes in Oryza species. Sci Rep 2019; 9:17733. [PMID: 31776387 PMCID: PMC6881317 DOI: 10.1038/s41598-019-54085-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/15/2019] [Accepted: 10/30/2019] [Indexed: 11/23/2022]
Abstract
Chimeric retroposition is a process by which RNA is reverse transcribed and the resulting cDNA is integrated into the genome along with flanking sequences. This process plays essential roles and drives genome evolution. Although the origination rates of chimeric retrogenes are high in plant genomes, the evolutionary patterns of the retrogenes and their parental genes are relatively uncharacterised in the rice genome. In this study, we evaluated the substitution ratio of 24 retrogenes and their parental genes to clarify their evolutionary patterns. The results indicated that seven gene pairs were under positive selection. Additionally, soon after new chimeric retrogenes were formed, they rapidly evolved. However, an unexpected pattern was also revealed. Specifically, after an undefined period following the formation of new chimeric retrogenes, the parental genes, rather than the new chimeric retrogenes, rapidly evolved under positive selection. We also observed that one retro chimeric gene (RCG3) was highly expressed in infected calli, whereas its parental gene was not. Finally, a comparison of our Ka/Ks analysis with that of other species indicated that the proportion of genes under positive selection is greater for chimeric retrogenes than for non-chimeric retrogenes in the rice genome.
Affiliation(s)
- Yanli Zhou
- The Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, No. 132 Lanhei Road, Kunming, 650201, Yunnan, China
- Chengjun Zhang
- The Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, No. 132 Lanhei Road, Kunming, 650201, Yunnan, China
- Haiyan Engineering & Technology Center, Kunming Institute of Botany, Chinese Academy of Science, Jiaxing, 314300, Zhejiang, China
4
Tandem duplications lead to novel expression patterns through exon shuffling in Drosophila yakuba. PLoS Genet 2017; 13:e1006795. [PMID: 28531189 PMCID: PMC5460883 DOI: 10.1371/journal.pgen.1006795] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/08/2015] [Revised: 06/06/2017] [Accepted: 05/03/2017] [Indexed: 01/06/2023]
Abstract
One common hypothesis to explain the impacts of tandem duplications is that whole gene duplications commonly produce additive changes in gene expression due to copy number changes. Here, we use genome wide RNA-seq data from a population sample of Drosophila yakuba to test this ‘gene dosage’ hypothesis. We observe little evidence of expression changes in response to whole transcript duplication capturing 5′ and 3′ UTRs. Among whole gene duplications, we observe evidence that dosage sharing across copies is likely to be common. The lack of expression changes after whole gene duplication suggests that the majority of genes are subject to tight regulatory control and therefore not sensitive to changes in gene copy number. Rather, we observe changes in expression level due to both shuffling of regulatory elements and the creation of chimeric structures via tandem duplication. Additionally, we observe 30 de novo gene structures arising from tandem duplications, 23 of which form with expression in the testes. Thus, the value of tandem duplications is likely to be more intricate than simple changes in gene dosage. The common regulatory effects from chimeric gene formation after tandem duplication may explain their contribution to genome evolution.
The enclosed work shows that whole gene duplications rarely affect gene expression, in contrast to widely held views that the adaptive value of duplicate genes is related to additive changes in gene expression due to gene copy number. We further explain how tandem duplications that create shuffled gene structures can force upregulation of gene sequences, de novo gene creation, and multifold changes in transcript levels. These results show that tandem duplications can produce new genes that are a source of immediate novelty associated with more extreme expression changes than previously suggested by theory. Further, these gene expression changes are a potential source of both beneficial and pathogenic mutations, immediately relevant to clinical and medical genetics in humans and other metazoans.
5
Abstract
By definition, pseudogenes are relics of former genes that no longer possess biological functions. Operationally, they are identified based on disruptions of open reading frames (ORFs) or presumed losses of promoter activities. Intriguingly, a recent human proteomic study reported peptides encoded by 107 pseudogenes. These peptides may play currently unrecognized physiological roles. Alternatively, they may have resulted from accidental translations of pseudogene transcripts and possess no function. Comparing between human and macaque orthologs, we show that the nonsynonymous to synonymous substitution rate ratio (ω) is significantly smaller for translated pseudogenes than other pseudogenes. In particular, five of 34 translated pseudogenes amenable to evolutionary analysis have ω values significantly lower than 1, indicative of the action of purifying selection. This and other findings demonstrate that some but not all translated pseudogenes have selected functions at the protein level. Hence, neither ORF disruption nor presence of protein product disproves or proves gene functionality at the protein level.
Affiliation(s)
- Jinrui Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor
6
Stolzer M, Siewert K, Lai H, Xu M, Durand D. Event inference in multidomain families with phylogenetic reconciliation. BMC Bioinformatics 2015; 16 Suppl 14:S8. [PMID: 26451642 PMCID: PMC4610023 DOI: 10.1186/1471-2105-16-s14-s8] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Reconstructing evolution provides valuable insights into the processes of gene evolution and function. However, while there have been great advances in algorithms and software to reconstruct the history of gene families, these tools do not model the domain shuffling events (domain duplication, insertion, transfer, and deletion) that drive the evolution of multidomain protein families. Protein evolution through domain shuffling events allows for rapid exploration of functions by introducing new combinations of existing folds. This powerful mechanism was key to some significant evolutionary innovations, such as multicellularity and the vertebrate immune system. A method for reconstructing this important evolutionary process is urgently needed. RESULTS: Here, we introduce a novel, event-based framework for studying multidomain evolution by reconciling a domain tree with a gene tree, with additional information provided by the species tree. In the context of this framework, we present the first reconciliation algorithms to infer domain shuffling events, while addressing the challenges inherent in the inference of evolution across three levels of organization. CONCLUSIONS: We apply these methods to the evolution of domains in the Membrane associated Guanylate Kinase family. These case studies reveal a more vivid and detailed evolutionary history than previously provided. Our algorithms have been implemented in software, freely available at http://www.cs.cmu.edu/~durand/Notung.
7
Gaudry MJ, Storz JF, Butts GT, Campbell KL, Hoffmann FG. Repeated evolution of chimeric fusion genes in the β-globin gene family of laurasiatherian mammals. Genome Biol Evol 2014; 6:1219-34. [PMID: 24814285 PMCID: PMC4041002 DOI: 10.1093/gbe/evu097] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Accepted: 05/05/2014] [Indexed: 12/13/2022]
Abstract
The evolutionary fate of chimeric fusion genes may be strongly influenced by their recombinational mode of origin and the nature of functional divergence between the parental genes. In the β-globin gene family of placental mammals, the two postnatally expressed δ- and β-globin genes (HBD and HBB, respectively) have a propensity for recombinational exchange via gene conversion and unequal crossing-over. In the latter case, there are good reasons to expect differences in retention rates for the reciprocal HBB/HBD and HBD/HBB fusion genes due to thalassemia pathologies associated with the HBD/HBB "Lepore" deletion mutant in humans. Here, we report a comparative genomic analysis of the mammalian β-globin gene cluster, which revealed that chimeric HBB/HBD fusion genes originated independently in four separate lineages of laurasiatherian mammals: Eulipotyphlans (shrews, moles, and hedgehogs), carnivores, microchiropteran bats, and cetaceans. In cases where an independently derived "anti-Lepore" duplication mutant has become fixed, the parental HBD and/or HBB genes have typically been inactivated or deleted, so that the newly created HBB/HBD fusion gene is primarily responsible for synthesizing the β-type subunits of adult and fetal hemoglobin (Hb). Contrary to conventional wisdom that the HBD gene is a vestigial relict that is typically inactivated or expressed at negligible levels, we show that HBD-like genes often encode a substantial fraction (20-100%) of β-chain Hbs in laurasiatherian taxa. Our results indicate that the ascendancy or resuscitation of genes with HBD-like coding sequence requires the secondary acquisition of HBB-like promoter sequence via unequal crossing-over or interparalog gene conversion.
Affiliation(s)
- Michael J Gaudry
- Department of Biological Sciences, University of Manitoba, Winnipeg, MB, Canada
- Jay F Storz
- School of Biological Sciences, University of Nebraska, Lincoln
- Gary Tyler Butts
- Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University
- Kevin L Campbell
- Department of Biological Sciences, University of Manitoba, Winnipeg, MB, Canada
- Federico G Hoffmann
- Department of Biochemistry, Molecular Biology, Entomology, and Plant Pathology, Mississippi State University
- Institute for Genomics, Biocomputing and Biotechnology, Mississippi State University
8
Rogers RL, Cridland JM, Shao L, Hu TT, Andolfatto P, Thornton KR. Landscape of standing variation for tandem duplications in Drosophila yakuba and Drosophila simulans. Mol Biol Evol 2014; 31:1750-66. [PMID: 24710518 PMCID: PMC4069613 DOI: 10.1093/molbev/msu124] [Citation(s) in RCA: 65] [Impact Index Per Article: 6.5] [Indexed: 12/13/2022]
Abstract
We have used whole genome paired-end Illumina sequence data to identify tandem duplications in 20 isofemale lines of Drosophila yakuba and 20 isofemale lines of D. simulans and performed genome wide validation with PacBio long molecule sequencing. We identify 1,415 tandem duplications that are segregating in D. yakuba as well as 975 duplications in D. simulans, indicating greater variation in D. yakuba. Additionally, we observe high rates of secondary deletions at duplicated sites, with 8% of duplicated sites in D. simulans and 17% of sites in D. yakuba modified with deletions. These secondary deletions are consistent with the action of the large loop mismatch repair system acting to remove polymorphic tandem duplication, resulting in rapid dynamics of gain and loss in duplicated alleles and a richer substrate of genetic novelty than has been previously reported. Most duplications are present in only single strains, suggesting that deleterious impacts are common. Drosophila simulans shows larger numbers of whole gene duplications in comparison to larger proportions of gene fragments in D. yakuba. Drosophila simulans displays an excess of high-frequency variants on the X chromosome, consistent with adaptive evolution through duplications on the D. simulans X or demographic forces driving duplicates to high frequency. We identify 78 chimeric genes in D. yakuba and 38 chimeric genes in D. simulans, as well as 143 cases of recruited noncoding sequence in D. yakuba and 96 in D. simulans, in agreement with rates of chimeric gene origination in D. melanogaster. Together, these results suggest that tandem duplications often result in complex variation beyond whole gene duplications that offers a rich substrate of standing variation that is likely to contribute both to detrimental phenotypes and disease, as well as to adaptive evolutionary change.
Affiliation(s)
- Rebekah L Rogers
- Department of Ecology and Evolutionary Biology, University of California, Irvine
- Julie M Cridland
- Department of Ecology and Evolutionary Biology, University of California, Irvine
- Department of Ecology and Evolutionary Biology, University of California, Davis
- Ling Shao
- Department of Ecology and Evolutionary Biology, University of California, Irvine
- Tina T Hu
- Department of Ecology and Evolutionary Biology and the Lewis Sigler Institute for Integrative Genomics, Princeton University
- Peter Andolfatto
- Department of Ecology and Evolutionary Biology and the Lewis Sigler Institute for Integrative Genomics, Princeton University
- Kevin R Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine
9
Zhang C, Wang J, Marowsky NC, Long M, Wing RA, Fan C. High occurrence of functional new chimeric genes in survey of rice chromosome 3 short arm genome sequences. Genome Biol Evol 2013; 5:1038-48. [PMID: 23651622 PMCID: PMC3673630 DOI: 10.1093/gbe/evt071] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Indexed: 12/16/2022]
Abstract
In an effort to identify newly evolved genes in rice, we searched the genomes of Asian-cultivated rice Oryza sativa ssp. japonica and its wild progenitors, looking for lineage-specific genes. Using genome pairwise comparison of approximately 20-Mb DNA sequences from the chromosome 3 short arm (Chr3s) in six rice species, O. sativa, O. nivara, O. rufipogon, O. glaberrima, O. barthii, and O. punctata, combined with synonymous substitution rate tests and other evidence, we were able to identify potential recently duplicated genes, which evolved within the last 1 Myr. We identified 28 functional O. sativa genes, which likely originated after O. sativa diverged from O. glaberrima. These genes account for around 1% (28/3,176) of all annotated genes on O. sativa's Chr3s. Among the 28 new genes, two recently duplicated segments contained eight genes. Fourteen of the 28 new genes have a chimeric gene structure derived from one or multiple parental genes and flanking targeting sequences. Although the majority of these 28 new genes were formed by single or segmental DNA-based gene duplication and recombination, we found two genes that likely originated partially through exon shuffling. Sequence divergence tests between new genes and their putative progenitors indicated that new genes were most likely evolving under natural selection. We showed that all 28 new genes appeared to be functional, as suggested by Ka/Ks analysis and the presence of RNA-seq, cDNA, expressed sequence tag, massively parallel signature sequencing, and/or small RNA data. The high rate of new gene origination and of chimeric gene formation in rice may demonstrate rice's broad diversification, domestication, its environmental adaptation, and the role of new genes in rice speciation.
Affiliation(s)
- Chengjun Zhang
- Department of Ecology and Evolution, University of Chicago, USA
10
Teixeira PJPL, Costa GGL, Fiorin GL, Pereira GAG, Mondego JMC. Novel receptor-like kinases in cacao contain PR-1 extracellular domains. Mol Plant Pathol 2013; 14:602-9. [PMID: 23573899 PMCID: PMC6638629 DOI: 10.1111/mpp.12028] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 05/08/2023]
Abstract
Members of the pathogenesis-related protein 1 (PR-1) family are well-known markers of plant defence responses, forming part of the arsenal of the secreted proteins produced on pathogen recognition. Here, we report the identification of two cacao (Theobroma cacao L.) PR-1s that are fused to transmembrane regions and serine/threonine kinase domains, in a manner characteristic of receptor-like kinases (RLKs). These proteins (TcPR-1f and TcPR-1g) were named PR-1 receptor kinases (PR-1RKs). Phylogenetic analysis of RLKs and PR-1 proteins from cacao indicated that PR-1RKs originated from a fusion between sequences encoding PR-1 and the kinase domain of a LecRLK (Lectin Receptor-Like Kinase). Retrotransposition marks surround TcPR-1f, suggesting that retrotransposition was involved in the origin of PR-1RKs. Genes with a similar domain architecture to cacao PR-1RKs were found in rice (Oryza sativa), barrel medic (Medicago truncatula) and a nonphototrophic bacterium (Herpetosiphon aurantiacus). However, their kinase domains differed from those found in LecRLKs, indicating the occurrence of convergent evolution. TcPR-1g expression was up-regulated in the biotrophic stage of witches' broom disease, suggesting a role for PR-1RKs during cacao defence responses. We hypothesize that PR-1RKs transduce a defence signal by interacting with a PR-1 ligand.
Affiliation(s)
- Paulo José Pereira Lima Teixeira
- Laboratório de Genômica e Expressão, Departamento de Genética, Evolução e Bioagentes, Instituto de Biologia, Universidade Estadual de Campinas-Unicamp, CP 6109, Campinas, SP 13083-970, Brazil
11
Schrider DR, Navarro FCP, Galante PAF, Parmigiani RB, Camargo AA, Hahn MW, de Souza SJ. Gene copy-number polymorphism caused by retrotransposition in humans. PLoS Genet 2013; 9:e1003242. [PMID: 23359205 PMCID: PMC3554589 DOI: 10.1371/journal.pgen.1003242] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Received: 05/30/2012] [Accepted: 11/28/2012] [Indexed: 01/05/2023]
Abstract
The era of whole-genome sequencing has revealed that gene copy-number changes caused by duplication and deletion events have important evolutionary, functional, and phenotypic consequences. Recent studies have therefore focused on revealing the extent of variation in copy-number within natural populations of humans and other species. These studies have found a large number of copy-number variants (CNVs) in humans, many of which have been shown to have clinical or evolutionary importance. For the most part, these studies have failed to detect an important class of gene copy-number polymorphism: gene duplications caused by retrotransposition, which result in a new intron-less copy of the parental gene being inserted into a random location in the genome. Here we describe a computational approach leveraging next-generation sequence data to detect gene copy-number variants caused by retrotransposition (retroCNVs), and we report the first genome-wide analysis of these variants in humans. We find that retroCNVs account for a substantial fraction of gene copy-number differences between any two individuals. Moreover, we show that these variants may often result in expressed chimeric transcripts, underscoring their potential for the evolution of novel gene functions. By locating the insertion sites of these duplicates, we are able to show that retroCNVs have had an important role in recent human adaptation, and we also uncover evidence that positive selection may currently be driving multiple retroCNVs toward fixation. Together these findings imply that retroCNVs are an especially important class of polymorphism, and that future studies of copy-number variation should search for these variants in order to illuminate their potential evolutionary and functional relevance.
Recent studies of human genetic variation have revealed that, in addition to differing at single nucleotide polymorphisms, individuals differ in copy-number at many regions of the genome. These copy-number variants (CNVs) are caused by duplication or deletion events and often affect functional sequences such as genes. Efforts to reveal the functional impact of CNVs have identified many variants increasing the risk of various disorders, and some that are adaptive. However, these studies mostly fail to detect gene duplications caused by retrotransposition, in which an mRNA transcript is reverse-transcribed and reinserted into the genome, yielding a new intron-less gene copy. Here we describe a method leveraging next-generation sequence data to accurately detect gene copy-number variants caused by retrotransposition, or retroCNVs, and apply this method to hundreds of whole-genome sequences from three different human subpopulations. We find that these variants account for a substantial number of gene copy-number differences between individuals, and that gene retrotransposition may often result in both deleterious and beneficial mutations. Indeed, we present evidence that two of these new gene duplications may be adaptive. These results imply that retroCNVs are an especially important class of CNV and should be included in future studies of human copy-number variation.
Affiliation(s)
- Daniel R. Schrider
- Department of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana, United States of America
- Fabio C. P. Navarro
- São Paulo Branch, Ludwig Institute for Cancer Research, São Paulo, Brazil
- Departamento de Bioquímica, Universidade de São Paulo, São Paulo, Brazil
- Centro de Oncologia Molecular–Hospital Sírio-Libanês, São Paulo, Brazil
- Pedro A. F. Galante
- São Paulo Branch, Ludwig Institute for Cancer Research, São Paulo, Brazil
- Centro de Oncologia Molecular–Hospital Sírio-Libanês, São Paulo, Brazil
- Raphael B. Parmigiani
- São Paulo Branch, Ludwig Institute for Cancer Research, São Paulo, Brazil
- Centro de Oncologia Molecular–Hospital Sírio-Libanês, São Paulo, Brazil
- Anamaria A. Camargo
- São Paulo Branch, Ludwig Institute for Cancer Research, São Paulo, Brazil
- Centro de Oncologia Molecular–Hospital Sírio-Libanês, São Paulo, Brazil
- Matthew W. Hahn
- Department of Biology and School of Informatics and Computing, Indiana University, Bloomington, Indiana, United States of America
- Sandro J. de Souza
- São Paulo Branch, Ludwig Institute for Cancer Research, São Paulo, Brazil
- Brain Institute, Federal University of Rio Grande do Norte, Natal, Brazil
12
Owens SM, Harberson NA, Moore RC. Asymmetric functional divergence of young, dispersed gene duplicates in Arabidopsis thaliana. J Mol Evol 2013; 76:13-27. [PMID: 23344714 DOI: 10.1007/s00239-012-9530-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/23/2012] [Accepted: 10/29/2012] [Indexed: 11/28/2022]
Abstract
The classic Ohno model of gene duplication predicts that new genes form from the asymmetric functional divergence of a newly arisen, redundant duplicate locus. In order to understand the mechanisms which give rise to functional divergence of newly formed dispersed duplicates, we assessed the expression and molecular evolutionary divergence of a suite of 19 highly similar dispersed duplicates in Arabidopsis thaliana. These duplicates have a Ksil equal to or less than 5% and are specific to the A. thaliana lineage; thus, they predictably represent some of the youngest duplicates in the A. thaliana genome. We found that the majority of young duplicate loci exhibit asymmetric expression patterns, with the daughter locus either exhibiting reduced expression across all tissues analyzed relative to the progenitor locus or not being expressed at all. Furthermore, daughter loci, on the whole, have significantly more nonsynonymous substitutions than the progenitor loci. We also identified four pairs of loci which exhibit significant (P < 0.05) evolutionary rate asymmetry, three of which exhibit elevated dN/dS in the duplicate copy. We suggest, based on these data, that functional diversification initially takes the form of asymmetric regulatory divergence that can be a direct consequence of the mode of duplication. The reduced or absent expression of the daughter copy relaxes functional constraint on its protein coding sequence, leading to the asymmetric accumulation of nonsynonymous mutations. Thus, our data both affirm Ohno's prediction and explain the mechanism by which functional divergence initially occurs following duplication for dispersed gene duplicates.
Affiliation(s)
- Sarah M Owens
- Botany Department, Miami University, Oxford, OH 45056, USA
|
13
|
Katju V. In with the old, in with the new: the promiscuity of the duplication process engenders diverse pathways for novel gene creation. International Journal of Evolutionary Biology 2012; 2012:341932. [PMID: 23008799 PMCID: PMC3449122 DOI: 10.1155/2012/341932] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 05/18/2012] [Accepted: 06/03/2012] [Indexed: 01/26/2023]
Abstract
The gene duplication process has exhibited far greater promiscuity in the creation of paralogs with novel exon-intron structures than anticipated even by Ohno. In this paper I explore the history of the field, from the neo-Darwinian synthesis through Ohno's formulation of the canonical model for the evolution of gene duplicates and culminating in the present genomic era. I delineate the major tenets of Ohno's model and discuss its failure to encapsulate the full complexity of the duplication process as revealed in the era of genomics. I discuss the diverse classes of paralogs originating from both DNA- and RNA-mediated duplication events and their evolutionary potential for assuming radically altered functions, as well as the degree to which they can function unconstrained from the pressure of gene conversion. Lastly, I explore theoretical population-genetic considerations of how the effective population size (N_e) of a species may influence the probability of emergence of genes with radically altered functions.
Affiliation(s)
- Vaishali Katju
- Department of Biology, University of New Mexico, Albuquerque, NM 87131, USA
|
14
|
Novel genes from formation to function. International Journal of Evolutionary Biology 2012; 2012:821645. [PMID: 22811949 PMCID: PMC3395120 DOI: 10.1155/2012/821645] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/07/2012] [Accepted: 04/26/2012] [Indexed: 11/29/2022]
Abstract
The study of the evolution of novel genes generally focuses on the formation of new coding sequences. However, equally important in the evolution of novel functional genes are the formation of regulatory regions that allow the expression of the genes and the effects of the new genes in the organism as well. Herein, we discuss the current knowledge on the evolution of novel functional genes, and we examine in more detail the youngest genes discovered. We examine the existing data on a very recent and rapidly evolving cluster of duplicated genes, the Sdic gene cluster. This cluster of genes is an excellent model for the evolution of novel genes, as it is very recent and may still be in the process of evolving.
|
15
|
Ranz JM, Parsch J. Newly evolved genes: moving from comparative genomics to functional studies in model systems. How important is genetic novelty for species adaptation and diversification? Bioessays 2012; 34:477-83. [PMID: 22461005 DOI: 10.1002/bies.201100177] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022]
Abstract
Genes are gained and lost over the course of evolution. A recent study found that over 1,800 new genes have appeared during primate evolution and that an unexpectedly high proportion of these genes are expressed in the human brain. But what are the molecular functions of newly evolved genes and what is their impact on an organism's fitness? The acquisition of new genes may provide a rich source of genetic diversity that fuels evolutionary innovation. Although gene manipulation experiments are not feasible in humans, studies in model organisms, such as Drosophila melanogaster, have shown that new genes can quickly become integrated into genetic networks and become essential for survival or fertility. Future studies of new genes, especially chimeric genes, and their functions will help determine the role of genetic novelty in the adaptation and diversification of species.
Affiliation(s)
- José M Ranz
- Department of Ecology and Evolutionary Biology, University of California-Irvine, CA, USA.
|
16
|
Wu YC, Rasmussen MD, Kellis M. Evolution at the subgene level: domain rearrangements in the Drosophila phylogeny. Mol Biol Evol 2011; 29:689-705. [PMID: 21900599 PMCID: PMC3258039 DOI: 10.1093/molbev/msr222] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Indexed: 01/24/2023]
Abstract
Although the possibility of gene evolution by domain rearrangements has long been appreciated, current methods for reconstructing and systematically analyzing gene family evolution are limited to events such as duplication, loss, and sometimes, horizontal transfer. However, within the Drosophila clade, we find domain rearrangements occur in 35.9% of gene families, and thus, any comprehensive study of gene evolution in these species will need to account for such events. Here, we present a new computational model and algorithm for reconstructing gene evolution at the domain level. We develop a method for detecting homologous domains between genes and present a phylogenetic algorithm for reconstructing maximum parsimony evolutionary histories that include domain generation, duplication, loss, merge (fusion), and split (fission) events. Using this method, we find that genes involved in fusion and fission are enriched in signaling and development, suggesting that domain rearrangements and reuse may be crucial in these processes. We also find that fusion is more abundant than fission, and that fusion and fission events occur predominantly alongside duplication, with 92.5% of fusion events and 34.3% of fission events retaining ancestral architectures in the duplicated copies. We provide a catalog of ∼9,000 genes that undergo domain rearrangement across nine sequenced species, along with possible mechanisms for their formation. These results dramatically expand on evolution at the subgene level and offer several insights into how new genes and functions arise between species.
Affiliation(s)
- Yi-Chieh Wu
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts, USA.
|
17
|
Rogers RL, Hartl DL. Chimeric genes as a source of rapid evolution in Drosophila melanogaster. Mol Biol Evol 2011; 29:517-29. [PMID: 21771717 DOI: 10.1093/molbev/msr184] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.5] [Indexed: 12/28/2022]
Abstract
Chimeric genes form through the combination of portions of existing coding sequences to create a new open reading frame. These new genes can create novel protein structures that are likely to serve as a strong source of novelty upon which selection can act. We have identified 14 chimeric genes that formed through DNA-level mutations in Drosophila melanogaster, and we investigate expression profiles, domain structures, and population genetics for each of these genes to examine their potential to effect adaptive evolution. We find that chimeric gene formation commonly produces mid-domain breaks and unites portions of wholly unrelated peptides, creating novel protein structures that are entirely distinct from other constructs in the genome. These new genes are often involved in selective sweeps. We further find a disparity between chimeric genes that have recently formed and swept to fixation versus chimeric genes that have been preserved over long periods of time, suggesting that preservation and adaptation are distinct processes. Finally, we demonstrate that chimeric gene formation can produce qualitative expression changes that are difficult to mimic through duplicate gene formation, and that extremely young chimeric genes (d_S < 0.03) are more likely to be associated with selective sweeps than duplicate genes of the same age. Hence, chimeric genes can serve as an exceptional source of genetic novelty that can have a profound influence on adaptive evolution in D. melanogaster.
Affiliation(s)
- Rebekah L Rogers
- Department of Organismic and Evolutionary Biology, Harvard University, USA.
|
18
|
Dynamic programming procedure for searching optimal models to estimate substitution rates based on the maximum-likelihood method. Proc Natl Acad Sci U S A 2011; 108:7860-5. [PMID: 21521791 DOI: 10.1073/pnas.1018621108] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Indexed: 12/26/2022]
Abstract
The substitution rate in a gene can provide valuable information for understanding its functionality and evolution. A widely used method to estimate substitution rates is the maximum-likelihood method implemented in the CODEML program in the PAML package. A limited number of branch models, chosen based on a priori information or an interest in a particular lineage(s), are tested, whereas a large number of potential models are neglected. A complementary approach is also needed to test all or a large number of possible models to search for the globally optimal model(s) of maximum likelihood. However, the computational time for this search, even for a small number of sequences, becomes impractically long. Thus, it is desirable to explore the most probable spaces to search for the optimal models. Using dynamic programming techniques, we developed a simple computational method for searching the most probable optimal branch-specific models in a practically feasible computational time. We propose three search methods to find the optimal models, which explore O(n) (method 1) to O(n^2) (method 2 and method 3) models when the given phylogeny has n branches. In addition, we derived a formula to calculate the number of all possible models, revealing the complexity of finding the optimal branch-specific model. We show that in a reanalysis of over 50 previously published studies, the vast majority obtained better models with significantly higher likelihoods than the conventional hypothesis model methods.
|
19
|
Rogers RL, Bedford T, Lyons AM, Hartl DL. Adaptive impact of the chimeric gene Quetzalcoatl in Drosophila melanogaster. Proc Natl Acad Sci U S A 2010; 107:10943-8. [PMID: 20534482 PMCID: PMC2890713 DOI: 10.1073/pnas.1006503107] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
Chimeric genes, which form through the genomic fusion of two protein-coding genes, are a significant source of evolutionary novelty in Drosophila melanogaster. However, the propensity of chimeric genes to produce adaptive phenotypic changes is not fully understood. Here, we describe the chimeric gene Quetzalcoatl (Qtzl; CG31864), which formed in the recent past and swept to fixation in D. melanogaster. Qtzl arose through a duplication on chromosome 2L that united a portion of the mitochondrially targeted peptide CG12264 with a segment of the polycomb gene escl. The 3' segment of the gene, which is derived from escl, is inherited out of frame, producing a unique peptide sequence. Nucleotide diversity is drastically reduced and site frequency spectra are significantly skewed surrounding the duplicated region, a finding consistent with a selective sweep on the duplicate region containing Qtzl. Qtzl has an expression profile that largely resembles that of escl, with expression in early pupae, adult females, and male testes. However, expression patterns appear to have been decoupled from both parental genes during later embryonic development and in head tissues of adult males, indicating that Qtzl has developed a distinct regulatory profile through the rearrangement of different 5' and 3' regulatory domains. Furthermore, misexpression of Qtzl suppresses defects in the formation of the neuromuscular junction in larvae, demonstrating that Qtzl can produce phenotypic effects in cells. Together, these results show that chimeric genes can produce structural and regulatory changes in a single mutational step and may be a major factor in adaptive evolution.
Affiliation(s)
- Rebekah L. Rogers
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Trevor Bedford
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109
- Ana M. Lyons
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
- Daniel L. Hartl
- Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138
|
20
|
Cridland JM, Thornton KR. Validation of rearrangement break points identified by paired-end sequencing in natural populations of Drosophila melanogaster. Genome Biol Evol 2010; 2:83-101. [PMID: 20333226 PMCID: PMC2839345 DOI: 10.1093/gbe/evq001] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Accepted: 01/08/2010] [Indexed: 01/17/2023]
Abstract
Several recent studies have focused on the evolution of recently duplicated genes in Drosophila. Currently, however, little is known about the evolutionary forces acting upon duplications that are segregating in natural populations. We used a high-throughput, paired-end sequencing platform (Illumina) to identify structural variants in a population sample of African D. melanogaster. Polymerase chain reaction and sequencing confirmation of duplications detected by multiple, independent paired-ends showed that paired-end sequencing reliably uncovered the break points of structural rearrangements and allowed us to identify a number of tandem duplications segregating within a natural population. Our confirmation experiments show that rates of confirmation are very high, even at modest coverage. Our results also compare well with previous studies using microarrays (Emerson J, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 320:1629-1631. and Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc Natl Acad Sci U S A. 104:19920-19925.), which both gives us confidence in the results of this study and confirms previous microarray results. We were also able to identify whole-gene duplications, such as a novel duplication of Or22a, an olfactory receptor, and identify copy-number differences in genes previously known to be under positive selection, like Cyp6g1, which confers resistance to dichlorodiphenyltrichloroethane. Several "hot spots" of duplications were detected in this study, which indicates that particular regions of the genome may be more prone to generating duplications. Finally, population frequency analysis of confirmed events also showed an excess of rare variants in our population, which indicates that duplications segregating in the population may be deleterious and ultimately destined to be lost from the population.
Affiliation(s)
- Julie M Cridland
- Department of Ecology and Evolutionary Biology, University of California, Irvine, USA
|
21
|
Hahn MW. Distinguishing among evolutionary models for the maintenance of gene duplicates. J Hered 2009; 100:605-17. [PMID: 19596713 DOI: 10.1093/jhered/esp047] [Citation(s) in RCA: 259] [Impact Index Per Article: 17.3] [Indexed: 01/22/2023]
Abstract
Determining the evolutionary forces responsible for the maintenance of gene duplicates is key to understanding the processes leading to evolutionary adaptation and novelty. In his highly prescient book, Susumu Ohno recognized that duplicate genes are fixed and maintained within a population with 3 distinct outcomes: neofunctionalization, subfunctionalization, and conservation of function. Subsequent researchers have proposed a multitude of population genetic models that lead to these outcomes, each differing largely in the role played by adaptive natural selection. In this paper, I present a nonmathematical review of these models, their predictions, and the evidence collected in support of each of them. Though the various outcomes of gene duplication are often strictly associated with the presence or absence of adaptive natural selection, I argue that determining the outcome of duplication is orthogonal to determining whether natural selection has acted. Despite an ever-growing field of research into the fate of gene duplicates, there is not yet clear evidence for the preponderance of one outcome over the others, much less evidence for the importance of adaptive or nonadaptive forces in maintaining these duplicates.
Affiliation(s)
- Matthew W Hahn
- Department of Biology and School of Informatics, Indiana University, Bloomington, IN 47405, USA.
|
22
|
Opazo JC, Sloan AM, Campbell KL, Storz JF. Origin and ascendancy of a chimeric fusion gene: the beta/delta-globin gene of paenungulate mammals. Mol Biol Evol 2009; 26:1469-78. [PMID: 19332641 PMCID: PMC2727371 DOI: 10.1093/molbev/msp064] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Accepted: 03/15/2009] [Indexed: 11/12/2022]
Abstract
The delta-globin gene (HBD) of eutherian mammals exhibits a propensity for recombinational exchange with the closely linked beta-globin gene (HBB) and has been independently converted by the HBB gene in multiple lineages. Here we report the presence of a chimeric beta/delta fusion gene in the African elephant (Loxodonta africana) that was created by unequal crossing-over between misaligned HBD and HBB paralogs. The recombinant chromosome that harbors the beta/delta fusion gene in elephants is structurally similar to the "anti-Lepore" duplication mutant of humans (the reciprocal exchange product of the hemoglobin Lepore deletion mutant). However, the situation in the African elephant is unique in that the chimeric beta/delta fusion gene supplanted the parental HBB gene and is therefore solely responsible for synthesizing the beta-chain subunits of adult hemoglobin. A phylogenetic survey of beta-like globin genes in afrotherian and xenarthran mammals revealed that the origin of the chimeric beta/delta fusion gene and the concomitant inactivation of the HBB gene predated the radiation of "Paenungulata," a clade of afrotherian mammals that includes three orders: Proboscidea (elephants), Sirenia (dugongs and manatees), and Hyracoidea (hyraxes). The reduced fitness of the human Hb Lepore deletion mutant helps to explain why independently derived beta/delta fusion genes (which occur on an anti-Lepore chromosome) have been fixed in a number of mammalian lineages, whereas the reciprocal delta/beta fusion gene (which occurs on a Lepore chromosome) has yet to be documented in any nonhuman mammal. This illustrates how the evolutionary fates of chimeric fusion genes can be strongly influenced by their recombinational mode of origin.
Affiliation(s)
- Juan C Opazo
- School of Biological Sciences, University of Nebraska, Nebraska, USA
|
23
|
Patterns of amino acid evolution in the Drosophila ananassae chimeric gene, siren, parallel those of other Adh-derived chimeras. Genetics 2008; 180:1261-3. [PMID: 18780749 DOI: 10.1534/genetics.108.090068] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
siren1 and siren2 are novel alcohol dehydrogenase (Adh)-derived chimeric genes in the Drosophila bipectinata complex. D. ananassae, however, harbors a single homolog of these genes. Like other Adh-derived chimeric genes, siren evolved adaptively shortly after it was formed. These changes likely shifted the catalytic activity of siren.
|
24
|
Chen ST, Cheng HC, Barbash DA, Yang HP. Evolution of hydra, a recently evolved testis-expressed gene with nine alternative first exons in Drosophila melanogaster. PLoS Genet 2008; 3:e107. [PMID: 17616977 PMCID: PMC1904467 DOI: 10.1371/journal.pgen.0030107] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.1] [Received: 01/10/2007] [Accepted: 05/15/2007] [Indexed: 12/26/2022]
Abstract
We describe here the Drosophila gene hydra that appears to have originated de novo in the melanogaster subgroup and subsequently evolved in both structure and expression level in Drosophila melanogaster and its sibling species. D. melanogaster hydra encodes a predicted protein of ~300 amino acids with no apparent similarity to any previously known proteins. The syntenic region flanking hydra on both sides is found in both D. ananassae and D. pseudoobscura, but hydra is found only in melanogaster subgroup species, suggesting that it originated less than ~13 million y ago. Exon 1 of hydra has undergone recurrent duplications, leading to the formation of nine tandem alternative exon 1s in D. melanogaster. Seven of these alternative exons are flanked on their 3′ side by the transposon DINE-1 (Drosophila interspersed element-1). We demonstrate that at least four of the nine duplicated exon 1s can function as alternative transcription start sites. The entire hydra locus has also duplicated in D. simulans and D. sechellia. D. melanogaster hydra is expressed most intensely in the proximal testis, suggesting a role in late-stage spermatogenesis. The coding region of hydra has a relatively high Ka/Ks ratio between species, but the ratio is less than 1 in all comparisons, suggesting that hydra is subject to functional constraint. Analysis of sequence polymorphism and divergence of hydra shows that it has evolved under positive selection in the lineage leading to D. melanogaster. The dramatic structural changes surrounding the first exons do not affect the tissue specificity of gene expression: hydra is expressed predominantly in the testes in D. melanogaster, D. simulans, and D. yakuba. However, we have found that expression level changed dramatically (>20-fold) between D. melanogaster and D. simulans. While hydra initially evolved in the absence of nearby transposable element insertions, we suggest that the subsequent accumulation of repetitive sequences in the hydra region may have contributed to structural and expression-level evolution by inducing rearrangements and causing local heterochromatinization. Our analysis further shows that recurrent evolution of both gene structure and expression level may be characteristics of newly evolved genes. We also suggest that late-stage spermatogenesis is the functional target for newly evolved and rapidly evolving male-specific genes.
Similar groups of animals have similar numbers of genes, but not all of these genes are the same. While some genes are highly conserved and can be easily and uniquely identified in species ranging from yeast to plants to humans, other genes are sometimes found in only a small number or even in a single species. Such newly evolved genes may help produce traits that make species unique. We describe here a newly evolved gene called hydra that occurs only in a small subgroup of Drosophila species. hydra is expressed in the testes, suggesting that it may have a function in male fertility. hydra has evolved significantly in its structure and protein-coding sequence among species. The authors named the gene hydra after the nine-headed monster slain by Hercules because in one species, Drosophila melanogaster, hydra has nine potential alternative first exons. Perhaps because of this or other structural changes, the level of RNA made by hydra differs significantly between one pair of species. This analysis reveals that newly created genes may evolve rapidly in sequence, structure, and expression level.
Affiliation(s)
- Shou-Tao Chen
- Faculty of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Taipei, Taiwan, Republic of China
- Hsin-Chien Cheng
- Faculty of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Taipei, Taiwan, Republic of China
- Daniel A Barbash
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
- Hsiao-Pei Yang
- Faculty of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Taipei, Taiwan, Republic of China
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America
|
25
|
Sequence similarity network reveals common ancestry of multidomain proteins. PLoS Comput Biol 2008; 4:e1000063. [PMID: 18475320 PMCID: PMC2377100 DOI: 10.1371/journal.pcbi.1000063] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.4] [Received: 02/15/2007] [Accepted: 03/18/2008] [Indexed: 11/25/2022]
Abstract
We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.
New genes evolve through the duplication and modification of existing genes. As a result, genes that share common ancestry tend to have similar structure and function. Computational methods that use common ancestry have been extraordinarily successful in inferring function. The practice of discerning evolutionary relationships is stymied, however, by modular sequences made up of two or more domains. When two genes share some domains but not others, it is difficult to distinguish a case of common ancestry from insertion of the same domain into both genes. We present a formal framework to define how multidomain genes are related, and propose a novel method for rapid, robust characterization of evolutionary relationships. In an empirical comparison with the current state of the art, we demonstrate superior performance of our method using a large hand-curated set of sequences known to share common ancestry. The success of our method derives from its unique ability to infer evolutionary history from local topology in the sequence similarity network. This represents a departure from the view that protein family classification must be restricted to families with conserved architecture. By exploiting the structure of the sequence similarity network, our approach surmounts this limitation and opens the door to studies of the role of modularity in protein evolution.
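The idea of scoring a sequence pair by the local structure of the similarity network can be illustrated with a minimal sketch. It assumes the score is a Pearson correlation between the two sequences' similarity profiles against every sequence in the network; the abstract does not state the formula, and the function name, the toy scores, and the treatment of missing pairs as zero are illustrative assumptions rather than the authors' implementation.

```python
import math

def neighborhood_correlation(scores, x, y):
    """Pearson correlation between the similarity profiles of x and y.

    scores: dict mapping each sequence ID to {other ID: similarity score};
    pairs that are absent are treated as a score of 0 (an assumption).
    """
    ids = sorted(scores)
    sx = [scores[x].get(i, 0.0) for i in ids]
    sy = [scores[y].get(i, 0.0) for i in ids]
    mx = sum(sx) / len(ids)
    my = sum(sy) / len(ids)
    cov = sum((a - mx) * (b - my) for a, b in zip(sx, sy))
    var_x = sum((a - mx) ** 2 for a in sx)
    var_y = sum((b - my) ** 2 for b in sy)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no neighborhood signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores: A and B are similar and share neighbors;
# C matches A only weakly (for example through one shared domain)
# and otherwise sits in the neighborhood of D.
toy_scores = {
    "A": {"A": 100, "B": 90, "C": 30, "D": 0},
    "B": {"A": 90, "B": 100, "C": 25, "D": 0},
    "C": {"A": 30, "B": 25, "C": 100, "D": 80},
    "D": {"A": 0, "B": 0, "C": 80, "D": 100},
}

nc_ab = neighborhood_correlation(toy_scores, "A", "B")
nc_ac = neighborhood_correlation(toy_scores, "A", "C")
```

In this toy network the pair A and B scores close to 1, while A and C scores below zero even though they have a nonzero pairwise similarity, which is the kind of separation the abstract describes between shared ancestry and a shared inserted domain.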
|
26
|
Thornton KR. The neutral coalescent process for recent gene duplications and copy-number variants. Genetics 2007; 177:987-1000. [PMID: 17720930 PMCID: PMC2034660 DOI: 10.1534/genetics.107.074948] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
Abstract
I describe a method for simulating samples from gene families of size two under a neutral coalescent process, for the case where the duplicate gene either has fixed recently in the population or is still segregating. When a duplicate locus has recently fixed by genetic drift, diversity in the new gene is expected to be reduced, and an excess of rare alleles is expected, relative to the predictions of the standard coalescent model. The expected patterns of polymorphism in segregating duplicates ("copy-number variants") depend both on the frequency of the duplicate in the sample and on the rate of crossing over between the two loci. When the crossover rate between the ancestral gene and the copy-number variant is low, the expected pattern of variability in the ancestral gene will be similar to the predictions of models of either balancing or positive selection, if the frequency of the duplicate in the sample is intermediate or high, respectively. Simulations are used to investigate the effect of crossing over between loci, and gene conversion between the duplicate loci, on levels of variability and the site-frequency spectrum.
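The site-frequency spectrum mentioned above is the summary in which an "excess of rare alleles" shows up. As a minimal sketch, it can be tabulated from a sample of haplotypes as follows, assuming each haplotype is a list of 0/1 values with 1 marking the derived allele; the function name and the toy sample are hypothetical and are not taken from the paper's simulation code.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k-1 is the number of sites at which the
    derived allele (coded 1) occurs in exactly k of the n haplotypes."""
    n = len(haplotypes)
    # zip(*haplotypes) iterates over sites (columns) of the sample.
    derived_counts = Counter(sum(site) for site in zip(*haplotypes))
    # Only segregating sites (1 <= k <= n-1) are reported.
    return [derived_counts.get(k, 0) for k in range(1, n)]

# Hypothetical sample: 4 haplotypes, 5 sites.
sample = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
sfs = site_frequency_spectrum(sample)  # [2, 0, 1]
```

Here two sites are singletons and one site carries the derived allele in three of four haplotypes. A spectrum with more weight in its first entries than the standard neutral expectation corresponds to the excess of rare alleles described for recently fixed duplicates.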
Affiliation(s)
- Kevin R Thornton
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697, USA.
|
27
|
Song N, Sedgewick RD, Durand D. Domain architecture comparison for multidomain homology identification. J Comput Biol 2007; 14:496-516. [PMID: 17572026 DOI: 10.1089/cmb.2007.a009] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 12/25/2022]
Abstract
Homology identification is the first step for many genomic studies. Current methods, based on sequence comparison, can result in a substantial number of mis-assignments due to the similarity of homologous domains in otherwise unrelated sequences. Here we propose methods to detect homologs through explicit comparison of protein domain content. We developed several schemes for scoring the homology of a pair of protein sequences based on methods used in the field of information retrieval. We evaluate the proposed methods and methods used in the literature using a benchmark of fifteen sequence families of known evolutionary history. The results of these studies demonstrate the effectiveness of comparing domain architectures using these similarity measures. We also demonstrate the importance of both weighting promiscuous domains and of compensating for the statistical effect of having a large number of domains in a protein. Using logistic regression, we demonstrate the benefit of combining similarity measures based on domain content with sequence similarity measures.
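As a rough illustration of the idea in this abstract, the sketch below scores two proteins by their domain content using a TF-IDF-style weighted cosine similarity, a standard information-retrieval measure. It is only a sketch under stated assumptions: the abstract does not specify the scoring schemes, so the weighting formula, the function names, and the toy domain labels are all hypothetical. The inverse-frequency weight down-weights promiscuous domains, and the cosine normalization compensates for proteins that simply contain many domains.

```python
import math
from collections import Counter

def idf_weights(architectures):
    """Inverse-frequency weight per domain; the +1 offset is an assumed smoothing choice."""
    n = len(architectures)
    doc_freq = Counter()
    for domains in architectures.values():
        doc_freq.update(set(domains))  # count each domain once per protein
    return {d: math.log(n / doc_freq[d]) + 1.0 for d in doc_freq}

def weighted_cosine(a, b, weights):
    """Cosine similarity between two domain multisets under the given weights."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[d] * vb[d] * weights[d] ** 2 for d in va.keys() & vb.keys())
    norm_a = math.sqrt(sum((c * weights[d]) ** 2 for d, c in va.items()))
    norm_b = math.sqrt(sum((c * weights[d]) ** 2 for d, c in vb.items()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy database: "SH3" plays the role of a promiscuous domain.
toy_db = {
    "P1": ["SH3", "kinase"],
    "P2": ["SH3", "SH2", "kinase"],
    "P3": ["SH3", "zinc_finger"],
    "P4": ["SH3", "PDZ"],
}
weights = idf_weights(toy_db)
sim_related = weighted_cosine(toy_db["P1"], toy_db["P2"], weights)
sim_unrelated = weighted_cosine(toy_db["P1"], toy_db["P3"], weights)
print(f"P1 vs P2: {sim_related:.3f}, P1 vs P3: {sim_unrelated:.3f}")
```

In this toy data, P1 and P3 share only the promiscuous domain, so their score is lower than that of P1 and P2, which also share the rarer kinase domain.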
Affiliation(s)
- N Song
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA
|
28
|
Fan C, Long M. A New Retroposed Gene in Drosophila Heterochromatin Detected by Microarray-Based Comparative Genomic Hybridization. J Mol Evol 2006; 64:272-83. [PMID: 17177089 DOI: 10.1007/s00239-006-0169-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 04/20/2006] [Accepted: 08/17/2006] [Indexed: 10/23/2022]
Abstract
A genomic pattern of new gene origination is often dependent on a genomic method that can efficiently identify a statistically adequate number of recently originated genes. The heterochromatic regions have often been viewed as genomic deserts with low coding potential and thus a low flux of new genes. However, increasing reports revealed unexpected roles of heterochromatic regions in the evolution of genes and genomes. We identified recently retroposed genes that originated in heterochromatic regions in Drosophila, by developing microarray-based comparative genomic hybridization (CGH) with multiple species. This new gene family, named Ifc-2h, originated in the common ancestor of the clade of D. simulans, D. mauritiana, and D. sechellia. The sequence features and phylogenetic distribution indicated that Ifc-2h resulted from the retroposition from its parental gene, Infertile crescent (Ifc), and integrated into heterochromatic region of common ancestor of the three sibling species 2 million years ago. Expression analysis revealed that Ifc-2h had developed a new expression pattern by recruiting a putative regulatory element from its target sequence. The distribution of indel variation in Ifc-2h of D. simulans and D. mauritiana revealed a significant sequence constraint, suggesting that the Ifc-2h gene may be functional. These analyses cast fresh insight into the evolution of heterochromatin and the origin of its coding regions.
Affiliation(s)
- Chuanzhu Fan
- Department of Ecology and Evolution, The University of Chicago, 1101 East 57th Street, Chicago, IL 60637, USA
|
29
|
Akhunov ED, Akhunova AR, Dvorak J. Mechanisms and rates of birth and death of dispersed duplicated genes during the evolution of a multigene family in diploid and tetraploid wheats. Mol Biol Evol 2006; 24:539-50. [PMID: 17135334 DOI: 10.1093/molbev/msl183] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 12/23/2022] Open
Abstract
A family of 5 genes that evolved within the past 1.9 Myr in diploid wheat was characterized. The ancestral gene, ALP-A1, is on chromosome 1A and encodes an aci-reductone dioxygenase-like protein. The duplicated genes ALP-A2, ALP-A3, ALP-A4.1, and ALP-A4.2 acquired complete coding sequences but lost the original promoter. They are on chromosomes 4A, 2A, 6A and 6A, respectively, and evolved sequentially, the youngest duplicated gene always producing the next duplicate. It is shown that dispersed gene duplication rate consists of the primary rate (duplications of ancestral genes) and the secondary rate (duplications of genes that had been generated by recent duplications). The primary rate was 2.5 x 10(-3) gene(-1) Myr(-1) in diploid wheat. The secondary rate was 5.2 x 10(-2) gene(-1) Myr(-1) in the ALP family. The 20-fold acceleration of the secondary rate was caused by the insertion of the ALP-A2 gene into a novel type transposon. Only the ALP-A1 and ALP-A3 genes are transcribed. The transcription of ALP-A3 is directed by a promoter within a DNA fragment similar to a CACTA type of DNA transposons, making ALP-A3 a new gene. The ALP-A3 transcript is longer than that of the ALP-A1. The half-life of ALP duplicated genes was estimated to be 0.87 Myr. Strong purifying selection acting on the ancestral gene ALP-A1 was undiminished by the evolution of duplicated genes. The evolution of the ALP family shows that repeated elements facilitate both gene duplication and expression of duplicated genes and highlights their importance for the evolution of gene repertoire in large plant genomes.
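The rate figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below recomputes the fold acceleration from the two reported rates and converts the reported half-life into a per-Myr loss rate. The conversion assumes simple exponential decay of duplicated genes, which is an assumption made here for illustration and is not stated in the abstract; the function names are likewise hypothetical.

```python
import math

# Values reported in the abstract (per gene per Myr, and Myr).
PRIMARY_RATE = 2.5e-3
SECONDARY_RATE = 5.2e-2
HALF_LIFE_MYR = 0.87

def fold_acceleration(secondary, primary):
    """Ratio of the secondary to the primary duplication rate."""
    return secondary / primary

def decay_constant(half_life):
    """Loss rate per Myr, assuming exponential decay (an assumption, see lead-in)."""
    return math.log(2) / half_life

def surviving_fraction(t_myr, half_life):
    """Expected fraction of duplicates still present after t_myr under the same assumption."""
    return 0.5 ** (t_myr / half_life)

fold = fold_acceleration(SECONDARY_RATE, PRIMARY_RATE)
loss_rate = decay_constant(HALF_LIFE_MYR)
remaining = surviving_fraction(1.9, HALF_LIFE_MYR)  # 1.9 Myr is the family's reported age
print(f"fold acceleration: {fold:.1f}")
print(f"loss rate per Myr: {loss_rate:.3f}")
print(f"fraction surviving 1.9 Myr: {remaining:.3f}")
```

The ratio of the two reported rates comes to about 20.8, consistent with the "20-fold acceleration" stated in the abstract.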
Affiliation(s)
- Eduard D Akhunov
- Department of Plant Sciences, University of California, Davis, USA
|
30
|
Masly JP, Jones CD, Noor MAF, Locke J, Orr HA. Gene transposition as a cause of hybrid sterility in Drosophila. Science 2006; 313:1448-50. [PMID: 16960009 DOI: 10.1126/science.1128721] [Citation(s) in RCA: 163] [Impact Index Per Article: 9.1] [Indexed: 11/02/2022]
Abstract
We describe reproductive isolation caused by a gene transposition. In certain Drosophila melanogaster-D. simulans hybrids, hybrid male sterility is caused by the lack of a single-copy gene essential for male fertility, JYAlpha. This gene is located on the fourth chromosome of D. melanogaster but on the third chromosome of D. simulans. Genomic and molecular analyses show that JYAlpha transposed to the third chromosome during the evolutionary history of the D. simulans lineage. Because of this transposition, a fraction of hybrids completely lack JYAlpha and are sterile, representing reproductive isolation without sequence evolution.
Affiliation(s)
- John P Masly
- Department of Biology, University of Rochester, Rochester, NY 14627, USA.
|
31
|
Arguello JR, Chen Y, Yang S, Wang W, Long M. Origination of an X-linked testes chimeric gene by illegitimate recombination in Drosophila. PLoS Genet 2006; 2:e77. [PMID: 16715176 PMCID: PMC1463047 DOI: 10.1371/journal.pgen.0020077] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Received: 01/26/2006] [Accepted: 04/05/2006] [Indexed: 12/02/2022] Open
Abstract
The formation of chimeric gene structures provides important routes by which novel proteins and functions are introduced into genomes. Signatures of these events have been identified in organisms from wide phylogenic distributions. However, the ability to characterize the early phases of these evolutionary processes has been difficult due to the ancient age of the genes or to the limitations of strictly computational approaches. While examples involving retrotransposition exist, our understanding of chimeric genes originating via illegitimate recombination is limited to speculations based on ancient genes or transfection experiments. Here we report a case of a young chimeric gene that has originated by illegitimate recombination in Drosophila. This gene was created within the last 2–3 million years, prior to the speciation of Drosophila simulans, Drosophila sechellia, and Drosophila mauritiana. The duplication, which involved the Bällchen gene on Chromosome 3R, was partial, removing substantial 3′ coding sequence. Subsequent to the duplication onto the X chromosome, intergenic sequence was recruited into the protein-coding region creating a chimeric peptide with ~ 33 new amino acid residues. In addition, a novel intron-containing 5′ UTR and novel 3′ UTR evolved. We further found that this new X-linked gene has evolved testes-specific expression. Following speciation of the D. simulans complex, this novel gene evolved lineage-specifically with evidence for positive selection acting along the D. simulans branch. Illegitimate recombination, the non-homologous recombination that occurs between DNA sequences with few or no identical nucleotides, is a general phenomenon that has been known to cause many medically important deleterious changes. However, little is known about the positive side of such a process. For example, little is known about its relative role in the origin of new gene functions that confer increased fitness to species. 
This work contributes to the understanding of the significance of this process. Here the authors report on a young chimeric gene that has originated by illegitimate recombination in Drosophila. The term “chimeric gene” refers to gene structures—both coding and noncoding—which have been generated from distinct parental loci. This chimeric gene was created within the last 2–3 million years, prior to the speciation of Drosophila simulans, Drosophila sechellia, and Drosophila mauritiana. A gene on Chromosome 3R was duplicated onto the X chromosome and recruited intergenic sequence, creating a chimeric peptide. It was found that this new X-linked gene has evolved testes-specific expression. Following speciation of the D. simulans complex, this novel gene evolved lineage-specifically under positive Darwinian selection.
Affiliation(s)
- J. Roman Arguello
- Committee on Evolutionary Biology, University of Chicago, Chicago, Illinois, United States of America
- Ying Chen
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
- Shuang Yang
- Chinese Academy of Sciences–Max Planck Junior Scientist Group, Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Kunming, Yunnan, China
- Wen Wang
- Chinese Academy of Sciences–Max Planck Junior Scientist Group, Key Laboratory of Cellular and Molecular Evolution, Kunming Institute of Zoology, Kunming, Yunnan, China
- * To whom correspondence should be addressed. E-mail: (WW); (ML)
- Manyuan Long
- Committee on Evolutionary Biology, University of Chicago, Chicago, Illinois, United States of America
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America
- * To whom correspondence should be addressed. E-mail: (WW); (ML)
|
32
|
Nozawa M, Aotsuka T, Tamura K. A novel chimeric gene, siren, with retroposed promoter sequence in the Drosophila bipectinata complex. Genetics 2005; 171:1719-27. [PMID: 16143626 PMCID: PMC1456098 DOI: 10.1534/genetics.105.041699] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 02/06/2005] [Accepted: 08/09/2005] [Indexed: 12/30/2022] Open
Abstract
Retrotransposons often produce a copy of host genes by their reverse transcriptase activity operating on host gene transcripts. Since transcripts normally do not contain promoter, a retroposed gene copy usually becomes a retropseudogene. However, in Drosophila bipectinata and a closely related species we found a new chimeric gene, whose promoter was likely produced by retroposition. This chimeric gene, named siren, consists of a tandem duplicate of Adh and a retroposed fragment of CG11779 containing the promoter and a partial intron in addition to the first exon. We found that this unusual structure of a retroposed fragment was obtained by retroposition of nanos, which overlaps with CG11779 on the complementary strand. The potential of retroposition to produce a copy of promoter and intron sequences in the context of gene overlapping was demonstrated.
Affiliation(s)
- Masafumi Nozawa
- Department of Biological Sciences, Graduate School of Science, Tokyo Metropolitan University, 1-1 Minami-ohsawa, Hachioji-shi, Tokyo 192-0397, Japan
|
33
|
Jones CD, Begun DJ. Parallel evolution of chimeric fusion genes. Proc Natl Acad Sci U S A 2005; 102:11373-8. [PMID: 16076957 PMCID: PMC1183565 DOI: 10.1073/pnas.0503528102] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.0] [Received: 04/29/2005] [Indexed: 12/28/2022] Open
Abstract
To understand how novel functions arise, we must identify common patterns and mechanisms shaping the evolution of new genes. Here, we take advantage of data from three Drosophila genes, jingwei, Adh-Finnegan, and Adh-Twain, to find evolutionary patterns and mechanisms governing the evolution of new genes. All three of these genes are independently derived from Adh, which enabled us to use the extensive literature on Adh in Drosophila to guide our analyses. We discovered a fundamental similarity in the temporal, spatial, and types of amino acid changes that occurred. All three genes underwent rapid adaptive amino acid evolution shortly after they were formed, followed by later quiescence and functional constraint. These genes also show striking parallels in which amino acids change in the Adh region. We showed that these early changes tend to occur at amino acid residues that seldom, if ever, evolve in Drosophila Adh. Changes at these slowly evolving sites are usually associated with loss of function or hypomorphic mutations in Drosophila melanogaster. Our data indicate that shifting away from ancestral functions may be a critical step early in the evolution of chimeric fusion genes. We suggest that the patterns we observed are both general and predictive.
Affiliation(s)
- Corbin D Jones
- Department of Biology and Carolina Center for Genome Sciences, University of North Carolina, Chapel Hill, NC 27599, USA.
|